| CSC 105 | The Digital Age | Spring 2009 |
Although computers are often considered to work very quickly, only some algorithms proceed rapidly; others take much longer. This laboratory exercise provides an intuitive framework for the consideration of algorithm effectiveness, including the amount of time and memory required for an algorithm. When algorithms are applied to successively larger data sets, some algorithms may scale up nicely while others require more time than may be feasible. Also, when several algorithms are available to solve a problem, it is natural to wonder if one solution is better than another. Altogether, we might use many criteria to evaluate such solutions, including:
Accuracy: Of course, any program should produce correct answers. (If we were satisfied with wrong results, it would be trivial to produce many such answers very quickly.) However, it may not be immediately clear just how accurate results should be in a specific instance. For example, one algorithm may be simple to program and may run quickly, but it may be accurate to only 5 decimal places. A second algorithm may be more complex and much slower, but may give 15-place accuracy. If 5-place accuracy is adequate for a specific application, the first algorithm is the better choice. However, if 10- or 12-place accuracy is required, the slower algorithm must be used.
Efficiency: Efficiency can be measured in many ways: programmer time, algorithm execution time, memory used, and so on. If a program is to be used once, then programmer time may be a major consideration, and a simple algorithm might be preferred. If a program is to be used many times, however, then it may be worth spending more development time with a complex algorithm, so the procedure will run very quickly.
Use of Memory: One algorithm may require more computer memory in which to execute. If "space" (i.e., memory) is a scarce resource, then the amount of space an algorithm requires should be taken into consideration when comparing algorithms.
Ease of Modification: It is common practice to modify old programs to solve new problems. A very obscure algorithm that is difficult to understand, therefore, is usually less desirable than one which can be easily read and modified.
For this laboratory exercise, we focus on algorithm execution time.
In determining algorithm execution time, we may proceed in several ways:
Run the algorithm on a specific machine with specific data sets, and measure the actual execution times.
Analyze the algorithm at a high level, estimating the number of steps it performs as the size of the data set grows.
Each of these approaches has advantages, but each also has drawbacks. Execution times on a specific machine normally depend upon details of the machine and on the specific data used. Timings may vary from data set to data set and from machine to machine, so experiments from one machine and one data set may not be very helpful in general.
For many purposes, it turns out that a high-level analysis provides adequate information to compare algorithms. For the most part, we follow that approach here.
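To see why machine timings can be unreliable, consider the following rough wall-clock timing sketch (written in Python purely for illustration; the lab's own program is a compiled executable). Repeated runs of the same searches on the same machine and data rarely report identical times:

```python
import time
import random

def time_searches(haystack, needles):
    """Wall-clock time to scan the list once for each needle."""
    start = time.perf_counter()
    for needle in needles:
        needle in haystack          # Python's `in` performs a linear scan
    return time.perf_counter() - start

data = list(range(0, 100_000, 2))                    # even integers, as in the lab
targets = [random.randrange(100_000) for _ in range(100)]

# Three timings of the identical workload typically differ slightly,
# which is exactly the machine- and load-dependence described above.
trials = [time_searches(data, targets) for _ in range(3)]
print(trials)
```

Counting abstract steps (comparisons, say) instead of seconds removes this run-to-run noise, which is why a high-level analysis is often preferred.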
In class, we have discussed both the linear search and the binary search.
Write a brief description (a few sentences) of how a linear search works.
Write a brief description (a few sentences) of how a binary search works. Be sure to include any assumptions the algorithm makes.
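As a reference point for your descriptions, the two algorithms can each be sketched in a few lines. The Python below is an illustration only, not the code the lab's searchTest program actually uses; note that binary search assumes its input is sorted:

```python
def linear_search(items, target):
    """Scan items left to right; works on any list, sorted or not."""
    for i, value in enumerate(items):
        if value == target:
            return i
    return -1

def binary_search(items, target):
    """Repeatedly halve the search range; assumes items is sorted."""
    lo, hi = 0, len(items) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if items[mid] == target:
            return mid
        elif items[mid] < target:
            lo = mid + 1        # target can only be in the upper half
        else:
            hi = mid - 1        # target can only be in the lower half
    return -1
```

For example, both `linear_search([0, 2, 4, 6], 4)` and `binary_search([0, 2, 4, 6], 4)` return index 2, but they reach that answer by very different routes.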
In class, we have discussed how efficient each of these algorithms might be. Now we take a more concrete, experimental approach.
We will search for random numbers in a sequence of even integers: 0, 2, 4, 6, ..., 2(n-1), where n is the size of the collection.
Technically, the structure holding such a collection of data is called an array. With the array data just described, a search should be successful if we are looking for an even, nonnegative integer that is not too large. The search should fail if the desired integer is negative, odd, or too large. In our experiment, we will pick 60 integers as candidates for the search. We pick the first 30 from within the array, to ensure that the search will be successful. Then we pick the last 30 from the set of odd integers, so that the search will fail. In each case, we will select numbers at random from the given set.
The program searchTest performs both a linear search and a binary search for randomly-selected items, and records the (approximate) number of comparisons performed by each search algorithm.
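The source of searchTest is not provided, but the experiment it performs might look roughly like the following hypothetical Python reconstruction, which counts comparisons for both algorithms on the even-integer array:

```python
import random

def linear_search_count(items, target):
    """Number of comparisons a linear search performs before stopping."""
    count = 0
    for value in items:
        count += 1
        if value == target:
            break
    return count

def binary_search_count(items, target):
    """Number of comparisons a binary search performs (sorted input)."""
    lo, hi, count = 0, len(items) - 1, 0
    while lo <= hi:
        mid = (lo + hi) // 2
        count += 1
        if items[mid] == target:
            break
        elif items[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return count

n = 1000
array = [2 * i for i in range(n)]                          # 0, 2, 4, ..., 2(n-1)
hits = [2 * random.randrange(n) for _ in range(30)]        # even: always found
misses = [2 * random.randrange(n) + 1 for _ in range(30)]  # odd: never found

for label, targets in (("successful", hits), ("failed", misses)):
    lin = sum(linear_search_count(array, t) for t in targets) / len(targets)
    bi = sum(binary_search_count(array, t) for t in targets) / len(targets)
    print(f"{label}: linear avg {lin:.1f}, binary avg {bi:.1f}")
```

Even at n = 1000, the gap is dramatic: a failed linear search must examine all 1000 entries, while a failed binary search needs only about 10 comparisons.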
The mechanics of running these search experiments are as follows:
cd 105
cp ~weinman/courses/CSC105/labs/searchTest .
./searchTest
Note: Do not use commas when entering numbers.
Run searchTest for several array sizes between 1000 and 50,000.
Use a spreadsheet to record the results from each program run (i.e., for each array size in your experiment). I suggest you make separate tables to record each of the following:
Average number of comparisons for successful linear searches
Average number of comparisons for failed linear searches
Average number of comparisons for successful binary searches
Average number of comparisons for failed binary searches
Do double-check your data entry! Typographical errors can add more spice than is intended to your results.
Now use your spreadsheet to plot the data for each of the four tables you just created. (Make a separate graph for each table.) The horizontal axis should indicate the array size, and the vertical axis should indicate the average number of steps required. You should conduct sufficient experiments, so that a fairly consistent pattern emerges.
Recall that, in OpenOffice Calc you can create a graph by selecting Chart... from the Insert menu. You will probably find that an "XY Chart" is appropriate for this type of data. (Its icon has several squares scattered around the graph.) Once you have selected that, you may also find the subtype called "Symbols Only" to be appropriate.
Finally, I suggest that you use the same y-axis scale for each pair of graphs that depict results from the same search algorithm. This will make a visual comparison of the graphs more meaningful. (In OpenOffice Calc, you can set the y-axis scale as follows. Double-click on the chart, then right-click and select "axis" and then "y-axis" from the menu.)
Describe some conclusions you can draw from your experiments. For example, you might compare and contrast the various graphs you have made.