| CSC 105 | The Digital Age | Spring 2009 |
Although computers are often considered to work very quickly, only some algorithms proceed rapidly; others take much longer. This laboratory exercise provides an intuitive framework for the consideration of algorithm effectiveness, including the amount of time and memory required for an algorithm. When algorithms are applied to successively larger data sets, some algorithms may scale up nicely while others require more time than may be feasible. Also, when several algorithms are available to solve a problem, it is natural to wonder if one solution is better than another. Altogether, we might use many criteria to evaluate such solutions, including:
Accuracy: Of course, any program should produce correct answers. (If we were satisfied with wrong results, it would be trivial to produce many such answers very quickly.) However, it may not be immediately clear just how accurate results should be in a specific instance. For example, one algorithm may be simple to program and may run quickly, but it may be accurate to only 5 decimal places. A second algorithm may be more complex and much slower, but may give 15-place accuracy. If 5-place accuracy is adequate for a specific application, the first algorithm is the better choice. However, if 10- or 12-place accuracy is required, the slower algorithm must be used.
Efficiency: Efficiency can be measured in many ways: programmer time, algorithm execution time, memory used, and so on. If a program is to be used once, then programmer time may be a major consideration, and a simple algorithm might be preferred. If a program is to be used many times, however, then it may be worth spending more development time with a complex algorithm, so the procedure will run very quickly.
Use of Memory: One algorithm may require more computer memory in which to execute. If "space" (i.e., memory) is a scarce resource, then the amount of space an algorithm requires should be taken into consideration when comparing algorithms.
Ease of Modification: It is common practice to modify old programs to solve new problems. A very obscure algorithm that is difficult to understand, therefore, is usually less desirable than one which can be easily read and modified.
For this laboratory exercise, we focus on algorithm execution time.
In determining algorithm execution time, we may proceed in several ways:
Run the algorithm on a specific machine with specific data sets, and measure the actual execution times.
Analyze the algorithm at a high level, estimating the number of steps it performs as the size of the data set grows.
Each of these approaches has advantages, but each also has drawbacks. Execution times on a specific machine normally depend upon details of the machine and on the specific data used. Timings may vary from data set to data set and from machine to machine, so experiments from one machine and one data set may not be very helpful in general.
For many purposes, it turns out that a high-level analysis provides adequate information to compare algorithms. For the most part, we follow that approach here.
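To see why machine timings can be unreliable, consider the following rough wall-clock timing sketch (written in Python purely for illustration; the lab's own program is a compiled executable). Repeated runs of the same searches on the same machine and data rarely report identical times:

```python
import time
import random

def time_searches(haystack, needles):
    """Wall-clock time to scan the list once for each needle."""
    start = time.perf_counter()
    for needle in needles:
        needle in haystack          # Python's `in` performs a linear scan
    return time.perf_counter() - start

data = list(range(0, 100_000, 2))                    # even integers, as in the lab
targets = [random.randrange(100_000) for _ in range(100)]

# Three timings of the identical workload typically differ slightly,
# which is exactly the machine- and load-dependence described above.
trials = [time_searches(data, targets) for _ in range(3)]
print(trials)
```

Counting abstract steps (comparisons, say) instead of seconds removes this run-to-run noise, which is why a high-level analysis is often preferred.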
In class, we have discussed both the linear search and the binary search.
Write a brief description (a few sentences) of how a linear search works.
Write a brief description (a few sentences) of how a binary search works. Be sure to include any assumptions the algorithm makes.
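As a reference point for your descriptions, the two algorithms can each be sketched in a few lines. The Python below is an illustration only, not the code the lab's searchTest program actually uses; note that binary search assumes its input is sorted:

```python
def linear_search(items, target):
    """Scan items left to right; works on any list, sorted or not."""
    for i, value in enumerate(items):
        if value == target:
            return i
    return -1

def binary_search(items, target):
    """Repeatedly halve the search range; assumes items is sorted."""
    lo, hi = 0, len(items) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if items[mid] == target:
            return mid
        elif items[mid] < target:
            lo = mid + 1        # target can only be in the upper half
        else:
            hi = mid - 1        # target can only be in the lower half
    return -1
```

For example, both `linear_search([0, 2, 4, 6], 4)` and `binary_search([0, 2, 4, 6], 4)` return index 2, but they reach that answer by very different routes.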
In class, we have discussed how efficient each of these algorithms might be. Now we take a more concrete, experimental approach.
We will search for random numbers in a sequence of even integers: 0, 2, 4, 6, ..., 2(n-1), where n is the size of the collection.
Technically, the structure holding such a collection of data is called an array. With the array data just described, a search should be successful if we are looking for an even, nonnegative integer that is not too large. The search should fail if the desired integer is negative, odd, or too large. In our experiment, we will pick 60 integers as candidates for the search. We pick the first 30 from within the array, to ensure that the search will be successful. Then we pick the last 30 from the set of odd integers, so that the search will fail. In each case, we will select numbers at random from the given set.
The program searchTest performs both a linear search and a binary search for randomly-selected items, and records the (approximate) number of comparisons performed by each search algorithm.
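The source of searchTest is not provided, but the experiment it performs might look roughly like the following hypothetical Python reconstruction, which counts comparisons for both algorithms on the even-integer array:

```python
import random

def linear_search_count(items, target):
    """Number of comparisons a linear search performs before stopping."""
    count = 0
    for value in items:
        count += 1
        if value == target:
            break
    return count

def binary_search_count(items, target):
    """Number of comparisons a binary search performs (sorted input)."""
    lo, hi, count = 0, len(items) - 1, 0
    while lo <= hi:
        mid = (lo + hi) // 2
        count += 1
        if items[mid] == target:
            break
        elif items[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return count

n = 1000
array = [2 * i for i in range(n)]                          # 0, 2, 4, ..., 2(n-1)
hits = [2 * random.randrange(n) for _ in range(30)]        # even: always found
misses = [2 * random.randrange(n) + 1 for _ in range(30)]  # odd: never found

for label, targets in (("successful", hits), ("failed", misses)):
    lin = sum(linear_search_count(array, t) for t in targets) / len(targets)
    bi = sum(binary_search_count(array, t) for t in targets) / len(targets)
    print(f"{label}: linear avg {lin:.1f}, binary avg {bi:.1f}")
```

Even at n = 1000, the gap is dramatic: a failed linear search must examine all 1000 entries, while a failed binary search needs only about 10 comparisons.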
The mechanics of running these search experiments are as follows:
cd 105
cp ~weinman/courses/CSC105/labs/searchTest .
./searchTest
Note: Do not use commas when entering numbers.
Run searchTest for several array sizes between 1000 and 50,000.
Use a spreadsheet to record the results from each program run (i.e., for each array size in your experiment). I suggest you make separate tables to record each of the following:
Average number of comparisons for successful linear searches
Average number of comparisons for failed linear searches
Average number of comparisons for successful binary searches
Average number of comparisons for failed binary searches
Do double-check your data entry! Typographical errors can add more spice than is intended to your results.
Now use your spreadsheet to plot the data for each of the four tables you just created. (Make a separate graph for each table.) The horizontal axis should indicate the array size, and the vertical axis should indicate the average number of steps required. You should conduct sufficient experiments, so that a fairly consistent pattern emerges.
Recall that, in OpenOffice Calc you can create a graph by selecting Chart... from the Insert menu. You will probably find that an "XY Chart" is appropriate for this type of data. (Its icon has several squares scattered around the graph.) Once you have selected that, you may also find the subtype called "Symbols Only" to be appropriate.
Finally, I suggest that you use the same y-axis scale for each pair of graphs that depict results from the same search algorithm. This will make a visual comparison of the graphs more meaningful. (In OpenOffice Calc, you can set the y-axis scale as follows. Double-click on the chart, then right-click and select "axis" and then "y-axis" from the menu.)
Describe some conclusions you can draw from your experiments. For example, you might compare and contrast the various graphs you have made.