Summary

Previously, you have considered the efficiency of searching algorithms -- programs to search a collection of data for a specified item. Our work proceeded in three main steps:

We discussed in class how the linear search and binary search algorithms worked.
We analyzed the algorithms theoretically, to determine how the run time of the algorithms might change as the size of the data sets increased.
You conducted experiments to determine how long these algorithms took with actual data.

We now consider another common computing task: placing a collection of data in order -- in this case, in ascending order. Reordering of data as necessary, so the result is in ascending order, is called sorting. Here, we consider three sorting algorithms:

insertion sort
merge sort
permutation sort

Part I: Insertion Sort and Merge Sort

When analyzing sorting algorithms, it often happens that the analysis of efficiency depends considerably on the nature of the data. For example, if the original data set already is almost ordered, a sorting algorithm may behave rather differently than if the data set originally contains random data or is ordered in the reverse direction. For this reason, the analysis for sorting algorithms often considers separate cases, depending on the original nature of the data.

Exercise 1: Efficiency

Consider the insertion sort:

Would the insertion sort be most efficient when the original data were already in ascending order, in random order, or in descending order? Briefly justify your answer.
Would the insertion sort be least efficient when the original data were already in ascending order, in random order, or in descending order? Briefly justify your answer.

Part II: Experiments Regarding Insertion Sort and Merge Sort

The program sortTest performs both an insertion sort and merge sort for three types of data sets: data initially already in ascending order, data in random order, and data initially in descending order. In each case, the user specifies the number of data items to be sorted. The program then sorts the three data sets and records the (approximate) number of comparisons required for this work.

Exercise 1: Getting Started

The mechanics of running these search experiments are similar to those for the searching experiments:

In a terminal window, move to the directory containing our test program
```
  cd ~weinman/courses/CSC105/labs/
```
Run the test program with the command:
```
  ./sortTest
```
The program will ask you to enter the size of the sequence to be searched. Suggested values might be selected from the range from 500 to 10,000, although you can choose smaller or larger sizes for the array. (See the experiments on searching for some well-spaced base-2 sizes.)
As noted in the lab on searching, after you run the above command the first time, you can use the ↑ (upward-arrow) key at your keyboard to retrieve the same command again. After hitting the upward-arrow key to get the desired command, type Return or Enter to run the program again.

Exercise 2: Gathering Data

In this part of the lab, you are to gather experimental data regarding the times for sorting various size arrays using both the insertion sort and merge sort. In running your experiments, you should separately record the results for the different types of data sets.

Run sortTest for a variety of array sizes between 500 and 10,000, recording your results in each case in a spreadsheet. (See the experiments on searching for some well-spaced base-2 sizes.)

Exercise 3: Visualizing Data

Use the spreadsheet to plot the number of comparisons for various array sizes. The horizontal axis should indicate the size of the array, and the vertical axis should indicate time. Construct separate graphs for sorting data initially in ascending order, in random order, and in descending order. You should conduct sufficient experiments, so that a fairly consistent pattern emerges.

Describe (in words) the nature of the graphs you have observed.

Exercise 4: Analyzing Data

Examine the graphs to determine how the efficiency of the two sorting algorithms varies with different types of initial data.

Which algorithm is better for data already almost ordered?
Which algorithm is better for random data?
Which algorithm is better for data initially in descending order?
Which algorithm seems best overall? That is, which algorithm would you choose if you did not know what type of data might be encountered?

In each case, explain your answer based on what you know about how the algorithms work.

Part III: The Permutation Sort (BogoSort)

The permutation sort generates and checks all permutations of a data set until it finds one that is ordered. If the permutation sort is lucky, it may happen upon an ordered arrangement quickly. If the algorithm is unlucky, the permutation sort may try all permutations of the data set before finding the ordered one.

If a data set has N elements, then the number of permutations may be computed as follows:

Any of the N items might come first.
With one item selected to come first, any of the remaining N-1 items might come second.
With the first two items selected for the start, any of the remaining N-2 items might come third.
etc.

Altogether, the number of permutations possible for a data set with N elements is N*(N-1)*(N-2)*(N-3)*...*3*2*1. This value is called N factorial, written N!. Altogether, a permutation sort might have to generate N! permutations of data before finding the one that is ordered.

For your information, the first 10 values of N! are given in the following table:

N		N!
1		1
2		2
3		6
4		24
5		120
6		720
7		5040
8		40,320
9		362,880
10		3,628,800

As you can see, the factorial function increases very rapidly!

In the permutation sort, once a permutation is generated, then it must be examined to determine if it is ordered. This typically requires about N steps -- one for each item in the array.

Altogether, with generating and checking permutations, the amount of work for a permutation sort can be proportional to N * N!

Program bogoSortTest performs a permutation sort on different types of data, given the size of the array. Note that this program does not do this work multiple times, in contrast to the previous experiments you have run. The number of comparisons you record in these experiments reflect the number required to perform the sort just once.

Exercise 1

Run the program bogoSortTest, following the same steps followed for sortTest above. In collecting data, use array sizes of 3 to 12. (You can try 13 as well, but have something else to do as you are waiting.)
```
  ./bogoSortTest
```
As with insertion sort and merge sort, plot the comparisons you witness using a spreadsheet.
In reviewing your data, explain how the permutation sort provides an example of an algorithm that "hits the wall" with an exponential-time algorithm.

PostScript: Computational Thinking

Why does this matter?

Sorting: The following story is taken verbatim from an email sent by Roger Dannenberg, associate research professor of computer science and professional trumpeter. "I showed up to a big band gig, and the band leader passed out books with maybe 200 unordered charts and a set list with about 40 titles we were supposed to get out and place in order, ready to play. Everyone else started searching through the stack, pulling out charts one-at-a-time. I decided to sort the 200 charts alphabetically O(N log(N)) and then pull the charts O(M log(N)). I was still sorting when other band members were halfway through their charts, and I started to get some funny looks, but in the end, I finished first. That's computational thinking" (Wing, 2011, p. 22).
Wing, J. M. (2011, Spring). Computational Thinking—What and Why? In The Link: News from the School of Computer Science, (6), 20–23. http://link.cs.cmu.edu/files/11-399_The_Link_Newsletter-3.pdf

Created: Henry Walker, December 31, 2003
Revised: Henry Walker, February 27, 2006
Revision: Marge Coahran, March 14, 2007
Revision: Marge Coahran, April 3, 2008
Revision: Jerod Weinman, March 16, 2009
Revision: Jerod Weinman, March 12, 2014
Revision: Jerod Weinman, March 25, 2015