Pointers, Functions, and Arrays

Summary

You will write a program to graph the distribution of squared values using functions, pointers, and arrays.

Objectives

Learn to apply good functional decomposition
Practice working with arrays and pointers
Apply development principles in support of testing
Have fun choosing your own adventure

Background

A histogram is a useful statistical tool that helps us visualize how often a value from a sample falls into a certain category or numerical range. We also sometimes like to know what percentage of values in a sample is at or below a particular threshold, which is another graph called the cumulative distribution.

In this assignment you'll be writing a program that randomly generates a sample, or set of "unpredictable" values, and quantizes these random values into bins or buckets so we can visually "graph" how many values fall into each bucket. To emphasize the ease of code reuse when practicing proper functional decomposition (and do some more array processing), you'll also create and visualize a transformation of your original data view.

You won't be surprised to find out there is a mechanism for generating unpredictable (aka random) numbers between 0 and 1 in most programming languages, including C. One might ask whether the numbers produced are reasonably well-distributed over the intended range. If they are, we would expect to see a roughly equal number of values in each of several equally sized bins.

For example, if we divided the values between 0.0 and 1.0 (inclusive) into ten bins corresponding to the ranges [0,0.1), [0.1,0.2), ..., [0.9,1.0], where a square bracket indicates the value it bounds is included in the bin, but a parenthesis indicates it is excluded (so that every value has a unique destination bin), we can count how many times a random value falls into each bin and display the final counts with a screen marker.

Of course, if we placed a mark for each value in the sample, we would not get a very good visualization because each "bin" would have a bar that far exceeded our screen's dimensions. Instead, we figure out the maximum size of our graph and scale the data so that the fullest bin occupies the most space, and the rest of the bin counts are scaled relative to the fullest bin.

Taking this approach, such a histogram of uniform random numbers with a sample size of just 100 and ten bins looks like the following.

***************************************************
*********************************************************************
**********************************************
********************************************************************************
***************************************************************
**************************************************************************
*****************************
***************************************************
**********************************************
***************************************************************

This graph does not look very uniform; that is, the counts vary widely. This variance is probably due to our very small sample size. What happens if we increase the number of samples one-hundred-fold? While we're at it, let's up the granularity of our measurements by halving our bin size (increasing the number of bins). The result looks like this.

*******************************************************************************
********************************************************************************
*************************************************************************
****************************************************************************
*****************************************************************************
***************************************************************************
**************************************************************************
*****************************************************************************
****************************************************************************
*******************************************************************************
**************************************************************************
*************************************************************************
**************************************************************************
*******************************************************************************
***********************************************************************
*******************************************************************************
***********************************************************************
****************************************************************************
************************************************************************
***************************************************************************

You're not going to get gold stars in your lab notebook for that graph, but it looks much closer to what we expect. Simulation for the win!

Another interesting quantity is the so-called cumulative distribution. What's that? Well, the first thing you graph would be the the count in the first bin. The second thing you graph would be the sum of the first and second bins. The third thing you graph is the sum of the first, second, and third, bins. The fourth ... well, you get the picture. Because each of our bins should be roughly the same size, we should expect a nice linear relationship in our cumulative distribution graph. Let's visualize the cumulative of the sample above, also with 20 bins.

****
********
************
****************
********************
************************
****************************
********************************
************************************
*****************************************
********************************************
************************************************
****************************************************
********************************************************
************************************************************
****************************************************************
********************************************************************
************************************************************************
****************************************************************************
********************************************************************************

Looking good! Now, what do we need to do? Listing the tasks in the order they might come to mind, we need to

Draw graphs. Two of 'em. Better make a function for that.
Generate samples. How do we get random values? Better look that up.
Quantize values. That is, given a number in [0,1], how do we figure out which bin it belongs in?
Make a cumulative sum. Sounds like another good opportunity for a function to abstract this specific task.
Test everything. Of course, we'll want to rigorously test all these pieces to make sure they work correctly.

We'll probably think of some other stuff along the way, but often writing out comments and working from the top down to the details as we stub out functions helps us figure that out. Now, let's go to work.

Assignment

Place all of your code in a single file called graph.c. However, rather than graph the uninteresting uniform distribution shown above, you will graph the distribution when the uniformly distributed value is squared. (That is, get a random number in [0,1] and square it; look at the histogram and cumulative distribution of those squared values.) Any idea what that graph will look like? Ponder it for a moment, then let's get started.

Getting Started

Begin by writing out comments within main for the major top-level steps indicated above:
- run a simulation that generates counts of samples
- calculate the cumulative distribution of those counts
- draw two graphs
We don't want any magic numbers in your program (values that appear as constant, devoid of context or explanation). Thus, in main define constant variables (a misnomer if there ever was one) in main using the const keyword for things like the sample size and maximum bar width (or the number of characters on a line). For example:
```
const unsigned int NUM_PLANETS = 8; /* Don't get Pluto started! */
```
Likewise, the number of bins is going to control your array variable length, so anywhere you declare an array of that length you will want to use a preprocessor macro with a #define directive. (Recall this helps the compiler know exactly how much space to reserve for your array on the stack, whereas even a constant variable won't allow the compiler to know the size.)
What other data will you need to store in your progam? Looks like you'll need two arrays: one to hold the counts and one to hold the cumulative sums. Declare those in main now with clear, sensible names and use the right types (Will they hold decimal values? Negative values?)
Good. Now let's give the top-level tasks some sensible (required) function names, determine their parameters, and create stubs for them. Don't forget to write a descriptive comment above each stub declaration.
```
void
runSimulation (int counts[], int numBins, int numSamples)

void
cumulativeSum (const int data[], int sums[], int size)

void
drawGraph(const int data[], int size, int screenWidth)
```
Now we are getting somewhere. Document pre-conditions and post-conditions for these procedures to the best of your ability. Once you're done with that, use the assert procedure to verify the preconditions.
Finish up by adding calls to these procedures where they belong in main.

Time to sanity check. Compile and run your program skeleton so far and make sure things are on the up and up.

Diving Deeper

You've now got a basic outline, but we need to do some more stubbing (no toes, please) before we really get to work.

We discovered above we need a way to generate a random number in the range [0,1] (sometimes called a standard uniform random number. Let's keep our focus on organizing the solution first and worry about details later. Stub out a function (with comments) to get that random number. Document it, write any pre/post conditions and use assertions for any verifiable pre-conditions.
```
double
getUniform (void)
```
We also need a way to take a number (say in the range [0,1] because squaring any such number keeps it inside that range) and figure out which bin it belongs to. (That is, which array index gets incremented in the histogram.) Stub out a function to make that determination, and do that whole documenting and asserting thing here, too.
```
int
quantize(double value, int numBins)
```
How will you scale your graph to fit on the screen? You need to find out how many items are in the fullest bucket. To do this, stub out a function that takes an array and returns the address of the largest element in the array (sometimes called the arg max by mathematicians). (And: Document! Assert! Do NOT implement! Remember that hours of coding can save minutes of planning.)
```
const int *
getArgMax (const int data[], int size)
```
Note that we have the procedure return a const pointer to signal to the programmer (and compiler) that the pointee should not be modified.

Compile and run your newly enhanced skeleton of stubs and make sure things are still running smoothly.
Now let's start to fill in details of one function. So, about those random numbers. See page 686 in King for info on the rand function and RAND_MAX macro, which you can combine to get the double you'll need. Armed with information, complete the details of the function you stubbed in #7 above.

Testing

But wait. Did you do it right? Let's set up a regime for at least running through some tests. (We don't have a lovely unit testing harness like you hopefully used in Scheme, but we'll do our best.)

Along with documentation, one might arguably want to write tests for your function before you write the function itself, because they both require you to understand what you want from your function before you set about actually writing it. Such a process is called test-driven development.

In the remainder of the assignment, you might consider writing a stub of the required functions, then writing your tests, before returning to finally implement the functions themselves.

Test Macros

An exercise in the C preprocessor lab gives a simple technique for constructing unit tests similar to the following:

/* Macro for testing. Requires variable numErrors be defined in scope of use. */
#define TEST(EXPR,RESULT)                                                     \
        if ((EXPR) != (RESULT)) {                                             \
          ++numErrors;                                                        \
          printf("Did not get expected result %s for %s.\n", #RESULT, #EXPR); \
        }

Add this macro to your program.

Test Functions

We can now begin to assemble a test harness for our procedures, beginning with the following functions.

/* Print a report of a collection of unit tests.
 * Prints OK when no errors and FAIL with a count otherwise.
 */
void
reportTests (int numErrors)
{
  if (numErrors>0)
    printf (" FAIL: %d errors\n",numErrors);
  else
    printf (" OK\n");
} // reportTests

/* Test all functionality by running test functions.
 * Returns the total number of errors.
*/
int
testAll (void)
{
  int numErrors = 0;

  numErrors += testGetUniform();

  if (numErrors==0)
    printf ("TESTS PASSED\n");
  else
    printf ("TOTAL ERRORS: %d\n",numErrors);
  return numErrors;
} // testAll

Include the functions reportTests and testAll in your program, then also add and complete the function below, which should run several tests of your getUniform function using your test macro and return the total number of errors.

(Remember to do the things we need to do to convince ourselves this code is correct: enumerate the circumstances, list the test cases and outcomes. Also: Randomness is tricky, so consider using srand to make your results repeatable.)
```
int
testGetUniform (void)
{
  int numErrors = 0;
  printf ("--GETUNIFORM TESTS--");

  // Add your test code here
  // For example...
  TEST( getUniform() >= 0, true );
	  
  reportTests (numErrors);
  return numErrors;
} // testGetUniform
```
For now, your main procedure can simply consist of a call to testAll. For example:
```
int
main (void)
{
#ifdef TESTING	  
  return testAll();
#endif
	  
  // ...
  return 0;	  
} // main
```
Note that a return value of 0 from main indicates to the caller (i.e., the user who typed the command in the terminal) that the program completed successfully.

Finally, now would be a great time to compile and run your program and see if everything is (still) aces.
This time, let's try things backwards to help us think through all the scenarios of our quantization function for #8.
1. Stub out a new procedure to hold the discretization test suite for the function in #8.
2. Add this function call to testAll, the function with your overall testing regime.
3. Now, come up with the tests that will convince you your quantization process is correct. That is, enumerate the circumstances and list some specific inputs and outputs to try. Be sure you test the special cases and boundaries!
Hopefully thinking about the tests helped you understand better how to write the function. Now, (finally!) go ahead and fill in the details to implement the calculations in your quantiation function in #8. Then, hold your breath and run your tests to see if all your code remains up to snuff.
Repeat the test-driven development approach for the maximum-finding procedure you stubbed in #9. That is, write the tests first, then write the function.

The Final Details

The last pieces of your infrastructure are finally coming together. Soon we'll start to really see some results.

Write and test your graph-drawing function. Draw one horizontal bar for every entry in the array, and make sure to fill the screen (some pre-set maximum width) with the largest bar. For example, the array {1,2,3,4} should look like the following (if the width is 80).
```
********************
****************************************
************************************************************
********************************************************************************
```
Now test and implement (yes! in that order! write tests then implement) your cumulative distribution function.
All the pieces are almost there. Go ahead and fill in the details (using the functions you've written) your simulation function from all the way back in #4. Remember, this is the function that should populate the histogram array, and that you're going to histogram the squared uniform values.

Did we forget anything? It seems you should pretty much be there. Try it out and see whether the shape of the distribution matches your expectations.

Make sure to add some titles to each graph so we know what we're looking at and they don't just run together.

Next time, it will be up to you to think about how to practice good program development techniques and incrementally solve the problem all on your own.

Grading

In addition to the general grading guidelines for evaluation, the assignment is worth 28 points.

Submit files graph.c, references.txt, and tests.txt.

[13 points] Main program

[1 point] Function produces random number in [0,1]
[2 points] Function correctly quantizes numbers
[2 points] Function correctly returns address of array maximum
[2 points] Function correctly calculates cumulative array sums
[2 points] Function correctly generates a sample histogram
[2 points] Function correctly draws a scaled graph from array
[2 points] Program graphs labeled histogram and cumulative

Evaluation of program Structure, Comments, Design, and Testing.
(Points not given, but points can be deducted.)
[15 points] Testing and Development
- [2 points] Functions give clear and complete pre- and post-conditions
- [4 points] Assertions verify pre-conditions
- [2 points] Test plan enumerates the ranges of problem circumstances
- [4 points] Tests compare cases to be considered with expected outcomes
- [2 points] Transcript records actual test runs
- [1 points] Summary statement explains why the program is correct
[≤5 points] Extra Credit
You may earn extra credit by enhancing your program in one significant way. Good possibilities include:
- horizontal/x-axis labels (i.e., evenly divide the range of the function being graphed and print labels along the bottom of the graph, centered at appropriate places)
- vertical/y-axis labels (print the bound of each line's bin to the left of the bar and be sure to adjust the bar width accordingly to avoid line-wrapping)
- use function pointers to run simulations and graphs for a variety of values

A five point penalty will apply to any submission that includes personally identifying information anywhere other than the references/honesty file.