Pointers, Functions, and Arrays
- Summary
- You will write a program to graph the distribution of squared values using functions, pointers, and arrays.
- Objectives
-
- Learn to apply good functional decomposition
- Practice working with arrays and pointers
- Apply development principles in support of testing
- Have fun choosing your own adventure
Background
A histogram is a useful statistical tool that helps us visualize how often a value from a sample falls into a certain category or numerical range. We also sometimes like to know what percentage of values in a sample is at or below a particular threshold, which is another graph called the cumulative distribution.
In this assignment you'll be writing a program that randomly generates a sample, or set of "unpredictable" values, and quantizes these random values into bins or buckets so we can visually "graph" how many values fall into each bucket. To emphasize the ease of code reuse when practicing proper functional decomposition (and do some more array processing), you'll also create and visualize a transformation of your original data view.
You won't be surprised to find out there is a mechanism for generating unpredictable (aka random) numbers between 0 and 1 in most programming languages, including C. One might ask whether the numbers produced are reasonably well-distributed over the intended range. If they are, we would expect to see a roughly equal number of values in each of several equally sized bins.
For example, if we divided the values between 0.0 and 1.0 (inclusive) into ten bins corresponding to the ranges [0,0.1), [0.1,0.2), ..., [0.9,1.0], where a square bracket indicates the value it bounds is included in the bin, but a parenthesis indicates it is excluded (so that every value has a unique destination bin), we can count how many times a random value falls into each bin and display the final counts with a screen marker.
Of course, if we placed a mark for each value in the sample, we would not get a very good visualization because each "bin" would have a bar that far exceeded our screen's dimensions. Instead, we figure out the maximum size of our graph and scale the data so that the fullest bin occupies the most space, and the rest of the bin counts are scaled relative to the fullest bin.
Taking this approach, such a histogram of uniform random numbers with a sample size of just 100 and ten bins looks like the following.
*************************************************** ********************************************************************* ********************************************** ******************************************************************************** *************************************************************** ************************************************************************** ***************************** *************************************************** ********************************************** ***************************************************************
This graph does not look very uniform; that is, the counts vary widely. This variance is probably due to our very small sample size. What happens if we increase the number of samples one-hundred-fold? While we're at it, let's up the granularity of our measurements by halving our bin size (increasing the number of bins). The result looks like this.
******************************************************************************* ******************************************************************************** ************************************************************************* **************************************************************************** ***************************************************************************** *************************************************************************** ************************************************************************** ***************************************************************************** **************************************************************************** ******************************************************************************* ************************************************************************** ************************************************************************* ************************************************************************** ******************************************************************************* *********************************************************************** ******************************************************************************* *********************************************************************** **************************************************************************** ************************************************************************ ***************************************************************************
You're not going to get gold stars in your lab notebook for that graph, but it looks much closer to what we expect. Simulation for the win!
Another interesting quantity is the so-called cumulative distribution. What's that? Well, the first thing you graph would be the the count in the first bin. The second thing you graph would be the sum of the first and second bins. The third thing you graph is the sum of the first, second, and third, bins. The fourth ... well, you get the picture. Because each of our bins should be roughly the same size, we should expect a nice linear relationship in our cumulative distribution graph. Let's visualize the cumulative of the sample above, also with 20 bins.
**** ******** ************ **************** ******************** ************************ **************************** ******************************** ************************************ ***************************************** ******************************************** ************************************************ **************************************************** ******************************************************** ************************************************************ **************************************************************** ******************************************************************** ************************************************************************ **************************************************************************** ********************************************************************************
Looking good! Now, what do we need to do? Listing the tasks in the order they might come to mind, we need to
- Draw graphs. Two of 'em. Better make a function for that.
- Generate samples. How do we get random values? Better look that up.
- Quantize values. That is, given a number in [0,1], how do we figure out which bin it belongs in?
- Make a cumulative sum. Sounds like another good opportunity for a function to abstract this specific task.
- Test everything. Of course, we'll want to rigorously test all these pieces to make sure they work correctly.
We'll probably think of some other stuff along the way, but often writing out comments and working from the top down to the details as we stub out functions helps us figure that out. Now, let's go to work.
Assignment
Place all of your code in a single file
called graph.c. However, rather than graph the
uninteresting uniform distribution shown above, you will graph the
distribution when the uniformly distributed value
is squared. (That is, get a random number in [0,1] and
square it; look at the histogram and cumulative distribution of those
squared values.) Any idea what that graph will look like? Ponder it
for a moment, then let's get started.
Getting Started
-
Begin by writing out comments within
mainfor the major top-level steps indicated above:- run a simulation that generates counts of samples
- calculate the cumulative distribution of those counts
- draw two graphs
-
We don't want any magic numbers in your program (values
that appear as constant, devoid of context or explanation). Thus,
in
maindefine constant variables (a misnomer if there ever was one) inmainusing theconstkeyword for things like the sample size and maximum bar width (or the number of characters on a line). For example:const unsigned int NUM_PLANETS = 8; /* Don't get Pluto started! */
Likewise, the number of bins is going to control your array variable length, so anywhere you declare an array of that length you will want to use a preprocessor macro with a#definedirective. (Recall this helps the compiler know exactly how much space to reserve for your array on the stack, whereas even a constant variable won't allow the compiler to know the size.) - What other data will you need to store in your progam? Looks like you'll need two arrays: one to hold the counts and one to hold the cumulative sums. Declare those in main now with clear, sensible names and use the right types (Will they hold decimal values? Negative values?)
-
Good. Now let's give the top-level tasks some sensible (required)
function names, determine their parameters, and create stubs for
them. Don't forget to write a descriptive comment above each stub
declaration.
void runSimulation (int counts[], int numBins, int numSamples) void cumulativeSum (const int data[], int sums[], int size) void drawGraph(const int data[], int size, int screenWidth) -
Now we are getting somewhere. Document pre-conditions and
post-conditions for these procedures to the best of your
ability. Once you're done with that, use the
assertprocedure to verify the preconditions. -
Finish up by adding calls to these procedures where they belong
in
main.
Time to sanity check. Compile and run your program skeleton so far and make sure things are on the up and up.
Diving Deeper
You've now got a basic outline, but we need to do some more stubbing (no toes, please) before we really get to work.
-
We discovered above we need a way to generate a random number
in the range [0,1] (sometimes called a standard uniform
random number. Let's keep our focus on organizing the solution
first and worry about details later. Stub out a function (with
comments) to get that random number. Document it, write any
pre/post conditions and use assertions for any verifiable
pre-conditions.
double getUniform (void) -
We also need a way to take a number (say in the range [0,1] because
squaring any such number keeps it inside that range) and figure out
which bin it belongs to. (That is, which array index gets
incremented in the histogram.) Stub out a function to make that
determination, and do that whole documenting and asserting thing
here, too.
int quantize(double value, int numBins) -
How will you scale your graph to fit on the screen? You need to
find out how many items are in the fullest bucket. To do this, stub
out a function that takes an array and returns the address
of the largest element in the array (sometimes called the arg
max by mathematicians). (And: Document! Assert!
Do NOT implement! Remember that hours of coding can save
minutes of planning.)
const int * getArgMax (const int data[], int size)Note that we have the procedure return a
constpointer to signal to the programmer (and compiler) that the pointee should not be modified.Compile and run your newly enhanced skeleton of stubs and make sure things are still running smoothly.
-
Now let's start to fill in details of one function. So, about those
random numbers. See page 686 in King for info on the
randfunction andRAND_MAXmacro, which you can combine to get thedoubleyou'll need. Armed with information, complete the details of the function you stubbed in #7 above.
Testing
But wait. Did you do it right? Let's set up a regime for at least running through some tests. (We don't have a lovely unit testing harness like you hopefully used in Scheme, but we'll do our best.)
Along with documentation, one might arguably want to write tests for your function before you write the function itself, because they both require you to understand what you want from your function before you set about actually writing it. Such a process is called test-driven development.
In the remainder of the assignment, you might consider writing a stub of the required functions, then writing your tests, before returning to finally implement the functions themselves.
Test Macros
An exercise in the C preprocessor lab gives a simple technique for constructing unit tests similar to the following:
/* Macro for testing. Requires variable numErrors be defined in scope of use. */
#define TEST(EXPR,RESULT) \
if ((EXPR) != (RESULT)) { \
++numErrors; \
printf("Did not get expected result %s for %s.\n", #RESULT, #EXPR); \
}
Add this macro to your program.
Test Functions
We can now begin to assemble a test harness for our procedures, beginning with the following functions.
/* Print a report of a collection of unit tests.
* Prints OK when no errors and FAIL with a count otherwise.
*/
void
reportTests (int numErrors)
{
if (numErrors>0)
printf (" FAIL: %d errors\n",numErrors);
else
printf (" OK\n");
} // reportTests
/* Test all functionality by running test functions.
* Returns the total number of errors.
*/
int
testAll (void)
{
int numErrors = 0;
numErrors += testGetUniform();
if (numErrors==0)
printf ("TESTS PASSED\n");
else
printf ("TOTAL ERRORS: %d\n",numErrors);
return numErrors;
} // testAll
-
Include the functions
reportTestsandtestAllin your program, then also add and complete the function below, which should run several tests of yourgetUniformfunction using your test macro and return the total number of errors.(Remember to do the things we need to do to convince ourselves this code is correct: enumerate the circumstances, list the test cases and outcomes. Also: Randomness is tricky, so consider using
srandto make your results repeatable.)int testGetUniform (void) { int numErrors = 0; printf ("--GETUNIFORM TESTS--"); // Add your test code here // For example... TEST( getUniform() >= 0, true ); reportTests (numErrors); return numErrors; } // testGetUniform -
For now, your
mainprocedure can simply consist of a call totestAll. For example:int main (void) { #ifdef TESTING return testAll(); #endif // ... return 0; } // mainNote that a return value of
0frommainindicates to the caller (i.e., the user who typed the command in the terminal) that the program completed successfully.Finally, now would be a great time to compile and run your program and see if everything is (still) aces.
-
This time, let's try things backwards to help us think through all the
scenarios of our quantization function for
#8.
- Stub out a new procedure to hold the discretization test suite for the function in #8.
-
Add this function call to
testAll, the function with your overall testing regime. - Now, come up with the tests that will convince you your quantization process is correct. That is, enumerate the circumstances and list some specific inputs and outputs to try. Be sure you test the special cases and boundaries!
- Hopefully thinking about the tests helped you understand better how to write the function. Now, (finally!) go ahead and fill in the details to implement the calculations in your quantiation function in #8. Then, hold your breath and run your tests to see if all your code remains up to snuff.
- Repeat the test-driven development approach for the maximum-finding procedure you stubbed in #9. That is, write the tests first, then write the function.
The Final Details
The last pieces of your infrastructure are finally coming together. Soon we'll start to really see some results.
-
Write and test your graph-drawing function. Draw one horizontal
bar for every entry in the array, and make sure to fill the screen
(some pre-set maximum width) with the largest bar. For example, the
array {1,2,3,4} should look like the following (if the width is 80).
******************** **************************************** ************************************************************ ********************************************************************************
- Now test and implement (yes! in that order! write tests then implement) your cumulative distribution function.
- All the pieces are almost there. Go ahead and fill in the details (using the functions you've written) your simulation function from all the way back in #4. Remember, this is the function that should populate the histogram array, and that you're going to histogram the squared uniform values.
Did we forget anything? It seems you should pretty much be there. Try it out and see whether the shape of the distribution matches your expectations.
Make sure to add some titles to each graph so we know what we're looking at and they don't just run together.
Next time, it will be up to you to think about how to practice good program development techniques and incrementally solve the problem all on your own.
Grading
In addition to the general grading guidelines for evaluation, the assignment is worth 28 points.
Submit files graph.c, references.txt, and tests.txt.
- [13 points] Main program
- [1 point] Function produces random number in [0,1]
- [2 points] Function correctly quantizes numbers
- [2 points] Function correctly returns address of array maximum
- [2 points] Function correctly calculates cumulative array sums
- [2 points] Function correctly generates a sample histogram
- [2 points] Function correctly draws a scaled graph from array
- [2 points] Program graphs labeled histogram and cumulative
-
Evaluation of program Structure, Comments, Design, and Testing.
(Points not given, but points can be deducted.) -
[15 points] Testing and Development
- [2 points] Functions give clear and complete pre- and post-conditions
- [4 points] Assertions verify pre-conditions
- [2 points] Test plan enumerates the ranges of problem circumstances
- [4 points] Tests compare cases to be considered with expected outcomes
- [2 points] Transcript records actual test runs
- [1 points] Summary statement explains why the program is correct
-
[≤5 points] Extra Credit
You may earn extra credit by enhancing your program in one significant way. Good possibilities include:
- horizontal/x-axis labels (i.e., evenly divide the range of the function being graphed and print labels along the bottom of the graph, centered at appropriate places)
- vertical/y-axis labels (print the bound of each line's bin to the left of the bar and be sure to adjust the bar width accordingly to avoid line-wrapping)
- use function pointers to run simulations and graphs for a variety of values
A five point penalty will apply to any submission that includes personally identifying information anywhere other than the references/honesty file.
