CSC 213, Fall 2008: Lab 11
Goals:
Reading:
Collaboration: You will complete this lab in teams of 2 of your choice. Since there is often an odd number of students, one team may have 3 members (if several triples are possible, it must differ from the triple in the previous lab). You may, of course, consult with other classmates on design and debugging.
Background: This lab is based on Foster [DBPP] Exercise 3.9.
Overview: In this lab, you will gather empirical data to test the idealized model of communication performance given by Equation 3.1 and illustrated in Figure 3.3 of Foster [DBPP]. To do this, you will measure the round-trip time for sending an identical message from one process to another, both when the two processes are on the same machine and when they are on different machines.
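# host and port are placeholders: substitute the machine and port of the receiving process
for len in 4 8 16 32 64 128
do
    # 30 trials for each message length
    for ((i=0 ; i<30 ; i++))
    do
        # write the length, then append dgping's output (the measured time) on the same line
        echo -n "$len " >> timeFile
        ./dgping host port "$len" >> timeFile
    done
done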
The leading spaces are not required; they only make the loop structure easier to see.
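Because the loop records 30 trials at each message length, it can help to look at each length's spread before fitting a line. The following is a minimal sketch, assuming each line of timeFile has the form "length time" (that is, that dgping prints a single number per run); the function name summarize_times is hypothetical and not part of the lab's provided code.

```bash
#!/bin/bash
# Hypothetical helper: summarize a "length time" data file such as timeFile.
# For each message length, prints: length, number of trials, mean time, minimum time.
summarize_times() {
    awk '
        NF >= 2 {                         # skip blank or malformed lines
            len = $1
            t = $2 + 0                    # force numeric comparison
            n[len]++
            sum[len] += t
            if (!(len in fastest) || t < fastest[len]) fastest[len] = t
        }
        END {
            for (len in n)
                printf "%s %d %.6g %.6g\n", len, n[len], sum[len] / n[len], fastest[len]
        }
    ' "$1" | sort -n                      # for-in order is unspecified, so sort by length
}
```

Running `summarize_times timeFile` prints one line per message length. Comparing the mean with the minimum is a quick way to spot trials that were slowed by unrelated activity on the machines or the network.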
Data analysis. Recall that x is called the "independent" variable in regression, since it is the one we can set arbitrarily. In this experiment, it will be the length of the message sent. The dependent variable y will be the communication time.
Our idealized communication model is a line. It asserts that initiating a message has a constant startup cost, and that each byte sent adds a further fixed cost. Thus, once the startup cost is paid, the total time grows linearly with the message length, with slope equal to the cost per byte.
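In Foster's notation, Equation 3.1 expresses the time to send a message of length L as a startup time t_s plus a transfer cost t_w for each unit of the message (Foster counts words; this lab counts bytes). Written against the regression variables used here, the correspondence is roughly:

```latex
T_{\mathrm{msg}} = t_s + t_w L
\qquad\Longleftrightarrow\qquad
y = m x + b, \quad x = L,\; y = T_{\mathrm{msg}},\; m = t_w,\; b = t_s
```

For example, if t_s were 100 microseconds and t_w were 0.01 microseconds per byte (made-up values for illustration only), a 128-byte message would be predicted to take 100 + 0.01 × 128 = 101.28 microseconds.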
Follow the directions in the regression primer to fit lines of the form y = mx + b to your data for both intra- and inter-processor communication times. The parameters m and b correspond to the cost per byte and the startup cost, respectively.
You may use a package other than gnuplot to do the fit if you wish, but gnuplot is a very handy tool, and you may want to gain some familiarity with it by using it for this specific task.
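If you would rather not use a package at all, an ordinary least-squares fit is short enough to compute directly. The sketch below assumes the same two-column "length time" format in timeFile; fit_line is a hypothetical helper name, and it fits every trial rather than per-length averages.

```bash
#!/bin/bash
# Hypothetical helper: ordinary least-squares fit of y = m*x + b
# to "x y" pairs, one pair per line. Prints "m b".
fit_line() {
    awk '
        NF >= 2 {                         # skip blank or malformed lines
            n++
            sx += $1; sy += $2
            sxx += $1 * $1; sxy += $1 * $2
        }
        END {
            if (n < 2) { print "need at least two points" > "/dev/stderr"; exit 1 }
            d = n * sxx - sx * sx
            if (d == 0) { print "all x values are equal" > "/dev/stderr"; exit 1 }
            m = (n * sxy - sx * sy) / d     # slope: cost per byte
            b = (sy - m * sx) / n           # intercept: startup cost
            printf "%.6g %.6g\n", m, b
        }
    ' "$1"
}
```

Calling `fit_line timeFile` prints the slope and intercept on one line. Run it separately on the intra-processor and inter-processor data files, and compare the results with what your gnuplot fit (or other package) reports.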
Answer the following questions based on your plots:
Is inter-processor communication time reasonably well-modeled by the ideal linear model? Based on the data and the fit, explain why or why not. If not, hypothesize some factors that may explain what you see. If so, is it well-modeled everywhere (i.e., for all message sizes)? Is there an alternative form to the model (something other than a line) you would propose? Why? Can you hypothesize any explanations for your proposed model?
(Hint: use something like "set xrange [0 : 5000]" and "set yrange [2000 : 4000]" to limit the viewing region of your plot and perhaps see greater detail.)