CSC 213, Fall 2008 : Schedule : Lab 11

Lab 10: Ping ... pong!

Goals:

Appreciate the difference between local and remote communication for parallel processing
Gain familiarity with datagram message passing using connectionless sockets
Use a regression technique common in systems analysis

Reading:

Read Foster [DBPP] Section 3.6, Evaluating Implementations
Review Foster [DBPP] Section 3.3.1, Execution Time
Review Nutt, 15.5 The Transport Layer, particularly the properties of datagrams
Broadwell, An Introduction to Sockets in C Through Annotated Examples, pp. 19-26.
A brief overview of linear regression in Gnuplot. (You may wish to read this after you read through the lab so you have some context for it.)

Collaboration: You will complete this lab in teams of 2 of your choice. Since there is often an odd number, one may be a group of 3 (which must be different from the previous lab if there are multiple potential triples.) You may, of course, consult with other classmates on design and debugging.

Background: This lab is based on Foster [DBPP] Exercise 3.9.

Overview: In this lab, you will gather empirical data to test the idealized model of communication performance given by Equation 3.1 and illustrated in Figure 3.3 of Foster [DBPP]. To do this, you will measure the round trip time for sending identical messages from one process to another, both when the processes are on the same machine and when they are on different machines.

Part A

Copy dgping.c and dgpong.c to your own directory and compile them. Review the code and run them so that you understand what each does.
Run dgping without setting up a "listening" server for the host and port that you query. Explain what happens.
We will add two enhancements to our ping program, one to handle lost messages and another to measure the message round trip time (RTT). Make the following modifications to dgping.c:
1. UDP is not a reliable message transport, so we may never receive a return message from the server if either the original message or the return message is not delivered. Use alarm(2) and a signal handler for SIGALRM to set a reasonable "timeout" that takes effect while the program waits for a response to the message it sent. Be sure to take care of all necessary cleanup in your handler.
2. Use gettimeofday(2) to measure the wall clock time for creating the socket and both sending and receiving messages. The program should report the elapsed time in microseconds. Note that you will have to carefully handle the math for subtracting two times given in the timeval (sec, usec) format.
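The timeval subtraction mentioned above can be sketched as follows. This is only one possible approach, and the helper name elapsed_usec is hypothetical (it is not part of dgping.c). It assumes the end time is not earlier than the start time.

```c
#include <sys/time.h>

/* Hypothetical helper: elapsed time from *start to *end, in microseconds.
   Assumes *end is not earlier than *start. */
long long elapsed_usec(const struct timeval *start, const struct timeval *end)
{
    long long sec  = (long long)end->tv_sec  - (long long)start->tv_sec;
    long long usec = (long long)end->tv_usec - (long long)start->tv_usec;

    /* usec is negative whenever end->tv_usec < start->tv_usec. Scaling the
       seconds difference to microseconds and adding the (possibly negative)
       microseconds difference performs the "borrow" automatically. */
    return sec * 1000000LL + usec;
}
```

In dgping.c you would call gettimeofday(&start, NULL) before creating the socket and gettimeofday(&end, NULL) after the reply arrives, then pass both to the helper.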
We will also add two enhancements to our pong program so that it can receive an indefinite number of ping requests. Make the following modifications to dgpong.c:
1. Our pong server need not be concurrent, but it should accept a request from any client, service the request (bounce the message back to that client), and then accept a request from a different client. Modify the program to do this indefinitely.
2. Since the pong server should run indefinitely until the user (or some other process) interrupts it, install a handler to take care of clean up if and when a SIGINT occurs.

Part B

Data measurement. Set up your pong program to run on both the local machine (hostname localhost) and a different machine in the OS lab. In this part of the lab, you will conduct an experiment that measures RTT for intra- and inter-machine communication.
1. Measure the RTTs to send messages of the following lengths (in bytes): 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096, 8192, 11585, 16384, 23170, 32768, 46341, 65500.
2. To make the data interpretation part of the lab easier, you may wish to have your program output only the number (time measured). In addition, since we want to know the one-way message time, your program should divide the total elapsed time by two before printing it.
3. Perform the experiment with the client and server both on the same machine and when the server is on a different machine. You will be performing a statistical analysis of the performance to account for variations in the duration of each message time. To facilitate this, you will need to record several (N≥30) times for each message length. Here is an example of a (bash) shell command that will generate a file with lines giving the measurements in the format
  msgLength time
  If your shell is something other than bash, you may need to modify this slightly to meet the syntax of your shell. You can augment the following command to include all the message lengths you must measure times for.
```
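# Record 30 measurements per message length as lines of "msgLength time".
# Replace host and port with the pong server's host name and port number.
for len in 4 8 16 32 64 128; do
    for ((i = 0; i < 30; i++)); do
        echo -n "$len " >> timeFile
        ./dgping host port "$len" >> timeFile
    done
done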
```
  The leading spaces are not required; they are only helpful for visualization.
Data analysis. Recall that x is called the "independent" variable in regression, since it is the one we can set arbitrarily. In this experiment, it will be the length of the message sent. The dependent value y will be the communication time.

Our idealized communications model is a line. It asserts that there is some constant startup cost for initiating a message, and then a per byte cost for sending a message. Thus, after the startup cost is paid, the total time increases linearly with the cost per byte.

Follow the directions in the regression primer to fit lines to your data for both intra- and inter-processor communication times. The parameters m and b correspond to the cost per byte and the startup cost, respectively.
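In symbols, the line being fit can be written as below, where L is the message length in bytes. The names t_s and t_w are intended to follow the startup-cost and per-word-cost notation of Foster's Equation 3.1; check them against the text.

```latex
T(L) = b + m\,L, \qquad b = t_s \ \text{(startup cost)}, \quad m = t_w \ \text{(cost per byte)}
```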

You may use a package other than gnuplot to do the fit, if you wish, but gnuplot is a very handy tool and you may wish to gain some familiarity with it by using it for this specific task.
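If you would rather compute the fit yourself, the closed-form ordinary least-squares solution for a line is short. The following sketch is an alternative to gnuplot's fit command, not a description of how gnuplot works internally. The function name fit_line is hypothetical.

```c
#include <stddef.h>

/* Ordinary least-squares fit of y = m*x + b to n points.
   Stores the slope in *m and the intercept in *b.
   Returns 0 on success, or -1 if n < 2 or all x values are identical. */
int fit_line(const double *x, const double *y, size_t n, double *m, double *b)
{
    double sx = 0.0, sy = 0.0, sxx = 0.0, sxy = 0.0;
    double denom;
    size_t i;

    if (n < 2)
        return -1;

    for (i = 0; i < n; i++) {
        sx  += x[i];
        sy  += y[i];
        sxx += x[i] * x[i];
        sxy += x[i] * y[i];
    }

    denom = (double)n * sxx - sx * sx;
    if (denom == 0.0)           /* every x is the same: slope undefined */
        return -1;

    *m = ((double)n * sxy - sx * sy) / denom;   /* cost per byte */
    *b = (sy - *m * sx) / (double)n;            /* startup cost  */
    return 0;
}
```

Here x holds the message lengths and y the measured one-way times, with one array entry per line of the measurement file.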

Answer the following questions based on your plots:
1. What are the values you found for start up and cost per byte for intra- and inter-processor communication?
2. Is intra-processor communication time particularly well-modeled by the ideal linear model? Based on the data and the fit, explain why or why not. If not, hypothesize some factors that may explain what you see.
3. Is inter-processor communication time reasonably well-modeled by the ideal linear model? Based on the data and the fit, explain why or why not. If not, hypothesize some factors that may explain what you see. If so, is it well-modeled everywhere (i.e., for all message sizes)? Is there an alternative form to the model (something other than a line) you would propose? Why? Can you hypothesize any explanations for your proposed model?
  
  (Hint: use something like "set xrange [0 : 5000]" and "set yrange [2000 : 4000]" to limit the viewing region of your plot and perhaps see greater detail.)

Work To Be Turned In

The lab is due in its entirety on Tuesday 11/18.

Code and test runs for part A.
A nice writeup including your plots of the data from B.1 and answers to B.2(a-c).

As always, please follow the instructions for submitting programs.

Jerod Weinman

Created on June 30, 2008