CSC 213, Fall 2008 : Schedule : Lab 10

Lab 9: Sockets for a Simple Web Client and Server

Goals:

To gain experience using the C/UNIX sockets interface.
To reinforce your learning about security, the file system, signals, and related system calls.
To learn a little about the HTTP protocol, a text-based application-layer protocol.
To be able to say that you've implemented a simple web client and server. :)

Background:

Broadwell, An Introduction to Sockets in C Through Annotated Examples, pp. 1-18.
Nutt, Lab 15.1, pp. 655-661, discusses sockets, though in the context of a different application.

Collaboration: You will complete this lab in teams of 2-3 of your choice. You may consult me or your classmates with proper attribution.

Part A: Experiments with IP addresses and hostnames

Download the files netserv.c and netclient.c to your account. Review the programs to be sure you understand them.
Compile both programs. In one terminal window, run netserv. In another, run netclient. Briefly explain what you see.
Type the command nslookup hostname, where hostname is the name of the computer you are working on, to learn the IP address of that computer.
In netclient.c, change the definition of IP_ADDRESS so that it is the address of the computer you are working on. Recompile netclient.c.
Now, run netserv and netclient again. Explain what you see.
Suppose you run netclient when netserv is not running? Try this and explain what you see.
Wouldn't it be great if we could run netserv and netclient on any computer, without having to change the IP address and recompile the client? The way we will do this is by using the getaddrinfo(3) library call. This function lets us supply a hostname as a string, and it will resolve that hostname to a list of struct addrinfo results, each of which holds an address of type struct sockaddr. The program my-nslookup.c illustrates the use of getaddrinfo(3). Download this program, compile it, run it a few times supplying different hostnames as arguments, and review the code to understand how it works.
Modify netclient.c so that it takes the hostname as a command-line argument. Use code from my-nslookup.c to resolve this hostname to a result of type struct sockaddr, and then use the result you obtain to connect to this address rather than the hard-coded IP_ADDRESS.
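
As a rough illustration of the resolve-then-connect pattern described above, here is a minimal sketch. The helper name connect_to_host is hypothetical and does not come from netclient.c or my-nslookup.c; it assumes the port is passed as a string and that a TCP (SOCK_STREAM) connection is wanted.

```c
#define _POSIX_C_SOURCE 200809L
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not part of the lab's files): resolve host and port,
 * then return a connected TCP socket, or -1 on failure. */
int connect_to_host(const char *host, const char *port)
{
    struct addrinfo hints, *results, *rp;
    int sock = -1;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;      /* accept IPv4 or IPv6 */
    hints.ai_socktype = SOCK_STREAM;  /* TCP */

    int err = getaddrinfo(host, port, &hints, &results);
    if (err != 0) {
        fprintf(stderr, "getaddrinfo(%s): %s\n", host, gai_strerror(err));
        return -1;
    }

    /* getaddrinfo returns a linked list; try each address in turn. */
    for (rp = results; rp != NULL; rp = rp->ai_next) {
        sock = socket(rp->ai_family, rp->ai_socktype, rp->ai_protocol);
        if (sock == -1)
            continue;
        if (connect(sock, rp->ai_addr, rp->ai_addrlen) == 0)
            break;                    /* connected */
        close(sock);
        sock = -1;
    }

    freeaddrinfo(results);            /* the list is heap-allocated */
    return sock;
}
```

A client might call connect_to_host(argv[1], "8000") and then read from and write to the returned descriptor; the port string here is only a placeholder.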

Part B: A simple web client

The wget(1) program allows you to fetch the contents of a URL and save it to a local file. If you've never used wget, try using it to fetch the file named by http://www.cs.grinnell.edu/~weinman/csc/213/2008F/index.html. In this part of the lab, we will build a simple analog to wget. Our program will take a URL as a command-line argument and write the response from the web server to standard output.
One problem we will face in building our simple web client is that of parsing URLs. Luckily, we can build a very simple web client while only parsing a limited class of URLs: those of the form
<protocol>://<hostname>/<path>
or
<protocol>://<hostname>:<port>/<path>
The following library, example, and Makefile illustrate a simple parser for URLs of this form. Download these, compile the library and test program with make testparse, and run the resulting parseurl program on a few different URLs. Review the code to understand what it does.
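
To make the two URL forms above concrete, here is an independent sketch of how such a split could be done. The struct url_parts type and the split_url function are hypothetical and are not the lab's parseurl library, whose interface may differ. One assumption to note: this sketch keeps the leading slash in the path, so a URL with nothing after the hostname yields an empty path.

```c
#include <string.h>

/* Hypothetical type for illustration; not the lab's url_data_t. */
struct url_parts {
    char protocol[16];
    char host[256];
    char port[8];     /* empty string if the URL gives no port */
    char path[1024];  /* empty string if the URL gives no path */
};

/* Split <protocol>://<hostname>[:<port>][/<path>].
 * Returns 0 on success, -1 if the URL is malformed or a field is too long. */
int split_url(const char *url, struct url_parts *out)
{
    memset(out, 0, sizeof *out);   /* every field starts as an empty string */

    const char *sep = strstr(url, "://");
    if (sep == NULL || sep == url || (size_t)(sep - url) >= sizeof out->protocol)
        return -1;
    memcpy(out->protocol, url, (size_t)(sep - url));

    const char *host = sep + 3;
    size_t authority_len = strcspn(host, "/");  /* host[:port] ends at first '/' */
    size_t host_len = strcspn(host, ":/");      /* host ends at ':' or '/' */
    if (host_len == 0 || host_len >= sizeof out->host)
        return -1;
    memcpy(out->host, host, host_len);

    if (host[host_len] == ':') {
        size_t port_len = authority_len - host_len - 1;
        if (port_len == 0 || port_len >= sizeof out->port)
            return -1;
        memcpy(out->port, host + host_len + 1, port_len);
    }

    const char *path = host + authority_len;    /* points at '/' or at '\0' */
    if (strlen(path) >= sizeof out->path)
        return -1;
    strcpy(out->path, path);
    return 0;
}
```

For http://localhost:8000/index.html this fills in "http", "localhost", "8000", and "/index.html"; for http://www.cs.grinnell.edu both the port and the path come back empty.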
To put together your simple web client, make an appropriately named copy of your program from step 8 and then make the following changes.
1. The command-line argument should be a URL rather than a hostname.
2. Parse the URL and connect to the hostname and port specified.
3. The HTTP protocol, in its simplest form, is very, very simple. After connecting the socket, write a request of the following form (where ↵ indicates a newline character):
```
GET path HTTP/1.0↵
HOST: hostname↵
↵
```
4. At this point, data from the web server should start arriving on the socket. Repeatedly read chunks of data from the socket into a character array and then write the data to standard output. You will know all of the data has been read when the return value from the read(...) call is zero (indicating 0 bytes read). When all of the data has been read, close the socket. (Note that the data read from the socket will not end with a null byte (\0).)
5. Note that if the <path> portion of the URL (as given above in step 10) is empty, the parse_url routine will place an empty string (i.e., only a null byte) in the path field of the url_data_t struct. When this is the case, the HTTP protocol insists that the path of the request (as shown in item 3 above) must be a forward slash, e.g., GET / HTTP/1.0, etc.
Examine the result of running your program from step 11 for the URL http://www.cs.grinnell.edu/. The HTTP header is separated from the contents of the file by a blank line. What information do you see in the header?
Examine the result of running your program from step 11 for the URL http://www.cs.grinnell.edu/does-not-exist. How is the HTTP header different from what you saw in step 12?

Part C: A simple web server

Getting started. Download the file wwwserv.c and put it in the same place as parseurl.c and the Makefile. Compile it with make wwwserv and run the resulting program wwwserv. Point your web browser (e.g., Firefox) to the URL http://localhost:8000/. Review wwwserv.c to understand what you observed.
Security. Note that we use the ROOT constant to specify the directory in which the web server should look for files. Does this guarantee that a malicious web client cannot access files outside of the web server's root directory? If so, explain why. If not, give an example of a request a client could make to obtain an "unauthorized" file.
A note on sockets. Run wwwserv again and use the client you wrote in Part B to request the URL http://localhost:8000/. Then, run the command "netstat --inet". You will see a list of all open Internet domain sockets on the computer, such as the following:
```
Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 localhost.localdom:8000 localhost.localdo:42526 TIME_WAIT
```
The TIME_WAIT state is used to ensure that "stray" packets from previous connections do not interfere with new connections, when a new socket is opened on the same port number as an old connection. In practice, it means you cannot open another socket using the same port number for about two minutes after the previous socket is closed. (You may have already discovered this in part A of the lab.)

For this reason, wwwserv accepts a port number as an optional command-line argument. This will allow you to easily cycle through a few different server port numbers (e.g., 8000, 8001, 8002, 8003) as you are testing your server.
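
As an illustration of how an optional port argument is commonly handled, here is a minimal sketch. The helper port_from_args and the DEFAULT_PORT constant are hypothetical and may not match how wwwserv.c actually reads its argument; the default of 8000 is assumed from the URL used above.

```c
#include <errno.h>
#include <stdlib.h>

#define DEFAULT_PORT 8000  /* assumed default, matching http://localhost:8000/ */

/* Hypothetical helper: return the port given in argv[1], DEFAULT_PORT if no
 * argument was supplied, or -1 if the argument is not a valid port number. */
int port_from_args(int argc, char *argv[])
{
    if (argc < 2)
        return DEFAULT_PORT;

    char *end;
    errno = 0;
    long port = strtol(argv[1], &end, 10);

    /* Reject empty input, trailing junk, overflow, and out-of-range values. */
    if (errno != 0 || end == argv[1] || *end != '\0' || port < 1 || port > 65535)
        return -1;
    return (int)port;
}
```

With this in place, running the server as wwwserv 8001 would select port 8001, while running it with no argument would fall back to the default.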
Serving content. Modify wwwserv.c to write the actual contents of the file to the connection socket, rather than "Content of file goes here." To test the server, you may wish to use the WWW client program you wrote in Part B.

A good strategy is to iteratively read chunks of data from the file into a character array and then write them to the socket. (Much like you did in Lab 3.)
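
The chunked copy described above can be sketched as follows. The names write_all and send_file are hypothetical, and the sketch assumes the file is addressed by a path that has already been checked. The one detail worth highlighting is that write(2) may accept fewer bytes than requested, especially on a socket, so the sketch loops until each chunk is fully written.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Write all len bytes, retrying after short writes.
 * Returns 0 on success, -1 on error. */
int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;          /* interrupted by a signal; try again */
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Copy the file at path to out_fd in fixed-size chunks.
 * Returns the number of bytes sent, or -1 on error. */
long send_file(const char *path, int out_fd)
{
    int in_fd = open(path, O_RDONLY);
    if (in_fd < 0)
        return -1;

    char chunk[4096];
    long total = 0;
    ssize_t n;

    while ((n = read(in_fd, chunk, sizeof chunk)) > 0) {
        if (write_all(out_fd, chunk, (size_t)n) != 0) {
            n = -1;
            break;
        }
        total += n;
    }
    close(in_fd);
    return (n < 0) ? -1 : total;
}
```

In a server, out_fd would be the connection socket returned by accept, and send_file would be called after the response header has been written.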
Servicing concurrent requests. Modify the program so that it can accept and service multiple connections, using fork. Note that the parent process should not wait for a child to complete its execution before accepting the next connection. If it waits for each connection to complete before starting the next, then it's not concurrent!
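
One common shape for a fork-per-connection accept loop is sketched below. The names install_reaper, serve_forever, and handle_client are hypothetical, and handle_client stands in for whatever per-connection code wwwserv.c already contains. Because the parent never waits for a child, the sketch installs a SIGCHLD handler that collects finished children so they do not accumulate as zombies; that handler is an addition beyond what the step above asks for.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Reap every finished child without blocking. */
static void reap_children(int signo)
{
    (void)signo;
    int saved = errno;                 /* waitpid may overwrite errno */
    while (waitpid(-1, NULL, WNOHANG) > 0)
        ;
    errno = saved;
}

/* Install the SIGCHLD handler. Returns 0 on success, -1 on error. */
int install_reaper(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = reap_children;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;          /* restart accept() after the signal */
    return sigaction(SIGCHLD, &sa, NULL);
}

/* Accept connections forever, handling each one in a child process.
 * Returns -1 only if accept() fails for a reason other than EINTR. */
int serve_forever(int listen_fd, void (*handle_client)(int))
{
    for (;;) {
        int conn = accept(listen_fd, NULL, NULL);
        if (conn < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }

        pid_t pid = fork();
        if (pid == 0) {                /* child: serve this one connection */
            close(listen_fd);
            handle_client(conn);
            close(conn);
            _exit(0);
        }

        /* Parent (or failed fork): close our copy of the connection and
         * return to accept() immediately, without waiting for the child. */
        close(conn);
    }
}
```

The parent must close its copy of the connection socket; otherwise the client never sees end-of-file, because the descriptor stays open in the parent even after the child exits.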

Work to be Turned In

Parts A & B: Due Friday, 11/7

Brief commentary for steps 2, 5, 6, 12, and 13.
Programs for steps 8 and 11. Follow the instructions for submitting programs.

Part C: Due Monday, 11/10

Brief commentary for step 15.
The final program from steps 17-18. Follow the instructions for submitting programs.

Jerod Weinman

Created June 26, 2008
Based on CSC 213, Fall 2006 : Lab 10 : A Simple Web Client and Server
With thanks to Janet Davis and Henry Walker