CSC 213, Fall 2008 : Schedule : Lab 3


Lab 3: I/O Buffering

Goals: To understand and appreciate device abstractions and, in particular, I/O buffering.

Reading:

Collaboration: You will complete this lab in teams of 2, as assigned by the instructor. (You may, of course, consult with other classmates on design and debugging.)

Discussion: Unix provides two primary ways to do file I/O.

  1. Because Unix exposes devices through the same interface it uses for files, the usual functions open(2), read(2), write(2), and close(2) are available for all file I/O. These are system calls into the kernel and are typically referred to as unbuffered I/O.

  2. The C standard library also provides buffered I/O routines, such as fopen(3), getc(3), putc(3), and fclose(3), that are built on top of the system calls above and manage a user-space buffer on the caller's behalf.

The examples head-1.c and head-2.c demonstrate how to use each set of functions to read the first few bytes of a file and print them to standard output.
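The files head-1.c and head-2.c are not reproduced here. The following is a rough illustration of the two styles. It is a hypothetical sketch, not the course's examples, and the function names head_unbuffered and head_buffered are invented for it. Each function copies at most n bytes: the first uses read(2)/write(2) and the second uses getc(3)/putc(3).

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical, unbuffered style: copy at most n bytes from fdin to
 * fdout using the read(2) and write(2) system calls directly.  Every
 * iteration makes one read(2) and one write(2) call.  Returns the
 * number of bytes copied, or -1 on error. */
long head_unbuffered(int fdin, int fdout, size_t n)
{
    char buf[512];
    size_t total = 0;

    while (total < n) {
        size_t want = (n - total < sizeof buf) ? n - total : sizeof buf;
        ssize_t got = read(fdin, buf, want);
        if (got < 0) { perror("read"); return -1; }
        if (got == 0) break;                      /* end of file */
        ssize_t put = write(fdout, buf, (size_t)got);
        if (put < 0) { perror("write"); return -1; }
        if (put != got) { fprintf(stderr, "write: short write\n"); return -1; }
        total += (size_t)got;
    }
    return (long)total;
}

/* Hypothetical, buffered style: the same task with getc(3) and putc(3).
 * The stdio library keeps its own buffer and calls read(2)/write(2)
 * only when that buffer must be refilled or flushed.  getc returns an
 * int so that EOF can be told apart from a legitimate 0xFF byte. */
long head_buffered(FILE *in, FILE *out, size_t n)
{
    size_t total = 0;
    int c;

    assert(in != NULL && out != NULL);
    while (total < n && (c = getc(in)) != EOF) {
        if (putc(c, out) == EOF) { perror("putc"); return -1; }
        total++;
    }
    if (ferror(in)) { perror("getc"); return -1; }
    return (long)total;
}
```

In the unbuffered version every loop iteration costs a read(2) and a write(2) system call. In the buffered version, most getc(3) and putc(3) calls touch only the stdio buffer in user space.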

Overview: In this lab, you will write three programs and collect some statistics to accomplish the following:

  1. Copy a file using unbuffered I/O routines with various buffer sizes, measuring the system and user CPU times.
  2. Copy a file byte-by-byte using buffered I/O routines, measuring times again.
  3. Compare performance of buffered and unbuffered I/O, finding and considering the filesystem's block size.
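For item 3, one way to find the filesystem's preferred block size is to call stat(2) on a file and inspect the st_blksize field. The following is a minimal sketch, assuming a POSIX system; preferred_blksize is a hypothetical name.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical helper: return the preferred I/O block size that
 * stat(2) reports for path (the st_blksize field), or -1 after
 * printing a message to stderr if stat(2) fails. */
long preferred_blksize(const char *path)
{
    struct stat sb;

    assert(path != NULL);
    if (stat(path, &sb) == -1) {
        perror(path);
        return -1;
    }
    return (long)sb.st_blksize;
}
```

For example, printf("%ld\n", preferred_blksize("/tmp/bigfile")) prints the value reported for that file (the path is only an example). This value is the block size the system recommends for efficient I/O on that file. It is not necessarily the same as the on-disk allocation unit.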

Part A

  1. Unbuffered I/O: Write a procedure

         int unbufcp(int bufSz, int fdsrc, int fddst)
    that copies the input file associated with the file descriptor fdsrc to an output file associated with the file descriptor fddst using the unbuffered I/O operations read(2) and write(2). In particular, your function should request bufSz bytes on each call to read(2). The return value should be the total number of calls made to read(2), or -1 if an error occurs. If possible, print an appropriate error message to stderr (using fprintf(3) or perror(3)). (Note that the Unix diff(1) command will reveal any differences between the source and destination files.)

  2. Buffered I/O: Write a procedure

         int bufcp(FILE *src, FILE *dst)
    that copies the input file associated with the file stream src to an output file associated with the file stream dst using the buffered I/O operations getc(3) and putc(3). The return value should be the total number of calls made to getc(3), or -1 if an error occurs. As above, print an appropriate error message to stderr when possible.
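One detail that is easy to miss in unbufcp is that write(2) may transfer fewer bytes than requested. A robust copy loop therefore writes each chunk it has read in a loop of its own. The hypothetical helper write_all below sketches that idea and is not part of the required interface. It also shows perror(3) reporting a failed system call.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: write exactly len bytes from buf to fd.
 * write(2) may transfer fewer bytes than requested (a "short write"),
 * so keep calling it until everything is written.  Returns 0 on
 * success, or -1 (after printing a message to stderr) on error. */
int write_all(int fd, const char *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;           /* interrupted by a signal; retry */
            perror("write");
            return -1;
        }
        buf += n;                   /* advance past the bytes written */
        len -= (size_t)n;
    }
    return 0;
}
```

Under this sketch, unbufcp could call write_all(fddst, buf, n) after each read(2) that returns n > 0. Only the read(2) calls would count toward its return value.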

Part B

  1. Resource Usage: Write one or two programs that use the system call getrusage(2) to query the resource usage caused by your copy functions. A process can use this system call to examine its own resource usage or that of its children. Use fork(2) to create a child process that calls the appropriate copy function, but open any files before the fork. The parent process should wait for the child to finish (for example, with wait(2) or waitpid(2)) and then query the child's resource usage, because getrusage(2) with RUSAGE_CHILDREN reports only on children that have terminated and been waited for. Your program(s) should report:

    Note that the times are reported as a pair of integers: whole seconds and microseconds. You must convert these to decimal seconds and print enough digits (at least six decimal places) to preserve microsecond resolution. (Info on the struct timeval type can be found in the man page for gettimeofday(2).)

  2. Empirical Analysis: Use your program(s) above to copy a large file (>50 MB) from your system's local file system to /dev/null. (On most systems, /tmp is on a local file system, while your home directory is not.) Collect results for buffer sizes from $2^0$ to $2^{20}$ bytes, that is: 1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096, 8192, 16384, 32768, 65536, 131072, 262144, 524288, and 1048576.

    Also acquire resource usage results by running the same copy task using the buffered I/O functions.

    If times are very small, consider running the experiment several times (e.g., 5-10) and recording the average for each buffer size. Organize your results in a table or several graphs, and use them as the basis for answering the following questions.

    1. What is the performance trend in terms of time, and other measures, for different buffer sizes?

    2. How do the times of unbuffered (optimal and otherwise) and buffered I/O compare?

    3. How do you suppose the number of system calls compares between unbuffered and buffered I/O?

    4. Why might an application programmer prefer buffered or unbuffered I/O? (Consider both program performance and programming effort.)

    5. How do you explain the shape of the system time curve for unbuffered I/O?

      Hint: You may wish to write a very short program using stat(2) on your source file and examine the struct stat field st_blksize to help answer this question.

    6. Are there any values in struct rusage that don't seem to be filled in by the OS (i.e., are always 0) but you think would be interesting? Why?

    In your write-up, be sure to report the details of the file you used, and anything you may have learned about the filesystem on the machine used.
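The following is a minimal sketch of the measurement pattern from Part B, item 1, assuming a POSIX system. The names run_and_measure and tv_to_seconds are hypothetical. The task parameter stands for a wrapper around unbufcp or bufcp whose files have already been opened.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/resource.h>
#include <sys/time.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Convert a struct timeval (whole seconds + microseconds) to seconds. */
double tv_to_seconds(struct timeval tv)
{
    return (double)tv.tv_sec + (double)tv.tv_usec / 1e6;
}

/* Hypothetical driver: run task() in a child process, wait for it, and
 * store the children's resource usage in *usage.  Open any files the
 * task needs before calling this so the child inherits them.  Returns
 * the child's exit status, or -1 on error. */
int run_and_measure(int (*task)(void), struct rusage *usage)
{
    assert(task != NULL && usage != NULL);

    pid_t pid = fork();
    if (pid < 0) { perror("fork"); return -1; }
    if (pid == 0) {
        /* Child: _exit does not flush stdio buffers, so a task that
         * uses putc(3) must fflush or fclose its stream itself. */
        _exit(task() < 0 ? EXIT_FAILURE : EXIT_SUCCESS);
    }

    int status;
    if (waitpid(pid, &status, 0) < 0) { perror("waitpid"); return -1; }
    /* RUSAGE_CHILDREN covers only children that have terminated and
     * been waited for, so query after waitpid returns. */
    if (getrusage(RUSAGE_CHILDREN, usage) < 0) { perror("getrusage"); return -1; }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

RUSAGE_CHILDREN accumulates the usage of every child that has been waited for. A parent that runs several experiments in sequence should therefore either subtract the previous readings or start each experiment from a fresh process. A format such as "%.6f" keeps microsecond resolution when printing tv_to_seconds(usage.ru_utime) and tv_to_seconds(usage.ru_stime).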

Work To Be Turned In

As always, please follow the instructions for submitting programs.

Jerod Weinman

Created on May 29, 2008. Last revised on September 16, 2008.