Lab: Threads

CSC 213 - Operating Systems and Parallel Algorithms - Weinman



Summary:
Parallelize a multi-threaded matrix multiplication algorithm.
Assigned:
Tuesday 7 October
Due:
11:30 PM Tuesday 14 October
Objectives:
 
Resource:
Blaise Barney's POSIX Threads Programming, sections 1-5.
Collaboration:
You will complete this lab in teams as assigned by the instructor.

Preliminaries

Exercises

Part A: Threading Multiplication

  1. In matrixop.h, define a struct called matrixThreadParam_t that will contain the parameters needed for a threaded matrix product. The struct should include pointers to the two factors and the product, as well as a particular thread number and the total number of threads over which the product is distributed:
    struct matrixThreadParam_t {
      const struct matrix_t *a;
      const struct matrix_t *b;
      struct matrix_t *c;
      int numThreads;
      int threadId;
    };
    Now that you have copied and pasted this code into your file, be sure to cite and attribute its origin properly.
  2. In matrixop.c, add the function
    void* threadMtxMultiplyMin(void *multParam)
    which performs the multiplication work for a single thread; it is the analog of mtxMultiplyMin. The function will take struct matrixThreadParam_t* as an argument and return nothing (NULL).
    While there are several ways to divvy up the work, the easiest approach is to start the outermost loop (whether for a row or column) at the particular thread index and set the loop stride to the number of threads. For example, if there are a total of two threads, each will do the work for every other column (row, respectively); if three threads, each will do every third column (row, respectively). The inner loops would use a unit stride, as in the unthreaded versions.
  3. Add a function parMtxMultiply that computes the threaded, concurrent (and hopefully parallel) matrix product. In addition to the matrix pointer arguments, the function should take the number of threads to use:
    int parMtxMultiply( const struct matrix_t *a, 
                        const struct matrix_t *b,
                        struct matrix_t *c,
                        int numThreads)
    Your function should create the threads, passing each one its own parameter struct, and then join them all before returning.
    For good additional examples of how to do these things, see Example 2 - Thread Argument Passing and Example Code - Pthread Joining in the POSIX Threads Programming tutorial.

Evaluation

As before, more credit will be given for solutions that are careful to distribute all stages of the computation and that efficiently use pointers for all matrix data references (both reading and storing).

Part B: Performance Profiling

The program ptimemultiply.c takes N (matrix size) and T (number of threads) as arguments. It then takes the multi-threaded product of two N×N matrices three times, using your parMtxMultiply function. To simplify matters, the measurement excludes the set-up (memory allocation and matrix initialization); the program measures and prints only the wall clock time of the products, using gettimeofday(2) and timersub(3). Because this puts your measurements somewhat at the mercy of the scheduler, performing multiple test runs is even more important; the times for the three runs are printed tab-separated on a single line:
./ptimemultiply 512 4
  0.098640   0.098599   0.100816
./ptimemultiply 256 2
  0.032886   0.029050   0.028003
Of course, ptimemultiply.c requires you to have implemented parMtxMultiply. However, a precompiled version is already provided for you should you want to benchmark the (non)efficiency of your pointer arithmetic or use a trusted version for your experiments.
You can build your own from your modified matrixop.c with make:
make ptimemultiply
In any case, be sure your report (below) documents the provenance of the program used to generate your data.
  1. Run the ptimemultiply program for values of N = [128, 192, 256, 384, 512, 768, 1024, 1536] with 1, 2, 3, and 4 threads. Collect the times and order them nicely in a text file or spreadsheet.
    Hint: Use bash for loops for easy data generation.
    If you implemented your function efficiently with pointers, you should be able (and are encouraged) to use N=2048 as well.
  2. Use a spreadsheet (or another tool of your choice, such as R, gnuplot, or MATLAB) to create the following plots:
    1. Matrix size N versus average total time with varying numbers of threads. You may collect the plots into axes in the way you deem most sensible for analysis. Be sure to label your axes (giving units), title your plot, and (if appropriate) provide a legend.
    2. Matrix size N versus the speedup1
      S = T_1(N) / T_P(N)
      where T is the total time and P > 1 is the number of threads. Show the plots on the same axes. Title your plot, label your axes, and provide a legend.
    3. Matrix size N versus the efficiency
      E = T_1(N) / (P × T_P(N)).
      Show the plots for P > 1 on the same axes. Title your plot, label your axes, and provide a legend.
    4. Number of threads (P > 1) versus speedup using only the largest N you ran.
    5. Number of threads (P > 1) versus efficiency using only the largest N you ran.
  3. Analyze your results by answering the following questions:
    1. Describe and explain how the times vary (Plot 2a) with the number of threads.
    2. Describe and explain the trend in the speedup as N increases as well as any deviations from the trend when N is smaller.
    3. Describe and explain the trend in the efficiency as N increases as well as any deviations from the trend when N is smaller.
    4. Describe and explain the trend in the speedup as P increases.
    5. Describe and explain the trend in the efficiency as P increases.
    Be sure to include any relevant details about the machine on which you performed your experiments, so that your reader can understand the context for them.

What to turn in

Copyright © 2012, 2014 Jerod Weinman.
This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States License.

Footnotes:

1See Foster, Efficiency and Speedup, or Wikipedia, Speedup