Lab: Threads and Multicore Locality

CSC 213 - Operating Systems and Parallel Algorithms - Weinman



Summary:
Test how the principle of locality applies to parallel programs.
Assigned:
Tuesday 9 October
Due:
10:30 PM Monday 15 October
Objectives:
 
Resource:
Blaise Barney's POSIX Threads Programming, sections 1-5.
Collaboration:
You will complete this lab in teams as assigned by the instructor.

Preliminaries

Exercises

Part A: Threading Multiplication

  1. In matrixop.h, create a struct called mtxThreadParam_t that will contain the parameters needed for a threaded matrix product. It should include pointers to the two multiplicands and the product, as well as the particular thread number and the total number of threads over which the product is distributed.
  2. In matrixop.c, add the function
    void* threadMtxMultiplyMin(void *multParam)
    which performs the (faster) multiplication work for a single thread; it is the analog of mtxMultiplyMin. It takes a void* argument (which should be cast to a mtxThreadParam_t*) and returns NULL.
    While there are several ways to divvy up the work, the easiest approach is to start the outermost loop (whether for a row or column) at the particular thread index and set the loop stride to the number of threads. For example, if there are a total of two threads, each will do the work for every other column (row, respectively); if three threads, each will do every third column (row, respectively). The inner loops would use a unit stride, as in the unthreaded versions.
  3. Add the corresponding function threadMtxMultiplyMax, which performs the slower multiplication work; it is the analog of mtxMultiplyMax.
  4. Add a function parMtxMultiply that computes the threaded, concurrent (and hopefully parallel) matrix product. In addition to the matrix pointer arguments, the function should take the number of threads to use and which method (max or min). Your function should create the threads, passing each one its own mtxThreadParam_t, and then join all of them before returning.
    For good additional examples, see Example 2 - Thread Argument Passing and Example Code - Pthread Joining in the POSIX Threads Programming tutorial.
  5. Write a program ptestmultiply.c that takes N (matrix size), method (min or max), and a positive number of threads as arguments. Your program should create two N×N matrices filled with random values, and multiply them using the given method with the specified number of threads. The final program should not print the result, but you may want to print it in an intermediate version for testing.
    Add this program to your Makefile.

Part B: Performance Profiling

  1. Finally, modify your program to print the elapsed wall-clock time (in decimal seconds) for only the call to parMtxMultiply (that is, exclude any initialization from the timing). You can do this easily using a combination of gettimeofday(2) and timersub(3).
    For example:
    ./ptestmultiply 128 min 2
    0.008150
    ./ptestmultiply 2048 max 4
    58.491341
  2. Using both min and max options, run your program for values of N = [128, 192, 256, 384, 512, 768, 1024, 1792] with 1-4 threads. Collect the total times and order them nicely in a text file or spreadsheet.
    If you implemented your functions efficiently with pointers, you should be able to use N=2048 as well.
  3. Use a spreadsheet (or another tool of your choice, such as R, gnuplot, or MATLAB) to create the following plots:
    1. Matrix size N versus total time for both the min and max methods and varying numbers of threads. You may collect the various plots into axes in the way you deem most sensible for analysis.
    2. Matrix size N versus the speedup1
      S = T1(N) / TP(N)
      where T is the total time and P is the number of threads. Show the plots for P > 1 on the same axes.
    3. Matrix size N versus the efficiency
      E = T1(N) / (P × TP(N)).
      Show the plots for P > 1 on the same axes.
    4. Number of threads (P > 1) versus speedup for both the min and max methods using only the largest N you ran.
    5. Number of threads (P > 1) versus efficiency for both the min and max methods using only the largest N you ran.
  4. Analyze your results by answering the following questions:
    1. How do the times vary (Plot 3a) between the min and max methods and the number of threads? Explain any similarities and differences you see.
    2. Describe and explain what trends (if any) you see in the speedup as N increases.
    3. Describe and explain what trends (if any) you see in the efficiency as N increases.
    4. Describe and explain what trends (if any) you see in the speedup as P increases.
    5. Describe and explain what trends (if any) you see in the efficiency as P increases.
    Be sure to include any relevant details about the machine on which you performed your experiments, so that your reader can understand the context for them.

What to turn in

Copyright © 2012 Jerod Weinman.
This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States License.

Footnotes:

1See Foster, Efficiency and Speedup, or Wikipedia, Speedup