Lab: Threads
CSC 213 - Operating Systems and Parallel Algorithms - Weinman
- Summary:
- Parallelize a matrix multiplication algorithm using multiple threads.
- Assigned:
- Tuesday 7 October
- Due:
- 11:30 PM Tuesday 14 October
- Objectives:
-
- Introduce data-based problem decomposition
- Become familiar with launching threads in the POSIX pthreads
environment
- Investigate parallel program behavior through empirical analysis
- Resource:
- Blaise Barney's POSIX
Threads Programming, sections 1-5.
- Collaboration:
- You will complete this lab in teams
as assigned by the instructor.
Preliminaries
- You will build on your mtxMultiplyMin implementation from
the previous lab. If your previous partner has the files you wish
to derive your solutions from, make arrangements to copy them
before the lab session. Make sure you properly cite the source of
any code from which your solutions to this lab are derived.
- Do this laboratory on a MathLAN workstation in 3819 or 3815.
- Determine the number of available processor cores in your machine:
-
$ nproc
You may also find other interesting details worth noting:
-
$ less /proc/cpuinfo
Exercises
Part A: Threading Multiplication
-
In matrixop.h, define a struct,
called matrixThreadParam_t, that will contain the parameters
needed for a threaded matrix product. The struct should include pointers
to the two factors and the product, as well as a particular thread number
and the total number of threads over which the product is distributed:
-
struct matrixThreadParam_t {
const struct matrix_t *a;
const struct matrix_t *b;
struct matrix_t *c;
int numThreads;
int threadId;
};
Now that you have copied and pasted this code into your file, be sure
to cite and attribute its origin properly.
- In matrixop.c, add the function
-
void* threadMtxMultiplyMin(void *multParam)
which performs the multiplication work for a single thread;
it is the analog of mtxMultiplyMin. The function
takes a struct matrixThreadParam_t* as its argument and returns
NULL; its void* return type exists only to satisfy the pthreads
interface.
While there are several ways to divvy up the work, the easiest approach
is to start the outermost loop (whether over rows or columns) at the
particular thread index and set the loop stride to the number
of threads. For example, if there are a total of two threads, each
will do the work for every other column (row, respectively); if three
threads, each will do every third column (row, respectively). The
inner loops would use a unit stride, as in the unthreaded versions.
(A sketch of this strided decomposition appears at the end of Part A.)
- Add a function parMtxMultiply that computes the
threaded, concurrent (and hopefully parallel) matrix product. In addition
to the matrix pointer arguments, the function should take the number
of threads to use:
-
int parMtxMultiply( const struct matrix_t *a,
const struct matrix_t *b,
struct matrix_t *c,
int numThreads)
Your function should
- verify the dimensions of the factors and product, returning -1 if
the parameters are invalid and zero when multiplication is successful
- allocate a parameter struct (cf. A.1) for each
thread
- initialize each thread and explicitly make it joinable
- print an appropriate message to stderr and exit the program
with EXIT_FAILURE if any thread cannot be created, and
- wait for all threads to complete before returning.
For good additional examples of how to do these things, see Example
2 - Thread Argument Passing and Example
Code - Pthread Joining in the POSIX Threads Programming
tutorial.
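To make the strided decomposition in A.2 concrete, here is a minimal
sketch of threadMtxMultiplyMin. It assumes matrix_t exposes rows,
cols, and a flat data array (field names that may differ from yours),
and its inner loop shows an ordinary sum-of-products as a placeholder;
substitute whatever computation your mtxMultiplyMin actually performs,
and convert the array indexing to pointer arithmetic for full credit:
-
void* threadMtxMultiplyMin(void *multParam)
{
  struct matrixThreadParam_t *p = (struct matrixThreadParam_t*) multParam;

  /* Start the outer loop at this thread's index and stride by the
     total number of threads, as described in A.2 */
  for (int i = p->threadId; i < p->c->rows; i += p->numThreads)
    for (int j = 0; j < p->c->cols; j++)
    {
      /* Placeholder inner computation; reuse your mtxMultiplyMin body */
      double sum = 0.0;
      for (int k = 0; k < p->a->cols; k++)
        sum += p->a->data[i * p->a->cols + k]
             * p->b->data[k * p->b->cols + j];
      p->c->data[i * p->c->cols + j] = sum;
    }

  return NULL;
}
Likewise, the following outline strings together the A.3 requirements
(dimension checks, a parameter struct per thread, explicitly joinable
threads, error handling, and joining); the dimension test and field
names are assumptions to adapt to your own matrix_t:
-
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

int parMtxMultiply( const struct matrix_t *a,
                    const struct matrix_t *b,
                    struct matrix_t *c,
                    int numThreads)
{
  /* Verify dimensions; -1 signals invalid parameters */
  if (a->cols != b->rows || c->rows != a->rows || c->cols != b->cols)
    return -1;

  pthread_t *threads = malloc(numThreads * sizeof(pthread_t));
  struct matrixThreadParam_t *params =
    malloc(numThreads * sizeof(struct matrixThreadParam_t));

  /* Explicitly make the threads joinable */
  pthread_attr_t attr;
  pthread_attr_init(&attr);
  pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_JOINABLE);

  for (int t = 0; t < numThreads; t++)
  {
    params[t].a = a;
    params[t].b = b;
    params[t].c = c;
    params[t].numThreads = numThreads;
    params[t].threadId = t;

    if (pthread_create(&threads[t], &attr,
                       threadMtxMultiplyMin, &params[t]))
    {
      fprintf(stderr, "Error: could not create thread %d\n", t);
      exit(EXIT_FAILURE);
    }
  }

  /* Wait for every worker to complete before returning */
  for (int t = 0; t < numThreads; t++)
    pthread_join(threads[t], NULL);

  pthread_attr_destroy(&attr);
  free(params);
  free(threads);
  return 0;
}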
Evaluation
As before, more credit will be given for solutions that are careful
to distribute all stages of the computation and that efficiently use
pointers for all matrix data references (both reading and storing).
Part B: Performance Profiling
The program ptimemultiply.c takes N (matrix size)
and T (number of threads) as arguments. It then takes the
multi-threaded product of two N×N matrices three times, using
your parMtxMultiply function. To simplify matters, the program
excludes the set-up (memory allocation and matrix initialization)
from its measurement; it measures and prints the wall clock time
using gettimeofday(2) and timersub(3), following the pattern
sketched below.
Because wall-clock timing leaves your measurements somewhat at the
mercy of the scheduler, multiple test runs are even more important;
the program therefore prints the times for three runs, tab-separated,
on a single line:
-
$ ./ptimemultiply 512 4
0.098640 0.098599 0.100816
$ ./ptimemultiply 256 2
0.032886 0.029050 0.028003
Of course, ptimemultiply.c requires you to have implemented
parMtxMultiply. However, a precompiled version is
already provided for you, should you want to benchmark the
(non)efficiency of your pointer arithmetic or use a trusted version
for your experiments. You can build your own from your modified
matrixop.c with make:
-
$ make ptimemultiply
In any case, be sure your report (below) documents the provenance
of the program used to generate your data.
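If you are curious how such measurements are typically structured,
the following is a minimal sketch of timing a single product with
gettimeofday(2) and timersub(3). The function name timeOneProduct is
hypothetical, and this is a sketch of the pattern, not the source of
the provided program:
-
#include <sys/time.h>
#include "matrixop.h" /* assumed to declare matrix_t and parMtxMultiply */

/* Return the wall-clock duration of one threaded product, in seconds.
   The matrices are assumed to be allocated and initialized already. */
double timeOneProduct(const struct matrix_t *a, const struct matrix_t *b,
                      struct matrix_t *c, int numThreads)
{
  struct timeval start, end, elapsed;

  gettimeofday(&start, NULL);
  parMtxMultiply(a, b, c, numThreads);  /* the work being measured */
  gettimeofday(&end, NULL);

  timersub(&end, &start, &elapsed);     /* elapsed = end - start */
  return elapsed.tv_sec + elapsed.tv_usec / 1.0e6;
}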
- Run the ptimemultiply program for values of N = [128,
192, 256, 384, 512, 768, 1024, 1536] with 1, 2, 3, and 4 threads.
Collect the times and order them nicely in a text file or spreadsheet.
Hint: Use bash for loops for easy data generation (a sketch appears
at the end of Part B).
If you implemented your function efficiently with pointers, you should
be able (and are encouraged) to use N=2048 as well.
-
Use a spreadsheet (or another tool of your choice,
such as R, gnuplot, or MATLAB) to create the following
plots:
-
Matrix size N versus average total time
with varying numbers of threads. You may collect the plots onto shared
axes in the way you deem most sensible for analysis. Be sure to label
your axes (giving units), title your plot, and (if appropriate) provide
a legend.
-
Matrix size N versus the speedup[1], S_P = T_1 / T_P,
where T_P is the total time using P threads and P > 1 is the number
of threads. Show the plots on the same axes. Title your plot, label
your axes, and provide a legend.
- Matrix size N versus the efficiency, E_P = S_P / P. (For example,
if a four-thread run were 3.2 times faster than the single-thread
run, the efficiency would be 3.2/4 = 0.8.)
Show the plots for P > 1 on the same axes. Title your plot, label
your axes, and provide a legend.
- Number of threads (P > 1) versus speedup using only the largest
N you ran.
- Number of threads (P > 1) versus efficiency using only the
largest N you ran.
-
Analyze your results by answering the following
questions:
- Describe and explain how the times vary (Plot 2a) with
the number of threads.
- Describe and explain the trend in the speedup as N increases as
well as any deviations from the trend when N is smaller.
- Describe and explain the trend in the efficiency as N increases
as well as any deviations from the trend when N is smaller.
- Describe and explain the trend in the speedup as P increases.
- Describe and explain the trend in the efficiency as P increases.
Be sure to include any relevant details about the machine on which
you performed your experiments, so that your reader can understand
the context for them.
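As mentioned in the hint for B.1, nested bash for loops can generate
all of the timing data in one pass. The transcript below is only a
sketch; the output file name times.txt is illustrative:
-
$ for N in 128 192 256 384 512 768 1024 1536; do
    for T in 1 2 3 4; do
      printf '%d\t%d\t' $N $T    # record the parameters with the times
      ./ptimemultiply $N $T
    done
  done > times.txt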
What to turn in
- Your source code in matrixop.h, matrixop.c, and
any other files developed for testing
- A single PDF containing (merged)
- Your enscripted source code files (see above; do not include any
extra files; one possible enscript invocation appears below)
- A transcript of your program's compilation and test run(s)
- Plots (B.2) and analysis (B.3)
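If you have not used enscript before, one possible invocation (the
options here are a suggestion, not a requirement) converts a source
file to PDF by piping PostScript through ps2pdf:
-
$ enscript -2r --highlight=c -o - matrixop.c | ps2pdf - matrixop.pdf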
Copyright © 2012, 2014 Jerod
Weinman.
This work is licensed under a Creative
Commons Attribution-Noncommercial-Share Alike 3.0 United States License.
Footnotes:
[1] See Foster, Efficiency and Speedup, or Wikipedia, Speedup.