Lab: Threads and Multicore Locality
CSC 213 - Operating Systems and Parallel Algorithms - Weinman
- Summary:
- Test how the principle of locality applies to parallel
programs.
- Assigned:
- Tuesday 9 October
- Due:
- 10:30 PM Monday 15 October
- Objectives:
-
- Introduce data-based problem decomposition
- Become familiar with launching threads in the POSIX pthreads
environment
- Explore locality effects in multi-threaded programs on multicore processors
- Resource:
- Blaise Barney's POSIX
Threads Programming, sections 1-5.
- Collaboration:
- You will complete this lab in teams
as assigned by the instructor.
Preliminaries
- You will build on your mtxMultiplyMin and mtxMultiplyMax
implementations from the previous lab. If your previous partner has
the files you wish to derive your solutions from, make arrangements
to copy them before the lab session. Make sure you attribute the
source of any code from which your solutions to this lab are derived.
- Do this laboratory on a MathLAN workstation in 3819 or 3815.
- Determine the number of processor cores in your machine (as well as
any other details you find interesting):
-
$ cat /proc/cpuinfo
Exercises
Part A: Threading Multiplication
- In matrixop.h, create a struct, called mtxThreadParam_t
that will contain the parameters needed for a threaded matrix product.
It should include pointers to the two multiplicands, the product,
as well as the particular thread number and the total number of threads
over which the product is distributed.
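For reference, a minimal sketch of such a struct follows; the double **
matrix type and the explicit size field are assumptions, so adapt them to
whatever representation your mtxMultiplyMin and mtxMultiplyMax use.
-
/* Sketch only: assumes square matrices stored as double **; change the
 * pointer types (and drop or keep the size field) to match your own
 * matrix representation from the previous lab. */
typedef struct {
  double **a;          /* first multiplicand */
  double **b;          /* second multiplicand */
  double **c;          /* product */
  int      n;          /* matrix dimension (N x N); an assumption */
  int      thread;     /* this thread's index, 0 .. numThreads-1 */
  int      numThreads; /* total number of threads sharing the work */
} mtxThreadParam_t;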
- In matrixop.c, add the function
-
void* threadMtxMultiplyMin(void *multParam)
which performs the (faster) multiplication work for a single thread;
it is the analog of mtxMultiplyMin. It will take
a pointer to an mtxThreadParam_t as its argument and need not return
anything meaningful (returning NULL suffices).
While there are several ways to divvy up the work, the easiest approach
is to start the outermost loop (whether for a row or column) at the
particular thread index and set the loop stride to the number of threads.
For example, if there are a total of two threads, each will do the
work for every other column (row, respectively); if three threads,
each will do every third column (row, respectively). The inner loops
would use a unit stride, as in the unthreaded versions.
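For illustration, a rough sketch of this decomposition follows; it assumes
the struct sketched above, a zero-initialized product matrix, and an i-k-j
loop order, so adjust the loop nest to whatever order your own
mtxMultiplyMin actually uses.
-
/* Sketch only: assumes the mtxThreadParam_t sketched earlier, a
 * zero-initialized product, and a cache-friendly i-k-j loop order;
 * substitute your own loop order and matrix types. */
void* threadMtxMultiplyMin(void *multParam)
{
  mtxThreadParam_t *p = (mtxThreadParam_t*) multParam;

  /* The outermost loop starts at this thread's index and strides by the
   * total number of threads; the inner loops keep unit stride. */
  for (int i = p->thread; i < p->n; i += p->numThreads)
    for (int k = 0; k < p->n; k++)
      for (int j = 0; j < p->n; j++)
        p->c[i][j] += p->a[i][k] * p->b[k][j];

  return NULL;
}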
- Add the corresponding function threadMtxMultiplyMax,
which performs the slower multiplication work; it is the analog of
mtxMultiplyMax.
- Add a function parMtxMultiply that computes the
threaded, concurrent (and hopefully parallel) matrix product. In addition
to the matrix pointer arguments, the function should take the number
of threads to use and which method (max or min). Your function should
- allocate a parameter struct for each thread,
- initialize each thread and explicitly make it joinable,
- print an appropriate message to stderr and exit with an appropriate
status if any thread cannot be created, and
- wait for all threads to complete before returning.
For good additional examples, see Example
2 - Thread Argument Passing and Example
Code - Pthread Joining in the POSIX Threads Programming
tutorial.
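As a point of comparison for your own design, here is a hedged sketch of
the launch-and-join structure; the parameter list (separate double **
matrices, an explicit size, and an integer method flag) is an assumption,
so use whatever interface fits your matrixop library.
-
/* Sketch only: the parameter list (double ** matrices, an explicit size,
 * and an integer "useMin" method flag) is an assumption. */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

void parMtxMultiply(double **a, double **b, double **c, int n,
                    int numThreads, int useMin)
{
  pthread_t        *threads = malloc(numThreads * sizeof(pthread_t));
  mtxThreadParam_t *params  = malloc(numThreads * sizeof(mtxThreadParam_t));
  pthread_attr_t    attr;

  /* Explicitly request joinable threads */
  pthread_attr_init(&attr);
  pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_JOINABLE);

  for (int t = 0; t < numThreads; t++)
  {
    /* One parameter struct per thread */
    params[t] = (mtxThreadParam_t){ a, b, c, n, t, numThreads };

    int rc = pthread_create(&threads[t], &attr,
                            useMin ? threadMtxMultiplyMin
                                   : threadMtxMultiplyMax,
                            &params[t]);
    if (rc)  /* report and bail out if any thread cannot be created */
    {
      fprintf(stderr, "ERROR: pthread_create returned %d\n", rc);
      exit(EXIT_FAILURE);
    }
  }

  /* Wait for every worker to finish before returning */
  for (int t = 0; t < numThreads; t++)
    pthread_join(threads[t], NULL);

  pthread_attr_destroy(&attr);
  free(params);
  free(threads);
}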
- Write a program ptestmultiply.c that takes N (matrix
size), method (min or max),
and a positive number of threads as arguments. Your program should
create two N×N matrices filled with random values, and multiply
them using the given method with the specified number of threads.
The final program should not print the result, but you may want to do
so in an intermediate version for testing.
Add this program to your Makefile.
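One possible outline of this program appears below; mtxAllocate and
mtxFillRandom are hypothetical helper names standing in for whatever
allocation and random-fill routines your matrixop library actually
provides.
-
/* Sketch only: mtxAllocate and mtxFillRandom are hypothetical helpers;
 * substitute your own matrix creation and initialization routines. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "matrixop.h"

int main(int argc, char *argv[])
{
  if (argc != 4)
  {
    fprintf(stderr, "Usage: %s N min|max numThreads\n", argv[0]);
    return EXIT_FAILURE;
  }

  int n          = atoi(argv[1]);
  int useMin     = (strcmp(argv[2], "min") == 0);
  int numThreads = atoi(argv[3]);

  if (n <= 0 || numThreads <= 0)
  {
    fprintf(stderr, "N and numThreads must be positive\n");
    return EXIT_FAILURE;
  }

  /* Two random N x N multiplicands and a product (assumed zeroed
   * by the hypothetical mtxAllocate) */
  double **a = mtxAllocate(n), **b = mtxAllocate(n), **c = mtxAllocate(n);
  mtxFillRandom(a, n);
  mtxFillRandom(b, n);

  parMtxMultiply(a, b, c, n, numThreads, useMin);

  return EXIT_SUCCESS;
}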
Part B: Performance Profiling
- Finally, modify your program to print the elapsed wall-clock time
(in decimal seconds) for only the call to parMtxMultiply
(that is, exclude any initialization from the timing). You can do
this easily using a combination of gettimeofday(2)
and timersub(3).
For example:
-
$ ./ptestmultiply 128 min 2
0.008150
$ ./ptestmultiply 2048 max 4
58.491341
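For reference, the timed call inside your main might look like the
following fragment; it assumes the hypothetical variable names used in
the ptestmultiply sketch in Part A and requires <sys/time.h>.
-
/* Fragment only: time just the parMtxMultiply call, excluding all
 * matrix allocation and initialization. */
struct timeval start, end, elapsed;

gettimeofday(&start, NULL);
parMtxMultiply(a, b, c, n, numThreads, useMin);
gettimeofday(&end, NULL);

/* elapsed = end - start, then print as decimal seconds */
timersub(&end, &start, &elapsed);
printf("%ld.%06ld\n", (long) elapsed.tv_sec, (long) elapsed.tv_usec);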
- Using both min and max options, run your program
for values of N = [128, 192, 256, 384, 512, 768, 1024, 1792]
with 1-4 threads. Collect the total times and order them nicely in
a text file or spreadsheet.
If you implemented your functions efficiently with pointers, you should
be able to use N=2048 as well.
-
Use a spreadsheet (or another tool of your choice,
such as R, gnuplot, or MATLAB) to create the following
plots:
-
Matrix size N versus total time for both
the min and max methods and varying numbers of threads.
You may collect the various plots into axes in the way you deem most
sensible for analysis.
-
Matrix size N versus the speedup1 S_P = T_1 / T_P,
where T_1 is the total time using one thread, T_P is the total time
using P threads, and P is the number of threads. Show
the plots for P > 1 on the same axes. (A short worked example of speedup
and efficiency follows this list of plots.)
- Matrix size N versus the efficiency E_P = S_P / P.
Show the plots for P > 1 on the same axes.
- Number of threads (P > 1) versus speedup for both the min
and max methods using only the largest N you ran.
- Number of threads (P > 1) versus efficiency for both the min
and max methods using only the largest N you ran.
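To make the two measures concrete, suppose (purely hypothetically) that
the single-threaded min run at N = 1024 took T_1 = 8.0 seconds and the
four-thread run took T_4 = 2.5 seconds. The speedup would be
S_4 = T_1 / T_4 = 8.0 / 2.5 = 3.2, and the efficiency would be
E_4 = S_4 / 4 = 0.8; that is, each thread delivers 80% of the ideal
per-thread performance.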
-
Analyze your results by answering the following
questions:
- How do the times vary (Plot 3a) between the min
and max methods and the number of threads? Explain any similarities
and differences you see.
- Describe and explain what trends (if any) you see in the speedup as
N increases.
- Describe and explain what trends (if any) you see in the efficiency
as N increases.
- Describe and explain what trends (if any) you see in the speedup as
P increases.
- Describe and explain what trends (if any) you see in the efficiency
as P increases.
Be sure to include any relevant details about the machine on which
you performed your experiments, so that your reader can understand
the context for them.
What to turn in
- Your source code in matrixop.h, matrixop.c, and
ptestmultiply.c
- A single PDF containing (merged)
- Your enscripted source code files
- A transcript of your program's compilation and test run(s)
- A unified writeup with plots (B.3) and analysis (B.4)
Copyright © 2012 Jerod
Weinman.
This work is licensed under a Creative
Commons Attribution-Noncommercial-Share Alike 3.0 United States License.
Footnotes:
1 See Foster, Efficiency
and Speedup, or Wikipedia, Speedup.