Determine some basic facts about your machine, like how much memory
it has
$ free -m
and how big its pages are
$ getconf PAGESIZE
Begin by browsing the starter code ...
matrix.h
This header file defines the type matrix_t. Note the fields
are simply an array (actually, a pointer to a memory address) *data,
and the dimensions of the matrix (rows and cols).
We will assume that the matrix is stored linearly in memory in row-major
order. This means that rows of the matrix are stored contiguously.
Thus, as we walk along the linear indices of the array, the column
index changes the fastest, while the row index changes slowest. For
instance, a 4×3 matrix has the following linear indices:
0
1
2
3
4
5
6
7
8
9
10
11
which means it is actually stored in memory as
0
1
2
3
4
5
6
7
8
9
10
11
The following macro converts row and column matrix subscript indices
to a single linear index of the array:
We could change the type of the data stored in the matrix at compile
time by changing the value defined by MTX_TYPE. To use plenty
of memory for our experiments, we'll use the unsigned long
type given here.
matrixop.c
This file implements one function, mtxAdd, that takes pointers
to two matrices (the parameters are type matrix_t*) and
stores their computed sum in a third matrix. There is also a helper
function, mtxCheckAddDim, that verifies the two matrices
have dimensions that are compatible for addition.
To compute this quickly, without doing any array indexing arithmetic,
pointers to the matrix data are used. Rather than addition, you will
write functions that compute and store matrix products. To maximize
efficiency, you will also be asked to use pointers to the matrix data.
matrixutil.h
You don't need to understand the implementation of these routines,
but the functions mtxAlloc and mtxRand will be useful.
You can read the comments in the header for descriptions.
testadd.c
A simple test program for the given matrix addition function mtxAdd.
Exercises
Part A: Multiplication and Locality
Read the Makefile, then make and run the testadd
program to verify its correctness.
Pseudocode for the matrix addition algorithm
in testadd.c looks like the following:
for i=0 to M-1
for j=0 to N-1
c[i,j] = a[i,j] + b[i,j]
Note that switching the order of the loops does not change the correctness
of the algorithm. Explain what would happen at the hardware level
(i.e., with respect to the TLB and memory cache) if the order of the
two loops were reversed. (Hint: Consider how the data is organized
in memory and how close in memory the algorithmically sequential reads
and writes are.)
Write similar pseudocode to calculate a matrix product so that the
sequential reads and writes are as contiguous as possible.
Explain which memory accesses (reads or writes) are contiguous and
why it is not possible for any other ordering to exhibit greater contiguity.
Write another matrix product pseudocode so
that the sequential reads and writes are as disparate as
possible. Explain which memory accesses are NOT contiguous and why
it is not possible for any other ordering to exhibit lesser contiguity.
Implement two functions in matrixop.c (and the header): mtxMultiplyMin
(for minimizing misses) and mtxMultiplyMax (for
maximizing misses), both of which should calculate the product of
two matrices based on the responses to your pseudocode above. Like
mtxAdd, both functions should use pointers for all
matrix data references (both reading and storing).
Tips
Keep the following things in mind as you plan your implementations:
Unless you and your partner are very sure of your pointer
programming (and debugging!) skills, you may want to write at least
one of these functions using array indices first, to make sure you've
got the right idea. (You can use the macro above to convert subscripts
to linear indices.) Once you've done that, it should be somewhat easier
to to write a version using pointers.
You do not need to initialize the pointers in the header of the for
loop; you can initialize some using your counter variables inside
loop bodies.
Write a small program to test your implementation, and use printf
liberally to verify you are accessing the values you mean to.
Part B: Performance Profiling
Write a program testmultiply.c that takes N (matrix
size) and method (min or max)
as arguments. Your program should create two N×N matrices
filled with random values (see the function mtxRand
in matrixutil.h) and multiply them using the given method.
Do not print the result.
Add this program to the Makefile.
Next, modify your program to perform the multiplication in a child
process and use the system call getrusage(2)
to query the resource usage caused by your matrix multiplication routine.
Because we don't want to include the set-up (memory allocation and
matrix initialization) in our time, use fork(2)
to create a child process that calls the appropriate multiplication
function (be sure any initialization is done in the parent process
before the fork). After the child performs the multiplication and
terminates, the parent process should use getrusage
to query the resource usage for the child. See the manual for timeradd(3)
to get details on the timeval struct used in the rusage
struct.
Your program should report:
The total user time (CPU time spent in user mode, in seconds)
The total system time (CPU time spent in kernel mode, in seconds)
The total CPU time (user + system time, in seconds)
tab separated on a single line. For example:
$ ./testmultiply 128 min
0.012 0 0.012
$ ./testmultiply 512 max
1.76011 0.004 1.76411
Note that the times are reported as integer second and microsecond
times. Use printf to print these these as decimal seconds.
Be sure to print a sufficient number of digits.
Tip:One nice way to print struct timeval
data is
printf("%3lu.%06lu",sec,usec);
Using both min and max options, run your program
for values of N = [128, 192, 256, 384, 512, 768, 1024, 1792].
Use each value of N three times. Collect the total times
and order them nicely in a text file or spreadsheet.
Note:N=1792 may take between 1-2 minutes.
Extra: If you implemented your functions efficiently
with pointers, you should be able to use N=2048 as an additional
data point. (My mtxMultiplyMax took about 4 minutes
for one run.)
If your program produces output on a single line, here is an example
of a (bash) shell command that will generate the file mtxMultiplyProfile
with lines giving the measurements in the format
N your-output
If your shell is something other than bash, you may need to modify
this slightly to meet the syntax of your shell. You can augment the
following command to include the matrix sizes you must measure times
for.
for n in 2 4 8 16 32 64
do
for ((t=0 ; t<3 ; t++))
do
echo -n $n >> mtxMultiplyProfile
./testmultiply $n min >> mtxMultiplyProfile
done
done
The leading spaces are not required; they are only helpful for visualization.
Use a spreadsheet (or another tool of your choice,
such as R, gnuplot, or MATLAB) to create the following
plots:
Matrix size N versus average total time for both
the min and max methods. Show the two plots on the same
axes with clear axis labels and a title.
Matrix size N versus the ratio of the paired
average times
Tmax(N)
Tmin(N)
for the same N. Title your plot and label your axes.
Analyze your results by answering the following
questions:
Does the expected O(N3) trend for matrix multiplication
seem to hold for the CPU times you found? Explain your answer.
Characterize and explain the difference (if any) between the two times
you plotted.
Be sure to include any relevant details about the machine on which
you performed your experiments, so that your reader can understand
the context for them.
What to turn in
Your source code in matrixop.h, matrixop.c, and
testmultiply.c
A single PDF containing (merged)
Your enscripted source code files
A transcript of your programs' compilations and test run(s)
How do you expect the results to change with other matrix types? In
particular, what happens as you keep the matrix size constant but
the data type gets smaller or bigger (i.e., short, int,long long)? Explore the performance relationships to one
another and explain your results.