CSC 213, Fall 2008 : Schedule : Lab 11
Goals: You will devise a synchronization strategy for a parallel multi-threaded implementation of an algorithm for cleaning up noisy images.
Background reading:
Collaboration: You will complete this lab in teams of size 1-3 of your choice.
Overview: You will explore a single-threaded version of an image restoration algorithm and try it on some noisy pictures. Then, you will devise a multi-threaded version of the algorithm that requires synchronization between the threads.
Copy the two sets of images (dilbert* and palisade*) from here. The file imageNN.pbm indicates that NN% noise has been added to the original image.pbm
Run icm on each of these (2*5=10) noisy images and inspect the output results using an image viewer.
Now you will devise a parallel implementation of the algorithm. The introduction notes that, for a given iteration, all updates can be made synchronously. Therefore, with sufficient processing power, each iteration could be O(1), rather than O(N), the size of the image.
There are several ways to partition the problem into a parallel implementation (the example in pinvert.c is one), but there are a few requirements:
pinvert.c), as it is much simpler.
Explain what type of partioning strategy is most appropriate: data or functional.
Write a complete description of a parallel version of the algorithm (in English or pseudocode). Make sure it is clear what each thread does and be sure to describe any shared data and the synchronization technique.
Note: You do not need provide details of the ICM algorithm itself. You need only describe when an ICM update happens and which portion(s) of the image is affected.
(Hint: One nice solution to this problem extends the "barrier" pattern so that it is reusable. In other words, it can be used within a loop. The key is to create, in addition to a turnstile that ensures all threads rendezvous at the beginning of a section of code, another turnstile that syncs them at the end of the looped code. This prevents one thread from "lapping" another by getting ahead one iteration. While the first "in" turnstile is only unlocked once all threads have arrived (by incrementing and checking a variable), the second "out" turnstile is only unlocked when all threads have completed (by decrementing and checking the same variable.)
Implement a multi-threaded version of your parallel algorithm picm using POSIX threads. Like icm this program should also accept an input and output image. The parallel and single-thread versions should produce the same results on all images. (The command diff(1) will tell you whether two files are the same.)
How many cores does your machine have?
How does the real time compare between the single threaded and multi-threaded version when the number of threads is no more than the number of cores? More than?
How does the user time compare between the single threaded and multi-threaded version when the number of threads is no more than the number of cores?
What does this tell you about the meaning of the user time?
What happens to all three times as the number of threads grows much larger than the number of cores?
What is the relative change of the three time categories? Does one category change proportionally more than another? Explain.
Based on this, why might it be important to tune your algorithm/implementation to the computer it is being run on?