CSC 213, Fall 2008 : Schedule : Lab 11

Lab 11: Threads and Synchronization for Image Restoration

Goals: You will devise a synchronization strategy for a parallel multi-threaded implementation of an algorithm for cleaning up noisy images.

Background reading:

A Brief Introduction to Iterated Conditional Modes (ICM), which provides background on the algorithm we'll be using.
pbmio.h / pbmio.c, which define a simple binary image type (Portable Bit Map), and provides routines for reading, writing, copying, and allocating.
The invert.c and pinvert.c programs, demonstrating use of the PBM type and parallel (multi-threaded with POSIX threads) version.
The single-threaded ICM algorithm icm.c.

Collaboration: You will complete this lab in teams of size 1-3 of your choice.

Overview: You will explore a single-threaded version of an image restoration algorithm and try it on some noisy pictures. Then, you will devise a multi-threaded version of the algorithm that requires synchronization between the threads.

Part A

Copy the library routines pbmio.h / pbmio.c so that you can use these routines to help you load and copy images.
Copy the ICM implementation icm.c and its Makefile to your directory. Read and compile the code, being sure that you understand it.
Copy the two sets of images (dilbert* and palisade*) from here. The file imageNN.pbm indicates that NN% noise has been added to the original image.pbm

Run icm on each of these (2*5=10) noisy images and inspect the output results using an image viewer.
Make observations and comment briefly on your results. For instance, how does the performance change with the level of noise? If you varied α and β, what impact does this have on the output?
Now you will devise a parallel implementation of the algorithm. The introduction notes that, for a given iteration, all updates can be made synchronously. Therefore, with sufficient processing power, each iteration could be O(1), rather than O(N), the size of the image.

There are several ways to partition the problem into a parallel implementation (the example in pinvert.c is one), but there are a few requirements:
- You may create the worker threads only once, at the outset of the algorithm.
- The previous requirement will necessitate synchronization between the threads. Semaphores are probably the most straightforward way to do this, though you may propose alternative modes to the instructor, who may or may not approve them.
- The complexity of your synchronization strategy should not depend on the number of threads. In other words, the number of semaphores required should be constant.
You are encouraged to use a 1D decomposition (as done in pinvert.c), as it is much simpler.
Explain what type of partioning strategy is most appropriate: data or functional.
Write a complete description of a parallel version of the algorithm (in English or pseudocode). Make sure it is clear what each thread does and be sure to describe any shared data and the synchronization technique.

Note: You do not need provide details of the ICM algorithm itself. You need only describe when an ICM update happens and which portion(s) of the image is affected.

(Hint: One nice solution to this problem extends the "barrier" pattern so that it is reusable. In other words, it can be used within a loop. The key is to create, in addition to a turnstile that ensures all threads rendezvous at the beginning of a section of code, another turnstile that syncs them at the end of the looped code. This prevents one thread from "lapping" another by getting ahead one iteration. While the first "in" turnstile is only unlocked once all threads have arrived (by incrementing and checking a variable), the second "out" turnstile is only unlocked when all threads have completed (by decrementing and checking the same variable.)
Give an informal "proof" explaining why your algorithm will exhibit proper synchronization (threads must wait for updates to data they depend on) and will not deadlock.

Part B

Implement a multi-threaded version of your parallel algorithm picm using POSIX threads. Like icm this program should also accept an input and output image. The parallel and single-thread versions should produce the same results on all images. (The command diff(1) will tell you whether two files are the same.)
Use the command time(1) to measure the time usage of both the single and multi-thread programs (with 1, 2, 4, 16, and 512 threads) for one of the palisade images. (Type man 1 time and be sure you understand the output of the command.) You should run this on a multicore machine (those in the OS lab are). Report these times and answer the following questions.
1. How many cores does your machine have?
2. How does the real time compare between the single threaded and multi-threaded version when the number of threads is no more than the number of cores? More than?
3. How does the user time compare between the single threaded and multi-threaded version when the number of threads is no more than the number of cores?
4. What does this tell you about the meaning of the user time?
5. What happens to all three times as the number of threads grows much larger than the number of cores?
6. What is the relative change of the three time categories? Does one category change proportionally more than another? Explain.
7. Based on this, why might it be important to tune your algorithm/implementation to the computer it is being run on?

Work To Be Turned In

Due Friday, 11/21

Answers to questions A.4, A.6, A.7, and A.8.

Due Wed, 11/26

Code and test runs for part B.1.
Answers to question B.2.

Jerod Weinman

Created on June 12, 2008
Last modified August 13, 2008