Lab: Parallelized Image Restoration
CSC 213 - Operating Systems and Parallel Algorithms - Weinman
- Summary:
- You will devise a synchronization strategy for a parallel
multi-threaded implementation of an algorithm for cleaning up noisy
images.
- Assigned:
- Tuesday 30 October
- Due:
- This assignment will be due in two parts.
- Part A
- 5:15 PM Monday 10 November
- Parts B,C
- 11:30 PM Monday 17 November
- Objectives:
-
- Experience parallelizing sequential code in a simple, real application.
- Practice using Foster's PCAM framework.
- Develop a new semaphore pattern to refine synchronization skills.
- Improve analytical writing through revision.
- Reading:
- A Brief Introduction to Iterated
Conditional Modes (ICM), which provides background on the algorithm
we'll be using.
Preliminaries
- Do this laboratory on a MathLAN workstation in 3819 or 3815.
- Copy the starter files to your own directory:
-
$ cp -R ~weinman/public_html/courses/CSC213/2014F/labs/code/icm ~/somewhere
- Skim bitmap.h so you know what library routines are available
for a simple binary image type bm_t
- Read invert.c, which demonstrates the use of bm_t
for inverting an image (i.e., flipping all the bits).
- Read icm.c and be sure you understand how it works in comparison
to the reading.
- Compile the icm program using the Makefile.
-
$ make icm
- Investigate the contents of the data directory using an image
viewer such as eog. The file imageNN.pbm indicates
that NN% noise has been added to the original image.pbm.
- Run icm on just a few of these (2*5=10) noisy images and
inspect the output. You can also experiment with different
values of alpha and beta to obtain different restorations.
- Inspect the task partitioning and multi-threading in pinvert.c.
- Build pinvert using make. (Read the Makefile).
Exercises
Part A: Algorithm design
Now you will devise a parallel implementation of the ICM algorithm.
With sufficient processing power, each iteration could take constant
time, rather than time proportional to the size of the image.
Assume that your algorithm will run as threads on a shared memory
multiprocessor (e.g., a MathLAN workstation). There are several ways
to partition the problem into a parallel implementation (the example
in pinvert.c, which is likely similar to your prior parMtxMultiply,
is one partitioning), but your parallel implementation must meet the following requirements:
- You may create worker threads only once, at the outset of the
algorithm.
- The previous requirement will necessitate synchronization between
the threads. Mutex variables and semaphores are probably the most
straightforward way to do this, though you may propose alternative
mechanisms to the instructor, who may or may not approve them.
- The complexity of your synchronization strategy should not depend
on the number of threads. In other words, the number of semaphores
required should be constant.
Write a design paper that addresses the following:
- Explain what type of partitioning strategy is most appropriate:
domain or functional decomposition.
- Explain what communication is required, bearing in mind the
partitioning and the execution environment.
- Give a strategy for agglomeration and mapping. What
issues should be considered?
- Write a complete description of your parallelized algorithm in pseudocode,
with appropriate English explanations. Clearly describe what each
thread does and any shared data or synchronization constructs. Note:
You do not need to explain the ICM algorithm; just explain your parallelization.
- Explain and justify (i.e., prove) why your algorithm will exhibit
proper synchronization (threads wait for data they depend on) and
will not deadlock. You should explain what variables are used and
what the purpose of each is, as well as the interleaving properties
of the threads as they relate to the synchronizations. (For examples,
consult Downey, your prior
lab, and any feedback on it.)
Hint: One nice solution to this problem extends the
barrier pattern so that it is reusable;
in other words, the barrier can be used within a loop. The key is
to create, in addition to a turnstile that ensures all threads rendezvous
at the beginning of a section of code, a second turnstile that
synchronizes them at the end of the looped code. This prevents one
thread from "lapping" another by getting a full iteration ahead.
While the first "in" turnstile is unlocked only once all threads
have arrived (by incrementing and testing a shared counter), the second
"out" turnstile is unlocked only when all threads have completed
the section (by decrementing and testing the same counter).
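The hint above can be sketched in C with POSIX semaphores. This is only an illustrative sketch of the two-turnstile pattern; the type and function names (barrier_t, barrier_enter, barrier_exit) are our own, not part of the starter code, and you must still adapt the idea to your own design.

```c
#include <pthread.h>
#include <semaphore.h>

/* Illustrative reusable barrier for n threads (names are ours). */
typedef struct {
  int n;                /* number of participating threads        */
  int count;            /* threads arrived at the current phase   */
  pthread_mutex_t mutex;
  sem_t turnstile_in;   /* posted n times once all have arrived   */
  sem_t turnstile_out;  /* posted n times once all have finished  */
} barrier_t;

void barrier_init(barrier_t *b, int n) {
  b->n = n;
  b->count = 0;
  pthread_mutex_init(&b->mutex, NULL);
  sem_init(&b->turnstile_in, 0, 0);
  sem_init(&b->turnstile_out, 0, 0);
}

/* Phase 1: block until all n threads have arrived. */
void barrier_enter(barrier_t *b) {
  pthread_mutex_lock(&b->mutex);
  if (++b->count == b->n)                /* last arrival opens the way */
    for (int i = 0; i < b->n; i++)
      sem_post(&b->turnstile_in);
  pthread_mutex_unlock(&b->mutex);
  sem_wait(&b->turnstile_in);
}

/* Phase 2: block until all n threads have left the section,
   so no thread can "lap" the others into the next iteration. */
void barrier_exit(barrier_t *b) {
  pthread_mutex_lock(&b->mutex);
  if (--b->count == 0)                   /* last to finish opens the exit */
    for (int i = 0; i < b->n; i++)
      sem_post(&b->turnstile_out);
  pthread_mutex_unlock(&b->mutex);
  sem_wait(&b->turnstile_out);
}
```

Note that this design uses a fixed number of semaphores regardless of the thread count, satisfying the constant-complexity requirement above.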
Part B: Implementation
Begin the multi-threaded version (using POSIX threads) of your parallel
algorithm by copying the original icm.c
-
$ cp icm.c picm.c
modifying the existing functions, and introducing any new functions
necessary. Cite and attribute the source of your new file. As you
proceed, clearly separate which portions are exclusively yours, which
are faithful to the original, and which you have modified.
Compile the picm program using the Makefile (which
should already have a line for it).
-
$ make picm
Like icm, your program picm should accept the filenames
of an input and an output image. It should also take as a command-line
parameter the number of threads to create.
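A minimal sketch of that command-line handling, assuming the argument order input, output, thread count (the helper name, usage message, and error checks here are our own, not prescribed by the assignment):

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: validate argc and extract the thread count
   from argv[3], exiting with a usage message on bad input. */
int parse_thread_count(int argc, char *argv[]) {
  if (argc != 4) {
    fprintf(stderr, "Usage: %s input.pbm output.pbm num_threads\n", argv[0]);
    exit(1);
  }
  int n = atoi(argv[3]);      /* non-numeric input yields 0, caught below */
  if (n < 1) {
    fprintf(stderr, "%s: thread count must be a positive integer\n", argv[0]);
    exit(1);
  }
  return n;
}
```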
Preserve the wall clock timer that measures only algorithm time (disk
I/O dominates the overall time).
The parallel and single-threaded versions should produce the same results
on all images. To verify this, the diff(1) command
will tell you whether two files are identical. If you want to create
an image that shows you the difference between two images (rather
than just whether there is a difference), you can use the handy program
compare(1)
-
$ compare image1.pbm image2.pbm differences.png
which will overlay the differences as red pixels in the output image
differences.png on top of a matted version of image1.pbm.
For a simple binary output, name the output differences.pbm, which
will be black where pixels differ and white where they are the same.
Part C: Performance evaluation
Revise your design from Part A to reflect any changes you made and
any feedback you received.
Integrate your revised design with an evaluation that presents the
behavior of your algorithm, including the following elements.
- Setup
- Note the experimental setup, including the number of cores
on your machine, image size, algorithm parameters, etc.
- Experiment
- Record the times used by both the single- and multi-threaded
programs (with 1, 2, 3, and 4 threads) for one of the noisy palisade
images.
- Results
- Present the timing results in a nicely formatted table.
What to turn in
- Part A
- A single PDF of your algorithm design.
- Parts B and C
-
- Your source code in picm.c
- A single PDF containing (merged)
- Your enscripted source code
- A transcript of your program's compilation and test run(s), i.e.,
-
$ make clean icm picm ; rm out*.pbm
$ ./icm in.pbm out.pbm
$ ./picm in.pbm out1.pbm 1
$ ./picm in.pbm out2.pbm 2
$ ./picm in.pbm out3.pbm 3
$ ./picm in.pbm out4.pbm 4
$ diff out.pbm out1.pbm
$ diff out.pbm out2.pbm
$ diff out.pbm out3.pbm
$ diff out.pbm out4.pbm
- Your revised design from Part A with evaluation (i.e., setup and results)
Acknowledgments
Janet Davis contributed notable improvements to the text. The public
domain Palisade
image is by the Pearson Scott Foresman company. The Dilbert
image is Copyright 2008 Scott Adams, Inc.
Copyright © 2012, 2014 Jerod
Weinman.
This work is licensed under a Creative
Commons Attribution-Noncommercial-Share Alike 3.0 United States License.