Lab: Parallelized Image Restoration
CSC 213 - Operating Systems and Parallel Algorithms - Weinman
- Summary:
- You will devise a synchronization strategy for a parallel
multi-threaded implementation of an algorithm for cleaning up noisy
images.
- Assigned:
- Tuesday 30 October
- Due:
- This assignment will be due in two parts.
- Part A
- 5:15 PM Monday 10 November
- Parts B,C
- 11:30 PM Monday 17 November
- Objectives:
-
- Experience parallelizing sequential code in a simple, real application.
- Practice using Foster's PCAM framework.
- Develop a new semaphore pattern to refine synchronization skills.
- Improve analytical writing through revision.
- Reading:
- A Brief Introduction to Iterated
Conditional Modes (ICM), which provides background on the algorithm
we'll be using.
Preliminaries
- Do this laboratory on a MathLAN workstation in 3819 or 3815.
- Copy the starter files to your own directory:
-
$ cp -R ~weinman/public_html/courses/CSC213/2014F/labs/code/icm ~/somewhere
- Skim bitmap.h so you know what library routines are available
for a simple binary image type bm_t
- Read invert.c, which demonstrates the use of bm_t
for inverting an image (i.e., flipping all the bits).
- Read icm.c and be sure you understand how it works in comparison
to the reading.
- Compile the icm program using the Makefile.
-
$ make icm
- Investigate the contents of the data directory using an image
viewer such as eog. The file imageNN.pbm indicates
that NN% noise has been added to the original image.pbm.
- Run icm on just a few of these (2*5=10) noisy images and
inspect the output. You can also experiment with different
values of alpha and beta to obtain different restorations.
- Inspect the task partitioning and multi-threading in pinvert.c.
- Build pinvert using make. (Read the Makefile).
Exercises
Part A: Algorithm design
Now you will devise a parallel implementation of the ICM algorithm.
With sufficient processing power, each iteration could take constant
time, rather than time proportional to the size of the image.
Assume that your algorithm will run as threads on a shared memory
multiprocessor (e.g., a MathLAN workstation). There are several ways
to partition the problem into a parallel implementation (the example
in pinvert.c, which is likely similar to your prior parMtxMultiply,
is one partitioning), but your parallel implementation must meet the following requirements:
- You may create worker threads only once, at the outset of the
algorithm.
- The previous requirement will necessitate synchronization between
the threads. Mutex variables and semaphores are probably the most
straightforward way to do this, though you may propose alternative
mechanisms to the instructor, who may or may not approve them.
- The complexity of your synchronization strategy should not depend
on the number of threads. In other words, the number of semaphores
required should be constant.
Write a design paper that addresses the following:
- Explain what type of partitioning strategy is most appropriate:
domain or functional decomposition.
- Explain what communication is required, bearing in mind the
partitioning and the execution environment.
- Give a strategy for agglomeration and mapping. What
issues should be considered?
- Write a complete description of your parallelized algorithm in pseudocode,
with appropriate English explanations. Clearly describe what each
thread does and any shared data or synchronization constructs. Note:
You do not need to explain the ICM algorithm; just explain your parallelization.
- Explain and justify (i.e., prove) why your algorithm will exhibit
proper synchronization (threads wait for data they depend on) and
will not deadlock. You should explain what variables are used and
what the purpose of each is, as well as the interleaving properties
of the threads as they relate to the synchronizations. (For examples,
consult Downey, your prior
lab, and any feedback on it.)
Hint: One nice solution to this problem extends the
barrier pattern so that it is reusable;
in other words, the barrier can be used within a loop. The key is
to create, in addition to a turnstile that ensures all threads rendezvous
at the beginning of a section of code, a second turnstile that
synchronizes them at the end of the looped code. This prevents one
thread from "lapping" another by getting a full iteration ahead.
While the first "in" turnstile is unlocked only once all threads
have arrived (by incrementing and testing a shared counter), the second
"out" turnstile is unlocked only when all threads have completed
the section (by decrementing and testing the same counter).
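The hint above can be sketched in C with POSIX semaphores. This is only an illustrative sketch of the two-turnstile pattern; the type and function names (barrier_t, barrier_enter, barrier_exit) are our own, not part of the starter code, and you must still adapt the idea to your own design.

```c
#include <pthread.h>
#include <semaphore.h>

/* Illustrative reusable barrier for n threads (names are ours). */
typedef struct {
  int n;                /* number of participating threads        */
  int count;            /* threads arrived at the current phase   */
  pthread_mutex_t mutex;
  sem_t turnstile_in;   /* posted n times once all have arrived   */
  sem_t turnstile_out;  /* posted n times once all have finished  */
} barrier_t;

void barrier_init(barrier_t *b, int n) {
  b->n = n;
  b->count = 0;
  pthread_mutex_init(&b->mutex, NULL);
  sem_init(&b->turnstile_in, 0, 0);
  sem_init(&b->turnstile_out, 0, 0);
}

/* Phase 1: block until all n threads have arrived. */
void barrier_enter(barrier_t *b) {
  pthread_mutex_lock(&b->mutex);
  if (++b->count == b->n)                /* last arrival opens the way */
    for (int i = 0; i < b->n; i++)
      sem_post(&b->turnstile_in);
  pthread_mutex_unlock(&b->mutex);
  sem_wait(&b->turnstile_in);
}

/* Phase 2: block until all n threads have left the section,
   so no thread can "lap" the others into the next iteration. */
void barrier_exit(barrier_t *b) {
  pthread_mutex_lock(&b->mutex);
  if (--b->count == 0)                   /* last to finish opens the exit */
    for (int i = 0; i < b->n; i++)
      sem_post(&b->turnstile_out);
  pthread_mutex_unlock(&b->mutex);
  sem_wait(&b->turnstile_out);
}
```

Note that this design uses a fixed number of semaphores regardless of the thread count, satisfying the constant-complexity requirement above.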
Part B: Implementation
Begin the multi-threaded version (using POSIX threads) of your parallel
algorithm by copying the original icm.c
-
$ cp icm.c picm.c
modifying the existing functions, and introducing any new functions
necessary. Cite and attribute the source of your new file. As you
proceed, clearly separate which portions are exclusively yours, which
are faithful to the original, and which you have modified.
Compile the picm program using the Makefile (which
should already have a line for it).
-
$ make picm
Like icm, your program picm should accept the filenames
of an input and an output image. It should also take as a command-line
parameter the number of threads to create.
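A minimal sketch of that command-line handling, assuming the argument order input, output, thread count (the helper name, usage message, and error checks here are our own, not prescribed by the assignment):

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: validate argc and extract the thread count
   from argv[3], exiting with a usage message on bad input. */
int parse_thread_count(int argc, char *argv[]) {
  if (argc != 4) {
    fprintf(stderr, "Usage: %s input.pbm output.pbm num_threads\n", argv[0]);
    exit(1);
  }
  int n = atoi(argv[3]);      /* non-numeric input yields 0, caught below */
  if (n < 1) {
    fprintf(stderr, "%s: thread count must be a positive integer\n", argv[0]);
    exit(1);
  }
  return n;
}
```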
Preserve the wall clock timer that measures only algorithm time (disk
I/O dominates the overall time).
The parallel and single-threaded versions should produce the same results
on all images. To verify this, the diff(1) command
will tell you whether two files are identical. If you want to create
an image that shows you the difference between two images (rather
than just whether there is a difference), you can use the handy program
compare(1)
-
$ compare image1.pbm image2.pbm differences.png
which will overlay the differences as red pixels in the output image
differences.png on top of a matted version of image1.pbm.
For a simple binary output, name the output differences.pbm, which
will be black where pixels differ and white where they are the same.
Part C: Performance evaluation
Revise your design from Part A to reflect any changes you made and
any feedback you received.
Integrate your revised design with an evaluation that presents the
behavior of your algorithm, including the following elements.
- Setup
- Note the experimental setup, including the number of cores
on your machine, image size, algorithm parameters, etc.
- Experiment
- Record the times used by both the single- and multi-threaded
programs (with 1, 2, 3, and 4 threads) for one of the noisy palisade
images.
- Results
- Present the timing results in a nicely formatted table.
What to turn in
- Part A
- A single PDF of your algorithm design.
- Parts B and C
-
- Your source code in picm.c
- A single PDF containing (merged)
- Your enscripted source code
- A transcript of your program's compilation and test run(s), i.e.,
-
$ make clean icm picm ; rm out*.pbm
$ ./icm in.pbm out.pbm
$ ./picm in.pbm out1.pbm 1
$ ./picm in.pbm out2.pbm 2
$ ./picm in.pbm out3.pbm 3
$ ./picm in.pbm out4.pbm 4
$ diff out.pbm out1.pbm
$ diff out.pbm out2.pbm
$ diff out.pbm out3.pbm
$ diff out.pbm out4.pbm
- Your revised design from Part A with evaluation (i.e., setup and results)
Acknowledgments
Janet Davis contributed notable improvements to the text. The public
domain Palisade
image is by the Pearson Scott Foresman company. The Dilbert
image is Copyright 2008 Scott Adams, Inc.
Copyright © 2012, 2014 Jerod
Weinman.
This work is licensed under a Creative
Commons Attribution-Noncommercial-Share Alike 3.0 United States License.