Lab: Pyramids and Wavelets
CSC 262 - Computer Vision - Weinman
- Summary:
- You will use a Laplacian pyramid for image compression
and explore the steerable pyramid representation.
Deliverables
- The Matlab script used to make your comparisons and generate your
report
- (5 points) Laplacian pyramid image (A.4)
- (10 points) Histogram of Laplacian high-pass coefficients (B.4)
- (10 points) Observations of high-pass coefficient histogram (B.5)
- (10 points) Compression amount and observations (C.7)
- (10 points) Reconstructed original and compressed images (D.3)
- (10 points) Commentary on reconstructed compressed image (D.4)
- (10 points) RMS errors and commentary (D.5, D.6)
- (5 points) Steerable pyramid image and high-pass band (E.4, E.7)
- (5 points) Steerable pyramid high-pass band observations (E.7)
- (10 points) Steerable pyramid observations (E.8)
- (10 points) Professionalism of write-up
Extras
- (5 points) Steerable pyramid reconstruction prediction (Extra 1)
- (5 points) Steerable pyramid reconstruction image and analysis (Extra 3)
Preparation
Load the venerable cameraman image and convert it to doubles for processing.
Exercises
A. Building a Pyramid
The command
-
[pyrValues, pyrDims] = buildLpyr(img, height);
from the pyramid toolbox builds a Laplacian pyramid of image img having
height levels (including the low-pass band). The result pyrValues is a
single 1-D vector containing the entire image pyramid, and pyrDims is a
height×2 array giving the sizes of the images in the pyrValues vector.
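You can work out the expected contents of pyrDims by hand. The Python sketch below does that arithmetic. It assumes, as an approximation of what buildLpyr does, that each successive level halves both image dimensions and rounds up, and that the low-pass band is the last row. The function names are hypothetical and not part of the toolbox. The example uses a made-up 64×48 image rather than the cameraman, so the questions below are still yours to answer.

```python
import math

def lpyr_dims(rows, cols, height):
    """Hypothetical helper: (rows, cols) of each band, in pyrDims row order.

    Assumes every level halves both dimensions (rounding up) and that the
    final entry is the low-pass band.
    """
    dims = []
    r, c = rows, cols
    for _ in range(height):
        dims.append((r, c))
        r, c = math.ceil(r / 2), math.ceil(c / 2)
    return dims

def pyr_length(dims):
    """Total number of coefficients when all bands are stacked into one vector."""
    return sum(r * c for r, c in dims)

if __name__ == "__main__":
    d = lpyr_dims(64, 48, 3)
    print(d)              # [(64, 48), (32, 24), (16, 12)]
    print(pyr_length(d))  # 3072 + 768 + 192 = 4032
```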
- What should the values of the dimensions be for the cameraman image
when the pyramid height is 4? (Note that you will have to know the
size of the image.)
- How long should the pyramid vector be?
- Use buildLpyr to construct a 4-level Laplacian pyramid of
the cameraman image and verify your answers to the previous two questions.
-
The toolbox also features ways to extract the
image corresponding to an individual level from the pyramid, calculate
the indices into the pyramid vector, and display the entire pyramid.
The command
-
showLpyr(pyrValues, pyrDims)
renders the pyramid on the current figure. Use this to display your
Laplacian pyramid.
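The bands are stacked one after another in pyrValues, so you can locate any band with a running sum of the band sizes listed in pyrDims. The hypothetical Python function below sketches that arithmetic using MATLAB-style 1-based, inclusive indices. It is an illustration under that stacking assumption, not the toolbox's own implementation.

```python
def band_index_range(dims, band):
    """Hypothetical helper: where band `band` lives in the pyramid vector.

    dims : list of (rows, cols), in the same order as the rows of pyrDims
    band : 1-based band number (MATLAB convention)
    Returns (first, last) as 1-based inclusive indices.
    """
    if not 1 <= band <= len(dims):
        raise ValueError("band number out of range")
    # Number of coefficients stored before this band.
    offset = sum(r * c for r, c in dims[:band - 1])
    r, c = dims[band - 1]
    return offset + 1, offset + r * c

if __name__ == "__main__":
    dims = [(4, 4), (2, 2), (1, 1)]
    for b in range(1, len(dims) + 1):
        print(b, band_index_range(dims, b))  # (1, 16), (17, 20), (21, 21)
```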
B. Taking a Pyramid Apart
- Which row in the pyramid dimensions corresponds to the low-pass band?
How big is this image? How many pixels does it have?
-
The last N entries in the pyramid vector,
where N is the size (in pixels) of the low-pass band, correspond
to the low-pass band itself. Use this information to create two separate
vectors from your pyramid: one containing only the entries
in the low-pass band, and the other containing all of the rest. The
latter represents all of the high-pass bands of the various scales.
Hint: You should probably use length and add the lengths of the two
vectors to make sure you get back the length of the original
pyramid values vector.
- What do you expect the histogram of the high-pass bands' values to
look like?
-
Use hist to create a histogram of your
high-pass band vector. You will likely want to use more than the default
number of bins.
-
Does it have the shape you expect? Explain
why or why not.
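The bookkeeping in this section can be sketched in a few lines of Python. The sketch assumes, as described in B.2, that the low-pass band is the last row of the dimensions list and occupies the final rows×cols entries. The histogram helper is a hypothetical, simplified stand-in for MATLAB's hist: it uses equal-width bins between the minimum and maximum values, which approximates hist's default binning without reproducing it exactly.

```python
def split_lowpass(values, dims):
    """Split a flattened pyramid into (high_pass, low_pass) lists.

    Assumes the low-pass band is described by dims[-1] and is stored
    in the last rows*cols entries of values.
    """
    r, c = dims[-1]
    cut = len(values) - r * c
    high, low = list(values[:cut]), list(values[cut:])
    # Sanity check from the hint: the two pieces must add back up.
    assert len(high) + len(low) == len(values)
    return high, low

def histogram_counts(data, nbins):
    """Equal-width bin counts between min(data) and max(data)."""
    lo, hi = min(data), max(data)
    counts = [0] * nbins
    if hi == lo:
        counts[0] = len(data)  # all values identical: one occupied bin
        return counts
    width = (hi - lo) / nbins
    for x in data:
        # The maximum would fall one past the end, so clamp it into the last bin.
        k = min(int((x - lo) / width), nbins - 1)
        counts[k] += 1
    return counts

if __name__ == "__main__":
    high, low = split_lowpass(list(range(10)), [(2, 3), (2, 2)])
    print(high, low)                                  # [0..5] [6, 7, 8, 9]
    print(histogram_counts([0, 0, 0, 1, 2, 3, 4], 4))  # [3, 1, 1, 2]
```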
C. Compressing the Pyramid
In image compression, it is often useful to keep only the strongest
visual responses. This typically means ignoring small changes because
they are difficult to see. What does this mean for the Laplacian pyramid?
It means getting rid of coefficients that are sufficiently close to zero
by making them actually zero.
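The thresholding steps below can be sketched in Python as follows. One function picks the magnitude that lies a given fraction of the way through the sorted magnitudes, using a 1-based position round(fraction × n) as the hint suggests. The other zeroes every coefficient whose magnitude falls below that value. The function names are hypothetical. Comparing abs(c) rather than c is what handles positive and negative coefficients alike.

```python
import math

def magnitude_threshold(coeffs, discard_fraction):
    """Magnitude found discard_fraction of the way through the sorted magnitudes."""
    mags = sorted(abs(c) for c in coeffs)
    n = len(mags)
    # floor(x + 0.5) rounds halves up, like MATLAB's round for positive x;
    # clamp to a valid 1-based position.
    pos = min(max(math.floor(discard_fraction * n + 0.5), 1), n)
    return mags[pos - 1]

def zero_small(coeffs, threshold):
    """Set every coefficient whose magnitude is below threshold to exactly zero."""
    return [0.0 if abs(c) < threshold else c for c in coeffs]

if __name__ == "__main__":
    coeffs = [-5, 0.1, -0.2, 3, 0.05, 1, -0.5, 2, 0.3, -4]
    thr = magnitude_threshold(coeffs, 0.8)
    print(thr)                      # 3
    print(zero_small(coeffs, thr))  # only -5, 3 and -4 survive
```

With only ten values, the coefficient equal to the threshold survives, so slightly fewer than 80% are zeroed. With tens of thousands of coefficients this off-by-one effect is negligible.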
- The first step in eliminating values is figuring out what "sufficiently
close to zero" means. Thus, it is useful to order the pyramid coefficients
by their magnitude (absolute value). Use abs and sort
to get a sorted version of your high-pass band coefficients' magnitudes.
- What do these values look like when sorted? Use plot to investigate.
-
Let us discard 80% of our coefficients.
That means we need to find the value that is 80% of the way through
our sorted vector to use as a threshold. Find this threshold. What
is it?
Hint: The length and round commands may be
helpful.
- Use your threshold to set to zero any high-pass band coefficient whose
magnitude is less than the threshold.
Note: Think carefully! This will include both positive and
negative numbers.
Hint: You may wish to recall some fancy indexing work you did
in the first Matlab lab.
- Use the command whos to determine how many bytes your high-pass
band vector occupies. Note that this should be 8 bytes for every double.
- Matlab has a built-in sparse vector representation. Rather than storing
every value, it stores only the non-zero values with an index that
says where they are. Use the command
-
sparseVec = sparse(fullVec);
to transform your thresholded high-pass band vector into a sparse
version. (You should give it a new name rather than overwrite the
old value.)
-
Use the command whos to determine
how many bytes your new sparse vector occupies. How do the two compare?
As a ratio, how much space did you save? Is it as much as your method
of choosing a threshold would suggest? Why or why not?
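A rough storage model can help you interpret the numbers whos reports. The sketch below assumes a 64-bit MATLAB build, where a sparse double matrix stores each nonzero as an 8-byte value plus an 8-byte row index, along with (columns + 1) 8-byte column pointers. Treat these constants as an assumption to check against your own whos output. The function names are hypothetical.

```python
def dense_bytes(n):
    """Bytes for a full double vector: 8 bytes per element."""
    return 8 * n

def sparse_bytes(nnz, n_cols=1):
    """Estimated bytes for a sparse double matrix (64-bit build assumed).

    Each nonzero costs an 8-byte value plus an 8-byte row index, and the
    matrix keeps n_cols + 1 column pointers of 8 bytes each.
    """
    return 16 * nnz + 8 * (n_cols + 1)

if __name__ == "__main__":
    n = 1000
    kept = n // 5                      # keep 20% of the coefficients
    full = dense_bytes(n)              # 8000
    sp = sparse_bytes(kept)            # 3216 for a column vector
    print(full, sp, full / sp)
```

Under this model, orientation matters. A 1×n row vector has n columns, so its column-pointer term grows with n. An n×1 column vector pays that cost only once.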
D. Reconstructing the Pyramid
Of course, it is one thing to compress an image, but another to make
sure the compression preserves meaningful structure. We should probably
reconstruct the image from the pyramid to see how it compares. The
command
-
img = reconLpyr(pyrValues, pyrDims);
inverts the
buildLpyr operation by reconstructing the image
from the pyramid.
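A toy 1-D version can show why reconstruction from an unmodified pyramid is exact while reconstruction from a thresholded one is only approximate. The Python sketch below is a deliberately simplified Laplacian pyramid. It assumes pair-averaging for downsampling and sample repetition for upsampling, whereas buildLpyr and reconLpyr use smoother (binomial) filters. It therefore illustrates the structure (each band stores what the coarser level fails to predict), not the toolbox's actual numbers.

```python
def downsample(x):
    """Average adjacent pairs (len(x) must be even)."""
    return [(x[2 * i] + x[2 * i + 1]) / 2 for i in range(len(x) // 2)]

def upsample(y):
    """Nearest-neighbour expansion: repeat each sample twice."""
    out = []
    for v in y:
        out.extend([v, v])
    return out

def build_lpyr_1d(x, height):
    """Toy Laplacian pyramid; len(x) must be divisible by 2**(height - 1)."""
    bands, cur = [], list(x)
    for _ in range(height - 1):
        low = downsample(cur)
        # High-pass band: what the coarser level fails to predict.
        bands.append([c - u for c, u in zip(cur, upsample(low))])
        cur = low
    bands.append(cur)  # low-pass band stored last, as in buildLpyr
    return bands

def recon_lpyr_1d(bands):
    """Invert build_lpyr_1d: upsample the coarse level and add the detail back."""
    cur = bands[-1]
    for high in reversed(bands[:-1]):
        cur = [u + h for u, h in zip(upsample(cur), high)]
    return cur

if __name__ == "__main__":
    x = [1, 3, 2, 6, 4, 4, 0, 8]
    bands = build_lpyr_1d(x, 3)
    print(recon_lpyr_1d(bands))  # exact: [1.0, 3.0, 2.0, 6.0, 4.0, 4.0, 0.0, 8.0]
    # Zero the small high-pass coefficients but keep the low-pass band intact.
    small = [[0.0 if abs(v) < 2 else v for v in b] for b in bands[:-1]] + [bands[-1]]
    print(recon_lpyr_1d(small))  # approximate: [3.0, 3.0, 1.0, 5.0, 4.0, 4.0, 0.0, 8.0]
```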
- Take your thresholded high-pass band coefficients (the full version,
not the sparse version) and concatenate them with the low-pass band
vector you separated in B.2. (Be sure you put
the low-pass band at the end as in the original.)
- Use the command reconLpyr to reconstruct the image from your
compressed pyramid representation. As a baseline, you should also
reconstruct the image from the original, unmodified Laplacian pyramid
representation.
-
Display both images.
-
How does the reconstructed version look?
Is it reasonable? Where is the reconstruction strongest? Where does
it deviate most significantly from the original? Why?
-
Beyond our qualitative visual impressions,
it is also useful to have a quantitative measure of reconstruction.
One possible metric is called the root mean-square (RMS) error, so
called because it is the square root of the average squared difference
between a true value $x_i$ and a corresponding estimated value
$y_i$ over all N pixels:
$$\mathrm{RMS} = \left( \frac{1}{N} \sum_{i=1}^{N} (x_i - y_i)^2 \right)^{1/2}$$
Compute the RMS error between the original image and both reconstructed
versions (the thresholded and non-thresholded).
-
How do the RMS errors compare? How many 8-bit
gray levels does the average error correspond to? Using the results
from your image formation lab, how does this correspond to the average
and worst-case noise of our cameras? Does this seem tolerable?
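Here is the RMS formula as a short Python sketch, treating each image as a flat sequence of pixel values (in MATLAB you could flatten a 2-D image with img(:)). The gray-level conversion depends on how the image was converted to doubles. If im2double was used, intensities span 0 to 1, so 1.0 corresponds to gray level 255. If plain double was used, the values are already in gray levels. The function names and the full_scale parameter are hypothetical.

```python
import math

def rms_error(x, y):
    """Root mean-square difference between two equal-length pixel sequences."""
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("inputs must be non-empty and the same length")
    total = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.sqrt(total / len(x))

def error_in_gray_levels(err, full_scale=1.0):
    """Express an error in 8-bit gray levels.

    full_scale is the intensity that maps to gray level 255:
    1.0 after im2double, 255.0 after double.
    """
    return err * 255.0 / full_scale

if __name__ == "__main__":
    print(rms_error([0, 0], [3, -3]))   # 3.0
    print(error_in_gray_levels(0.02))   # about 5.1 gray levels
```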
E. Steerable Pyramids
We mentioned above that there are a few varieties of steerable pyramids.
Some use both even and odd basis functions (so that the resulting
coefficients are complex numbers), while another uses only even functions.
For simplicity, we will use the latter for now, avoiding complex numbers.
The command
-
[pyrValues, pyrDims] = buildSFpyr(img, height, order);
builds a steerable pyramid representation of height
levels (excluding the low-pass band), where order
is one less than the number of orientations used.
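The steerable pyramid's pyrDims has more rows than the Laplacian one because each level holds several oriented bands. The Python sketch below encodes one plausible layout, which you should verify against your own buildSFpyr output:

- a single non-oriented high-pass band at full size,
- then order + 1 oriented bands per level, starting at full size and halving (rounding up) after each level,
- and finally the low-pass band.

The helper name is hypothetical, and the example uses a small made-up image size.

```python
import math

def sfpyr_dims(rows, cols, height, order):
    """Hypothetical sketch of band sizes in a steerable pyramid (assumed layout)."""
    nbands = order + 1
    dims = [(rows, cols)]          # non-oriented high-pass band
    r, c = rows, cols
    for _ in range(height):
        dims.extend([(r, c)] * nbands)   # one band per orientation at this scale
        r, c = math.ceil(r / 2), math.ceil(c / 2)
    dims.append((r, c))            # low-pass band
    return dims

if __name__ == "__main__":
    d = sfpyr_dims(64, 64, 2, 1)
    print(len(d), d)
    print(sum(r * c for r, c in d))  # 3*4096 + 2*1024 + 256 = 14592
```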
- What should the values of the dimensions be for the cameraman image
when the pyramid height is 3? (Note that you will have to know the
size of the image.)
- How long should the pyramid values vector be?
- Use buildSFpyr to build a 3-level steerable pyramid with
4 orientations from the cameraman and verify your answers to the previous
two questions.
-
The command
-
showSpyr(pyrValues, pyrDims)
renders the pyramid on the current figure. Use this to display your
steerable pyramid. Note that it omits one band.
- You can retrieve the image for a particular band number (as ordered
in the pyramid's pyrDims matrix) using the command
-
bandImage = pyrBand(pyrValues, pyrDims, bandNum);
Use this to extract the (non-oriented) high-pass band that is missing
from the original display.
- What do you expect it to look like?
-
Display and inspect the high-pass band image
for completeness. Does it meet your expectations? Why or why not?
- Describe the filter responses in your steerable pyramid. What structures
stand out for each
orientation? Where are the
fine scale responses strongest? Where are the coarse scale responses
strongest?
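Extracting a band as an image means slicing it out of the flat vector and reshaping it. MATLAB stores arrays in column-major order, so consecutive entries fill down a column before moving to the next one. The hypothetical Python helper below sketches that process. It assumes the bands are stacked in pyrDims order and that each band was flattened column-major. It is not the toolbox's pyrBand.

```python
def extract_band(values, dims, band):
    """Hypothetical pyrBand-like helper: return band `band` (1-based) as a list of rows."""
    # 0-based start of the band: total size of all earlier bands.
    start = sum(r * c for r, c in dims[:band - 1])
    rows, cols = dims[band - 1]
    flat = values[start:start + rows * cols]
    # Column-major: entry (i, j) sits at position i + j*rows within the band.
    return [[flat[i + j * rows] for j in range(cols)] for i in range(rows)]

if __name__ == "__main__":
    values = list(range(8))
    dims = [(2, 3), (1, 2)]
    print(extract_band(values, dims, 1))  # [[0, 2, 4], [1, 3, 5]]
    print(extract_band(values, dims, 2))  # [[6, 7]]
```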
Extras: Pyramid Tweaks
-
Suppose you zeroed out the coefficients
of all the horizontal and vertical oriented bands. What do you expect
the image to look like? Why?
- You can extract the specific indices for a band using the command
-
bandIndices = pyrBandIndices(pyrDims, bandNum)
Use this to set the pyramid values for the horizontally- and vertically-oriented
bands at all scales to zero.
-
Display the resulting reconstructed image
(use reconSFpyr). In what ways does it match or not match your
prediction? Explain the results you do see (i.e., how it differs
from the original).
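To zero one orientation at every scale, you need its band numbers. The sketch below assumes a steerable pyramid layout in which band 1 is the non-oriented high-pass band, followed by order + 1 oriented bands per level, always in the same orientation order, with the low-pass band last. Under that assumption, orientation k at level s is band 1 + (s − 1)(order + 1) + k. Which index k is horizontal or vertical is something to confirm from your showSpyr display, not from this sketch. The helper names are hypothetical.

```python
def orientation_bands(height, order, k):
    """1-based band numbers for orientation k (1..order+1) at every level."""
    nbands = order + 1
    if not 1 <= k <= nbands:
        raise ValueError("orientation index out of range")
    # Band 1 is the high-pass band, so level s begins after 1 + (s-1)*nbands bands.
    return [1 + (s - 1) * nbands + k for s in range(1, height + 1)]

def zero_bands(values, dims, bands):
    """Return a copy of the pyramid vector with the listed bands set to zero."""
    out = list(values)
    for b in bands:
        start = sum(r * c for r, c in dims[:b - 1])  # 0-based start of band b
        r, c = dims[b - 1]
        for i in range(start, start + r * c):
            out[i] = 0.0
    return out

if __name__ == "__main__":
    print(orientation_bands(3, 3, 1))  # [2, 6, 10]
    print(orientation_bands(3, 3, 3))  # [4, 8, 12]
```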
Acknowledgments
We gratefully acknowledge use of Eero Simoncelli's Steerable Pyramid tools for Matlab.