Load the venerable cameraman image and convert it to doubles for processing.
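A minimal sketch of this step, assuming the image is available as cameraman.tif (the file name is an assumption; it ships with the Image Processing Toolbox). The random fallback is only a hypothetical stand-in so that the snippet runs without that file:

```matlab
% Load the image if the file is on the path; the file name is an assumption.
if exist('cameraman.tif', 'file')
    im = double(imread('cameraman.tif'));   % uint8 -> double
else
    im = 255 * rand(256, 256);              % hypothetical stand-in image
end
disp(size(im));                             % rows and columns of the image
```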
Exercises
A. Building a Pyramid
The command
[PYR, INDICES] = buildLpyr(IM, HEIGHT);
from the pyramid toolbox builds a Laplacian pyramid of image IM
having HEIGHT levels (including the low-pass band).
The result PYR is a single 1-D vector containing the
entire image pyramid, and INDICES is a HEIGHT×2
array giving the sizes of the images in the PYR vector.
What should the value of the indices be for the cameraman image when
the pyramid height is 4? (Note that you will have to know the size
of the image.)
How long should the pyramid vector be?
Use buildLpyr to construct a 4-level Laplacian pyramid of the
cameraman image and verify your answers to the previous two questions.
Note: You do not need to use all-caps variable names. This
is a common style for documentation only, not actual Matlab
code.
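The size bookkeeping behind the first two questions can be sketched without the toolbox. This is only an illustration under two assumptions: each level halves both dimensions (rounding up), and the image is a hypothetical 64×64 one rather than the cameraman. Compare it against what buildLpyr actually returns.

```matlab
% Hypothetical image size and pyramid height (not the cameraman values).
imSize = [64 64];
height = 4;

% Assumption: each successive level is half the size of the previous one.
indices = zeros(height, 2);
sz = imSize;
for k = 1:height
    indices(k, :) = sz;      % size of the band stored at level k
    sz = ceil(sz / 2);       % next level is subsampled by two
end

% The pyramid vector holds every pixel of every band, back to back.
vecLen = sum(prod(indices, 2));
disp(indices);
disp(vecLen);
```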
The toolbox also features ways to extract the image corresponding
to an individual level of the pyramid, calculate the indices into
the pyramid vector, and display the entire pyramid. The command
showLpyr(PYR, INDICES)
renders the pyramid on the current figure. Use this to display your
Laplacian pyramid.
Use print to save the rendered version
of your pyramid.
B. Taking a Pyramid Apart
Which row in the pyramid indices corresponds to the low-pass band?
How big is this image? How many pixels does it have?
The last N entries in the pyramid vector,
where N is the size (in pixels) of the low-pass band, correspond
to the low-pass band itself. Use this information to create two separate
vectors from your pyramid: one containing only the entries in the
low-pass band, and the other containing all of the rest. The latter
represents all of the high-pass bands at the various scales.
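As a rough sketch of the split, here is the indexing on a toy vector. The vector pyr and the count nLow are hypothetical stand-ins; for your pyramid, nLow comes from your answer to the previous question.

```matlab
% Toy stand-in for a pyramid vector: 16 high-pass entries, then 4 low-pass.
pyr  = (1:20)';
nLow = 4;                              % assumed pixel count of the low-pass band

lowBand   = pyr(end-nLow+1:end);       % the last nLow entries
highBands = pyr(1:end-nLow);           % everything before them

disp(numel(lowBand));
disp(numel(highBands));
```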
What do you expect the histogram of the high-pass bands' values to
look like?
Use hist to create a histogram of your high-pass band vector.
You will likely want to use more than the default number of bins.
Save your histogram using print. Does
it have the shape you expect? Explain why or why not.
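A small sketch of calling hist with an explicit bin count. The data here are arbitrary stand-in values, not pyramid coefficients, and 100 bins is just one possible choice.

```matlab
% Arbitrary stand-in data; replace with your high-pass band vector.
highBands = sin(1:1000)';
nBins = 100;                           % more than the default of 10 bins

% With output arguments, hist returns the counts instead of plotting.
[counts, centers] = hist(highBands, nBins);

% Calling it without outputs draws the histogram on the current figure:
% hist(highBands, nBins);
disp(sum(counts));                     % every value lands in exactly one bin
```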
C. Compressing the Pyramid
In image compression, it is often useful to keep only the strongest
visual responses. This typically means ignoring small changes because
they are difficult to see. For the Laplacian pyramid, this means
getting rid of values that are sufficiently close to zero by making
them actually zero.
The first step in eliminating values is figuring out what "sufficiently
close to zero" means. Thus, it is useful to order the pyramid coefficients
by their magnitude (absolute value). Use abs and sort
to get a sorted version of your high-pass band coefficients' magnitude.
What do these values look like when sorted? Use plot to investigate.
Let us discard 80% of our coefficients.
That means we need to find the value that is 80% of the way through
our sorted vector to use as a threshold. Find this threshold. What
is it?
Hint: The length and round commands may be
helpful.
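One way the hint might be used, shown on a ten-element toy vector (hypothetical values, not real coefficients). Rounding the index is only one reasonable convention for picking the entry 80% of the way through.

```matlab
% Toy coefficients, both positive and negative.
v = [-5 3 0.5 -0.2 8 -1 2 0.1 -4 6]';

mags = sort(abs(v));                   % magnitudes in ascending order
idx  = round(0.8 * length(mags));      % position 80% of the way through
thresh = mags(idx);                    % magnitude found at that position

disp(thresh);
```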
Use your threshold to set to zero any high-pass band coefficient whose
magnitude is less than the threshold.
Note: Think carefully! This will include both positive and
negative numbers.
Hint: You may wish to recall some fancy indexing work you did
in the first Matlab lab.
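A sketch of the logical-indexing idea on the same kind of toy data. The vector and the threshold value are hypothetical; the point is that the test is applied to the magnitude, so both signs are handled at once.

```matlab
% Toy coefficients and an assumed threshold.
v = [-5 3 0.5 -0.2 8 -1 2 0.1 -4 6]';
thresh = 5;

vThresh = v;                           % keep the original untouched
smallMask = abs(vThresh) < thresh;     % true where the magnitude is small
vThresh(smallMask) = 0;                % zero those entries in one step

disp(vThresh');
```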
Use the command whos to determine how many bytes your high-pass
band vector occupies. Note that this should be 8 bytes for every double.
Matlab has a built-in sparse vector representation. Rather than storing
every value, it stores only the non-zero values with an index that
says where they are. Use the command
sparseVec = sparse(fullVec);
to transform your thresholded high-pass band vector into a sparse
version. (You should give it a new name.)
Use the command whos to determine
how many bytes your new sparse vector occupies. How do the two compare?
As a ratio, how much space did you save? Is it as much as your method
of choosing a threshold would suggest? Why or why not?
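If you prefer to read the sizes programmatically rather than from the printed table, whos can also return a struct with a bytes field. A minimal sketch with a made-up vector (the variable names and values are illustrative only):

```matlab
% Made-up, mostly-zero vector standing in for the thresholded coefficients.
fullVec = zeros(1000, 1);
fullVec([10 200 999]) = [4.5 -7 12];

sparseVec = sparse(fullVec);           % stores only the non-zeros plus indices

fullInfo   = whos('fullVec');          % struct describing the variable
sparseInfo = whos('sparseVec');
ratio = fullInfo.bytes / sparseInfo.bytes;

fprintf('full: %d bytes, sparse: %d bytes, ratio: %.1f\n', ...
        fullInfo.bytes, sparseInfo.bytes, ratio);
```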
D. Reconstructing the Pyramid
Of course, it is one thing to compress an image, but another to make
sure the compression preserves meaningful structure. We should probably
reconstruct the image from the pyramid to see how it compares. The
command
IM = reconLpyr(PYR, INDICES);
inverts the buildLpyr operation by reconstructing the image
from the pyramid.
Take your thresholded high-pass band coefficients (the full version,
not the sparse version) and concatenate them with the low-pass band
vector you separated in B.2. (Be sure you put
the low-pass band at the end as in the original.)
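The concatenation itself can be sketched on toy column vectors (hypothetical values). The result should have the same length and ordering as the original pyramid vector, so that the unchanged INDICES still describe it.

```matlab
% Toy stand-ins: 6 thresholded high-pass entries and 2 low-pass entries.
highThresh = [0 -7 0 0 9 0]';
lowBand    = [120 130]';

% Vertical concatenation, with the low-pass band last as in the original.
pyrThresh = [highThresh; lowBand];

disp(numel(pyrThresh));
```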
Use the command reconLpyr to reconstruct the image from your
compressed pyramid representation. As a baseline, you should also
reconstruct the image from the original, unmodified Laplacian pyramid
representation.
Display and save both images.
How does the reconstructed version look?
Is it reasonable? Where is the reconstruction strongest? Where does
it deviate most significantly from the original? Why?
Beyond our qualitative visual impressions,
it is also useful to have a quantitative measure of reconstruction.
One possible metric is called the root mean-square (RMS) error, so
called because it is the square root of the average squared difference
between a true value $x_i$ and a corresponding estimated value
$y_i$:
$$\mathrm{RMS} = \left( \frac{1}{N} \sum_{i=1}^{N} (x_i - y_i)^2 \right)^{1/2}$$
Compute the RMS error between the original image and both reconstructed
versions (the thresholded and non-thresholded).
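A small worked instance of the RMS formula, using two made-up 2×2 "images" so the arithmetic can be checked by hand (squared differences 4, 0, 4, 16; mean 6):

```matlab
% Made-up true and estimated values.
x = [10 20; 30 40];
y = [12 18; 30 44];

d = x(:) - y(:);                       % per-pixel differences as one column
rmsErr = sqrt(mean(d .^ 2));           % root of the mean squared difference

disp(rmsErr);
```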
How do the RMS errors compare? How many 8-bit
gray levels does the average error correspond to? Using the results
from your image formation lab, how does this correspond to the average
and worst-case noise of our cameras? Does this seem tolerable?
E. Steerable Pyramids
We mentioned above that there are a few varieties of steerable pyramids.
Some use both even and odd basis functions (so that the resulting
coefficients are complex numbers), while another uses only even functions.
For simplicity, we will use the latter for now, avoiding complex numbers.
The command
[PYR, INDICES] = buildSFpyr(IM, HEIGHT, ORDER);
builds a steerable pyramid representation with HEIGHT
levels (excluding the low-pass band), where ORDER
is one less than the number of orientations used.
What should the value of the indices be for the cameraman image when
the pyramid height is 3? (Note that you will have to know the size
of the image.)
How long should the pyramid vector be?
Use buildSFpyr to build a 3-level steerable pyramid with
4 orientations from the cameraman and verify your answers to the previous
two questions.
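The bookkeeping can again be sketched without the toolbox. The layout below is an assumption to check against the INDICES that buildSFpyr returns: one non-oriented high-pass band at full size, then ORDER+1 oriented bands per level with the size halving from level to level, then a low-pass band at half the size of the coarsest level. The image size, height, and order are hypothetical and deliberately differ from the exercise.

```matlab
% Hypothetical parameters (not the exercise values).
imSize = [64 64];
height = 2;
order  = 1;
nOri   = order + 1;                    % number of orientations

indices = imSize;                      % assumed high-pass band, full size
sz = imSize;
for lev = 1:height
    indices = [indices; repmat(sz, nOri, 1)];   % oriented bands at this level
    sz = ceil(sz / 2);                          % next level is half the size
end
indices = [indices; sz];               % assumed low-pass band

vecLen = sum(prod(indices, 2));
disp(indices);
disp(vecLen);
```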
The command
showSpyr(PYR, INDICES)
renders the pyramid on the current figure. Use this to display your
steerable pyramid. Note that it omits one band.
Save your pyramid using print.
You can retrieve the image for a particular band number (as ordered
in the pyramid's INDICES matrix) using the command
BAND = pyrBand(PYR, INDICES, NUM);
Use this to extract the (non-oriented) high-pass band that is missing
from the original display.
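To see what extracting a band involves, here is a hand-rolled sketch on toy data. It is not the toolbox's implementation; it assumes the bands are stored back to back in the order of the rows of INDICES, each in Matlab's column-major order.

```matlab
% Toy pyramid: a 4x4 band followed by a 2x2 band.
pyr     = (1:20)';
indices = [4 4; 2 2];
num     = 2;                                   % band to extract

offset = sum(prod(indices(1:num-1, :), 2));    % entries before this band
nPix   = prod(indices(num, :));                % entries in this band
band   = reshape(pyr(offset+1:offset+nPix), indices(num, :));

disp(band);
```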
What do you expect it to look like?
Display and save the high-pass band image
for completeness. Does it meet your expectations? Why or why not?
Describe the filter responses in your steerable pyramid. What structures
stand out for each orientation? Where are the
fine scale responses strongest? Where are the coarse scale responses
strongest?