Lab: Pyramids and Wavelets

CSC 295 - Computer Vision - Weinman

Summary: You will use a Laplacian pyramid for image compression and explore the steerable pyramid representation.

Deliverables

The Matlab script used to make your comparisons and generate all figures
(5 points) Laplacian pyramid image (A.5)
(10 points) Histogram of Laplacian high-pass coefficients (B.5)
(10 points) Observations of high-pass coefficient histogram (B.5)
(10 points) Compression amount and observations (C.7)
(10 points) Reconstructed original and compressed images (D.3)
(10 points) Commentary on reconstructed compressed image (D.4)
(10 points) RMS errors and commentary (D.5, D.6)
(5 points) Steerable pyramid image and high-pass band (E.5, E.8)
(10 points) Steerable pyramid high-pass band observations (E.8)
(10 points) Steerable pyramid observations (E.9)
(10 points) Professionalism of write-up

Preparation

Load the venerable cameraman image and convert it to doubles for processing.

Exercises

A. Building a Pyramid

The command

[PYR, INDICES] = buildLpyr(IM, HEIGHT);

from the pyramid toolbox builds a Laplacian pyramid of image IM having HEIGHT levels (including the low-pass band). The result PYR is a single 1-D vector containing the entire image pyramid, and INDICES is a HEIGHT×2 array giving the sizes of the images in the PYR vector.

What should the value of the indices be for the cameraman image when the pyramid height is 4? (Note that you will have to know the size of the image.)
How long should the pyramid vector be?
Use buildLpyr to construct a 4-level Laplacian pyramid of the cameraman image and verify your answers to the previous two questions.
Note: You do not need to use all-caps variable names. This is a common style for documentation only, not actual Matlab code.
The toolbox also features ways to extract the image corresponding to an individual level from the pyramid, calculate the indices into the pyramid vector, and display the entire pyramid. The command

showLpyr(PYR, INDICES)

renders the pyramid on the current figure. Use this to display your Laplacian pyramid.
Use print to save the rendered version of your pyramid.
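
To make the layout of PYR and INDICES concrete, here is a minimal sketch that needs no toolbox. It assumes a hypothetical 64×64 image and a 3-level pyramid in which each level halves both dimensions; the matrix toyIndices is made up for illustration and is not output from buildLpyr.

```matlab
% Hypothetical size table for a 3-level pyramid of a 64x64 image,
% assuming each level halves both dimensions (illustration only).
toyIndices = [64 64; 32 32; 16 16];

% Pixels in each band: rows times columns, one entry per level.
pixelsPerBand = prod(toyIndices, 2);

% Length a pyramid vector would need to hold every band end to end.
toyLength = sum(pixelsPerBand);

% Position of each band's first entry within that vector.
bandStart = cumsum([1; pixelsPerBand(1:end-1)]);
```

With a real pyramid, comparing sum(prod(INDICES, 2)) against length(PYR) is a quick consistency check.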

B. Taking a Pyramid Apart

Which row in the pyramid indices corresponds to the low-pass band? How big is this image? How many pixels does it have?
The last N entries in the pyramid vector, where N is the size (in pixels) of the low-pass band, correspond to the low-pass band itself. Use this information to create two separate vectors from your pyramid: one containing only the entries in the low-pass band, and the other containing all of the rest. The latter represents all of the high-pass bands of the various scales.
What do you expect the histogram of the high-pass bands' values to look like?
Use hist to create a histogram of your high-pass band vector. You will likely want to use more than the default number of bins.
Save your histogram using print. Does it have the shape you expect? Explain why or why not.
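
The split described in B.2 can be sketched on a toy vector. The names toyPyr, toyIndices, lowBand, and highBands are hypothetical, and the values are made up so that the slicing is easy to check by eye.

```matlab
% Toy stand-in for a pyramid vector: a 4x4 band followed by a 2x2
% low-pass band, stored end to end (made-up values).
toyIndices = [4 4; 2 2];
toyPyr = (1:20)';

% Number of pixels in the low-pass band (last row of the size table).
nLow = prod(toyIndices(end, :));

% The last nLow entries are the low-pass band; the rest are high-pass.
lowBand   = toyPyr(end-nLow+1:end);
highBands = toyPyr(1:end-nLow);
```

Here lowBand holds the values 17 through 20 and highBands holds 1 through 16, so the two pieces together account for every entry of toyPyr.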

C. Compressing the Pyramid

In image compression, it is often useful to keep only the strongest visual responses. This typically means ignoring small changes because they are difficult to see. For the Laplacian pyramid, this means getting rid of values that are sufficiently close to zero by making them actually zero.

The first step in eliminating values is figuring out what "sufficiently close to zero" means. Thus, it is useful to order the pyramid coefficients by their magnitude (absolute value). Use abs and sort to get a sorted version of your high-pass band coefficients' magnitudes.
What do these values look like when sorted? Use plot to investigate.
Let us discard 80% of our coefficients. That means we need to find the value that is 80% of the way through our sorted vector to use as a threshold. Find this threshold. What is it?
Hint: The length and round commands may be helpful.
Use your threshold to set to zero any high-pass band coefficient whose magnitude is less than the threshold.
Note: Think carefully! This will include both positive and negative numbers.
Hint: You may wish to recall some fancy indexing work you did in the first Matlab lab.
Use the command whos to determine how many bytes your high-pass band vector occupies. Note that this should be 8 bytes for every double.
Matlab has a built-in sparse vector representation. Rather than storing every value, it stores only the non-zero values with an index that says where they are. Use the command

sparseVec = sparse(fullVec);

to transform your thresholded high-pass band vector into a sparse version. (You should give it a new name.)
Use the command whos to determine how many bytes your new sparse vector occupies. How do the two compare? As a ratio, how much space did you save? Is it as much as your method of choosing a threshold would suggest? Why or why not?
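
As a rough illustration of the steps in this section, the sketch below runs the same sequence on a small made-up vector rather than on real pyramid coefficients. All variable names are hypothetical, and the 80% figure is simply carried over from the text.

```matlab
% Made-up coefficients standing in for a high-pass band vector.
coeffs = [0.1; -4; 0.3; 7; -0.2; 0.05; -9; 0.4; 2; -0.6];

% Sort the magnitudes in ascending order.
sortedMag = sort(abs(coeffs));

% Value 80% of the way through the sorted magnitudes.
k = round(0.8 * length(sortedMag));
thresh = sortedMag(k);

% Zero every coefficient whose magnitude is below the threshold;
% abs makes the test apply to positive and negative values alike.
compressed = coeffs;
compressed(abs(compressed) < thresh) = 0;

% Sparse copy: only the non-zero values and their positions are kept.
sparseCoeffs = sparse(compressed);
idx  = find(sparseCoeffs);       % positions of the surviving entries
vals = nonzeros(sparseCoeffs);   % the surviving values themselves

% Storage used by each version, in bytes.
fullInfo   = whos('compressed');
sparseInfo = whos('sparseCoeffs');
fprintf('full: %d bytes, sparse: %d bytes\n', fullInfo.bytes, sparseInfo.bytes);
```

Because the comparison is a strict "less than", the entry equal to the threshold survives, so three of the ten toy values remain. Matlab stores sparse arrays in a compressed column format, so a column vector, like the pyramid vector, is the economical orientation.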

D. Reconstructing the Pyramid

Of course, it is one thing to compress an image, but another to make sure the compression preserves meaningful structure. We should probably reconstruct the image from the pyramid to see how it compares. The command

IM = reconLpyr(PYR, INDICES);

inverts the buildLpyr operation by reconstructing the image from the pyramid.

Take your thresholded high-pass band coefficients (the full version, not the sparse version) and concatenate them with the low-pass band vector you separated in B.2. (Be sure you put the low-pass band at the end as in the original.)
Use the command reconLpyr to reconstruct the image from your compressed pyramid representation. As a baseline, you should also reconstruct the image from the original, unmodified Laplacian pyramid representation.
Display and save both images.
How does the reconstructed version look? Is it reasonable? Where is the reconstruction strongest? Where does it deviate most significantly from the original? Why?
Beyond our qualitative visual impressions, it is also useful to have a quantitative measure of reconstruction quality. One possible metric is the root-mean-square (RMS) error, so called because it is the square root of the average squared difference between a true value x_i and a corresponding estimated value y_i:

\[ \mathrm{RMS} = \left( \frac{1}{N} \sum_{i=1}^{N} (x_i - y_i)^2 \right)^{1/2} \]

Compute the RMS error between the original image and both reconstructed versions (the thresholded and non-thresholded).
How do the RMS errors compare? How many 8-bit gray levels does the average error correspond to? Using the results from your image formation lab, how does this correspond to the average and worst-case noise of our cameras? Does this seem tolerable?
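
The RMS formula above translates almost term by term into Matlab. The sketch below uses two made-up 2×2 arrays, with hypothetical names x, y, and rmsErr, so that the arithmetic can be checked by hand.

```matlab
% Made-up "true" and "estimated" images (illustration only).
x = [10 20; 30 40];
y = [10 20; 30 44];

% Difference at every pixel, flattened to a single column so that
% mean averages over all N pixels rather than column by column.
d = x(:) - y(:);

% Square root of the mean of the squared differences.
rmsErr = sqrt(mean(d .^ 2));
```

Only one of the four pixels differs, by 4, so the mean squared difference is 16/4 = 4 and rmsErr is 2.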

E. Steerable Pyramids

We mentioned above that there are a few varieties of steerable pyramids. Some use both even and odd basis functions (so that the resulting coefficients are complex numbers), while another uses only even functions. For simplicity, we will use the latter for now, avoiding complex numbers. The command

[PYR, INDICES] = buildSFpyr(IM, HEIGHT, ORDER);

builds a steerable pyramid representation of HEIGHT levels (excluding the low-pass band), where ORDER is one less than the number of orientations used.
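
The relationship between HEIGHT, ORDER, and the number of bands can be sketched with simple arithmetic. The layout below is an assumption rather than something stated in this handout: one non-oriented high-pass band first, then the oriented bands level by level, then the low-pass band last. The names nLevels, filtOrder, and the parameter values are hypothetical.

```matlab
% Hypothetical parameters, chosen to differ from the exercise.
nLevels   = 2;             % plays the role of HEIGHT
filtOrder = 1;             % plays the role of ORDER
nOrient   = filtOrder + 1; % ORDER is one less than the orientation count

% Assumed layout: one high-pass band, then nOrient oriented bands
% per level, then one low-pass band.
nBands = 1 + nLevels * nOrient + 1;

% Row of the size table for a given level and orientation under that
% assumed ordering (row 1 being the high-pass band).
level  = 2;
orient = 1;
row = 1 + (level - 1) * nOrient + orient;
```

Comparing nBands with size(INDICES, 1) for a real pyramid will confirm or refute the assumed layout.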

What should the value of the indices be for the cameraman image when the pyramid height is 3? (Note that you will have to know the size of the image.)
How long should the pyramid vector be?
Use buildSFpyr to build a 3-level steerable pyramid with 4 orientations from the cameraman image and verify your answers to the previous two questions.
The command

showSpyr(PYR, INDICES)

renders the pyramid on the current figure. Use this to display your steerable pyramid. Note that it omits one band.
Save your pyramid using print.
You can retrieve the image for a particular band number (as ordered in the pyramid's INDICES matrix) using the command

BAND = pyrBand(PYR, INDICES, NUM);

Use this to extract the (non-oriented) high-pass band that is missing from the original display.
What do you expect it to look like?
Display and save the high-pass band image for completeness. Does it meet your expectations? Why or why not?
Describe the filter responses in your steerable pyramid. What structures stand out for each orientation? Where are the fine scale responses strongest? Where are the coarse scale responses strongest?

Acknowledgments

We gratefully acknowledge use of Eero Simoncelli's Steerable Pyramid tools for Matlab.

This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States License.