Lab: Image Pyramids

CSC 262 - Computer Vision - Weinman

Summary:: In this lab, you will write a procedure to build your own Gaussian pyramid and use it to do an efficient multi-scale search.

Deliverables

Matlab script used to make your comparisons and generate your report
(10 points) Matlab function gausspyr.m (A)
(10 points) Justification for Gaussian scale (A.4)
(10 points) Commentary on detections (B.12)
(10 points) Commentary on false positives (B.13)
(20 points) Contoured flower pyramid images (B.14)
(10 points) Professionalism of write-up

Extra credit:

(10 points) Blurred/subsampled images (Extra 9)
(10 points) Blurring/subsampling observations (Extra 10)

Background

Another data type in Matlab for storing arrays of disparate types and sizes is called the "cell". Like arrays, it's an arbitrary N-dimensional structure, but the entries can be anything, rather than just numbers or characters. By anything, we mean any Matlab value: a number, a vector, a matrix, even another cell. These are created using curly braces, rather than brackets.

: A = [ 1 2 ; 3 4 ]; % Some matrix
b = [ 5 6 7 8 ]; % A vector
C = {A,b}; % A cell containing a matrix and a vector

Cells are also indexed using curly braces, instead of parentheses like regular arrays.

: C{1}      % Value in the cell's first slot
C{end}    % Value in the cell's last slot
C(1:2)    % Sub-cell containing only the first two items
length(C) % Length of the cell
size(C)   % Size of the cell

Preparation

Load the following image into your workspace and convert it to doubles.

: /home/weinman/courses/CSC262/images/sunflowers.png

Exercises

A. Gaussian Pyramid

In this section you will write a procedure for creating a Gaussian pyramid. If you don't remember the syntax for procedures, revisit the example at the bottom of the Matlab Tutorial.

Create a file called gausspyr.m to place your function in. Note, this is a function, not a script. That means it will take parameters as arguments and/or return values. The body will have a new namespace that is separate from the variables in your workspace. The menu option New -> Script will give you an empty file that you can populate yourself, and the option New -> Function gives you a template from which to edit; either choice should ultimately end in the same result.
Add a declaration line to your function so that it takes two parameters, an image and the number of levels to compute, and returns one value, the pyramid (which will be a cell).
Use gkern to create a 1-D Gaussian blurring kernel in the body of your function.
What scale (standard deviation) should your Gaussian have to prevent aliasing at the subsampling stage? Justify your choice in terms of the frequencies it will eliminate.
Hint: You may wish to explore the magnitude of the Fourier transform of (the two-dimensional version of) your kernel and include this in your justification. You will not need to take the log to compress the results. For example,

gauss = gkern(12^2); % Gaussian kernel with sigma=12
fftGauss = fftshift(fft(G)); % 1-D FFT, has scale=1/12 in freq domain.
frequencies = linspace(-0.5,0.5,length(Gf));
plot(frequencies,abs(fftGauss));

Note: Recall from the previous lab (where you investigated the box filter) that you can see the frequency reponses of a filter in a larger image by keeping the filter in the upper-left corner of a larger matrix of zeros before transforming.
Extra: Note that the functional relationship between a Gaussian and its Fourier transform given by Szeliski in Table 3.2.
Following Forsyth and Ponce (Algorithm 8.1), add a for loop to your function that continually blurs and downsamples the result from the previous pyramid layer, adding each successive result to a cell containing the pyramid.
- Downsample your resulting filtered images by taking every other pixel row and column, starting at 1.
- The original image will be the first entry in your cell.
- You should arrange for conv2 to give you back a result the same size as its image argument.
Note: You should assume that the image is grayscale (not color) and consists of doubles.
Return your cell as the value of the function gausspyr.
Test your pyramid script on the grayscale version of the bricks image and display some of the levels to be sure it is giving reasonable results.

B. Multi-Scale Search

If you haven't already noticed, the sunflower image is rather large, and flowers appear at many scales. In fact, the largest sunflower is about 600 pixels across. It took approximately 20 seconds to run a separable kernel of that size on this image; you probably don't even want to think about what it would take a square kernel to run. Thus, we need to use a pyramid to run the search for large structures at coarser scales.

Recall from the previous lab that the Laplacian of Gaussian (LoG) kernel can act as a "blob" detector. This is well-suited as a template for finding sunflowers. Fortunately, it's even a separable kernel:

LoG(x,y)=

∂²

∂x²

G(x,y)+

∂²

∂y²

G(x,y).

Because convolution is a linear operator, we can take the convolution of a sum by taking the sum of two convolutions!

Use gkern to create a 1-D Gaussian kernel of variance 4². This will give a reasonable size for us to detect sunflower blobs.
Use gkern to create a 1-D Gaussian second-derivative kernel with the same variance.
Use your procedure gausspyr to create a Gaussian pyramid of the sunflower image with enough levels to render the largest sunflowers sufficiently small at the coarsest scale to be detected by the LoG operator.
Choose one level of your pyramid and apply the LoG operator to it. Note that you'll need to apply the Gaussian along one dimension and its second derivative along the other dimension, repeat the process reversing the two kernels, and then adding the two results. Once again, you should arrange for conv2 to give you back results that are the same size as its image argument.
Display the image from your pyramid level.
In separate figure window, display the results of the LoG operator.
Use impixelinfo¹ or the data cursor to to guess an initial reasonable threshold for detecting a sunflower "blob" from the LoG responses.
Create a thresholded, binary image by vectorizing a "greater than" operation over your LoG responses.
Bring your pyramid level image figure window to the front, and then type hold on to indicate that we want to keep the image in that window. You can get a contour plot of a function, such as your thresholded responses using the contour command. We will use this to identify the regions detected using your threshold. The command

contour(B, [1 1], color);

says to draw a contour of B of the given color using only the value 1. (Because this is a binary image, we only want to do the level that is "on" at value one.) Use this command to add a contour of the detected regions to your image. See the doc LineSpec section "Color Specifier" for valid color possibilities; you should choose one that is easy to see on a grayscale image across a variety of brightnesses.
Add a for loop to apply the LoG operator to every level of your pyramid, displaying the original image and the contour of the thresholded version on top of it.
How good is your threshold? Adjust it so that you can detect as many flowers over all scales as you can.
How reliably does your method find sunflowers at the coarser scales? Finer scales?
How does the false positive rate change as the scale changes?
Display each figure from within the loop.

Extras: Blurring, Subsampling, and Aliasing

Szeliski defines several separable decimation/blurring kernels in Table 3.4 of Section 3.5 Add these 1-D filter definitions and a version that conglomerates them into a single cell to your workspace. (Note that if the lines below wrap in the web browser, they should NOT wrap in your code definitions.)

: hLin = [0.25 0.5 0.25];
hBin = [0.0625 0.25 0.3750 0.25 0.0625];
hCub1 = [-0.0625 0.0 0.3125 0.5 0.3125 0.0 -0.0625];
hCubH = [-0.03125 0.0 0.28125 0.5 0.28125 0.0 -0.03125];
hSinc = [-0.0153 0.0 0.2684 0.4939 0.2684 0.0 -0.0153];
hQMF = [0.0198 -0.0431 -0.0519 0.2932 0.5638 0.2932 -0.0519 -0.0431 0.0198];
hJPG = [0.0267 -0.0169 -0.0782 0.2669 0.6029 0.2669 -0.0782 -0.0169 0.0267];
F = {hLin, hBin, hCub1, hCubH, hSinc, hQMF, hJPG};

Load the following image of a brick wall

/home/weinman/courses/CSC262/images/bricks.jpg

and convert the image to doubles, but preserve the color channels.
Display the original bricks image. Zoom in so you can see the individual pixels of a region of bricks. Does the image look well-sampled?
Choose one of the filters given above, say the binomial filter, which is accessible as hBin and F{2}. Apply this filter to both the rows and the columns of each color channel of the bricks image to produce a color result. Recall that you can index (or assign) an entire slice of an image as:

B(:,:,c)
Downsample your resulting filtered image by taking every other pixel starting at 1. Don't forget to include all the color channels.
Display your image and zoom in so you can see the individual pixels on a region of bricks. How does the sampling compare to the original image?
Make a note of the region of the image you are looking at (i.e., the row and column lower/upper bounds). It should be sufficient to see individual pixels but also several rows of bricks and a few columns of their staggered pattern.
Extract a color sub-image using your row and column bounds.
Now you will generalize all the steps above to apply all the filters. Write a for loop to iterate over all the filters in F, applying the blurring and subsampling to the original brick image.
Inside your loop, use imshow to display the entire subsampled image.
Note: You can use a combination of title and num2str to help you keep clear which image corresponds to which filter. Alternatively, you could use montage or subplot.
Inspect your images. How do the various filters do at preserving the image structure? Which do you think is best? Worst? Why? Do your comparisons depend on where in the image you look?

Acknowledgments

The sunflowers image is by Bruce Fritz, taken on behalf of the USDA Agricultural Research Service, and is therefore in the public domain. The brick image is by Colin M.L. Burnett and is released under the GNU Free Documentation License.

This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 4.0 International License.

Footnotes:

¹Recall that impixelinfo will show you both the coordinate and brightness value of a moused-over region in a figure window.