Lab: Image Pyramids

CSC 262 - Computer Vision - Weinman



Summary:
In this lab, you will write a procedure to build your own Gaussian pyramid and use it to do an efficient multi-scale search.

Deliverables

Background

Another data type in Matlab for storing arrays of disparate types and sizes is called the "cell". Like arrays, it's an arbitrary N-dimensional structure, but the entries can be anything, rather than just numbers or characters. By anything, we mean any Matlab value: a number, a vector, a matrix, even another cell. These are created using curly braces, rather than brackets.
A = [ 1 2 ; 3 4 ]; % Some matrix
b = [ 5 6 7 8 ];   % A vector
C = {A,b};         % A cell containing a matrix and a vector
Cells are also indexed using curly braces, instead of parentheses like regular arrays.
C{1}      % Value in the cell's first slot 
C{end}    % Value in the cell's last slot
C(1:2)    % Sub-cell containing only the first two items
length(C) % Length of the cell
size(C)   % Size of the cell

Preparation

Load the following images into your workspace and convert them to doubles.
/home/weinman/courses/CSC262/images/sunflowers.png
/home/weinman/courses/CSC262/images/bricks.jpg

Exercises

A. Gaussian Pyramid

In this section you will write a procedure for creating a Gaussian pyramid. If you don't remember the syntax for procedures, revisit the example at the bottom of the Matlab Tutorial.
  1. Create a file called gausspyr.m to place your function in. Note, this is a function, not a script. That means it will take parameters as arguments and/or return values. The body will have a new namespace that is separate from the variables in your workspace. The menu option New -> Script will give you an empty file that you can populate yourself, while the option New -> Function gives you a template from which to edit; either choice should ultimately end in the same result.
  2. Add a declaration line to your function so that it takes two parameters, an image and the number of levels to compute, and returns one value, the pyramid (which will be a cell).
  3. Use gkern to create a 1-D Gaussian blurring kernel in the body of your function.
  4. What scale (standard deviation) should your Gaussian have to prevent aliasing at the subsampling stage? Justify your choice in terms of the frequencies it will eliminate.
    Hint: You may wish to explore the magnitude of the Fourier transform of your kernel and include this in your justification. You will not need to take the log to compress the results. For example,
    gauss = gkern(12); % Gaussian kernel with sigma=12
    fftGauss = fftshift(fft(gauss)); % 1-D FFT, has  scale/12 in freq domain.
    frequencies = linspace(-0.5,0.5,length(fftGauss));
    plot(frequencies,abs(fftGauss)); 
    Note: Recall from the previous lab (where you investigated the box filter) that you can see the frequency reponses of a filter in a larger image by keeping the filter in the upper-left corner of a larger matrix of zeros before transforming.
    Extra: Note that the functional relationship between a Gaussian of scale σ and its Fourier transform is
    F{ G(x;σ)} =





    σ
    G(ω;σ−1).
  5. Following Forsyth and Ponce (Algorithm 8.1), add a for loop to your function that repeatedly blurs and downsamples the result from the previous pyramid layer, adding each successive result to a cell containing the pyramid.
    • Downsample your resulting filtered images by taking every other pixel row and column, starting at 1.
    • The original image will be the first entry in your cell.
    • You should arrange for conv2 to give you back a result the same size as its image argument.
    Note: You should assume that the image is grayscale (not color) and consists of doubles.
  6. Return your cell as the value of the function gausspyr.
  7. Test your pyramid script on the grayscale version of the bricks image and display some of the levels to be sure it is giving reasonable results.

B. Multi-Scale Search

If you haven't already noticed, the sunflower image is rather large, and flowers appear at many scales. In fact, the largest sunflower is about 600 pixels across. It took approximately 20 seconds to run a separable kernel of that size on this image; you probably don't even want to think about what it would take a square (non-separable) kernel to run. Thus, we need to use a pyramid to run the search for large structures at coarser scales.
Recall from the previous lab that the Laplacian of Gaussian (LoG) kernel can act as a "blob" detector. This is well-suited as a template for finding sunflowers. Fortunately, because G(x,y)=G(x)G(y), it's even a separable kernel:
LoG(x,y)
=
2

x2
G(x,y)+2

y2
G(x,y)
=

2

x2
G(x)
G(y)+
2

y2
G(y)
G(x).
Because convolution is a linear operator, we can take the convolution of a sum by taking the sum of two convolutions!
  1. Use gkern to create a 1-D Gaussian kernel of scale 4. This will give a reasonable size for us to detect sunflower blobs.
  2. Use gkern to create a 1-D Gaussian second-derivative kernel with the same scale.
  3. Use your procedure gausspyr to create a Gaussian pyramid of the sunflower image with enough levels to render the largest sunflowers sufficiently small at the coarsest scale to be detected by the LoG operator.
  4. Choose one level of your pyramid and apply the LoG operator to it. Note that you'll need to apply the Gaussian along one dimension and its second derivative along the other dimension, repeat the process reversing the two kernels, and then adding the two results together. Once again, you should arrange for conv2 to give you back results that are the same size as its image argument.
  5. Display the image from your pyramid level.
  6. In separate figure window, display the results of the LoG operator.
  7. Use impixelinfo1 or the data cursor to to guess an initial reasonable threshold for detecting a sunflower "blob" from the LoG responses.
  8. Create a thresholded, binary image by vectorizing a "greater than" operation over your LoG responses.
  9. Bring your pyramid level image figure window to the front, and then type hold on to indicate that we want to keep the image in that window. You can get a contour plot of a function, such as your thresholded responses using the contour command. We will use this to identify the regions detected using your threshold. The command
    contour(B, [1 1], color);
    says to draw a contour of B of the given color using only the value 1. (Because this is a binary image, we only want to do the level that is "on" at value one.) Use this command to add a contour of the detected regions to your image. See the doc LineSpec section "Color Specifier" for valid color possibilities; you should choose one that is easy to see on a grayscale image across a variety of brightnesses.
  10. Add a for loop to apply the LoG operator to every level of your pyramid, displaying the original image and the contour of the thresholded version on top of it.
  11. How good is your threshold? Adjust it so that you can detect as many flowers over all scales as you can.
  12. How reliably does your method find sunflowers at the coarser scales? Finer scales?
  13. How does the false positive rate change as the scale changes?
  14. Display each figure from within the loop.

Acknowledgments

The sunflowers image is by Bruce Fritz, taken on behalf of the USDA Agricultural Research Service, and is therefore in the public domain. The brick image is by Colin M.L. Burnett and is released under the GNU Free Documentation License.
Copyright © 2010, 2012, 2015, 2019, 2020, 2022 Jerod Weinman.
ccbyncsa.png

Footnotes:

1Recall that impixelinfo will show you both the coordinate and brightness value of a moused-over region in a figure window.