Lab: Image Pyramids
CSC 262 - Computer Vision - Weinman
- Summary:
- In this lab, you will write a procedure to build your
own Gaussian pyramid and use it to do an efficient multi-scale search.
Deliverables
- Matlab script used to make your comparisons and generate your report
- (10 points) Matlab function gausspyr.m (A)
- (10 points) Justification for Gaussian scale (A.4)
- (10 points) Commentary on detections (B.12)
- (10 points) Commentary on false positives (B.13)
- (20 points) Contoured flower pyramid images (B.14)
- (10 points) Professionalism of write-up
Background
Another data type in Matlab for storing arrays of disparate types
and sizes is called the "cell". Like arrays, it's an arbitrary
N-dimensional structure, but the entries can be anything, rather
than just numbers or characters. By anything, we mean any Matlab value:
a number, a vector, a matrix, even another cell. These are created
using curly braces, rather than brackets.
-
A = [ 1 2 ; 3 4 ]; % Some matrix
b = [ 5 6 7 8 ]; % A vector
C = {A,b}; % A cell containing a matrix and a vector
Cells are also
indexed using curly braces, instead of parentheses
like regular arrays.
-
C{1} % Value in the cell's first slot
C{end} % Value in the cell's last slot
C(1:2) % Sub-cell containing only the first two items
length(C) % Length of the cell
size(C) % Size of the cell
Preparation
Load the following images into your workspace and convert them to
doubles.
-
/home/weinman/courses/CSC262/images/sunflowers.png
/home/weinman/courses/CSC262/images/bricks.jpg
Exercises
A. Gaussian Pyramid
In this section you will write a procedure for creating a Gaussian
pyramid. If you don't remember the syntax for procedures, revisit
the example at the bottom of the
Matlab Tutorial.
- Create a file called gausspyr.m to place your function in.
Note, this is a function, not a script. That means it will
take parameters as arguments and/or return values. The body will have
a new namespace that is separate from the variables in your workspace.
The menu option New -> Script will give you an empty file
that you can populate yourself, while the option New -> Function
gives you a template from which to edit; either choice should ultimately
end in the same result.
- Add a declaration line to your function so that it takes two parameters,
an image and the number of levels to compute, and returns one value,
the pyramid (which will be a cell).
- Use gkern to create a 1-D Gaussian blurring kernel in the
body of your function.
-
What scale (standard deviation) should your
Gaussian have to prevent aliasing at the subsampling stage? Justify
your choice in terms of the frequencies it will eliminate.
Hint: You may wish to explore the magnitude of the Fourier
transform of your kernel and include this in your justification. You
will need to take the log to compress
the results. For example,
-
gauss = gkern(12^2); % Gaussian kernel with sigma=12
fftGauss = fftshift(fft(gauss)); % 1-D FFT, has scale/12 in freq domain.
frequencies = linspace(-0.5,0.5,length(fftGauss));
plot(frequencies,abs(fftGauss));
Note: Recall from the previous lab (where you investigated
the box filter) that you can see the frequency reponses of a filter
in a larger image by keeping the filter in the upper-left corner of
a larger matrix of zeros before transforming.
Extra: Note that the functional relationship between a Gaussian
of scale σ and its Fourier transform is
- Following Forsyth and Ponce (Algorithm 8.1), add a for loop
to your function that continually blurs and downsamples the result
from the previous pyramid layer, adding each successive result to
a cell containing the pyramid.
- Downsample your resulting filtered images by taking every other pixel
row and column, starting at 1.
- The original image will be the first entry in your cell.
- You should arrange for conv2 to give you back a result the
same size as its image argument.
Note: You should assume that the image is grayscale (not
color) and consists of doubles.
- Return your cell as the value of the function gausspyr.
- Test your pyramid script on the grayscale version of the bricks image
and display some of the levels to be sure it is giving reasonable
results.
B. Multi-Scale Search
If you haven't already noticed, the sunflower image is rather large,
and flowers appear at many scales. In fact, the largest sunflower
is about 600 pixels across. It took approximately 20 seconds to run
a separable kernel of that size on this image; you probably don't
even want to think about what it would take a square (non-separable)
kernel to run. Thus, we need to use a pyramid to run the search for
large structures at coarser scales.
Recall from the previous lab that the Laplacian of Gaussian (
LoG)
kernel can act as a "blob" detector. This is well-suited as a
template for finding sunflowers. Fortunately, because
G(
x,
y)=
G(
x)
G(
y),
it's even a separable kernel:
| |
|
|
|
∂2
∂x2
|
G(x,y)+ |
∂2
∂y2
|
G(x,y) |
| |
| |
|
| ⎡ ⎣
|
∂2
∂x2
|
G(x) | ⎤ ⎦
|
G(y)+ | ⎡ ⎣
|
∂2
∂y2
|
G(y) | ⎤ ⎦
|
G(x). |
| |
|
Because convolution is a linear operator, we can take the convolution
of a sum by taking the sum of two convolutions!
- Use gkern to create a 1-D Gaussian kernel of variance 42.
This will give a reasonable size for us to detect sunflower blobs.1
- Use gkern to create a 1-D Gaussian second-derivative kernel
with the same variance.
- Use your procedure gausspyr to create a Gaussian pyramid
of the sunflower image with enough levels to render the largest sunflowers
sufficiently small at the coarsest scale to be detected by the LoG
operator.
- Choose one level of your pyramid and apply the LoG operator
to it. Note that you'll need to apply the Gaussian along one dimension
and its second derivative along the other dimension, repeat the process
reversing the two kernels, and then adding the two results. Once again,
you should arrange for conv2 to give you back results that
are the same size as its image argument.
- Display the image from your pyramid level.
- In separate figure window, display the results of the LoG operator.
- Use impixelinfo2 or the data cursor to to guess an initial reasonable threshold for
detecting a sunflower "blob" from the LoG responses.
- Create a thresholded, binary image by vectorizing a "greater than"
operation over your LoG responses.
- Bring your pyramid level image figure window to the front, and then
type hold on to indicate that we want to keep the image in
that window. You can get a contour plot of a function, such as your
thresholded responses using the contour command. We will
use this to identify the regions detected using your threshold. The
command
-
contour(B, [1 1], color);
says to draw a contour of B of the given color using
only the value 1. (Because this is a binary image, we only
want to do the level that is "on" at value one.) Use this command
to add a contour of the detected regions to your image. See the doc
LineSpec section "Color Specifier" for valid color possibilities;
you should choose one that is easy to see on a grayscale image across
a variety of brightnesses.
- Add a for loop to apply the LoG operator to every
level of your pyramid, displaying the original image and the contour
of the thresholded version on top of it.
- How good is your threshold? Adjust it so that you can detect as many
flowers over all scales as you can.
-
How reliably does your method find sunflowers
at the coarser scales? Finer scales?
-
How does the false positive rate
change as the scale changes?
-
Display each figure from within
the loop.
Acknowledgments
The
sunflowers
image is by Bruce Fritz, taken on behalf of the USDA
Agricultural
Research Service, and is therefore in the public domain. The
brick
image is by Colin M.L. Burnett and is released under the GNU Free
Documentation License.
Footnotes:
1Note that the scale σ of a Gaussian is usually reported, rather
than σ
2, because the units of the former are pixels. However,
our
gkern function takes the "variance" (σ
2)
because it happens to be the natural parameterization for the discretized
version of the kernel.
2Recall that
impixelinfo will show you both the coordinate
and brightness value of a moused-over region in a figure window.