Another data type in Matlab for storing arrays of disparate types
and sizes is called the "cell". Like arrays, it's an arbitrary
N-dimensional structure, but the entries can be anything, rather
than just numbers or characters. By anything, we mean any Matlab value:
a number, a vector, a matrix, even another cell. These are created
using curly braces, rather than brackets.
A = [ 1 2 ; 3 4 ]; % Some matrix
b = [ 5 6 7 8 ]; % A vector
C = {A,b}; % A cell containing a matrix and a vector
Cells are also indexed using curly braces, instead of parentheses
like regular arrays.
C{1} % Value in the cell's first slot
C{end} % Value in the cell's last slot
C(1:2) % Sub-cell containing only the first two items
length(C) % Length of the cell
size(C) % Size of the cell
Preparation
Load the following image into your workspace and convert it to doubles.
In this section you will write a procedure for creating a Gaussian
pyramid. If you don't remember the syntax for procedures, revisit
the example at the bottom of the Matlab Tutorial.
Create a file called gausspyr.m to place your function in.
Note, this is a function, not a script. That means it will
take parameters as arguments and/or return values. The body will have
a new namespace that is separate from the variables in your workspace.
The menu option New -> Script will give you an empty file
that you can populate yourself, and the option New -> Function
gives you a template from which to edit; either choice should ultimately
end in the same result.
Add a declaration line to your function so that it takes two parameters,
an image and the number of levels to compute, and returns one value,
the pyramid (which will be a cell).
Use gkern to create a 1-D Gaussian blurring kernel in the
body of your function.
What scale (standard deviation) should your
Gaussian have to prevent aliasing at the subsampling stage? Justify
your choice in terms of the frequencies it will eliminate.
Hint: You may wish to explore the magnitude of the Fourier
transform of (the two-dimensional version of) your kernel and include
this in your justification. You will not need to take the log
to compress the results. For example,
gauss = gkern(12^2); % Gaussian kernel with sigma=12
fftGauss = fftshift(fft(G)); % 1-D FFT, has scale=1/12 in freq domain.
frequencies = linspace(-0.5,0.5,length(Gf));
plot(frequencies,abs(fftGauss));
Note: Recall from the previous lab (where you investigated
the box filter) that you can see the frequency reponses of a filter
in a larger image by keeping the filter in the upper-left corner of
a larger matrix of zeros before transforming.
Extra: Note that the functional relationship between a Gaussian
and its Fourier transform given by Szeliski in Table 3.2.
Following Forsyth and Ponce (Algorithm 8.1), add a for loop
to your function that continually blurs and downsamples the result
from the previous pyramid layer, adding each successive result to
a cell containing the pyramid.
Downsample your resulting filtered images by taking every other pixel
row and column, starting at 1.
The original image will be the first entry in your cell.
You should arrange for conv2 to give you back a result the
same size as its image argument.
Note: You should assume that the image is grayscale (not
color) and consists of doubles.
Return your cell as the value of the function gausspyr.
Test your pyramid script on the grayscale version of the bricks image
and display some of the levels to be sure it is giving reasonable
results.
B. Multi-Scale Search
If you haven't already noticed, the sunflower image is rather large,
and flowers appear at many scales. In fact, the largest sunflower
is about 600 pixels across. It took approximately 20 seconds to run
a separable kernel of that size on this image; you probably don't
even want to think about what it would take a square kernel to run.
Thus, we need to use a pyramid to run the search for large structures
at coarser scales.
Recall from the previous lab that the Laplacian of Gaussian (LoG)
kernel can act as a "blob" detector. This is well-suited as a
template for finding sunflowers. Fortunately, it's even a separable
kernel:
LoG(x,y)=
∂2
∂x2
G(x,y)+
∂2
∂y2
G(x,y).
Because convolution is a linear operator, we can take the convolution
of a sum by taking the sum of two convolutions!
Use gkern to create a 1-D Gaussian kernel of variance 42.
This will give a reasonable size for us to detect sunflower blobs.
Use gkern to create a 1-D Gaussian second-derivative kernel
with the same variance.
Use your procedure gausspyr to create a Gaussian pyramid
of the sunflower image with enough levels to render the largest sunflowers
sufficiently small at the coarsest scale to be detected by the LoG
operator.
Choose one level of your pyramid and apply the LoG operator
to it. Note that you'll need to apply the Gaussian along one dimension
and its second derivative along the other dimension, repeat the process
reversing the two kernels, and then adding the two results. Once again,
you should arrange for conv2 to give you back results that
are the same size as its image argument.
Display the image from your pyramid level.
In separate figure window, display the results of the LoG operator.
Use impixelinfo1 or the data cursor to to guess an initial reasonable threshold for
detecting a sunflower "blob" from the LoG responses.
Create a thresholded, binary image by vectorizing a "greater than"
operation over your LoG responses.
Bring your pyramid level image figure window to the front, and then
type hold on to indicate that we want to keep the image in
that window. You can get a contour plot of a function, such as your
thresholded responses using the contour command. We will
use this to identify the regions detected using your threshold. The
command
contour(B, [1 1], color);
says to draw a contour of B of the given color using
only the value 1. (Because this is a binary image, we only
want to do the level that is "on" at value one.) Use this command
to add a contour of the detected regions to your image. See the doc
LineSpec section "Color Specifier" for valid color possibilities;
you should choose one that is easy to see on a grayscale image across
a variety of brightnesses.
Add a for loop to apply the LoG operator to every
level of your pyramid, displaying the original image and the contour
of the thresholded version on top of it.
How good is your threshold? Adjust it so that you can detect as many
flowers over all scales as you can.
How reliably does your method find sunflowers
at the coarser scales? Finer scales?
How does the false positive rate
change as the scale changes?
Display each figure from within
the loop.
Extras: Blurring, Subsampling, and Aliasing
Szeliski defines several separable decimation/blurring kernels in
Table 3.4 of Section 3.5 Add these 1-D filter definitions and a version
that conglomerates them into a single cell to your workspace. (Note
that if the lines below wrap in the web browser, they should NOT wrap
in your code definitions.)
and convert the image to doubles, but preserve the color channels.
Display the original bricks image. Zoom in so you can see the individual
pixels of a region of bricks. Does the image look well-sampled?
Choose one of the filters given above, say the binomial filter, which
is accessible as hBin and F{2}. Apply this filter
to both the rows and the columns of each color channel of the bricks
image to produce a color result. Recall that you can index (or assign)
an entire slice of an image as:
B(:,:,c)
Downsample your resulting filtered image by taking every other pixel
starting at 1. Don't forget to include all the color channels.
Display your image and zoom in so you can see the individual pixels
on a region of bricks. How does the sampling compare to the original
image?
Make a note of the region of the image you are looking at (i.e., the
row and column lower/upper bounds). It should be sufficient to see
individual pixels but also several rows of bricks and a few columns
of their staggered pattern.
Extract a color sub-image using your row and
column bounds.
Now you will generalize all the steps above to apply all the filters.
Write a for loop to iterate over all the filters in F,
applying the blurring and subsampling to the original brick image.
Inside your loop, use imshow to
display the entire subsampled image.
Note: You can use a combination of title and num2str
to help you keep clear which image corresponds to which filter. Alternatively,
you could use montage or subplot.
Inspect your images. How do the various
filters do at preserving the image structure? Which do you think is
best? Worst? Why? Do your comparisons depend on where in the image
you look?