Lab: Structure from Motion
CSC 295 - Computer Vision - Weinman
- Summary:
- In this lab you will use a given set of very dense feature
matches to create a 3-D orthographic reconstruction.
- Due:
- 4/14
Deliverables
- The Matlab script used to make your comparisons and generate all figures
- (5 points) Top singular values from decomposition (B.3)
- (10 points) Image of 3-D shape of reconstruction (C.3)
- (10 points) Observations of reconstruction (C.4)
- (25 points) Synthesized 3-D views of scene and commentary (D.5)
- (20 points) 3-D views of sparse/noisy reconstruction (E.4)
- (10 points) Observations of sparse/noisy reconstruction (E.5)
- (10 points) Professionalism of write-up
Preparation
The base images we will be using in this lab come from the Middlebury
Optical Flow data [1] and they are found on the MathLAN at:

/home/weinman/courses/CSC295/images/frame1.png
/home/weinman/courses/CSC295/images/frame2.png
Can you infer the 3-D structure from inspecting the images? The data
also features a set of dense feature correspondences. Since the images
are synthetic, we know the ground truth matching will be correct.
You can load these correspondences from a Matlab data file with the
following command

>> load /home/weinman/courses/CSC295/images/flow.mat
which will add the variable F to your workspace. It has the
same number of rows and columns as frame1.png. The first
slice contains the x offset to the corresponding point in frame2.png
and the second slice contains the y offset.
Exercises
A. Set Up
- Use meshgrid to construct the 2-D
spatial domain (x and y coordinates in two matrices) of the
pixels in frame1. Since there are far too many points in
the image for the MathLAN computers to decompose,
use only every third row and column.
Note: Remember that the first dimension of a matrix is the
rows, and the second dimension is the columns, which is the reverse
of the typical (x,y) pair ordering.
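A sketch of step A.1 (the variable names and the use of size on an
already-loaded frame1 are my own choices, not prescribed by the lab):

```matlab
% Load frame1 and build the subsampled pixel grid.
I1 = imread('/home/weinman/courses/CSC295/images/frame1.png');
[rows, cols] = size(I1(:,:,1));
% meshgrid takes the x (column) range first and the y (row) range
% second, which handles the row/column vs. (x,y) reversal noted above.
[X1, Y1] = meshgrid(1:3:cols, 1:3:rows);
```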
- Now that you have matrices representing (x,y)
pairs of points in the image, use the data in the variable F
to create matrices of the coordinates of their corresponding points
in frame2. That is, if X1(a,b) and Y1(a,b)
are the x and y coordinates of a point in frame1, then
you should create matrices such that X2(a,b) and Y2(a,b)
are the coordinates of the same point in frame2.
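With the same every-third subsampling applied to F, step A.2 might
look like this (a sketch; it assumes X1 and Y1 were built with the
1:3:end spacing from A.1):

```matlab
% F(:,:,1) holds x offsets and F(:,:,2) holds y offsets, so the
% matched coordinates in frame2 are the frame1 grid plus the offsets.
X2 = X1 + F(1:3:end, 1:3:end, 1);
Y2 = Y1 + F(1:3:end, 1:3:end, 2);
```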
- Construct a complete 4×N data matrix of the two frames' points.
That is, the x coordinates from the first image should be the first
row, the x coordinates of the matching points from the second image
should be the second row, and similarly with the y coordinates
for the third and fourth row.
- Calculate the translation of the points perpendicular to the viewing
direction (i.e., find the mean of each row).
- Create a modified data matrix where the row mean has been subtracted
from each value in the data matrix.
Hint: You can use repmat to tile things and avoid
for loops.
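Steps A.3 through A.5 can be sketched together as follows (D, m, and
X are my own variable names):

```matlab
% 4-by-N data matrix: x coordinates from frame1 and frame2 in the
% first two rows, y coordinates in the third and fourth (A.3).
D = [X1(:)'; X2(:)'; Y1(:)'; Y2(:)'];
m = mean(D, 2);                     % per-row mean: the translation (A.4)
X = D - repmat(m, 1, size(D, 2));   % centered data matrix (A.5)
```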
B. Factorization
- You should now have a matrix suitable for singular value decomposition
(SVD). Using Matlab's svd command, decompose the data matrix into its
constituent factors, i.e.,

[U,W,V] = svd(X);

- Extract the 3×3 sub-matrix of W
containing the top three singular values along with the corresponding
columns from V (an N×3 matrix).
- What are the three singular values you
find?
- Take the product of the two extracted matrices from B.2
in order to recover the shape matrix.
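Putting B.2 through B.4 together (a sketch; I take the product in the
order that makes S an N×3 matrix, so that C.1's reference to its
first and second columns makes sense):

```matlab
% Assumes X is the centered 4-by-N data matrix and that
% [U,W,V] = svd(X) has already been run (B.1).
W3 = W(1:3, 1:3);        % 3-by-3 block of the top singular values
V3 = V(:, 1:3);          % corresponding N-by-3 columns of V
disp(diag(W3)');         % report the three singular values (B.3)
S = V3 * W3;             % N-by-3 shape matrix: columns give x, y, z
```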
C. Analysis
- The first and second column of your shape matrix contain the (arbitrary)
x and y coordinates of the recovered points. In order to construct
a full 3-D plot of these points in a connected fashion, we will use
what is called a Delaunay triangulation. This connects the recovered
points with triangles. Use the delaunay Matlab command to
create a plot structure from your points, i.e.,
T = delaunay(X,Y);
- Next we plot a simple visualization of the
3-D structure recovered by the algorithm. The command trisurf
takes a set of x, y, and z coordinates and plots them using
the Delaunay triangles we created in the previous step. Use the coordinates
from your shape matrix to render the recovered scene structure. One
suggested syntax is
trisurf(T, X, Y, Z, 'EdgeColor', 'none', 'FaceColor','red');
lighting phong
camlight headlight
- Use the "Rotate 3D" tool in the figure window
to adjust the view and inspect the structure. Save any view you find
interesting using print.
- Does it look like you expect? Are there any
3-D structures that appear that you did not necessarily expect?
D. Visualization
Visualizing the surface by shape alone is helpful, especially for
finding flaws, but is unsatisfying visually. Fortunately, Matlab allows
the surface faces to be colored by an index.
- If you haven't already, load an RGB version of frame1.png.
- Use the command rgb2ind to convert
the image to an indexed image using 256 colors. You will need to specify
both the indexed image and the colormap as outputs.
- Recall that you took only every third row and column when you constructed
the points used in the correspondence. Extract the corresponding sub-sampled
indices from the indexed image and reshape the result into a column
vector.
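Steps D.1 through D.3 might look like the following sketch (the
1:3:end subsampling must match whatever spacing you used in A.1):

```matlab
% Load frame1 in color and convert it to a 256-color indexed image.
RGB = imread('/home/weinman/courses/CSC295/images/frame1.png');
[ind, map] = rgb2ind(RGB, 256);
% Keep every third row and column to match the point grid, then
% reshape the result into a column vector of color indices.
C = ind(1:3:end, 1:3:end);
C = double(C(:));
```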
- Now you can use the resulting index vector
as an additional argument to trisurf, which will indicate
the color to use in the rendering. For example,
figure;
trisurf(T, X, Y, Z, C, 'EdgeColor', 'none');
colormap(map);
where C is your vector of color indices and map
is the colormap you got in D.2.
- Use the "Rotate 3D" tool to synthesize
at least two novel views of the scene. Save these views using print.
What do your views help to illustrate about the 3-D structure?
E. Reality
Usually, matching is more unreliable and much sparser than the data
we have used so far. In this section you'll explore a simulated, but
more realistic reconstruction scenario.
- Return to step A.1 and reduce the number of
points you use. You should try to devise a way to sample them non-uniformly.
Hint: one way might be to use randi to get integer
indices and ind2sub to convert these to subscripts (i.e.,
y and x coordinates).
- Using your much smaller set of points, repeat step A.2,
but this time add some random noise to the offset.
Hint: Use randn. Even small errors multiply greatly
for points at large distances.
You may wish to investigate the magnitudes of the typical match offsets
before choosing a noise multiplier.
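One way to combine E.1 and E.2 in a single sketch (n and sigma are my
own illustrative choices; this sampling is uniform over pixels, so you
may still want to bias it toward particular image regions):

```matlab
[rows, cols] = size(F(:,:,1));
n = 2000;                               % number of sampled points
idx = randi(rows * cols, n, 1);         % random linear indices
[Y1, X1] = ind2sub([rows, cols], idx);  % subscripts: row = y, col = x
Fx = F(:,:,1); Fy = F(:,:,2);           % offset slices
sigma = 0.5;                            % noise scale (tune this)
X2 = X1 + Fx(idx) + sigma * randn(n, 1);
Y2 = Y1 + Fy(idx) + sigma * randn(n, 1);
```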
- Carry out the remainder of the steps for doing orthographic structure
from motion on your reduced, noisy data.
- Save two visualizations of your data:
one that is a solidly colored shape (as in C.2)
and another that is color-mapped (as in D.4).
You will likely need to experiment with the amount of sparsity and
noise you inject before getting a vaguely satisfying result.
- Comment on the quality of the reconstruction.
Based on your explorations, which issue (noise or sparsity)
appears to have the bigger impact on the quality of the result?
Copyright © 2010 Jerod
Weinman.
This work is licensed under a Creative
Commons Attribution-Noncommercial-Share Alike 3.0 United States License.
Footnotes:
[1] http://vision.middlebury.edu/flow