Lab: Structure from Motion

CSC 295 - Computer Vision - Weinman



Summary:
In this lab you will use a given set of very dense feature matches to create a 3-D orthographic reconstruction.
Due:
4/14

Deliverables

Preparation

The base images we will be using in this lab come from the Middlebury Optical Flow data1 and they are found on the MathLAN at:
/home/weinman/courses/CSC295/images/frame1.png
/home/weinman/courses/CSC295/images/frame2.png
Can you infer the 3-D structure by inspecting the images? The data set also includes dense feature correspondences. Since the images are synthetic, we know the ground-truth matching is correct. The correspondences are stored in a Matlab data file, which you can load with the following Matlab command
>>  load /home/weinman/courses/CSC295/images/flow.mat
which will add the variable F to your workspace. It has the same number of rows and columns as frame1.png. The first slice contains the x offset to the corresponding point in frame2.png and the second slice contains the y offset.

Exercises

A. Set Up

  1. Use meshgrid to construct the 2-D spatial domain (x and y coordinates in two matrices) of the pixels in frame1. Since the image contains far too many points for the MathLAN computers to decompose, use only every third row and column.
    Note: Remember that the first dimension of a matrix is the rows, and the second dimension is the columns, which is the reverse of the typical (x,y) pair ordering.
  2. Now that you have matrices representing (x,y) pairs of points in the image, use the data in the variable F to create matrices of the coordinates of their corresponding points in frame2. That is, if X1(a,b) and Y1(a,b) are the x and y coordinates of a point in frame1, then you should create matrices such that X2(a,b) and Y2(a,b) are the coordinates of the same point in frame2.
  3. Construct a complete 4×N data matrix of the two frames' points. That is, the x coordinates from the first image should be the first row, the x coordinates of the matching points from the second image should be the second row, and similarly with the y coordinates for the third and fourth row.
  4. Calculate the translation of the points perpendicular to the viewing direction (i.e., find the mean of each row).
  5. Create a modified data matrix where the row mean has been subtracted from each value in the data matrix.
    Hint: You can use repmat to tile things and avoid for loops.
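
  The five steps above might be sketched as follows, assuming F has been loaded and using illustrative variable names (step, X1, D, etc. are not prescribed):

    step = 3;                                  % use every third row and column
    [rows, cols, ~] = size(F);
    [X1, Y1] = meshgrid(1:step:cols, 1:step:rows);   % columns give x, rows give y

    % Corresponding points in frame2 (step A.2): add the flow offsets
    X2 = X1 + F(1:step:rows, 1:step:cols, 1);  % first slice: x offsets
    Y2 = Y1 + F(1:step:rows, 1:step:cols, 2);  % second slice: y offsets

    % 4xN data matrix (step A.3)
    D = [X1(:)'; X2(:)'; Y1(:)'; Y2(:)'];

    % Subtract the per-row mean translation (steps A.4-A.5)
    t = mean(D, 2);
    D = D - repmat(t, 1, size(D, 2));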

B. Factorization

  1. You should now have a matrix suitable for singular value decomposition (SVD). Using Matlab's svd command, decompose the data matrix into its constituent factors, i.e.,
    [U,W,V] = svd(X);
  2. Extract the 3×3 sub-matrix of W containing the top three singular values along with the corresponding columns from V (an N×3 matrix).
  3. What are the three singular values you find?
  4. Take the product of the two extracted matrices from B.2 in order to recover the shape matrix.
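
  One way the factorization might look, assuming the centered 4×N data matrix from part A is stored in a variable D (all names are illustrative):

    [U, W, V] = svd(D, 'econ');   % 'econ' keeps V at Nx4 rather than NxN
    W3 = W(1:3, 1:3);             % top three singular values (B.2)
    V3 = V(:, 1:3);               % corresponding columns of V (Nx3)
    S  = V3 * W3;                 % Nx3 shape matrix (B.4)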

C. Analysis

  1. The first and second column of your shape matrix contain the (arbitrary) x and y coordinates of the recovered points. In order to construct a full 3-D plot of these points in a connected fashion, we will use what is called a Delaunay triangulation. This connects the recovered points with triangles. Use the delaunay Matlab command to create a plot structure from your points, i.e.,
    T = delaunay(X,Y);
  2. Next we plot a simple visualization of the 3-D structure recovered by the algorithm. The command trisurf takes a set of x, y, and z coordinates and plots them using the Delaunay triangles we created in the previous step. Use the coordinates from your shape matrix to render the recovered scene structure. One suggested syntax is
    trisurf(T, X, Y, Z, 'EdgeColor', 'none', 'FaceColor', 'red');
    lighting phong
    camlight headlight
  3. Use the "Rotate 3D" tool in the figure window to adjust the view and inspect the structure. Save a view you find interesting using print.
  4. Does it look like you expect? Are there any 3-D structures that appear that you did not necessarily expect?
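
  Putting steps C.1-C.2 together, one possible sequence is the following, assuming the N×3 shape matrix from part B is stored in S (names are illustrative):

    T = delaunay(S(:,1), S(:,2));   % triangulate the recovered (x,y) points
    figure;
    trisurf(T, S(:,1), S(:,2), S(:,3), 'EdgeColor', 'none', 'FaceColor', 'red');
    axis equal;
    lighting phong
    camlight headlight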

D. Visualization

Visualizing the surface by shape alone is helpful, especially for finding flaws, but visually unsatisfying. Fortunately, Matlab allows the surface faces to be colored by an index.
  1. If you haven't already, load an RGB version of frame1.png.
  2. Use the command rgb2ind to convert the image to an indexed image using 256 colors. You will need to specify both the indexed image and the colormap as outputs.
  3. Recall that you took only every third row and column when you constructed the points used in the correspondence. Extract the corresponding sub-sampled indices from the indexed image and reshape the result into a column vector.
  4. Now you can use the resulting index vector as an additional argument to trisurf, which will indicate the color to use in the rendering. For example,
    figure;
    trisurf(T, X, Y, Z, C, 'EdgeColor', 'none');
    colormap(map);
    where C is your vector of color indices and map is the colormap you got in D.2.
  5. Use the "Rotate 3D" tool to synthesize at least two novel views of the scene. Save these views using print. What do your views help to illustrate about the 3-D structure?
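
  The steps above might be sketched as follows, assuming the triangulation T and shape matrix S from part C and the same every-third-pixel subsampling as step A.1 (all names illustrative):

    rgb = imread('/home/weinman/courses/CSC295/images/frame1.png');
    [img, map] = rgb2ind(rgb, 256);    % indexed image and 256-color map (D.2)
    C = img(1:3:end, 1:3:end);         % must match the subsampling of A.1 (D.3)
    C = double(C(:));                  % reshape into a column vector

    figure;
    trisurf(T, S(:,1), S(:,2), S(:,3), C, 'EdgeColor', 'none');
    colormap(map);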

E. Reality

Usually, matching is more unreliable and much sparser than the data we have used so far. In this section you'll explore a simulated, but more realistic reconstruction scenario.
  1. Return to step A.1 and reduce the number of points you use. You should try to devise a way to sample them non-uniformly.
    Hint: one way might be to use randi to get integer indices and ind2sub to convert these to subscripts (i.e., y and x coordinates).
  2. Using your much smaller set of points, repeat step A.2, but this time add some random noise to the offset.
    Hint: Use randn. Even small errors multiply greatly for points at large distances. You may wish to investigate the magnitudes of the typical match offsets before choosing a multiple.
  3. Carry out the remainder of the steps for doing orthographic structure from motion on your reduced, noisy data.
  4. Save two visualizations of your data: one that is a solidly colored shape (as in C.2) and another that is color-mapped (as in D.4). You will likely need to experiment with the amount of sparsity and noise you inject before getting a vaguely satisfying result.
  5. Comment on the quality of the reconstruction. Based on your explorations, which issue (reliability or sparsity) may have a bigger impact on the quality of the result?
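
  One possible approach to the non-uniform sampling and noise injection of steps E.1-E.2 (n, sigma, and the other names are illustrative, and the values will need tuning):

    [rows, cols, ~] = size(F);
    n = 2000;                                    % number of sampled points
    idx = randi(rows * cols, n, 1);              % random linear indices
    [y, x] = ind2sub([rows, cols], idx);         % convert to (y,x) subscripts

    dx = F(sub2ind(size(F), y, x, 1*ones(n,1))); % x offsets at sampled points
    dy = F(sub2ind(size(F), y, x, 2*ones(n,1))); % y offsets

    sigma = 0.5;                                 % noise magnitude; experiment
    X1 = x';          Y1 = y';
    X2 = X1 + dx' + sigma * randn(1, n);         % noisy correspondences (E.2)
    Y2 = Y1 + dy' + sigma * randn(1, n);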

Copyright © 2010 Jerod Weinman.
This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States License.

Footnotes:

1http://vision.middlebury.edu/flow