Lab: Structure from Motion

CSC 262 - Computer Vision - Weinman



Summary:
You will use a given set of very dense feature matches to create a 3-D orthographic reconstruction.

Deliverables

Preparation

Load one of the following synthetic images that you'd like to build a 3-D reconstruction of.
/home/weinman/courses/CSC262/images/frame1.png
/home/weinman/courses/CSC262/images/gframe1.png
Can you infer the 3-D structure from inspecting the image? The "motion" here might be of the camera mounted to an aerial vehicle or a Mars rover. We will be reconstructing the geometry of the scene.
Because the images are synthetic, we can generate a set of very dense feature correspondences that we know will be correct. You can load this from a Matlab data file with one of the following Matlab commands
>>  load /home/weinman/courses/CSC262/images/flow.mat   % For frame1.png
>>  load /home/weinman/courses/CSC262/images/gflow.mat % For gframe1.png
This load command will add to your workspace the variable F, which has the same number of rows and columns as the corresponding image. The first slice contains the x offset to the corresponding point in the subsequent video frame and the second slice contains the y offset.
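As a quick sanity check (a sketch; the variable name im is just illustrative), you can load an image and its flow field and confirm their dimensions agree:

```matlab
% Load one synthetic frame and its ground-truth flow field
im = im2double(imread('/home/weinman/courses/CSC262/images/frame1.png'));
load /home/weinman/courses/CSC262/images/flow.mat   % adds F to the workspace

size(F)         % should be [rows, cols, 2], matching the image
Fx = F(:,:,1);  % x offsets to the corresponding points in the next frame
Fy = F(:,:,2);  % y offsets
```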

Exercises

A. Set Up

  1. Use meshgrid to construct the 2-D spatial domain (x and y coordinates in two matrices) of the pixels in frame1.
    Note: Remember that the first dimension of a matrix is the rows, and the second dimension is the columns, which is the reverse of the typical (x,y) pair ordering.
  2. Now that you have matrices representing (x,y) pairs of points in the image, use the data in the variable F to create matrices of the coordinates of their corresponding points in frame2. That is, if X1(a,b) and Y1(a,b) are the x and y coordinates of a point (a,b) in frame1, then you should create matrices X2 and Y2 such that X2(a,b) and Y2(a,b) are the coordinates of the same (real) point in frame2 using the relations
    x2 = x1 + Fx
    y2 = y1 + Fy
  3. Construct a complete 4×N data matrix of the two frames' points. That is, the x coordinates from the first image should be the first row, the x coordinates of the matching points from the second image should be the second row, and similarly with the y coordinates for the third and fourth row.
  4. Calculate the translation of the points perpendicular to the viewing direction (i.e., find the mean of each row).
  5. Create a modified data matrix where the row mean (centroid) has been subtracted from each value in the data matrix (hint: use bsxfun).
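The setup steps above might be sketched as follows (variable names are illustrative, not required; F is assumed loaded as described in the Preparation):

```matlab
Fx = F(:,:,1);  Fy = F(:,:,2);         % flow offsets from the loaded file
[rows, cols] = size(Fx);
[X1, Y1] = meshgrid(1:cols, 1:rows);   % A.1: x varies along columns

X2 = X1 + Fx;                          % A.2: corresponding points in frame 2
Y2 = Y1 + Fy;

D = [X1(:)'; X2(:)'; Y1(:)'; Y2(:)'];  % A.3: 4 x N data matrix

t = mean(D, 2);                        % A.4: per-row mean (translation)
Dc = bsxfun(@minus, D, t);             % A.5: centered data matrix
```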

B. Factorization

  1. You should now have a matrix suitable for singular value decomposition (SVD). Using the Matlab command, decompose the data matrix into its constituent factors, i.e.,
    [U,W,V] = svd(X,'econ');
    Note: The 'econ' argument only constructs the minimum necessary singular values. Given that X is a very large matrix of rank at most 4, we don't need all those extra columns.
  2. Extract W3, the 3×3 sub-matrix of W containing the top three singular values and V3, the corresponding columns from V (an N×3 matrix).
  3. What are the three singular values you find? What do they tell you?
  4. Take the product of the two extracted matrices from B.2 (with V3 transposed, so the result is 3×N) in order to recover the shape matrix S*.
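The factorization might be sketched as follows (assuming the centered 4×N data matrix from part A is stored in a variable named Dc):

```matlab
[U, W, V] = svd(Dc, 'econ');   % Dc is 4 x N, so W is 4 x 4 and V is N x 4

W3 = W(1:3, 1:3);              % B.2: top three singular values
V3 = V(:, 1:3);                % B.2: corresponding columns of V

diag(W3)                       % B.3: inspect the three singular values

S = W3 * V3';                  % B.4: recovered 3 x N shape matrix S*
```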

C. Analysis

  1. The first and second rows of your shape matrix S* contain the (arbitrary) X and Y coordinates of the recovered 3-D points. In order to construct a full 3-D plot of these points in a connected fashion, we will use what is called a Delaunay triangulation. This connects the recovered points with triangles. Use the delaunay Matlab command to create a plot structure from your points, i.e.,
    T = delaunay(X,Y);
  2. Next we plot a simple visualization of the 3-D structure recovered by the algorithm. The command trisurf takes a set of x, y, and z coordinates and plots them using the Delaunay triangles we created in the previous step. Use the coordinates from your shape matrix to render the recovered scene structure. One suggested syntax is
    trisurf(T, X, Y, Z, 'EdgeColor', 'none', 'FaceColor', 'red');
    lighting phong
    If you are using the urban structure (frame1.png), reorient the reconstructed surface as follows:
    view([-161 46]);
    Then, regardless of which you chose, shine a light on the scene so you can better visualize the emergent details by light reflections:
    camlight headlight
  3. Use the "Rotate 3D" tool in the figure window to adjust the view and inspect the structure. After you find an interesting view (the azimuth and elevation are reported for you within the figure window), programmatically set the figure to use this view (i.e., in your published report) with the view command.
    Note: you should do this secondary view change after installing the light with camlight.
  4. Does the structure look like you expect? Are there any 3-D structures that appear that you did not necessarily expect?
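Taken together, the analysis steps above might look like this sketch (assuming the 3×N shape matrix from part B is stored in S; the view angles are the ones suggested for the urban scene):

```matlab
X = S(1,:);  Y = S(2,:);  Z = S(3,:);  % recovered 3-D coordinates

T = delaunay(X, Y);                    % C.1: triangulate in the X-Y plane

figure;                                % C.2: render the recovered surface
trisurf(T, X, Y, Z, 'EdgeColor', 'none', 'FaceColor', 'red');
lighting phong;
view([-161 46]);                       % reorient (urban scene)
camlight headlight;                    % light the scene after setting the view
```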

D. Visualization

Visualizing the surface by shape alone is helpful, especially for finding flaws, but is unsatisfying visually. Fortunately, Matlab allows the surface faces to be colored by an index.
  1. If you haven't already, load an RGB version of frame1.png.
  2. Use the command rgb2ind to convert the image to an indexed image using 256 colors. You will need to specify both the indexed image and the colormap as outputs.
  3. Reshape the resulting image index matrix into a column vector.
  4. Now you can use the resulting index vector as an additional argument to trisurf, which will indicate the color to use in the rendering. For example,
    figure;
    trisurf(T, X, Y, Z, C, 'EdgeColor', 'none');
    colormap(map);
    where C is your vector of color indices and map is the colormap you got in D.2.
  5. Use the "Rotate 3D" tool to synthesize at least two novel views of the scene using view and snapnow.
  6. What do your views help to illustrate about the 3-D structure?
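A sketch of the indexed-color rendering (assuming the coordinates X, Y, Z and the triangulation T from part C, and that rgb holds the RGB image loaded in D.1):

```matlab
rgb = imread('/home/weinman/courses/CSC262/images/frame1.png'); % D.1
[ind, map] = rgb2ind(rgb, 256);   % D.2: indexed image and its colormap

C = double(ind(:));               % D.3: indices as a column vector

figure;                           % D.4: color faces by the image indices
trisurf(T, X, Y, Z, C, 'EdgeColor', 'none');
colormap(map);
view([-161 46]);  snapnow;        % D.5: record a view (angles illustrative)
```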

E. Reality

Usually, matching is more unreliable and much sparser than the data we have used so far. In this section you'll explore a simulated, but more realistic reconstruction scenario.
  1. Return to step A.1 and reduce the number of coordinate points you use. You should try to devise a way to sample them non-uniformly.
    Hint: one way might be to use randi to get integer indices and ind2sub to convert these to subscripts (i.e., y and x coordinates).
  2. Using your much smaller set of points, repeat step A.2, but this time add some random noise to the offset.
    Hint: Use randn, which produces numbers from the "standard" normal distribution (zero mean, unit standard deviation). Even small errors multiply greatly for points at large distances. You may wish to investigate the magnitudes of the typical match offsets before choosing a multiplier to scale the standard deviation of the noise.
  3. Carry out the remainder of the steps for doing orthographic structure from motion on your reduced, noisy data.
  4. Display two visualizations of your data: one that is a solidly colored shape (as in C.2) and another that is color-mapped (as in D.4). You will likely need to experiment with the amount of sparsity and noise you inject before getting a vaguely satisfying result.
  5. Comment on the quality of the reconstruction. Based on your explorations, which issue (noise or sparsity) may have bigger impact on the quality of the result? Justify your claim.
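One possible way to sample sparse, noisy correspondences following the hints above (the sample size n and noise scale sigma are illustrative; tune them as suggested; rows, cols, Fx, and Fy are assumed defined as in part A):

```matlab
n = 5000;                               % illustrative number of samples
idx = randi(rows * cols, n, 1);         % E.1: random linear indices
[r, c] = ind2sub([rows cols], idx);     % convert to (y, x) subscripts

x1 = c;  y1 = r;
sigma = 0.5;                            % illustrative noise level (pixels)
x2 = x1 + Fx(idx) + sigma * randn(n,1); % E.2: noisy offsets
y2 = y1 + Fy(idx) + sigma * randn(n,1);

D = [x1'; x2'; y1'; y2'];               % then proceed as in parts A-D
```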

Acknowledgments

The images and flow ground truth are from the Middlebury Optical Flow data set (Synthetic Grove2 and Urban2 images), which proclaims:
If you report performance results on our benchmark, we request that you cite
A Database and Evaluation Methodology for Optical Flow, published open access in International Journal of Computer Vision, 92(1):1-31, March 2011.
Copyright © 2010, 2012, 2015, 2019 Jerod Weinman.
This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 4.0 International License.