Lab: Stereo Disparity

CSC 295 - Computer Vision - Weinman



Summary:
You will estimate depth via stereo disparity by finding the minimum SSD between small patches of a rectified stereo image pair.

Deliverables

Preparation

Load the rectified stereo pair of images from the MathLAN:
/home/weinman/courses/CSC295/images/view1.png
/home/weinman/courses/CSC295/images/view5.png
To simplify our initial operations (though you may want to revisit this simplification later), convert both of these to grayscale doubles.
Load the ground truth disparity values from an image on the MathLAN:
/home/weinman/courses/CSC295/images/truedisp.png
The left and right images above have been scaled down by a factor of 3 so that they are easier to work with. You will need to scale down the disparity image accordingly: divide its entries by 3, but be sure to convert them to doubles first using double (not im2double!). Entries of the true disparity that are zero do not lie at an infinite distance (as zero disparity would indicate); instead, they mark pixels that are occluded between views. We will discount these when we measure the error in our disparity predictions.
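To see why the choice between double and im2double matters, here is a minimal sketch on made-up 8-bit values (the array raw is a stand-in for illustration, not the contents of truedisp.png):

```matlab
% Made-up 8-bit disparity values standing in for the ground truth image.
raw = uint8([0 60 150; 30 90 225]);

% double() keeps the numeric values, so dividing by 3 rescales the
% disparities to match images that were shrunk by a factor of 3.
% Converting first also avoids integer rounding in the division.
scaled = double(raw) / 3;

% im2double() additionally rescales uint8 data into [0, 1], so the
% result would no longer be a disparity measured in pixels.
rescaled = im2double(raw);

% Zero entries mark occluded pixels; a mask lets us skip them later.
valid = raw > 0;

disp(scaled);
disp(max(rescaled(:)));
```

The variable names here are arbitrary; only double and im2double come from the lab text.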

Method

We will be using the sum of squared differences (SSD) as a feature matching criterion because it is efficient and easy to compute with convolutions. However, it does require a little thought. Say we have two images of the same size, A and B. To measure the SSD between all corresponding 5×5 patches of the images, we first calculate the element-wise squared differences, C. This is simple to do in Matlab with vectorized operations.
Next, we need to add up these squared differences over each patch. This can be done with a convolution over C. One option would be to use a 5×5 matrix of ones as the convolution kernel. However, this kernel is separable, and we can do much better (especially with larger patches) if we use its separable components: a 5×1 column of ones and a 1×5 row of ones. To make locating the responses easier, we use a convolution that returns a result the same size as C.
Once we have the convolution, each location in the result gives the SSD between A and B over a 5×5 patch centered at that location.
How does this help us find the disparity? We will need to first make sure that A and B represent the stereo pair of images, but with one of them translated by the hypothesized disparity amount. Of course, we are looking for the disparity that gives us the minimum SSD, so we need to try several values for the disparity, and then keep the one that gives us the smallest SSD. The remainder of the lab will guide you through the set-up for this problem. Then you will need to do some creative exploration to improve your results.
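The patch-wise SSD described above can be sketched as follows. This is only an illustration on two small made-up matrices (not the stereo pair), and it uses the built-in conv2 in both its full-kernel and separable forms to show that they agree:

```matlab
% Two small made-up "images" of the same size (not the lab's data).
A = magic(8);
B = fliplr(magic(8));

% Element-wise squared differences.
C = (A - B).^2;

% Full 5x5 box kernel: 25 multiply-adds per output pixel.
fullSum = conv2(C, ones(5), 'same');

% The same box kernel split into a column of ones and a row of ones:
% 5 + 5 multiply-adds per output pixel.
k = ones(5, 1);
sepSum = conv2(k, k', C, 'same');

% Both give the SSD over the 5x5 patch centered at each location
% (patches that overlap the border are implicitly zero-padded).
disp(max(abs(fullSum(:) - sepSum(:))));
```

The saving grows with the window: a w×w box costs w² operations per pixel as a full kernel, but only 2w when applied separably.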

Exercises

A. Data Setup

  1. What is the largest disparity you can expect this data to have? Ask your ground truth image.
  2. As mentioned in the introduction, we need the pair of images we want to compare (via local SSD convolutions) to be the same size. However, if one of them is translated by the disparity, they may no longer be the same size. We will need to make sure we can fully handle this case by "enlarging" our images.
    Use the Matlab command padarray to add zeros to the end of both images (not the disparity). How many? As many as the largest disparity. The resulting images should look something like this:
    images/stereo-disparity-pad.png
  3. The provided command imtranslate (not a Matlab standard) returns an image of the same size as its input, but shifted (or translated) along the rows and columns a specified amount. Use this to translate the padded version of the right (view5) stereo image by 50 pixels horizontally, filling in the "new" pixels with zeros.
    Note: You will need to specify that imtranslate not introduce an extended border. The result should look something like this:
    images/stereo-disparity-pad-translate.png
  4. You now have a pair of images that are the same size, where one is translated by a (single) hypothesized disparity. Create a third image that is the squared difference between these two.
    Note: Don't forget to use vectorized component-wise operations.
  5. Finally, calculate the sum of these squared differences for 5×5 patches using separable filters. Ask for a result that is the same size as the input.
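The padding, shifting, and squared-difference steps of Part A can be sketched on a toy example. Everything here is an assumption made for illustration: the 4×6 "images" are made up, the padding is assumed to go on the right-hand columns, and the shift is written with plain indexing as a stand-in for the course-provided imtranslate (whose arguments are not shown in this handout):

```matlab
% Made-up 4x6 "left" image; the "right" image is the same scene
% shifted 2 pixels to the left, standing in for a stereo pair.
left  = reshape(1:24, 4, 6);
d     = 2;                       % hypothesized disparity
right = [left(:, d+1:end), zeros(4, d)];

% Pad both images with maxDisp columns of zeros on the right.
% With the Image Processing Toolbox this is the same as
% padarray(img, [0 maxDisp], 0, 'post').
maxDisp  = 3;
leftPad  = [left,  zeros(4, maxDisp)];
rightPad = [right, zeros(4, maxDisp)];

% Shift the padded right image d columns to the right, filling the
% vacated columns with zeros (a hand-written stand-in for imtranslate).
shifted = zeros(size(rightPad));
shifted(:, d+1:end) = rightPad(:, 1:end-d);

% Element-wise squared difference for this one disparity.
sqDiff = (leftPad - shifted).^2;
disp(sqDiff);
```

Because the hypothesized disparity matches the toy shift, the squared difference vanishes wherever both images see the same content; only the first d columns, which the right view never saw, are non-zero.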

B. Multiple Disparities

In Part A, you found the SSD for a single disparity everywhere. We want to find the SSD for all possible disparities everywhere. After that, we can choose the best disparity at each location. To do this, we will construct a stack of SSDs whose first two dimensions are the locations in the padded images, with an extra (third) dimension that corresponds to the disparity. Thus, the first slice is for a disparity of 1 and the fiftieth slice is for a disparity of 50. Yes, this will be a large array, but fast vectorized operations in Matlab often buy time at the cost of space.
  1. Create a for loop that counts down from the largest possible disparity to one (the smallest possible disparity).
  2. Inside your loop, generalize the imtranslate operation of A.3 so that it is specific to the current disparity indicated by the loop variable.
  3. Inside your loop calculate the SSD (just as in A.4 and A.5) for that disparity.
  4. Assign the result as the appropriate slice of your three-dimensional array (according to the loop variable).
Yes, running this loop might take a little while, but it is much faster than also looping over locations.
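One way the loop in Part B might look is sketched below on toy data. The image (magic(16)), the planted disparity of 3, and the indexing-based shift (used instead of the provided imtranslate) are all assumptions for illustration, not the lab's data or its required solution:

```matlab
% Made-up data: the "right" image is the "left" image shifted
% 3 pixels to the left, so the true disparity is 3 everywhere.
left    = magic(16);
trueD   = 3;
right   = [left(:, trueD+1:end), zeros(16, trueD)];
maxDisp = 5;
win     = 5;

leftPad  = [left,  zeros(16, maxDisp)];
rightPad = [right, zeros(16, maxDisp)];
k = ones(win, 1);

% Counting down means the first assignment creates the last slice,
% so the whole 3-D array is allocated once.
for d = maxDisp:-1:1
    shifted = zeros(size(rightPad));
    shifted(:, d+1:end) = rightPad(:, 1:end-d);
    sqDiff = (leftPad - shifted).^2;
    ssd(:, :, d) = conv2(k, k', sqDiff, 'same');
end

disp(size(ssd));
```

Slice d of ssd holds the SSD map for a disparity of d, so in this toy case slice 3 is zero wherever the window sees only content shared by both views.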

C. Optimal Disparity

After running the loop you created in Part B, you should have a three-dimensional array whose third dimension indexes the disparity and whose entries are the SSD values. Next we want to find the best disparity at each point. Thus, we will be taking the min of the array along the third dimension. Matlab's min command has the syntax
[C,I] = min(A, [], dim);
which returns in C the minimum of A along the dimension dim, and in I the index of that minimum value.
  1. Use min in the manner described above to find the disparity with the lowest SSD at each row and column of the original image.
    Note: The extra padding you added in A.2 is now no longer needed. Thus, you only need to take the min using the original columns of the image.
    The result you get should be the same size as the true disparity image you loaded from truedisp.png.
  2. Use find to determine the linear (not subscript) indices of all non-zero values in the true disparity (the array you loaded from truedisp.png).
  3. Calculate the RMS (root mean square) error between your prediction of the optimal disparity using SSD and the actual disparity, using only the non-zero (i.e., non-occluded) pixels in the actual disparity.
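The steps of Part C can be sketched on a tiny hand-built SSD stack. The numbers below are made up so that the answer is easy to check by eye; they are not lab output:

```matlab
% Made-up SSD stack: 2 rows, 3 original columns plus 2 padding
% columns, and 4 candidate disparities. A zero is planted at the
% disparity we want each pixel to select.
nCols = 3;
want  = [2 4 1; 3 3 2];
ssd   = 10 * ones(2, 5, 4);
for r = 1:2
    for c = 1:nCols
        ssd(r, c, want(r, c)) = 0;
    end
end

% Index of the smallest SSD along the third dimension, using only
% the original (unpadded) columns. Slice k holds disparity k, so the
% index is the predicted disparity.
[~, pred] = min(ssd(:, 1:nCols, :), [], 3);

% Made-up ground truth; the zero marks an occluded pixel.
trueDisp = [2 0 1; 3 1 2];
valid    = find(trueDisp);      % linear indices of non-occluded pixels

% RMS error over the non-occluded pixels only.
err    = pred(valid) - trueDisp(valid);
rmsErr = sqrt(mean(err.^2));
disp(rmsErr);
```

Here four of the five valid pixels are exact and one is off by 2, so the RMS error is sqrt(4/5).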

D. Parameter Adjustment

  1. It is possible that your SSD window is too small or too large. Repeat your method and RMS measurement as above for several other values of the window size.
  2. Create a plot of the RMS versus the SSD window sizes you try. Save this plot.
  3. How would you characterize what you found? What value(s) are optimal? Why do you suppose they are the best? Are there any other trade-offs involved?
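A sweep over window sizes could be organized along these lines. This is a sketch on the same kind of toy data as before (a made-up image with a planted disparity of 3 and an indexing-based shift), so the resulting curve says nothing about what you should expect on the real stereo pair:

```matlab
% Made-up data with a known disparity of 3 everywhere.
left    = magic(16);
trueD   = 3;
right   = [left(:, trueD+1:end), zeros(16, trueD)];
maxDisp = 5;
leftPad  = [left,  zeros(16, maxDisp)];
rightPad = [right, zeros(16, maxDisp)];

winSizes = [3 5 7 9];
rmsErr   = zeros(size(winSizes));
for w = 1:numel(winSizes)
    k   = ones(winSizes(w), 1);
    ssd = zeros([size(leftPad), maxDisp]);
    for d = maxDisp:-1:1
        shifted = zeros(size(rightPad));
        shifted(:, d+1:end) = rightPad(:, 1:end-d);
        ssd(:, :, d) = conv2(k, k', (leftPad - shifted).^2, 'same');
    end
    [~, pred] = min(ssd(:, 1:16, :), [], 3);
    err = pred - trueD;         % toy case: every pixel is counted
    rmsErr(w) = sqrt(mean(err(:).^2));
end

plot(winSizes, rmsErr, 'o-');
xlabel('SSD window size (pixels)');
ylabel('RMS disparity error (pixels)');
% saveas(gcf, 'rms_vs_window.png');   % hypothetical file name
```

On the real data you would replace the toy images with the padded stereo pair and restrict the error to the non-occluded pixels, as in Part C.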

E. Visualization

A single summary measure has limited usefulness. It is often easier to get a sense of the results by visualizing them, which we can do with Matlab's plotting capabilities.
You can simply take the disparity as the z-coordinate for a plot, with no need of x and y coordinates. In addition, it is useful to turn off the plotting of edges between grid points, since there are too many. Finally, we are displaying an image, rather than a function, so we need to tell Matlab to use image-based ij coordinates rather than the more common functional xy coordinates. All together:
surf(Z, 'EdgeColor','none');
axis ij;
  1. Use commands like the above to create a plot of the disparity. You may wish to use the figure window's "Rotate 3-D" option to visualize the structure indicated by the disparity map.
  2. Use a similar process to visualize your predicted disparity in 3-D.
  3. Create a 2-D version of the true disparity by using imshow (with relaxed bounds) and colormap jet to get the same colorful visualization that is in the 3-D surface. Save this image.
  4. In the places where the true disparity is zero, set the values in your hypothesized disparity to 0.
  5. Create a (2-D) image of your (adjusted) hypothesized disparity using the window size with the best RMS and save it.
  6. How do the images compare visually? Analyze your results, making as many observations and conjectures as you can.
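The masking and 2-D display steps might be sketched as follows. The two small disparity maps are made up for illustration, and the sketch assumes that "relaxed bounds" refers to passing an empty display range to imshow:

```matlab
% Made-up disparity maps (not lab output); zeros in trueDisp mark
% occluded pixels.
trueDisp = [12 12  0 30; 12  0 30 30; 12 12 30 30];
pred     = [12 13 25 30; 11 40 30 29; 12 12 30 31];

% Copy the occlusion mask into the prediction so that both images
% treat the same pixels as unknown.
pred(trueDisp == 0) = 0;

% An empty display range asks imshow to stretch the data's minimum
% and maximum across the colormap.
imshow(pred, []);
colormap jet;
colorbar;
```

Logical indexing with trueDisp == 0 changes only the occluded entries and leaves every other prediction untouched.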

F. Handling Noise

You may have noticed that there are some strange outliers in your reconstruction. Depth (disparity) images are like any others in that they can suffer from "noise" effects. Here we will try some "post-processing" on the disparity results to see if they can be improved.
  1. Apply at least two noise-reduction methods to your disparity image. Think about whether you want to apply them before or after you set values to zero.
  2. What are the RMS values (excluding occluded pixels in the RMS calculation) for each processed result? Is either an improvement over the best result in D.2?
  3. For at least one of the methods you tried, experiment with at least one of its parameters. Plot the RMS of the post-processed version versus the parameter values you have tried. Save this plot.
  4. What conclusions do you draw from your numeric results?
  5. Create an image of your "best" post-processed disparity map as you did in E.3 and save it.
  6. How does the visualization of disparity compare to your original? Is this an improvement? Why or why not?
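As a starting point, two common noise-reduction filters are sketched below on a made-up disparity map with a single planted outlier. The choice of a median filter (medfilt2, from the Image Processing Toolbox) and a box average is only an example; the lab does not prescribe which methods to use:

```matlab
% Made-up disparity map: constant 10 with one outlier of 60.
D = 10 * ones(7);
D(4, 4) = 60;

% 3x3 median filter: each pixel becomes the median of its
% neighborhood, so an isolated outlier is discarded.
med = medfilt2(D, [3 3]);

% 3x3 box average: each pixel becomes the mean of its neighborhood,
% so the outlier is spread out rather than removed.
avg = conv2(D, ones(3) / 9, 'same');

disp(med(4, 4));
disp(avg(4, 4));
```

The neighborhood size is the natural parameter to vary for either filter. Note that both calls treat pixels beyond the border as zeros, which affects values near the image edge.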

Extra Credit

One key observation by Kanade and Okutomi in "A Stereo Matching Algorithm with an Adaptive Window: Theory and Experiments" (IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 16, pp. 920-932, 1994) is that square SSD windows are not always best. When measuring the disparity over a region that is at an object boundary (and thus a disparity discontinuity), we may want to minimize the SSD using a window that only includes part of the object.
See if you can improve your reconstruction by using a set of 9 windows. The basic idea is to still loop over all the possible disparities, but also loop over all 9 possible windows. However, you are now measuring the SSD using kernels with different numbers of elements, so you cannot simply take the min of the raw sums. Your kernels will need to be "normalized" by the number of elements over which they sum the differences. The end result is thus still the same in spirit: find the configuration (over disparity and window shape) that gives the best result.
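The normalization idea can be sketched with two kernels. The half window below is a hypothetical example chosen for illustration; it is not necessarily one of the nine windows intended by the lab, and the constant squared-difference image is made up:

```matlab
% Constant squared-difference image, so every window's mean is 4.
C = 4 * ones(9);

full = ones(5);                      % 25 elements
half = zeros(5);
half(:, 1:3) = 1;                    % hypothetical: left 3 columns, 15 elements

% filter2 correlates (no kernel flip), so the window covers exactly
% the pixels where the kernel is 1. Dividing by the element count
% turns each SSD into a mean squared difference.
msdFull = filter2(full / sum(full(:)), C, 'same');
msdHalf = filter2(half / sum(half(:)), C, 'same');

% Raw sums differ only because the windows differ in size...
rawFull = sum(full(:)) * msdFull(5, 5);
rawHalf = sum(half(:)) * msdHalf(5, 5);
disp([rawFull, rawHalf]);

% ...while the normalized values are directly comparable.
disp([msdFull(5, 5), msdHalf(5, 5)]);
```

Without the normalization, the smaller window would win the min simply because it adds up fewer terms. Using filter2 rather than conv2 avoids having to rotate asymmetric kernels by hand.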

Acknowledgments

The stereo and disparity images are from the 2005 Middlebury stereo dataset (Art image), which proclaims:
We grant permission to use and publish all images and disparity maps on this website. However, if you use our datasets, we request that you cite the appropriate paper(s):
Christopher J. Pal, Jerod J. Weinman, Lam C. Tran and Daniel Scharstein. (2012). On Learning Conditional Random Fields for Stereo: Exploring Model Structures and Approximate Inference. International Journal of Computer Vision, 99(3), 319-337.
Copyright © 2010, 2012 Jerod Weinman.
This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States License.