Learning Mobile Background Subtraction


Figure 1. BiSight head with pan, tilt, and zoom capability.

A robot vision system usually benefits from knowing the orientation of its view. However, common view localization techniques, such as initialization or sensor encoding, are slow or prone to failure.

We would also like the vision system to detect new objects in its environment. Background subtraction is a simple technique for sensing changes in a video stream, but it is typically only used for fixed cameras. When the camera system moves, standard background subtraction techniques for detecting new objects will usually fail.
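To make this failure mode concrete, the following minimal sketch (not the system described on this page) implements fixed-camera background subtraction as a threshold on the absolute pixel difference. The function name `subtract_background`, the threshold of 30, and the toy images are assumptions chosen for illustration. A one-pixel shift of the view, with no new object present, is already enough to produce false detections along an intensity edge.

```python
def subtract_background(query, background, threshold=30):
    """Return a binary mask: 1 where |query - background| exceeds the threshold."""
    return [
        [1 if abs(q - b) > threshold else 0 for q, b in zip(q_row, b_row)]
        for q_row, b_row in zip(query, background)
    ]


# Toy 4x6 grayscale background with a vertical edge between columns 2 and 3.
background = [[10, 10, 10, 200, 200, 200] for _ in range(4)]

# Fixed camera: same view, plus one new object pixel at row 1, column 1.
fixed_query = [row[:] for row in background]
fixed_query[1][1] = 120

# Moved camera: the scene appears shifted one pixel to the right, no new object.
moved_query = [[row[0]] + row[:-1] for row in background]

fixed_mask = subtract_background(fixed_query, background)
moved_mask = subtract_background(moved_query, background)

print("fixed camera detections:", sum(map(sum, fixed_mask)))
print("moved camera detections:", sum(map(sum, moved_mask)))
```

With the fixed view, only the new object is flagged. With the shifted view, every row reports a change at the edge even though the scene itself is unchanged.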


Figure 2. Overview of our approach to mobile background subtraction and orientation.

We propose a persistent "visual memory" for a moveable camera system in a relatively stable environment that can be used to provide visual orientation and discover new objects present in the field of view.

Our system is able to orient itself by finding the closest reference image from a known view. This uses a matching technique that is robust to parallax and to occlusions caused by new objects.
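The matching technique itself is not specified on this page. As a rough sketch of the lookup idea only, the example below stores reference images tagged with a known orientation and returns the entry closest to the query. The median absolute difference used here is a stand-in chosen because it tolerates changes in fewer than half of the pixels; it is an assumption, as are the names `robust_distance` and `closest_reference`, the (pan, tilt) tuples, and the toy images.

```python
from statistics import median


def robust_distance(query, reference):
    """Median absolute pixel difference (a stand-in for the real matcher)."""
    return median(
        abs(q - r)
        for q_row, r_row in zip(query, reference)
        for q, r in zip(q_row, r_row)
    )


def closest_reference(query, memory):
    """Return the (orientation, image) entry with the smallest robust distance."""
    return min(memory, key=lambda entry: robust_distance(query, entry[1]))


# Hypothetical visual memory: (pan, tilt) in degrees paired with a 2x4 image.
memory = [
    ((0, 0), [[10, 10, 10, 10], [10, 10, 10, 10]]),
    ((15, 0), [[10, 10, 200, 200], [10, 10, 200, 200]]),
    ((30, 0), [[200, 200, 200, 200], [200, 200, 200, 200]]),
]

# View from pan = 15 with a new object occluding one column.
query = [[10, 90, 200, 200], [10, 90, 200, 200]]

orientation, reference = closest_reference(query, memory)
print("estimated orientation:", orientation)
```

Because the occluding object covers only a quarter of the pixels, the median ignores it and the correct view is still selected.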

Once the best reference image is found and aligned to the query with a local deformation model, the pair of aligned images can be used to detect differences. A global alignment will not work well due to the sparse nature of the memory store and parallax of objects in the scene viewed from slightly different angles.
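The difference between global and local alignment can be illustrated with simple block matching. This is a minimal sketch under stated assumptions, not the local deformation model used in this work: each block of the query independently picks the horizontal shift of the reference that minimizes the sum of absolute differences (SAD). The function names, the block size, and the synthetic images (top half shifted by one pixel, bottom half by two, mimicking parallax) are all hypothetical.

```python
def sad(query, reference, top, left, size, dx):
    """SAD between a query block and the reference sampled dx columns over."""
    width = len(reference[0])
    total = 0
    for r in range(top, top + size):
        for c in range(left, left + size):
            rc = min(max(c + dx, 0), width - 1)  # clamp at the image border
            total += abs(query[r][c] - reference[r][rc])
    return total


def local_shifts(query, reference, size=4, max_shift=2):
    """Pick, per block, the horizontal shift with the lowest SAD."""
    shifts = {}
    for top in range(0, len(query), size):
        for left in range(0, len(query[0]), size):
            shifts[(top, left)] = min(
                range(-max_shift, max_shift + 1),
                key=lambda dx: (sad(query, reference, top, left, size, dx), abs(dx)),
            )
    return shifts


def shifted_row(row, dx):
    """Sample a row dx columns to the right, repeating the last pixel."""
    return [row[min(c + dx, len(row) - 1)] for c in range(len(row))]


# 8x8 reference with a horizontal intensity ramp.
reference = [[10 * c for c in range(8)] for _ in range(8)]
# Parallax stand-in: top half shifted by 1 pixel, bottom half by 2.
query = [shifted_row(reference[r], 1 if r < 4 else 2) for r in range(8)]

shifts = local_shifts(query, reference)
local_residual = sum(
    sad(query, reference, top, left, 4, dx) for (top, left), dx in shifts.items()
)
# Best single shift applied to the whole 8x8 image.
best_global = min(sad(query, reference, 0, 0, 8, dx) for dx in range(-2, 3))

print("per-block shifts:", shifts)
print("local residual:", local_residual, "best global residual:", best_global)
```

No single shift aligns both halves, so the global residual stays above zero and would be mistaken for scene change, while the per-block shifts remove it entirely.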

Learning Background Subtraction

As in our work on sign detection, we use a contextual model for detecting novel objects in an input image. We learn a probabilistic model

Pr ( Foreground | Query Image, Reference Image, θ )

that uses image differences between the query and aligned reference as features, while promoting the expected spatial continuity of both background and foreground objects.

Qualitatively, this approach works better than a strictly local difference threshold.


Figure. Example results. Columns, left to right: Query Image, Background (Stored) Image, Global, Local, Contextual.

The results above show the input query image and the closest stored match. The middle column shows the results of using a global alignment procedure with a simple threshold on the image difference. The second column from the right shows the result of using our local alignment procedure with a learned probability model that uses no spatial continuity information; this is essentially a learned threshold on the image difference.
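To illustrate what "a learned threshold on the image difference" can mean, the sketch below fits a one-feature logistic model to labeled difference values and reads off the difference at which the foreground probability crosses 0.5. This is an assumption-laden toy, not the model trained in this work: the training values, the feature scaling by 100, the learning rate, and the name `fit_logistic` are all hypothetical.

```python
import math


def fit_logistic(diffs, labels, rate=1.0, steps=20000):
    """Fit Pr(foreground | d) = sigmoid(w * d / 100 + b) by gradient descent."""
    w, b = 0.0, 0.0
    n = len(diffs)
    for _ in range(steps):
        grad_w = grad_b = 0.0
        for d, y in zip(diffs, labels):
            x = d / 100.0  # scale the feature so a fixed step size behaves well
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))
            grad_w += (p - y) * x
            grad_b += p - y
        w -= rate * grad_w / n
        b -= rate * grad_b / n
    return w, b


# Hypothetical training data: absolute differences with background (0)
# and foreground (1) labels.
diffs = [2, 5, 8, 12, 15, 40, 55, 60, 80, 100]
labels = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

w, b = fit_logistic(diffs, labels)
# The probability equals 0.5 where w * d / 100 + b = 0.
learned_threshold = -100.0 * b / w
print("learned threshold on the difference:", round(learned_threshold, 1))
```

Because each pixel is classified from its own difference value alone, such a model cannot suppress isolated false detections or fill holes inside an object.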

The rightmost column shows the result of using our contextual probability model on the difference between images aligned with our local deformation technique to predict the location of foreground objects. Our contextual model performs well at eliminating spurious foreground detections and filling in holes on true foreground objects left by models that only use local information.
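The effect of adding spatial context can be sketched with a small stand-in model: a per-pixel logistic score combined with an Ising-style term that rewards agreement with the four neighbors, optimized by iterated conditional modes. This is not the contextual model learned in this work; the weights, the coupling strength, the function names, and the toy difference image are hand-set assumptions for illustration.

```python
import math


def local_probability(diff, weight=0.1, bias=-3.0):
    """Per-pixel foreground probability from the absolute difference."""
    return 1.0 / (1.0 + math.exp(-(weight * diff + bias)))


def local_labels(diff_image):
    """Label each pixel independently (no spatial context)."""
    return [[1 if local_probability(d) > 0.5 else 0 for d in row] for row in diff_image]


def contextual_labels(diff_image, coupling=1.5, sweeps=5):
    """Refine the local labels so that neighboring pixels tend to agree."""
    rows, cols = len(diff_image), len(diff_image[0])
    labels = local_labels(diff_image)
    for _ in range(sweeps):
        for r in range(rows):
            for c in range(cols):
                near = [
                    labels[nr][nc]
                    for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                    if 0 <= nr < rows and 0 <= nc < cols
                ]
                fg_votes = sum(near)
                bg_votes = len(near) - fg_votes
                p = local_probability(diff_image[r][c])
                score_fg = math.log(p) + coupling * fg_votes
                score_bg = math.log(1.0 - p) + coupling * bg_votes
                labels[r][c] = 1 if score_fg > score_bg else 0
    return labels


# Toy 6x6 difference image: low background differences, a 3x3 object with a
# weak center pixel (a hole), and one isolated spurious response in a corner.
diff_image = [[5] * 6 for _ in range(6)]
for r in range(2, 5):
    for c in range(2, 5):
        diff_image[r][c] = 60
diff_image[3][3] = 20  # hole inside the true object
diff_image[0][0] = 45  # spurious detection

local = local_labels(diff_image)
contextual = contextual_labels(diff_image)
for row in contextual:
    print(row)
```

In this toy case the local labels keep the spurious corner pixel and miss the object's center, while the contextual labels remove the former and fill in the latter.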

Related Papers

  • Techniques and Applications for Persistent Backgrounding in a Humanoid Torso Robot, with D. W. Duhon and E. Learned-Miller. IEEE Intl. Conference on Robotics and Automation (ICRA), April 2007.