Research Projects

Jerod Weinman, Computer Science, Grinnell College

Multimodal Keypoint Detection (2022–2023)

We adapt a multimodal (text+image) object detection model to the keypoint detection task and demonstrate that adding richer descriptions of the keypoints improves performance and reduces training time.

Paper: ICVS23.

Self-Supervised Depth Prediction (2019–2021)

Map Processing (2010–2020+)

By leveraging a gazetteer of place names (toponyms) and their locations, we can improve OCR on historical map images by inferring an alignment between the gazetteer and words on a map. Given such an alignment, calculating posterior probabilities significantly reduces word error.

Papers: SIGSPATIAL19, ICDAR19, ICDAR17, ICDAR13, Technical Report.
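As a rough illustration of how posterior probabilities can correct OCR output once an alignment is known, the sketch below applies Bayes' rule to a handful of candidate strings. It is not the method from the papers above: the function name `toponym_posterior`, the candidate strings, and every probability are hypothetical, and the prior is assumed to already reflect the gazetteer entries that the alignment places near the word.

```python
import math


def toponym_posterior(ocr_log_likelihood, gazetteer_prior):
    """Return p(word | image) for each candidate via Bayes' rule.

    ocr_log_likelihood: dict mapping candidate word -> log p(image | word)
    gazetteer_prior:    dict mapping candidate word -> p(word | alignment)
    Both inputs are hypothetical stand-ins for real model outputs.
    """
    # Unnormalized log posterior: log-likelihood plus log prior.
    log_post = {
        word: log_lik + math.log(gazetteer_prior[word])
        for word, log_lik in ocr_log_likelihood.items()
        if gazetteer_prior.get(word, 0.0) > 0.0
    }
    if not log_post:
        raise ValueError("no candidate has nonzero prior probability")
    # Normalize with the log-sum-exp trick for numerical stability.
    peak = max(log_post.values())
    total = sum(math.exp(v - peak) for v in log_post.values())
    return {w: math.exp(v - peak) / total for w, v in log_post.items()}


# Made-up scores: the recognizer alone prefers the misspelling "Grinnel1".
ocr = {
    "Grinnell": math.log(0.30),
    "Grinnel1": math.log(0.45),
    "Crinnell": math.log(0.25),
}
# Made-up prior: most mass on the gazetteer entry, a little on other strings.
prior = {"Grinnell": 0.90, "Grinnel1": 0.05, "Crinnell": 0.05}

posterior = toponym_posterior(ocr, prior)
best = max(posterior, key=posterior.get)
print(best, round(posterior[best], 3))
```

With these invented numbers the prior outweighs the recognizer's preference for the misspelled string, which is the effect the summary describes.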

Wearable Aid for the Blind (2004–2014)

We developed algorithms for VIDI (Visual Information Dissemination for the Impaired), a wearable aid for the blind. The work comprised several contributions and subprojects.

Text and Sign Detection

[Figure panels: Scene Image, Detected Sign]

Using a contextual model to eliminate isolated false positives and more fully cover all regions of a detected sign, we are able to robustly detect text and logos with arbitrary sizes and layouts in complex scenes.

Papers: MLSP04, CVAVI05, Master's Thesis.

Robust Recognition

By integrating character appearance models more closely with statistical language and lexicon models, as well as a locally adaptive font model, we can more reliably recognize characters from signs in different fonts.

Papers: CVPR06, ICDAR07, ICPR08, PAMI09, ICPR10.

Joint Detection and Recognition

By asking the "where?" and "what?" questions of finding and identifying text simultaneously during learning, we can be faster or more accurate than the usual approach of learning detection and recognition independently.

Papers: Tech Report UM-CS-2006-054

Portions of this work have been funded by NSF grant numbers IIS-0100851, IIS-0326249, and IIS-0546666 as well as the Central Intelligence Agency and the National Security Agency.

Faster Learning for Stereo (2006–2010)

[Figure panels: One Image of a Stereo Pair, Estimated Disparity]

Learning methods that rely on message-passing algorithms can be very slow when state spaces are large, as in most stereo disparity estimation problems. We improve stereo predictions by using an algorithm that can capture the appropriate amount of uncertainty during learning.

Papers: Tech Report UM-CS-2007-054, ECCV08, IJCV10

Learning Mobile Background Subtraction (2007)

[Figure panels: Robot Head, Input Image, Stored Background Image, Calculated Foreground Image]

By storing background images for a stationary robot with a moveable vision system, we can efficiently match the current view to known views, using a local deformation model to overcome parallax. The robot learns to perform novel object detection in the presence of camera motion.

Papers: ICRA07
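The matching step described above can be pictured with a deliberately simplified sketch: choose the stored background view closest to the current image, then threshold the per-pixel difference. This omits the local deformation model entirely and is not the ICRA07 method. The function names, the mean-absolute-difference score, the threshold of 30, and the tiny grayscale images are all assumptions made for illustration.

```python
def mean_abs_diff(image_a, image_b):
    """Mean absolute pixel difference between two equally sized grayscale images."""
    total = sum(
        abs(pa - pb)
        for row_a, row_b in zip(image_a, image_b)
        for pa, pb in zip(row_a, row_b)
    )
    return total / (len(image_a) * len(image_a[0]))


def foreground_mask(current, stored_views, threshold=30):
    """Match the current image to its closest stored view, then mark changed pixels."""
    # Hypothetical matching criterion: smallest mean absolute difference.
    background = min(stored_views, key=lambda view: mean_abs_diff(current, view))
    # A pixel counts as foreground when it differs strongly from the background.
    return [
        [abs(pc - pb) > threshold for pc, pb in zip(row_c, row_b)]
        for row_c, row_b in zip(current, background)
    ]


# Two stored backgrounds (images as lists of rows) and a current view
# that resembles the first one except for a single bright pixel.
view_a = [[10, 10, 10], [10, 10, 10]]
view_b = [[200, 200, 200], [200, 200, 200]]
current = [[12, 11, 10], [10, 180, 9]]

mask = foreground_mask(current, [view_a, view_b])
print(mask)
```

A global match like this fails under parallax, where nearby objects shift relative to distant ones as the camera moves; that is the gap a local deformation model is meant to address.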

Portions of this work have been funded by NSF grant number IIS-0546666 and MIT/NASA cooperative agreement NNJ05HB61A.

Stroke Segmentation (2001–2003)

[Figure panels: MRI, Segmentation Confidence, Segmentation Surface]

Accurately estimating the volume of brain tissue affected by ischemic stroke requires overcoming noisy magnetic resonance (MR) images, ambiguous boundaries, and unknown, irregular shapes. We use multiple segmentation parameters to create a confidence map that can closely match physician segmentations.

Papers: MICCAI03, Tech. Report UM-CS-2003-017.
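One way to picture a confidence map built from multiple segmentation parameters is to run a simple segmentation at several settings and record, per pixel, the fraction of settings that label it as lesion. The sketch below uses intensity thresholding purely as a stand-in. The source does not say which segmentation method or parameters were used, so the function name `confidence_map`, the thresholds, and the toy image are all hypothetical.

```python
def confidence_map(image, thresholds):
    """Fraction of threshold settings under which each pixel is segmented."""
    rows, cols = len(image), len(image[0])
    votes = [[0] * cols for _ in range(rows)]
    for threshold in thresholds:
        for r in range(rows):
            for c in range(cols):
                # Stand-in segmentation: a pixel is "lesion" if bright enough.
                if image[r][c] >= threshold:
                    votes[r][c] += 1
    return [[count / len(thresholds) for count in row] for row in votes]


# Toy 3x3 intensity image with a bright region in the middle row.
image = [
    [20, 90, 40],
    [150, 210, 130],
    [30, 120, 10],
]
conf = confidence_map(image, thresholds=[50, 100, 150, 200])
for row in conf:
    print(row)
```

Pixels that survive every setting receive confidence 1.0, while pixels on an ambiguous boundary receive intermediate values, so a volume estimate can weight them accordingly.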