Research Projects

Jerod Weinman, Computer Science, Grinnell College

Multimodal Keypoint Detection (2022–2023)

We adapt a multimodal (text+image) object detection model to the keypoint detection task and demonstrate that adding richer descriptions of the keypoints improves performance and reduces training time. Read more ...

Paper: ICVS23.

Self-Supervised Depth Prediction (2019–2021)

[Figure: diagram of egomotion depth estimator learning]

Monocular depth estimation can be learned from egomotion or stereo pairs. We improve this process through a novel loss function that encourages the model to predict positive depths for visible points and a GPU-based differentiable Z-buffering algorithm for quickly rendering transformed point clouds in the training loop.
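The two ideas in this summary can be illustrated with a minimal sketch. It is not the method from the paper: the hinge-style penalty, the pinhole camera model, and the names `positive_depth_penalty` and `zbuffer_render` are assumptions made for illustration, and the renderer below is a plain, non-differentiable CPU Z-buffer rather than the GPU-based differentiable algorithm described above.

```python
import math

def positive_depth_penalty(depths):
    """Hypothetical hinge-style penalty: zero for positive depths,
    growing linearly as a predicted depth becomes negative."""
    return sum(max(0.0, -z) for z in depths) / len(depths)

def zbuffer_render(points, focal, width, height):
    """Project 3D points (x, y, z) with an assumed pinhole camera and
    keep only the nearest depth that lands on each pixel."""
    depth = [[math.inf] * width for _ in range(height)]
    cx, cy = (width - 1) / 2.0, (height - 1) / 2.0  # principal point at image center
    for x, y, z in points:
        if z <= 0:
            continue  # points behind the camera are not visible
        u = int(round(focal * x / z + cx))
        v = int(round(focal * y / z + cy))
        # The Z-buffer test: a point wins the pixel only if it is closer.
        if 0 <= u < width and 0 <= v < height and z < depth[v][u]:
            depth[v][u] = z
    return depth

# Three points on the optical axis: two in front of the camera, one behind.
points = [(0.0, 0.0, 2.0), (0.0, 0.0, 1.0), (0.0, 0.0, -1.0)]
rendered = zbuffer_render(points, focal=1.0, width=3, height=3)
penalty = positive_depth_penalty([z for _, _, z in points])
```

In this toy case the center pixel keeps the nearer of the two visible points, and only the point with negative depth contributes to the penalty.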

Paper: CRV21.

Map Processing (2010–2020+)

By leveraging a gazetteer of place names (toponyms) and their locations, we can improve OCR on historical map images by inferring an alignment between the gazetteer and words on a map. Given such an alignment, calculating posterior probabilities significantly reduces word error. Read more ...
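As a rough illustration of how a posterior can reduce word error, the sketch below combines OCR likelihoods with a gazetteer-derived prior using Bayes' rule. The function name `gazetteer_posterior`, the candidate words, and all probabilities are made up for illustration; in particular, how the alignment produces the prior is not shown and is only assumed here.

```python
def gazetteer_posterior(ocr_likelihood, gazetteer_prior):
    """Combine P(image | word) from OCR with an assumed prior P(word)
    implied by the gazetteer alignment, then normalize (Bayes' rule)."""
    joint = {w: ocr_likelihood[w] * gazetteer_prior.get(w, 0.0)
             for w in ocr_likelihood}
    total = sum(joint.values())
    if total == 0.0:
        raise ValueError("no candidate has support under both models")
    return {w: p / total for w, p in joint.items()}

# Illustrative numbers only: OCR alone slightly prefers a misspelling.
likelihood = {"Grinnell": 0.30, "Grinnoll": 0.45, "Crinnell": 0.25}
# Hypothetical prior: the aligned gazetteer expects "Grinnell" at this location.
prior = {"Grinnell": 0.90, "Grinnoll": 0.05, "Crinnell": 0.05}

posterior = gazetteer_posterior(likelihood, prior)
best = max(posterior, key=posterior.get)
```

Here the OCR likelihood alone would pick the misspelled candidate, while the posterior favors the word the gazetteer expects nearby.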

Papers: ICDAR25, SIGSPATIAL19, ICDAR19, ICDAR17, ICDAR13, Technical Report.

Reading Text In the Wild (2004–2026)

Developing algorithms for VIDI (Visual Information Dissemination for the Impaired), a wearable aid for the blind, involved several contributions and subprojects.

Text and Sign Detection

Using a contextual model to eliminate isolated false positives and more fully cover all regions of a detected sign, we are able to robustly detect text and logos with arbitrary sizes and layouts in complex scenes. Read more ...

Papers: MLSP04, CVAVI05, Master's Thesis.

Robust Recognition

By integrating character appearance models more closely with statistical language and lexicon models, as well as a locally adaptive font model, we can more reliably recognize characters from signs in different fonts. Read more ...

Papers: CVPR06, ICDAR07, ICPR08, PAMI09, ICPR10, PAMI14, WACV26.

Joint Detection and Recognition

By asking the "where?" and "what?" questions of finding and identifying text simultaneously during learning, we can be faster or more accurate than the usual method of learning these independently. Read more ...

Paper: Tech Report UM-CS-2006-054.

Portions of this work have been funded by NSF grant numbers IIS-0100851, IIS-0326249, and IIS-0546666 as well as the Central Intelligence Agency and the National Security Agency.

Faster Learning for Stereo (2006–2010)

Learning methods that rely on message-passing inference can be very slow when state spaces are large, as in most stereo disparity estimation problems. We improve stereo predictions by using an algorithm that can capture the appropriate amount of uncertainty during learning. Read more ...

Papers: Tech Report UM-CS-2007-054, ECCV08, IJCV10.

Learning Mobile Background Subtraction (2007)

By storing background images for a stationary robot with a moveable vision system, we can perform an efficient matching of the current view to known views with a local deformation model to overcome parallax. The robot learns to perform novel object detection in the presence of camera motion. Read more ...

Paper: ICRA07.

Portions of this work have been funded by NSF grant number IIS-0546666 and MIT/NASA cooperative agreement NNJ05HB61A.

Stroke Segmentation (2001–2003)

Noisy magnetic resonance (MR) images, ambiguous boundaries, and unknown, irregular shapes must all be overcome to accurately estimate the volume of brain tissue affected by ischemic stroke. We use multiple segmentation parameters to create a confidence map that can closely match physician segmentations. Read more ...
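One simple way to build a confidence map from multiple segmentation parameters is sketched below. This is an assumption for illustration, not the published method: it uses plain intensity thresholding as the segmenter, and the name `confidence_map` and the sample values are hypothetical.

```python
def confidence_map(image, thresholds):
    """For each pixel, return the fraction of threshold settings under
    which a simple intensity threshold labels the pixel as lesion."""
    n = len(thresholds)
    return [[sum(pixel >= t for t in thresholds) / n for pixel in row]
            for row in image]

# Toy 2x2 "image" with normalized intensities (illustrative values only).
image = [[0.1, 0.5],
         [0.7, 0.9]]
conf = confidence_map(image, thresholds=[0.4, 0.6, 0.8])
```

Pixels that are segmented under every parameter setting receive confidence 1, pixels never segmented receive 0, and ambiguous boundary pixels fall in between.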

Papers: MICCAI03, Tech. Report UM-CS-2003-017.

Supported by the UMass-Amherst/Baystate Medical Center Collaborative Biomedical Research Center.

Collaborators

Dimosthenis Karatzas (U. Autònoma de Barcelona)
Yao-Yi Chiang (U. Minnesota)
Zekun Li (U. Minnesota)
Leeje Jang (U. Minnesota)
Yijun Lin (U. Minnesota)
Serge Belongie (U. Copenhagen)
Stella Frank (U. Copenhagen)
Nicholas Howe (Smith College)
Jacqueline Feild (U. Mass)
Allen Hanson (U. Mass)
Erik Learned-Miller (U. Mass)
Chris Pal (École Polytechnique de Montréal)
Andrew McCallum (U. Mass)
Daniel Scharstein (Middlebury College)
Piyanuch Silapachote (U. Mass)
Marwan A. Mattar (U. Mass)
Dima Lisin (MathWorks)
George Bissias (U. Mass)
Sai Ravela (MIT)
Edward M. Riseman (U. Mass)

Undergraduates

Charles Frantz '11 (BrightTag)
Jeff Leep '11 (Epic)
Augustus "Jay" Lidaka '10 (Microsoft)
Shitanshu Aggarwal '11 (Amazon)
Ravi Chande '11 (Microsoft)
Dylan Gumm '11 (Epic)
Zachary Butler '13 (Univ. California Irvine, PhD)
Dugan Knoll '12 (Showyou/Vevo)
Pelle Hall '14 (Univ. Michigan Ann Arbor, PhD)
Chenheli Hua '15 (Google)
Yirong Jing '14 (Univ. Virginia, MSc)
Alex Turner '16 (Oregon State, PhD)
Cuong Nguyen Tu Manh '14 (BlackRock)
Kitt Nika '16 (PNC Bank)
Shen Zhang '15.5 (Google)
Bo Wang '16 (Brown, MSc)
Toby Baratta '17 (Microsoft)
Matt Murphy '18 (Facebook)
Ziwen Chen '20 (Oregon State Univ., PhD)
Ben Gafford '20 (Carnegie Mellon Univ., PhD)
Nathan Gifford '20 (Amazon)
Abyaya Lamsal '19 (Bank of America)
Liam Niehus-Staab '20 (Honey/PayPal)
John Gouwar '21 (Northeastern Univ., PhD)
Aabid Shamji '20 (Univ. Iowa, MSc)
Nhi Ngo '20 (Univ. Iowa, MSc)
Stefan Ilic '21 (MathWorks)
Zixuan Guo '22 (Meta)