Sign Recognition (Jerod Weinman, CompSci, Grinnell)

Motivation

Figure 1. An image of a sign (left) and the output (right) of a top commercial OCR system on a binarized version of the image.

Most commercial OCR systems geared for document recognition cannot handle the wider variety of fonts found in many street scenes and signs, as demonstrated by the example above.

We address this by adding two components to a robust character recognizer trained on many fonts, one that adapts to the specific font in a sign by using character similarity, and another that intelligently incorporates a lexicon without slowing the system. Unlike previous work, all the information sources are unified in a single model for recognizing characters.

Character Similarity

Figure 2. Labels given by the OCR software are inconsistent.

Although many fonts can be quite difficult, there is one bit of information that can be easily incorporated: whether two characters appear the same or not. Regardless of the label that is given, two characters that appear the same should be given the same label, and two characters that appear different should be given different labels.

Earlier work has incorporated this idea for document recognition, where thousands of example characters are available, but in sign recognition only a handful are present.

We begin by learning a simple "similarity" function that identifies whether characters are the same or different, regardless of identity.
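As a rough sketch of what such a pairwise similarity function could look like (this is an illustration, not the paper's actual model; the features and weights below are made up), a symmetric logistic model over absolute feature differences identifies "same" vs. "different" without committing to any character identity:

```python
import numpy as np

def similarity(f_i, f_j, w, b):
    """Probability that two character patches depict the same glyph.

    f_i, f_j: feature vectors for the two character patches.
    w, b:     weights and bias (illustrative values, not learned here).
    Using the absolute difference makes the score symmetric in i and j.
    """
    d = np.abs(f_i - f_j)                      # symmetric difference features
    return 1.0 / (1.0 + np.exp(-(w @ d + b)))

# Toy check: identical patches should score higher than dissimilar ones.
w = np.array([-2.0, -2.0, -2.0])               # differences push toward "different"
b = 1.5                                        # bias toward "same" at zero difference
a = np.array([0.9, 0.1, 0.4])
print(similarity(a, a, w, b))                  # high: patches are identical
print(similarity(a, np.array([0.1, 0.9, 0.8]), w, b))  # low: patches differ
```

Because the score depends only on the appearance difference, it generalizes to fonts never seen in training, which is what makes it usable with only a handful of characters per sign.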

Similarity Model

Figure 3. A unified model for robustly recognizing characters y from an image. Black factors capture relationships between the image and character identity, blue factors between neighboring characters y capture language information such as bigrams, and red factors account for (dis)similarities between character images when jointly labeling the string.

Whereas prior work clustered characters prior to recognition, our unified model uses all the information, including similarity, simultaneously. This prevents unrecoverable errors from a clustering step.
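A toy illustration of how a joint score over all three factor types might be combined (all potentials here are invented values for a two-letter alphabet, not the paper's learned parameters): the similarity factor ties positions that look alike to the same label, so evidence flows in both directions rather than being fixed by a separate clustering step.

```python
import itertools

# Tiny alphabet and illustrative (made-up) potentials.
ALPHABET = "ab"
unary  = [{"a": 0.2, "b": 1.0},   # appearance score per position
          {"a": 0.9, "b": 0.3},
          {"a": 0.3, "b": 0.8}]
bigram = {("a", "b"): 0.5, ("b", "a"): 0.5,
          ("a", "a"): 0.2, ("b", "b"): 0.2}
same   = {(0, 2): True}           # similarity says patches 0 and 2 look alike

def score(labels):
    """Sum the unary (appearance), bigram (language), and similarity factors."""
    s = sum(unary[i][c] for i, c in enumerate(labels))
    s += sum(bigram[(labels[i], labels[i + 1])] for i in range(len(labels) - 1))
    for (i, j), alike in same.items():
        # Reward labelings consistent with the pairwise similarity evidence.
        s += 0.8 if (labels[i] == labels[j]) == alike else -0.8
    return s

# Brute-force MAP over all length-3 strings (fine at this toy scale).
best = max(itertools.product(ALPHABET, repeat=3), key=score)
print("".join(best))
```

Here the similarity factor pulls positions 0 and 2 to a common label even though their appearance scores alone disagree, which is exactly the kind of joint decision a pre-recognition clustering step would have to make irrevocably.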

Example Results

The signs we are able to recognize correctly have various fonts, backgrounds, and lighting conditions.

Lexicon Incorporation

Using a lexicon can greatly boost recognition rates, but this often comes at the expense of efficiency when the lexicon is large. We have proposed an addition to the model that incorporates the lexicon as a unified component, rather than a post-processing step, without reducing the speed of the system.

By using a sparse inference method, only words with moderate support from the character appearance and local language models become candidates. This is done by using a constrained optimization technique that eliminates as many lexicon words as possible, without deviating too much from the original probability approximation.
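One simple way to sketch this kind of pruning (a stand-in for the paper's constrained optimization, using invented scores): rank lexicon words by an approximate log-probability from the character and local language models, then keep only the smallest candidate set whose probability mass stays within a tolerance of the full approximation.

```python
import math

# Illustrative approximate log-scores for lexicon words, e.g. computed
# from character marginals and a bigram model (values are made up).
approx = {"checking": -1.2, "creaking": -1.5,
          "checkers": -4.0, "chucking": -7.5}

def prune(scores, eps=0.05):
    """Keep the smallest candidate set covering at least 1 - eps of the
    probability mass under the approximate distribution."""
    z = sum(math.exp(s) for s in scores.values())
    ranked = sorted(scores, key=scores.get, reverse=True)
    kept, mass = [], 0.0
    for word in ranked:
        kept.append(word)
        mass += math.exp(scores[word]) / z
        if mass >= 1 - eps:
            break
    return kept

print(prune(approx))
```

Words with negligible support never become candidates, so the expensive lexicon factor only ever touches a short list rather than the whole dictionary.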

Lexicon Model

Figure 4. Model incorporating a lexicon for one word. The magenta box can constrain the string y to be drawn from a lexicon, while the cyan box captures the bias for lexicon words.

By introducing an auxiliary variable that indicates whether a particular string is from the lexicon, we can learn a bias for predicting known words. Note that this is integrated with the local language and generic character recognition models.
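The effect of that auxiliary variable can be sketched as a simple additive bias (the lexicon, scores, and bias value below are hypothetical, chosen only to mirror the BOLTWOOD example): a known word gets a learned bonus, but a strong enough appearance score still lets an out-of-lexicon string win.

```python
LEXICON = {"USED", "BOOKS", "BENTWOOD"}
BIAS = 1.0   # learned preference for known words (illustrative value)

def total_score(word, appearance_score):
    """Appearance score plus a bias when the auxiliary lexicon
    indicator fires, i.e. when the string is a known word."""
    return appearance_score + (BIAS if word in LEXICON else 0.0)

# Strong appearance evidence can overcome the lexicon bias, so
# out-of-lexicon strings like "BOLTWOOD" remain predictable.
candidates = {"BOLTWOOD": 4.0, "BENTWOOD": 1.5}
best = max(candidates, key=lambda w: total_score(w, candidates[w]))
print(best)
```

This is the mechanism that avoids the "forced lexicon" failure mode shown in the results below: the bias nudges ambiguous strings toward known words without ever forbidding novel ones.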

Thus, appearance is taken into consideration, unlike in post-processing approaches. We have also joined the lexicon and similarity-based recognition models to further improve performance.

Example Results

                  Sign 1       Sign 2     Sign 3          Sign 4
No Lexicon        USED BOOKI   THOR UP5   Free crecking   31 BOLTWOOD
Lexicon           USED BOOKS   HOOK UPS   Free checking   31 BOLTWOOD
Forced Lexicon    USED BOOKS   HOOK UPS   Free creaking   SI BENTWOOD

Using a lexicon corrects many errors. In addition, our model prevents false corrections by allowing the predicted words to be drawn from strings outside the lexicon.

Data

The sign images and character-box annotations used for recognition in these papers are available. If this data is used in a publication, please cite the 2009 PAMI paper above.

The font images used as the basis for training the character recognition algorithms are available. The font size is 100px, and the images are generated in a 128x128 pixel window, all at the same baseline, with no anti-aliasing. Images were generated using the GIMP.

Fonts were restricted to those that had "normal" lowercase and uppercase letters, with no graphics. The characters [A-Za-z0-9] appear in the set letters, while all other keyboard (PC104) characters appear in the set symbol. To accommodate the character "/" in a Unix file system, that character in the filename was replaced with the (non-keyboard) character "¬".
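When loading the symbol set, the "¬" placeholder needs to be mapped back to "/". A minimal helper (the function name is ours, not part of the dataset's tooling):

```python
def decode_char(filename_stem):
    """Recover the character a dataset filename stands for:
    "¬" (U+00AC) was substituted for "/" because "/" cannot
    appear in a Unix filename."""
    return filename_stem.replace("\u00ac", "/")

print(decode_char("\u00ac"))  # the slash character
print(decode_char("a"))       # ordinary characters pass through
```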

  • Letters (tar.bz2 format; 62 characters in 1866 fonts; 35M)
  • Symbols (tar.bz2 format; 32 symbols in 1866 fonts; 11M)