Literature Reviews CSC499-2010U
Abstract—Scene text recognition (STR) is the recognition of text anywhere in the environment, such as signs and storefronts. Relative to document recognition, it is challenging because of font variability, minimal language context, and uncontrolled conditions. Much of the information available to solve this problem is frequently ignored or used only sequentially. The similarity between character images is often overlooked as a source of information. Because of language priors, a recognizer may assign different labels to identical characters. Directly comparing characters to each other, rather than only to a model, helps ensure that similar instances receive the same label. Lexicons improve recognition accuracy but are typically applied post hoc. We introduce a probabilistic model for STR that integrates similarity, language properties, and lexical decision. Inference is accelerated with sparse belief propagation, a bottom-up method for shortening messages by reducing the dependency between weakly supported hypotheses. By fusing information sources in one model, we eliminate unrecoverable errors that result from sequential processing, improving accuracy. In experimental results on text from images of signs in outdoor scenes, incorporating similarity reduces character recognition error by 19%, the lexicon reduces word recognition error by 35%, and sparse belief propagation reduces the number of lexicon words considered by 99.9% with a 12X speedup and no loss in accuracy.
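The message-shortening idea behind sparse belief propagation can be illustrated with a minimal sketch: character-label hypotheses whose normalized support falls below a threshold are dropped from a message, so downstream factors (and the lexicon search) only consider the surviving candidates. The function name, message representation, and threshold below are illustrative assumptions, not taken from the paper.

```python
def prune_message(message, threshold=0.01):
    """Prune weakly supported hypotheses from a belief-propagation message.

    message: dict mapping a label hypothesis to its unnormalized belief.
    Returns a renormalized dict containing only the surviving labels.
    """
    total = sum(message.values())
    kept = {label: score / total
            for label, score in message.items()
            if score / total >= threshold}
    norm = sum(kept.values())
    return {label: score / norm for label, score in kept.items()}

# A toy message over character labels: "q" and "z" have negligible
# support and are pruned, shrinking the set of words worth considering.
msg = {"a": 5.0, "o": 4.0, "e": 0.9, "q": 0.05, "z": 0.05}
sparse = prune_message(msg)
```

Pruning on normalized support (rather than a fixed count) is one simple way to realize the "weakly supported hypotheses" criterion; the paper's actual bottom-up scheme may differ.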
Cheap and versatile cameras make it possible to capture a wide variety of documents easily and quickly. However, low-resolution cameras present a challenge to OCR because it is virtually impossible to perform character segmentation independently of recognition. In this paper we solve these problems simultaneously by applying methods borrowed from cursive handwriting recognition. To achieve maximum robustness, we use a machine learning approach based on a convolutional neural network. When our system is combined with a language model using dynamic programming, the overall performance is in the vicinity of 80-95% word accuracy on pages captured with a 1024x768 webcam and 10-point text.
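Combining per-position recognizer scores with a language model via dynamic programming, as in the system above, amounts to a Viterbi search. The sketch below is a hedged illustration: the log-scores, bigram table, and tiny alphabet are made-up stand-ins for the convolutional network's outputs, not the paper's actual model.

```python
def viterbi(char_scores, bigram, alphabet):
    """Find the best character sequence under recognizer + bigram scores.

    char_scores[t][c]: recognizer log-score of character c at position t.
    bigram[(a, b)]: log-probability of character b following a.
    """
    NEG = -1e9  # stand-in for log(0)
    best = {c: char_scores[0].get(c, NEG) for c in alphabet}
    back = []
    for t in range(1, len(char_scores)):
        new_best, ptr = {}, {}
        for c in alphabet:
            prev, score = max(
                ((p, best[p] + bigram.get((p, c), NEG)) for p in alphabet),
                key=lambda x: x[1])
            new_best[c] = score + char_scores[t].get(c, NEG)
            ptr[c] = prev
        back.append(ptr)
        best = new_best
    # Trace back the highest-scoring path.
    last = max(best, key=best.get)
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return "".join(reversed(path))

# Toy example: noisy scores for a 3-character word; the bigram model
# disambiguates between visually similar candidates like "a" and "o".
scores = [{"c": 0.0, "o": -0.5}, {"a": -0.1, "o": -0.2}, {"t": 0.0}]
bigram = {("c", "a"): -0.1, ("c", "o"): -0.3,
          ("a", "t"): -0.1, ("o", "t"): -0.2, ("o", "a"): -0.5}
word = viterbi(scores, bigram, ["c", "a", "o", "t"])  # -> "cat"
```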
The following are the papers proposed by each of you.
This paper describes tools for character string recognition on maps. Single-character recognition is performed using elliptical Fourier descriptors with a statistical classifier. The recognized characters are grouped into strings, and the syntax of these strings is then analysed to detect and correct errors. As training of the classifier is essential, tools for manual and automatic training and updating are included.
Map images are composed of semantic layers depicted in arbitrary colors. Layer extraction and removal are often needed to improve readability as well as for further processing. When an image is separated into a set of layers with respect to color, severe artifacts appear because the layers overlap. The extracted layers therefore differ from the semantic data, which affects subsequent map image analysis tasks. In this work, we introduce techniques for extracting and removing semantic layers from map images. The techniques utilize low-complexity morphological image restoration algorithms. The restoration provides good quality in the reconstructed layers and alleviates the effect of artifacts on the precision of image analysis tasks.
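One low-complexity morphological repair in this spirit is a closing (dilation followed by erosion), which fills small gaps left in a layer where an overlapping layer hid its pixels. The sketch below is illustrative only, assuming a binary layer and a 3x3 structuring element; it is not claimed to be the paper's restoration algorithm.

```python
def dilate(img):
    """3x3 binary dilation: a pixel is set if any neighbor is set."""
    h, w = len(img), len(img[0])
    return [[max(img[yy][xx]
                 for yy in range(max(0, y - 1), min(h, y + 2))
                 for xx in range(max(0, x - 1), min(w, x + 2)))
             for x in range(w)] for y in range(h)]

def erode(img):
    """3x3 binary erosion: a pixel survives only if all neighbors are set."""
    h, w = len(img), len(img[0])
    return [[min(img[yy][xx]
                 for yy in range(max(0, y - 1), min(h, y + 2))
                 for xx in range(max(0, x - 1), min(w, x + 2)))
             for x in range(w)] for y in range(h)]

def close_gaps(layer):
    """Morphological closing: dilation then erosion."""
    return erode(dilate(layer))

# A horizontal line with a one-pixel gap (e.g. where an overlapping
# grid line was removed); closing fills the gap at row 2, column 3.
line = [[0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0],
        [0, 1, 1, 0, 1, 1, 0],
        [0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0]]
restored = close_gaps(line)
```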
Many maps [an important source of information for efficient spatial data evaluation using a Geographic Information System (GIS)] must still be digitized manually, a time-consuming and error-prone process. Therefore, we developed the processing of maps (PROMAP) system, which incorporates adequate image analysis. The system can generate a symbolic description of the map that can be imported into a GIS. A color scanner generates a multicolor raster image of the map. This image is split into layers of predefined map colors. Each layer is vectorized, and methods such as neural-network-based symbol and object recognition for the extraction of attributed structure primitives and knowledge-directed image interpretation are applied. The map scene is structured hierarchically. The interface to the GIS is represented by the map objects at the upper levels of the hierarchy. The investigations described are part of the interdisciplinary project Environmental Planning System. The scope of this project is the combination of data acquisition, the development of an evaluation scheme, and GIS in an integrated concept.
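Splitting a scanned map into layers of predefined map colors, as described above, can be sketched as a nearest-color assignment per pixel, yielding one binary mask per layer. The palette below is a made-up example, not the system's actual color set.

```python
# Illustrative palette of predefined map colors (RGB); each name is an
# assumed semantic layer, not taken from the PROMAP system.
PALETTE = {
    "black": (0, 0, 0),        # text and symbols
    "blue": (0, 0, 255),       # water
    "brown": (150, 75, 0),     # contour lines
    "white": (255, 255, 255),  # background
}

def nearest_layer(pixel):
    """Assign an RGB pixel to the closest palette color (squared distance)."""
    return min(PALETTE,
               key=lambda name: sum((p - c) ** 2
                                    for p, c in zip(pixel, PALETTE[name])))

def split_layers(image):
    """Split an RGB raster into one binary mask per palette layer."""
    layers = {name: [[0] * len(row) for row in image] for name in PALETTE}
    for y, row in enumerate(image):
        for x, pixel in enumerate(row):
            layers[nearest_layer(pixel)][y][x] = 1
    return layers
```

Each resulting mask can then be vectorized independently, which is why the color split is performed before symbol recognition.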
Over several years, the demand for digital data has grown as geographical information systems were implemented. Because of the cost of database creation, data acquisition should unquestionably be automated as much as possible. Today, scanning devices produce a huge amount of data that still needs complex processing. This paper proposes an original approach using image processing tools coming from Mathematical Morphology theory to acquire GIS data from scanned thematic maps. These tools are used in order to obtain a segmentation prior to radiometric analysis. To illustrate the methodology, a subset of the Belgian soil map is treated.
A method to separate and recognize touching or overlapping alphanumeric characters in raster-scanned color cartographic maps is proposed. The map is first segmented to extract all text strings, including those that touch other symbols, strokes, and characters. Second, OCR-based recognition with artificial neural networks (ANNs) is applied to determine the coordinates, size, and orientation of the alphanumeric character strings present in the map. Third, either four straight lines or a set of curves, computed as a function of the characters already recognized by the ANN, are extrapolated to separate the attached symbols. Finally, the separated characters are input into the ANN again for final identification. Results showed high performance of the method in the context of raster-to-vector conversion of color cartographic images.
A system named MAGELLAN (denoting Map Acquisition of GEographic Labels by Legend ANalysis) is described that utilizes the symbolic knowledge found in the legend of the map to drive geographic symbol (or label) recognition. MAGELLAN first scans the geographic symbol layer(s) of the map. The legend of the map is located and segmented. The geographic symbols (i.e., labels) are identified, and their semantic meaning is attached. An initial training set library is constructed based on this information. The training set library is subsequently used to classify geographic symbols in input maps using statistical pattern recognition. User interaction is required at first to assist in constructing the training set library to account for variability in the symbols. The training set library is built dynamically by entering only instances that add information to it. MAGELLAN then proceeds to identify the geographic symbols in the input maps automatically. MAGELLAN can be fine-tuned by the user to suit specific needs. Recognition rates of over 93% were achieved in an experimental study on a large amount of data.
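The "enter only instances that add information" idea behind MAGELLAN's dynamically built training set library can be sketched simply: a new symbol instance is stored only if its distance to every existing exemplar of the same label exceeds a threshold, and classification is nearest-neighbor against the library. The feature vectors, distance metric, and threshold below are illustrative assumptions.

```python
import math

def distance(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

class TrainingLibrary:
    def __init__(self, threshold=1.0):
        self.threshold = threshold
        self.exemplars = []  # list of (feature_vector, label)

    def maybe_add(self, features, label):
        """Add an instance only if no same-label exemplar is already close."""
        for vec, lab in self.exemplars:
            if lab == label and distance(vec, features) < self.threshold:
                return False  # adds no information; skip
        self.exemplars.append((features, label))
        return True

    def classify(self, features):
        """Nearest-neighbor classification against the stored exemplars."""
        return min(self.exemplars, key=lambda e: distance(e[0], features))[1]

lib = TrainingLibrary()
lib.maybe_add((0.0, 0.0), "road")
lib.maybe_add((0.1, 0.1), "road")   # too similar to an exemplar: not added
lib.maybe_add((5.0, 5.0), "river")
```

Keeping only informative exemplars keeps the library small while still covering the symbol variability the user confirms interactively.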
Hierarchical spatial data structures offer the distinct advantages of data compression and fast access, but are difficult to adapt to the globe. Following Dutton, we propose projecting the globe onto an octahedron and then recursively subdividing each of its eight triangular faces into four triangles. We provide procedures for addressing the hierarchy and for computing addresses in the hierarchical structure from latitude and longitude and vice versa. At any level in the hierarchy the finite elements are all triangles, but are only approximately equal in area and shape; we provide methods for computing area and for finding the addresses of neighboring triangles.
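The top level of such an addressing scheme can be sketched directly: the eight octahedron faces correspond to hemisphere (north/south) crossed with a 90-degree longitude quadrant, and deeper levels would recursively select one of the four sub-triangles. The face numbering below is an illustrative convention, not necessarily the one used in the paper.

```python
def octant(lat, lon):
    """Return a top-level octahedron face index (0-7) for a point.

    lat, lon in degrees; faces 0-3 cover the northern hemisphere,
    4-7 the southern, each spanning a 90-degree longitude quadrant.
    """
    lon = lon % 360.0                  # normalize longitude to [0, 360)
    quadrant = int(lon // 90)          # 0..3
    hemisphere = 0 if lat >= 0 else 1  # 0 = north, 1 = south
    return hemisphere * 4 + quadrant

octant(45.0, 10.0)    # northern hemisphere, first quadrant -> 0
octant(-30.0, 200.0)  # southern hemisphere, third quadrant -> 6
```

A full hierarchical address would append one base-4 digit per subdivision level after this leading octant digit.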
In this paper, we present a general methodology for the generation of digital elevation models (DEMs) starting from scanned topographic maps. We concentrate on the extraction and filtering of the contour lines from the input maps. This is a difficult problem due to the presence of complex textured backgrounds and information layers overlaid on the elevation lines (e.g., grid lines, toponymy, etc.). Results are presented on a wide variety of samples extracted from a 1:50000 plate scanned at 300 DPI.
Topographic paper maps are a common support for geographical information. In the field of document analysis of this kind of support, this paper proposes an automatic approach to extracting and recognizing toponyms. We present a technique based on image segmentation and connected component processing. Different filtering stages ensure the consistency of plausible characters and strings. Detected text areas are used to feed OCR software, and the recognized words are analyzed and corrected. The main advantage of our technique is that no assumption is made about character font, size, or orientation. The experimental results obtained are encouraging in terms of recognition efficiency.
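The connected-component stage of such a pipeline can be sketched as follows: label 4-connected foreground components in a binary image, then keep only those whose size and bounding-box aspect ratio look character-like. The filter bounds are illustrative placeholders, not the paper's tuned values.

```python
from collections import deque

def components(img):
    """Return the 4-connected foreground components of a binary image,
    each as a list of (y, x) pixel coordinates."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    comps = []
    for sy in range(h):
        for sx in range(w):
            if img[sy][sx] and not seen[sy][sx]:
                q = deque([(sy, sx)])
                seen[sy][sx] = True
                pixels = []
                while q:
                    y, x = q.popleft()
                    pixels.append((y, x))
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and img[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            q.append((ny, nx))
                comps.append(pixels)
    return comps

def character_like(pixels, min_size=2, max_aspect=4.0):
    """Filter: plausible characters are not too small or too elongated."""
    ys = [y for y, _ in pixels]
    xs = [x for _, x in pixels]
    height = max(ys) - min(ys) + 1
    width = max(xs) - min(xs) + 1
    aspect = max(height, width) / min(height, width)
    return len(pixels) >= min_size and aspect <= max_aspect

# A tiny binary image: a compact blob (left) and a long thin line (right);
# only the blob passes the character filter.
img = [[1, 1, 0, 0, 0, 0, 0, 0],
       [1, 0, 0, 1, 1, 1, 1, 1],
       [0, 0, 0, 0, 0, 0, 0, 0]]
comps = components(img)
kept = [c for c in comps if character_like(c)]
```

Because the filters use only component geometry, no assumption about font, size scale, or orientation beyond these loose bounds is required.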
Many methods and programs for automatic text recognition exist at present. However, there are no effective text recognition systems for graphic documents. Graphic documents usually contain a great variety of textual information. As a rule, the text appears in arbitrary spatial positions, in different fonts, sizes, and colors. The text can touch and overlap graphic symbols, and its meaning is semantically much more ambiguous than that of standard text. To recognize text in graphical documents, it is necessary first to separate it from linear objects, solids, and symbols and to determine its orientation. Even so, recognition programs nearly always produce errors. In the context of raster-to-vector conversion of graphical documents, the problem of text recognition is of special interest because textual information can be used to verify vectorization results (post-processing). In this work, we propose a method that combines OCR-based text recognition in raster-scanned maps with heuristics specially adapted for cartographic data to resolve the recognition ambiguities using, among other information sources, the spatial object relationships. Our goal is to form, in the vector thematic layers, geographically meaningful words correctly attached to the cartographic objects.
Graphical documents such as cartographic maps contain a great variety of textual elements appearing in different spatial positions, in different fonts, sizes, and colors, touching and overlapping graphical symbols. This greatly complicates automatic optical recognition of such textual elements in the process of raster-to-vector conversion of graphical documents. In this work, we propose a method that combines OCR-based text recognition in raster-scanned maps with heuristics specially adapted for cartographic data to resolve the recognition ambiguities using various sources of evidence. Our goal is to form, in the vector thematic layers, geographically meaningful words correctly attached to the cartographic objects.
DIGMAP is a project focused on historical digitized maps that will develop a set of Internet services based on reusable open-source software solutions. The main service will provide discovery and access to resources related to historical cartography, based on metadata from European national libraries and other relevant third-party providers. These resources will comprise both physical and digitized objects. In the case of digitized maps, the available metadata will be enriched by automatic and semi-automatic processes that will try to extract relevant indexing information from the images of the digitized maps, as well as from any kind of associated text. This paper presents an early overview of the project, particularly focusing on aspects related to geographical information retrieval.