Summer 2018 MAP Projects

Jerod Weinman

Abstract

This document provides a background on summer 499 (MAP) for Grinnell students and the high expectations I have for my summer students.

1 Introduction
2 Project Overview
    2.1 Background
    2.2 Synthetic Map Data Generation
    2.3 Using Dynamic Training Data
3 Approximate Schedule
4 Activities
    4.1 Spring
    4.2 Summer
    4.3 Fall and Beyond

1 Introduction

The general focus of my research is in machine learning for computer vision. Because reconstructing a 3-D image from a 2-D projection is a difficult inference problem, some computational machinery is necessary. Furthermore, understanding and extracting meaning from images is a problem that has been solved by humans, but remains elusive for machines. Because it is nearly impossible to specify and hand-code models for these tasks, machines must be endowed with some amount of learning capabilities.

The application context of the projects for this summer is character recognition in historical maps. While character recognition is one of the oldest problems in pattern recognition, the most general form of the problem (automated recognition that matches human performance in any situation) is still very far from solved. My general aim is to advance information availability with automated character recognition. We will examine the training process of a text recognition pipeline this summer.

2 Project Overview

2.1 Background

The application for the project, already underway, is recognizing place names (toponyms) on historical map images. While many old maps are being scanned and distributed online, their contents remain largely impenetrable to automated search. This project works to change that by automatically detecting and recognizing the text in scanned map images to enable indexing and search of these images the same way we now search web pages and (more recently) digitized books.

The following resources describe my prior work in this area (in order of increasing detail):

Project Overview Web Page
Publication Slides
J. Weinman. Geographic and Style Models for Historical Map Alignment and Toponym Recognition. In Proc. International Conference on Document Analysis and Recognition (ICDAR), November 2017.

The work described above uses a parser dating to my dissertation research with a newer convolutional neural network-based character recognition module. That system is largely brittle and slow, so I developed an end-to-end word recognition system based on a recently published model by others.

J. Weinman. Tensorflow-based CNN+LSTM trained with CTC-loss for OCR.
B. Shi, X. Bai, & C. Yao. An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition. CoRR:1507.05717.

Projects for the summer will involve improving the training stage of the text recognizer. Because the network has over fifteen million parameters, it requires significant quantities of training data.

2.2 Synthetic Map Data Generation

The model above is trained with static synthetic data generated for scene text recognition. This static dataset of 9 million images is over 10 GB and is not wholly appropriate for recognizing map text because it lacks the same background layers and typography. Moreover, its static nature means that a highly-parameterized network could learn the specifics of the dataset, leading to poor generalization performance on previously unseen examples.

Rather than learn from scene text data, this project will generate synthetic data that provides a better match for map text recognition. The synthesis process must be efficient (exceeding 100 Hz) to keep pace with the trainer's consumption rate.
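As a rough illustration of the rate requirement, the following minimal Python sketch times a generator and compares the result with the 100 Hz figure above. The names synthesize_sample, measure_rate, and TARGET_HZ are hypothetical, and the "renderer" here only produces random bytes; a real synthesizer would render text over map-like backgrounds and would be far more costly.

```python
import random
import string
import time

TARGET_HZ = 100.0  # rate quoted in the text; everything else here is illustrative


def synthesize_sample(width=100, height=32):
    """Hypothetical stand-in for a renderer: returns (pixels, label)."""
    label = "".join(random.choice(string.ascii_lowercase) for _ in range(8))
    # Random bytes in place of a rendered grayscale image.
    pixels = bytes(random.getrandbits(8) for _ in range(width * height))
    return pixels, label


def measure_rate(make_sample, num_samples=200):
    """Return the observed generation rate in samples per second."""
    start = time.perf_counter()
    for _ in range(num_samples):
        make_sample()
    elapsed = time.perf_counter() - start
    # Guard against a zero reading from a very fast generator.
    return num_samples / max(elapsed, 1e-9)


if __name__ == "__main__":
    rate = measure_rate(synthesize_sample)
    print(f"{rate:.1f} samples/s; meets target: {rate >= TARGET_HZ}")
```

A measurement like this only bounds a single generator; whether the trainer is actually kept supplied also depends on how many generators run in parallel and how the data reaches the model.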

The starting point would likely be this contributed OpenCV module for text synthesis, which has a simple demo. While the module uses the Qt5 framework to render text, alternative text rendering engines might include Pango or Graphite.

Creative, detail-oriented students who have taken CSC 207 and either MAT 215 or CSC 213 (or their equivalents) will be good candidates for this project. Willingness (and a demonstrated ability) to learn C++ before the start of the project is required.

2.3 Using Dynamic Training Data

While the first project will focus on generating text image data, this project will work to feed those dynamically generated images to the model trainer, which uses several threads to read and fill data buffers in parallel. This buffer-feeding parallelism was originally designed to keep the training process supplied with data in the face of large latency in reading data from disk.
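The buffer-filling pattern described above can be sketched with Python's standard library alone. This is only an illustration of the idea, not the Tensorflow mechanism the trainer uses: synthesize_sample, producer, start_producers, and next_batch are hypothetical names, and the samples are random numbers rather than images. Note also that in CPython, pure-Python synthesis code would not run truly in parallel across threads; the sketch shows the buffering, not the speedup.

```python
import queue
import random
import threading


def synthesize_sample():
    """Hypothetical stand-in for the image synthesizer: returns (pixels, label)."""
    label = random.choice(["Grinnell", "Des Moines", "Iowa City"])
    pixels = [random.random() for _ in range(16)]
    return pixels, label


def producer(buffer, stop_event):
    """Generate samples until asked to stop, waiting while the buffer is full."""
    while not stop_event.is_set():
        sample = synthesize_sample()
        while not stop_event.is_set():
            try:
                buffer.put(sample, timeout=0.1)
                break
            except queue.Full:
                continue  # buffer still full; re-check the stop flag and retry


def start_producers(buffer, num_threads=4):
    """Start several producer threads that share one bounded buffer."""
    stop_event = threading.Event()
    threads = [
        threading.Thread(target=producer, args=(buffer, stop_event), daemon=True)
        for _ in range(num_threads)
    ]
    for thread in threads:
        thread.start()
    return stop_event, threads


def next_batch(buffer, batch_size):
    """Consumer side: block until batch_size samples are available."""
    return [buffer.get() for _ in range(batch_size)]


if __name__ == "__main__":
    buffer = queue.Queue(maxsize=64)  # bounded, so producers cannot run far ahead
    stop_event, threads = start_producers(buffer)
    batch = next_batch(buffer, 32)
    stop_event.set()
    for thread in threads:
        thread.join()
    print(len(batch), "samples in batch")
```

The bounded queue is the key design choice: producers stall when the trainer falls behind, and the trainer stalls only when generation is too slow, which is exactly the situation the synthesis-rate requirement is meant to avoid.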

This project will involve significant independent study and learning about the Tensorflow platform's data-reading mechanisms, which are primarily designed for reading files from disk, not from memory. (The separate so-called feed_dict mechanism reads from memory but may be too inefficient.)

The following Tensorflow guides provide a starting point for learning about the "current" architecture (which may soon be deprecated).

A new data input API called Dataset may provide the necessary functionality. Your job could be to figure out how to make it so.

There is also a lengthy thread discussing the development of the new Dataset API, with a summary comment linking to a few potentially related solutions here and here.

If trying to come to grips with these discussions, sample code, and sparsely documented, tutorial-free, bleeding-edge APIs frightens you, this project is probably not for you. However, detail-oriented students who are undaunted by sheer platform complexity and who have taken CSC 207 and CSC 213 (or their equivalents) will be good candidates for this project. Willingness (and a demonstrated ability) to learn C++ and Python before the start of the project is required.

3 Approximate Schedule

This schedule largely follows that officially approved by the division. However, since other (off-campus) options have different schedules, I will need to know if you are considering other opportunities, what the schedule is, and whether you are likely to choose an off-campus opportunity if accepted.

February 23: Application forms due. You must submit the division-wide form online and your responses to my questions to me.
March 9: Initial selections announced (provided that the college has approved funding) via e-mail.
March 16: Decisions due.
Week of April 2: First meeting.
Unspecified other dates: Additional meetings.
April 23: Draft MAP proposal due.
April 30: Revised MAP proposal due.
May 7: Final MAP proposal due.
May 22: Commencement.
May 25: Brief literature survey due.
June 8 (or earlier): Other background preparation (e.g., languages, advanced topics) completed.
June 11: Summer research begins. (Tentative date)
August 18: Summer research concludes. (Ten weeks hence)

4 Activities

This section is largely an overview; you may find many more details in the syllabus.

4.1 Spring

Topic Preparation

You are expected to begin your background research during the spring. In particular, you must identify at least four scientific papers on related projects. You are also encouraged to use the web to aid your search. Some useful resources are:

Some of the related conferences where you can find this work are

CVPR: Computer Vision and Pattern Recognition
ICDAR: International Conference on Document Analysis and Recognition
DAS: IAPR International Workshop on Document Analysis Systems
GREC: IAPR International Workshop on Graphics Recognition
ICCV: International Conference on Computer Vision
ECCV: European Conference on Computer Vision
ICPR: International Conference on Pattern Recognition
ICIP: International Conference on Image Processing
ICASSP: International Conference on Acoustics, Speech, and Signal Processing

and some related journals include

PAMI: IEEE Transactions on Pattern Analysis and Machine Intelligence
IJCV: International Journal of Computer Vision
IJDAR: International Journal on Document Analysis and Recognition
TIP: IEEE Transactions on Image Processing

though there are of course many, many others. Once you have identified potentially useful resources, if you cannot find an author preprint online (one is nearly always available), consult with the librarians about obtaining a copy of an article or conference paper.

You will email me your list of papers (with complete citations) by the date above.

Skill Preparation

If your project will require a programming language, data interface, or library that you do not yet know, you are expected to begin learning it. You need not master it, but you should develop comfort and familiarity.

4.2 Summer

During the summer, you are expected to work full-time on the project (40 hours per week for ten weeks). This work will include regularly scheduled group meetings. See the syllabus for more information.

4.3 Fall and Beyond

Poster Presentation

You will create a poster describing your work and present it at the Grinnell Science Poster Seminar (typically during parents' weekend).

Internal Public Presentation

You will give a twenty-five- or fifty-minute presentation on your work as part of the Computer Science Department's Thursday Extras series.

External Conference Presentation

If your work is submitted to and accepted by a conference, you are expected to attend and present your work. (Funding is available from the Dean's office for you to attend the conference.)

External Pew Presentation

You may submit your work to the Pew Midstates Science and Mathematics Consortium Fall Symposium on Undergraduate Research in the Physical and Mathematical Sciences. If your work is accepted, you must attend the symposium (including non-CS talks) and present your work (in poster or talk form). You must give at least one practice talk before going to the conference.

Acknowledgement

With thanks to Professor Sam Rebelsky for many elements of Section 4.