Summer 2019 MAP Projects

Jerod Weinman

Abstract

This document provides background on summer MAP (499) projects for Grinnell students and describes the high expectations I have for my summer students.

Contents

1  Introduction
2  Project Overview
    2.1  Background
    2.2  Improving Text Detection
    2.3  Aligning Maps
3  Approximate Schedule
4  Activities
    4.1  Spring
    4.2  Summer
    4.3  Fall and Beyond

1  Introduction

The general focus of my research is machine learning for computer vision. Because reconstructing a 3-D image from a 2-D projection is a difficult inference problem, some computational machinery is necessary. Furthermore, understanding and extracting meaning from images is a problem that has been solved by humans, but remains elusive for machines. Because it is nearly impossible to specify and hand-code models for these tasks, machines must be endowed with some capacity to learn.
The application context of the projects for this summer is extracting information from historical maps. The projects will focus on finding text and on aligning map image contents with known political boundaries.

2  Project Overview

2.1  Background

The application for the project, already underway, is recognizing place names (toponyms) on historical map images. While many old maps are being scanned and distributed online, their contents remain largely impenetrable to automated search. This project works to change that by automatically detecting and recognizing the text in scanned map images, enabling indexing and search of these images in the same way we now search web pages and (more recently) digitized books.
The following resources describe some of my prior work in this area (in order of increasing detail):
The work described above provides some background context. Ongoing work has also added a computation stage dedicated to detecting the text in maps. Its pipeline is adapted from the following paper.
Projects for the summer will involve refining our existing text detection work and starting a new sub-project that uses graphical information to align map images to GIS hydrography and border data.

2.2  Improving Text Detection

Our current approach to extracting the text regions from maps suffers from two primary limitations. The first is that it does not deal well with curved text (that is, text written along a curve, rather than a straight baseline). This is primarily because the current model can only produce a rotated rectangle to cover a region it thinks is a word. This project will explore alternative formulations or follow-up strategies to remedy this problem. For example, several rectangles might be linked (based on learned criteria) to indicate they cover a single word, or the model might be altered to produce curved regions from the start.
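As a rough illustration of the linking idea, the sketch below groups rotated rectangles that have similar orientations and nearly touching ends. Everything in it is an assumption made for illustration: the `(cx, cy, width, height, angle_degrees)` rectangle format, the function names, and the hand-set thresholds `max_angle` and `gap_factor`, which merely stand in for the learned criteria a real project would develop.

```python
import math


def endpoints(rect):
    """Return the two ends of a rectangle's long axis (hypothetical format)."""
    cx, cy, width, _height, angle = rect
    dx = 0.5 * width * math.cos(math.radians(angle))
    dy = 0.5 * width * math.sin(math.radians(angle))
    return (cx - dx, cy - dy), (cx + dx, cy + dy)


def should_link(a, b, max_angle=30.0, gap_factor=1.0):
    """Hand-set stand-in for a learned linking criterion."""
    # Orientation difference, treating angles as equivalent modulo 180 degrees.
    diff = abs(a[4] - b[4]) % 180.0
    diff = min(diff, 180.0 - diff)
    if diff > max_angle:
        return False
    # Smallest distance between any end of a and any end of b.
    gap = min(
        math.hypot(p[0] - q[0], p[1] - q[1])
        for p in endpoints(a)
        for q in endpoints(b)
    )
    # Allow a gap of up to gap_factor times the mean text height.
    return gap <= gap_factor * 0.5 * (a[3] + b[3])


def link_rectangles(rects, **criteria):
    """Group rectangle indices into candidate words using union-find."""
    parent = list(range(len(rects)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i in range(len(rects)):
        for j in range(i + 1, len(rects)):
            if should_link(rects[i], rects[j], **criteria):
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(rects)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Two adjacent pieces of a gently curving word, plus one distant detection.
detections = [(0, 0, 40, 10, 0), (42, 5, 40, 10, 15), (200, 200, 40, 10, 0)]
print(link_rectangles(detections))
```

The pairwise comparison is quadratic in the number of detections, which is fine for a sketch but would likely need spatial indexing on a densely labeled map.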
The second limitation is that the predicted rectangle often fails to cover the complete extent of a word, essentially chopping off some leading or trailing characters. The likely cause of this behavior is that the effective field of view for each decision is too narrow. At each image location, the model predicts whether there is text at that location and, if so, the size of the rectangle around that text relative to the current location. In theory the model has information about features of the image in a large radius around the location, but recent work has shown that in practice the effective radius of information (called the receptive field) is drastically smaller. A project addressing this limitation might explore alternative model architectures to improve the effective receptive field size or develop secondary iterative search procedures to expand the (often limited) initial prediction made by the current model.
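To make the "in theory" radius concrete, the following minimal sketch computes the theoretical receptive field of a stack of convolution or pooling layers from their kernel sizes and strides. The function name and the example layer stack are hypothetical and do not describe our actual detector; dilation is ignored. The effective receptive field discussed above is typically much smaller than this upper bound.

```python
def theoretical_receptive_field(layers):
    """Width (in input pixels) seen by one output unit.

    layers: sequence of (kernel_size, stride) pairs, ordered input to output.
    Dilation is not modeled in this sketch.
    """
    size = 1  # receptive field of a single input pixel
    jump = 1  # spacing, in input pixels, between adjacent units at this depth
    for kernel, stride in layers:
        size += (kernel - 1) * jump
        jump *= stride
    return size


# Hypothetical stack: two 3x3 convolutions, a 2x2 stride-2 pool, one more 3x3.
example_layers = [(3, 1), (3, 1), (2, 2), (3, 1)]
print(theoretical_receptive_field(example_layers))  # → 10
```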
Recent work on a related problem of finding and reading text in arbitrary photographs has shown that integrating the detection and recognition process during model learning improves results. Unfortunately, voluminous quantities of training data are required to leverage this fact. Previous MAP students dealt with this problem in part by creating a mechanism for synthesizing artificial map text for training a text recognition module. Another project would expand the capabilities of the synthesis system to generate artificial data suitable for training a text detector with the ultimate goal of training an end-to-end reader that seamlessly detects and recognizes text.
[Figure: semantic_rects.png (left), det_rects.png (right)]
Human-labeled text regions (left) and automatic text detections (right) with text baseline orientation in blue. (Original map Copyright Cartography Associates; Creative Commons licensed.)
Good candidates for this project will be students who have taken (in approximate order of preference) CSC 262, MAT 215, CSC 301, or CSC 207. Willingness (and a demonstrated ability) to learn Python and TensorFlow before the start of the project is required.

2.3  Aligning Maps

Another student project, related to the same overall task of extracting information from maps but orthogonal to those presented so far, involves using modern-day political and geographic boundaries from a GIS to align map images to world atlases. Previous project work used the placement of text on a map, and the matching of this text to real-world place names, to infer the function that converts a map image pixel location to a latitude and longitude. Because text is relatively sparse on a map, this calculation is imperfect and would be improved by using the cartographic contents of the image.
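As one simple illustration of inferring such a function from matched place names, the sketch below fits an affine map from pixel coordinates to longitude and latitude by least squares. The affine form is an assumption chosen for brevity; the prior work may well use a different, projection-aware model. All names and the sample control points are hypothetical.

```python
def solve_linear(matrix, rhs):
    """Solve a small square system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("control points are degenerate (e.g., collinear)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    solution = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(aug[row][k] * solution[k] for k in range(row + 1, n))
        solution[row] = (aug[row][n] - tail) / aug[row][row]
    return solution


def fit_affine(pixels, geo):
    """Least-squares affine map from (x, y) pixels to (lon, lat).

    Returns two coefficient triples (a, b, c), one per output coordinate,
    such that coordinate = a * x + b * y + c.
    """
    rows = [(float(x), float(y), 1.0) for x, y in pixels]
    # Normal equations (A^T A) p = A^T b, solved once per output coordinate.
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    params = []
    for axis in range(2):
        atb = [sum(r[i] * g[axis] for r, g in zip(rows, geo)) for i in range(3)]
        params.append(solve_linear(ata, atb))
    return params


def apply_affine(params, x, y):
    """Convert a pixel location to (lon, lat) using fitted coefficients."""
    return tuple(a * x + b * y + c for a, b, c in params)


# Hypothetical matches between toponym locations (pixels) and gazetteer
# coordinates (lon, lat); real matches would be noisy and far more numerous.
pixel_points = [(0, 0), (100, 0), (0, 100), (100, 100), (50, 20)]
geo_points = [(0.01 * x - 95.0, 43.0 - 0.01 * y) for x, y in pixel_points]
coefficients = fit_affine(pixel_points, geo_points)
print(apply_affine(coefficients, 200, 50))
```

With only a handful of sparse text matches, a fit like this is sensitive to mismatched place names, which is one reason additional cartographic evidence such as rivers and borders would help.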
[Figure: linearlik.png (left), boundary.png (right)]
Models for georeferenced linear features: rivers (left) and boundaries (right). (Original maps Copyright Cartography Associates; Creative Commons licensed.)
Good candidates for this project will be students who have taken (in approximate order of preference) CSC 262, MAT 215, CSC 301, or CSC 207. Willingness (and a demonstrated ability) to learn MATLAB before the start of the project is required.

3  Approximate Schedule

This schedule largely follows the one officially approved by the division. However, since other (off-campus) options have different schedules, I will need to know whether you are considering other opportunities, what their schedules are, and whether you are likely to choose an off-campus opportunity if accepted.
March 3: Applications due. Submit responses to my questions to me.
March 8: Initial selections announced (pending funding approval by the College) via e-mail.
March 15: Decisions due.
Week of April 2: First meeting.
Unspecified other dates: Additional meetings.
April 22: Draft MAP proposal due.
April 29: Revised MAP proposal due.
May 6: Final MAP proposal due.
May 20: Commencement.
May 21: Brief literature survey due.
May 24 (or earlier): Other background preparation (e.g., languages, advanced topics) completed.
May 27: Summer research begins. (Tentative date)
August 2: Summer research concludes. (Ten weeks hence)

4  Activities

I have very high expectations of my summer research students. Among other things, I expect my students to begin their summer research during spring semester and continue their summer research into fall semester (and sometimes beyond). By applying for summer research you are agreeing to meet these expectations if I take you on as a research student. You are unlikely to receive explicit credit or compensation for work in the spring and fall.
I also expect my students to be self-reliant. While I do my best to be around, I expect you to be able to do many things on your own or with a small group.
I expect your graded research milestones (eight in all) to conform to the highest standards of writing at Grinnell.
The remainder of this section is an overview; you may find many more details in the syllabus.

4.1  Spring

Topic Preparation  
You are expected to begin your background research during the spring. In particular, you must identify at least four scientific papers on related projects. You are also encouraged to use the web to aid your search. Some useful resources are:
Some of the related conferences where this work appears are
    CVPR: Computer Vision and Pattern Recognition
    ICDAR: International Conference on Document Analysis and Recognition
    DAS: IAPR International Workshop on Document Analysis Systems
    GREC: IAPR International Workshop on Graphics Recognition
    ICCV: International Conference on Computer Vision
    ECCV: European Conference on Computer Vision
    ICPR: International Conference on Pattern Recognition
    ICIP: International Conference on Image Processing
    ICASSP: International Conference on Acoustics, Speech, and Signal Processing
and some related journals include
    PAMI: IEEE Transactions on Pattern Analysis and Machine Intelligence
    IJCV: International Journal of Computer Vision
    IJDAR: International Journal on Document Analysis and Recognition
    TIP: IEEE Transactions on Image Processing
though there are of course many, many others. Once you have identified potentially useful resources, look for an author preprint online (one is nearly always available); if you cannot find one, consult with the librarians about obtaining a copy of the article or conference paper.
You will email me your list of papers (with complete citations) by the date above.
Skill Preparation  
If your project will require a programming language, data interface, or library that you do not yet know, you are expected to begin learning it. You need not master any of them, but you should develop comfort and familiarity.

4.2  Summer

During the summer, you are expected to work full-time on the project (40 hours per week for ten weeks). This work will include regularly scheduled group meetings. See the syllabus for more information.
Topic Preparation  
For the first week of summer research, you will continue your preparation from the spring, developing a survey of the state of the art in whatever project you've decided to undertake. You should prepare a short survey paper, which will serve as the introduction and literature review for a later paper. On the first day of the second week, you will give a public presentation of your work.
Core Research and Development  
For the next eight weeks of the summer, you will work on your project, using what you've learned during preparation for guidance. Some of this time may be spent developing skills. We will have a full-group meeting several times per week. Each group will present at least once per week at that meeting.
Writing  
Throughout the summer, you will work on a five-to-ten page paper describing your work and placing it in the context of related work. Your paper should meet the highest standards of writing at Grinnell. I hope our work will progress to the point that you will be able to submit a version of this paper to a scholarly conference or journal. (I will provide significant assistance in developing the submitted version, in which case I will probably ask to be listed as a co-author.)

4.3  Fall and Beyond

Poster Presentation  
You will create a poster describing your work and present it at the Grinnell Science Poster Seminar (typically during parents' weekend).
Internal Public Presentation  
You will give a twenty-five- or fifty-minute presentation on your work as part of the Computer Science Department's Thursday Extras series.
External Conference Presentation  
If your work is submitted to and accepted by a conference, you are expected to attend and present your work. (Funding is available from the Dean's office for you to attend the conference.)
External Pew Presentation  
You may submit your work to the Pew Midstates Science and Mathematics Consortium Fall Symposium on Undergraduate Research in the Physical and Mathematical Sciences. If your work is accepted, you must attend the symposium (including non-CS talks) and present your work (in poster or talk form). You must give at least one practice talk before going to the conference.

Acknowledgement

With thanks to Professor Sam Rebelsky for many elements of Section 4.