Summer 2018 MAP Projects
Jerod Weinman
Abstract
This document provides background on summer 499 (MAP) projects for Grinnell
students and describes the high expectations I have for my summer students.
Contents
1 Introduction
2 Project Overview
2.1 Background
2.2 Synthetic Map Data Generation
2.3 Using Dynamic Training Data
3 Approximate Schedule
4 Activities
4.1 Spring
4.2 Summer
4.3 Fall and Beyond
1 Introduction
The general focus of my research is in machine learning for computer
vision. Because reconstructing a 3-D scene from a 2-D projection is
a difficult inference problem, some computational machinery is necessary.
Furthermore, understanding and extracting meaning from images is a
problem that has been solved by humans, but remains elusive for machines.
Because it is nearly impossible to specify and hand-code models for
these tasks, machines must be endowed with some amount of learning
capabilities.
The application context of the projects for this summer is character
recognition in historical maps. While character recognition is one
of the oldest problems in pattern recognition, the most general form
of the problem (automated recognition that matches human performance
in any situation) is still very far from solved. My general aim is
to advance information availability with automated character recognition.
We will examine the training process of a text recognition pipeline
this summer.
2 Project Overview
2.1 Background
The application for the project, already underway, is recognizing
place names (toponyms) on historical map images. While many old maps
are being scanned and distributed online, their contents remain largely
impenetrable to automated search. This project works to change that by
automatically detecting and recognizing the text in scanned map images
to enable indexing and search of these images the same way we now
search web pages and (more recently) digitized books.
The following resources describe my prior work in this area (in
order of increasing detail):
The work described above uses a parser dating to my dissertation research
with a newer convolutional neural network-based character recognition
module. That system is largely brittle and slow, so I developed an
end-to-end word recognition system based on a recently published model
by others.
Projects for the summer will involve improving the training stage
of the text recognizer. Because the network has over fifteen million
parameters, it requires significant quantities of training data.
2.2 Synthetic Map Data Generation
The model above is trained with static synthetic
data generated for scene text recognition. This static dataset of
9 million images is over 10 GB and is not wholly appropriate for recognizing
map text because it lacks the same background layers and typography.
Moreover, its static nature means that a highly parameterized network
could learn the specifics of the dataset, leading to poor generalization
performance on previously unseen examples.
Rather than learn from scene text data, this project will generate
synthetic data that provides a better match for map text recognition.
The synthesis process must be efficient (exceeding 100 Hz) to keep pace
with the trainer's consumption rate.
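The 100 Hz requirement can be checked with a simple timing harness. The
following is a minimal sketch, not the project's generator: synthesize_sample
is a hypothetical stand-in that fills a buffer with random gray levels instead
of rendering text, and the 32x128 image size and the label lengths are
assumptions chosen only for illustration.

```python
import random
import string
import time


def synthesize_sample(rng, height=32, width=128):
    """Hypothetical stand-in for a text renderer: returns (pixels, label)."""
    length = rng.randint(3, 10)
    label = "".join(rng.choice(string.ascii_uppercase) for _ in range(length))
    # Placeholder "image": random 8-bit gray levels, not rendered text.
    num_pixels = height * width
    pixels = rng.getrandbits(8 * num_pixels).to_bytes(num_pixels, "little")
    return pixels, label


def measure_rate(make_sample, num_samples=200):
    """Return the observed generation rate in samples per second."""
    start = time.perf_counter()
    for _ in range(num_samples):
        make_sample()
    elapsed = time.perf_counter() - start
    return num_samples / elapsed


rng = random.Random(0)
rate = measure_rate(lambda: synthesize_sample(rng))
print(f"{rate:.1f} samples/s (target: more than 100)")
```

Swapping a real renderer in for synthesize_sample would show whether the
rendering step itself, rather than this placeholder, meets the target rate.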
The starting point would likely be this contributed
OpenCV module for text synthesis, which has a simple
demo. While the module uses the Qt5
framework to render text, alternative text rendering engines might
include Pango or Graphite.
Creative, detail-oriented students who have taken CSC 207 and either
MAT 215 or CSC 213 (or their equivalents) will be good candidates
for this project. Willingness (and a demonstrated ability) to learn
C++ before the start of the project is required.
2.3 Using Dynamic Training Data
While the first project will focus on generating text image data,
this project will work to feed those dynamically generated images to
the model trainer, which uses several threads to read and fill data
buffers in parallel. This buffer-feeding parallelism was originally
designed to keep the training process supplied with data in the face
of large latency in reading data from disk.
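The buffer-feeding pattern described above can be illustrated with the Python
standard library alone. This is a minimal sketch under stated assumptions, not
the trainer's actual mechanism: the function names, thread count, and buffer
capacity are hypothetical, and make_sample stands in for whatever produces one
training example.

```python
import queue
import threading


def producer(buffer, make_sample, count):
    """Generate `count` samples, blocking whenever the buffer is full."""
    for _ in range(count):
        buffer.put(make_sample())


def run_pipeline(make_sample, num_threads=4, per_thread=25, capacity=16):
    """Fill a bounded buffer from several threads while one consumer drains it."""
    buffer = queue.Queue(maxsize=capacity)
    threads = [
        threading.Thread(target=producer, args=(buffer, make_sample, per_thread))
        for _ in range(num_threads)
    ]
    for thread in threads:
        thread.start()
    consumed = []
    for _ in range(num_threads * per_thread):
        # Each get() stands in for the trainer taking one example.
        consumed.append(buffer.get())
    for thread in threads:
        thread.join()
    return consumed


samples = run_pipeline(lambda: ("pixels", "LABEL"))
print(len(samples))  # 100
```

The bounded queue keeps producers from running arbitrarily far ahead of the
consumer. In CPython, threads like these help mainly when the work in
make_sample waits on I/O or runs in native code that releases the global
interpreter lock.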
This project will involve significant independent study and learning
about the TensorFlow platform's
data-reading mechanisms, which are primarily designed for reading
files from disk, not from memory. (The separate so-called feed_dict
mechanism reads from memory but may be too inefficient.)
The following TensorFlow guides provide a starting point for learning
about the "current" architecture (which may soon be deprecated).
A new data input API called Dataset may provide the
necessary functionality. Your job could be to figure out how to make
it so.
There is also a lengthy
thread discussing the development of the new Dataset API,
with a summary
comment linking to a few potentially related solutions here
and here.
If trying to come to grips with these discussions, sample code, and
sparsely documented, tutorial-free, bleeding-edge APIs frightens you,
this project is probably not for you. However, detail-oriented students
who are undaunted by sheer platform complexity and who have taken
CSC 207 and CSC 213 (or their equivalents) will be good candidates
for this project. Willingness (and a demonstrated ability) to learn
C++ and Python before the start of the project is required.
3 Approximate Schedule
This schedule largely follows that officially approved by the division.
However, since other (off-campus) options have different schedules,
I will need to know if you are considering other opportunities, what
the schedule is, and whether you are likely to choose an off-campus
opportunity if accepted.
- February 23: Application forms due. You must submit the division-wide
form online and your responses to my questions
to me.
- March 9: Initial selections announced (provided that the college
has approved funding) via e-mail.
- March 16: Decisions due.
- Week of April 2: First meeting.
- Unspecified other dates: Additional meetings.
- April 23: Draft MAP proposal due.
- April 30: Revised MAP proposal due.
- May 7: Final MAP proposal due.
- May 22: Commencement.
- May 25: Brief literature survey due.
- June 8 (or earlier): Other background preparation (e.g., languages,
advanced topics) completed.
- June 11: Summer research begins. (Tentative date)
- August 18: Summer research concludes. (Ten weeks hence)
4 Activities
This section is largely an overview; you may find many more details
in the syllabus.
4.1 Spring
Topic Preparation
You are expected to begin your background research during the spring.
In particular, you must identify at least four scientific papers on
related projects. You are also encouraged to use the web to aid your
search. Some useful resources are:
Some of the related conferences where you can find this work are
- CVPR: Computer Vision and Pattern Recognition
- ICDAR: International Conference on Document Analysis and Recognition
- DAS: IAPR International Workshop on Document Analysis Systems
- GREC: IAPR International Workshop on Graphics Recognition
- ICCV: International Conference on Computer Vision
- ECCV: European Conference on Computer Vision
- ICPR: International Conference on Pattern Recognition
- ICIP: International Conference on Image Processing
- ICASSP: International Conference on Acoustics, Speech, and Signal
Processing
and some related journals include
- PAMI: IEEE Transactions on Pattern Analysis and Machine Intelligence
- IJCV: International Journal of Computer Vision
- IJDAR: International Journal on Document Analysis and Recognition
- TIP: IEEE Transactions on Image Processing
though there are of course many, many others. Once you have identified
potentially useful resources, if you cannot find an author preprint
online (one is nearly always available), consult with the librarians about
obtaining a copy of an article or conference paper.
You will email me your list of papers (with complete citations) by
the date above.
Skill Preparation
If your project will require programming languages, data interfaces,
or libraries that you do not yet know, you are expected to begin learning
them. You need not master any of them, but you should develop comfort
and familiarity.
4.2 Summer
During the summer, you are expected to work full-time on the project
(40 hours per week for ten weeks). This work will include regularly
scheduled group meetings. See the syllabus
for more information.
4.3 Fall and Beyond
Poster Presentation
You will create a poster describing your work and present it at the
Grinnell Science Poster Seminar (typically during parents' weekend).
Internal Public Presentation
You will give a twenty-five- or fifty-minute presentation on your work
as part of the Computer Science Department's Thursday Extras series.
External Conference Presentation
If your work is submitted to and accepted by a conference, you are
expected to attend and present your work. (Funding is available from
the Dean's office for you to attend the conference.)
External Pew Presentation
You may submit your work to the Pew Midstates Science and Mathematics
Consortium Fall Symposium on Undergraduate Research in the Physical
and Mathematical Sciences. If your work is accepted, you must attend
the symposium (including non-CS talks) and present your work (in poster
or talk form). You must give at least one practice talk before going
to the conference.
Acknowledgement
With thanks to Professor Sam Rebelsky for many elements of Section
4.