Tools for the Efficient Generation of Hand-Drawn Corpora Based on Context-Free Grammars

Date
2009
Publisher
The Eurographics Association
Abstract
In sketch recognition systems, ground-truth data sets serve to both train and test recognition algorithms. Unfortunately, generating data sets that are sufficiently large and varied is frequently a costly and time-consuming endeavour. In this paper, we present a novel technique for creating a large and varied ground-truthed corpus for hand-drawn math recognition. Candidate math expressions for the corpus are generated via random walks through a context-free grammar, the expressions are transcribed by human writers, and an algorithm automatically generates ground-truth data for individual symbols and inter-symbol relationships within the math expressions. While the techniques we develop in this paper are illustrated through the creation of a ground-truthed corpus of mathematical expressions, they are applicable to any sketching domain that can be described by a formal grammar.
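As a rough illustration of the first step described in the abstract, a random walk through a context-free grammar might be sketched as follows. The toy grammar, the function names `random_walk` and `generate`, and the depth limit are all assumptions made for this example; they are not taken from the paper and do not reproduce the authors' grammar or tools.

```python
import random

# Hypothetical toy grammar (NOT the authors' grammar). Each nonterminal maps
# to a list of productions; each production is a list of symbols. The first
# production of every nonterminal is non-recursive, so a walk can always end.
GRAMMAR = {
    "EXPR": [["TERM"], ["TERM", "+", "EXPR"], ["TERM", "-", "EXPR"]],
    "TERM": [["ATOM"], ["ATOM", "^", "ATOM"],
             ["\\frac{", "EXPR", "}{", "EXPR", "}"]],
    "ATOM": [["x"], ["y"], ["2"], ["(", "EXPR", ")"]],
}


def random_walk(symbol, rng, depth=0, max_depth=6):
    """Expand `symbol` into a list of terminal tokens by random choices."""
    if symbol not in GRAMMAR:
        # Anything that is not a nonterminal is emitted as a terminal token.
        return [symbol]
    productions = GRAMMAR[symbol]
    if depth >= max_depth:
        # Assumed depth cap: past the limit, only the first (non-recursive)
        # production is allowed, which guarantees termination.
        productions = productions[:1]
    tokens = []
    for child in rng.choice(productions):
        tokens.extend(random_walk(child, rng, depth + 1, max_depth))
    return tokens


def generate(seed=None):
    """Return one candidate expression as a space-separated token string."""
    rng = random.Random(seed)
    return " ".join(random_walk("EXPR", rng))


if __name__ == "__main__":
    for seed in range(3):
        print(generate(seed))
```

Each generated string would then be a candidate expression for a human writer to transcribe. The depth cap is one simple way to keep expressions bounded in size; weighting the productions would be another option.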
Description

        
@inproceedings{:10.2312/SBM/SBM09/125-132,
  booktitle = {EUROGRAPHICS Workshop on Sketch-Based Interfaces and Modeling},
  editor    = {Cindy Grimm and Joseph J. LaViola, Jr.},
  title     = {{Tools for the Efficient Generation of Hand-Drawn Corpora Based on Context-Free Grammars}},
  author    = {MacLean, Scott and Tausky, David and Labahn, George and Lank, Edward and Marzouk, Mirette},
  year      = {2009},
  publisher = {The Eurographics Association},
  ISSN      = {1812-3503},
  ISBN      = {978-3-905674-19-4},
  DOI       = {10.2312/SBM/SBM09/125-132}
}