Word Hypotheses for Segmentation-Free Word Spotting in Historic Document Images

Author(s):  
Leonard Rothacker ◽  
Sebastian Sudholt ◽  
Eugen Rusakov ◽  
Matthias Kasperidus ◽  
Gernot A. Fink

Author(s):  
Leonard Rothacker ◽  
Fabian Wolf ◽  
Gernot A. Fink

The annotation-free word spotting method proposed in this paper makes document images searchable without requiring any labeled training data. Thus, our method supports the direct exploration of a document collection without demanding any manual effort from the users to prepare a training dataset. Our method works in the query-by-example scenario, where the user selects an exemplary occurrence of the query word. Afterwards, the entire collection of document images is searched according to visual similarity to the query. The proposed method requires only minimal assumptions about the visual appearance of text. This is achieved by processing document images as a whole, without requiring a given segmentation of the images at the word level or the line level. Therefore, the method is also segmentation-free. Variability in word size is handled by representing the sequential structure of text with a statistical sequence model. To make the computationally costly application of the sequence model feasible in practice, regions are first retrieved according to approximate similarity with an efficient model decoding algorithm. Re-ranking these regions according to the visual similarity obtained with the sequence model leads to highly accurate word spotting results. The method is evaluated on five benchmark datasets. In the segmentation-free query-by-example scenario where no annotated training data is available, the method outperforms all other methods that have been evaluated on any of these five benchmarks.
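
The two-stage idea in this abstract (a cheap approximate retrieval followed by re-ranking with a costlier similarity) can be illustrated with a minimal, generic sketch. This is not the authors' implementation: the function `retrieve_and_rerank` and its parameters `cheap_score`, `exact_score`, and `shortlist_size` are hypothetical names, and the scoring functions stand in for the efficient decoding and the full sequence model, neither of which is specified here.

```python
import heapq
from typing import Callable, Iterable, List, Tuple, TypeVar

R = TypeVar("R")  # a candidate region; its concrete type is left open


def retrieve_and_rerank(
    regions: Iterable[R],
    cheap_score: Callable[[R], float],   # hypothetical fast, approximate similarity
    exact_score: Callable[[R], float],   # hypothetical slow, accurate similarity
    shortlist_size: int,
) -> List[Tuple[float, R]]:
    """Return shortlisted regions sorted by exact score, best first."""
    # Stage 1: keep only the best candidates under the cheap score.
    shortlist = heapq.nlargest(shortlist_size, regions, key=cheap_score)
    # Stage 2: apply the expensive score to the shortlist only.
    scored = [(exact_score(region), region) for region in shortlist]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Toy usage: regions are integers, and both scores are made-up distances.
ranked = retrieve_and_rerank(
    regions=range(10),
    cheap_score=lambda r: -abs(r - 5),
    exact_score=lambda r: -abs(r - 4.4),
    shortlist_size=3,
)
ranked_regions = [region for _, region in ranked]
```

The expensive score is evaluated only `shortlist_size` times, so its cost no longer grows with the number of candidate regions in the collection.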


Author(s):  
Shamik Majumder ◽  
Subhrangshu Ghosh ◽  
Samir Malakar ◽  
Ram Sarkar ◽  
Mita Nasipuri

2013 ◽  
Author(s):  
Nikos Vasilopoulos ◽  
Ergina Kavallieratou

2019 ◽  
Vol 9 (2) ◽  
pp. 49-65
Author(s):  
Thontadari C. ◽  
Prabhakar C. J.

In this article, the authors propose a segmentation-free word spotting method for handwritten document images that uses a Bag of Visual Words (BoVW) framework based on the co-occurrence histogram of oriented gradients (Co-HOG) descriptor. Initially, the handwritten document is represented by visual word vectors obtained from the frequency of occurrence of Co-HOG descriptors within local patches of the document. This visual word representation does not consider the spatial location of the visual words, yet spatial information helps to tell apart locations that would be perceived as the same from visual information alone. Hence, to add the spatial distribution of visual words to the unstructured BoVW framework, the authors adopt the spatial pyramid matching (SPM) technique. The proposed method is evaluated on popular datasets, and the results confirm that it outperforms existing segmentation-free word spotting techniques.
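
As a rough illustration of how a spatial pyramid adds location information to a BoVW histogram, the sketch below concatenates per-cell visual word histograms over grids of increasing resolution. It is a minimal sketch under stated assumptions, not the authors' pipeline: the Co-HOG extraction and codebook quantization are assumed to have already produced `(x, y, word_id)` triples with coordinates normalized to [0, 1], the function name `spatial_pyramid_histogram` is hypothetical, and the level weights follow the commonly used spatial pyramid matching scheme rather than anything stated in the abstract.

```python
from typing import List, Tuple


def spatial_pyramid_histogram(
    points: List[Tuple[float, float, int]],
    vocab_size: int,
    levels: int = 2,
) -> List[float]:
    """Concatenate weighted visual word histograms over a spatial pyramid.

    points: (x, y, word_id) triples, with x and y normalized to [0, 1].
    levels: index of the finest level; level l uses a 2**l by 2**l grid.
    """
    descriptor: List[float] = []
    for level in range(levels + 1):
        cells = 2 ** level
        # Assumed weighting: coarser levels contribute less than finer ones.
        if level == 0:
            weight = 1.0 / 2 ** levels
        else:
            weight = 1.0 / 2 ** (levels - level + 1)
        hist = [0.0] * (cells * cells * vocab_size)
        for x, y, word in points:
            # Clamp so that a coordinate of exactly 1.0 falls in the last cell.
            col = min(int(x * cells), cells - 1)
            row = min(int(y * cells), cells - 1)
            hist[(row * cells + col) * vocab_size + word] += weight
        descriptor.extend(hist)
    return descriptor


# Toy usage: two visual words in opposite corners, vocabulary of size 2.
toy_descriptor = spatial_pyramid_histogram(
    [(0.1, 0.1, 0), (0.9, 0.9, 1)], vocab_size=2, levels=1
)
```

Two patches with identical plain BoVW histograms but different layouts produce different descriptors at the finer levels, which is the spatial information the unstructured histogram lacks.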

