N-Gram Language Models for Offline Handwritten Text Recognition

Author(s):  
M. Zimmermann ◽  
H. Bunke
Author(s):  
ROMAN BERTOLAMI ◽  
HORST BUNKE

Current multiple classifier systems for unconstrained handwritten text recognition do not provide a straightforward way to utilize language model information. In this paper, we describe a generic method to integrate a statistical n-gram language model into the combination of multiple offline handwritten text line recognizers. The proposed method first builds a word transition network and then rescores this network with an n-gram language model. Experimental evaluation conducted on a large dataset of offline handwritten text lines shows that the proposed approach improves the recognition accuracy over a reference system as well as over the original combination method that does not include a language model.
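The two steps named in the abstract, building a word transition network and rescoring it with an n-gram language model, can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the network has already been reduced to a sequence of aligned slots, each holding candidate words with recognizer log-scores, and it uses a toy add-alpha bigram model. The names `train_bigram`, `rescore`, and `lm_weight` are hypothetical.

```python
import math
from collections import defaultdict


def train_bigram(sentences, alpha=1.0):
    """Build an add-alpha smoothed bigram model; returns logp(prev, word)."""
    context_counts = defaultdict(int)
    bigram_counts = defaultdict(int)
    vocab = set()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split()
        vocab.update(tokens)
        for prev, word in zip(tokens, tokens[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, word)] += 1
    vocab_size = len(vocab)

    def logp(prev, word):
        numerator = bigram_counts[(prev, word)] + alpha
        denominator = context_counts[prev] + alpha * vocab_size
        return math.log(numerator / denominator)

    return logp


def rescore(network, logp, lm_weight=1.0):
    """Viterbi search over a slot-based word network (assumed structure).

    network: list of slots; each slot is a list of (word, recognizer_log_score).
    Returns the word sequence maximizing recognizer score + lm_weight * LM score.
    """
    # best maps the last word of a partial path to (total score, path so far).
    best = {"<s>": (0.0, [])}
    for slot in network:
        new_best = {}
        for word, rec_score in slot:
            candidate = max(
                (
                    (score + rec_score + lm_weight * logp(prev, word), path + [word])
                    for prev, (score, path) in best.items()
                ),
                key=lambda c: c[0],
            )
            if word not in new_best or candidate[0] > new_best[word][0]:
                new_best[word] = candidate
        best = new_best
    return max(best.values(), key=lambda c: c[0])[1]


# Toy data, invented for illustration only.
corpus = ["the cat sat", "the cat ran", "a cat sat"]
logp = train_bigram(corpus)
network = [
    [("the", math.log(0.5)), ("she", math.log(0.5))],
    [("cat", math.log(0.4)), ("cut", math.log(0.6))],
    [("sat", math.log(0.5)), ("set", math.log(0.5))],
]
print(" ".join(rescore(network, logp)))
```

In this toy network the recognizer scores alone prefer "cut" in the second slot, while the language model term shifts the best path toward a word sequence seen in the training text. Setting `lm_weight=0.0` recovers the combination without a language model.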


2020 ◽  
Vol 10 (21) ◽  
pp. 7711
Author(s):  
Arthur Flor de Sousa Neto ◽  
Byron Leite Dantas Bezerra ◽  
Alejandro Héctor Toselli

As physical manuscripts are increasingly brought into the digital environment, it has become common for systems to offer automatic mechanisms for offline Handwritten Text Recognition (HTR). However, many scenarios and writing variations reduce recognition accuracy; to mitigate this problem, optical models can be combined with language models to assist in decoding text. To improve results, dictionaries of characters and words are generated from the dataset, and linguistic restrictions are imposed on the recognition process. This work proposes the use of spelling correction techniques for text post-processing to achieve better results and eliminate the linguistic dependence between the optical model and the decoding stage. In addition, an encoder–decoder neural network architecture and a training methodology are developed and presented for spelling correction. To demonstrate the effectiveness of this new approach, we conducted an experiment on five datasets of text lines that are widely known in the field of HTR, with three state-of-the-art optical models for text recognition and eight spelling correction techniques, ranging from traditional statistical methods to current neural network approaches in the field of Natural Language Processing (NLP). Finally, our proposed spelling correction model is analyzed statistically through HTR system metrics, reaching an average sentence correction 54% higher than the state-of-the-art decoding method on the tested datasets.
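To make the idea of spelling correction as a post-processing step concrete, the following sketch shows a simple frequency-based corrector of the kind usually grouped under traditional statistical techniques. It is not the encoder–decoder model proposed in the paper, and it is not claimed to be one of the eight techniques evaluated there. It assumes a word-frequency dictionary built from training text and corrects each unknown token to the most frequent known word within one edit. The names `edits1`, `correct_word`, and `correct_line` are hypothetical.

```python
import string
from collections import Counter


def edits1(word, alphabet=string.ascii_lowercase):
    """Return all strings one edit (delete, transpose, replace, insert) away."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [
        left + right[1] + right[0] + right[2:]
        for left, right in splits
        if len(right) > 1
    ]
    replaces = [left + c + right[1:] for left, right in splits if right for c in alphabet]
    inserts = [left + c + right for left, right in splits for c in alphabet]
    return set(deletes + transposes + replaces + inserts)


def correct_word(word, counts):
    """Keep known words; otherwise pick the most frequent known neighbor."""
    if word in counts:
        return word
    candidates = [w for w in edits1(word) if w in counts]
    if not candidates:
        return word  # nothing plausible found, leave the token unchanged
    # Ties on frequency are broken alphabetically so the result is deterministic.
    return max(candidates, key=lambda w: (counts[w], w))


def correct_line(line, counts):
    """Correct a recognized text line token by token (whitespace tokenization)."""
    return " ".join(correct_word(token, counts) for token in line.split())


# Toy dictionary, invented for illustration only.
counts = Counter("the quick brown fox jumps over the lazy dog".split())
print(correct_line("the quikc brown fxo", counts))
```

Because the corrector only sees the text output, it can be paired with any optical model, which reflects the decoupling between the optical model and the decoding stage that the abstract describes.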


Author(s):  
Sri. Yugandhar Manchala ◽  
Jayaram Kinthali ◽  
Kowshik Kotha ◽  
Kanithi Santosh Kumar ◽  
Jagilinki Jayalaxmi
