Open vocabulary handwriting recognition using combined word-level and character-level language models

Author(s):  
Michal Kozielski ◽  
David Rybach ◽  
Stefan Hahn ◽  
Ralf Schlüter ◽  
Hermann Ney


Language Modeling for Morphologically Rich Languages: Character-Aware Modeling for Word-Level Prediction
2018 ◽  
Vol 6 ◽  
pp. 451-465 ◽  
Author(s):  
Daniela Gerz ◽  
Ivan Vulić ◽  
Edoardo Ponti ◽  
Jason Naradowsky ◽  
Roi Reichart ◽  
...  

Neural architectures are prominent in the construction of language models (LMs). However, word-level prediction is typically agnostic of subword-level information (characters and character sequences) and operates over a closed vocabulary, consisting of a limited word set. Indeed, while subword-aware models boost performance across a variety of NLP tasks, previous work did not evaluate the ability of these models to assist next-word prediction in language modeling tasks. Such subword-level informed models should be particularly effective for morphologically-rich languages (MRLs) that exhibit high type-to-token ratios. In this work, we present a large-scale LM study on 50 typologically diverse languages covering a wide variety of morphological systems, and offer new LM benchmarks to the community, while considering subword-level information. The main technical contribution of our work is a novel method for injecting subword-level information into semantic word vectors, integrated into the neural language modeling training, to facilitate word-level prediction. We conduct experiments in the LM setting where the number of infrequent words is large, and demonstrate strong perplexity gains across our 50 languages, especially for morphologically-rich languages. Our code and data sets are publicly available.
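The paper's exact injection method is not reproduced here, but the core idea of composing word vectors from subword information can be sketched in a fastText-style toy: a word's vector is the mean of (hashed, deterministic pseudo-)embeddings of its character n-grams, so even out-of-vocabulary words receive vectors. All names and the hashing trick below are illustrative assumptions, not the authors' implementation.

```python
import hashlib

DIM = 8          # toy embedding dimensionality
NGRAM_RANGE = (3, 5)

def char_ngrams(word, lo=3, hi=5):
    """Extract character n-grams from a word wrapped in boundary markers."""
    w = f"<{word}>"
    grams = []
    for n in range(lo, hi + 1):
        grams += [w[i:i + n] for i in range(len(w) - n + 1)]
    return grams

def ngram_vector(gram):
    """Deterministic pseudo-embedding: hash the n-gram into a small vector."""
    h = hashlib.md5(gram.encode()).digest()
    return [b / 255.0 for b in h[:DIM]]

def subword_word_vector(word):
    """Word vector = mean of its character n-gram vectors (fastText-style)."""
    grams = char_ngrams(word, *NGRAM_RANGE)
    vecs = [ngram_vector(g) for g in grams]
    return [sum(c) / len(vecs) for c in zip(*vecs)]
```

Because morphologically related forms share n-grams (e.g. "walk", "walked", "walking" all contain "walk"), their vectors overlap, which is what makes subword-aware representations attractive for the high type-to-token ratios of morphologically rich languages.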


2020 ◽  
Vol 10 (4) ◽  
pp. 1340
Author(s):  
Heewoong Park ◽  
Jonghun Park

The task of sentence completion, which aims to infer the missing text of a given sentence, has been used to assess the reading comprehension level of machines as well as humans. In this work, we conducted a comprehensive study of various approaches to sentence completion based on neural language models, which have advanced in recent years. First, we revisited the recurrent neural network language model (RNN LM), achieving highly competitive results with an appropriate network structure and hyper-parameters. This paper presents a bidirectional version of the RNN LM, which surpassed the previous best results on the Microsoft Research (MSR) Sentence Completion Challenge and the Scholastic Aptitude Test (SAT) sentence completion questions. In parallel with directly applying the RNN LM to sentence completion, we also employed a supervised learning framework that fine-tunes a large pre-trained transformer-based LM with a few sentence-completion examples. By fine-tuning a pre-trained BERT model, this work established state-of-the-art results on the MSR and SAT sets. Furthermore, we performed similar experimentation on newly collected cloze-style questions in the Korean language. The experimental results reveal that simply applying the multilingual BERT models to the Korean dataset was not satisfactory, which leaves room for further research.
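The LM-based completion strategy above can be sketched with a toy model: fill the blank with each candidate, score the resulting sentence under a language model, and pick the highest-scoring candidate. The paper uses RNN and BERT models; the add-alpha-smoothed bigram LM below is only a minimal, self-contained stand-in for the scoring idea, and all names are illustrative.

```python
import math
from collections import Counter

def train_bigram(corpus, alpha=1.0):
    """Return an add-alpha smoothed conditional log-probability function."""
    tokens = corpus.split()
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    vocab = len(unigrams)

    def logprob(prev, word):
        return math.log((bigrams[(prev, word)] + alpha) /
                        (unigrams[prev] + alpha * vocab))
    return logprob

def complete(sentence_with_blank, candidates, logprob):
    """Fill the blank ('___') with the candidate whose completed sentence
    has the highest total bigram log-probability."""
    def score(cand):
        toks = sentence_with_blank.replace("___", cand).split()
        return sum(logprob(p, w) for p, w in zip(toks, toks[1:]))
    return max(candidates, key=score)
```

A stronger scorer (an RNN LM reading left-to-right and right-to-left, or a masked LM such as BERT) slots in by replacing the `score` function; the candidate-ranking loop is unchanged.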


2018 ◽  
Author(s):  
Avery Hiebert ◽  
Cole Peterson ◽  
Alona Fyshe ◽  
Nishant Mehta

Author(s):  
U.-V. MARTI ◽  
H. BUNKE

In this paper, a system for the reading of totally unconstrained handwritten text is presented. The kernel of the system is a hidden Markov model (HMM) for handwriting recognition. This HMM is enhanced by a statistical language model, so that linguistic knowledge beyond the lexicon level is incorporated into the recognition process. Another novel feature of the system is that the HMM is applied in such a way that the difficult problem of segmenting a line of text into individual words is avoided. A number of experiments with various language models and large vocabularies have been conducted. The language models used in the system were also analytically compared based on their perplexity.
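The perplexity comparison mentioned above has a standard definition: the exponentiated average negative log-probability the model assigns to a held-out token sequence, so that a lower value means the model is less "surprised" by the text. A minimal sketch (the concrete models compared in the paper are not reproduced; the uniform model is just a sanity check whose perplexity equals its vocabulary size):

```python
import math

def perplexity(logprob_fn, tokens):
    """Perplexity = exp(-(1/N) * sum_i log P(w_i | w_{i-1}))."""
    n = len(tokens) - 1  # number of predicted tokens
    ll = sum(logprob_fn(prev, w) for prev, w in zip(tokens, tokens[1:]))
    return math.exp(-ll / n)

VOCAB = 100
# A uniform model spreads probability evenly over the vocabulary,
# so its perplexity is exactly the vocabulary size.
uniform = lambda prev, w: math.log(1.0 / VOCAB)
```

In a recognition system, the perplexity of the language model roughly corresponds to the effective branching factor the decoder faces at each word, which is why lower-perplexity models tend to improve recognition accuracy.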


2003 ◽  
Vol 12 (06) ◽  
pp. 783-804 ◽  
Author(s):  
GERGELY TÍMÁR ◽  
KRISTÓF KARACS ◽  
CSABA REKECZKY

This report describes analogic algorithms used in the preprocessing and segmentation phases of offline handwriting recognition tasks. A segmentation-based handwriting recognition approach is discussed, i.e., the system attempts to segment words into their constituent letters. To improve speed, the CNN algorithms employed use dynamic, wavefront propagation-based methods wherever possible, instead of morphological operators embedded in iterative algorithms. The system first locates the handwritten lines in the page image and then corrects their skew as necessary. It then searches for the words within the lines and corrects skew at the word level as well. A novel trigger-wave-based word segmentation algorithm, which operates on the skeletons of words, is presented. Sample results of experiments conducted on a database of 25 handwritten pages are presented, along with suggestions for future development.
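The paper's line location runs on cellular neural networks with trigger waves; as a rough classical analogue (an assumption for illustration, not the authors' algorithm), the same step can be done with a horizontal projection profile: sum the ink pixels per row of a binary page image and take each contiguous run of inked rows as one text line.

```python
def find_text_lines(image, threshold=1):
    """Locate text lines in a binary page image (1 = ink, 0 = background)
    by scanning the horizontal projection profile for runs of inked rows.
    Returns (start_row, end_row) pairs, end-exclusive."""
    profile = [sum(row) for row in image]
    lines, start = [], None
    for y, ink in enumerate(profile):
        if ink >= threshold and start is None:
            start = y                      # line begins
        elif ink < threshold and start is not None:
            lines.append((start, y))       # line ends
            start = None
    if start is not None:                  # line runs to the page bottom
        lines.append((start, len(profile)))
    return lines
```

Skew correction would be applied before this step (a skewed line smears its profile across many rows), and the same profiling idea applies vertically within a line to find word gaps.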

