Probabilistic N-gram language model for SMS Lingo

Author(s):  
R. Damdoo ◽  
U. Shrawankar

Author(s):  
Roman Bertolami ◽  
Horst Bunke

Current multiple classifier systems for unconstrained handwritten text recognition do not provide a straightforward way to utilize language model information. In this paper, we describe a generic method to integrate a statistical n-gram language model into the combination of multiple offline handwritten text line recognizers. The proposed method first builds a word transition network and then rescores this network with an n-gram language model. Experimental evaluation conducted on a large dataset of offline handwritten text lines shows that the proposed approach improves the recognition accuracy over a reference system as well as over the original combination method that does not include a language model.
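The rescoring step described above can be illustrated with a toy sketch. This is not the authors' implementation: the word transition network, the bigram log-probabilities, and the weighting are all invented for illustration. A word network holds the candidates proposed by the combined recognizers at each position; a Viterbi pass then adds weighted n-gram scores and extracts the best path.

```python
# Hypothetical word transition network: candidate words per position,
# each with an (invented) recognizer log-score.
network = [
    {"the": -0.2, "she": -1.6},
    {"cat": -0.4, "cut": -1.1},
    {"sat": -0.3, "set": -1.4},
]

# Toy bigram log-probabilities; unseen bigrams fall back to a floor value.
bigram = {
    ("<s>", "the"): -0.5, ("the", "cat"): -0.7, ("cat", "sat"): -0.6,
    ("<s>", "she"): -1.0, ("she", "cut"): -2.0, ("cut", "set"): -2.5,
}
FLOOR = -5.0

def rescore(network, lm_weight=0.8):
    """Viterbi pass over the network, adding weighted bigram LM scores."""
    # best[word] = (total log-score, path) for hypotheses ending in `word`
    best = {"<s>": (0.0, [])}
    for slot in network:
        new_best = {}
        for word, recognizer_score in slot.items():
            for prev, (score, path) in best.items():
                lm = bigram.get((prev, word), FLOOR)
                total = score + recognizer_score + lm_weight * lm
                if word not in new_best or total > new_best[word][0]:
                    new_best[word] = (total, path + [word])
        best = new_best
    score, path = max(best.values(), key=lambda sp: sp[0])
    return path, score
```

With these numbers the language model pulls the search toward the fluent path `["the", "cat", "sat"]` even though individual recognizer scores alone could favor other words at some positions.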


2012 ◽  
Vol E95.D (9) ◽  
pp. 2308-2317 ◽  
Author(s):  
Welly NAPTALI ◽  
Masatoshi TSUCHIYA ◽  
Seiichi NAKAGAWA

2012 ◽  
Vol 38 (3) ◽  
pp. 631-671 ◽  
Author(s):  
Ming Tan ◽  
Wenli Zhou ◽  
Lei Zheng ◽  
Shaojun Wang

This paper presents an attempt at building a large-scale distributed composite language model, formed by seamlessly integrating an n-gram model, a structured language model, and probabilistic latent semantic analysis under a directed Markov random field paradigm, to simultaneously account for local word lexical information, mid-range sentence syntactic structure, and long-span document semantic content. The composite language model was trained by performing a convergent N-best-list approximate EM algorithm and a follow-up EM algorithm to improve word prediction power, on corpora of up to a billion tokens, and is stored on a supercomputer. The large-scale distributed composite language model gives a drastic perplexity reduction over n-grams and achieves significantly better translation quality, measured by BLEU score and "readability" of translations, when applied to the task of re-ranking the N-best list from a state-of-the-art parsing-based machine translation system.
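The paper composes its three components under a directed Markov random field; a much simpler way to illustrate the underlying idea of mixing local and long-span evidence is linear interpolation of component distributions. The sketch below is only that illustration, with made-up component predictions and weights, not the paper's composition method.

```python
def interpolate(prob_dicts, weights):
    """Mix component distributions P_k(w | context) with weights summing to 1."""
    assert abs(sum(weights) - 1.0) < 1e-9
    vocab = set().union(*prob_dicts)  # union of each component's vocabulary
    return {w: sum(lam * p.get(w, 0.0) for lam, p in zip(weights, prob_dicts))
            for w in vocab}

# Hypothetical next-word predictions from three components:
ngram    = {"bank": 0.6, "river": 0.3, "loan": 0.1}  # local n-gram evidence
syntax   = {"bank": 0.5, "river": 0.4, "loan": 0.1}  # structured-LM evidence
semantic = {"loan": 0.7, "bank": 0.2, "river": 0.1}  # topic-level (PLSA-like) evidence

mixed = interpolate([ngram, syntax, semantic], [0.5, 0.3, 0.2])
```

Because each component is a proper distribution and the weights sum to one, the mixture is again a proper distribution; the directed-MRF composition in the paper achieves a tighter coupling of the components than this convex mixture does.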


MACRo 2015 ◽  
2017 ◽  
Vol 2 (1) ◽  
pp. 1-10
Author(s):  
József Domokos ◽  
Zsolt Attila Szakács

This paper presents a Romanian-language phonetic transcription web service and application built using Java technologies on top of Phonetisaurus G2P, a Weighted Finite State Transducer (WFST)-driven grapheme-to-phoneme conversion toolkit. We used the NaviRO Romanian pronunciation dictionary for WFST model training, and the MIT Language Modeling (MITLM) toolkit to estimate the required joint-sequence n-gram language model. Dictionary evaluation tests are also included in the paper. The service can be accessed for educational, research, and other non-commercial use at http://users.utcluj.ro/~jdomokos/naviro/.
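The joint-sequence n-gram idea behind Phonetisaurus-style G2P can be sketched in a few lines: aligned grapheme-phoneme pairs ("graphones") are treated as tokens of an ordinary n-gram model. The toy aligned training data below is invented for illustration and is not from the NaviRO dictionary; a real system would also apply smoothing and WFST-based decoding.

```python
from collections import Counter

# Hypothetical aligned (grapheme, phoneme) sequences for a few toy words.
training = [
    [("c", "k"), ("a", "a"), ("t", "t")],
    [("c", "k"), ("a", "a"), ("r", "r")],
    [("a", "a"), ("r", "r"), ("t", "t")],
]

# Count graphone bigrams, with a start symbol at each word boundary.
bigrams = Counter()
unigrams = Counter()
for seq in training:
    prev = ("<s>", "<s>")
    for graphone in seq:
        bigrams[(prev, graphone)] += 1
        unigrams[prev] += 1
        prev = graphone

def bigram_prob(prev, cur):
    """Maximum-likelihood bigram probability over graphone tokens (unsmoothed)."""
    if unigrams[prev] == 0:
        return 0.0
    return bigrams[(prev, cur)] / unigrams[prev]
```

In a full toolkit the counts would be smoothed (the role MITLM plays here) and the resulting model compiled into a WFST, so that transcribing an unseen word becomes a shortest-path search over graphone sequences.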

