Maximum mutual information training for an online neural predictive handwritten word recognition system

2001 ◽  
Vol 4 (1) ◽  
pp. 56-68 ◽  
Author(s):  
Sonia Garcia-Salicetti ◽  
Bernadette Dorizzi ◽  
Patrick Gallinari ◽  
Zsolt Wimmer
2022 ◽  
Vol 14 (2) ◽  
pp. 614
Author(s):  
Taniya Hasija ◽  
Virender Kadyan ◽  
Kalpna Guleria ◽  
Abdullah Alharbi ◽  
Hashem Alyami ◽  
...  

Speech recognition has been an active field of research over the last few decades because it facilitates better human–computer interaction. Automatic speech recognition (ASR) systems for native languages are still underdeveloped. Punjabi ASR is in its infancy: most research has addressed adult speech, and little work has been done on Punjabi children’s ASR. This research aimed to build a prosodic feature-based children’s ASR system using discriminative modeling techniques. The Punjabi children’s speech corpus poses several challenges, such as acoustic variation across speakers of different ages. To address these issues, out-domain data augmentation was implemented using a Tacotron-based text-to-speech synthesizer. Prosodic features were extracted from the Punjabi children’s speech corpus, and selected prosodic features were combined with Mel Frequency Cepstral Coefficient (MFCC) features before being passed to the ASR framework. The system modeling process investigated several approaches: Maximum Mutual Information (MMI), Boosted Maximum Mutual Information (bMMI), and feature-based Maximum Mutual Information (fMMI). Prosodic features were also extracted from the augmented corpus, and experiments were conducted on both individual and integrated prosodic-based acoustic features. The fMMI technique yielded a 20% to 25% relative improvement in word error rate compared with the MMI and bMMI techniques. Performance was further enhanced using the augmented dataset and hybrid front-end features (MFCC + POV + F0 + voice quality), with a relative improvement of 13% compared with the earlier baseline system.
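As a rough illustration of the MMI criterion named in this abstract, the sketch below computes the objective for one utterance as the log posterior of the reference transcript among competing hypotheses. The function name `mmi_objective`, the acoustic scale `kappa`, and the dictionary-based inputs are assumptions made for illustration and are not taken from the paper; the bMMI and fMMI variants are not covered.

```python
import math

def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def mmi_objective(acoustic_loglik, lm_logprob, reference, kappa=1.0):
    """Log posterior of the reference hypothesis for one utterance.

    acoustic_loglik: dict hypothesis -> log p(O | hypothesis)
    lm_logprob:      dict hypothesis -> log P(hypothesis)
    kappa:           acoustic scaling factor (hypothetical default)
    """
    # Combined score of each hypothesis: scaled acoustic term plus language model term.
    scores = {h: kappa * acoustic_loglik[h] + lm_logprob[h] for h in acoustic_loglik}
    # Numerator: reference score. Denominator: log-sum over all hypotheses.
    return scores[reference] - log_sum_exp(list(scores.values()))
```

The value is never positive and rises toward zero as the reference hypothesis outscores its competitors, which is what discriminative training of this kind tries to achieve across the training set.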


Author(s):  
Vishal A. Naik ◽  
Apurva A. Desai

This article presents an online handwritten word recognition system for the Gujarati language that combines strokes, characters, punctuation marks, and diacritics. The authors use a support vector machine classifier with a radial basis function kernel and a hybrid feature set consisting of directional features with curvature data. Normalized chain code and zoning-based chain code features are also used. Because words are combinations of characters and diacritics, recognized strokes require post-processing to form a word; the authors use location-based and mapping rule-based post-processing methods. The system achieves an accuracy of 95.3% for individual characters, 91.5% for individual words, and 83.3% for sentences. The average processing time for an individual character is 0.071 seconds.
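To make the chain code features mentioned here more concrete, the following minimal sketch quantizes the pen movements of one stroke into eight directions and builds a length-normalized direction histogram. It assumes an 8-direction code and a y-up coordinate system; the function names and the choice of normalization are hypothetical and may differ from the authors' feature definitions.

```python
import math

def chain_code(points):
    """Quantize each pen movement into one of 8 directions (0 = east, counter-clockwise)."""
    codes = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        dx, dy = x1 - x0, y1 - y0
        if dx == 0 and dy == 0:
            continue  # skip repeated samples that carry no direction
        angle = math.atan2(dy, dx)  # radians in [-pi, pi]
        codes.append(round(angle / (math.pi / 4)) % 8)
    return codes

def normalized_histogram(codes, bins=8):
    """Direction histogram scaled to sum to 1, so stroke length does not matter."""
    counts = [0] * bins
    for c in codes:
        counts[c] += 1
    total = sum(counts) or 1  # avoid division by zero for empty strokes
    return [n / total for n in counts]
```

A feature vector of this kind could then be passed to an RBF-kernel classifier, for example scikit-learn's `SVC(kernel="rbf")`, although the abstract does not say which implementation the authors used.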


Author(s):  
Ke Han ◽  
Ishwar K. Sethi

Off-line cursive script recognition has received increasing attention during the last three decades because it is of interest in several areas, such as banking and postal services. This paper describes an off-line cursive handwritten word recognition system used for legal amount interpretation in personal checks. The proposed system uses a set of geometric and topologic features to characterize each word. By considering the spatial distribution of these features in a word image, the system maps each word into two strings of finite symbols. A local associative indexing scheme is then applied to these strings to organize a vocabulary. When presented with an unknown word, the system uses the same indexing scheme to retrieve a set of candidate words likely to match the input word. A verification process is then carried out to find the best match among the candidate set. The performance of the proposed system has been tested with a legal amount image database from real bank checks. The results obtained indicate that the proposed system is able to recognize legal amounts with great accuracy.
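The retrieve-then-verify idea can be sketched as follows. The abstract does not specify the local associative indexing scheme, so this example substitutes a plain bigram inverted index as a stand-in and uses one symbol string per word instead of two. The symbol encoding, the function names, and `top_k` are all hypothetical.

```python
from collections import defaultdict

def bigrams(symbols):
    """Set of adjacent symbol pairs in a symbol string."""
    return {symbols[i:i + 2] for i in range(len(symbols) - 1)}

def build_index(vocabulary):
    """Map each bigram to the vocabulary words whose symbol string contains it.

    vocabulary: dict word -> symbol string (hypothetical encoding)
    """
    index = defaultdict(set)
    for word, symbols in vocabulary.items():
        for gram in bigrams(symbols):
            index[gram].add(word)
    return index

def retrieve_candidates(index, symbols, top_k=3):
    """Rank vocabulary words by the number of bigrams shared with the query."""
    votes = defaultdict(int)
    for gram in bigrams(symbols):
        for word in index.get(gram, ()):
            votes[word] += 1
    # Most shared bigrams first; ties broken alphabetically for determinism.
    ranked = sorted(votes.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:top_k]]
```

The short candidate list returned here would then go to a separate, more expensive verification step, which is not shown.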


Author(s):  
Cinthia O. A. Freitas ◽  
Flávio Bortolozzi ◽  
Robert Sabourin

This article describes a methodology for selecting symbol classes from grapheme classes in an HMM-based (Hidden Markov Models) system for recognizing the handwritten words of legal amounts on Brazilian bank checks. It discusses the definitions of primitives, graphemes, and symbols under a global approach to word recognition, which avoids segmenting words into letters or pseudo-letters by using HMMs. The input to the models is thus a description of the word in terms of an alphabet of symbols generated from the graphemes extracted from the word images, and this is the observable representation for the HMM. The idea is therefore to introduce high-level concepts, such as perceptual primitives (loops, ascenders, descenders, concavities, and convexities), and to provide fast, informative feedback on the information contained in each grapheme class, enabling a selection of symbol classes. The article presents the algorithm, based on Mutual Information and HMMs, with both working within a single evaluation process. The experimental results show that an alphabet of symbols (29 symbols) can be selected from an “original” set of graphemes (94 graphemes). The article concludes that the discriminative power of the graphemes is very important for consolidating an alphabet of symbols.
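As a small illustration of the quantity involved, the sketch below estimates the mutual information between two discrete variables from paired observations, for example a grapheme class and a word label. How the authors combine this measure with the HMM evaluation is not described in the abstract, so the pairing of variables and the function name are assumptions.

```python
import math
from collections import Counter

def mutual_information(pairs):
    """Estimate I(X; Y) in bits from a non-empty list of (x, y) observations."""
    n = len(pairs)
    joint = Counter(pairs)
    px = Counter(x for x, _ in pairs)
    py = Counter(y for _, y in pairs)
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        # p(x, y) / (p(x) * p(y)) simplifies to count * n / (px[x] * py[y]).
        mi += p_xy * math.log2(count * n / (px[x] * py[y]))
    return mi
```

Under this reading, a grapheme class whose occurrences say little about the word label would score near zero and might be a candidate for merging or removal, while a class that strongly predicts the label would score high.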

