Improving Recurrent Neural Networks for Offline Arabic Handwriting Recognition by Combining Different Language Models

Author(s):  
Sana Khamekhem Jemni ◽  
Yousri Kessentini ◽  
Slim Kanoun

In handwriting recognition, the design of relevant features is very important, but it is a daunting task. Deep neural networks are able to extract pertinent features automatically from the input image. This removes the dependence on handcrafted features, whose design is typically a trial-and-error process. In this paper, we perform an exhaustive experimental evaluation of learned versus handcrafted features for the Arabic handwriting recognition task. Moreover, we focus on the optimization of the competing full-word language models by incorporating different character and sub-word models. We extensively investigate the use of different sub-word-based language models, namely characters, pseudo-words, morphemes and hybrid units, in order to enhance the full-word handwriting recognition system for Arabic script. The proposed method allows the recognition of any out-of-vocabulary word as an arbitrary sequence of sub-word units. The KHATT database has been used as a benchmark for Arabic handwriting recognition. We show that combining multiple language models considerably enhances the recognition performance for a morphologically rich language like Arabic. We achieve state-of-the-art performance on the KHATT dataset.
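The abstract's core idea, backing a full-word language model off to sub-word units so that out-of-vocabulary (OOV) words still receive probability mass, can be sketched with toy unigram models. This is a simplified illustration under assumed smoothing and interpolation choices, not the authors' actual system, and the corpus words are invented.

```python
import math
from collections import Counter

def char_lm_logprob(word, char_counts, total_chars, alpha=1.0):
    """Unigram character model with add-alpha smoothing (a crude sub-word stand-in)."""
    vocab_size = len(char_counts) + 1  # +1 slot for unseen characters
    lp = 0.0
    for ch in word:
        lp += math.log((char_counts[ch] + alpha) / (total_chars + alpha * vocab_size))
    return lp

def combined_logprob(word, word_counts, total_words, char_counts, total_chars, lam=0.7):
    """Interpolate a full-word unigram LM with a character LM.
    An in-vocabulary word mixes both scores; an OOV word falls back
    entirely to the character model, so no word gets zero probability."""
    char_lp = char_lm_logprob(word, char_counts, total_chars)
    if word_counts[word] == 0:
        return char_lp
    word_lp = math.log(word_counts[word] / total_words)
    return math.log(lam * math.exp(word_lp) + (1 - lam) * math.exp(char_lp))

corpus = ["kitab", "kitab", "qalam", "bab"]   # invented training words
word_counts = Counter(corpus)
char_counts = Counter("".join(corpus))
total_words = len(corpus)
total_chars = sum(char_counts.values())

in_vocab = combined_logprob("kitab", word_counts, total_words, char_counts, total_chars)
oov = combined_logprob("maktab", word_counts, total_words, char_counts, total_chars)
```

A real decoder would use n-gram sub-word models inside the search rather than a post-hoc fallback, but the same principle applies: the sub-word model guarantees a finite score for any word.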

Processes ◽  
2019 ◽  
Vol 7 (7) ◽  
pp. 457 ◽  
Author(s):  
William Raveane ◽  
Pedro Luis Galdámez ◽  
María Angélica González Arrieta

The difficulty in precisely detecting and locating an ear within an image is the first challenge to tackle in an ear-based biometric recognition system, one which increases in difficulty when working with variable photographic conditions. This is in part due to the irregular shapes of human ears, but also because of variable lighting conditions and the ever-changing profile shape of an ear's projection when photographed. An ear detection system involving multiple convolutional neural networks and a detection grouping algorithm is proposed to identify the presence and location of an ear in a given input image. The proposed method matches the performance of other methods when analyzed against clean and purpose-shot photographs, reaching an accuracy upwards of 98%, but clearly outperforms them with a rate of over 86% when the system is subjected to non-cooperative natural images where the subject appears in challenging orientations and photographic conditions.
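The abstract does not spell out its detection grouping algorithm; a common way to merge overlapping candidate detections produced by multiple networks is greedy IoU-based suppression, sketched here with hypothetical boxes and scores.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def group_detections(boxes, scores, iou_thresh=0.5):
    """Greedy grouping: keep the highest-scoring box, absorb heavy overlaps."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) < iou_thresh for j in kept):
            kept.append(i)
    return [boxes[i] for i in kept]

# two near-duplicate ear candidates plus one distinct detection (toy data)
boxes = [(10, 10, 50, 50), (12, 12, 52, 52), (100, 100, 140, 140)]
scores = [0.9, 0.8, 0.95]
grouped = group_detections(boxes, scores)
```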


2013 ◽  
Vol 2013 ◽  
pp. 1-8 ◽  
Author(s):  
Faisal Ahmed ◽  
Emam Hossain

Recognition of human expression from facial images is an interesting research area, which has received increasing attention in recent years. A robust and effective facial feature descriptor is the key to designing a successful expression recognition system. Although much progress has been made, deriving a face feature descriptor that can perform consistently under changing environments is still a difficult and challenging task. In this paper, we present the gradient local ternary pattern (GLTP), a discriminative local texture feature for representing facial expression. The proposed GLTP operator encodes the local texture of an image by computing the gradient magnitudes of the local neighborhood and quantizing those values into three discrimination levels. The location and occurrence information of the resulting micropatterns is then used as the face feature descriptor. The performance of the proposed method has been evaluated for the person-independent facial expression recognition task. Experiments with prototypic expression images from the Cohn-Kanade (CK) face expression database validate that the GLTP feature descriptor can effectively encode the facial texture and thus achieves better recognition performance than some well-known appearance-based facial features.
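A minimal sketch of the GLTP idea as summarized above: compute gradient magnitudes, ternary-code each 3x3 neighbourhood against the center with a threshold t, and split the codes into positive and negative binary patterns whose histograms form the descriptor. The gradient operator, threshold value, and bit ordering here are illustrative assumptions, not the paper's exact specification.

```python
import numpy as np

def gltp_histograms(image, t=10):
    """Simplified GLTP: gradient magnitude map, then ternary coding of each
    3x3 neighbourhood split into positive/negative binary micropatterns."""
    gy, gx = np.gradient(image.astype(float))
    g = np.hypot(gx, gy)                      # gradient magnitude map
    offsets = [(-1,-1), (-1,0), (-1,1), (0,1), (1,1), (1,0), (1,-1), (0,-1)]
    h, w = g.shape
    pos_hist = np.zeros(256, dtype=int)
    neg_hist = np.zeros(256, dtype=int)
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            c = g[y, x]
            pos_code = neg_code = 0
            for bit, (dy, dx) in enumerate(offsets):
                n = g[y + dy, x + dx]
                if n > c + t:                 # ternary value +1
                    pos_code |= 1 << bit
                elif n < c - t:               # ternary value -1
                    neg_code |= 1 << bit
            pos_hist[pos_code] += 1
            neg_hist[neg_code] += 1
    return pos_hist, neg_hist

img = np.tile(np.arange(0, 160, 10, dtype=float), (16, 1))  # toy ramp image
pos_h, neg_h = gltp_histograms(img, t=2)
```

In the full method the two histograms would be computed per face region and concatenated into the final feature vector.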


Author(s):  
U.-V. MARTI ◽  
H. BUNKE

In this paper, a system for the reading of totally unconstrained handwritten text is presented. The kernel of the system is a hidden Markov model (HMM) for handwriting recognition. This HMM is enhanced by a statistical language model. Thus linguistic knowledge beyond the lexicon level is incorporated in the recognition process. Another novel feature of the system is that the HMM is applied in such a way that the difficult problem of segmenting a line of text into individual words is avoided. A number of experiments with various language models and large vocabularies have been conducted. The language models used in the system were also analytically compared based on their perplexity.
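Perplexity, the measure used above to compare the language models analytically, can be computed for a toy add-alpha-smoothed bigram model as follows; the smoothing scheme and corpus are illustrative, not those of the paper.

```python
import math
from collections import Counter

def bigram_perplexity(train_sents, test_sents, alpha=0.1):
    """Perplexity of an add-alpha-smoothed bigram language model:
    2 ** (-average log2 probability per predicted token)."""
    unigrams, bigrams = Counter(), Counter()
    for sent in train_sents:
        toks = ["<s>"] + sent + ["</s>"]
        unigrams.update(toks[:-1])            # history counts
        bigrams.update(zip(toks[:-1], toks[1:]))
    V = len(set(unigrams) | {"</s>"})
    log_sum, n = 0.0, 0
    for sent in test_sents:
        toks = ["<s>"] + sent + ["</s>"]
        for prev, cur in zip(toks[:-1], toks[1:]):
            p = (bigrams[(prev, cur)] + alpha) / (unigrams[prev] + alpha * V)
            log_sum += math.log2(p)
            n += 1
    return 2 ** (-log_sum / n)

train = [["the", "cat", "sat"], ["the", "dog", "sat"]]
ppl = bigram_perplexity(train, [["the", "cat", "sat"]])
```

Lower perplexity means the model is, on average, less "surprised" by the test text, which is why it serves as a recognizer-independent comparison of language models.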


1992 ◽  
Vol 36 (4) ◽  
pp. 283-287 ◽  
Author(s):  
Paulo J. Santos ◽  
Amy J. Baltzer ◽  
Albert N. Badre ◽  
Richard L. Henneman ◽  
Michael S. Miller

Performance of a rule-based handwriting recognition system is considered. Performance limits of such systems are defined by the robustness of the character templates and the ability of the system to segment characters. Published performance figures, however, are typically based on pre-segmented characters. Six experiments are reported (using a total of 128 subjects) that tested a state-of-the-art recognition system under more realistic conditions. Variables investigated include display format (grid, lined, and blank), surface texture, feedback (location and time delay), amount of training, practice, and effects of use over an extended period. Results indicated that novice users writing on a lined display (the most preferred format) averaged 57% recognition performance. By giving subjects continuous feedback of results, training, and after about 10 minutes of use, the system averaged 90.6% character recognition. Following three hours of interrupted use and with performance incentives, subjects achieved an average 96.8% accuracy with the system. Future work should focus on improving the ability of the recognition algorithm to segment characters and on developing non-obtrusive interaction techniques to train users, to provide feedback and to correct mis-recognized characters.


2014 ◽  
Vol 610 ◽  
pp. 265-269
Author(s):  
Jing Ya Zhang ◽  
Li Yang ◽  
Rong Zhao ◽  
Long Hua Yang

In this paper, a Discrete Hopfield Neural Network (DHNN) is adopted to realize handwritten character recognition. First, learning samples are preprocessed, including binarization, normalization and interpolation. Then pixel features are extracted and used to establish the DHNN. The handwritten test samples and noise-corrupted samples are finally fed into the network to verify its recognition performance. Simulation results reveal that the DHNN has good fault tolerance and disturbance rejection performance. In addition, the recognition system is implemented with the MATLAB neural network toolbox and GUI, which verifies the feasibility of the algorithm.
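The DHNN workflow described above (Hebbian storage of binarized pixel patterns, then recall from noise-corrupted inputs) can be sketched in a few lines. The paper works in MATLAB; the same mechanics are shown here in Python/NumPy with toy +/-1 patterns standing in for character pixels.

```python
import numpy as np

def train_hopfield(patterns):
    """Hebbian learning: W is the sum of outer products, zero diagonal."""
    n = patterns.shape[1]
    W = sum(np.outer(p, p) for p in patterns) / n
    np.fill_diagonal(W, 0)
    return W

def recall(W, state, steps=5):
    """Synchronous sign updates until the state settles on an attractor."""
    s = state.copy()
    for _ in range(steps):
        nxt = np.where(W @ s >= 0, 1, -1)
        if np.array_equal(nxt, s):
            break
        s = nxt
    return s

# two orthogonal +/-1 patterns standing in for binarized character pixels
p1 = np.array([1] * 8 + [-1] * 8)
p2 = np.tile([1, -1], 8)
W = train_hopfield(np.stack([p1, p2]))

noisy = p1.copy()
noisy[[0, 5]] *= -1          # flip two "pixels" to simulate noise
restored = recall(W, noisy)
```

The recall of the stored pattern from a corrupted input is exactly the fault tolerance the simulation results refer to.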


2012 ◽  
Vol 20 (2) ◽  
pp. 235-259 ◽  
Author(s):  
MARTHA YIFIRU TACHBELIE ◽  
SOLOMON TEFERRA ABATE ◽  
WOLFGANG MENZEL

This paper presents morpheme-based language models developed for Amharic (a morphologically rich Semitic language) and their application to a speech recognition task. A substantial reduction in the out-of-vocabulary rate has been observed as a result of using subwords or morphemes, thus addressing a severe problem of morphologically rich languages. Moreover, lower perplexity values have been obtained with morpheme-based language models than with word-based models. However, when comparing the quality based on the probability assigned to the test sets, word-based models seem to fare better. We have studied the utility of morpheme-based language models in speech recognition systems and found that the performance of a relatively small-vocabulary (5k) speech recognition system improved significantly as a result of using morphemes as language modeling and dictionary units. However, as the size of the vocabulary increases (20k or more), the morpheme-based systems suffer from acoustic confusability and did not achieve a significant improvement over a word-based system with an equivalent vocabulary size, even with the use of higher-order (quadrogram) n-gram language models.
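The out-of-vocabulary reduction that morpheme units buy can be illustrated with a toy segmented corpus; the "+" morpheme boundaries and word forms below are invented for illustration, not real Amharic data from the paper.

```python
def oov_rate(vocab, test_units):
    """Fraction of test units that are missing from the vocabulary."""
    misses = sum(1 for u in test_units if u not in vocab)
    return misses / len(test_units)

# toy segmentation: hypothetical morpheme boundaries marked with "+"
train_segmented = ["ye+bet+otch", "bet+u", "ye+sew+otch", "sew"]
test_segmented = ["ye+bet+u", "sew+otch"]

word_vocab = {w.replace("+", "") for w in train_segmented}
morph_vocab = {m for w in train_segmented for m in w.split("+")}

test_words = [w.replace("+", "") for w in test_segmented]
test_morphs = [m for w in test_segmented for m in w.split("+")]

word_oov = oov_rate(word_vocab, test_words)     # unseen surface forms
morph_oov = oov_rate(morph_vocab, test_morphs)  # their morphemes were all seen
```

Every test word is a new combination of known morphemes, so the word-level vocabulary misses all of them while the morpheme-level vocabulary misses none; this is the mechanism behind the OOV reduction reported above.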


Author(s):  
PAOLA FLOCCHINI ◽  
FRANCESCO GARDIN ◽  
GIANCARLO MAURI ◽  
MARIA PIA PENSINI ◽  
PAOLO STOFELLA

This paper describes a system able to recognize human faces seen from different perspectives, with different expressions, and possibly with some kind of noise in their representation. The problem of face recognition has been approached using a complex architecture based on a hierarchy of neural networks with a particular self-referencing structure. The system, in fact, is structured as a tree in which the nodes correspond to neural networks, each one having a different task. Each leaf is a recognition module composed of several networks with different characteristics depending on the different preprocessing operators used. These networks are coordinated by a supervisor in a self-referencing structure. During the training phase, the supervisor, called Meta-Net, observes the behaviour of the recognition nets and learns which net is better suited to which task, while during the test phase it decides, given an input image, which weights to assign to each network, modifying their outputs to obtain the final result. This architecture shows a high generalization capability and allows the recognition of images with different kinds of noise better than any single network can, as confirmed by a preliminary experimental evaluation.
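The test-phase behaviour described for Meta-Net, assigning a weight to each recognition net and combining their outputs into a final decision, can be sketched as a convex combination of class-score vectors. The weights and expert outputs below are hypothetical, and a learned Meta-Net would condition the weights on the input image rather than fix them.

```python
import numpy as np

def meta_combine(expert_outputs, weights):
    """Weight each expert's class-score vector and sum, as the
    supervisor would after learning per-expert reliabilities."""
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()          # normalise to a convex mix
    combined = np.tensordot(weights, np.asarray(expert_outputs), axes=1)
    return combined, int(np.argmax(combined))

# three hypothetical recognition nets scoring the same face over 3 identities
outputs = [
    [0.6, 0.3, 0.1],   # expert 1: confident, historically reliable
    [0.2, 0.5, 0.3],   # expert 2: disagrees, historically weak
    [0.5, 0.4, 0.1],   # expert 3
]
weights = [0.6, 0.1, 0.3]  # reliabilities learned by the supervisor
scores, winner = meta_combine(outputs, weights)
```

Down-weighting the historically weak expert lets the ensemble side with the reliable networks, which is how the architecture can beat any single network on noisy inputs.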


1989 ◽  
Vol 33 (5) ◽  
pp. 301-304 ◽  
Author(s):  
Catalina M. Danis

This paper reports on a study of recognition performance for a group of new users during their first month of experience with the Tangora system. Tangora is a 20,000 word, speaker-dependent, isolated-word system which transcribes speech input into text in real-time. Twelve users, six males and six females, participated in 21 sessions each, during which they read aloud unrelated sentences selected from a corpus of office correspondence. Their goal was to develop a speaking style which minimized Tangora's recognition error. To this end, starting with the third session, the experimenter generated hypotheses about each user's speech habits that may have resulted in high recognition error and made suggestions to the user on how to modify his/her speaking style. In addition, each user produced a new speech sample in each of the four weeks of the experiment, which was used to “train” the system to recognize the speaker. On average, recognition error decreased by 33% from the first to the fourth week. This improvement was attributable to “retraining” the system with, apparently, more representative speech samples. A number of speech habits brought by users to the recognition task were identified as contributing to poor recognition performance by Tangora. These included: (a) too fast a speech rate, (b) failure to pause between words, (c) hyper-correct articulation of the final phoneme in words and (d) incomplete articulation of the first phoneme in words. Feedback relating to these speech habits was used successfully by a majority of the users to modify their speaking style into one more successfully recognized by the Tangora system.


2019 ◽  
Vol 19 (2) ◽  
pp. 28-37
Author(s):  
Hawraa H. Abbas ◽  
Bilal Z. Ahmed ◽  
Ahmed Kamil Abbas

The face is the preferred biometric for person recognition and identification applications because identifying a person by face is an innate human habit. In contrast to 2D face recognition, 3D face recognition is practically robust to illumination variation, facial cosmetics, and face pose changes. Traditional 3D face recognition methods describe shape variation across the whole face using holistic features. However, taking into account facial regions that remain unchanged across expressions can yield a high-performance 3D face recognition system. In this research, the recognition analysis is based on defining a set of coherent parts, which can be considered latent factors in the face shape space. The Non-negative Matrix Factorisation technique is used to segment the 3D faces into coherent regions. The best recognition performance is achieved when the vertices of 20 face regions are utilised as the feature vector for the recognition task. The region-based 3D face recognition approach achieves a 96.4% recognition rate on the FRGCv2 dataset.
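Non-negative Matrix Factorisation, the segmentation tool named above, can be sketched with the classic multiplicative-update rules: because both factors stay non-negative, each column of W acts as an additive "part" of the data. The toy vertex-by-sample matrix below is random stand-in data, not FRGCv2, and the update scheme is the standard one rather than the paper's exact configuration.

```python
import numpy as np

def nmf(V, rank, iters=200, seed=0):
    """Multiplicative-update NMF: V ~= W @ H with W, H >= 0 throughout,
    since the update rules only ever multiply by non-negative ratios."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, rank)) + 1e-3
    H = rng.random((rank, m)) + 1e-3
    eps = 1e-9  # guards against division by zero
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# toy "face shape" matrix: rows = mesh vertices, columns = face samples
V = np.random.default_rng(1).random((30, 10))
W, H = nmf(V, rank=4)
err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
```

In the region-segmentation use case, vertices would then be assigned to the part (column of W) in which they have the largest loading, carving the face mesh into coherent regions.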

