Improving Recurrent Neural Networks for Offline Arabic Handwriting Recognition by Combining Different Language Models

Author(s):  
Sana Khamekhem Jemni ◽  
Yousri Kessentini ◽  
Slim Kanoun

In handwriting recognition, the design of relevant features is very important, but it is a daunting task. Deep neural networks are able to extract pertinent features automatically from the input image. This removes the dependence on handcrafted features, whose design is typically a trial-and-error process. In this paper, we perform an exhaustive experimental evaluation of learned versus handcrafted features for the Arabic handwriting recognition task. Moreover, we focus on the optimization of the competing full-word language models by incorporating different character and sub-word models. We extensively investigate the use of different sub-word-based language models, namely characters, pseudo-words, morphemes and hybrid units, in order to enhance the full-word handwriting recognition system for Arabic script. The proposed method allows the recognition of any out-of-vocabulary word as an arbitrary sequence of sub-word units. The KHATT database has been used as a benchmark for Arabic handwriting recognition. We show that combining multiple language models considerably enhances the recognition performance for a morphologically rich language like Arabic. We achieve state-of-the-art performance on the KHATT dataset.
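The abstract's core idea, backing a full-word language model off to sub-word units so that out-of-vocabulary (OOV) words still receive probability mass, can be sketched with toy unigram models. This is a simplified illustration under assumed smoothing and interpolation choices, not the authors' actual system, and the corpus words are invented.

```python
import math
from collections import Counter

def char_lm_logprob(word, char_counts, total_chars, alpha=1.0):
    """Unigram character model with add-alpha smoothing (a crude sub-word stand-in)."""
    vocab_size = len(char_counts) + 1  # +1 slot for unseen characters
    lp = 0.0
    for ch in word:
        lp += math.log((char_counts[ch] + alpha) / (total_chars + alpha * vocab_size))
    return lp

def combined_logprob(word, word_counts, total_words, char_counts, total_chars, lam=0.7):
    """Interpolate a full-word unigram LM with a character LM.
    An in-vocabulary word mixes both scores; an OOV word falls back
    entirely to the character model, so no word gets zero probability."""
    char_lp = char_lm_logprob(word, char_counts, total_chars)
    if word_counts[word] == 0:
        return char_lp
    word_lp = math.log(word_counts[word] / total_words)
    return math.log(lam * math.exp(word_lp) + (1 - lam) * math.exp(char_lp))

corpus = ["kitab", "kitab", "qalam", "bab"]   # invented training words
word_counts = Counter(corpus)
char_counts = Counter("".join(corpus))
total_words = len(corpus)
total_chars = sum(char_counts.values())

in_vocab = combined_logprob("kitab", word_counts, total_words, char_counts, total_chars)
oov = combined_logprob("maktab", word_counts, total_words, char_counts, total_chars)
```

A real decoder would use n-gram sub-word models inside the search rather than a post-hoc fallback, but the same principle applies: the sub-word model guarantees a finite score for any word.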

Processes ◽  
2019 ◽  
Vol 7 (7) ◽  
pp. 457 ◽  
Author(s):  
William Raveane ◽  
Pedro Luis Galdámez ◽  
María Angélica González Arrieta

The difficulty in precisely detecting and locating an ear within an image is the first challenge to tackle in an ear-based biometric recognition system, one which increases in difficulty when working with variable photographic conditions. This is in part due to the irregular shapes of human ears, but also because of variable lighting conditions and the ever-changing profile shape of an ear's projection when photographed. An ear detection system involving multiple convolutional neural networks and a detection grouping algorithm is proposed to identify the presence and location of an ear in a given input image. The proposed method matches the performance of other methods when analyzed against clean and purpose-shot photographs, reaching an accuracy upwards of 98%, but clearly outperforms them with a rate of over 86% when the system is subjected to non-cooperative natural images where the subject appears in challenging orientations and photographic conditions.
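The abstract does not spell out its detection grouping algorithm; a common way to merge overlapping candidate detections produced by multiple networks is greedy IoU-based suppression, sketched here with hypothetical boxes and scores.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def group_detections(boxes, scores, iou_thresh=0.5):
    """Greedy grouping: keep the highest-scoring box, absorb heavy overlaps."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) < iou_thresh for j in kept):
            kept.append(i)
    return [boxes[i] for i in kept]

# two near-duplicate ear candidates plus one distinct detection (toy data)
boxes = [(10, 10, 50, 50), (12, 12, 52, 52), (100, 100, 140, 140)]
scores = [0.9, 0.8, 0.95]
grouped = group_detections(boxes, scores)
```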


2013 ◽  
Vol 2013 ◽  
pp. 1-8 ◽  
Author(s):  
Faisal Ahmed ◽  
Emam Hossain

Recognition of human expression from facial images is an interesting research area, which has received increasing attention in recent years. A robust and effective facial feature descriptor is the key to designing a successful expression recognition system. Although much progress has been made, deriving a face feature descriptor that can perform consistently under changing environments is still a difficult and challenging task. In this paper, we present the gradient local ternary pattern (GLTP), a discriminative local texture feature for representing facial expression. The proposed GLTP operator encodes the local texture of an image by computing the gradient magnitudes of the local neighborhood and quantizing those values into three discrimination levels. The location and occurrence information of the resulting micropatterns is then used as the face feature descriptor. The performance of the proposed method has been evaluated for the person-independent facial expression recognition task. Experiments with prototypic expression images from the Cohn-Kanade (CK) face expression database validate that the GLTP feature descriptor can effectively encode the facial texture and thus achieves better recognition performance than some well-known appearance-based facial features.
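A minimal sketch of the GLTP idea as summarized above: compute gradient magnitudes, ternary-code each 3x3 neighbourhood against the center with a threshold t, and split the codes into positive and negative binary patterns whose histograms form the descriptor. The gradient operator, threshold value, and bit ordering here are illustrative assumptions, not the paper's exact specification.

```python
import numpy as np

def gltp_histograms(image, t=10):
    """Simplified GLTP: gradient magnitude map, then ternary coding of each
    3x3 neighbourhood split into positive/negative binary micropatterns."""
    gy, gx = np.gradient(image.astype(float))
    g = np.hypot(gx, gy)                      # gradient magnitude map
    offsets = [(-1,-1), (-1,0), (-1,1), (0,1), (1,1), (1,0), (1,-1), (0,-1)]
    h, w = g.shape
    pos_hist = np.zeros(256, dtype=int)
    neg_hist = np.zeros(256, dtype=int)
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            c = g[y, x]
            pos_code = neg_code = 0
            for bit, (dy, dx) in enumerate(offsets):
                n = g[y + dy, x + dx]
                if n > c + t:                 # ternary value +1
                    pos_code |= 1 << bit
                elif n < c - t:               # ternary value -1
                    neg_code |= 1 << bit
            pos_hist[pos_code] += 1
            neg_hist[neg_code] += 1
    return pos_hist, neg_hist

img = np.tile(np.arange(0, 160, 10, dtype=float), (16, 1))  # toy ramp image
pos_h, neg_h = gltp_histograms(img, t=2)
```

In the full method the two histograms would be computed per face region and concatenated into the final feature vector.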


Author(s):  
U.-V. MARTI ◽  
H. BUNKE

In this paper, a system for the reading of totally unconstrained handwritten text is presented. The kernel of the system is a hidden Markov model (HMM) for handwriting recognition. This HMM is enhanced by a statistical language model. Thus linguistic knowledge beyond the lexicon level is incorporated in the recognition process. Another novel feature of the system is that the HMM is applied in such a way that the difficult problem of segmenting a line of text into individual words is avoided. A number of experiments with various language models and large vocabularies have been conducted. The language models used in the system were also analytically compared based on their perplexity.
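Perplexity, the measure used above to compare the language models analytically, can be computed for a toy add-alpha-smoothed bigram model as follows; the smoothing scheme and corpus are illustrative, not those of the paper.

```python
import math
from collections import Counter

def bigram_perplexity(train_sents, test_sents, alpha=0.1):
    """Perplexity of an add-alpha-smoothed bigram language model:
    2 ** (-average log2 probability per predicted token)."""
    unigrams, bigrams = Counter(), Counter()
    for sent in train_sents:
        toks = ["<s>"] + sent + ["</s>"]
        unigrams.update(toks[:-1])            # history counts
        bigrams.update(zip(toks[:-1], toks[1:]))
    V = len(set(unigrams) | {"</s>"})
    log_sum, n = 0.0, 0
    for sent in test_sents:
        toks = ["<s>"] + sent + ["</s>"]
        for prev, cur in zip(toks[:-1], toks[1:]):
            p = (bigrams[(prev, cur)] + alpha) / (unigrams[prev] + alpha * V)
            log_sum += math.log2(p)
            n += 1
    return 2 ** (-log_sum / n)

train = [["the", "cat", "sat"], ["the", "dog", "sat"]]
ppl = bigram_perplexity(train, [["the", "cat", "sat"]])
```

Lower perplexity means the model is, on average, less "surprised" by the test text, which is why it serves as a recognizer-independent comparison of language models.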


1992 ◽  
Vol 36 (4) ◽  
pp. 283-287 ◽  
Author(s):  
Paulo J. Santos ◽  
Amy J. Baltzer ◽  
Albert N. Badre ◽  
Richard L. Henneman ◽  
Michael S. Miller

Performance of a rule-based handwriting recognition system is considered. Performance limits of such systems are defined by the robustness of the character templates and the ability of the system to segment characters. Published performance figures, however, are typically based on pre-segmented characters. Six experiments are reported (using a total of 128 subjects) that tested a state-of-the-art recognition system under more realistic conditions. Variables investigated include display format (grid, lined, and blank), surface texture, feedback (location and time delay), amount of training, practice, and effects of use over an extended period. Results indicated that novice users writing on a lined display (the most preferred format) averaged 57% recognition performance. By giving subjects continuous feedback of results, training, and after about 10 minutes of use, the system averaged 90.6% character recognition. Following three hours of interrupted use and with performance incentives, subjects achieved an average 96.8% accuracy with the system. Future work should focus on improving the ability of the recognition algorithm to segment characters and on developing non-obtrusive interaction techniques to train users, to provide feedback and to correct mis-recognized characters.


2014 ◽  
Vol 610 ◽  
pp. 265-269
Author(s):  
Jing Ya Zhang ◽  
Li Yang ◽  
Rong Zhao ◽  
Long Hua Yang

In this paper, a Discrete Hopfield Neural Network (DHNN) is adopted to realize handwritten character recognition. First, learning samples are preprocessed, including binarization, normalization and interpolation. Then pixel features are extracted and used to establish the DHNN. The handwritten test samples and noise-corrupted samples are finally fed into the network to verify its recognition performance. Simulation results reveal that the DHNN has good fault tolerance and disturbance rejection performance. In addition, the recognition system is implemented with the MATLAB neural network toolbox and GUI, which verifies the feasibility of the algorithm.
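The DHNN workflow described above (Hebbian storage of binarized pixel patterns, then recall from noise-corrupted inputs) can be sketched in a few lines. The paper works in MATLAB; the same mechanics are shown here in Python/NumPy with toy +/-1 patterns standing in for character pixels.

```python
import numpy as np

def train_hopfield(patterns):
    """Hebbian learning: W is the sum of outer products, zero diagonal."""
    n = patterns.shape[1]
    W = sum(np.outer(p, p) for p in patterns) / n
    np.fill_diagonal(W, 0)
    return W

def recall(W, state, steps=5):
    """Synchronous sign updates until the state settles on an attractor."""
    s = state.copy()
    for _ in range(steps):
        nxt = np.where(W @ s >= 0, 1, -1)
        if np.array_equal(nxt, s):
            break
        s = nxt
    return s

# two orthogonal +/-1 patterns standing in for binarized character pixels
p1 = np.array([1] * 8 + [-1] * 8)
p2 = np.tile([1, -1], 8)
W = train_hopfield(np.stack([p1, p2]))

noisy = p1.copy()
noisy[[0, 5]] *= -1          # flip two "pixels" to simulate noise
restored = recall(W, noisy)
```

The recall of the stored pattern from a corrupted input is exactly the fault tolerance the simulation results refer to.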


2012 ◽  
Vol 20 (2) ◽  
pp. 235-259 ◽  
Author(s):  
MARTHA YIFIRU TACHBELIE ◽  
SOLOMON TEFERRA ABATE ◽  
WOLFGANG MENZEL

This paper presents morpheme-based language models developed for Amharic (a morphologically rich Semitic language) and their application to a speech recognition task. A substantial reduction in the out-of-vocabulary rate has been observed as a result of using subwords or morphemes, thus addressing a severe problem of morphologically rich languages. Moreover, lower perplexity values have been obtained with morpheme-based language models than with word-based models. However, when comparing the quality based on the probability assigned to the test sets, word-based models seem to fare better. We have studied the utility of morpheme-based language models in speech recognition systems and found that the performance of a relatively small-vocabulary (5k) speech recognition system improved significantly as a result of using morphemes as language modeling and dictionary units. However, as the size of the vocabulary increases (20k or more), the morpheme-based systems suffer from acoustic confusability and did not achieve a significant improvement over a word-based system with an equivalent vocabulary size, even with the use of higher-order (quadrogram) n-gram language models.
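The out-of-vocabulary reduction that morpheme units buy can be illustrated with a toy segmented corpus; the "+" morpheme boundaries and word forms below are invented for illustration, not real Amharic data from the paper.

```python
def oov_rate(vocab, test_units):
    """Fraction of test units that are missing from the vocabulary."""
    misses = sum(1 for u in test_units if u not in vocab)
    return misses / len(test_units)

# toy segmentation: hypothetical morpheme boundaries marked with "+"
train_segmented = ["ye+bet+otch", "bet+u", "ye+sew+otch", "sew"]
test_segmented = ["ye+bet+u", "sew+otch"]

word_vocab = {w.replace("+", "") for w in train_segmented}
morph_vocab = {m for w in train_segmented for m in w.split("+")}

test_words = [w.replace("+", "") for w in test_segmented]
test_morphs = [m for w in test_segmented for m in w.split("+")]

word_oov = oov_rate(word_vocab, test_words)     # unseen surface forms
morph_oov = oov_rate(morph_vocab, test_morphs)  # their morphemes were all seen
```

Every test word is a new combination of known morphemes, so the word-level vocabulary misses all of them while the morpheme-level vocabulary misses none; this is the mechanism behind the OOV reduction reported above.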


Author(s):  
PAOLA FLOCCHINI ◽  
FRANCESCO GARDIN ◽  
GIANCARLO MAURI ◽  
MARIA PIA PENSINI ◽  
PAOLO STOFELLA

This paper describes a system able to recognize human faces seen from different perspectives, with different expressions, and possibly with some kind of noise in their representation. The problem of face recognition has been approached using a complex architecture based on a hierarchy of neural networks with a particular self-referencing structure. The system, in fact, is structured as a tree in which the nodes correspond to neural networks, each one having a different task. Each leaf is a recognition module composed of several networks with different characteristics depending on the different preprocessing operators used. These networks are coordinated by a supervisor in a self-referencing structure. During the training phase, the supervisor, called Meta-Net, observes the behaviour of the recognition nets and learns which net is better suited to which task, while during the test phase it decides, given an input image, which weights to assign to each network, modifying their outputs to obtain the final result. This architecture shows a high generalization capability and allows the recognition of images with different kinds of noise better than any single network can, as confirmed by a preliminary experimental evaluation.
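The test-phase behaviour described for Meta-Net, assigning a weight to each recognition net and combining their outputs into a final decision, can be sketched as a convex combination of class-score vectors. The weights and expert outputs below are hypothetical, and a learned Meta-Net would condition the weights on the input image rather than fix them.

```python
import numpy as np

def meta_combine(expert_outputs, weights):
    """Weight each expert's class-score vector and sum, as the
    supervisor would after learning per-expert reliabilities."""
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()          # normalise to a convex mix
    combined = np.tensordot(weights, np.asarray(expert_outputs), axes=1)
    return combined, int(np.argmax(combined))

# three hypothetical recognition nets scoring the same face over 3 identities
outputs = [
    [0.6, 0.3, 0.1],   # expert 1: confident, historically reliable
    [0.2, 0.5, 0.3],   # expert 2: disagrees, historically weak
    [0.5, 0.4, 0.1],   # expert 3
]
weights = [0.6, 0.1, 0.3]  # reliabilities learned by the supervisor
scores, winner = meta_combine(outputs, weights)
```

Down-weighting the historically weak expert lets the ensemble side with the reliable networks, which is how the architecture can beat any single network on noisy inputs.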


1989 ◽  
Vol 33 (5) ◽  
pp. 301-304 ◽  
Author(s):  
Catalina M. Danis

This paper reports on a study of recognition performance for a group of new users during their first month of experience with the Tangora system. Tangora is a 20,000 word, speaker-dependent, isolated-word system which transcribes speech input into text in real-time. Twelve users, six males and six females, participated in 21 sessions each, during which they read aloud unrelated sentences selected from a corpus of office correspondence. Their goal was to develop a speaking style which minimized Tangora's recognition error. To this end, starting with the third session, the experimenter generated hypotheses about each user's speech habits that may have resulted in high recognition error and made suggestions to the user on how to modify his/her speaking style. In addition, each user produced a new speech sample in each of the four weeks of the experiment, which was used to “train” the system to recognize the speaker. On average, recognition error decreased by 33% from the first to the fourth week. This improvement was attributable to “retraining” the system with, apparently, more representative speech samples. A number of speech habits brought by users to the recognition task were identified as contributing to poor recognition performance by Tangora. These included: (a) too fast a speech rate, (b) failure to pause between words, (c) hyper-correct articulation of the final phoneme in words and (d) incomplete articulation of the first phoneme in words. Feedback relating to these speech habits was used successfully by a majority of the users to modify their speaking style into one more successfully recognized by the Tangora system.


2019 ◽  
Vol 19 (2) ◽  
pp. 28-37
Author(s):  
Hawraa H. Abbas ◽  
Bilal Z. Ahmed ◽  
Ahmed Kamil Abbas

The face is the preferred biometric for person recognition and identification applications because identifying a person by face is an innate human habit. In contrast to 2D face recognition, 3D face recognition is practically robust to illumination variation, facial cosmetics, and face pose changes. Traditional 3D face recognition methods describe shape variation across the whole face using holistic features. However, taking into account facial regions that remain unchanged across expressions can yield a high-performance 3D face recognition system. In this research, the recognition analysis is based on defining a set of coherent parts, which can be considered latent factors in the face shape space. The Non-negative Matrix Factorisation technique is used to segment the 3D faces into coherent regions. The best recognition performance is achieved when the vertices of 20 face regions are utilised as the feature vector for the recognition task. The region-based 3D face recognition approach achieves a 96.4% recognition rate on the FRGCv2 dataset.
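Non-negative Matrix Factorisation, the segmentation tool named above, can be sketched with the classic multiplicative-update rules: because both factors stay non-negative, each column of W acts as an additive "part" of the data. The toy vertex-by-sample matrix below is random stand-in data, not FRGCv2, and the update scheme is the standard one rather than the paper's exact configuration.

```python
import numpy as np

def nmf(V, rank, iters=200, seed=0):
    """Multiplicative-update NMF: V ~= W @ H with W, H >= 0 throughout,
    since the update rules only ever multiply by non-negative ratios."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, rank)) + 1e-3
    H = rng.random((rank, m)) + 1e-3
    eps = 1e-9  # guards against division by zero
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# toy "face shape" matrix: rows = mesh vertices, columns = face samples
V = np.random.default_rng(1).random((30, 10))
W, H = nmf(V, rank=4)
err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
```

In the region-segmentation use case, vertices would then be assigned to the part (column of W) in which they have the largest loading, carving the face mesh into coherent regions.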

