Adjustment Method between Phonological Attributes and Phone Boundaries

2013 ◽  
Vol 433-435 ◽  
pp. 316-321
Author(s):  
Lian Hai Zhang ◽  
Qi Chen ◽  
Dan Qu

Two kinds of imperfections, namely detection errors and the asynchrony between phonological attributes and phone boundaries, can cause a substantial decline in the recognition accuracy of a detection-based automatic speech recognition system. To solve these problems, this paper proposes an adjustment method between phonological attributes and phone boundaries. First, prior knowledge of the corpus is combined with the detection results; then the asynchronies in the phone boundary area are compensated and the detection errors are corrected. Additionally, selectively deleting some frames with errors improves the precision of the phone models. With this adjustment method, the phoneme recognition rate improves by 1.4% in TIMIT phone classification experiments based on Conditional Random Fields.

Author(s):  
George Saon ◽  
Abdel Belaïd

In this paper we present a system for the recognition of handwritten words on literal check amounts which advantageously combines HMMs and Markov random fields (MRFs). It operates at pixel level, in a holistic manner, on height-normalized word images which are viewed as random field realizations. The HMM analyzes the image along the horizontal writing direction, with a state-specific observation probability given by the column product of causal MRF-like pixel conditional probabilities. Aspects concerning definition, training and recognition via this type of model are developed throughout the paper. We report a 90.08% average word recognition rate on 2378 words and a 79.52% amount rate on 579 amounts of the SRTP* French postal check database (7031 words, 1779 amounts, different scriptors).


2019 ◽  
Vol 9 (10) ◽  
pp. 2166 ◽  
Author(s):  
Mohamed Tamazin ◽  
Ahmed Gouda ◽  
Mohamed Khedr

Many new consumer applications are based on the use of automatic speech recognition (ASR) systems, such as voice command interfaces, speech-to-text applications, and data entry processes. Although ASR systems have remarkably improved in recent decades, speech recognition performance still degrades significantly in noisy environments. Developing a robust ASR system that can work in real-world noise and other acoustic distorting conditions is an attractive research topic. Many advanced algorithms have been developed in the literature to deal with this problem; most of these algorithms are based on modeling the behavior of the human auditory system with perceived noisy speech. In this research, the power-normalized cepstral coefficient (PNCC) system is modified to increase robustness against different types of environmental noise, where a new technique based on gammatone channel filtering combined with channel bias minimization is used to suppress the noise effects. The TIDIGITS database is utilized to evaluate the performance of the proposed system in comparison to state-of-the-art techniques in the presence of additive white Gaussian noise (AWGN) and seven different types of environmental noise. In this research, one word is recognized from a set containing only 11 possibilities. The experimental results showed that the proposed method provides significant improvements in recognition accuracy at low signal-to-noise ratios (SNR). In the case of subway noise at SNR = 5 dB, the proposed method outperforms the mel-frequency cepstral coefficient (MFCC) and relative spectral (RASTA)–perceptual linear predictive (PLP) methods by 55% and 47%, respectively. Moreover, the recognition rate of the proposed method is higher than that of the gammatone frequency cepstral coefficient (GFCC) and PNCC methods in the case of car noise: it is enhanced by 40% in comparison to the GFCC method at SNR = 0 dB, and by 20% in comparison to the PNCC method at SNR = −5 dB.
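
The evaluation above depends on corrupting clean speech with noise at a chosen SNR. The following is a minimal sketch of how AWGN could be scaled to a target SNR, not the authors' procedure; the function names, the 440 Hz test tone, and the 8 kHz sampling rate are all illustrative assumptions.

```python
import math
import random


def add_awgn(signal, snr_db, rng=None):
    """Return the signal plus white Gaussian noise scaled to the requested SNR in dB."""
    rng = rng or random.Random(0)  # fixed seed so the sketch is reproducible
    signal_power = sum(x * x for x in signal) / len(signal)
    # SNR_dB = 10 * log10(P_signal / P_noise)  =>  P_noise = P_signal / 10^(SNR_dB / 10)
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, sigma) for x in signal]


def measured_snr_db(clean, noisy):
    """Estimate the SNR actually realized by comparing noisy and clean samples."""
    signal_power = sum(x * x for x in clean) / len(clean)
    noise_power = sum((n - c) ** 2 for c, n in zip(clean, noisy)) / len(clean)
    return 10 * math.log10(signal_power / noise_power)


# Hypothetical stand-in for speech: a 440 Hz sine sampled at 8 kHz for one second.
clean = [math.sin(2 * math.pi * 440 * t / 8000) for t in range(8000)]
noisy = add_awgn(clean, snr_db=5)
```

Because the noise is random, the realized SNR only approximates the target; `measured_snr_db(clean, noisy)` can be used to check how close a given draw is.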


2014 ◽  
Vol 989-994 ◽  
pp. 2569-2575
Author(s):  
Feng Gao ◽  
Zhong Jian Dai ◽  
Kun Zhou ◽  
Ya Ping Dai

To improve license plate recognition accuracy in complex environments, this paper presents a new license plate location algorithm that combines vertical edge detection, color information of the license plate, and mathematical morphology. To balance computing load and recognition accuracy, a “200-d” character feature rule is designed, and the “200-d” feature is used as the input of a BP neural network to recognize the characters. Based on these methods, a license plate recognition system is set up that can locate and recognize the license plate effectively, even when the resolution of the pictures and the position of vehicles in the pictures are not fixed. Experimental results indicate that the recognition rate of the algorithm reaches 90.5%.
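
Vertical edge detection, one of the cues named above, can be illustrated with a horizontal-gradient Sobel filter. This is a rough sketch under assumptions: the abstract does not say which edge operator the authors use, and the function name and toy image are hypothetical.

```python
def vertical_edges(image):
    """Apply a horizontal-gradient Sobel kernel; large magnitudes mark vertical edges.

    `image` is a list of rows of grayscale values. Border pixels are left at 0.
    """
    kernel = [[-1, 0, 1],
              [-2, 0, 2],
              [-1, 0, 1]]
    rows, cols = len(image), len(image[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            acc = 0
            for dr in range(3):
                for dc in range(3):
                    acc += kernel[dr][dc] * image[r + dr - 1][c + dc - 1]
            out[r][c] = abs(acc)
    return out


# Toy image: dark left half, bright right half, so the only edge is vertical.
image = [[0, 0, 0, 255, 255, 255] for _ in range(5)]
edges = vertical_edges(image)
```

On the toy image the response is nonzero only in the two columns that straddle the dark-to-bright transition, which is the kind of dense vertical-edge region a plate locator would look for.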


Author(s):  
Teddy Surya Gunawan ◽  
Ahmad Fakhrur Razi Mohd Noor ◽  
Mira Kartiwi

Due to advances in GPUs and CPUs in recent years, deep neural networks (DNNs) have become popular as both feature extractors and classifiers. This paper aims to develop an offline handwriting recognition system using a DNN. First, two popular English digit and letter databases, MNIST and EMNIST, were selected to provide the datasets for the training and testing phases of the DNN. Altogether, there are 10 digits [0-9] and 52 letters [a-z, A-Z]. The proposed DNN uses two stacked autoencoder layers and one softmax layer. Recognition accuracy for English digits and letters is 97.7% and 88.8%, respectively. A performance comparison with other neural network structures revealed that the weighted average recognition rates for patternnet, feedforwardnet, and the proposed DNN were 80.3%, 68.3%, and 90.4%, respectively. This shows that the proposed system is able to recognize handwritten English digits and letters with high accuracy.


Author(s):  
KIYOHIRO SHIKANO ◽  
TOMOKAZU YAMADA ◽  
TAKESHI KAWABATA ◽  
SHOICHI MATSUNAGA ◽  
SADAOKI FURUI ◽  
...  

This paper describes a phonetic typewriter and a dictation machine that utilize the underlying statistical structure of phoneme or character sequences. The approach of using syllable or character trigrams is applied to language source modeling. The language source models are obtained by calculating trigram probabilities from a large text database. These models are combined with the HMM-LR continuous speech recognition system [3, 6]. The phonetic typewriter is tested using 274 phrases uttered by one male speaker. The syllable source model achieves a 94.9% phoneme recognition rate with the test-set phoneme perplexity of 3.9. Without the syllable source model, the phoneme recognition rate is only 73.2%. A trigram model based on characters is also evaluated. This character source model can reduce the syllable perplexity significantly to 7.7, compared with 10.5 of the syllable source model. The character source model achieves a 78.5% character transcription rate for the 274 phrase utterances. The experimental results show that a syllable source model and a character source model are very effective for realizing a Japanese dictation machine.
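
To make the trigram and perplexity terminology above concrete, here is a small sketch of a character trigram model with test-set perplexity. It is not the authors' implementation: the add-alpha smoothing, the `^` padding symbol, and the toy strings are assumptions chosen only to keep the example self-contained.

```python
import math
from collections import Counter


def train_trigrams(text):
    """Count character trigrams and their two-character contexts."""
    tri, ctx = Counter(), Counter()
    padded = "^^" + text  # '^' is a hypothetical start-of-sequence pad
    for i in range(2, len(padded)):
        context = padded[i - 2:i]
        tri[(context, padded[i])] += 1
        ctx[context] += 1
    return tri, ctx


def perplexity(text, tri, ctx, vocab_size, alpha=1.0):
    """Perplexity 2^(-average log2 probability) under add-alpha smoothed trigrams."""
    padded = "^^" + text
    log_prob = 0.0
    for i in range(2, len(padded)):
        context = padded[i - 2:i]
        p = (tri[(context, padded[i])] + alpha) / (ctx[context] + alpha * vocab_size)
        log_prob += math.log2(p)
    return 2 ** (-log_prob / len(text))


# Toy data: a strongly patterned training string over a two-symbol vocabulary.
tri, ctx = train_trigrams("abababab")
ppl_seen = perplexity("abab", tri, ctx, vocab_size=2)
ppl_unseen = perplexity("bbbb", tri, ctx, vocab_size=2)
```

A sequence that follows the training pattern gets a lower perplexity than one that does not, which is the sense in which a lower test-set perplexity indicates a more constraining source model.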


2013 ◽  
Vol 411-414 ◽  
pp. 1238-1246
Author(s):  
Ousanee Sangkathum ◽  
Ohm Sornil

This paper presents a Thai character recognition method based on topological properties. The method first extracts gradient features from a character image. A two-step classification is then applied to recognize the character. In the first step, a conditional random fields model is used to generate a set of possible characters. Then a nearest neighbor model based on hierarchical centroid distance is employed to finally recognize the character. The proposed method is trained on printed characters from documents and vehicle license plates. The technique is evaluated and found to have a recognition rate of 96.96%.


2021 ◽  
Vol 2021 ◽  
pp. 1-8
Author(s):  
Junjun Huo

Based on deep learning and digital image processing algorithms, we design and implement an accurate automatic recognition system for bank note text and propose an improved recognition method based on ResNet to address the problems of difficult image text extraction and insufficient recognition accuracy. First, a depthwise over-parameterized convolution (DO-Conv) is used instead of the traditional convolution in the network to improve the recognition rate while reducing the model parameters. Then, the spatial attention module (SAM) and the squeeze-and-excitation block (SE-Block) are fused and applied to a modified ResNet to extract detailed features of bank note images in the channel and spatial domains. Finally, the label-smoothed cross-entropy (LSCE) loss function is used to train the model to automatically calibrate the network and prevent classification errors. The experimental results demonstrate that the improved model is not easily affected by image quality and performs well in text detection and recognition in specific business ticket scenarios.
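
The label-smoothed cross-entropy loss mentioned above can be sketched in a few lines. This assumes the common formulation in which a fraction epsilon of the target mass is spread uniformly over all K classes; the abstract does not state which variant or which epsilon the author uses, so both are assumptions here.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def label_smoothed_cross_entropy(logits, target, epsilon=0.1):
    """Cross-entropy against a smoothed target distribution.

    Assumed smoothing: the true class gets 1 - epsilon + epsilon / K and
    every other class gets epsilon / K, where K is the number of classes.
    """
    probs = softmax(logits)
    k = len(logits)
    loss = 0.0
    for i, p in enumerate(probs):
        q = epsilon / k + (1.0 - epsilon if i == target else 0.0)
        loss -= q * math.log(p)
    return loss


# A very confident prediction is penalized more once smoothing is turned on.
plain = label_smoothed_cross_entropy([10.0, 0.0, 0.0], target=0, epsilon=0.0)
smoothed = label_smoothed_cross_entropy([10.0, 0.0, 0.0], target=0, epsilon=0.1)
```

With `epsilon=0.0` the function reduces to ordinary cross-entropy, so the difference between `plain` and `smoothed` isolates the extra penalty that smoothing places on over-confident outputs.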


Author(s):  
Fang Chen ◽  
Cristiano Masi

Many studies have indicated that stress and workload can affect the recognition accuracy of a speech recognition system. Sources of stress and workload include noise, vibration, G-force, information overload, vocal quality in noise, vocal quality and psychological stress, concurrent task performance, and vocal fatigue. Commercially available speech recognition systems are not yet able to recognize natural human speech perfectly. The military application of automatic speech recognition systems has been widely studied. Verbex’ Voice Master was recommended in its instruction book as especially well suited for use in a noisy environment. This system was selected as a candidate system for use in cockpits. Before implementing it in the cockpit, its strengths and weaknesses for special utterances need to be tested in a laboratory environment. The purpose of the study was to investigate the effects of noise on recognition accuracy in dual-task performance. The experiment was carried out in a noise-insulated room. The Verbex’ Voice Master speech recognition system was installed on a computer. Eleven male Swedish students were the subjects. Two noise levels were set up with a combination of mental workload and physical workload. The results showed that without noise and mental workload, the recognition accuracy could be as good as 99.4%. With noise and mental workload, the recognition accuracy could be reduced to 95%. The results indicated that noise had significant effects on the computer error, while mental workload had significant effects on both subject error and computer error.


Author(s):  
Manish M. Kayasth ◽  
Bharat C. Patel

The entire character recognition system is logically divided into sections such as Scanning, Pre-processing, Classification, Processing, and Post-processing. In the targeted system, the scanned image is first passed through pre-processing modules and then through feature extraction and classification in order to achieve a high recognition rate. This paper mainly describes feature extraction and classification techniques. These methodologies play an important role in identifying offline handwritten characters, specifically in the Gujarati language. Feature extraction provides methods by which characters can be identified uniquely and with a high degree of accuracy. Feature extraction helps to find the shape contained in the pattern. Several techniques are available for feature extraction and classification; however, the selection of an appropriate technique based on its input decides the degree of recognition accuracy.

