AN EFFICIENT SPEECH RECOGNITION SYSTEM FOR THE INITIALS OF MANDARIN SYLLABLES

Author(s):  
PEI-YIH TING ◽  
CHIU-YU TSENG ◽  
LIN-SHAN LEE

In a long-term research project, the recognition of Mandarin speech with a very large vocabulary and unlimited text is considered. Its first-stage goal is to recognize the Mandarin syllables. In a previous paper, an initial/final two-phase approach to recognizing these highly confusable syllables was proposed, in which each syllable is divided into initial and final parts that are recognized separately, and efficient recognition techniques for the finals were proposed and discussed. This paper is a continuation of that work and proposes an efficient system to recognize the Mandarin initials. In this system, a classification procedure is first used to categorize the unknown initials into two groups, C1 and C2; different approaches are then separately applied and independently optimized to recognize C1 and C2. Finite State Vector Quantization (FSVQ) is found to be very useful: two modified versions of it, Modified FSVQ (MFSVQ) and Second Order FSVQ (SOFSVQ), can provide the best recognition performance for C1 and C2 when a design parameter called the characteristic interval is carefully adjusted. Experimental results show that a recognition rate of 94.1% to 94.7% can be achieved using this system. The design is accomplished by carefully considering the special characteristics of Mandarin syllables and initials.

2020 ◽  
Vol 5 (2) ◽  
pp. 609
Author(s):  
Segun Aina ◽  
Kofoworola V. Sholesi ◽  
Aderonke R. Lawal ◽  
Samuel D. Okegbile ◽  
Adeniran I. Oluwaranti

This paper presents the application of Gaussian blur filters and Support Vector Machine (SVM) techniques to greeting recognition among the Yoruba tribe of Nigeria. Existing efforts have considered the recognition of various gestures. However, the recognition of tribal greeting postures or gestures in the Nigerian geographical space has not been studied before. Some cultural gestures are not correctly identified even by people of the same tribe, let alone by people from other tribes, which leads to misinterpretation of meaning. Also, some cultural gestures are unknown to most people outside a tribe, which can hinder human interaction; hence there is a need to automate the recognition of Nigerian tribal greeting gestures. This work therefore develops a system based on Gaussian blur and SVM that is capable of recognizing the Yoruba greeting postures for men and women. Videos of individuals performing various greeting gestures were collected and processed into image frames. The images were resized, and a Gaussian blur filter was used to remove noise from them. A moment-based feature extraction algorithm was used to extract shape features, which were passed as input to the SVM. The SVM was trained to recognize two Nigerian tribal greeting postures. To confirm the robustness of the system, 20%, 25% and 30% of the dataset acquired from the preprocessed images were used to test it. The results show that a recognition rate of 94% can be achieved with the SVM, which indicates that the proposed method is efficient.
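The preprocessing and feature steps described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say which moments are used, so the central moments below are an assumption, and the function names, the kernel radius and the value of sigma are all hypothetical. The SVM training step is omitted.

```python
import math

def gaussian_kernel(sigma, radius):
    """Return a normalized 1-D Gaussian kernel of length 2*radius + 1."""
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def blur_rows(image, kernel):
    """Convolve every row with the kernel, clamping indices at the borders."""
    radius = len(kernel) // 2
    out = []
    for row in image:
        n = len(row)
        out.append([
            sum(kernel[k + radius] * row[min(max(x + k, 0), n - 1)]
                for k in range(-radius, radius + 1))
            for x in range(n)
        ])
    return out

def transpose(image):
    return [list(col) for col in zip(*image)]

def gaussian_blur(image, sigma=1.0, radius=2):
    """Separable Gaussian blur: filter the rows, then the columns."""
    kernel = gaussian_kernel(sigma, radius)
    return transpose(blur_rows(transpose(blur_rows(image, kernel)), kernel))

def central_moment_features(image):
    """Centroid and second-order central moments (assumed feature set)."""
    pixels = [(x, y, v) for y, row in enumerate(image) for x, v in enumerate(row)]
    m00 = sum(v for _, _, v in pixels)
    cx = sum(x * v for x, _, v in pixels) / m00
    cy = sum(y * v for _, y, v in pixels) / m00

    def mu(p, q):
        return sum(((x - cx) ** p) * ((y - cy) ** q) * v for x, y, v in pixels)

    return {"cx": cx, "cy": cy, "mu20": mu(2, 0), "mu02": mu(0, 2), "mu11": mu(1, 1)}

# Toy 7x7 frame with a single bright pixel in the middle.
frame = [[0.0] * 7 for _ in range(7)]
frame[3][3] = 1.0
blurred = gaussian_blur(frame)
features = central_moment_features(blurred)
```

In a full pipeline, a feature vector like `features` would be computed for every frame and passed to an SVM trainer; that step depends on a library choice the abstract does not specify.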


2014 ◽  
Vol 2 (2) ◽  
pp. 43-53 ◽  
Author(s):  
S. Rojathai ◽  
M. Venkatesulu

In speech word recognition systems, feature extraction and recognition play a most significant role. Many feature extraction and recognition methods are available in existing speech word recognition systems. The most recent Tamil speech word recognition system achieved high word recognition performance with PAC-ANFIS compared to earlier Tamil speech word recognition systems. An investigation of speech word recognition with various recognition methods is therefore needed to establish their performance. This paper presents such an investigation with two well-known Artificial Intelligence methods, the Feed Forward Back Propagation Neural Network (FFBNN) and the Adaptive Neuro Fuzzy Inference System (ANFIS). The performance of the Tamil speech word recognition system with PAC-FFBNN is analyzed in terms of statistical measures and Word Recognition Rate (WRR), and is compared with PAC-ANFIS and other existing Tamil speech word recognition systems.


2020 ◽  
Vol 39 (4) ◽  
pp. 5749-5760
Author(s):  
Yanfei Hai

The purpose of this paper is to use English specific syllables and prosodic features in spoken speech data to carry out English spoken recognition, and to explore effective methods for the design and application of English speech detection and automatic recognition systems. The method proposed in this study combines an SVM_FF-based classifier, an SVM_IER-based classifier and a syllable classifier. It obtains a better recognition rate than the method based on a combination of other phonological characteristics (such as phonological rate, intensity, formant and energy statistics, and pronunciation rate) and than the syllable-based classifier trained on specific syllables. In addition, this study conducts simulation experiments on the proposed recognition and identification method based on specific syllables and prosodic features, and analyzes the experimental results. The results show that the recognition performance of the English spoken recognition system constructed in this study is significantly better than that of the traditional model.


Author(s):  
Y. S. Huang ◽  
K. Liu ◽  
C. Y. Suen ◽  
Y. Y. Tang

This paper proposes a novel method which enables a Chinese character recognition system to obtain reliable recognition. In this method, two thresholds, i.e. the class region threshold Rk and the disambiguity threshold Ak, are used by each Chinese character k when the classifier is designed based on the nearest neighbor rule, where Rk defines the pattern distribution region of character k, and Ak prevents the samples not belonging to character k from being ambiguously recognized as character k. A novel algorithm to derive the appropriate thresholds Ak and Rk is developed so that a better recognition reliability can be obtained through iterative learning. Experiments performed on the ITRI printed Chinese character database have achieved highly reliable recognition performance (such as 0.999 reliability with a 95.14% recognition rate), which shows the feasibility and effectiveness of the proposed method.
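One way to picture the two thresholds is a nearest-neighbor classifier with a reject option. The sketch below rests on assumptions: the abstract does not give the exact decision rule, the threshold-learning algorithm, or the definition of reliability. Here a sample is accepted as class k only if its nearest-prototype distance is within `region[k]` (standing in for Rk) and the nearest rival class is farther by at least `ambiguity[k]` (standing in for Ak); reliability is taken as correct / (correct + wrong), ignoring rejections. All names and values are hypothetical.

```python
import math

def classify_with_reject(sample, prototypes, region, ambiguity):
    """Return the accepted class label, or None if the sample is rejected."""
    # Distance from the sample to the nearest prototype of every class.
    nearest = {label: min(math.dist(sample, p) for p in points)
               for label, points in prototypes.items()}
    ranked = sorted(nearest, key=nearest.get)
    best = ranked[0]
    d_best = nearest[best]
    d_rival = nearest[ranked[1]] if len(ranked) > 1 else math.inf
    if d_best > region[best]:
        return None   # outside the class region (assumed role of Rk)
    if d_rival - d_best < ambiguity[best]:
        return None   # too close to a rival class (assumed role of Ak)
    return best

def recognition_and_reliability(predictions, truths):
    """Recognition rate over all samples; reliability over accepted samples."""
    correct = sum(1 for p, t in zip(predictions, truths) if p is not None and p == t)
    wrong = sum(1 for p, t in zip(predictions, truths) if p is not None and p != t)
    rate = correct / len(truths)
    reliability = correct / (correct + wrong) if (correct + wrong) else 0.0
    return rate, reliability

# Toy 2-D data with hand-picked (hypothetical) thresholds.
prototypes = {"A": [(0.0, 0.0)], "B": [(4.0, 0.0)]}
region = {"A": 1.5, "B": 1.5}
ambiguity = {"A": 1.0, "B": 1.0}
samples = [(0.5, 0.0), (2.1, 0.0), (1.4, 0.0), (3.8, 0.0)]
truths = ["A", "A", "B", "B"]
predictions = [classify_with_reject(s, prototypes, region, ambiguity) for s in samples]
rate, reliability = recognition_and_reliability(predictions, truths)
```

Tightening either threshold rejects more samples, which tends to trade recognition rate for reliability; the paper's iterative learning presumably tunes this trade-off per character.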


Sensors ◽  
2020 ◽  
Vol 20 (18) ◽  
pp. 5185
Author(s):  
Yu Zhai ◽  
Jieyu Lei ◽  
Wenze Xia ◽  
Shaokun Han ◽  
Fei Liu ◽  
...  

This work introduces a super-resolution (SR) algorithm for range images on the basis of self-guided joint filtering (SGJF), adding the range information of the range image as a coefficient of the filter to reduce the influence of the intensity image texture on the super-resolved image. A range image SR recognition system is constructed to study the effect of four SR algorithms including the SGJF algorithm on the recognition of the laser radar (ladar) range image. The effects of different model library sizes, SR algorithms, SR factors and noise conditions on the recognition are tested via experiments. Results demonstrate that all tested SR algorithms can improve the recognition rate of low-resolution (low-res) range images to varying degrees and the proposed SGJF algorithm has a very good comprehensive recognition performance. Finally, suggestions for the use of SR algorithms in actual scene recognition are proposed on the basis of the experimental results.


2016 ◽  
Vol 2016 ◽  
pp. 1-15 ◽  
Author(s):  
Yibing Li ◽  
Jie Chen ◽  
Fang Ye ◽  
Dandan Liu

ATR systems have broad application prospects in the military field, especially in modern defense technology. When paradoxes arise in an ATR system because of an adverse battlefield environment, integration cannot be carried out effectively and reliably with traditional DS evidence theory alone. In this paper, a modified DS evidence theory is presented and applied to an IR/MMW target recognition system. The improvement consists of three parts: the introduction of sensor priority and evidence credibility to discount the evidence, a modification of the DS combination rule to improve the accuracy of the synthesis results, and a compound decision-making rule. The modified algorithm is applied to the IR/MMW system in order to deal with paradoxes, improve the target recognition rate, and ensure the reliability of the target recognition system. Experiments illustrate that, with the modified DS evidence theory, the IR/MMW system achieves more satisfactory target recognition performance through multisensor information fusion than any single-mode system.
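The kind of paradox mentioned above can be reproduced with the classical Dempster combination rule, and the effect of discounting can be shown with standard Shafer discounting. This sketch does not implement the paper's modified rule: the abstract does not specify how sensor priority and evidence credibility are turned into discount factors, so the factor 0.9 and all function and variable names below are hypothetical.

```python
from itertools import product

def discount(mass, alpha, frame):
    """Shafer discounting: scale every mass by alpha, move the rest to the frame."""
    out = {focal: alpha * value for focal, value in mass.items()}
    out[frame] = out.get(frame, 0.0) + (1.0 - alpha)
    return out

def combine(m1, m2):
    """Classical Dempster rule. Returns (combined masses, conflict K)."""
    joint = {}
    conflict = 0.0
    for (a, va), (b, vb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            joint[inter] = joint.get(inter, 0.0) + va * vb
        else:
            conflict += va * vb   # mass assigned to the empty set
    if conflict >= 1.0:
        raise ValueError("total conflict: the sources cannot be combined")
    return {f: v / (1.0 - conflict) for f, v in joint.items()}, conflict

# Two strongly conflicting sensors (Zadeh-style example).
A, B, C = frozenset("A"), frozenset("B"), frozenset("C")
FRAME = frozenset("ABC")
sensor1 = {A: 0.99, B: 0.01}
sensor2 = {C: 0.99, B: 0.01}

# Without discounting, nearly all belief lands on B, which both sensors
# considered very unlikely.
plain, k_plain = combine(sensor1, sensor2)

# With a hypothetical credibility of 0.9 per sensor, A and C keep most of
# the belief and B stays small.
fused, k_fused = combine(discount(sensor1, 0.9, FRAME),
                         discount(sensor2, 0.9, FRAME))
```

Focal elements are represented as `frozenset` objects so that set intersection can be used directly and the sets can serve as dictionary keys.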


Author(s):  
Brendan McCane ◽  
Terry Caelli ◽  
Olivier de Vel

In this paper we further explore the use of machine learning (ML) for the recognition of 3D objects in isolation or embedded in scenes. Of particular interest is the use of a recent ML technique, Conditional Rule Generation (CRG), which generates descriptions of objects in terms of object parts and part-relational attribute bounds. We show how this technique can be combined with intensity-based model and scene–views to locate objects and their pose. The major contributions of this paper are: the extension of the CRG classifier to incorporate fuzzy decisions (FCRG), the application of the FCRG classifier to the problem of learning 3D objects from 2D intensity images, the study of the usefulness of sparse depth data with regard to recognition performance, and the implementation of a complete object recognition system that does not rely on perfect or synthetic data. We report a recognition rate of 80% for unseen single object scenes in a database of 18 non-trivial objects.


2021 ◽  
Vol 8 (1) ◽  
Author(s):  
Su Yang ◽  
Jose Miguel Sanchez Bornot ◽  
Ricardo Bruña Fernandez ◽  
Farzin Deravi ◽  
KongFatt Wong-Lin ◽  
...  

Magnetoencephalography (MEG) has been combined with machine learning techniques to recognize Alzheimer’s disease (AD), one of the most common forms of dementia. However, most previous studies are limited to binary classification and do not fully utilize the two available MEG modalities (extracted using magnetometer and gradiometer sensors). AD consists of several stages of progression; this study addresses this limitation by using both magnetometer and gradiometer data to discriminate between participants with AD, AD-related mild cognitive impairment (MCI), and healthy control (HC) participants in the form of a three-class classification problem. A series of wavelet-based biomarkers are developed and evaluated, which concurrently leverage the spatial, frequency and time domain characteristics of the signal. A bimodal recognition system based on an improved score-level fusion approach is proposed to reinforce interpretation of the brain activity captured by magnetometers and gradiometers. In this preliminary study, it was found that the markers derived from gradiometers tend to outperform the magnetometer-based markers. Interestingly, out of the 10 regions of interest, the left frontal lobe demonstrates about 8% higher mean recognition rate than the second-best performing region (the left temporal lobe) for AD/MCI/HC classification. Among the four types of markers proposed in this work, the spatial marker developed using wavelet coefficients provided the best recognition performance for the three-way classification. Overall, the proposed approach provides promising results for the potential of AD/MCI/HC three-way classification utilizing bimodal MEG data.
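Score-level fusion of two modalities can be illustrated with a generic weighted-sum scheme. This is only a baseline sketch: the abstract does not describe what its "improved" fusion does, so the min-max normalization, the weight `w_grad` and the example scores are all assumptions made for illustration.

```python
def min_max(scores):
    """Rescale a dict of class scores to the range [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {label: 0.5 for label in scores}
    return {label: (value - lo) / (hi - lo) for label, value in scores.items()}

def fuse_scores(mag_scores, grad_scores, w_grad=0.6):
    """Weighted-sum fusion of per-class scores from two classifiers."""
    mag_n, grad_n = min_max(mag_scores), min_max(grad_scores)
    fused = {label: (1.0 - w_grad) * mag_n[label] + w_grad * grad_n[label]
             for label in mag_n}
    return max(fused, key=fused.get), fused

# Hypothetical classifier outputs for one participant (not real data).
mag_scores = {"AD": 0.2, "MCI": 0.5, "HC": 0.3}
grad_scores = {"AD": 1.8, "MCI": 1.1, "HC": 0.4}
label, fused = fuse_scores(mag_scores, grad_scores)
```

Normalizing before summing matters because the two classifiers may output scores on different scales; giving the gradiometer the larger weight here simply mirrors the reported tendency of gradiometer markers to perform better.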


Author(s):  
Soo‐Young Suk ◽  
Hyun‐Yeol Chung

Purpose: The purpose of this paper is to describe a speech and character combined recognition engine (SCCRE) developed for personal digital assistants (PDAs) and other mobile devices. The architecture of a distributed recognition system that provides a more convenient user interface is also discussed.

Design/methodology/approach: In SCCRE, feature extraction for speech and for characters is carried out separately, but recognition is performed in a single engine. The client recognition engine employs a continuous hidden Markov model (CHMM) structure with a variable parameter topology in order to minimize the number of model parameters and to reduce recognition time. The model also adopts the proposed successive state and mixture splitting (SSMS) method for generating context-independent models. SSMS optimizes the number of mixtures through splitting in the mixture domain and the number of states through splitting in the time domain.

Findings: The recognition results show that the developed engine can reduce the total number of Gaussians by up to 40 per cent compared with fixed-parameter models at the same recognition performance when applied to speech recognition for mobile devices. SSMS can reduce the size of memory for models to 65 per cent and that for processing to 82 per cent. Moreover, the recognition time decreases by 17 per cent with the SSMS model while the recognition rate is maintained.

Originality/value: The proposed system will be very useful for many on‐line multimodal interfaces such as PDAs and mobile applications.


Author(s):  
Naoya Wada ◽  
Shingo Yoshizawa ◽  
Yoshikazu Miyanaga

This paper introduces the extraction of noise-robust speech features for speech recognition. It also explores in detail the advanced speech analysis techniques RSF (Running Spectrum Filtering) and DRA (Dynamic Range Adjustment). New experiments on phase recognition were carried out using 40 male and female speakers for training and 5 other male and female speakers for recognition. For example, the recognition rate improves from 17% to 63% under car noise at -10 dB SNR, which shows the high noise robustness of the proposed system. In addition, a new parallel/pipelined LSI design of the system is proposed, which considerably reduces the calculation time. Using this architecture, real-time speech recognition can be developed. For this system, both a full-custom LSI design and an FPGA design are introduced.

