AN EFFICIENT SPEECH RECOGNITION SYSTEM FOR THE INITIALS OF MANDARIN SYLLABLES

This paper is part of a long-term research project on recognizing Mandarin speech with a very large vocabulary and unlimited text. The first-stage goal is to recognize Mandarin syllables. A previous paper proposed an initial/final two-phase approach for these highly confusable syllables, in which each syllable is divided into an initial and a final part that are recognized separately, and presented efficient recognition techniques for the finals. This paper continues that work and proposes an efficient system for recognizing the Mandarin initials. In this system, a classification procedure first assigns each unknown initial to one of two groups, C1 and C2; different approaches are then applied separately, and optimized independently, to recognize C1 and C2. Finite State Vector Quantization (FSVQ) is found to be very useful: two modified versions, Modified FSVQ (MFSVQ) and Second Order FSVQ (SOFSVQ), provide the best recognition performance for C1 and C2 when a design parameter called the characteristic interval is carefully adjusted. Experimental results show that the system achieves a recognition rate of 94.1% to 94.7%. The design is based on careful consideration of the special characteristics of Mandarin syllables and initials.
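
As a rough illustration of the finite-state idea behind FSVQ, the sketch below encodes a sequence of feature frames with a generic finite-state vector quantizer: each state has its own codebook, and the chosen codeword determines the next state. This is not the paper's MFSVQ or SOFSVQ, and it does not model the characteristic interval; the names `fsvq_encode`, `codebooks` and `next_state` are hypothetical and chosen only for this example.

```python
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fsvq_encode(
    frames: List[Vector],
    codebooks: Dict[int, List[Vector]],
    next_state: Dict[Tuple[int, int], int],
    start_state: int = 0,
) -> Tuple[List[Tuple[int, int]], float]:
    """Encode frames with a generic finite-state vector quantizer.

    Returns the (state, codeword index) path and the accumulated distortion.
    In a recognizer, one such quantizer per class could be run and the class
    with the lowest distortion chosen (an assumption, not the paper's rule).
    """
    state = start_state
    path: List[Tuple[int, int]] = []
    distortion = 0.0
    for frame in frames:
        book = codebooks[state]
        # Search only the codebook that belongs to the current state.
        idx = min(range(len(book)), key=lambda i: sq_dist(frame, book[i]))
        distortion += sq_dist(frame, book[idx])
        path.append((state, idx))
        # The next state depends on the current state and the chosen codeword.
        state = next_state[(state, idx)]
    return path, distortion


# Toy one-dimensional example with two states (all values are made up).
codebooks = {0: [(0.0,), (1.0,)], 1: [(2.0,), (3.0,)]}
next_state = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}
path, distortion = fsvq_encode([(0.9,), (2.2,), (3.1,)], codebooks, next_state)
```

Because each frame is matched against only one state's codebook, the search per frame is much smaller than a search over a single large codebook, which is the usual motivation for the finite-state structure.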

GESTURE RECOGNITION SYSTEM FOR NIGERIAN TRIBAL GREETING POSTURES USING SUPPORT VECTOR MACHINE

MALAYSIAN JOURNAL OF COMPUTING ◽

10.24191/mjoc.v5i2.10347 ◽

2020 ◽

Vol 5 (2) ◽

pp. 609

Author(s):

Segun Aina ◽

Kofoworola V. Sholesi ◽

Aderonke R. Lawal ◽

Samuel D. Okegbile ◽

Adeniran I. Oluwaranti

Keyword(s):

Support Vector Machine ◽

Gesture Recognition ◽

Recognition Rate ◽

Recognition Task ◽

Recognition System ◽

Human Interaction ◽

Support Vector ◽

System A ◽

Extraction Algorithm ◽

Gaussian Blur

This paper presents the application of Gaussian blur filters and Support Vector Machine (SVM) techniques to greeting recognition among the Yoruba tribe of Nigeria. Existing efforts have considered various kinds of gesture recognition, but recognition of tribal greeting postures or gestures in the Nigerian geographical space has not been studied before. Some cultural gestures are not correctly identified even by people of the same tribe, let alone by people from other tribes, which leads to misinterpretation of meaning. Some cultural gestures are also unknown to most people outside a tribe, which can hinder human interaction; hence there is a need to automate the recognition of Nigerian tribal greeting gestures. This work therefore develops a Gaussian blur and SVM based system capable of recognizing the Yoruba greeting postures for men and women. Videos of individuals performing various greeting gestures were collected and processed into image frames. The images were resized, and a Gaussian blur filter was used to remove noise from them. A moment-based feature extraction algorithm was used to extract shape features, which were passed as input to the SVM. The SVM was trained to recognize two Nigerian tribal greeting postures. To confirm the robustness of the system, 20%, 25% and 30% of the dataset acquired from the preprocessed images were used to test the system. The results show a recognition rate of 94% with the SVM, indicating that the proposed method is efficient.
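
The preprocessing and feature steps described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the kernel size, the border handling (clamping) and the choice of second-order normalized central moments as the "moment-based" shape features are all assumptions, and the function names are hypothetical. The SVM step is omitted; the resulting feature vectors would be passed to any SVM implementation.

```python
import math


def gaussian_kernel(sigma, radius):
    """1-D Gaussian weights, normalized so they sum to 1."""
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur_rows(image, kernel):
    """Convolve every row with the kernel, clamping indices at the borders."""
    radius = len(kernel) // 2
    width = len(image[0])
    out = []
    for row in image:
        new_row = []
        for x in range(width):
            acc = 0.0
            for k, w in enumerate(kernel):
                xx = min(max(x + k - radius, 0), width - 1)
                acc += w * row[xx]
            new_row.append(acc)
        out.append(new_row)
    return out


def transpose(image):
    return [list(col) for col in zip(*image)]


def gaussian_blur(image, sigma=1.0, radius=2):
    """Separable Gaussian blur: filter the rows, then the columns."""
    kernel = gaussian_kernel(sigma, radius)
    return transpose(blur_rows(transpose(blur_rows(image, kernel)), kernel))


def central_moment(image, p, q):
    """Central moment mu_pq of a grayscale image given as a list of rows."""
    m00 = sum(sum(row) for row in image)
    cx = sum(x * v for row in image for x, v in enumerate(row)) / m00
    cy = sum(y * v for y, row in enumerate(image) for v in row) / m00
    return sum(((x - cx) ** p) * ((y - cy) ** q) * v
               for y, row in enumerate(image) for x, v in enumerate(row))


def shape_features(image):
    """Normalized central moments eta_20, eta_02, eta_11 (scale invariant)."""
    m00 = central_moment(image, 0, 0)
    return [central_moment(image, p, q) / m00 ** (1 + (p + q) / 2.0)
            for p, q in [(2, 0), (0, 2), (1, 1)]]


# Toy 5x5 frame with a single bright pixel in the centre.
frame = [[0.0] * 5 for _ in range(5)]
frame[2][2] = 1.0
features = shape_features(gaussian_blur(frame))
```

Central moments are computed relative to the image centroid, so the features do not change when the posture shifts within the frame; dividing by a power of the zeroth moment additionally removes the dependence on scale.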

Investigation of ANFIS and FFBNN Recognition Methods Performance in Tamil Speech Word Recognition

International Journal of Software Innovation ◽

10.4018/ijsi.2014040103 ◽

2014 ◽

Vol 2 (2) ◽

pp. 43-53 ◽

Cited By ~ 1

Author(s):

S. Rojathai ◽

M. Venkatesulu

Keyword(s):

Feature Extraction ◽

Word Recognition ◽

Recognition Performance ◽

Recognition Rate ◽

Back Propagation ◽

Recognition System ◽

Inference System ◽

Feed Forward Back Propagation ◽

Statistical Measures ◽

Recognition Systems

In speech word recognition systems, feature extraction and recognition play a significant role, and many feature extraction and recognition methods are available in existing systems. The most recent Tamil speech word recognition system, which uses PAC-ANFIS, has achieved higher word recognition performance than earlier Tamil speech word recognition systems. An investigation of speech word recognition with other recognition methods is therefore needed to establish their relative performance. This paper presents such an investigation using two well-known Artificial Intelligence methods, the Feed Forward Back Propagation Neural Network (FFBNN) and the Adaptive Neuro Fuzzy Inference System (ANFIS). The performance of the Tamil speech word recognition system with PAC-FFBNN is analyzed in terms of statistical measures and Word Recognition Rate (WRR), and compared with PAC-ANFIS and other existing Tamil speech word recognition systems.

Computer-aided teaching mode of oral English intelligent learning based on speech recognition and network assistance

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-189052 ◽

2020 ◽

Vol 39 (4) ◽

pp. 5749-5760

Author(s):

Yanfei Hai

Keyword(s):

Recognition Performance ◽

Recognition Rate ◽

Recognition System ◽

Prosodic Features ◽

Simulation Experiments ◽

Teaching Mode ◽

Speech Detection ◽

Rate Intensity ◽

Recognition Systems ◽

Better Than

The purpose of this paper is to use English-specific syllables and prosodic features in spoken speech data to recognize spoken English, and to explore effective methods for the design and application of English speech detection and automatic recognition systems. The proposed method combines an SVM_FF based classifier, an SVM_IER based classifier and a syllable classifier. It obtains a better recognition rate than both a method based on a combination of other phonological characteristics (such as phonological rate, intensity, formant and energy statistics, and pronunciation rate) and a syllable-based classifier trained on specific syllables. In addition, simulation experiments are conducted on the proposed method, and the experimental results are analyzed. The results show that the recognition performance of the spoken English recognition system constructed in this study is significantly better than that of the traditional model.

A Reliability Design Methodology for Chinese Character Recognition

International Journal of Pattern Recognition and Artificial Intelligence ◽

10.1142/s0218001498000117 ◽

1998 ◽

Vol 12 (02) ◽

pp. 159-172 ◽

Cited By ~ 1

Author(s):

Y. S. Huang ◽

K. Liu ◽

C. Y. Suen ◽

Y. Y. Tang

Keyword(s):

Character Recognition ◽

Chinese Character ◽

Nearest Neighbor ◽

Recognition Performance ◽

Recognition Rate ◽

Recognition System ◽

Reliability Design ◽

Chinese Character Recognition ◽

Nearest Neighbor Rule ◽

Pattern Distribution

This paper proposes a novel method which enables a Chinese character recognition system to obtain reliable recognition. In this method, two thresholds, i.e. a class region threshold Rk and a disambiguity threshold Ak, are used by each Chinese character k when the classifier is designed based on the nearest neighbor rule, where Rk defines the pattern distribution region of character k, and Ak prevents samples not belonging to character k from being ambiguously recognized as character k. A novel algorithm to derive appropriate thresholds Ak and Rk is developed so that better recognition reliability can be obtained through iterative learning. Experiments performed on the ITRI printed Chinese character database have achieved highly reliable recognition performance (such as 0.999 reliability with a 95.14% recognition rate), which shows the feasibility and effectiveness of the proposed method.
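
One way to picture how two per-class thresholds can make a nearest-neighbor classifier more reliable is a reject option, sketched below. The decision rule here is an assumption made for illustration: a sample is accepted as class k only if its distance to the nearest prototype of k is within Rk and the runner-up class is at least Ak farther away. The abstract does not give the exact rule or the iterative learning procedure, and the names `classify_with_reject`, `region` and `ambiguity` are hypothetical.

```python
import math


def classify_with_reject(sample, prototypes, region, ambiguity):
    """Nearest-neighbor decision with a reject option.

    prototypes: class label -> prototype vector (one per class, for brevity)
    region:     class label -> Rk, maximum accepted distance (assumed usage)
    ambiguity:  class label -> Ak, minimum margin to the runner-up (assumed usage)
    Returns the winning label, or None when the sample is rejected.
    """
    ranked = sorted((math.dist(sample, proto), label)
                    for label, proto in prototypes.items())
    best_dist, best_label = ranked[0]
    second_dist = ranked[1][0] if len(ranked) > 1 else math.inf

    # Reject samples that fall outside the class's pattern distribution region.
    if best_dist > region[best_label]:
        return None
    # Reject samples that are almost as close to another class.
    if second_dist - best_dist < ambiguity[best_label]:
        return None
    return best_label


# Toy two-class example (all numbers are made up).
prototypes = {"A": (0.0, 0.0), "B": (4.0, 0.0)}
region = {"A": 2.5, "B": 2.5}
ambiguity = {"A": 1.0, "B": 1.0}

accepted = classify_with_reject((0.5, 0.0), prototypes, region, ambiguity)
ambiguous = classify_with_reject((1.9, 0.0), prototypes, region, ambiguity)
outlier = classify_with_reject((0.0, 3.0), prototypes, region, ambiguity)
```

Rejecting uncertain samples trades recognition rate for reliability, which is consistent with the abstract reporting both figures together (0.999 reliability at a 95.14% recognition rate).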

Research on the Enhancement of Laser Radar Range Image Recognition Using a Super-Resolution Algorithm

Sensors ◽

10.3390/s20185185 ◽

2020 ◽

Vol 20 (18) ◽

pp. 5185

Author(s):

Yu Zhai ◽

Jieyu Lei ◽

Wenze Xia ◽

Shaokun Han ◽

Fei Liu ◽

...

Keyword(s):

Recognition Performance ◽

Recognition Rate ◽

Super Resolution ◽

Recognition System ◽

Scene Recognition ◽

Image Texture ◽

Laser Radar ◽

Range Image ◽

Range Images ◽

Model Library

This work introduces a super-resolution (SR) algorithm for range images on the basis of self-guided joint filtering (SGJF), adding the range information of the range image as a coefficient of the filter to reduce the influence of the intensity image texture on the super-resolved image. A range image SR recognition system is constructed to study the effect of four SR algorithms including the SGJF algorithm on the recognition of the laser radar (ladar) range image. The effects of different model library sizes, SR algorithms, SR factors and noise conditions on the recognition are tested via experiments. Results demonstrate that all tested SR algorithms can improve the recognition rate of low-resolution (low-res) range images to varying degrees and the proposed SGJF algorithm has a very good comprehensive recognition performance. Finally, suggestions for the use of SR algorithms in actual scene recognition are proposed on the basis of the experimental results.

The Improvement of DS Evidence Theory and Its Application in IR/MMW Target Recognition

Journal of Sensors ◽

10.1155/2016/1903792 ◽

2016 ◽

Vol 2016 ◽

pp. 1-15 ◽

Cited By ~ 17

Author(s):

Yibing Li ◽

Jie Chen ◽

Fang Ye ◽

Dandan Liu

Keyword(s):

Target Recognition ◽

Recognition Performance ◽

Recognition Rate ◽

Evidence Theory ◽

Single Mode ◽

Recognition System ◽

Battlefield Environment ◽

Compound Decision ◽

The Military ◽

Decision Making Rule

Automatic target recognition (ATR) systems have broad application prospects in the military field, especially in modern defense technology. When paradoxes arise in an ATR system because of an adverse battlefield environment, traditional DS evidence theory alone cannot fuse the evidence effectively and reliably. In this paper, a modified DS evidence theory is presented and applied to an IR/MMW target recognition system. The improvement has three parts: the introduction of sensor priority and evidence credibility to discount the evidence, a modification of the DS combination rule to improve the accuracy of the synthesis results, and a compound decision-making rule. The modified algorithm is applied to the IR/MMW system to deal with paradoxes, improve the target recognition rate, and ensure the reliability of the target recognition system. Experiments illustrate that, with the modified DS evidence theory, the IR/MMW system achieves better target recognition performance through multisensor information fusion than any single-mode system.
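
For context, the sketch below shows the classical Dempster combination rule and classical Shafer discounting, the two standard building blocks that the abstract says are modified. It is not the paper's modified rule: the sensor priority, evidence credibility and compound decision rule are not reproduced, and the reliability factor 0.9 and the target labels are made-up values. The example uses Zadeh's well-known conflicting-evidence case to show the kind of paradox the abstract refers to.

```python
def discount(mass, alpha, frame):
    """Shafer discounting: keep a fraction alpha of each mass and move the
    remainder to the whole frame of discernment (i.e. to 'unknown')."""
    out = {focal: alpha * m for focal, m in mass.items() if focal != frame}
    out[frame] = 1.0 - alpha + alpha * mass.get(frame, 0.0)
    return out


def combine(m1, m2):
    """Classical Dempster rule for two mass functions over frozenset focal
    elements. Returns the combined masses and the conflict coefficient K."""
    joint = {}
    conflict = 0.0
    for a, ma in m1.items():
        for b, mb in m2.items():
            inter = a & b
            if inter:
                joint[inter] = joint.get(inter, 0.0) + ma * mb
            else:
                conflict += ma * mb
    if conflict >= 1.0:
        raise ValueError("total conflict: Dempster's rule is undefined")
    return {k: v / (1.0 - conflict) for k, v in joint.items()}, conflict


# Zadeh's example: two sensors strongly disagree (labels are illustrative).
TANK, TRUCK, DECOY = frozenset({"tank"}), frozenset({"truck"}), frozenset({"decoy"})
FRAME = TANK | TRUCK | DECOY
ir_sensor = {TANK: 0.99, TRUCK: 0.01}
mmw_sensor = {DECOY: 0.99, TRUCK: 0.01}

# Undiscounted fusion puts all belief on the hypothesis both sensors doubted.
paradox, k_raw = combine(ir_sensor, mmw_sensor)

# Discounting both sources first (assumed reliability 0.9) softens the result.
softened, k_soft = combine(discount(ir_sensor, 0.9, FRAME),
                           discount(mmw_sensor, 0.9, FRAME))
```

In the undiscounted case the conflict coefficient is about 0.9999 and the fused belief in "truck" is essentially 1, even though each sensor gave it only 0.01. After discounting, "tank" and "decoy" each receive far more belief than "truck", which shows why discounting evidence is a common first step when sources conflict.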

Learning to Recognize 3D Objects using Sparse Depth and Intensity Information

International Journal of Pattern Recognition and Artificial Intelligence ◽

10.1142/s021800149700041x ◽

1997 ◽

Vol 11 (06) ◽

pp. 909-931 ◽

Cited By ~ 8

Author(s):

Brendan McCane ◽

Terry Caelli ◽

Olivier de Vel

Keyword(s):

Recognition Performance ◽

Recognition Rate ◽

Synthetic Data ◽

Recognition System ◽

Rule Generation ◽

3D Objects ◽

Depth Data ◽

Intensity Information ◽

Intensity Images ◽

Object Parts

In this paper we further explore the use of machine learning (ML) for the recognition of 3D objects in isolation or embedded in scenes. Of particular interest is the use of a recent ML technique (specifically CRG — Conditional Rule Generation) which generates descriptions of objects in terms of object parts and part-relational attribute bounds. We show how this technique can be combined with intensity-based model and scene–views to locate objects and their pose. The major contributions of this paper are: the extension of the CRG classifier to incorporate fuzzy decisions (FCRG), the application of the FCRG classifier to the problem of learning 3D objects from 2D intensity images, the study of the usefulness of sparse depth data in regards to recognition performance, and the implementation of a complete object recognition system that does not rely on perfect or synthetic data. We report a recognition rate of 80% for unseen single object scenes in a database of 18 non-trivial objects.

Integrated space–frequency–time domain feature extraction for MEG-based Alzheimer’s disease classification

Brain Informatics ◽

10.1186/s40708-021-00145-1 ◽

2021 ◽

Vol 8 (1) ◽

Author(s):

Su Yang ◽

Jose Miguel Sanchez Bornot ◽

Ricardo Bruña Fernandez ◽

Farzin Deravi ◽

KongFatt Wong-Lin ◽

...

Keyword(s):

Alzheimer's Disease ◽

Time Domain ◽

Recognition Performance ◽

Binary Classification ◽

Recognition Rate ◽

Recognition System ◽

Disease Classification ◽

Machine Learning Techniques ◽

Left Frontal Lobe

Magnetoencephalography (MEG) has been combined with machine learning techniques to recognize Alzheimer’s disease (AD), one of the most common forms of dementia. However, most previous studies are limited to binary classification and do not fully utilize the two available MEG modalities (extracted using magnetometer and gradiometer sensors). Because AD consists of several stages of progression, this study addresses these limitations by using both magnetometer and gradiometer data to discriminate between participants with AD, participants with AD-related mild cognitive impairment (MCI), and healthy control (HC) participants, in the form of a three-class classification problem. A series of wavelet-based biomarkers are developed and evaluated, which concurrently leverage the spatial, frequency and time domain characteristics of the signal. A bimodal recognition system based on an improved score-level fusion approach is proposed to reinforce interpretation of the brain activity captured by magnetometers and gradiometers. In this preliminary study, markers derived from gradiometers tended to outperform the magnetometer-based markers. Interestingly, out of the 10 regions of interest, the left frontal lobe demonstrates about 8% higher mean recognition rate than the second-best performing region (the left temporal lobe) for AD/MCI/HC classification. Among the four types of markers proposed in this work, the spatial marker developed using wavelet coefficients provided the best recognition performance for the three-way classification. Overall, the proposed approach provides promising results for the potential of AD/MCI/HC three-way classification utilizing bimodal MEG data.
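
Score-level fusion of two modalities can be illustrated with a basic weighted-sum scheme, sketched below. This is a generic baseline, not the improved fusion approach the abstract mentions: the min-max normalization, the weight of 0.6 on the gradiometer scores and all score values are assumptions made only for this example, and the function names are hypothetical.

```python
def min_max(scores):
    """Rescale one classifier's per-class scores to the range [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {label: 0.5 for label in scores}
    return {label: (v - lo) / (hi - lo) for label, v in scores.items()}


def fuse_scores(mag_scores, grad_scores, w_grad=0.6):
    """Weighted-sum fusion of magnetometer and gradiometer class scores.

    Returns the winning class label and the fused score for every class.
    """
    mag, grad = min_max(mag_scores), min_max(grad_scores)
    fused = {label: (1.0 - w_grad) * mag[label] + w_grad * grad[label]
             for label in mag}
    return max(fused, key=fused.get), fused


# Made-up scores for one participant from two single-modality classifiers.
mag_scores = {"AD": 0.2, "MCI": 0.5, "HC": 0.3}
grad_scores = {"AD": 0.7, "MCI": 0.2, "HC": 0.1}

label, fused = fuse_scores(mag_scores, grad_scores)
mag_only_label, _ = fuse_scores(mag_scores, grad_scores, w_grad=0.0)
```

Normalizing before summing keeps one modality from dominating simply because its classifier outputs larger raw values; the weight then expresses how much each modality is trusted.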

A speech and character combined recognition engine for mobile devices

International Journal of Pervasive Computing and Communications ◽

10.1108/17427370810890409 ◽

2008 ◽

Vol 4 (2) ◽

pp. 232-249 ◽

Cited By ~ 1

Author(s):

Soo‐Young Suk ◽

Hyun‐Yeol Chung

Keyword(s):

Mobile Devices ◽

Recognition Performance ◽

Recognition Rate ◽

Variable Parameter ◽

Recognition System ◽

Multimodal Interfaces ◽

Personal Digital Assistants ◽

Model Parameters ◽

Recognition Time ◽

Content Type

Purpose: The purpose of this paper is to describe a speech and character combined recognition engine (SCCRE) developed to run on personal digital assistants (PDAs) or other mobile devices. The architecture of a distributed recognition system that provides a more convenient user interface is also discussed. Design/methodology/approach: In SCCRE, feature extraction for speech and for characters is carried out separately, but recognition is performed in a single engine. The client recognition engine employs a continuous hidden Markov model (CHMM) structure with a variable parameter topology in order to minimize the number of model parameters and to reduce recognition time. The model also adopts the proposed successive state and mixture splitting (SSMS) method for generating context-independent models. SSMS optimizes the number of mixtures through splitting in the mixture domain and the number of states through splitting in the time domain. Findings: The recognition results show that the developed engine can reduce the total number of Gaussians by up to 40 per cent compared with fixed parameter models at the same recognition performance when applied to speech recognition for mobile devices. SSMS can reduce the memory size for models to 65 per cent and that for processing to 82 per cent. Moreover, the recognition time decreases by 17 per cent with the SSMS model while the recognition rate is maintained. Originality/value: The proposed system will be very useful for many on-line multimodal interfaces such as PDAs and mobile applications.

A Real Time Noise-Robust Speech Recognition System

ECTI Transactions on Computer and Information Technology (ECTI-CIT) ◽

10.37936/ecti-cit.200512.51835 ◽

2005 ◽

Vol 1 (2) ◽

pp. 75-83

Author(s):

Naoya Wada ◽

Shingo Yoshizawa ◽

Yoshikazu Miyanaga

Keyword(s):

Speech Recognition ◽

Real Time ◽

Dynamic Range ◽

Recognition Rate ◽

Recognition System ◽

Noise Robustness ◽

Fpga Design ◽

Male And Female ◽

Parallel Pipelined ◽

Phase Recognition

This paper introduces the extraction of speech features that provide noise robustness for speech recognition. It also explores in detail the advanced speech analysis techniques RSF (Running Spectrum Filtering) and DRA (Dynamic Range Adjustment). New experiments on phase recognition were carried out using 40 male and female speakers for training and 5 other male and female speakers for recognition. For example, the recognition rate improves from 17% to 63% under car noise at -10 dB SNR, which shows the high noise robustness of the proposed system. In addition, a new parallel/pipelined LSI design of the system is proposed, which considerably reduces the calculation time. Using this architecture, real-time speech recognition can be developed. For this system, both a full-custom LSI design and an FPGA design are introduced.
