SPEECH RECOGNITION OF KV-PATTERNED INDONESIAN SYLLABLE USING MFCC, WAVELET AND HMM

Kursor ◽  
2016 ◽  
Vol 8 (2) ◽  
pp. 67 ◽  
Author(s):  
Syahroni Hidayat

The Indonesian language is an agglutinative language with complex affixes attached to its roots. For this reason, there is a high possibility of recognizing Indonesian speech based on its syllables. Syllable-based Indonesian speech recognition could reduce the database size and recognize new Indonesian vocabulary that evolves as the language develops. MFCC and wavelet packet transform (WPT) with 3rd-order (DB3) and 7th-order (DB7) Daubechies wavelets are used in the feature extraction process, and an HMM with Euclidean distance probability is applied for classification. The results show that the best recognition rates are 75% and 70.8% for the MFCC and WPT methods respectively, obtained from testing with the training data. Meanwhile, for testing with external data, the WPT method outperforms the MFCC method, with best recognition rates of 53.1% for WPT and 47% for MFCC. For MFCC, accuracy increased as the data length and frame length increased. For WPT, the increase in accuracy is influenced by the data length, the wavelet type, and the decomposition level. It was also found that as the number of states increased, the recognition rate of both methods decreased.
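The abstract pairs MFCC/WPT features with an HMM whose observation probabilities are derived from Euclidean distance rather than the usual Gaussian densities. A minimal sketch of such a distance-based observation score follows; the exponential mapping, the centroid values, and the 2-D feature size are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def euclidean_observation_prob(frame, state_centroids):
    """Pseudo-likelihood of one feature frame under each HMM state,
    derived from the Euclidean distance to the state's reference
    centroid: closer frames score higher."""
    d = np.linalg.norm(state_centroids - frame, axis=1)
    scores = np.exp(-d)              # monotone map: small distance -> large score
    return scores / scores.sum()     # normalize to a probability-like vector

# toy example: two states with 2-D MFCC-like centroids
centroids = np.array([[0.0, 0.0], [3.0, 3.0]])
p = euclidean_observation_prob(np.array([0.1, -0.2]), centroids)
```

In a full recognizer these per-frame scores would replace the Gaussian emission term inside the Viterbi recursion.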

Author(s):  
Vanajakshi Puttaswamy Gowda ◽  
Mathivanan Murugavelu ◽  
Senthil Kumaran Thangamuthu

<p><span>Continuous speech segmentation and recognition play an important role in natural language processing. Context-based continuous Kannada speech segmentation depends on the context, grammar, and semantic rules of the Kannada language. Extracting significant features of the Kannada speech signal for a recognition system is quite exciting for researchers. The method proposed in this paper is divided into two parts. In the first part, context-based segmentation of the continuous Kannada speech signal is carried out by computing the average short-term energy and the spectral centroid coefficients of the speech signal within a specified window. The segmented outputs are completely meaningful for different scenarios, with low segmentation error. The second part is speech recognition, performed by extracting a small number of Mel-frequency cepstral coefficients and using vector quantization with a small number of codebooks. Recognition is based entirely on a threshold value. Setting this threshold is a challenging task; however, a simple method is used to achieve a better recognition rate. The experimental results show more efficient and effective segmentation, with a higher recognition rate than existing methods for any context-based continuous Kannada speech signal with different male and female accents, while using minimal feature dimensions for the training data.</span></p>
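The segmentation stage thresholds two per-frame cues: average short-term energy and the spectral centroid. A minimal numpy sketch of computing both (the frame length, hop size, and toy signal are illustrative assumptions, not values from the paper):

```python
import numpy as np

def frame_features(signal, frame_len, hop, sr):
    """Average short-term energy and spectral centroid per frame --
    the two cues the segmentation stage thresholds."""
    energies, centroids = [], []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        energies.append(np.mean(frame ** 2))
        mag = np.abs(np.fft.rfft(frame))
        freqs = np.fft.rfftfreq(frame_len, d=1.0 / sr)
        centroids.append((freqs * mag).sum() / (mag.sum() + 1e-12))
    return np.array(energies), np.array(centroids)

# toy signal: half a second of silence followed by a 1 kHz tone
sr = 8000
t = np.arange(sr) / sr
sig = np.concatenate([np.zeros(sr // 2),
                      0.5 * np.sin(2 * np.pi * 1000 * t[: sr // 2])])
e, c = frame_features(sig, frame_len=256, hop=128, sr=sr)
```

Frames where both energy and centroid exceed their thresholds would be kept as speech; silence frames score near zero on both cues.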


Author(s):  
Wening Mustikarini ◽  
Risanuri Hidayat ◽  
Agus Bejo

Abstract — Automatic Speech Recognition (ASR) is a technology that uses machines to process and recognize human speech. One way to increase the recognition rate is to use a model of the language to be recognized. In this paper, a speech recognition application is introduced that recognizes the words "atas" (up), "bawah" (down), "kanan" (right), and "kiri" (left). This research used 400 speech samples: 75 samples of each word for training and 25 samples of each word for testing. The system was designed using 13 Mel Frequency Cepstral Coefficients (MFCC) as features and a Support Vector Machine (SVM) as the classifier. The system was tested with linear and RBF kernels, various cost values, and three sample sizes (n = 25, 50, 75). The best average accuracy was obtained with an SVM using a linear kernel, a cost value of 100, and a data set consisting of 75 samples per class. During the training phase, the system achieved an f1-score (the trade-off between precision and recall) of 80% for the word "atas", 86% for "bawah", 81% for "kanan", and 100% for "kiri". Using 25 new samples per class in the testing phase, the f1-score was 76% for the "atas" class, 54% for "bawah", 44% for "kanan", and 100% for "kiri".
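The f1-score reported per class is the harmonic mean of precision and recall. A small sketch of the computation; the confusion counts below are hypothetical numbers chosen only to illustrate how a 76% score could arise, not the paper's actual counts:

```python
def f1_score(tp, fp, fn):
    """f1 as the harmonic mean (trade-off) of precision and recall,
    computed from true positives, false positives, false negatives."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# hypothetical: 19 of 25 test samples of one class correctly accepted,
# with 6 false alarms from other classes
score = f1_score(tp=19, fp=6, fn=6)   # harmonic mean of 0.76 and 0.76
```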


2002 ◽  
Vol 10 (3) ◽  
pp. 221-239 ◽  
Author(s):  
Waleed H. Abdulla

The goal of the speech segment extraction process is to separate acoustic events of interest (the speech segment to be recognised) in a continuously recorded signal from the other parts of the signal (background). The recognition rate of many voice command systems depends heavily on the accuracy of speech segment extraction. This paper discusses two novel HMM-based techniques that segregate a speech segment from its concurrent background. The first method can be used reliably in clean environments, while the second, which makes use of wavelet denoising, is effective in noisy environments. These methods have been implemented and have shown superiority over other popular techniques, indicating that they have the potential to achieve higher speech recognition rates.
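The second method relies on wavelet denoising. A minimal sketch of the basic step, one-level Haar decomposition with soft-thresholded detail coefficients, implemented directly in numpy; the single decomposition level, the Haar basis, and the threshold value are simplifying assumptions, since the paper does not fix these choices here:

```python
import numpy as np

def haar_denoise(x, threshold):
    """One-level Haar wavelet decomposition of an even-length signal,
    soft-thresholding of the detail coefficients, then inverse
    transform -- the elementary wavelet-denoising step."""
    a = (x[0::2] + x[1::2]) / np.sqrt(2)                      # approximation
    d = (x[0::2] - x[1::2]) / np.sqrt(2)                      # detail
    d = np.sign(d) * np.maximum(np.abs(d) - threshold, 0.0)   # soft threshold
    y = np.empty_like(x)
    y[0::2] = (a + d) / np.sqrt(2)                            # inverse transform
    y[1::2] = (a - d) / np.sqrt(2)
    return y

rng = np.random.default_rng(0)
clean = np.sin(np.linspace(0, 2 * np.pi, 64))
noisy = clean + 0.3 * rng.standard_normal(64)
denoised = haar_denoise(noisy, threshold=0.6)
```

With a zero threshold the transform reconstructs the input exactly; with a threshold near the noise level it suppresses mostly noise, since a smooth signal concentrates little energy in the detail band.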


2021 ◽  
Vol 11 (6) ◽  
pp. 2866
Author(s):  
Damheo Lee ◽  
Donghyun Kim ◽  
Seung Yun ◽  
Sanghun Kim

In this paper, we propose a new method for code-switching (CS) automatic speech recognition (ASR) in Korean. First, the phonetic variations of English words as pronounced by Korean speakers must be considered. Thus, we tried to find a unified pronunciation model based on phonetic knowledge and deep learning. Second, we extracted CS sentences semantically similar to the target domain and then applied language model (LM) adaptation to correct the bias toward Korean caused by the imbalanced training data. In this experiment, the training data were AI Hub (1033 h) in Korean and Librispeech (960 h) in English. As a result, compared to the baseline, the proposed method improved the error reduction rate (ERR) by up to 11.6% with phonetic variant modeling and by 17.3% when semantically similar sentences were applied to the LM adaptation. Considering only English words, the word correction rate improved by up to 24.2% compared to that of the baseline. The proposed method appears to be very effective for CS speech recognition.
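The error reduction rate (ERR) quoted here is, in the standard usage, the relative reduction of the baseline's error rate. A one-line sketch of that arithmetic; the WER figures below are hypothetical, chosen only so the result matches the reported 17.3% ERR:

```python
def error_reduction_rate(baseline_wer, new_wer):
    """Relative error reduction: the fraction of the baseline's
    word error rate removed by the new system, as a percentage."""
    return (baseline_wer - new_wer) / baseline_wer * 100.0

# hypothetical: a baseline at 20% WER reduced to 16.54% WER
# corresponds to a 17.3% ERR
err = error_reduction_rate(20.0, 16.54)
```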


Sensors ◽  
2021 ◽  
Vol 21 (14) ◽  
pp. 4638
Author(s):  
Bummo Koo ◽  
Jongman Kim ◽  
Yejin Nam ◽  
Youngho Kim

In this study, post-fall detection algorithms were evaluated across datasets according to the feature vectors (time-series and discrete data), classifiers (ANN and SVM), and four different processing conditions (normalization, equalization, an increased number of training data, and additional training with external data). Three-axis acceleration and angular velocity data were obtained from 30 healthy male subjects by attaching an IMU at the midpoint between the left and right anterior superior iliac spines (ASIS). Internal and external tests were performed using our lab dataset and the SisFall public dataset, respectively. The results showed that the ANN and SVM were suitable for the time-series and discrete data, respectively. Classification performance generally decreased when untrained motions from the public dataset were tested, so specific feature vectors derived from the raw data were necessary. Normalization made the SVM more effective but the ANN less effective. Equalization increased the sensitivity, even though it did not improve overall performance. Increasing the number of training data also improved classification performance. Machine learning was vulnerable to untrained motions, and data covering various movements were needed for training.
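One of the processing conditions compared is normalization. A common pitfall in cross-dataset evaluation is leaking test statistics into the transform; a minimal sketch of z-score normalization fit on the training set only and applied unchanged to the external test set (the z-score scheme and the toy feature dimensions are assumptions; the study's exact normalization is not specified here):

```python
import numpy as np

def zscore_normalize(train, test):
    """Normalize features with statistics from the training set only,
    then apply the same transform to the (external) test set."""
    mu = train.mean(axis=0)
    sigma = train.std(axis=0) + 1e-12      # avoid division by zero
    return (train - mu) / sigma, (test - mu) / sigma

rng = np.random.default_rng(1)
train = rng.normal(loc=5.0, scale=2.0, size=(100, 6))   # e.g. IMU feature rows
test = rng.normal(loc=5.0, scale=2.0, size=(20, 6))
train_n, test_n = zscore_normalize(train, test)
```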


2011 ◽  
Vol 189-193 ◽  
pp. 2042-2045 ◽  
Author(s):  
Shang Jen Chuang ◽  
Chiung Hsing Chen ◽  
Chien Chih Kao ◽  
Fang Tsung Liu

English letters cannot be recognized by a Hopfield Neural Network if the input contains more than 50% noise. This paper proposes a new method to improve the recognition rate of the Hopfield Neural Network by adding a Gaussian distribution feature: a Gaussian filter is applied to eliminate noise before recognition. We use the English letters 'A' to 'Z' as training data, and generate test data with noise levels from 0% to 100% at random. The Gaussian filter is first used to eliminate noise, and the test pattern is then recognized by the Hopfield Neural Network. The results show that letters containing between 50% and 53% noise exhibit a reverse phenomenon or cannot be recognized [6]. In this paper, we propose using multiple filters to improve the recognition rate when letters contain between 50% and 53% noise.
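The recall step of a Hopfield network can be sketched in a few lines: Hebbian training stores bipolar patterns in a weight matrix, and repeated sign updates pull a noisy input toward the nearest stored attractor. The toy 16-bit patterns below are illustrative stand-ins for the letter images; they also hint at why heavy noise fails, since flipping more than half the bits moves the state closer to the inverted attractor (the "reverse phenomenon" above):

```python
import numpy as np

def train_hopfield(patterns):
    """Hebbian weight matrix for bipolar (+1/-1) patterns, zero diagonal."""
    w = sum(np.outer(p, p) for p in patterns).astype(float)
    np.fill_diagonal(w, 0.0)
    return w

def recall(w, x, steps=5):
    """Synchronous sign updates until the state stops changing."""
    for _ in range(steps):
        nxt = np.sign(w @ x)
        nxt[nxt == 0] = 1          # break ties toward +1
        if np.array_equal(nxt, x):
            break
        x = nxt
    return x

# two orthogonal 16-bit patterns; corrupt one with 2 flipped bits
p1 = np.ones(16)
p2 = np.tile([1.0, -1.0], 8)
w = train_hopfield([p1, p2])
noisy = p1.copy()
noisy[:2] *= -1                    # 12.5% noise
restored = recall(w, noisy)
```

In the paper's pipeline, the Gaussian filtering happens before this recall step, so the network sees a partially cleaned pattern.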


2014 ◽  
Vol 571-572 ◽  
pp. 665-671 ◽  
Author(s):  
Sen Xu ◽  
Xu Zhao ◽  
Cheng Hua Duan ◽  
Xiao Lin Cao ◽  
Hui Yan Li ◽  
...  

Unlike in other languages, the tone changes of Chinese are mainly determined by its vowels, so the vowel variation of Chinese tones is important in speech recognition research. Conventional tone recognition methods are based on the fundamental frequency of the signal, which cannot preserve the integrity of the tone signal. We propose mathematical morphological processing of spectrograms for the tones of Chinese vowels. First, the recorded tone signals are pre-processed using the CoolEdit Pro software and converted into spectrograms; second, the spectrograms are smoothed and normalized by mathematical morphological processing; finally, the overall direction-angle statistics of the tone signal are obtained by skeletonization. Neural network simulations show that the speech emotion recognition rate can reach 92.50%.
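The smoothing and skeletonization steps above are built from elementary binary morphology. A minimal numpy sketch of dilation and erosion on a tiny binary "spectrogram", combined into an opening that removes an isolated noise pixel; the image, structuring element, and the opening example are illustrative, not the paper's actual pipeline:

```python
import numpy as np

def dilate(img, se):
    """Binary dilation of img by structuring element se (0/1 arrays):
    a pixel turns on if se overlaps any foreground pixel there."""
    h, w = se.shape
    ph, pw = h // 2, w // 2
    padded = np.pad(img, ((ph, ph), (pw, pw)))
    out = np.zeros_like(img)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = 1 if (padded[i:i + h, j:j + w] & se).any() else 0
    return out

def erode(img, se):
    """Binary erosion: a pixel survives only if se fits entirely
    inside the foreground around it."""
    h, w = se.shape
    ph, pw = h // 2, w // 2
    padded = np.pad(img, ((ph, ph), (pw, pw)))
    out = np.zeros_like(img)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = 1 if (padded[i:i + h, j:j + w] & se).sum() == se.sum() else 0
    return out

# opening (erosion then dilation) smooths a small binary "spectrogram"
img = np.zeros((7, 7), dtype=int)
img[2:5, 2:5] = 1          # a 3x3 blob of tone energy
img[0, 0] = 1              # an isolated noise pixel
se = np.ones((3, 3), dtype=int)
opened = dilate(erode(img, se), se)
```

Opening with a 3x3 element keeps the blob but deletes the lone pixel; skeletonization is typically built from repeated erosions of this kind.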

