SPEECH RECOGNITION OF KV-PATTERNED INDONESIAN SYLLABLE USING MFCC, WAVELET AND HMM

Kursor ◽  
2016 ◽  
Vol 8 (2) ◽  
pp. 67 ◽  
Author(s):  
Syahroni Hidayat

The Indonesian language is an agglutinative language with complex affixes attached to its roots. For this reason, there is a high possibility of recognizing Indonesian speech based on its syllables. Syllable-based Indonesian speech recognition could reduce the database size and recognize new Indonesian vocabulary that evolves as the language develops. MFCC and wavelet packet transform (WPT) with 3rd-order (DB3) and 7th-order (DB7) Daubechies wavelets are used in the feature extraction process, and an HMM with Euclidean distance probability is applied for classification. The results show that the best recognition rates are 75% and 70.8% for the MFCC and WPT methods respectively, obtained from testing with the training data. Meanwhile, for testing with external data, the WPT method outperforms the MFCC method, with best recognition rates of 53.1% for WPT and 47% for MFCC. For MFCC, accuracy increased as the data length and frame length increased. For WPT, the increase in accuracy is influenced by the data length, the wavelet type, and the decomposition level. It was also found that as the number of states increased, the recognition rate of both methods decreased.
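The abstract pairs MFCC/WPT features with an HMM whose observation probabilities are derived from Euclidean distance rather than the usual Gaussian densities. A minimal sketch of such a distance-based observation score follows; the exponential mapping, the centroid values, and the 2-D feature size are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def euclidean_observation_prob(frame, state_centroids):
    """Pseudo-likelihood of one feature frame under each HMM state,
    derived from the Euclidean distance to the state's reference
    centroid: closer frames score higher."""
    d = np.linalg.norm(state_centroids - frame, axis=1)
    scores = np.exp(-d)              # monotone map: small distance -> large score
    return scores / scores.sum()     # normalize to a probability-like vector

# toy example: two states with 2-D MFCC-like centroids
centroids = np.array([[0.0, 0.0], [3.0, 3.0]])
p = euclidean_observation_prob(np.array([0.1, -0.2]), centroids)
```

In a full recognizer these per-frame scores would replace the Gaussian emission term inside the Viterbi recursion.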

Author(s):  
Vanajakshi Puttaswamy Gowda ◽  
Mathivanan Murugavelu ◽  
Senthil Kumaran Thangamuthu

<p><span>Continuous speech segmentation and recognition play an important role in natural language processing. Context-based continuous Kannada speech segmentation depends on the context, grammar, and semantic rules of the Kannada language. Extracting significant features of the Kannada speech signal for a recognition system is quite exciting for researchers. The method proposed in this paper is divided into two parts. In the first part, context-based segmentation of the continuous Kannada speech signal is carried out by computing the average short-term energy and the spectral centroid coefficients of the speech signal within a specified window. The segmented outputs are completely meaningful for different scenarios, with low segmentation error. The second part is speech recognition, performed by extracting a small number of Mel-frequency cepstral coefficients and using vector quantization with a small number of codebooks. Recognition is based entirely on a threshold value. Setting this threshold is a challenging task; however, a simple method is used to achieve a better recognition rate. The experimental results show more efficient and effective segmentation, with a higher recognition rate than existing methods for any context-based continuous Kannada speech signal with different male and female accents, while using minimal feature dimensions for the training data.</span></p>
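The segmentation stage thresholds two per-frame cues: average short-term energy and the spectral centroid. A minimal numpy sketch of computing both (the frame length, hop size, and toy signal are illustrative assumptions, not values from the paper):

```python
import numpy as np

def frame_features(signal, frame_len, hop, sr):
    """Average short-term energy and spectral centroid per frame --
    the two cues the segmentation stage thresholds."""
    energies, centroids = [], []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        energies.append(np.mean(frame ** 2))
        mag = np.abs(np.fft.rfft(frame))
        freqs = np.fft.rfftfreq(frame_len, d=1.0 / sr)
        centroids.append((freqs * mag).sum() / (mag.sum() + 1e-12))
    return np.array(energies), np.array(centroids)

# toy signal: half a second of silence followed by a 1 kHz tone
sr = 8000
t = np.arange(sr) / sr
sig = np.concatenate([np.zeros(sr // 2),
                      0.5 * np.sin(2 * np.pi * 1000 * t[: sr // 2])])
e, c = frame_features(sig, frame_len=256, hop=128, sr=sr)
```

Frames where both energy and centroid exceed their thresholds would be kept as speech; silence frames score near zero on both cues.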


Author(s):  
Wening Mustikarini ◽  
Risanuri Hidayat ◽  
Agus Bejo

Abstract — Automatic Speech Recognition (ASR) is a technology that uses machines to process and recognize human speech. One way to increase the recognition rate is to use a model of the language to be recognized. In this paper, a speech recognition application is introduced that recognizes the words "atas" (up), "bawah" (down), "kanan" (right), and "kiri" (left). This research used 400 speech samples: 75 samples of each word for training and 25 samples of each word for testing. The system was designed using 13 Mel Frequency Cepstral Coefficients (MFCC) as features and a Support Vector Machine (SVM) as the classifier. The system was tested with linear and RBF kernels, various cost values, and three sample sizes (n = 25, 50, 75). The best average accuracy was obtained with an SVM using a linear kernel, a cost value of 100, and a data set consisting of 75 samples per class. During the training phase, the system achieved an f1-score (the trade-off between precision and recall) of 80% for the word "atas", 86% for "bawah", 81% for "kanan", and 100% for "kiri". Using 25 new samples per class in the testing phase, the f1-score was 76% for the "atas" class, 54% for "bawah", 44% for "kanan", and 100% for "kiri".
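The f1-score reported per class is the harmonic mean of precision and recall. A small sketch of the computation; the confusion counts below are hypothetical numbers chosen only to illustrate how a 76% score could arise, not the paper's actual counts:

```python
def f1_score(tp, fp, fn):
    """f1 as the harmonic mean (trade-off) of precision and recall,
    computed from true positives, false positives, false negatives."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# hypothetical: 19 of 25 test samples of one class correctly accepted,
# with 6 false alarms from other classes
score = f1_score(tp=19, fp=6, fn=6)   # harmonic mean of 0.76 and 0.76
```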


2002 ◽  
Vol 10 (3) ◽  
pp. 221-239 ◽  
Author(s):  
Waleed H. Abdulla

The goal of the speech segment extraction process is to separate acoustic events of interest (the speech segment to be recognised) in a continuously recorded signal from the other parts of the signal (background). The recognition rate of many voice command systems depends heavily on the accuracy of speech segment extraction. This paper discusses two novel HMM-based techniques that segregate a speech segment from its concurrent background. The first method can be used reliably in clean environments, while the second, which makes use of wavelet denoising, is effective in noisy environments. These methods have been implemented and have shown superiority over other popular techniques, indicating that they have the potential to achieve higher speech recognition rates.
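The second method relies on wavelet denoising. A minimal sketch of the basic step, one-level Haar decomposition with soft-thresholded detail coefficients, implemented directly in numpy; the single decomposition level, the Haar basis, and the threshold value are simplifying assumptions, since the paper does not fix these choices here:

```python
import numpy as np

def haar_denoise(x, threshold):
    """One-level Haar wavelet decomposition of an even-length signal,
    soft-thresholding of the detail coefficients, then inverse
    transform -- the elementary wavelet-denoising step."""
    a = (x[0::2] + x[1::2]) / np.sqrt(2)                      # approximation
    d = (x[0::2] - x[1::2]) / np.sqrt(2)                      # detail
    d = np.sign(d) * np.maximum(np.abs(d) - threshold, 0.0)   # soft threshold
    y = np.empty_like(x)
    y[0::2] = (a + d) / np.sqrt(2)                            # inverse transform
    y[1::2] = (a - d) / np.sqrt(2)
    return y

rng = np.random.default_rng(0)
clean = np.sin(np.linspace(0, 2 * np.pi, 64))
noisy = clean + 0.3 * rng.standard_normal(64)
denoised = haar_denoise(noisy, threshold=0.6)
```

With a zero threshold the transform reconstructs the input exactly; with a threshold near the noise level it suppresses mostly noise, since a smooth signal concentrates little energy in the detail band.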


2021 ◽  
Vol 11 (6) ◽  
pp. 2866
Author(s):  
Damheo Lee ◽  
Donghyun Kim ◽  
Seung Yun ◽  
Sanghun Kim

In this paper, we propose a new method for code-switching (CS) automatic speech recognition (ASR) in Korean. First, the phonetic variations of English words as pronounced by Korean speakers must be considered. Thus, we tried to find a unified pronunciation model based on phonetic knowledge and deep learning. Second, we extracted CS sentences semantically similar to the target domain and then applied language model (LM) adaptation to correct the bias toward Korean caused by the imbalanced training data. In this experiment, the training data were AI Hub (1033 h) in Korean and Librispeech (960 h) in English. As a result, compared to the baseline, the proposed method improved the error reduction rate (ERR) by up to 11.6% with phonetic variant modeling and by 17.3% when semantically similar sentences were applied to the LM adaptation. Considering only English words, the word correction rate improved by up to 24.2% compared to that of the baseline. The proposed method appears to be very effective for CS speech recognition.
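The error reduction rate (ERR) quoted here is, in the standard usage, the relative reduction of the baseline's error rate. A one-line sketch of that arithmetic; the WER figures below are hypothetical, chosen only so the result matches the reported 17.3% ERR:

```python
def error_reduction_rate(baseline_wer, new_wer):
    """Relative error reduction: the fraction of the baseline's
    word error rate removed by the new system, as a percentage."""
    return (baseline_wer - new_wer) / baseline_wer * 100.0

# hypothetical: a baseline at 20% WER reduced to 16.54% WER
# corresponds to a 17.3% ERR
err = error_reduction_rate(20.0, 16.54)
```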


Sensors ◽  
2021 ◽  
Vol 21 (14) ◽  
pp. 4638
Author(s):  
Bummo Koo ◽  
Jongman Kim ◽  
Yejin Nam ◽  
Youngho Kim

In this study, post-fall detection algorithms were evaluated across datasets according to the feature vectors (time-series and discrete data), classifiers (ANN and SVM), and four different processing conditions (normalization, equalization, an increased number of training data, and additional training with external data). Three-axis acceleration and angular velocity data were obtained from 30 healthy male subjects by attaching an IMU at the midpoint between the left and right anterior superior iliac spines (ASIS). Internal and external tests were performed using our lab dataset and the SisFall public dataset, respectively. The results showed that the ANN and SVM were suitable for the time-series and discrete data, respectively. Classification performance generally decreased when untrained motions from the public dataset were tested, so specific feature vectors derived from the raw data were necessary. Normalization made the SVM more effective but the ANN less effective. Equalization increased the sensitivity, even though it did not improve overall performance. Increasing the number of training data also improved classification performance. Machine learning was vulnerable to untrained motions, and data covering various movements were needed for training.
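One of the processing conditions compared is normalization. A common pitfall in cross-dataset evaluation is leaking test statistics into the transform; a minimal sketch of z-score normalization fit on the training set only and applied unchanged to the external test set (the z-score scheme and the toy feature dimensions are assumptions; the study's exact normalization is not specified here):

```python
import numpy as np

def zscore_normalize(train, test):
    """Normalize features with statistics from the training set only,
    then apply the same transform to the (external) test set."""
    mu = train.mean(axis=0)
    sigma = train.std(axis=0) + 1e-12      # avoid division by zero
    return (train - mu) / sigma, (test - mu) / sigma

rng = np.random.default_rng(1)
train = rng.normal(loc=5.0, scale=2.0, size=(100, 6))   # e.g. IMU feature rows
test = rng.normal(loc=5.0, scale=2.0, size=(20, 6))
train_n, test_n = zscore_normalize(train, test)
```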


2011 ◽  
Vol 189-193 ◽  
pp. 2042-2045 ◽  
Author(s):  
Shang Jen Chuang ◽  
Chiung Hsing Chen ◽  
Chien Chih Kao ◽  
Fang Tsung Liu

English letters cannot be recognized by a Hopfield Neural Network if the input contains more than 50% noise. This paper proposes a new method to improve the recognition rate of the Hopfield Neural Network by adding a Gaussian distribution feature: a Gaussian filter is applied to eliminate noise before recognition. We use the English letters 'A' to 'Z' as training data, and generate test data with noise levels from 0% to 100% at random. The Gaussian filter is first used to eliminate noise, and the test pattern is then recognized by the Hopfield Neural Network. The results show that letters containing between 50% and 53% noise exhibit a reverse phenomenon or cannot be recognized [6]. In this paper, we propose using multiple filters to improve the recognition rate when letters contain between 50% and 53% noise.
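The recall step of a Hopfield network can be sketched in a few lines: Hebbian training stores bipolar patterns in a weight matrix, and repeated sign updates pull a noisy input toward the nearest stored attractor. The toy 16-bit patterns below are illustrative stand-ins for the letter images; they also hint at why heavy noise fails, since flipping more than half the bits moves the state closer to the inverted attractor (the "reverse phenomenon" above):

```python
import numpy as np

def train_hopfield(patterns):
    """Hebbian weight matrix for bipolar (+1/-1) patterns, zero diagonal."""
    w = sum(np.outer(p, p) for p in patterns).astype(float)
    np.fill_diagonal(w, 0.0)
    return w

def recall(w, x, steps=5):
    """Synchronous sign updates until the state stops changing."""
    for _ in range(steps):
        nxt = np.sign(w @ x)
        nxt[nxt == 0] = 1          # break ties toward +1
        if np.array_equal(nxt, x):
            break
        x = nxt
    return x

# two orthogonal 16-bit patterns; corrupt one with 2 flipped bits
p1 = np.ones(16)
p2 = np.tile([1.0, -1.0], 8)
w = train_hopfield([p1, p2])
noisy = p1.copy()
noisy[:2] *= -1                    # 12.5% noise
restored = recall(w, noisy)
```

In the paper's pipeline, the Gaussian filtering happens before this recall step, so the network sees a partially cleaned pattern.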


2014 ◽  
Vol 571-572 ◽  
pp. 665-671 ◽  
Author(s):  
Sen Xu ◽  
Xu Zhao ◽  
Cheng Hua Duan ◽  
Xiao Lin Cao ◽  
Hui Yan Li ◽  
...  

Unlike in other languages, the tone changes of Chinese are mainly determined by its vowels, so the vowel variation of Chinese tones is important in speech recognition research. Conventional tone recognition methods are based on the fundamental frequency of the signal, which cannot preserve the integrity of the tone signal. We propose mathematical morphological processing of spectrograms for the tones of Chinese vowels. First, the recorded tone signals are pre-processed using the CoolEdit Pro software and converted into spectrograms; second, the spectrograms are smoothed and normalized by mathematical morphological processing; finally, the overall direction-angle statistics of the tone signal are obtained by skeletonization. Neural network simulations show that the speech emotion recognition rate can reach 92.50%.
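The smoothing and skeletonization steps above are built from elementary binary morphology. A minimal numpy sketch of dilation and erosion on a tiny binary "spectrogram", combined into an opening that removes an isolated noise pixel; the image, structuring element, and the opening example are illustrative, not the paper's actual pipeline:

```python
import numpy as np

def dilate(img, se):
    """Binary dilation of img by structuring element se (0/1 arrays):
    a pixel turns on if se overlaps any foreground pixel there."""
    h, w = se.shape
    ph, pw = h // 2, w // 2
    padded = np.pad(img, ((ph, ph), (pw, pw)))
    out = np.zeros_like(img)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = 1 if (padded[i:i + h, j:j + w] & se).any() else 0
    return out

def erode(img, se):
    """Binary erosion: a pixel survives only if se fits entirely
    inside the foreground around it."""
    h, w = se.shape
    ph, pw = h // 2, w // 2
    padded = np.pad(img, ((ph, ph), (pw, pw)))
    out = np.zeros_like(img)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = 1 if (padded[i:i + h, j:j + w] & se).sum() == se.sum() else 0
    return out

# opening (erosion then dilation) smooths a small binary "spectrogram"
img = np.zeros((7, 7), dtype=int)
img[2:5, 2:5] = 1          # a 3x3 blob of tone energy
img[0, 0] = 1              # an isolated noise pixel
se = np.ones((3, 3), dtype=int)
opened = dilate(erode(img, se), se)
```

Opening with a 3x3 element keeps the blob but deletes the lone pixel; skeletonization is typically built from repeated erosions of this kind.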

