Robust Perceptual Wavelet Packet Features for Recognition of Continuous Kannada Speech

Author(s):  
Mahadeva Swamy ◽  
D J Ravi

Abstract An automatic speech recognition (ASR) system is built for continuous Kannada speech recognition. The acoustic and language models are created with the Kaldi toolkit. The speech database is collected from native male and female Kannada speakers. Of the collected speech data, 75% is used for training the acoustic models and 25% for testing the system. System performance is reported in terms of Word Error Rate (WER). Features are extracted using Wavelet Packet Decomposition together with a Mel filter bank. The proposed features perform slightly better than conventional features such as MFCC and PLP in terms of word recognition accuracy (WRA) and WER under uncontrolled conditions. For the speech corpus collected in the Kannada language, the proposed features show an improvement in WRA of 1.79% over the baseline features.
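
The WER metric named in the abstract can be illustrated with a short sketch. The abstract does not give the formula, so this assumes the usual definition: the number of word substitutions, deletions and insertions needed to turn the reference into the hypothesis, divided by the number of reference words. The function name and example sentences are hypothetical, and this is not the Kaldi scoring script.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length (assumed WER definition)."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical transcripts: one substitution and one deletion over four words.
    print(word_error_rate("a b c d", "a x c"))
```

Because insertions are counted, a WER computed this way can exceed 1.0 when the hypothesis is much longer than the reference.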

2014 ◽  
Vol 1070-1072 ◽  
pp. 1941-1944
Author(s):  
Yong Hao Liao ◽  
Bo Liu

To improve the classification and diagnostic accuracy of centrifugal fan signals, this paper proposes a new method for extracting features from centrifugal fan vibration fault signals based on the manifold learning method (MLM), a kind of data dimensionality reduction method. The MLM retains the nonlinear information of the original signal and improves fault classification and diagnosis more than traditional dimensionality reduction methods. The results show that the MLM effectively extracts fault feature information from centrifugal fan vibration and that the feature information of different fault types is separated effectively into its own regions. The diagnostic accuracy obtained with features extracted by the MLM is significantly higher than that obtained with the wavelet packet analysis method.


Author(s):  
Feng Chen ◽  
Jian Yang ◽  
Lixuan Zhao

English as a second language is widely used in countries such as Malaysia and Indonesia, and it is common for English words to appear in Malay and Indonesian sentences. Malay and Indonesian have high homology and relatively few electronic language resources. We combine the corpus datasets of these two similar languages to design and implement an HMM–DNN-based cross-lingual speech synthesis system for Malay (including English words) and Indonesian (including English words). The methods used include: sharing synthesis units between Malay, Indonesian, and English; designing unified context attributes and a question set in the process of acoustic model training; speaker-adaptive training with the speech corpora of these three languages; and synthesizing speech using speaker-dependent Malay and Indonesian acoustic models. Experimental results show that the speech synthesis quality of the system is better than that of the traditional Hidden Markov model-based cross-lingual speech synthesis system.


Speech classification is one of the challenging problems in speech processing. In this paper, we perform speech classification for the Kannada language. We gathered a speech database from children aged 4-6 years. The collected data are pre-processed, and speech features are extracted using the Mel Frequency Cepstral Coefficients (MFCC) technique. After feature extraction, Kannada alphabets are classified using six different Machine Learning (ML) classifiers, and the classifier accuracies are compared with each other. Among the Deep Learning classifiers, the Recursive Neural Network (RNN) gave the highest accuracy, around 93.6% (for 300 epochs); among the Machine Learning classifiers, Random Forest (RF) gave the highest accuracy, around 88.9%.
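
As a small illustration of the mel scale that underlies MFCC features, the sketch below converts between hertz and mels and places filter-bank band edges evenly on the mel axis. It assumes the commonly used formula mel = 2595 log10(1 + f/700); the abstract does not say which variant, frequency range or number of filters was used, so the function names and the values in the example are hypothetical.

```python
import math


def hz_to_mel(freq_hz: float) -> float:
    """Convert hertz to mels using the 2595 * log10(1 + f / 700) variant (an assumption)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_edges(f_min: float, f_max: float, n_filters: int) -> list:
    """Return n_filters + 2 edge frequencies in Hz, equally spaced on the mel axis.

    Triangular filter k would span edges[k] .. edges[k + 2] with its peak at edges[k + 1].
    """
    low, high = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (high - low) / (n_filters + 1)
    return [mel_to_hz(low + i * step) for i in range(n_filters + 2)]


if __name__ == "__main__":
    # Hypothetical setup: 10 filters covering 0 to 8000 Hz.
    for edge in mel_band_edges(0.0, 8000.0, 10):
        print(round(edge, 1))
```

The edges are close together at low frequencies and spread out at high frequencies, which is the perceptual warping that mel-based features rely on.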


Author(s):  
Danny Henry Galatang ◽  
Suyanto Suyanto ◽  

Syllable-based automatic speech recognition (ASR) systems commonly perform better than phoneme-based ones. This paper focuses on developing an Indonesian monosyllable-based ASR (MSASR) system using an ASR engine called SPRAAK and comparing it to a phoneme-based one. The Mozilla DeepSpeech-based end-to-end ASR (MDS-E2EASR), one of the state-of-the-art character-based models (similar to the phoneme-based model), is also investigated to confirm the result. In addition, a novel Kaituoxu SpeechTransformer (KST) E2EASR is examined. Testing on an Indonesian speech corpus of 5,439 words shows that the proposed MSASR produces much higher word accuracy (76.57%) than the monophone-based one (63.36%). Its performance is comparable to the character-based MDS-E2EASR, which produces 76.90%, and the character-based KST-E2EASR (78.00%). In the future, this monosyllable-based ASR could be extended to a bisyllable-based one to give higher word accuracy. However, the large set of bisyllable acoustic models would have to be handled using an advanced method.


2020 ◽  
Vol 14 (4) ◽  
pp. 445-453
Author(s):  
Qian Fan ◽  
Yiqun Zhu

Abstract To solve the problem that the moving span of the basic local mean decomposition (LMD) method is difficult to choose reasonably, an improved LMD method (ILMD), which uses cubic spline interpolation in place of the sliding average, is proposed. On this basis, with the help of noise-aided calculation, an ensemble improved LMD method (EILMD) is proposed to effectively solve the modal aliasing problem in the original LMD. Using EILMD to decompose GNSS deformation monitoring series, a GNSS deformation feature extraction model based on EILMD threshold denoising is given, drawing on wavelet soft threshold processing and the threshold setting method used in empirical mode decomposition denoising. Analysis of simulated data and actual GNSS monitoring data from a mining area shows that the denoising effect of the proposed method is better than that of the EILMD, ILMD and LMD direct coercive denoising methods. It is also better than the wavelet analysis denoising method and has good adaptability. This demonstrates the feasibility and effectiveness of the proposed method in GNSS feature extraction.
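
The soft threshold processing mentioned in the abstract can be sketched as follows. This shows only the generic soft-threshold rule, in which each coefficient is shrunk toward zero by the threshold and anything smaller than the threshold becomes zero. How the paper chooses the threshold and which EILMD components it applies it to are not given in the abstract, so the function name, threshold value and sample values are hypothetical.

```python
def soft_threshold(coeffs, threshold):
    """Apply the soft-threshold rule sign(c) * max(|c| - threshold, 0) to each value."""
    if threshold < 0:
        raise ValueError("threshold must be non-negative")
    shrunk = []
    for c in coeffs:
        magnitude = abs(c) - threshold
        if magnitude <= 0:
            shrunk.append(0.0)  # small values are treated as noise and zeroed
        else:
            shrunk.append(magnitude if c > 0 else -magnitude)  # keep sign, reduce size
    return shrunk


if __name__ == "__main__":
    # Hypothetical component samples and threshold.
    print(soft_threshold([3.0, -0.5, -2.0, 1.0], 1.0))
```

Unlike hard thresholding, which keeps surviving values unchanged, this rule is continuous at the threshold, which avoids introducing abrupt jumps into the reconstructed series.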


2020 ◽  
Vol 13 (3) ◽  
pp. 365-388
Author(s):  
Asha Sukumaran ◽  
Thomas Brindha

Purpose: Humans are able to recognize others by their uniqueness as well as by demographic characteristics such as ethnicity (or race), gender and age. Over the decades, many researchers in the psychological, biological and cognitive sciences have explored how the human brain characterizes, perceives and memorizes faces. Moreover, certain computational advances have been developed to gain insight into this issue.
Design/methodology/approach: This paper proposes a new race detection model using face shape features. The proposed model includes two key phases: (a) feature extraction and (b) detection. Feature extraction is the initial stage, in which face color and shape based features are extracted. Specifically, maximally stable extremal regions (MSER) and speeded-up robust features (SURF) are extracted as shape features, and dense color features are extracted as color features. Since the extracted features are high-dimensional, they are reduced using principal component analysis (PCA), which is the strongest model for solving the “curse of dimensionality”. The dimensionally reduced features are then fed to a deep belief neural network (DBN), which detects the race. Further, to make the proposed framework more effective with respect to prediction, the weights of the DBN are fine-tuned with a new hybrid algorithm referred to as the lion mutated and updated dragon algorithm (LMUDA), which is a conceptual hybridization of the lion algorithm (LA) and the dragonfly algorithm (DA).
Findings: The performance of the proposed work is compared with other state-of-the-art models in terms of accuracy and error performance. LMUDA attains high accuracy at the 100th iteration with 90% training, which is 11.1, 8.8, 5.5 and 3.3% better than the performance when the learning percentage (LP) is 50%, 60%, 70% and 80%, respectively. More particularly, the performance of the proposed DBN + LMUDA is 22.2, 12.5 and 33.3% better than the traditional classifiers DCNN, DBN and LDA, respectively.
Originality/value: This paper achieves the objective of detecting human races from faces. In particular, MSER and SURF features are extracted as shape features, and dense color features are extracted as color features. As a novelty, to make race detection more accurate, the weights of the DBN are fine-tuned with a new hybrid algorithm referred to as LMUDA, which is a conceptual hybridization of LA and DA.


2012 ◽  
Vol 572 ◽  
pp. 25-30
Author(s):  
Li Jing Han ◽  
Jian Hong Yang ◽  
Min Lin ◽  
Jin Wu Xu

Hot strip tail flick is an abnormal production phenomenon that causes considerable damage. To recognize tail flick signals among all throwing steel strip signals, a feature extraction method based on the morphological pattern spectrum is proposed in this paper. The area between the signal curve after a multiscale opening operation and the horizontal axis is computed as the pattern spectrum value, and it reflects differences in geometric information. A support vector machine is then used as the classifier. Experimental results show that the total correct rate based on the pattern spectrum feature reached 96.5%, compared with 92.1% for the wavelet packet energy feature. Thus, the feasibility and effectiveness of the new feature extraction method are verified.
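
The area-after-opening feature described in the abstract can be sketched for a one-dimensional signal as below. This is a minimal illustration under several assumptions that the abstract does not specify: a flat structuring element of length 2k + 1 at scale k, unit sample spacing, a non-negative signal, and windows truncated at the signal borders. All function names and the sample signal are hypothetical.

```python
def erode(signal, half_width):
    """Flat erosion: sliding minimum over a window of 2 * half_width + 1 samples."""
    n = len(signal)
    return [min(signal[max(0, i - half_width): min(n, i + half_width + 1)])
            for i in range(n)]


def dilate(signal, half_width):
    """Flat dilation: sliding maximum over a window of 2 * half_width + 1 samples."""
    n = len(signal)
    return [max(signal[max(0, i - half_width): min(n, i + half_width + 1)])
            for i in range(n)]


def opening(signal, half_width):
    """Morphological opening: erosion followed by dilation with the same element."""
    return dilate(erode(signal, half_width), half_width)


def opening_areas(signal, max_scale):
    """Area under the opened signal at scales 0..max_scale (unit sample spacing assumed)."""
    return [sum(opening(signal, k)) for k in range(max_scale + 1)]


if __name__ == "__main__":
    # Hypothetical signal: a narrow spike (width 1) and a broader plateau (width 3).
    samples = [0, 0, 5, 0, 0, 2, 2, 2, 0, 0]
    print(opening_areas(samples, 2))
```

In this toy signal the spike disappears at scale 1 while the plateau survives until scale 2, so the sequence of areas encodes the widths of the shapes present. That is the kind of geometric difference the feature is meant to capture before classification.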

