THE COMPARATIVE PERFORMANCE EVALUATION OF WINDOW FUNCTIONS UNDER NOISY ENVIRONMENT FOR SPEECH RECOGNITION

2016 ◽  
Vol 78 (5-7) ◽  
Author(s):  
Syifaun Nafisah ◽  
Oyas Wahyunggoro ◽  
Lukito Edi Nugroho

The accuracy and user acceptance of speech recognition systems have increased in the last few years, especially for automated identification and biomedical applications. In practice, such a system recognizes an utterance from features obtained through a feature extraction process. One step in feature extraction is windowing, which minimizes the discontinuities at the start and end of each frame. Many window functions exist, such as the rectangular, flat top, and Hamming windows, but in real applications usually only the Hamming or Hanning function is used. This article analyzes the performance of all of these window functions to establish how each performs. The method uses mel-frequency cepstral coefficients (MFCCs) as the feature extraction technique and back-propagation neural networks (BPNNs) as the classifier. The results show an accuracy of at least 99%. The best accuracy, 99.86%, is achieved using the rectangular window, with a processing time of 15.47 msec. These results show the superior performance of the rectangular window as a reference for recognizing isolated spoken words.
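As a rough illustration of the windowing step described in this abstract, the following Python sketch applies a rectangular, a Hamming, and a Hann (Hanning) window to a single frame. The 64-sample frame, the test tone, and all function names are assumptions made for this example; none of them come from the article, and the sketch does not reproduce the authors' MFCC/BPNN pipeline.

```python
import math

def rectangular(n):
    """Rectangular window: every sample keeps its original amplitude."""
    return [1.0] * n

def hamming(n):
    """Hamming window with the standard 0.54 / 0.46 coefficients (needs n > 1)."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]

def hann(n):
    """Hann (Hanning) window; tapers to zero at both frame edges (needs n > 1)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]

def apply_window(frame, window):
    """Multiply the frame by the window, sample by sample."""
    return [s * w for s, w in zip(frame, window)]

# Hypothetical frame: 64 samples of a sine tone (illustrative only).
frame = [math.sin(2 * math.pi * 5 * i / 64) for i in range(64)]

windowed = {
    name: apply_window(frame, make(len(frame)))
    for name, make in [("rectangular", rectangular),
                       ("hamming", hamming),
                       ("hann", hann)]
}
```

The rectangular window leaves the frame unchanged, while the Hamming and Hann windows attenuate the samples near the frame boundaries. That trade-off is what a comparison of window functions would be measuring.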

Author(s):  
Gurpreet Kaur ◽  
Mohit Srivastava ◽  
Amod Kumar

In command and control applications, the feature extraction process is very important for good accuracy and short learning time. To address these metrics, we propose an automated combined speaker and speech recognition technique. In this paper, five isolated words are recorded from four speakers, two male and two female. We use the Mel Frequency Cepstral Coefficient (MFCC) feature extraction method with a Genetic Algorithm to optimize the extracted features and generate an appropriate feature set. In the first phase, feature extraction using MFCC is executed, followed by feature optimization using the Genetic Algorithm; in the third and last phase, training is conducted using a Deep Neural Network. Finally, the proposed model is evaluated and validated in a real environment. To check the efficiency of the proposed work, we calculate parameters such as accuracy, precision rate, recall rate, sensitivity, and specificity.
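The evaluation parameters listed in this abstract can be computed from confusion-matrix counts. The sketch below assumes a binary (one-vs-rest) view of a single word class; the function name and the counts are hypothetical and are not results from the paper. Under the standard definitions, recall and sensitivity are the same quantity, so the sketch reports it once.

```python
def binary_metrics(tp, fp, tn, fn):
    """Standard metrics from true/false positive and negative counts."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),        # identical to sensitivity
        "specificity": tn / (tn + fp),
    }

# Hypothetical counts for one word class versus all other classes.
metrics = binary_metrics(tp=45, fp=5, tn=40, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

For a five-word task, one plausible approach is to compute these values per word and then average them; the abstract does not say how the authors aggregated them.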


2016 ◽  
Vol 13 (10) ◽  
pp. 6616-6627
Author(s):  
B Kanisha ◽  
G Balakrishnan

Speech recognition applications are emerging as fast-growing and efficient mechanisms in the high-tech world. A host of diverse interactive speech-aware applications is on the market. With the rapidly growing requirements of upcoming embedded platforms and the sharp increase in demand for embedded computing, it is essential that speech recognition systems (SRS) are put in place at the right time and in the proper form so that multimedia tasks can easily be performed on these platforms. In this work, the speech signal is first preprocessed to remove noise; it then enters feature extraction, where the peak signal frequency is compared with a standard signal and recognized. From the noise-free signal, Mel frequency cepstral coefficients (MFCC), tri-spectral features, and discrete wavelet transform (DWT) features are obtained. These features are given as input to a multi-class support vector machine (SVM), which converts the processed signal into text. A comparison with the existing feed-forward back-propagation network (FFBN) technique shows that the proposed technique performs better. The proposed technique is implemented in MATLAB.


Author(s):  
Wida Astuti ◽  
Danang Lenono ◽  
Faizah Faizah

Until now, pure tofu and formalin-treated tofu have been distinguished by color and aroma using human testers. However, human testers have weaknesses, such as subjectivity. In addition, standard chemical analytical methods are costly and require expertise. The aroma of tofu is determined by volatile compounds such as hexanal, ethanol, and 1-hexanol, while the aroma of formalin tofu is determined by volatile compounds such as OH, CO, and hydrocarbons. An electronic nose based on a non-selective gas sensor array can analyze samples with complex compositions, revealing their characteristics and supporting qualitative analysis. The electronic nose transforms the aroma stimulus into fingerprint data, which are then passed to a feature extraction process using the differential method. The extracted features are used to train a back-propagation neural network to obtain optimal parameters. The optimized parameters are then tested on random tofu samples. In the tests, the back-propagation network (ANN-BP) identified the samples with a 100% accuracy rate, so the identification of pure tofu and formalin tofu with an electronic nose using back-propagation neural network analysis was carried out successfully.


2020 ◽  
Vol 2 (2) ◽  
pp. 100-108
Author(s):  
Zaurarista Dyarbirru ◽  
Syahroni Hidayat

Voice is the sound emitted by living things. With the development of Automatic Speech Recognition (ASR) technology, voice can be used to make tasks easier for humans. In ASR, feature extraction plays an important role in the recognition process. The feature extraction methods commonly applied to ASR are MFCC and wavelets, each of which has advantages and disadvantages. This study therefore combines the wavelet feature extraction method with MFCC to maximize the existing advantages. The proposed method is called Wavelet-MFCC. Voice recognition method that does not use recommendations. System performance is determined using the Word Recognition Rate (WRR), validated with K-fold cross-validation using 5 folds. The research dataset consists of voice recordings of the digits 0-9 in English. The results show that the digit speech recognition system gives its highest average value, 63%, for digit 4 using the Daubechies DB3 wavelet and the dyadic wavelet transform method. Comparing the wavelet decomposition methods, the dyadic wavelet transform performs better than the wavelet packet.
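The evaluation protocol named in this abstract (WRR validated with 5-fold cross-validation) can be sketched as follows. This example assumes that WRR is the percentage of test words recognized correctly, which the abstract does not state explicitly; the function names and sample labels are hypothetical.

```python
def k_fold_indices(n_samples, k=5):
    """Yield (train, test) index lists for k contiguous, near-equal folds."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size

def word_recognition_rate(predicted, reference):
    """Percentage of words whose predicted label matches the reference label."""
    correct = sum(p == r for p, r in zip(predicted, reference))
    return 100.0 * correct / len(reference)

# Hypothetical usage: 10 recordings split into 5 folds of 2 test items each.
folds = list(k_fold_indices(10, k=5))
example_wrr = word_recognition_rate(["4", "2", "4"], ["4", "2", "5"])
```

In a full experiment, a recognizer would be trained on each `train` split and scored on the matching `test` split, and the five WRR values would be averaged. In practice the recordings are often shuffled before splitting.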


2012 ◽  
Vol 5 (4) ◽  
pp. 545-550 ◽  
Author(s):  
Aitzol Ezeiza ◽  
Karmele López de Ipiña ◽  
Carmen Hernández ◽  
Nora Barroso

2018 ◽  
Author(s):  
I Wayan Agus Surya Darma

Balinese character recognition is a technique for recognizing the features or patterns of Balinese characters. These features are generated through a feature extraction process. This research uses handwritten Balinese characters. Here, the feature extraction process generates semantic and direction features of the handwritten characters. Recognition uses the K-Nearest Neighbor algorithm to recognize 81 handwritten Balinese characters. The features of the test character images are compared with reference features. With K=3 and reference=10, the recognition system achieves a success rate of 97.53%.
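The comparison of test features against reference features with K=3 can be illustrated by a minimal K-Nearest Neighbor sketch. The Euclidean distance, the two-dimensional feature vectors, and the labels below are assumptions made for the example; the abstract does not specify the distance measure or the layout of the semantic and direction features.

```python
import math
from collections import Counter

def knn_predict(query, references, k=3):
    """Return the majority label among the k reference vectors nearest to query.

    references: list of (feature_vector, label) pairs.
    """
    nearest = sorted(references, key=lambda ref: math.dist(query, ref[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical reference features for two character classes.
references = [
    ([0.0, 0.0], "na"), ([0.1, 0.2], "na"), ([0.2, 0.1], "na"),
    ([1.0, 1.0], "ka"), ([0.9, 1.1], "ka"), ([1.1, 0.9], "ka"),
]

predicted = knn_predict([0.15, 0.10], references, k=3)
```

An odd K such as 3 avoids ties when only two classes compete; with many classes, ties remain possible and need an explicit tie-breaking rule.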


2019 ◽  
Vol 16 (4) ◽  
pp. 317-324
Author(s):  
Liang Kong ◽  
Lichao Zhang ◽  
Xiaodong Han ◽  
Jinfeng Lv

Protein structural class prediction is beneficial to protein structure and function analysis. Exploring good feature representation is a key step for this prediction task. Prior works have demonstrated the effectiveness of the secondary structure based feature extraction methods especially for low-similarity protein sequences. However, the prediction accuracies still remain limited. To explore the potential of secondary structure information, a novel feature extraction method based on a generalized chaos game representation of predicted secondary structure is proposed. Each protein sequence is converted into a 20-dimensional distance-related statistical feature vector to characterize the distribution of secondary structure elements and segments. The feature vectors are then fed into a support vector machine classifier to predict the protein structural class. Our experiments on three widely used low-similarity benchmark datasets (25PDB, 1189 and 640) show that the proposed method achieves superior performance to the state-of-the-art methods. It is anticipated that our method could be extended to other graphical representations of protein sequence and be helpful in future protein research.
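For readers unfamiliar with chaos game representations, the sketch below shows the classical construction on a three-letter secondary-structure alphabet (H for helix, E for strand, C for coil). It is not the generalized representation proposed in the paper, and it does not compute the 20-dimensional feature vector; the vertex coordinates, the starting point, and the function name are assumptions chosen for illustration.

```python
# Hypothetical triangle vertices, one per secondary-structure state.
VERTICES = {"H": (0.0, 0.0), "E": (1.0, 0.0), "C": (0.5, 1.0)}

def chaos_game_points(sequence, start=(0.5, 0.5)):
    """Map a secondary-structure string to a list of 2-D points.

    Each symbol moves the current point halfway toward that symbol's vertex.
    """
    x, y = start
    points = []
    for symbol in sequence:
        vx, vy = VERTICES[symbol]
        x, y = (x + vx) / 2.0, (y + vy) / 2.0
        points.append((x, y))
    return points

# Hypothetical predicted secondary structure.
points = chaos_game_points("HHEEC")
```

Statistical features, such as distances between the resulting points, could then be derived from `points`; which statistics the authors use is not specified in the abstract.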


2020 ◽  
Vol 20 (S12) ◽  
Author(s):  
Juan C. Mier ◽  
Yejin Kim ◽  
Xiaoqian Jiang ◽  
Guo-Qiang Zhang ◽  
Samden Lhatoo

Abstract. Background: Awareness of Sudden Unexpected Death in Epilepsy (SUDEP) has increased considerably over the last two decades, and it is acknowledged as a serious problem in epilepsy. However, the scientific community remains unclear on the reason or possible biomarkers that can discern potentially fatal seizures from other non-fatal seizures. The duration of postictal generalized EEG suppression (PGES) is a promising candidate to aid in identifying SUDEP risk. The length of time a patient experiences PGES after a seizure may be used to infer the risk a patient may have of SUDEP later in life. However, the problem becomes identifying the duration, or marking the end, of PGES (Tomson et al. in Lancet Neurol 7(11):1021–1031, 2008; Nashef in Epilepsia 38:6–8, 1997). Methods: This work addresses the problem of marking the end of PGES in EEG data recorded from patients during a clinically supervised seizure. This work proposes a sensitivity analysis on EEG window size/delay, feature extraction and classifiers along with associated hyperparameters. The resulting sensitivity analysis includes the Gradient Boosted Decision Trees and Random Forest classifiers trained on 10 extracted features rooted in fundamental EEG behavior using an EEG-specific feature extraction process (pyEEG) and 5 different window sizes or delays (Bao et al. in Comput Intell Neurosci 2011:1687–5265, 2011). Results: The machine learning architecture described above scored a maximum AUC score of 76.02% with the Random Forest classifier trained on all extracted features. The highest performing features included SVD Entropy, Petrosian Fractal Dimension and Power Spectral Intensity. Conclusion: The methods described are effective in automatically marking the end of PGES. Future work should include integration of these methods into the clinical setting and using the results to be able to predict a patient’s SUDEP risk.
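One of the features named in the Results, the Petrosian Fractal Dimension, can be sketched with the commonly cited formula PFD = log10(N) / (log10(N) + log10(N / (N + 0.4 N_delta))), where N is the window length and N_delta is the number of sign changes in the signal's first difference. The code below is written independently for illustration; it is not taken from pyEEG, and the sample windows are hypothetical.

```python
import math

def petrosian_fd(signal):
    """Petrosian fractal dimension of a 1-D sequence (needs len(signal) > 1)."""
    n = len(signal)
    # First difference of the signal.
    diffs = [b - a for a, b in zip(signal, signal[1:])]
    # Count sign changes between consecutive differences.
    n_delta = sum(1 for d0, d1 in zip(diffs, diffs[1:]) if d0 * d1 < 0)
    return math.log10(n) / (math.log10(n) + math.log10(n / (n + 0.4 * n_delta)))

# Hypothetical windows: a monotonic ramp and an alternating signal.
ramp_fd = petrosian_fd([0, 1, 2, 3, 4])
zigzag_fd = petrosian_fd([0, 1, 0, 1, 0, 1, 0, 1])
```

A signal with no direction changes yields a value of 1, and more frequent direction changes push the value higher. That is why this feature can help separate suppressed EEG from more active EEG within a window.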

