Neural response based phoneme classification under noisy condition

Author(s):  
Md. Shariful Alam ◽  
Wissam A. Jassim ◽  
Muhammad S.A. Zilany
2011 ◽  
Author(s):  
Christopher B. Sturdy ◽  
Marc T. Avey ◽  
Marisa Hoeschele ◽  
Michele K. Moscicki ◽  
Laurie L. Bloomfield

2011 ◽  
Author(s):  
James Mcpartland ◽  
Danielle Perszyk ◽  
Michael Crowley ◽  
Adam Naples ◽  
Linda C. Mayes

2016 ◽  
Vol 7 (2) ◽  
pp. 76-82
Author(s):  
Hugeng Hugeng ◽  
Edbert Hansel

We have built GAIA, an Android speech recognition application for an Indonesian geography dictionary. The application uses a smartphone to receive spoken words from the user. Recognition is based on the Hidden Markov Model implementation in the Pocketsphinx library, and the phoneme set follows Indonesian phoneme rules. An advantage of the application is that it works without internet access. Word detection was tested under four conditions to determine accuracy: near silent, near noisy, far silent, and far noisy. The tests show that GAIA can serve as an Android speech recognition application for an Indonesian geography dictionary, with average word recognition accuracy of 52.87% in the near silent condition, 14.5% in the near noisy condition, 23.2% in the far silent condition, and 2.8% in the far noisy condition. Index Terms: speech recognition, Indonesian geography dictionary, Hidden Markov Model, Pocketsphinx, Android.
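As a rough illustration of how a Hidden Markov Model assigns a state sequence to observations, the following is a minimal Viterbi decoding sketch. It is not Pocketsphinx code and not the GAIA implementation; the function name `viterbi`, the two states, and all probabilities are hypothetical toy values chosen only for the example.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely hidden-state path for a discrete HMM.

    All model parameters here are toy assumptions, not values from the paper.
    Log-probabilities are used to avoid numerical underflow.
    """
    if not observations:
        return []

    def log(p):
        return math.log(p) if p > 0 else float("-inf")

    # best[t][s]: log-probability of the best path ending in state s at time t
    best = [{s: log(start_p[s]) + log(emit_p[s][observations[0]]) for s in states}]
    back = [{}]  # back[t][s]: predecessor of s on that best path

    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((p, best[t - 1][p] + log(trans_p[p][s])) for p in states),
                key=lambda item: item[1],
            )
            best[t][s] = score + log(emit_p[s][observations[t]])
            back[t][s] = prev

    # Trace back from the best final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path


# Hypothetical two-state model with two observation symbols.
states = ["a", "b"]
start_p = {"a": 0.6, "b": 0.4}
trans_p = {"a": {"a": 0.7, "b": 0.3}, "b": {"a": 0.4, "b": 0.6}}
emit_p = {"a": {"x": 0.9, "y": 0.1}, "b": {"x": 0.2, "y": 0.8}}

print(viterbi(["x", "x", "y", "y"], states, start_p, trans_p, emit_p))
```

In a real recognizer the observations would be acoustic feature vectors and the states would model sub-phoneme units, so this discrete toy only conveys the decoding idea.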


Author(s):  
Khamis A. Al-Karawi

Background & Objective: Speaker Recognition (SR) techniques have matured considerably over the past few decades. Existing methods typically use features extracted from clean speech signals and can therefore achieve very high recognition accuracy in idealized conditions. For critical applications such as security and forensics, robustness and reliability of the system are crucial. Methods: Background noise and reverberation, which often occur in real-world applications, are known to compromise recognition performance. To improve the performance of speaker verification systems, an effective and robust feature extraction technique is proposed that can operate in both clean and noisy conditions. Mel Frequency Cepstral Coefficients (MFCCs) and Gammatone Frequency Cepstral Coefficients (GFCCs) are mature techniques and the most common features used for speaker recognition. MFCCs are calculated from the log energies in frequency bands distributed over a mel scale, while GFCCs are derived from a bank of Gammatone filters, which was originally suggested to model human cochlear filtering. This paper investigates the performance of GFCC and conventional MFCC features in clean and noisy conditions. The effects of Signal-to-Noise Ratio (SNR) and language mismatch on system performance are taken into account. Conclusion: Experimental results show a significant improvement in system performance in terms of reduced equal error rate and detection error trade-off. Recognition rates under various types of noise and various SNRs were quantified via simulation. Results of the study are also presented and discussed.
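Evaluations at various SNRs are commonly simulated by scaling a noise recording before adding it to clean speech. The sketch below shows one way to do that under stated assumptions: the helper names `mix_at_snr` and `measured_snr_db` are hypothetical, the signals are synthetic, and nothing here is taken from the paper's own experimental setup.

```python
import math
import random


def power(signal):
    """Mean squared amplitude of a sequence of samples."""
    return sum(x * x for x in signal) / len(signal)


def mix_at_snr(speech, noise, snr_db):
    """Add noise to speech, scaled so that the mixture has the requested SNR.

    SNR (dB) = 10 * log10(P_speech / P_noise), so the noise is scaled to
    power P_speech / 10^(snr_db / 10).
    """
    if len(speech) != len(noise):
        raise ValueError("speech and noise must have the same length")
    noise_power = power(noise)
    if noise_power == 0:
        raise ValueError("noise must not be silent")
    target_noise_power = power(speech) / (10 ** (snr_db / 10))
    gain = math.sqrt(target_noise_power / noise_power)
    return [s + gain * n for s, n in zip(speech, noise)]


def measured_snr_db(speech, noisy):
    """SNR in dB of a mixture, given the clean reference signal."""
    residual = [y - s for s, y in zip(speech, noisy)]
    return 10 * math.log10(power(speech) / power(residual))


# Synthetic stand-ins for real recordings (assumed, for illustration only).
rng = random.Random(0)
speech = [math.sin(2 * math.pi * 5 * t / 200) for t in range(200)]
noise = [rng.gauss(0.0, 1.0) for _ in range(200)]

noisy = mix_at_snr(speech, noise, snr_db=10.0)
print(round(measured_snr_db(speech, noisy), 3))
```

Features such as MFCC or GFCC would then be extracted from `noisy` rather than from the clean signal, which is what makes a comparison across SNR levels possible.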


Author(s):  
Brady D Nelson ◽  
Johanna M Jarcho

Abstract: An aberrant neural response to rewards has been linked to both depression and social anxiety. Most studies have focused on the neural response to monetary rewards, and few have tested different modalities of reward (e.g., social) that are more salient to particular forms of psychopathology. In addition, most studies contain critical confounds, including contrasting positive and negative feedback and failing to disentangle being correct from obtaining positive feedback. In the present study, 204 participants underwent electroencephalography during monetary and social feedback tasks that were matched in trial structure, timing, and feedback stimuli. The reward positivity (RewP) was measured in response to correctly identifying stimuli that resulted in monetary win, monetary loss, social like, or social dislike feedback. All monetary and social tasks elicited a RewP, and these RewPs were positively correlated. Across all tasks, the RewP was negatively associated with depression and positively associated with social anxiety. The RewP to social dislike feedback, independent of monetary and social like feedback, was also associated with social anxiety. The present study suggests that a domain-general neural response to correct feedback demonstrates a differential association with depression and social anxiety, but a domain-specific neural response to social dislike feedback is uniquely associated with social anxiety.


2021 ◽  
Vol 11 (1) ◽  
pp. 428
Author(s):  
Donghoon Oh ◽  
Jeong-Sik Park ◽  
Ji-Hwan Kim ◽  
Gil-Jin Jang

Speech recognition consists of converting input sound into a sequence of phonemes, then finding text for the input using language models. Phoneme classification performance is therefore a critical factor for the successful implementation of a speech recognition system. However, correctly distinguishing phonemes with similar characteristics is still a challenging problem even for state-of-the-art classification methods, and classification errors are hard to recover from in the subsequent language processing steps. This paper proposes a hierarchical phoneme clustering method that applies more suitable recognition models to different phonemes. The phonemes of the TIMIT database are analyzed using a confusion matrix from a baseline speech recognition model. Using the automatic phoneme clustering results, a set of phoneme classification models optimized for the generated phoneme groups is constructed and integrated into a hierarchical phoneme classification method. In phoneme classification experiments, the proposed hierarchical phoneme group models improved performance over the baseline by 3%, 2.1%, 6.0%, and 2.2% for fricative, affricate, stop, and nasal sounds, respectively. The average accuracy was 69.5% for the baseline and 71.7% for the proposed hierarchical models, an overall improvement of 2.2 percentage points.
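One way to picture clustering phonemes from a confusion matrix is to treat frequently confused phonemes as similar and merge them greedily. The sketch below uses average-linkage agglomerative merging with a stopping threshold. It is an assumption-laden illustration, not the authors' algorithm: the function names, the threshold, and the four-phoneme confusion counts are invented toy values, not TIMIT results.

```python
def confusion_similarity(conf, labels):
    """Symmetrized, row-normalized confusion rate for each pair of labels."""
    rates = []
    for row in conf:
        total = sum(row)
        rates.append([c / total if total else 0.0 for c in row])
    sim = {}
    for i in range(len(labels)):
        for j in range(i + 1, len(labels)):
            sim[(labels[i], labels[j])] = 0.5 * (rates[i][j] + rates[j][i])
    return sim


def cluster_phonemes(conf, labels, threshold):
    """Greedily merge the most-confused groups until none exceeds threshold."""
    sim = confusion_similarity(conf, labels)

    def pair_sim(a, b):
        return sim[(a, b)] if (a, b) in sim else sim[(b, a)]

    groups = [[label] for label in labels]
    while len(groups) > 1:
        best = None
        for i in range(len(groups)):
            for j in range(i + 1, len(groups)):
                # Average linkage: mean similarity over all cross-group pairs.
                total = sum(pair_sim(a, b) for a in groups[i] for b in groups[j])
                score = total / (len(groups[i]) * len(groups[j]))
                if best is None or score > best[0]:
                    best = (score, i, j)
        if best[0] < threshold:
            break
        _, i, j = best
        groups[i] = groups[i] + groups[j]
        del groups[j]
    return groups


# Toy confusion counts (rows: true phoneme, columns: predicted phoneme).
labels = ["s", "z", "m", "n"]
conf = [
    [80, 15, 3, 2],
    [18, 75, 4, 3],
    [2, 3, 70, 25],
    [1, 4, 22, 73],
]

print(cluster_phonemes(conf, labels, threshold=0.1))
```

With these toy counts the fricative-like pair and the nasal-like pair end up in separate groups, and a dedicated classifier could then be trained per group, which is the general shape of a hierarchical scheme.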

