The Effects of Channel Number on Classification Performance for sEMG-based Speech Recognition

Speech recognition consists of converting input sound into a sequence of phonemes and then finding text for the input using language models. Phoneme classification performance is therefore a critical factor for the successful implementation of a speech recognition system. However, correctly distinguishing phonemes with similar characteristics remains a challenging problem even for state-of-the-art classification methods, and classification errors are hard to recover from in the subsequent language processing steps. This paper proposes a hierarchical phoneme clustering method that applies more suitable recognition models to different phonemes. The phonemes of the TIMIT database are analyzed using a confusion matrix from a baseline speech recognition model. Using the automatic phoneme clustering results, a set of phoneme classification models optimized for the generated phoneme groups is constructed and integrated into a hierarchical phoneme classification method. In phoneme classification experiments, the proposed hierarchical phoneme group models improved performance over the baseline by 3%, 2.1%, 6.0%, and 2.2% for fricative, affricate, stop, and nasal sounds, respectively. The average accuracy was 69.5% for the baseline and 71.7% for the proposed hierarchical models, a 2.2% overall improvement.
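The idea of grouping phonemes from a baseline confusion matrix can be illustrated with a minimal sketch. It is not the paper's algorithm: the similarity measure, the average-linkage merging rule, the threshold, the function names, and the toy confusion counts are all assumptions chosen for illustration.

```python
from itertools import combinations

def confusion_similarity(conf, i, j):
    """Symmetric confusion rate between classes i and j (assumed measure)."""
    total = sum(conf[i]) + sum(conf[j])
    return (conf[i][j] + conf[j][i]) / total if total else 0.0

def cluster_phonemes(labels, conf, threshold):
    """Greedy average-linkage clustering: merge the most-confused pair of
    clusters until no pair's average similarity reaches the threshold."""
    clusters = [[k] for k in range(len(labels))]
    while len(clusters) > 1:
        best, best_pair = -1.0, None
        for a, b in combinations(range(len(clusters)), 2):
            pairs = [(i, j) for i in clusters[a] for j in clusters[b]]
            sim = sum(confusion_similarity(conf, i, j) for i, j in pairs) / len(pairs)
            if sim > best:
                best, best_pair = sim, (a, b)
        if best < threshold:
            break
        a, b = best_pair
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return [sorted(labels[k] for k in c) for c in clusters]

# Hypothetical confusion counts (rows: true phoneme, columns: predicted).
labels = ["s", "z", "m", "n"]
conf = [
    [80, 18, 1, 1],
    [20, 78, 1, 1],
    [1, 1, 70, 28],
    [2, 0, 30, 68],
]
groups = cluster_phonemes(labels, conf, threshold=0.05)
print(groups)
```

On this toy matrix the frequently confused pairs end up in the same group, and a separate classifier could then be trained per group, which is the kind of hierarchy the abstract describes.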

Neural networks used for speech recognition

Journal of Automatic Control ◽

10.2298/jac1001001g ◽

2010 ◽

Vol 20 (1) ◽

pp. 1-7 ◽

Cited By ~ 28

Author(s):

Wouter Gevaert ◽

Georgi Tsenov ◽

Valeri Mladenov

Keyword(s):

Neural Network ◽

Neural Networks ◽

Speech Recognition ◽

Radial Basis Functions ◽

Back Propagation ◽

Classification Performance ◽

Basis Functions ◽

Back Propagation Algorithm ◽

Feed Forward Neural Network ◽

Propagation Algorithm

This paper investigates speech recognition classification performance using two standard neural network structures as the classifier: a feed-forward neural network (NN) trained with the back-propagation algorithm and a radial basis function neural network.
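As a rough illustration of how the two classifier families differ, the sketch below contrasts a single hidden unit of each type. The function names and the choice of a sigmoid activation and a Gaussian basis function are assumptions, not details taken from the paper.

```python
import math

def sigmoid_unit(x, weights, bias):
    """Feed-forward hidden unit: weighted sum passed through a sigmoid."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))

def rbf_unit(x, center, width):
    """RBF hidden unit: Gaussian of the distance between input and center."""
    sq_dist = sum((xi - ci) ** 2 for xi, ci in zip(x, center))
    return math.exp(-sq_dist / (2.0 * width ** 2))

# The sigmoid unit responds to the direction of the input,
# while the RBF unit responds to how close the input is to its center.
print(sigmoid_unit([0.0, 0.0], [1.0, 1.0], 0.0))
print(rbf_unit([1.0, 2.0], [1.0, 2.0], 1.0))
```

In both cases the output layer combines the hidden-unit activations; the difference lies in whether a hidden unit partitions the feature space by a hyperplane or by a local region around a center.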

Measuring Mandarin Speech Recognition Thresholds Using the Method of Adaptive Tracking

Journal of Speech Language and Hearing Research ◽

10.1044/2019_jslhr-h-18-0162 ◽

2019 ◽

Vol 62 (6) ◽

pp. 2009-2017

Author(s):

Yuxia Wang ◽

Zhaoyu Lu ◽

Xiaohu Yang ◽

Chang Liu

Keyword(s):

Speech Recognition ◽

Adaptive Tracking ◽

Mandarin Speech Recognition

Selecting the Optimal FM System for Children With Cochlear Implants

Perspectives on Hearing and Hearing Disorders in Childhood ◽

10.1044/hhdc18.1.19 ◽

2008 ◽

Vol 18 (1) ◽

pp. 19-24

Author(s):

Erin C. Schafer

Keyword(s):

Speech Recognition ◽

Cochlear Implants ◽

Empirical Research ◽

Background Noise ◽

Signal To Noise Ratio ◽

Evidence Based ◽

Signal To Noise ◽

Speech Processor ◽

System Input ◽

Optimal Type

Children who use cochlear implants experience significant difficulty hearing speech in the presence of background noise, such as in the classroom. To address these difficulties, audiologists often recommend frequency-modulated (FM) systems for children with cochlear implants. The purpose of this article is to examine current empirical research in the area of FM systems and cochlear implants. Discussion topics will include selecting the optimal type of FM receiver, benefits of binaural FM-system input, importance of DAI receiver-gain settings, and effects of speech-processor programming on speech recognition. FM systems significantly improve the signal-to-noise ratio at the child's ear through the use of three types of FM receivers: mounted speakers, desktop speakers, and direct-audio input (DAI). This discussion will aid audiologists in making evidence-based recommendations for children using cochlear implants and FM systems.

Effects of Aging on Response Criteria in Speech-Recognition Tasks

Journal of Speech Language and Hearing Research ◽

10.1044/jshr.2902.155 ◽

1986 ◽

Vol 29 (2) ◽

pp. 155-162 ◽

Cited By ~ 21

Author(s):

Sandra Gordon-Salant

Keyword(s):

Speech Recognition ◽

Response Criteria ◽

Effects Of Aging

Aided Speech Recognition Abilities of Adults With a Severe or Severe-to-Profound Hearing Loss

Journal of Speech Language and Hearing Research ◽

10.1044/jslhr.4102.285 ◽

1998 ◽

Vol 41 (2) ◽

pp. 285-299 ◽

Cited By ~ 18

Author(s):

Mark C. Flynn ◽

Richard C. Dowell ◽

Graeme M. Clark

Keyword(s):

Hearing Loss ◽

Speech Recognition ◽

Profound Hearing Loss

The Sound of Enemies and Friends in the Neighborhood

Experimental Psychology (formerly Zeitschrift für Experimentelle Psychologie) ◽

10.1027/1618-3169/a000113 ◽

2011 ◽

Vol 58 (6) ◽

pp. 454-463 ◽

Cited By ~ 1

Author(s):

Diane Pecher ◽

Inge Boot ◽

Saskia van Dantzig ◽

Carol J. Madden ◽

David E. Huber ◽

...

Keyword(s):

Word Recognition ◽

Visual Word Recognition ◽

Neighborhood Effects ◽

Classification Performance ◽

Visual Word ◽

Orthographic Neighborhood ◽

Semantic Classification ◽

Semantic Class ◽

Orthographic Neighbors ◽

Target Words

Previous studies (e.g., Pecher, Zeelenberg, & Wagenmakers, 2005) found that semantic classification performance is better for target words with orthographic neighbors that are mostly from the same semantic class (e.g., living) compared to target words with orthographic neighbors that are mostly from the opposite semantic class (e.g., nonliving). In the present study we investigated the contribution of phonology to orthographic neighborhood effects by comparing effects of phonologically congruent orthographic neighbors (book-hook) to phonologically incongruent orthographic neighbors (sand-wand). The prior presentation of a semantically congruent word produced larger effects on subsequent animacy decisions when the previously presented word was a phonologically congruent neighbor than when it was a phonologically incongruent neighbor. In a second experiment, performance differences between target words with versus without semantically congruent orthographic neighbors were larger if the orthographic neighbors were also phonologically congruent. These results support models of visual word recognition that assume an important role for phonology in cascaded access to meaning.
