The Use of Speech Recognition Systems to Select a Useful Signal in Noisy Speech at a Low Signal-To-Noise Ratio

Author(s):  
Sh. R. Salimov ◽  
N. A. Volkov ◽  
A. V. Ivanov
2021 ◽  
Author(s):  
S.V. Zimina

Tuning artificial neural networks with iterative algorithms is accompanied by fluctuations in the weight coefficients. When an artificial neural network extracts a useful signal from background interference, fluctuations in the weight vector degrade the extracted signal and, in particular, cause losses in the output signal-to-noise ratio. The goal of the research is to perform a statistical analysis of an artificial neural network, including an analysis of the losses in the output signal-to-noise ratio associated with fluctuations in its weight coefficients. We considered artificial neural networks that are tuned using discrete gradient algorithms, fast recurrent algorithms with constraints, and the Hebb algorithm. It is shown that fluctuations lead to losses in the output signal-to-noise ratio, the level of which depends on the type of algorithm and on the speed at which the network is tuned. Taking the fluctuations of the weight vector into account when analyzing the output signal-to-noise ratio makes it possible to relate the permissible loss in the output signal-to-noise ratio to the corresponding tuning speed of the network.
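The link between adaptation speed and weight fluctuations can be illustrated with a minimal sketch. It assumes a single linear neuron updated with the LMS rule as a stand-in for a discrete gradient algorithm; the synthetic task, the function name `lms_weight_fluctuation`, and all parameter values are hypothetical and are not taken from the paper.

```python
import numpy as np

def lms_weight_fluctuation(mu, n_steps=10000, n_taps=4, noise_std=0.5, seed=0):
    """Run LMS on a synthetic linear task and return the mean squared
    weight error over the second half of the run (steady state)."""
    rng = np.random.default_rng(seed)
    w_true = rng.standard_normal(n_taps)   # hypothetical target weights
    w = np.zeros(n_taps)                   # weights being adapted
    errors = []
    for k in range(n_steps):
        x = rng.standard_normal(n_taps)    # white input vector
        d = w_true @ x + noise_std * rng.standard_normal()  # noisy desired output
        e = d - w @ x                      # instantaneous output error
        w = w + mu * e * x                 # LMS (stochastic gradient) update
        if k >= n_steps // 2:              # skip the initial convergence phase
            errors.append(np.mean((w - w_true) ** 2))
    return float(np.mean(errors))

slow = lms_weight_fluctuation(mu=0.005)
fast = lms_weight_fluctuation(mu=0.05)
print(f"steady-state weight error: slow={slow:.5f}, fast={fast:.5f}")
```

Under these assumptions the larger step size converges faster but leaves larger residual weight fluctuations, which is the trade-off the abstract describes in terms of output signal-to-noise ratio.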


2020 ◽  
Author(s):  
Chaofeng Lan ◽  
Yuanyuan Zhang ◽  
Hongyun Zhao

Abstract This paper draws on the training method of the recurrent neural network (RNN). By increasing the number of hidden layers of the RNN, changing the layer activation function from the traditional sigmoid to Leaky ReLU on the input layer, and zero-padding the first and last groups of data to make more effective use of the data, an improved denoising recurrent neural network (DRNN) model with high calculation speed and good convergence is constructed to address the low speaker recognition rate in noisy environments. Using this model, random-semantic speech signals from the speech library with a sampling rate of 16 kHz and a duration of 5 seconds are studied. The experimental signal-to-noise ratios are −10 dB, −5 dB, 0 dB, 5 dB, 10 dB, 15 dB, 20 dB, and 25 dB. In the noisy environment, the improved model is used to denoise the Mel-frequency cepstral coefficients (MFCC) and the Gammatone frequency cepstral coefficients (GFCC), and the impact of the traditional model and the improved model on the speech recognition rate is analyzed. The research shows that the improved model can effectively remove noise from the feature parameters and improve the speech recognition rate. The improvement in the speaker recognition rate is more pronounced when the signal-to-noise ratio is low. Furthermore, when the signal-to-noise ratio is 0 dB, the speaker recognition rate is increased by 40%, which can be an 85% improvement compared with the traditional speech model. As the signal-to-noise ratio increases, the recognition rate gradually increases. When the signal-to-noise ratio is 15 dB, the speaker recognition rate is 93%.
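Test conditions at a fixed signal-to-noise ratio are commonly produced by scaling a noise recording before adding it to clean speech. The sketch below shows one way to do that; the helper names `mix_at_snr` and `measured_snr_db` and the synthetic signals are assumptions for illustration, not the authors' procedure. Only the 16 kHz sampling rate, the 5 second duration, and the list of SNR values come from the abstract.

```python
import numpy as np

def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so that speech power / noise power equals `snr_db`,
    then return the noisy mixture. `noise` must be at least as long as `speech`."""
    speech = np.asarray(speech, dtype=float)
    noise = np.asarray(noise, dtype=float)[: len(speech)]
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    # Choose the gain so that 10*log10(p_speech / (gain**2 * p_noise)) == snr_db.
    gain = np.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return speech + gain * noise

def measured_snr_db(clean, noisy):
    """SNR in dB, treating (noisy - clean) as the noise component."""
    clean = np.asarray(clean, dtype=float)
    residual = np.asarray(noisy, dtype=float) - clean
    return float(10 * np.log10(np.mean(clean ** 2) / np.mean(residual ** 2)))

fs = 16000                                   # sampling rate from the abstract
t = np.arange(5 * fs) / fs                   # 5 seconds of samples
speech = np.sin(2 * np.pi * 220 * t)         # synthetic stand-in for speech
noise = np.random.default_rng(0).standard_normal(len(t))

for target in (-10, -5, 0, 5, 10, 15, 20, 25):
    noisy = mix_at_snr(speech, noise, target)
    print(target, round(measured_snr_db(speech, noisy), 3))
```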


2020 ◽  
Vol 19 (03) ◽  
pp. 2050027
Author(s):  
Thandar Oo ◽  
Pornchai Phukpattaranont

When electromyography (EMG) signals are collected from muscles in the torso, they can be perturbed by the electrocardiography (ECG) signals from heart activity. In this paper, we present a novel signal-to-noise ratio (SNR) estimate for an EMG signal contaminated by an ECG signal. We use six features that are popular in assessing EMG signals, namely skewness, kurtosis, mean average value, waveform length, zero crossing and mean frequency. The features were calculated from the raw EMG signals and the detail coefficients of the discrete stationary wavelet transform. Then, these features are used as inputs to a neural network that outputs the estimate of SNR. While we used simulated EMG signals artificially contaminated with simulated ECG signals as the training data, the testing was done with simulated EMG signals artificially contaminated with real ECG signals. The results showed that the waveform length determined with raw EMG signals was the best feature for estimating SNR. It gave the highest average correlation coefficient of 0.9663. These results suggest that the waveform length could be deployed not only in EMG recognition systems but also in EMG signal quality measurements when the EMG signals are contaminated by ECG interference.
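Three of the listed features have simple time-domain or spectral definitions, sketched below using common textbook formulas. The function names are hypothetical, and the exact variants used in the paper (for example, any amplitude threshold for zero crossings) are not stated in the abstract.

```python
import numpy as np

def waveform_length(x):
    """Cumulative length of the waveform: sum of absolute sample-to-sample differences."""
    x = np.asarray(x, dtype=float)
    return float(np.sum(np.abs(np.diff(x))))

def zero_crossings(x):
    """Number of sign changes, ignoring samples that are exactly zero."""
    signs = np.sign(np.asarray(x, dtype=float))
    signs = signs[signs != 0]
    return int(np.sum(signs[:-1] != signs[1:]))

def mean_frequency(x, fs):
    """Power-weighted average frequency of the one-sided spectrum, in Hz."""
    x = np.asarray(x, dtype=float)
    power = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    return float(np.sum(freqs * power) / np.sum(power))

fs = 1000                                    # assumed sampling rate in Hz
t = np.arange(fs) / fs
emg_like = np.sin(2 * np.pi * 50 * t)        # synthetic stand-in for an EMG segment
print(waveform_length(emg_like), zero_crossings(emg_like), mean_frequency(emg_like, fs))
```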


2019 ◽  
Vol 28 (1) ◽  
pp. 101-113 ◽  
Author(s):  
Jenna M. Browning ◽  
Emily Buss ◽  
Mary Flaherty ◽  
Tim Vallier ◽  
Lori J. Leibold

Purpose The purpose of this study was to evaluate speech-in-noise and speech-in-speech recognition associated with activation of a fully adaptive directional hearing aid algorithm in children with mild to severe bilateral sensory/neural hearing loss. Method Fourteen children (5–14 years old) who are hard of hearing participated in this study. Participants wore laboratory hearing aids. Open-set word recognition thresholds were measured adaptively for 2 hearing aid settings: (a) omnidirectional (OMNI) and (b) fully adaptive directionality. Each hearing aid setting was evaluated in 3 listening conditions. Fourteen children with normal hearing served as age-matched controls. Results Children who are hard of hearing required a more advantageous signal-to-noise ratio than children with normal hearing to achieve comparable performance in all 3 conditions. For children who are hard of hearing, the average improvement in signal-to-noise ratio when comparing fully adaptive directionality to OMNI was 4.0 dB in noise, regardless of target location. Children performed similarly with fully adaptive directionality and OMNI settings in the presence of the speech maskers. Conclusions Compared to OMNI, fully adaptive directionality improved speech recognition in steady noise for children who are hard of hearing, even when they were not facing the target source. This algorithm did not affect speech recognition when the background noise was speech. Although the use of hearing aids with fully adaptive directionality is not proposed as a substitute for remote microphone systems, it appears to offer several advantages over fixed directionality, because it does not depend on children facing the target talker and provides access to multiple talkers within the environment. Additional experiments are required to further evaluate children's performance under a variety of spatial configurations in the presence of both noise and speech maskers.


2017 ◽  
Vol 28 (05) ◽  
pp. 404-414 ◽  
Author(s):  
Dorothy Neave-DiToro ◽  
Adrienne Rubinstein ◽  
Arlene C. Neuman

Background: Limited attention has been given to the effects of classroom acoustics at the college level. Many studies have reported that nonnative speakers of English are more likely to be affected by poor room acoustics than native speakers. An important question is how classroom acoustics affect speech perception of nonnative college students. Purpose: The combined effect of noise and reverberation on the speech recognition performance of college students who differ in age of English acquisition was evaluated under conditions simulating classrooms with reverberation times (RTs) close to ANSI recommended RTs. Research Design: A mixed design was used in this study. Study Sample: Thirty-six native and nonnative English-speaking college students with normal hearing, ages 18–28 yr, participated. Intervention: Two groups of nine native participants (native monolingual [NM] and native bilingual) and two groups of nine nonnative participants (nonnative early and nonnative late) were evaluated in noise under three reverberant conditions (0.3, 0.6, and 0.8 sec). Data Collection and Analysis: A virtual test paradigm was used, which represented a signal reaching a student at the back of a classroom. Speech recognition in noise was measured using the Bamford–Kowal–Bench Speech-in-Noise (BKB-SIN) test and signal-to-noise ratio required for correct repetition of 50% of the key words in the stimulus sentences (SNR-50) was obtained for each group in each reverberant condition. A mixed-design analysis of variance was used to determine statistical significance as a function of listener group and RT. Results: SNR-50 was significantly higher for nonnative listeners as compared to native listeners, and a more favorable SNR-50 was needed as RT increased. The most dramatic effect on SNR-50 was found in the group with later acquisition of English, whereas the impact of early introduction of a second language was subtler.
At the ANSI standard’s maximum recommended RT (0.6 sec), all groups except the NM group exhibited a mild signal-to-noise ratio (SNR) loss. At the 0.8 sec RT, all groups exhibited a mild SNR loss. Conclusion: Acoustics in the classroom are an important consideration for nonnative speakers who are proficient in English and enrolled in college. To address the need for a clearer speech signal by nonnative students (and for all students), universities should follow ANSI recommendations, as well as minimize background noise in occupied classrooms. Behavioral/instructional strategies should be considered to address factors that cannot be compensated for through acoustic design.


2011 ◽  
Vol 22 (06) ◽  
pp. 375-386 ◽  
Author(s):  
Stella L. Ng ◽  
Christine N. Meston ◽  
Susan D. Scollie ◽  
Richard C. Seewald

Background: There is a need for objective pediatric hearing aid outcome measurement and thus a need for the evaluation of outcome measures. We explored a commercially available pediatric sentence-in-noise measure adapted for use as an aided outcome measure. Purpose: The purposes of the current study were (1) to administer an adapted BKB-SIN (Bamford-Kowal-Bench Speech-in-Noise test) to adults and children who have normal hearing and children who use hearing aids and (2) to evaluate the utility of this adapted BKB-SIN as an aided, within-subjects outcome measure for amplification strategies. Research Design: We used a mixed within and between groups design to evaluate speech recognition in noise for the three groups of participants. The children who use hearing aids were tested under the omnidirectional, directional, and digital noise reduction (DNR) conditions. Results from each group were compared to each other, and we compared results of each aided condition for the children who use hearing aids to evaluate the test utility as an aided outcome measure. Study Sample: The study sample consisted of 14 adults with normal hearing (aged 22–28 yr) and 15 children with normal hearing (aged 6–18 yr), recruited through word of mouth, and 14 children who use hearing aids (aged 9–16 yr) recruited from local audiology clinics. Data Collection and Analysis: List pairs of the BKB-SIN test were presented at 50 dB HL as follows: four list pairs to each participant with normal hearing, four list pairs in the omnidirectional condition, and two list pairs in the directional and DNR conditions. Children who use hearing aids were fitted bilaterally with laboratory devices and completed the BKB-SIN test aided. Data were plotted as mean percent of key words correct at each signal-to-noise ratio (SNR). Further, we conducted an analysis of variance for group differences and within-groups for the three aided conditions. 
Results: Adult participants outperformed children with normal hearing, who outperformed the children who use hearing aids. SNR-50 (signal-to-noise ratio at which listener can obtain a speech recognition score of 50% correct) scores demonstrated reliability of the adapted test implementation. The BKB-SIN test measured significant differences in performance for omnidirectional versus directional microphone conditions but not between omnidirectional and DNR conditions. Conclusions: We conclude that the adapted implementation of the BKB-SIN test can be administered reliably and feasibly. Further study is warranted to develop norms for the adapted implementation as well as to determine if an adapted implementation can be sensitive to age effects. Until such norms are developed, clinicians should refrain from comparing results from the adapted test to the test manual norms and should instead use the adapted implementation as a within-subject measure.
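When percent-correct scores are available at several SNRs, the 50% point can be estimated by interpolating between the two measurements that bracket it. The sketch below uses plain linear interpolation; it is an illustrative assumption, not the scoring procedure from the BKB-SIN manual or the method used in the study, and the function name and example data are hypothetical.

```python
import numpy as np

def snr50_by_interpolation(snr_db, percent_correct):
    """Return the SNR (dB) at which performance crosses 50% correct,
    using linear interpolation between neighbouring measurements."""
    snr = np.asarray(snr_db, dtype=float)
    pc = np.asarray(percent_correct, dtype=float)
    order = np.argsort(snr)                  # sort from lowest to highest SNR
    snr, pc = snr[order], pc[order]
    for i in range(len(snr) - 1):
        lo, hi = pc[i], pc[i + 1]
        if lo <= 50 <= hi and hi > lo:       # this segment brackets 50%
            frac = (50 - lo) / (hi - lo)
            return float(snr[i] + frac * (snr[i + 1] - snr[i]))
    raise ValueError("scores never cross 50% within the measured SNR range")

# Hypothetical mean key-word scores at four SNRs.
print(snr50_by_interpolation([0, 3, 6, 9], [10, 40, 70, 95]))
```

A lower SNR-50 indicates better performance, since the listener reaches 50% correct with less favorable signal-to-noise conditions.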


1980 ◽  
Vol 68 (S1) ◽  
pp. S71-S71
Author(s):  
M. M. Sondhi ◽  
C. E. Schmidt ◽  
L. R. Rabiner

2020 ◽  
Vol 39 (5) ◽  
pp. 6881-6889
Author(s):  
Jie Wang ◽  
Linhuang Yan ◽  
Jiayi Tian ◽  
Minmin Yuan

In this paper, a bilateral spectrogram filtering (BSF)-based optimally modified log-spectral amplitude (OMLSA) estimator for single-channel speech enhancement is proposed. It can significantly improve the performance of OMLSA, especially in highly non-stationary noise environments, by using bilateral filtering (BF), a widely used technique in image and visual processing, to preprocess the spectrogram of the noisy speech. When the speech spectrogram is treated as an image, BSF can not only sharpen details and remove unwanted textures or background noise from the noisy speech spectrogram, but also preserve edges. The a posteriori signal-to-noise ratio (SNR) of the OMLSA algorithm is estimated after applying BSF to the noisy speech. In addition, to reduce computing costs, a fast and accurate BF is adopted that reduces the algorithm complexity to O(1) for each time-frequency bin. Finally, the proposed algorithm is compared with the original OMLSA and other classic denoising methods using various types of noise with different signal-to-noise ratios in terms of objective evaluation metrics such as segmental signal-to-noise ratio improvement and perceptual evaluation of speech quality. The results show the validity of the improved BSF-based OMLSA algorithm.
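The edge-preserving behaviour of bilateral filtering can be seen in a brute-force sketch applied to a 2-D array such as a log-magnitude spectrogram. This is the textbook filter, not the fast O(1)-per-bin variant the abstract refers to; the function name, parameter values, and the toy "spectrogram" are assumptions for illustration.

```python
import numpy as np

def bilateral_filter(image, radius=2, sigma_space=1.5, sigma_range=0.5):
    """Brute-force bilateral filter: each output bin is a weighted mean of its
    neighbours, with weights that fall off with distance and with value difference."""
    image = np.asarray(image, dtype=float)
    rows, cols = image.shape
    padded = np.pad(image, radius, mode="edge")   # replicate border values
    out = np.zeros_like(image)
    norm = np.zeros_like(image)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            shifted = padded[radius + dy : radius + dy + rows,
                             radius + dx : radius + dx + cols]
            w_space = np.exp(-(dx * dx + dy * dy) / (2 * sigma_space ** 2))
            w_range = np.exp(-((shifted - image) ** 2) / (2 * sigma_range ** 2))
            weight = w_space * w_range
            out += weight * shifted
            norm += weight
    return out / norm

# Toy "spectrogram": a sharp onset (edge) plus low-level noise.
rng = np.random.default_rng(0)
spec = np.zeros((16, 32))
spec[:, 16:] = 10.0
noisy = spec + 0.1 * rng.standard_normal(spec.shape)
smoothed = bilateral_filter(noisy)
print(np.std(noisy[:, :16]), np.std(smoothed[:, :16]))
```

Because the range weight is close to zero across the large jump, bins on either side of the onset are averaged only with bins on their own side, so the edge stays sharp while the small fluctuations within each region are smoothed.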

