Speech signal modification to increase intelligibility in noisy environments

2007 ◽  
Vol 122 (2) ◽  
pp. 1138-1149 ◽  
Author(s):  
Sungyub D. Yoo ◽  
J. Robert Boston ◽  
Amro El-Jaroudi ◽  
Ching-Chung Li ◽  
John D. Durrant ◽  
...  
2014 ◽  
Vol 8 (1) ◽  
pp. 508-511
Author(s):  
Zhongbao Chen ◽  
Zhigang Fang ◽  
Jie Xu ◽  
Pengying Du ◽  
Xiaoping Luo

Speech can be broadly categorized into voiceless, voiced, and mute signals, and voiced speech can be further classified into vowels and voiced consonants. With the ever-increasing demand for speech synthesis applications, there is a pressing need for an effective method to differentiate vowel and voiced-consonant signals, since these are two distinct components that affect the naturalness of synthetic speech. State-of-the-art algorithms for speech signal classification are effective at separating voiceless, voiced, and mute signals, but not at further classifying the voiced signal. To address this issue, a new classification algorithm based on the Gaussian Mixture Model (GMM) is proposed that directly classifies a speech signal into voiceless, voiced-consonant, vowel, and mute segments. Simulation results demonstrate that the proposed algorithm remains effective even in noisy environments.
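A minimal sketch of the kind of GMM-based frame classification the abstract describes, assuming labeled training frames are available; the two features used here (log energy and zero-crossing rate) and all parameter choices are illustrative assumptions, not the paper's method:

```python
# Sketch: one diagonal-covariance GMM per class; each frame is assigned
# to the class whose GMM gives the highest log-likelihood. The features
# (log energy + zero-crossing rate) are illustrative stand-ins.
import numpy as np
from sklearn.mixture import GaussianMixture

CLASSES = ["mute", "voiceless", "voiced_consonant", "vowel"]

def frame_features(frames, eps=1e-10):
    """frames: (n_frames, frame_len) array of windowed samples."""
    log_energy = np.log(np.sum(frames ** 2, axis=1) + eps)
    zcr = np.mean(np.abs(np.diff(np.sign(frames), axis=1)) > 0, axis=1)
    return np.column_stack([log_energy, zcr])

def train_class_gmms(features, labels, n_components=4):
    """labels: array of strings drawn from CLASSES."""
    return {c: GaussianMixture(n_components=n_components,
                               covariance_type="diag",
                               random_state=0).fit(features[labels == c])
            for c in CLASSES}

def classify(gmms, features):
    log_liks = np.column_stack([gmms[c].score_samples(features) for c in CLASSES])
    return np.array(CLASSES)[np.argmax(log_liks, axis=1)]
```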


2012 ◽  
Vol 2012 ◽  
pp. 1-10
Author(s):  
Ahmad R. Abu-El-Quran ◽  
Adrian D. C. Chan ◽  
Rafik A. Goubran

We introduce a multiengine speech processing system that can detect the location and type of an audio signal in variable noisy environments. The system detects the location of the audio source using a microphone array; it first examines the audio to determine whether it is speech or nonspeech, then estimates the signal-to-noise ratio (SNR) using a Discrete-Valued SNR Estimator. Using this SNR value, instead of adapting the speech signal to the speech processing system, we adapt the speech processing system to the environment of the captured speech signal. In this paper, we introduce the Discrete-Valued SNR Estimator and a multiengine classifier that uses either Multiengine Selection or Multiengine Weighted Fusion, with speaker identification (SI) as the example speech processing task. The Discrete-Valued SNR Estimator achieves an accuracy of 98.4% in characterizing the environment's SNR. Compared to a conventional single-engine SI system, the improvement in accuracy was as high as 9.0% for Multiengine Selection and 10.0% for Multiengine Weighted Fusion.
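A hedged sketch of the two combination strategies named above, assuming one speaker-identification engine per discrete SNR level; the engine interface, SNR grid, and weights are illustrative assumptions rather than the authors' implementation:

```python
# Sketch: one SI engine trained per discrete SNR level, combined by
# (a) selecting the engine matched to the estimated SNR, or (b) fusing
# all engines' per-speaker scores with SNR-dependent weights.
import numpy as np

SNR_LEVELS_DB = [0, 5, 10, 20, 30]   # assumed discrete estimator outputs

def select_engine(engines, estimated_snr_db):
    """Multiengine Selection: pick the engine trained nearest the estimated SNR."""
    idx = int(np.argmin([abs(estimated_snr_db - s) for s in SNR_LEVELS_DB]))
    return engines[idx]

def weighted_fusion(engines, weights, features):
    """Multiengine Weighted Fusion.

    engines: objects with .score(features) -> one score per enrolled speaker
    weights: one weight per engine, e.g. derived from the estimated SNR
    """
    scores = np.stack([e.score(features) for e in engines])  # (n_engines, n_speakers)
    fused = np.average(scores, axis=0, weights=weights)
    return int(np.argmax(fused))                             # identified speaker index
```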


2012 ◽  
Vol 532-533 ◽  
pp. 1253-1257
Author(s):  
Li Hai Yao ◽  
Jie Xu ◽  
Hao Jiang

Speech can be broadly categorized into voiceless, voiced, and mute signals, and voiced speech can be further classified into vowels and voiced consonants. With the ever-increasing demand for speech synthesis applications, there is a pressing need for an effective method to differentiate vowel and voiced-consonant signals, since these are two distinct components that affect the naturalness of synthetic speech. State-of-the-art algorithms for speech signal classification are effective at separating voiceless, voiced, and mute signals, but not at further classifying the voiced signal. To address this issue, a new classification algorithm based on the Gaussian Mixture Model (GMM) is proposed that directly classifies a speech signal into voiceless, voiced-consonant, vowel, and mute segments. Specifically, a new speech feature is proposed, and the GMM is modified for speech classification. Simulation results demonstrate that the proposed algorithm remains effective even in noisy environments.


2019 ◽  
Vol 8 (4) ◽  
pp. 12692-12695

The introduction of a third component in conventional turbo codes improved code performance for a wide range of block lengths and coding rates with very low error rates. However, parameters such as the permeability and permittivity rates were static, so the codes adapted poorly to noisy environments. The proposed A3D-TC overcomes this problem by making the parameters adaptive through a Genetic Algorithm (GA)-based knowledge source: the bit error rate is minimized by generating parameters according to the noise and signal strengths. The improvement is observed for speech signals. At high noise levels, the speech signal exhibits minimum bit error rate using this GA-based knowledge source, and within very few iterations it yields an error-free signal at low values of signal-to-noise ratio.
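The following sketch shows, under stated assumptions, how a GA-based knowledge source could search (permeability, permittivity) rate pairs that minimize bit error rate at a given SNR. The fitness function is a toy surrogate standing in for the full A3D-TC encode/decode simulation, and the parameter ranges and GA settings are illustrative:

```python
# Sketch of a GA searching code parameters at a given channel SNR.
import random

def evaluate_ber(perm_rate, permit_rate, snr_db):
    """Toy surrogate fitness for illustration only. In the real system
    this would run the full turbo encode / channel / decode simulation
    at the given SNR and return the measured bit error rate."""
    return (perm_rate - 0.5) ** 2 + (permit_rate - 0.3) ** 2 + 1.0 / (1.0 + snr_db)

def ga_adapt(snr_db, pop_size=20, generations=30, mutate=0.1):
    fitness = lambda p: evaluate_ber(p[0], p[1], snr_db)
    pop = [(random.random(), random.random()) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness)                    # lower BER = fitter
        parents = pop[: pop_size // 2]           # keep the better half
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = random.sample(parents, 2)
            child = [(a[0] + b[0]) / 2, (a[1] + b[1]) / 2]      # crossover: midpoint
            if random.random() < mutate:                        # mutation: small jitter
                child = [min(1.0, max(0.0, c + random.gauss(0, 0.05))) for c in child]
            children.append(tuple(child))
        pop = parents + children
    return min(pop, key=fitness)   # best (permeability, permittivity) pair for this SNR
```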


2016 ◽  
Vol 116 (5) ◽  
pp. 2356-2367 ◽  
Author(s):  
Alessandro Presacco ◽  
Jonathan Z. Simon ◽  
Samira Anderson

The ability to understand speech is significantly degraded by aging, particularly in noisy environments. One way that older adults cope with this difficulty is through the use of contextual cues. Several behavioral studies have shown that older adults are better at following a conversation when the target speech has high contextual content or when the background distractor is not meaningful. Specifically, older adults gain significant benefit in focusing on and understanding speech if the background is spoken in a language that is not comprehensible to them (i.e., a foreign language). To better understand the neural mechanisms underlying this benefit in older adults, we investigated aging effects on midbrain and cortical encoding of speech in the presence of a single competing talker speaking a language that is either meaningful or meaningless to the listener (i.e., English vs. Dutch). Our results suggest that neural processing is strongly affected by the informational content of noise. Specifically, older listeners' cortical responses to the attended speech signal are less degraded when the competing speech is in an incomprehensible language than when it is in their native language. Conversely, temporal processing in the midbrain is affected by different backgrounds only during rapid changes in speech, and only in younger listeners. Additionally, we found that cognitive decline is associated with an increase in cortical envelope tracking, suggesting an age-related overuse (or inefficient use) of cognitive resources that may explain older adults' difficulty in processing speech targets while trying to ignore interfering noise.


Author(s):  
Imad Qasim Habeeb ◽  
Tamara Z. Fadhil ◽  
Yaseen Naser Jurn ◽  
Zeyad Qasim Habeeb ◽  
Hanan Najm Abdulkhudhur

Automatic speech recognition (ASR) is a technology that allows computers and mobile devices to recognize and translate spoken language into text. ASR systems often produce poor accuracy on noisy speech signals. Therefore, this research proposes an ensemble technique that does not rely on a single filter for perfect noise reduction but instead incorporates information from multiple noise-reduction filters to improve final ASR accuracy. The core of this technique is the generation of K copies of the speech signal using three noise-reduction filters. The speech features of these copies differ slightly, so slightly different texts are extracted from them when processed by the ASR system, and the best of these texts can be elected as the final ASR output. The ensemble technique was compared with three related current noise-reduction techniques in terms of character error rate (CER) and word error rate (WER). The test results were encouraging, showing relative decreases of 16.61% in CER and 11.54% in WER compared with the best current technique. This contribution benefits the ASR field by increasing the recognition accuracy of human speech in the presence of background noise.
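A minimal sketch of the ensemble step, assuming generic filter and recognizer callables rather than any specific library's API; the election rule used here (pick the transcript closest by word-level edit distance to all the others) is a simple stand-in for the paper's selection method:

```python
# Sketch: decode each noise-reduced copy, then elect the consensus transcript.
def edit_distance(a, b):
    """Word-level Levenshtein distance between two transcripts."""
    a, b = a.split(), b.split()
    prev = list(range(len(b) + 1))
    for i, wa in enumerate(a, 1):
        cur = [i]
        for j, wb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,            # deletion
                           cur[-1] + 1,            # insertion
                           prev[j - 1] + (wa != wb)))  # substitution
        prev = cur
    return prev[-1]

def ensemble_asr(signal, filters, recognize):
    """filters: list of callables, noisy signal -> denoised signal
    recognize: callable, signal -> transcript string"""
    hypotheses = [recognize(f(signal)) for f in filters]
    # Elect the hypothesis with the smallest total distance to the rest.
    return min(hypotheses,
               key=lambda h: sum(edit_distance(h, other) for other in hypotheses))
```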


2011 ◽  
Vol 128-129 ◽  
pp. 749-752 ◽  
Author(s):  
Da Li Hu ◽  
Liang Zhong Yi ◽  
Zheng Pei ◽  
Bing Luo

An improved scheme based on the double-threshold method is proposed for robust endpoint detection in noisy environments. First, the distribution of the zero-crossing rate (ZCR) of the preprocessed signal is taken into account, and the speech signal is divided into different parts; appropriate thresholds are then obtained with decision trees on the basis of the ZCR distribution. Finally, the double-threshold method, weighting the energy and ZCR differently in each situation, is applied to determine whether an input segment is speech or non-speech. Simulation results indicate that the proposed method with decision trees is more accurate than the traditional double-threshold method.
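For context, a sketch of the classic double-threshold detector that the proposed method improves on, using fixed illustrative thresholds; the paper instead derives per-segment thresholds with decision trees from the ZCR distribution:

```python
# Sketch: short-time energy with a high and a low threshold, with the
# zero-crossing rate refining the boundary region. Threshold values
# are illustrative, not the paper's decision-tree-derived ones.
import numpy as np

def short_time_features(x, frame_len=256, hop=128):
    n = (len(x) - frame_len) // hop + 1
    frames = np.stack([x[i * hop: i * hop + frame_len] for i in range(n)])
    energy = np.sum(frames ** 2, axis=1)
    zcr = np.mean(np.abs(np.diff(np.sign(frames), axis=1)) > 0, axis=1)
    return energy, zcr

def detect_endpoints(energy, zcr, e_high, e_low, z_thresh):
    """Return (start, end) frame indices of the detected speech segment."""
    above = np.where(energy > e_high)[0]       # frames that are surely speech
    if len(above) == 0:
        return None
    start, end = above[0], above[-1]
    # Extend outward while the low energy threshold or the ZCR test holds,
    # to capture weak fricatives at the segment edges.
    while start > 0 and (energy[start - 1] > e_low or zcr[start - 1] > z_thresh):
        start -= 1
    while end < len(energy) - 1 and (energy[end + 1] > e_low or zcr[end + 1] > z_thresh):
        end += 1
    return start, end
```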


Author(s):  
Martin Chavant ◽  
Alexis Hervais-Adelman ◽  
Olivier Macherey

Purpose: An increasing number of individuals with residual or even normal contralateral hearing are being considered for cochlear implantation. It remains unknown whether the presence of contralateral hearing is beneficial or detrimental to their perceptual learning of cochlear implant (CI)–processed speech. The aim of this experiment was to provide a first insight into this question using acoustic simulations of CI processing.

Method: Sixty normal-hearing listeners took part in an auditory perceptual learning experiment. Each subject was randomly assigned to one of three groups of 20, referred to as NORMAL, LOWPASS, and NOTHING. The experiment consisted of two test phases separated by a training phase. In the test phases, all subjects were tested on recognition of monosyllabic words passed through a six-channel "PSHC" vocoder presented to a single ear. In the training phase, which consisted of listening to a 25-min audio book, all subjects were also presented with the same vocoded speech in one ear, but the signal they received in their other ear differed across groups. The NORMAL group was presented with the unprocessed speech signal, the LOWPASS group with a low-pass filtered version of the speech signal, and the NOTHING group with no sound at all.

Results: The improvement in speech scores following training was significantly smaller for the NORMAL group than for the LOWPASS and NOTHING groups.

Conclusions: This study suggests that the presentation of normal speech in the contralateral ear reduces or slows down perceptual learning of vocoded speech, but that an unintelligible low-pass filtered contralateral signal does not have this effect. Potential implications for the rehabilitation of CI patients with partial or full contralateral hearing are discussed.


2011 ◽  
Vol 21 (2) ◽  
pp. 44-54
Author(s):  
Kerry Callahan Mandulak

Spectral moment analysis (SMA) is an acoustic analysis tool that shows promise for enhancing our understanding of normal and disordered speech production. It can augment auditory-perceptual analysis used to investigate differences across speakers and groups and can provide unique information regarding specific aspects of the speech signal. The purpose of this paper is to illustrate the utility of SMA as a clinical measure for both clinical speech production assessment and research applications documenting speech outcome measurements. Although acoustic analysis has become more readily available and accessible, clinicians need training with, and exposure to, acoustic analysis methods in order to integrate them into traditional methods used to assess speech production.
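As a concrete illustration of what SMA computes, the first four spectral moments treat a frame's magnitude spectrum as a probability distribution over frequency; a minimal sketch follows (windowing and naming choices are illustrative):

```python
# Sketch: first four spectral moments of one windowed frame.
import numpy as np

def spectral_moments(frame, fs):
    """frame: 1-D array of samples; fs: sampling rate in Hz."""
    spectrum = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / fs)
    p = spectrum / np.sum(spectrum)                   # normalize to a distribution
    m1 = np.sum(freqs * p)                            # centroid (Hz)
    m2 = np.sum((freqs - m1) ** 2 * p)                # variance (Hz^2)
    m3 = np.sum((freqs - m1) ** 3 * p) / m2 ** 1.5    # skewness (unitless)
    m4 = np.sum((freqs - m1) ** 4 * p) / m2 ** 2      # kurtosis (unitless)
    return m1, m2, m3, m4
```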

