speech spectrum
Recently Published Documents


TOTAL DOCUMENTS

158
(FIVE YEARS 26)

H-INDEX

19
(FIVE YEARS 3)

2021 ◽  
Author(s):  
Chunliu Shi

Abstract To improve intelligent language translation, this paper analyzes the shortcomings of the MSE cost function used by most current DNN-based speech enhancement algorithms and proposes a deep learning speech enhancement algorithm based on perception-related cost functions. Suppression-gain estimation is embedded into the architecture of a traditional speech enhancement algorithm, so that the relationship between the noisy speech spectrum and the enhanced speech spectrum reduces to a simple multiplication by a suppression gain estimated with deep learning; this enhancement front end is then used to construct an intelligent language translation system. The paper evaluates the system's translation performance, analyzes the results, and verifies the model with simulation tests. The experimental results show that the deep-learning-based intelligent language translation system performs well.
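As a rough illustration of the multiplicative relationship described above, the sketch below applies a DNN-estimated suppression gain to the noisy spectrum frame by frame. The `gain_model` object and its `predict` interface are assumptions for illustration, not the paper's actual network.

```python
# Minimal sketch, assuming a pre-trained `gain_model` (hypothetical) that maps
# each noisy magnitude frame to a per-bin suppression gain in [0, 1].
import numpy as np
import librosa

def enhance(noisy, gain_model, n_fft=512, hop=128):
    spec = librosa.stft(noisy, n_fft=n_fft, hop_length=hop)   # (bins, frames), complex
    mag, phase = np.abs(spec), np.angle(spec)
    gain = gain_model.predict(mag.T).T                        # hypothetical API, (bins, frames)
    # Enhanced spectrum = suppression gain * noisy spectrum (simple multiplication)
    enhanced_spec = gain * mag * np.exp(1j * phase)
    return librosa.istft(enhanced_spec, hop_length=hop, length=len(noisy))
```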


2021 ◽  
Vol 6 (2) ◽  
pp. 127-134
Author(s):  
Vijaya Kumar Narne ◽  
Nachiketa Tiwari

Purpose: The Long-Term Average Speech Spectrum (LTASS) and Dynamic Range (DR) of speech strongly influence estimates of the Speech Intelligibility Index (SII) and of the gain and compression required for hearing aid fitting. It is also known that the acoustic and linguistic characteristics of a language affect its LTASS and DR, so these quantities need to be estimated for Indian languages. The present work on three Indian languages fills this gap and contrasts their LTASS and DR attributes with those of British English. Methods: LTASS and DR were measured in 21 one-third octave bands over the frequency range 0.1 to 10 kHz for Hindi, Kannada, Indian English and British English. Results: The DR of the Indian languages studied is 7-10 dB smaller than that of British English, and their LTASS levels are about 7 dB lower than those of British English at frequencies above 1 kHz. LTASS and DR attributes were largely similar across genders. Conclusions: Given the evidence presented in this work that the LTASS and DR characteristics of the Indian languages analyzed differ markedly from those of British English, Indian-language-specific SII values, as well as the gain and compression parameters used in hearing aids, need to be determined.
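For readers who want to reproduce an LTASS-style analysis, the following is a minimal sketch that computes long-term band levels in the 21 one-third octave bands from 0.1 to 10 kHz, assuming a sampling rate of at least 20 kHz; it approximates the general idea and is not the authors' exact measurement procedure.

```python
# Minimal LTASS-style sketch (assumes fs >= 20 kHz), not the authors' exact procedure.
import numpy as np
from scipy.signal import welch

THIRD_OCTAVE_FC = [100, 125, 160, 200, 250, 315, 400, 500, 630, 800,
                   1000, 1250, 1600, 2000, 2500, 3150, 4000, 5000,
                   6300, 8000, 10000]  # 21 nominal centre frequencies (Hz)

def ltass_third_octave(speech, fs):
    # Long-term power spectral density of the whole recording
    f, psd = welch(speech, fs=fs, nperseg=4096)
    levels = []
    for fc in THIRD_OCTAVE_FC:
        lo, hi = fc / 2 ** (1 / 6), fc * 2 ** (1 / 6)   # nominal band edges
        band = (f >= lo) & (f < hi)
        power = np.trapz(psd[band], f[band])            # power in the band
        levels.append(10 * np.log10(power + 1e-12))     # band level in dB re full scale
    return np.array(levels)
```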


2021 ◽  
Author(s):  
Rohun Nisa

In the speech communication process, the desired speech must be recovered under the influence of noise encountered in diverse environments, which degrades speech quality and intelligibility. Under unfavorable conditions, particularly at low signal-to-noise ratios, traditional noise suppression algorithms introduce further distortion into the speech, making them unsuitable for real-time applications. To reduce the shortcomings of current algorithms, a hybrid approach for improving both the quality and the intelligibility of speech in real-world hearing scenarios is proposed. In the pre-processing stage, multi-sub-frame analysis with an over-spectral-subtraction factor and phase compensation is applied to the multi-channel noise-corrupted speech, yielding an approximated speech spectrum. The approximated and clean speech spectra, forming the training set, are then fed to a fully connected deep neural network with a regression output that minimizes the mean square error, resulting in improved speech quality. The proposed hybrid network improves intelligibility, quality and SNR, measured in terms of the Short-Time Objective Intelligibility (STOI) score, the Perceptual Evaluation of Speech Quality (PESQ) score, segmental SNR and mean square error (MSE), compared with earlier noise suppression algorithms, while having lower complexity.
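The pre-processing idea can be illustrated with a basic single-channel over-subtraction sketch: a noise spectrum over-estimated by a factor alpha is subtracted from the noisy power spectrum, with a spectral floor beta. The leading-frame noise estimate, the parameter values and the single-channel simplification are assumptions; the paper's multi-channel, multi-sub-frame, phase-compensated method is more elaborate.

```python
# Single-channel over-subtraction sketch; alpha, beta and the leading-frame
# noise estimate are illustrative assumptions.
import numpy as np
import librosa

def over_subtract(noisy, alpha=2.0, beta=0.01, n_fft=512, hop=128, noise_frames=10):
    spec = librosa.stft(noisy, n_fft=n_fft, hop_length=hop)
    power, phase = np.abs(spec) ** 2, np.angle(spec)
    # Noise power estimated from leading frames (assumed speech-free)
    noise_power = power[:, :noise_frames].mean(axis=1, keepdims=True)
    # Over-subtraction with a spectral floor to limit musical noise
    clean_power = np.maximum(power - alpha * noise_power, beta * power)
    approx_spec = np.sqrt(clean_power) * np.exp(1j * phase)   # reuse the noisy phase
    return librosa.istft(approx_spec, hop_length=hop, length=len(noisy))
```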


2021 ◽  
Author(s):  
Venkatesh Parvathala ◽  
Sri Rama Murty Kodukula ◽  
Siva Ganesh Andhavarapu

In this paper, we demonstrate the significance of restoring the harmonics of the fundamental frequency (pitch) in deep neural network (DNN) based speech enhancement. We propose a sliding-window attention network to regress the spectral magnitude mask (SMM) from the noisy speech signal. Even though the network parameters can be estimated by minimizing the mask loss alone, doing so does not restore the pitch harmonics, especially at higher frequencies. We therefore propose to restore the pitch harmonics in the spectral domain by minimizing a cepstral loss around the pitch peak, and estimate the network parameters using a combination of the mask loss and the cepstral loss. The proposed architecture functions like an adaptive comb filter on voiced segments and emphasizes the pitch harmonics in the speech spectrum. The proposed approach achieves performance comparable to state-of-the-art methods with much lower computational complexity.
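A hedged sketch of such a combined objective is given below: an MSE loss on the spectral magnitude mask plus a cepstral loss restricted to a window around the pitch peak. The tensor shapes, the single integer pitch lag, the window width and the weighting `lam` are illustrative assumptions, not the authors' exact formulation.

```python
# Hedged sketch of a mask-plus-cepstral objective; shapes and the single
# integer pitch lag are assumptions for illustration.
import torch

def combined_loss(pred_mask, target_mask, enhanced_logmag, clean_logmag,
                  pitch_lag, win=5, lam=0.5):
    # Mask regression term
    mask_loss = torch.mean((pred_mask - target_mask) ** 2)
    # Real cepstra from the one-sided log-magnitude spectra: (frames, bins) -> (frames, quefrency)
    ceps_enh = torch.fft.irfft(enhanced_logmag.to(torch.complex64), dim=-1)
    ceps_cln = torch.fft.irfft(clean_logmag.to(torch.complex64), dim=-1)
    # Penalize cepstral error only in a small window around the pitch peak
    lo, hi = pitch_lag - win, pitch_lag + win + 1
    ceps_loss = torch.mean((ceps_enh[..., lo:hi] - ceps_cln[..., lo:hi]) ** 2)
    return mask_loss + lam * ceps_loss
```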


2021 ◽  
Vol 11 (2) ◽  
pp. 227-243
Author(s):  
Naveen K. Nagaraj

The effect of non-informational speech-spectrum noise as a distractor on cognitive and listening comprehension ability was examined in fifty-three young, normal-hearing adults. Time-controlled tasks were used to measure auditory working memory (WM) capacity and attention-switching (AS) ability. Listening comprehension was measured using a lecture, interview, and spoken narratives test. The noise level was set individually to achieve speech intelligibility of at least 90%. Participants' listening comprehension in the presence of distracting noise was better on inference questions than when listening in quiet, and their speed of information processing was significantly faster in the WM and AS tasks in noise. These results are consistent with the view that noise may raise arousal levels, leading to faster information processing during cognitive tasks. Although attention switching was faster in noise, this rapid switching resulted in more errors when updating items. Participants who processed information faster and accurately in noise switched their attention more effectively to refresh and rehearse recall items within WM. The more efficient processing deployed in the presence of noise appears to have led to improvements in WM performance and in making inferences in the listening comprehension task. Additional research is required to examine these findings using background noise that causes informational masking.


Author(s):  
O N Korsun ◽  
V N Yurko ◽  
S A Beyseev ◽  
A S Naukenova ◽  
A K Tulekbaeva
Keyword(s):  

Akustika ◽  
2021 ◽  
pp. 163-167
Author(s):  
Sergei Levin ◽  
Gaziz Tufatulin ◽  
Inna Koroleva ◽  
Viktoriia Vasilyeva ◽  
Elena Levina

The aim was to study how much the input signal at the microphone of a hearing aid (HA) or cochlear implant sound processor (SP) is attenuated by different protective tools or clothing. Materials and methods. Acoustic measurements were conducted in a soundproof booth using an artificial head fitted with an HA/SP and various protective tools that can affect microphone function. A probe microphone was integrated into the microphone input of the SP and connected to a hearing aid verification system. Results. The greatest signal attenuation was observed with water-resistant cases for the SP; because the changes affect the speech spectrum, such protective tools can reduce speech intelligibility. The maximum attenuation was 9.36±0.33 dB at 4000 Hz. Non-hermetic membrane protective cases gave a maximum attenuation of 7.67±0.18 dB (at 5000 Hz). Clothing covering the head changed the signal at the microphone by up to 9.24±0.16 dB, mostly at high frequencies, which has less influence on speech intelligibility. The results confirm that clothing and protective tools for the HA or SP produce significant attenuation of sounds.
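For reference, attenuation figures of this kind can be computed as the level difference between probe-microphone recordings made without and with the protective cover, as in the small sketch below; the recording protocol is assumed, not the study's exact verification-system procedure.

```python
# Attenuation as the level difference between two probe-microphone recordings
# of the same test signal (e.g. a 4000 Hz tone); a simplified sketch.
import numpy as np

def attenuation_db(reference, covered):
    rms = lambda x: np.sqrt(np.mean(np.asarray(x, dtype=float) ** 2))
    # Positive values mean the cover attenuates the signal
    return 20 * np.log10(rms(reference) / rms(covered))
```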

