Tamil speech enhancement using non-linear spectral subtraction

In recent years, speech recognition technology has become increasingly common. Speech quality and intelligibility are critical for convenient and accurate information transmission in speech recognition. Speech processing systems used to transmit or store speech are usually designed for environments without background noise. In real-world conditions, however, interference in the form of background noise and channel noise drastically reduces the performance of speech recognition systems, resulting in imprecise information transfer and listener fatigue. Speech enhancement techniques try to improve the performance of communication systems whose input or output signals are affected by noise. To ensure the correctness of text produced from speech, external noise in the speech audio must be reduced. This is difficult because the speech may consist of isolated, continuous, or spontaneous words. Various typical speech enhancement algorithms for automatic speech recognition have gained considerable attention, but they work well only on simple and continuous audio signals. This study therefore proposes a hybridized algorithm to improve speech recognition accuracy. Non-linear spectral subtraction, a well-known speech enhancement algorithm, is optimized with a Hidden Markov Model and tested on 6660 medical speech transcription audio files and 1440 Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS) audio files. The performance of the proposed model is compared with that of typical speech enhancement algorithms, such as the iterative signal enhancement algorithm, subspace-based speech enhancement, and non-linear spectral subtraction. The proposed cascaded hybrid algorithm achieved minimum word error rates of 9.5% and 7.6% for medical speech and RAVDESS speech, respectively.
Cascading the speech enhancement and speech-to-text conversion architectures results in higher accuracy for enhanced speech recognition. The evaluation results support incorporating the proposed method into real-time automatic speech recognition medical applications, where the terminology involved is highly complex.
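
The abstract above reports its results as word error rate (WER). As a minimal sketch of how that metric is commonly computed (word-level edit distance divided by the number of reference words), the following may help; the function name `word_error_rate` and the whitespace tokenization are assumptions for illustration and are not taken from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Hypothetical helper for illustration; tokenization is plain
    whitespace splitting, which the paper does not specify.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)
```

For instance, scoring the hypothesis "the patient has fever" against the reference "the patient has a fever" counts one deletion over five reference words.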


Speech enhancement based on a combined spectral subtraction with spectral estimation in various noise environment

2008 International Conference on Audio, Language and Image Processing
DOI: 10.1109/icalip.2008.4590225
2008
Cited By ~ 1

Author(s): Guangyan Wang, Xia Wang, Xiaoqun Zhao

Keyword(s): Speech Enhancement, Spectral Estimation, Spectral Subtraction, Noise Environment


Speech intelligibility enhancement for Thai-speaking cochlear implant listeners

Indonesian Journal of Electrical Engineering and Computer Science
DOI: 10.11591/ijeecs.v13.i3.pp866-875
2019, Vol 13 (3), pp. 866

Author(s): Siriporn Dachasilaruk, Niphat Jantharamin, Apichai Rungruang

Keyword(s): Cochlear Implant, Speech Enhancement, Speech Intelligibility, English Language, Single Channel, Spectral Subtraction, Monosyllabic Words, Listening Environments, Babble Noise, Vocoded Speech

Cochlear implant (CI) listeners encounter difficulties in communicating with others in noisy listening environments. However, most CI research has been carried out using the English language. In this study, single-channel speech enhancement (SE) strategies as a pre-processing approach for the CI system were investigated in terms of Thai speech intelligibility improvement. Two SE algorithms, namely the multi-band spectral subtraction (MBSS) and Wiener filter (WF) algorithms, were evaluated. Speech signals consisting of monosyllabic and bisyllabic Thai words were degraded by speech-shaped noise and babble noise at SNR levels of 0, 5, and 10 dB. The noisy words were then enhanced using the SE algorithms. The enhanced words were fed into the CI system to synthesize vocoded speech, which was presented to twenty normal-hearing listeners. The results indicated that speech intelligibility was marginally improved by the MBSS algorithm and significantly improved by the WF algorithm in some conditions. The enhanced bisyllabic words showed a noticeably higher intelligibility improvement than the enhanced monosyllabic words in all conditions, particularly in speech-shaped noise. Such outcomes may be beneficial to Thai-speaking CI listeners.
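
The abstract above degrades clean words with noise at SNR levels of 0, 5, and 10 dB. A minimal sketch of one common way to do this is to scale the noise so that the ratio of speech power to noise power matches the target. The function names `mix_at_snr` and `measured_snr_db` are hypothetical, signals are plain lists of samples, and the study's actual mixing procedure is not described in the abstract.

```python
import math


def _mean_power(signal):
    """Average squared amplitude of a sequence of samples."""
    return sum(x * x for x in signal) / len(signal)


def mix_at_snr(speech, noise, snr_db):
    """Return speech plus noise, with the noise rescaled to hit snr_db.

    Assumes both signals have the same length and non-zero power.
    """
    if len(speech) != len(noise):
        raise ValueError("speech and noise must have the same length")
    noise_power = _mean_power(noise)
    if noise_power == 0:
        raise ValueError("noise must have non-zero power")
    # SNR (dB) = 10 * log10(P_speech / P_noise), solved for P_noise.
    target_noise_power = _mean_power(speech) / (10 ** (snr_db / 10))
    scale = math.sqrt(target_noise_power / noise_power)
    return [s + scale * n for s, n in zip(speech, noise)]


def measured_snr_db(speech, noisy):
    """SNR in dB of a mixture, given the clean speech it was built from."""
    residual = [y - s for s, y in zip(speech, noisy)]
    return 10 * math.log10(_mean_power(speech) / _mean_power(residual))
```

Calling `mix_at_snr(speech, noise, 5)` on equal-length sample lists yields a mixture whose SNR, checked with `measured_snr_db`, is 5 dB up to floating-point error. Any noise recording can be passed in; the speech-shaped and babble noise used in the study would need to be supplied separately.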


Speech Enhancement by Spectral Subtraction Method

International Journal of Computer Applications
DOI: 10.5120/16858-6739
2014, Vol 96 (13), pp. 45-48
Cited By ~ 4

Author(s): Kaladharan N

Keyword(s): Speech Enhancement, Spectral Subtraction, Subtraction Method, Spectral Subtraction Method


GUI Based Execution of Discourse Upgrade System

International Journal of Engineering and Advanced Technology - Regular Issue
DOI: 10.35940/ijeat.f1415.0986s319
2019, Vol 8 (6S3), pp. 2151-2155

Keyword(s): Mean Square Error, Speech Enhancement, Minimum Mean Square Error, Spectral Subtraction, Mean Square

Speech, being a key means of communication, has been embedded in various applications. In many unavoidable situations, it is hard to judge the intelligibility of speech, and this is where speech enhancement, i.e. the removal of unwanted background noise, comes into the picture. In this paper, an attempt is made to study speech enhancement methods such as Spectral Subtraction, Minimum Mean Square Error (MMSE), Kalman, and Wiener filters. Based on our observations and analysis of various performance parameters, we conclude which of the techniques is most suitable for speech enhancement. The implementation of the code for the different filters is done using.
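
Two of the methods named above can be illustrated per frequency bin. The following is a minimal sketch, assuming the noisy-speech and noise power spectra of one frame are already available as lists. The function names, the over-subtraction factor `alpha`, the spectral floor `beta`, and the use of `noisy - noise` as the speech power estimate in the Wiener gain are all illustrative assumptions, not details taken from the paper.

```python
def spectral_subtract(noisy_power, noise_power, alpha=1.0, beta=0.01):
    """Power spectral subtraction for one frame.

    alpha scales the noise estimate before subtraction; beta sets a
    floor (as a fraction of the noisy power) so that no bin goes
    negative. Both defaults are arbitrary illustrative values.
    """
    enhanced = []
    for y, n in zip(noisy_power, noise_power):
        subtracted = y - alpha * n
        enhanced.append(max(subtracted, beta * y))
    return enhanced


def wiener_gain(noisy_power, noise_power):
    """Per-bin Wiener gain S / (S + N), with S estimated as max(Y - N, 0)."""
    gains = []
    for y, n in zip(noisy_power, noise_power):
        speech_estimate = max(y - n, 0.0)
        gains.append(speech_estimate / y if y > 0 else 0.0)
    return gains
```

In both functions a bin where the noise estimate exceeds the observed power is suppressed: spectral subtraction falls back to the floor, and the Wiener gain drops to zero. A full enhancer would apply these per frame to a short-time Fourier transform and resynthesize the signal, which is outside the scope of this sketch.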
