An Improved Spectral Subtraction Algorithm Study Based on Voice Human-Computer Interaction in Cockpit

2010 ◽  
Vol 139-141 ◽  
pp. 2154-2157
Author(s):  
Ji Xiang Lu ◽  
Ping Wang ◽  
Long Yi

Voice interaction in the cockpit mainly comprises speech recognition, enhancement and synthesis. The interaction converts speech information into the corresponding commands so that the machines in the cockpit operate correctly, and it also feeds the execution results back to the user through speech output devices or other means. This paper studies speech enhancement technology for voice interaction. We propose an improved spectral subtraction (SS) algorithm based on the auditory masking effect, which applies SS in two steps. Simulation results, evaluated by segmental SNR and compared with traditional SS, show the effectiveness and superiority of the improved algorithm.
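The improved algorithm above builds on classical spectral subtraction and is evaluated by segmental SNR. A minimal single-pass NumPy sketch of both is shown below; the frame size, the 5% spectral floor, and the averaged-noise-spectrum estimate are illustrative assumptions, not the paper's two-step, masking-based design.

```python
import numpy as np

def spectral_subtraction(noisy, noise_est, frame_len=256, hop=128):
    """Basic magnitude spectral subtraction with overlap-add.
    `noise_est` is an estimated noise magnitude spectrum (frame_len//2+1,)."""
    window = np.hanning(frame_len)
    out = np.zeros(len(noisy))
    norm = np.zeros(len(noisy))
    for start in range(0, len(noisy) - frame_len + 1, hop):
        frame = noisy[start:start + frame_len] * window
        spec = np.fft.rfft(frame)
        mag = np.abs(spec)
        # Subtract the noise estimate; keep a small floor to avoid negative magnitudes.
        clean_mag = np.maximum(mag - noise_est, 0.05 * mag)
        enhanced = np.fft.irfft(clean_mag * np.exp(1j * np.angle(spec)), frame_len)
        out[start:start + frame_len] += enhanced * window
        norm[start:start + frame_len] += window ** 2
    return out / np.maximum(norm, 1e-12)

def segmental_snr(clean, enhanced, frame_len=256):
    """Mean per-frame SNR in dB -- the evaluation metric named in the abstract."""
    snrs = []
    for start in range(0, len(clean) - frame_len + 1, frame_len):
        c = clean[start:start + frame_len]
        e = enhanced[start:start + frame_len]
        err = np.sum((c - e) ** 2) + 1e-12
        snrs.append(10 * np.log10(np.sum(c ** 2) / err + 1e-12))
    return float(np.mean(snrs))
```

On stationary noise with a reasonable noise estimate, the enhanced signal's segmental SNR should exceed that of the noisy input, which is how the paper quantifies its improvement.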

2011 ◽  
Vol 267 ◽  
pp. 762-767
Author(s):  
Ji Xiang Lu ◽  
Ping Wang ◽  
Hong Zhong Shi ◽  
Xin Wang

As a primary research area of multimodal human-computer interaction, voice interaction mainly involves the extraction and identification of natural speech signals, where extraction provides the reliable signal sources that identification then analyzes. This paper studies multichannel speech enhancement technology for voice interaction. Simulation results show the effectiveness and superiority of the improved algorithm proposed in the paper.


2014 ◽  
Vol 543-547 ◽  
pp. 2784-2787
Author(s):  
Ying Ma ◽  
Xiao Hua Zhang ◽  
Bing Lei Xing

Interference is inevitable in voice communication: noise from the surrounding environment and the transmission medium, electronic noise from communication equipment, and other speakers all corrupt the signal, so the receiver obtains a speech signal polluted by noise. Because traditional spectral subtraction leaves strong residual musical noise, the traditional method is improved by adding weighted processing and a power-spectrum correction to the subtraction. Analysis of simulations on collected real speech data shows that the improved spectral subtraction effectively reduces musical noise and satisfies the requirements of speech enhancement.
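Power-spectrum corrections of the kind described above are commonly realized as over-subtraction with a spectral floor, which trades a little extra attenuation for far less musical noise. A sketch follows; the values of `alpha` and `beta` are illustrative, not the paper's.

```python
import numpy as np

def oversubtract_power(noisy_power, noise_power, alpha=4.0, beta=0.01):
    """Power-spectral subtraction with over-subtraction factor `alpha`
    and spectral floor `beta`. Over-subtracting removes the random noise
    peaks that would otherwise survive as isolated tones (musical noise),
    while the floor prevents bins from being zeroed out entirely."""
    residual = noisy_power - alpha * noise_power
    floor = beta * noisy_power
    return np.where(residual > floor, residual, floor)
```

Larger `alpha` suppresses more residual noise at the cost of speech distortion; the floor `beta` masks the remaining isolated peaks under a low broadband residue, which listeners find less annoying.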


2013 ◽  
Vol 2013 ◽  
pp. 1-7
Author(s):  
Chabane Boubakir ◽  
Daoud Berkani

This paper describes a new speech enhancement approach which employs the minimum mean square error (MMSE) estimator based on the generalized gamma distribution of the short-time spectral amplitude (STSA) of a speech signal. In the proposed approach, the human perceptual auditory masking effect is incorporated into the speech enhancement system. The algorithm is based on a criterion by which the audible noise may be masked rather than being attenuated, thereby reducing the chance of speech distortion. Performance assessment is given to show that our proposal can achieve a more significant noise reduction as compared to the perceptual modification of Wiener filtering and the gamma based MMSE estimator.
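The masking criterion described above -- leave noise alone where it is inaudible, attenuate it only where it rises above the masking threshold -- can be illustrated with a toy per-bin gain rule. This is only an illustration of the criterion, not the paper's MMSE estimator; the `max_atten` bound is an assumption.

```python
import numpy as np

def masked_gain(noise_power, mask_threshold, max_atten=0.1):
    """Per-bin gain: unity where the estimated noise lies below the
    masking threshold (it is inaudible anyway), otherwise attenuate
    the bin toward the threshold, bounded below by `max_atten`."""
    gain = np.ones_like(noise_power, dtype=float)
    audible = noise_power > mask_threshold
    gain[audible] = np.maximum(
        max_atten, np.sqrt(mask_threshold[audible] / noise_power[audible]))
    return gain
```

Because bins with inaudible noise keep unity gain, the speech in those bins passes through untouched, which is exactly how the masking criterion reduces the chance of speech distortion.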


2012 ◽  
Vol 433-440 ◽  
pp. 5620-5627 ◽  
Author(s):  
Feng Yu Zhou ◽  
Jin Huan Li ◽  
Guo Hui Tian ◽  
Bao Ye Song ◽  
Cai Hong Li

This paper presents the design and implementation of a voice interaction system based on ARM for the intelligent space of service robots. An ARM Cortex-M3 based STM32F103 serves as the main controller, and the real-time embedded operating system μC/OS-II schedules the tasks and manages the peripheral devices. An LD3320 and an XFS4041CN are used for speech recognition and speech synthesis, respectively. A dialogue set is established as a two-dimensional array, and the intelligent space updates the dialogue set dynamically over a ZigBee wireless network for multiple scenes. When the speech recognition module produces a recognition result, the dialogue management module sends the corresponding text to the speech synthesis module, enabling text-to-speech output; at the same time, the intelligent space decision support system sends the corresponding commands to the equipment via ZigBee. Extensive experiments and practical applications show that the voice interaction system designed in this paper satisfies the current requirements of voice interaction in the intelligent space of service robots and has great application value.
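The dialogue-management step above amounts to a table lookup: the current scene and the recognized phrase index into the dialogue set, yielding the reply text for synthesis and any device command. A minimal sketch, with all scene names, phrases, and commands being illustrative placeholders rather than entries from the paper:

```python
# Dialogue set keyed by (scene, recognized phrase) -> (reply text, device command).
# In the paper this is a two-dimensional array updated dynamically over ZigBee;
# here it is a static table for illustration only.
DIALOGUE_SET = {
    ("living_room", "turn on the light"): ("The light is on.", "CMD_LIGHT_ON"),
    ("living_room", "what time is it"): ("It is three o'clock.", None),
}

def handle_recognition(scene, phrase):
    """Look up the reply for TTS and the command to forward, with a fallback."""
    return DIALOGUE_SET.get((scene, phrase), ("Sorry, I did not understand.", None))
```

Keeping the table data-driven is what lets the intelligent space swap in a new dialogue set per scene without changing the controller firmware.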


2011 ◽  
Vol 36 (3) ◽  
pp. 519-532 ◽  
Author(s):  
Zhi Tao ◽  
He-Ming Zhao ◽  
Xiao-Jun Zhang ◽  
Di Wu

Abstract This paper proposes a speech enhancement method using the multi-scales and multi-thresholds of the auditory perception wavelet transform, which is suitable for a low SNR (signal to noise ratio) environment. This method achieves noise reduction by threshold processing, driven by the human ear's auditory masking effect, of the auditory perception wavelet transform coefficients of a speech signal. At the same time, in order to prevent high-frequency loss during noise suppression, we first make a voicing decision on the speech signal, and then process the unvoiced and voiced segments with different thresholds and different judgments. Lastly, we perform objective and subjective tests on the enhanced speech. The results show that, compared with spectral subtraction methods, our method keeps the unvoiced components intact while suppressing the residual noise and the background noise. Thus, the enhanced speech has better clarity and intelligibility.
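The mechanism underlying the method above is wavelet-coefficient thresholding. A one-level Haar example in NumPy shows the idea; the paper's auditory-perception wavelet, multiple scales, and masking-derived per-segment thresholds are all richer than this sketch.

```python
import numpy as np

def haar_dwt(x):
    """One-level orthonormal Haar transform of an even-length signal."""
    x = np.asarray(x, dtype=float)
    approx = (x[0::2] + x[1::2]) / np.sqrt(2)
    detail = (x[0::2] - x[1::2]) / np.sqrt(2)
    return approx, detail

def haar_idwt(approx, detail):
    """Inverse of haar_dwt (perfect reconstruction)."""
    out = np.empty(2 * len(approx))
    out[0::2] = (approx + detail) / np.sqrt(2)
    out[1::2] = (approx - detail) / np.sqrt(2)
    return out

def soft_threshold(c, t):
    """Shrink coefficients toward zero; small (noise-like) ones vanish."""
    return np.sign(c) * np.maximum(np.abs(c) - t, 0.0)

def denoise(x, threshold):
    """Denoise by soft-thresholding the detail coefficients only."""
    a, d = haar_dwt(x)
    return haar_idwt(a, soft_threshold(d, threshold))
```

With a threshold of zero the transform reconstructs the input exactly; raising the threshold removes low-amplitude detail coefficients, which is where broadband noise concentrates.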


2010 ◽  
Vol 19 (02) ◽  
pp. 159-173 ◽  
Author(s):  
IOSIF MPORAS ◽  
TODOR GANCHEV ◽  
OTILIA KOCSIS ◽  
NIKOS FAKOTAKIS

In the present work, we investigate the performance of a number of traditional and recent speech enhancement algorithms in the adverse non-stationary conditions, which are distinctive for motorcycles on the move. The performance of these algorithms is ranked in terms of the improvement they contribute to the speech recognition accuracy, when compared to the baseline performance, i.e. without speech enhancement. The experiments on the MoveOn motorcycle speech and noise database indicated that there is no equivalence between the ranking of algorithms based on the human perception of speech quality and the speech recognition performance. The Multi-band spectral subtraction method was observed to lead to the highest speech recognition performance.
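The best-performing method in the study above, multi-band spectral subtraction, splits the spectrum into bands and subtracts more aggressively where the band SNR is low. A sketch of that idea follows; the uniform band edges and the `alpha` schedule are illustrative assumptions, not the evaluated implementation.

```python
import numpy as np

def multiband_subtract(noisy_power, noise_power, n_bands=4):
    """Per-band power spectral subtraction: low-SNR bands get a larger
    over-subtraction factor, high-SNR bands are treated gently."""
    out = np.empty_like(noisy_power, dtype=float)
    edges = np.linspace(0, len(noisy_power), n_bands + 1).astype(int)
    for lo, hi in zip(edges[:-1], edges[1:]):
        snr_db = 10 * np.log10(
            np.sum(noisy_power[lo:hi]) / (np.sum(noise_power[lo:hi]) + 1e-12) + 1e-12)
        alpha = np.clip(4.0 - 0.15 * snr_db, 1.0, 5.0)  # more subtraction at low SNR
        band = noisy_power[lo:hi] - alpha * noise_power[lo:hi]
        out[lo:hi] = np.maximum(band, 0.01 * noisy_power[lo:hi])
    return out
```

Adapting the subtraction per band suits the non-stationary, coloured noise of a moving motorcycle better than a single full-band factor, which is consistent with its top ranking for recognition accuracy.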


Indoor and outdoor navigation is a tough task for a visually impaired person, who would most of the time require human assistance. Existing solutions to this problem take the form of smart canes and wearables. Both use sensors such as on-board proximity and obstacle detection, together with a haptic or auditory feedback system, to warn the user of stationary or incoming obstacles so that they do not collide with them as they move. This approach has many drawbacks: it is not yet a stand-alone device reliable enough for the user to trust when navigating, and when triggered frequently in crowded areas the feedback system confuses the user with too many alerts, causing the actual information to be lost. Our goal here is to create a personalized assistant which the user can interact with naturally by voice, mimicking the aid of an actual human assistant while they are on the move. It works by using an object detection module, trained to high accuracy, to detect the boundaries of moving objects in each frame; once a bounding box exceeds the confidence threshold, the object in the box is recognized and the information is passed to the system core, which verifies whether the information needs to be passed on to the user and, if so, sends the converted speech information to the voice interaction model. The voice interaction model is consent-based: it accepts and responds to navigation queries from the user and intelligently informs them about obstacles that must be avoided. This ensures that only essential information, in the form of voice responses, reaches the user, who can use it to navigate and can also interact with the assistant for more information.
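The "verify whether the information needs to be passed on" step above can be sketched as a simple filter: keep only confident detections, and announce each object class at most once per cooldown window so the user is not flooded with alerts. The threshold, cooldown, and class labels are illustrative assumptions, not details from the work.

```python
def select_announcements(detections, last_announced, now,
                         conf_threshold=0.8, cooldown=5.0):
    """detections: list of (label, confidence) from the detector for one frame.
    last_announced: dict mapping label -> time it was last spoken (mutated).
    Returns the labels to forward to the voice interaction model."""
    to_announce = []
    for label, conf in detections:
        if conf < conf_threshold:
            continue  # not confident enough to bother the user
        if now - last_announced.get(label, float("-inf")) < cooldown:
            continue  # announced this object class too recently
        last_announced[label] = now
        to_announce.append(label)
    return to_announce
```

Rate-limiting per class rather than per detection is one way to address the "too many requests" failure mode the abstract attributes to existing feedback systems.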


Sensors ◽  
2021 ◽  
Vol 21 (21) ◽  
pp. 7025
Author(s):  
Jenifa Gnanamanickam ◽  
Yuvaraj Natarajan ◽  
Sri Preethaa K. R.

In recent years, speech recognition technology has become a more common notion. Speech quality and intelligibility are critical for the convenience and accuracy of information transmission in speech recognition. The speech processing systems used to converse or store speech are usually designed for an environment without any background noise. However, in a real-world atmosphere, background intervention in the form of background noise and channel noise drastically reduces the performance of speech recognition systems, resulting in imprecise information transfer and exhausting the listener. When communication systems’ input or output signals are affected by noise, speech enhancement techniques try to improve their performance. To ensure the correctness of the text produced from speech, it is necessary to reduce the external noises involved in the speech audio. Reducing the external noise in audio is difficult as the speech can be of single, continuous or spontaneous words. In automatic speech recognition, there are various typical speech enhancement algorithms available that have gained considerable attention. However, these enhancement algorithms work well in simple and continuous audio signals only. Thus, in this study, a hybridized speech recognition algorithm to enhance the speech recognition accuracy is proposed. Non-linear spectral subtraction, a well-known speech enhancement algorithm, is optimized with the Hidden Markov Model and tested with 6660 medical speech transcription audio files and 1440 Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS) audio files. The performance of the proposed model is compared with those of various typical speech enhancement algorithms, such as iterative signal enhancement algorithm, subspace-based speech enhancement, and non-linear spectral subtraction. The proposed cascaded hybrid algorithm was found to achieve a minimum word error rate of 9.5% and 7.6% for medical speech and RAVDESS speech, respectively. 
The cascading of the speech enhancement and speech-to-text conversion architectures results in higher accuracy for enhanced speech recognition. The evaluation results confirm the suitability of the proposed method for real-time automatic speech recognition in medical applications, where the complexity of the terms involved is high.
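The word error rates reported above are the standard Levenshtein-based metric: substitutions, deletions, and insertions divided by the reference length. For concreteness, a self-contained implementation:

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference word count,
    computed by dynamic-programming edit distance over word sequences."""
    ref, hyp = reference.split(), hypothesis.split()
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # delete all reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insert all hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost)  # match / substitution
    return d[len(ref)][len(hyp)] / len(ref)
```

For example, dropping one word from a five-word reference yields a WER of 0.2, the same scale on which the paper's 9.5% and 7.6% figures are reported.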

