Voice-Controlled In-Vehicle Systems: Effects of Voice-Recognition Accuracy in the Presence of Background Noise

Author(s):  
Neil Sokol ◽  
Huei-Yen Winnie Chen ◽  
Birsen Donmez


Author(s):  
Song Li ◽  
Mustafa Ozkan Yerebakan ◽  
Yue Luo ◽  
Ben Amaba ◽  
William Swope ◽  
...  

Abstract Voice recognition has become an integral part of daily life, commonly used in call centers and virtual assistants, and it is increasingly being applied in industrial settings. Each of these use cases has unique characteristics that may affect the effectiveness of voice recognition and, in turn, industrial productivity, performance, or even safety. One of the most prominent characteristics is the background noise that dominates each industry, driven primarily by the machinery present and the layout of the workspace. Another is the type of communication used in these settings: everyday communication often involves longer sentences uttered under relatively quiet conditions, whereas communication in industrial settings is typically short and conducted in loud conditions. In this study, we demonstrate the importance of accounting for these two elements by comparing the performance of two voice recognition algorithms under several background noise conditions: a regular Convolutional Neural Network (CNN) based voice recognition algorithm and an Automatic Speech Recognition (ASR) model with a denoising module. Our results indicate a significant performance drop between the typical background noise condition (white noise) and the other background noises. In addition, our custom ASR model with the denoising module outperformed the CNN-based model, with an overall performance increase of 14-35% across all background noises. Both results indicate that specialized voice recognition algorithms need to be developed for these environments before they can be reliably deployed as control mechanisms.
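As a rough illustration of this kind of evaluation (not the authors' code), the Python sketch below mixes a background-noise recording into short command utterances at a chosen SNR and scores a recognizer by exact command match; `recognize_cnn` and `recognize_asr_denoised` are hypothetical stand-ins for the two models compared above.

```python
# Minimal sketch: mixing background noise into short command utterances at a fixed SNR
# and scoring two hypothetical recognizers by exact command match.
import numpy as np

def mix_at_snr(speech: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Scale `noise` so the speech-to-noise power ratio equals `snr_db`, then mix."""
    noise = np.resize(noise, speech.shape)          # loop/trim noise to utterance length
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2) + 1e-12
    scale = np.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return speech + scale * noise

def command_accuracy(recognizer, utterances, labels, noise, snr_db):
    """Fraction of short commands recognized exactly (word error rate is unnecessary for one-word commands)."""
    hits = 0
    for wav, label in zip(utterances, labels):
        noisy = mix_at_snr(wav, noise, snr_db)
        hits += int(recognizer(noisy).strip().lower() == label)
    return hits / len(labels)

# Usage (recognize_cnn / recognize_asr_denoised are placeholders for the two models):
# acc_cnn = command_accuracy(recognize_cnn, utterances, labels, factory_noise, snr_db=0)
# acc_asr = command_accuracy(recognize_asr_denoised, utterances, labels, factory_noise, snr_db=0)
```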


2020 ◽  
Vol 14 ◽  
Author(s):  
Stephanie Haro ◽  
Christopher J. Smalt ◽  
Gregory A. Ciccarelli ◽  
Thomas F. Quatieri

Many individuals struggle to understand speech in listening scenarios that include reverberation and background noise. An individual's ability to understand speech arises from a combination of peripheral auditory function, central auditory function, and general cognitive abilities. The interaction of these factors complicates the prescription of treatment or therapy to improve hearing function. Damage to the auditory periphery can be studied in animals; however, this method alone is not enough to understand the impact of hearing loss on speech perception. Computational auditory models bridge the gap between animal studies and human speech perception. Perturbations to the modeled auditory systems can permit mechanism-based investigations into observed human behavior. In this study, we propose a computational model that accounts for the complex interactions between different hearing damage mechanisms and simulates human speech-in-noise perception. The model performs a digit classification task as a human would, with only acoustic sound pressure as input. Thus, we can use the model's performance as a proxy for human performance. This two-stage model consists of a biophysical cochlear-nerve spike generator followed by a deep neural network (DNN) classifier. We hypothesize that sudden damage to the periphery affects speech perception and that central nervous system adaptation over time may compensate for peripheral hearing damage. Our model achieved human-like performance across signal-to-noise ratios (SNRs) under normal-hearing (NH) cochlear settings, achieving 50% digit recognition accuracy at −20.7 dB SNR. Results were comparable to eight NH participants on the same task who achieved 50% behavioral performance at −22 dB SNR. We also simulated medial olivocochlear reflex (MOCR) and auditory nerve fiber (ANF) loss, which worsened digit-recognition accuracy at lower SNRs compared to higher SNRs. Our simulated performance following ANF loss is consistent with the hypothesis that cochlear synaptopathy impacts communication in background noise more so than in quiet. Following the insult of various cochlear degradations, we implemented extreme and conservative adaptation through the DNN. At the lowest SNRs (<0 dB), both adapted models were unable to fully recover NH performance, even with hundreds of thousands of training samples. This implies a limit on performance recovery following peripheral damage in our human-inspired DNN architecture.
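The 50% point reported above can be read off an accuracy-versus-SNR sweep; a minimal sketch of that interpolation step is shown below, with made-up accuracy values for illustration (this is not the authors' two-stage model, only the threshold estimation).

```python
# Minimal sketch: estimating the SNR at which digit-recognition accuracy crosses 50%,
# by linear interpolation over an SNR sweep.
import numpy as np

def snr_at_threshold(snrs_db, accuracies, threshold=0.5):
    """Interpolate the SNR where accuracy first reaches `threshold` (SNRs ascending)."""
    snrs_db = np.asarray(snrs_db, dtype=float)
    accuracies = np.asarray(accuracies, dtype=float)
    for lo in range(len(snrs_db) - 1):
        a0, a1 = accuracies[lo], accuracies[lo + 1]
        if a0 < threshold <= a1:
            frac = (threshold - a0) / (a1 - a0)
            return snrs_db[lo] + frac * (snrs_db[lo + 1] - snrs_db[lo])
    return None  # threshold never crossed in the measured range

# Illustrative (made-up) sweep: accuracy rises with SNR; the paper reports -20.7 dB for its model.
print(snr_at_threshold([-30, -25, -20, -15, -10], [0.12, 0.31, 0.54, 0.82, 0.95]))
```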


Perception ◽  
1983 ◽  
Vol 12 (2) ◽  
pp. 223-226 ◽  
Author(s):  
Ray Bull ◽  
Harriet Rathborn ◽  
Brian R Clifford

A research programme has been carried out that concerns the accuracy with which listeners can identify a speaker heard once before. The present study examined the voice-recognition abilities of blind listeners, and it was found that they could more accurately select target voices from the test arrays than could sighted people. However, the degree of blindness, the age at onset of blindness, and the number of years of blindness all failed to relate to voice-recognition accuracy.


2018 ◽  
Vol 5 (2) ◽  
pp. 83-98
Author(s):  
Tesar Kurniawan ◽  
Nursin Nursin ◽  
Muhamad Amin Bakrie ◽  
Seta Samsiana

App Inventor is a software development platform for Android that makes it easy for developers to realize their ideas, one of which is an application that controls home electrical appliances by voice through a smartphone. Google Speech is used for voice recognition, and its output is passed to an Arduino that controls the activation of home electrical appliances such as lamps, aquarium pump motors, fans, door locks, and servo motors, with relays used as drivers. The testing reported here covers the recognition accuracy of Google Speech and the range of the Bluetooth connection. Of the three languages tested, Google Speech was most accurate for Indonesian, followed by Javanese and then Sundanese, while the Bluetooth connection could be operated up to a maximum distance of 20 m in free space and 13 m with obstructions.
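A minimal sketch of the host-side control logic, assuming a simple single-character command protocol over a Bluetooth serial port (the paper does not specify its protocol; the phrase-to-command mapping and port name below are hypothetical):

```python
# Minimal sketch (assumed command protocol, not the paper's code): mapping recognized
# phrases to single-character relay commands sent to an Arduino over a Bluetooth serial port.
import serial  # pyserial

# Hypothetical mapping from recognized text (e.g. from Google Speech) to relay commands.
COMMANDS = {
    "lamp on": b"A", "lamp off": b"a",
    "fan on": b"B", "fan off": b"b",
    "door lock": b"C", "door unlock": b"c",
}

def send_command(recognized_text: str, port: str = "/dev/rfcomm0", baud: int = 9600) -> bool:
    """Send the relay command for a recognized phrase; return False if the phrase is unknown."""
    code = COMMANDS.get(recognized_text.strip().lower())
    if code is None:
        return False
    with serial.Serial(port, baud, timeout=1) as link:
        link.write(code)
    return True
```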


Fractals ◽  
1997 ◽  
Vol 05 (supp01) ◽  
pp. 165-172 ◽  
Author(s):  
G. van de Wouwer ◽  
P. Scheunders ◽  
D. van Dyck ◽  
M. de Bodt ◽  
F. Wuyts ◽  
...  

The performance of a pattern recognition technique is usually determined by the ability to extract useful features from the available data so as to effectively characterize and discriminate between patterns. We describe a novel method for feature extraction from speech signals. For this purpose, we generate spectrograms, which are time-frequency representations of the original signal. We show that, by considering this spectrogram as a textured image, a wavelet transform can be applied to generate useful features for recognizing the speech signal. This method is used for the classification of voice dysphonia. Its performance is compared with another technique taken from the literature. A recognition accuracy of 98% is achieved for the classification between normal and dysphonic voices.
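A minimal sketch of this feature-extraction idea, assuming a log-power spectrogram and per-subband energies (the specific wavelet, decomposition level, and spectrogram settings below are illustrative, not taken from the paper):

```python
# Minimal sketch: treat the log-spectrogram as a textured image and use
# 2-D wavelet subband energies as features for a voice classifier.
import numpy as np
import pywt
from scipy.signal import spectrogram

def wavelet_texture_features(signal: np.ndarray, fs: int, wavelet: str = "db2", level: int = 3):
    """Return energies of the 2-D wavelet subbands of the log-spectrogram."""
    _, _, sxx = spectrogram(signal, fs)
    image = np.log(sxx + 1e-10)                      # log power spectrogram as a texture image
    coeffs = pywt.wavedec2(image, wavelet, level=level)
    feats = [np.mean(coeffs[0] ** 2)]                # approximation-band energy
    for detail_h, detail_v, detail_d in coeffs[1:]:  # detail bands at each scale
        feats.extend([np.mean(detail_h ** 2), np.mean(detail_v ** 2), np.mean(detail_d ** 2)])
    return np.asarray(feats)                         # feed to any standard classifier
```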


1982 ◽  
Vol 26 (3) ◽  
pp. 217-217
Author(s):  
Gary K. Poock ◽  
Norman D. Schwalm ◽  
Ellen F. Roland

The purpose of this study was to investigate the extent to which off-the-shelf voice recognition equipment can perform as a speaker-independent system. Here, "speaker independence" means the ability of the system to be used both by a larger group of individuals than the one that trained it (including the trainers themselves) and by individuals who did not train it at all. Several independent groups of five subjects trained a Threshold T600 voice recognition unit, each subject training the same 50 utterances for a total of 250 utterances. Later, a number of testing trials were conducted in which subjects tested the system using 1) the 50 utterances that they trained, 2) the utterances that they trained plus the utterances trained by the other four subjects, and 3) the utterances trained only by the other four subjects. The measures of recognition accuracy were percent non-recognitions and percent misrecognitions. Results will be discussed in light of additional applications for voice recognition if currently available systems prove able to function well in a speaker-independent mode.
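The two accuracy measures can be computed directly from the recognizer's outputs; a minimal, purely illustrative sketch (not the study's scoring code) is shown below.

```python
# Minimal sketch: percent non-recognitions (no output returned) and
# percent misrecognitions (wrong utterance returned) over a test set.
def recognition_scores(outputs, targets):
    """`outputs` are recognizer results, with None where the unit rejects the utterance."""
    n = len(targets)
    non_recognitions = sum(1 for out in outputs if out is None)
    misrecognitions = sum(1 for out, ref in zip(outputs, targets) if out is not None and out != ref)
    return 100.0 * non_recognitions / n, 100.0 * misrecognitions / n

# Example: 2 of 5 test utterances rejected, 1 misrecognized -> (40.0, 20.0)
print(recognition_scores(["stop", None, "left", None, "up"],
                         ["stop", "go", "right", "down", "up"]))
```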


1981 ◽  
Vol 5 (2-3) ◽  
pp. 201-208 ◽  
Author(s):  
Brian R. Clifford ◽  
Harriet Rathborn ◽  
Ray Bull

1989 ◽  
Vol 69 (2) ◽  
pp. 591-594
Author(s):  
Machiko Sannomiya

The present study examined three factors, and their interactions, that make a voice-recognition system feel inconvenient as a device for sending information: sending unit, recognition accuracy per unit, and response time per unit. The main results were as follows: (1) All three factors influenced feelings of the inconvenience of a voice recognition system. (2) Sending tasks even with the smallest unit, such as a monosyllable, do not cause feelings of inconvenience when recognition accuracy is high (95% or 100%) and the response occurs almost in real time. (3) Recognition accuracy and response time cannot compensate for each other to reduce feelings of inconvenience.


1992 ◽  
Author(s):  
Lori R. Van Wallendael ◽  
Amy Surace ◽  
Melissa Brown ◽  
Debbie Hall
