THE EFFECT OF DIFFERENT OCCUPATIONAL BACKGROUND NOISES ON VOICE RECOGNITION ACCURACY

Author(s): Song Li, Mustafa Ozkan Yerebakan, Yue Luo, Ben Amaba, William Swope, et al.

Abstract Voice recognition has become an integral part of our lives, commonly used in call centers and as part of virtual assistants. However, voice recognition is increasingly applied to more industrial uses. Each of these use cases has unique characteristics that may impact the effectiveness of voice recognition, which could in turn impact industrial productivity, performance, or even safety. One of the most prominent among them is the unique background noise that dominates each industry, driven primarily by differences in machinery and work layouts. Another important characteristic is the type of communication present in these settings. Daily communication often involves longer sentences uttered under relatively silent conditions, whereas communication in industrial settings is often short and conducted in loud conditions. In this study, we demonstrated the importance of taking these two elements into account by comparing the performance of two voice recognition algorithms under several background noise conditions: a conventional Convolutional Neural Network (CNN) based voice recognition algorithm and an Automatic Speech Recognition (ASR) based model with a denoising module. Our results indicate a significant performance drop between the typical background noise used in testing (white noise) and the remaining background noises. In addition, our custom ASR model with the denoising module outperformed the CNN based model, with an overall performance increase of 14-35% across all background noises. Both results demonstrate that specialized voice recognition algorithms need to be developed for these environments before they can be reliably deployed as control mechanisms.
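The background-noise comparison described above hinges on mixing noise into clean speech at a controlled signal-to-noise ratio. A minimal sketch of that mixing step (the function name and the 1-D equal-length array assumption are illustrative, not from the paper):

```python
import numpy as np

def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so the speech-to-noise power ratio equals `snr_db`,
    then add it to `speech`. Both inputs are 1-D float arrays of equal
    length; power is measured as the mean squared sample value."""
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    # Target noise power that yields the requested SNR in decibels.
    target_p_noise = p_speech / (10 ** (snr_db / 10))
    scaled_noise = noise * np.sqrt(target_p_noise / p_noise)
    return speech + scaled_noise
```

Sweeping `snr_db` over the same clean utterances with different occupational noise recordings gives the kind of per-noise accuracy comparison the abstract reports.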

Complexity, 2019, Vol 2019, pp. 1-16. Author(s): Feng-Ping An

Pedestrian re-recognition is an important research area because it affects applications such as intelligent monitoring, content-based video retrieval, and human-computer interaction. It can support relay tracking and criminal suspect detection in large-scale video surveillance systems. Although traditional pedestrian re-recognition methods have been widely applied to practical problems, they have deficiencies such as low recognition accuracy, inefficient computation, and difficulty adapting to specific applications. In recent years, pedestrian re-recognition algorithms based on deep learning have been widely used in this field because of their strong adaptive ability and high recognition accuracy; deep learning models provide a technical approach for pedestrian re-recognition tasks through their powerful learning ability. However, deep learning-based pedestrian re-recognition also has the following problems. First, existing deep learning methods lack memory and prediction mechanisms and offer only limited improvement to re-recognition accuracy. Second, they exhibit overfitting. Finally, initializing the existing LSTM parameters is problematic. In view of this, this paper introduces a revertive connection into the pedestrian re-recognition detector, making it more similar to the human cognitive process by converting a single image into an image sequence; the memorized image-sequence pattern is then used to re-identify the pedestrian. This approach endows deep learning-based pedestrian re-recognition algorithms with the ability to memorize image-sequence patterns and re-identify pedestrians in images. At the same time, this paper proposes a selective dropout method based on shallow learning.
Selective dropout uses the classifier obtained through shallow learning to modify the probability that a node weight in the hidden layer is set to 0, thereby mitigating the overfitting of the deep learning model. In addition, this paper proposes a greedy layer-by-layer pretraining algorithm for initializing the LSTM, which yields better generalization performance. Based on the above, this paper proposes a pedestrian re-recognition algorithm built on an optimized LSTM deep learning sequence-memory model. Experiments show that the proposed method not only has strong adaptive ability but also achieves high average accuracy. It also demonstrates a significant improvement over other mainstream methods because it better memorizes and learns the continuous motion of pedestrians and effectively avoids the overfitting and parameter-initialization problems of the deep learning model. This proposal provides a technical method and approach for adaptive pedestrian re-recognition algorithms.
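The selective-dropout idea — using a shallow classifier to modify each hidden node's probability of being zeroed — can be sketched as follows. This is an illustrative interpretation, not the paper's implementation; the `importance` scores and the linear scaling rule are assumptions:

```python
import numpy as np

def selective_dropout_mask(importance, base_rate=0.5, rng=None):
    """Per-node dropout mask: nodes the shallow classifier rates as more
    important are dropped less often. `importance` is a 1-D array of
    non-negative scores (e.g. absolute classifier weights); `base_rate`
    is the drop probability for a node with zero importance."""
    rng = rng or np.random.default_rng()
    # Normalize importance to [0, 1], then shrink each node's drop
    # probability in proportion to its normalized score.
    norm = importance / (importance.max() + 1e-12)
    drop_prob = base_rate * (1.0 - norm)
    # Keep a node (mask value 1.0) when its random draw clears drop_prob.
    return (rng.random(importance.shape) >= drop_prob).astype(float)
```

Multiplying a hidden layer's activations by this mask during training drops uninformative nodes more aggressively, which is the anti-overfitting effect the abstract describes.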


2021, pp. 1-13. Author(s): Shikhar Tyagi, Bhavya Chawla, Rupav Jain, Smriti Srivastava

Single biometric modalities such as facial features and vein patterns, despite being reliable characteristics, show limitations that restrict them from offering high performance and robustness. Multimodal biometric systems have gained interest due to their ability to overcome the inherent limitations of the underlying single modalities, and they have generally been shown to improve overall performance for identification and recognition purposes. This paper proposes highly accurate and robust multimodal biometric identification and recognition systems based on fusion of face and finger vein modalities. Feature extraction for both face and finger vein is carried out with deep convolutional neural networks. The fusion process combines the extracted relevant features from the two modalities at score level. The experimental results over all considered public databases show a significant improvement in identification and recognition accuracy as well as equal error rates.
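Score-level fusion of the kind described here typically normalizes each modality's match scores before combining them. A minimal sketch (the min-max normalization and the weighted sum are common choices, not necessarily the paper's exact scheme):

```python
import numpy as np

def fuse_scores(face_scores, vein_scores, w_face=0.5):
    """Score-level fusion: min-max normalize each modality's match scores
    against the enrolled gallery, then combine them with a weighted sum.
    Both inputs are one score per enrolled identity."""
    def minmax(s):
        s = np.asarray(s, dtype=float)
        return (s - s.min()) / (s.max() - s.min() + 1e-12)
    return w_face * minmax(face_scores) + (1 - w_face) * minmax(vein_scores)
```

Identification then reduces to taking the argmax of the fused score vector; `w_face` would normally be tuned on a validation set.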


2020, Vol 14. Author(s): Stephanie Haro, Christopher J. Smalt, Gregory A. Ciccarelli, Thomas F. Quatieri

Many individuals struggle to understand speech in listening scenarios that include reverberation and background noise. An individual's ability to understand speech arises from a combination of peripheral auditory function, central auditory function, and general cognitive abilities. The interaction of these factors complicates the prescription of treatment or therapy to improve hearing function. Damage to the auditory periphery can be studied in animals; however, this method alone is not enough to understand the impact of hearing loss on speech perception. Computational auditory models bridge the gap between animal studies and human speech perception. Perturbations to the modeled auditory systems can permit mechanism-based investigations into observed human behavior. In this study, we propose a computational model that accounts for the complex interactions between different hearing damage mechanisms and simulates human speech-in-noise perception. The model performs a digit classification task as a human would, with only acoustic sound pressure as input. Thus, we can use the model's performance as a proxy for human performance. This two-stage model consists of a biophysical cochlear-nerve spike generator followed by a deep neural network (DNN) classifier. We hypothesize that sudden damage to the periphery affects speech perception and that central nervous system adaptation over time may compensate for peripheral hearing damage. Our model achieved human-like performance across signal-to-noise ratios (SNRs) under normal-hearing (NH) cochlear settings, achieving 50% digit recognition accuracy at −20.7 dB SNR. Results were comparable to eight NH participants on the same task who achieved 50% behavioral performance at −22 dB SNR. We also simulated medial olivocochlear reflex (MOCR) and auditory nerve fiber (ANF) loss, which worsened digit-recognition accuracy at lower SNRs compared to higher SNRs. 
Our simulated performance following ANF loss is consistent with the hypothesis that cochlear synaptopathy impacts communication in background noise more so than in quiet. Following the insult of various cochlear degradations, we implemented extreme and conservative adaptation through the DNN. At the lowest SNRs (<0 dB), both adapted models were unable to fully recover NH performance, even with hundreds of thousands of training samples. This implies a limit on performance recovery following peripheral damage in our human-inspired DNN architecture.
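The 50% thresholds reported above (−20.7 dB SNR for the model, −22 dB for listeners) are points on a psychometric curve; given measured accuracies at several SNRs, such a threshold can be read off by interpolation. A minimal sketch, assuming accuracy increases monotonically with SNR (the function name is illustrative):

```python
import numpy as np

def snr_at_threshold(snrs, accuracies, threshold=0.5):
    """Linearly interpolate the SNR (dB) at which recognition accuracy
    crosses `threshold`. `snrs` must be increasing and `accuracies`
    monotonically non-decreasing along with them."""
    return float(np.interp(threshold, accuracies, snrs))
```

Comparing this threshold between normal-hearing and damaged-periphery model configurations gives a single-number summary of the simulated hearing loss.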


2019, Vol 32 (2), pp. 87-109. Author(s): Galit Buchs, Benedetta Heimler, Amir Amedi

Abstract Visual-to-auditory Sensory Substitution Devices (SSDs) are a family of non-invasive devices for visual rehabilitation aiming at conveying whole-scene visual information through the intact auditory modality. Although proven effective in lab environments, the use of SSDs has yet to be systematically tested in real-life situations. To start filling this gap, in the present work we tested the ability of expert SSD users to filter out irrelevant background noise while focusing on the relevant audio information. Specifically, nine blind expert users of the EyeMusic visual-to-auditory SSD performed a series of identification tasks via SSDs (i.e., shape, color, and conjunction of the two features). Their performance was compared in two separate conditions: silent baseline, and with irrelevant background sounds from real-life situations, using the same stimuli in a pseudo-random balanced design. Although the participants described the background noise as disturbing, no significant performance differences emerged between the two conditions (i.e., noisy; silent) for any of the tasks. In the conjunction task (shape and color) we found a non-significant trend for a disturbing effect of the background noise on performance. These findings suggest that visual-to-auditory SSDs can indeed be successfully used in noisy environments and that users can still focus on relevant auditory information while inhibiting irrelevant sounds. Our findings take a step towards the actual use of SSDs in real-life situations while potentially impacting rehabilitation of sensory deprived individuals.


2020, Vol 34 (09), pp. 13583-13589. Author(s): Richa Singh, Akshay Agarwal, Maneet Singh, Shruti Nagpal, Mayank Vatsa

Face recognition algorithms have demonstrated very high recognition performance, suggesting suitability for real world applications. Despite the enhanced accuracies, robustness of these algorithms against attacks and bias has been challenged. This paper summarizes different ways in which the robustness of a face recognition algorithm is challenged, which can severely affect its intended working. Different types of attacks such as physical presentation attacks, disguise/makeup, digital adversarial attacks, and morphing/tampering using GANs have been discussed. We also present a discussion on the effect of bias on face recognition models and showcase that factors such as age and gender variations affect the performance of modern algorithms. The paper also presents the potential reasons for these challenges and some of the future research directions for increasing the robustness of face recognition models.


2018, Vol 28 (10), pp. 1850028. Author(s): Chen Yang, Xu Han, Yijun Wang, Rami Saab, Shangkai Gao, et al.

The past decade has witnessed rapid development in the field of brain-computer interfaces (BCIs). While performance is no longer the biggest bottleneck in BCI applications, the tedious training process and poor ease-of-use have become the most significant challenges. In this study, a spatio-temporal equalization dynamic window (STE-DW) recognition algorithm is proposed for steady-state visual evoked potential (SSVEP)-based BCIs. The algorithm can adaptively control the stimulus time while maintaining recognition accuracy, which significantly improves the information transfer rate (ITR) and enhances the adaptability of the system to different subjects. Specifically, a spatio-temporal equalization algorithm is used to reduce the adverse effects of the spatial and temporal correlation of background noise, and, based on the theory of multiple hypothesis testing, a stimulus termination criterion adaptively controls the dynamic window. An offline analysis using a benchmark dataset and a dataset collected from 16 subjects demonstrated that the STE-DW algorithm is superior to the filter bank canonical correlation analysis (FBCCA), canonical variates with autoregressive spectral analysis (CVARS), canonical correlation analysis (CCA), and CCA reducing variation (CCA-RV) algorithms in terms of accuracy and ITR. On the benchmark dataset, the STE-DW algorithm achieved an average ITR of 134 bits/min, exceeding those of FBCCA, CVARS, CCA, and CCA-RV; in offline experiments it achieved an average ITR of 116 bits/min. In addition, the online experiment showed that the STE-DW algorithm can effectively expand the number of users to whom the SSVEP-based BCI system is applicable. We suggest that the STE-DW algorithm can serve as a reliable identification algorithm for training-free SSVEP-based BCIs because of its good balance between ease of use, recognition accuracy, ITR, and user applicability.
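SSVEP recognizers such as the CCA baselines mentioned above score each candidate stimulus frequency by correlating the EEG with sinusoidal references at that frequency and its harmonics. A simplified single-channel stand-in (a least-squares projection rather than full multi-channel CCA; the sampling rate, harmonic count, and candidate frequencies are illustrative):

```python
import numpy as np

def ssvep_pick_frequency(eeg, freqs, fs, harmonics=2):
    """Score each candidate stimulus frequency by how much of a
    single-channel EEG trace is explained by sin/cos references at that
    frequency and its harmonics; return the best-scoring frequency."""
    t = np.arange(len(eeg)) / fs
    x = eeg - eeg.mean()
    scores = []
    for f in freqs:
        # Reference matrix: sin and cos at each harmonic of f.
        refs = np.column_stack(
            [fn(2 * np.pi * h * f * t)
             for h in range(1, harmonics + 1)
             for fn in (np.sin, np.cos)])
        coef, *_ = np.linalg.lstsq(refs, x, rcond=None)
        fitted = refs @ coef
        # Fraction of the signal's energy captured by the references.
        scores.append(np.linalg.norm(fitted) / (np.linalg.norm(x) + 1e-12))
    return freqs[int(np.argmax(scores))]
```

A dynamic-window scheme such as STE-DW would evaluate scores like these on a growing data window and stop the stimulus once a termination criterion is met.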


Pain Medicine, 2021. Author(s): Cristina Muñoz Ladrón de Guevara, Gustavo A Reyes del Paso, María José Fernández-Serrano, Stefan Duschek

Abstract Objective The ability to accurately identify facial expressions of emotions is crucial in human interaction. While a previous study suggested deficient emotional face recognition in patients with fibromyalgia, not much is known about the origin of this impairment. Against this background, this study investigated the role of executive functions. Executive functions refer to cognitive control mechanisms enabling implementation and coordination of basic mental operations. Deficits in this domain are prevalent in fibromyalgia. Methods Fifty-two fibromyalgia patients and thirty-two healthy individuals completed the Ekman-60 Faces Test, which requires classification of facial displays of happiness, sadness, anger, fear, surprise and disgust. They also completed eight tasks assessing the executive function components of shifting, updating and inhibition. Effects of comorbid depression and anxiety disorders, and medication use, were tested in stratified analyses of patient subgroups. Results Patients made more errors overall than controls in classifying the emotional expressions. Moreover, their recognition accuracy correlated positively with performance on most of the executive function tasks. Emotion recognition did not vary as a function of comorbid psychiatric disorders or medication use. Conclusions The study supports impaired facial emotion recognition in fibromyalgia, which may contribute to the interaction problems and poor social functioning characterizing this condition. Facial emotion recognition is regarded as a complex process, which may be particularly reliant on efficient coordination of various basic operations by executive functions. As such, the correlations between cognitive task performance and recognition accuracy suggest that deficits in higher cognitive functions underlie impaired emotional communication in fibromyalgia.


2013 Africon, 2013. Author(s): Dimitris Tsiamasiotis, Ioannis Papaefstathiou, Charalampos Manifavas

2012, Vol 433-440, pp. 5951-5956. Author(s): Fu Jin Zhang, Yu Chun Ma, Hong Xu Wang, Qing Zhang

A definition of the conversion from single-value data to Vague-value data is given, along with two conversion formulas, a similarity measure between Vague sets, and a Vague pattern recognition algorithm. The algorithm is applied to irrigation system design; application examples show that the Vague pattern recognition algorithm and formulas are all useful.
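For concreteness, a widely used similarity measure between two vague values (Chen's score-based measure) can be written as follows; the paper's own formula may differ:

```python
def vague_similarity(x, y):
    """Similarity between two vague values x = (t_x, f_x) and
    y = (t_y, f_y), where t is the truth-membership degree and f the
    false-membership degree (t + f <= 1). Uses Chen's measure
    M(x, y) = 1 - |S(x) - S(y)| / 2 with the score S = t - f; returns
    a value in [0, 1], with 1 meaning identical scores."""
    tx, fx = x
    ty, fy = y
    return 1.0 - abs((tx - fx) - (ty - fy)) / 2.0
```

A Vague pattern recognition algorithm of the kind the abstract describes would classify a sample by picking the reference pattern whose vague values maximize this similarity.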

