Deep Neural Network Model of Hearing-Impaired Speech-in-Noise Perception

2020 · Vol 14
Author(s): Stephanie Haro, Christopher J. Smalt, Gregory A. Ciccarelli, Thomas F. Quatieri

Many individuals struggle to understand speech in listening scenarios that include reverberation and background noise. An individual's ability to understand speech arises from a combination of peripheral auditory function, central auditory function, and general cognitive abilities. The interaction of these factors complicates the prescription of treatment or therapy to improve hearing function. Damage to the auditory periphery can be studied in animals; however, this method alone is not enough to understand the impact of hearing loss on speech perception. Computational auditory models bridge the gap between animal studies and human speech perception. Perturbations to the modeled auditory systems can permit mechanism-based investigations into observed human behavior. In this study, we propose a computational model that accounts for the complex interactions between different hearing damage mechanisms and simulates human speech-in-noise perception. The model performs a digit classification task as a human would, with only acoustic sound pressure as input. Thus, we can use the model's performance as a proxy for human performance. This two-stage model consists of a biophysical cochlear-nerve spike generator followed by a deep neural network (DNN) classifier. We hypothesize that sudden damage to the periphery affects speech perception and that central nervous system adaptation over time may compensate for peripheral hearing damage. Our model achieved human-like performance across signal-to-noise ratios (SNRs) under normal-hearing (NH) cochlear settings, achieving 50% digit recognition accuracy at −20.7 dB SNR. Results were comparable to eight NH participants on the same task who achieved 50% behavioral performance at −22 dB SNR. We also simulated medial olivocochlear reflex (MOCR) and auditory nerve fiber (ANF) loss, which worsened digit-recognition accuracy at lower SNRs compared to higher SNRs. Our simulated performance following ANF loss is consistent with the hypothesis that cochlear synaptopathy impacts communication in background noise more so than in quiet. Following the insult of various cochlear degradations, we implemented extreme and conservative adaptation through the DNN. At the lowest SNRs (<0 dB), both adapted models were unable to fully recover NH performance, even with hundreds of thousands of training samples. This implies a limit on performance recovery following peripheral damage in our human-inspired DNN architecture.
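
The two-stage pipeline described above lends itself to a compact sketch. The following is illustrative only: `cochlear_spike_generator` is a hypothetical stand-in for the biophysical cochlear-nerve spike generator (the real front end is far more elaborate), and the classifier is a generic convolutional network rather than the authors' DNN.

```python
import torch
import torch.nn as nn

def cochlear_spike_generator(waveform: torch.Tensor, n_fibers: int = 64) -> torch.Tensor:
    """Hypothetical stand-in for the biophysical cochlear-nerve model.

    Maps acoustic sound pressure (batch, samples) to a neurogram of
    simulated auditory-nerve-fiber activity (batch, n_fibers, frames).
    A real phenomenological AN model would replace this placeholder.
    """
    frames = waveform.unfold(-1, 256, 128)                 # (batch, frames, 256)
    energy = frames.pow(2).mean(-1, keepdim=True)          # crude per-frame energy
    return energy.repeat(1, 1, n_fibers).transpose(1, 2)   # (batch, n_fibers, frames)

class DigitClassifier(nn.Module):
    """Generic DNN back end: neurogram in, 10-way digit posterior out."""
    def __init__(self, n_fibers: int = 64, n_digits: int = 10):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(n_fibers, 128, kernel_size=5, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(),
            nn.Linear(128, n_digits),
        )
    def forward(self, neurogram):
        return self.net(neurogram)

# End-to-end proxy for behavior: acoustic sound pressure in, digit decision out.
model = DigitClassifier()
audio = torch.randn(8, 16000)          # 8 one-second clips at 16 kHz
logits = model(cochlear_spike_generator(audio))
pred = logits.argmax(dim=1)            # predicted digits 0-9
```

In this framing, a peripheral insult such as ANF loss would be modeled by perturbing the front end (for example, silencing a fraction of fiber channels), while central adaptation corresponds to retraining the classifier on the perturbed neurograms.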

2021 · Vol 9 (2) · pp. 73-84
Author(s): Md. Shahadat Hossain, Md. Anwar Hossain, AFM Zainul Abadin, Md. Manik Ahmed

Handwritten Bangla digit recognition has seen significant progress in optical character recognition (OCR). It remains a challenging task because many handwritten digits share similar shapes and alignments. Although modern OCR research has reduced the complexity of the classification task through several methods, a few recognition problems persist and await simpler solutions. Deep neural networks, an emerging field of artificial intelligence, promise a solid solution to these remaining handwritten-recognition problems. This paper proposes a fine-regulated deep neural network (FRDNN) for handwritten numeric character recognition that augments convolutional neural network (CNN) models with regularization parameters, improving generalization by preventing overfitting. We applied a traditional deep neural network (TDNN) and the FRDNN, with similar layer configurations, to the BanglaLekha-Isolated database; over 100 epochs, the classification accuracies of the two models were 96.25% and 96.99%, respectively. In our experiments, the FRDNN model was more robust and accurate on the BanglaLekha-Isolated digit dataset than the TDNN model. The proposed method achieves good recognition accuracy compared with other existing methods.
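
The abstract does not specify the FRDNN layer stack, but its "regularization parameters" map naturally onto common regularizers. A minimal sketch, assuming dropout plus L2 weight decay and 28×28 grayscale inputs (all sizes here are assumptions, not the paper's configuration):

```python
import torch.nn as nn
import torch.optim as optim

# Illustrative CNN with two assumed regularizers: dropout between layers
# and an L2 penalty applied through the optimizer's weight decay.
model = nn.Sequential(
    nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),   # 28x28 -> 14x14
    nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # 14x14 -> 7x7
    nn.Flatten(),
    nn.Dropout(0.5),                       # regularization: randomly drop units
    nn.Linear(64 * 7 * 7, 128), nn.ReLU(),
    nn.Linear(128, 10),                    # 10 Bangla digit classes
)
# L2 regularization via weight_decay (value assumed, not from the paper).
optimizer = optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)
criterion = nn.CrossEntropyLoss()
```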


2021 · Vol 49 (1)
Author(s): Toufik Datsi, Khalid Aznag, Ahmed El Oirrak, …

Current artificial neural network image-recognition techniques use all the pixels of an image as input. In this paper, we present an efficient method for handwritten digit recognition that extracts the characteristics of a digit image by coding each row of the image as a decimal value, i.e., by transforming the row's binary representation into a decimal value. This method is called the decimal coding of rows. The set of decimal values calculated from the initial image is arranged as a vector and normalized; these values are the inputs to the artificial neural network. The proposed approach uses a multilayer perceptron neural network for the classification, recognition, and prediction of handwritten digits from 0 to 9. In this study, a dataset of 1797 samples was obtained from a digit database imported from the Scikit-learn library. Backpropagation was used as the learning algorithm to train the multilayer perceptron. The results show that the proposed approach achieves better performance than two other schemes in terms of recognition accuracy and execution time.
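
The row-coding step is simple enough to state directly. A minimal sketch, assuming 0/1 pixel rows read most-significant-bit first (the paper does not state the bit order) and max-normalization of the resulting vector:

```python
import numpy as np

def decimal_coding_of_rows(binary_image: np.ndarray) -> np.ndarray:
    """Encode each row of a binary image as one decimal value.

    Each row of 0/1 pixels is read as a binary number (MSB first, an
    assumption) and converted to its decimal value, giving one feature
    per row. The resulting vector is normalized to [0, 1].
    """
    n_cols = binary_image.shape[1]
    weights = 2 ** np.arange(n_cols - 1, -1, -1)   # e.g. [128, 64, ..., 1]
    codes = binary_image @ weights                  # one decimal value per row
    return codes / max(codes.max(), 1)              # guard against all-zero images

# An 8x8 digit image (as in the Scikit-learn digits dataset, binarized)
# becomes an 8-element feature vector fed to the multilayer perceptron.
img = (np.random.rand(8, 8) > 0.5).astype(int)
features = decimal_coding_of_rows(img)
```

For an 8×8 image such as the Scikit-learn digits (1797 samples), this reduces 64 pixel inputs to 8 row features, which helps explain the reported execution-time advantage.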


2020 · Vol 13 (4) · pp. 650-656
Author(s): Somayeh Khajehasani, Louiza Dehyadegari

Background: Today, the demand for automatic intelligent systems has drawn increasing attention to modern techniques for interaction between humans and machines. These techniques generally come in two types: audio and visual. Developing algorithms that enable machines to recognize human speech is of high importance and is frequently studied. Objective: Artificial intelligence methods have produced better results in human speech recognition, but a basic problem remains: the lack of an appropriate strategy for selecting recognition data from the huge amount of speech information, which in practice prevents the available algorithms from working. Method: In this article, to address this problem, linear predictive coding (LPC) coefficient extraction is used to condense the data describing the pronunciation of English digits. After the feature database is extracted, it is fed to an Elman neural network, which learns the relation between the LPC coefficients of an audio file and the pronounced digit. Results: The results show that this method performs well compared to other methods. According to the experiments, the network-training results (99% recognition accuracy) indicate that the Elman network outperforms an RBF network despite many errors. Conclusion: The experiments showed that the Elman memory neural network achieved acceptable performance in recognizing the speech signal compared to the other algorithms. Using LPC coefficients together with an Elman neural network led to higher recognition accuracy and an improved speech recognition system.
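
As an illustration of this pipeline (not the authors' code), one could extract per-frame LPC coefficients with librosa and classify the coefficient sequence with an Elman-style recurrent network; the frame length, LPC order, and hidden size below are assumptions, and `zero.wav` is a hypothetical recording:

```python
import numpy as np
import librosa
import torch
import torch.nn as nn

def lpc_sequence(y, order=12, frame_len=400, hop=160):
    """Per-frame LPC coefficients for one utterance (order is assumed)."""
    frames = librosa.util.frame(y, frame_length=frame_len, hop_length=hop).T
    return np.stack([librosa.lpc(f, order=order) for f in frames])  # (T, order+1)

class ElmanDigitRecognizer(nn.Module):
    """Elman network: a simple recurrent layer with tanh units."""
    def __init__(self, n_coeffs=13, hidden=64, n_digits=10):
        super().__init__()
        self.rnn = nn.RNN(n_coeffs, hidden, batch_first=True)  # Elman-style RNN
        self.out = nn.Linear(hidden, n_digits)
    def forward(self, x):                  # x: (batch, T, n_coeffs)
        _, h = self.rnn(x)                 # final hidden state summarizes the utterance
        return self.out(h.squeeze(0))      # digit logits

y, sr = librosa.load("zero.wav", sr=16000)  # hypothetical digit recording
feats = torch.tensor(lpc_sequence(y), dtype=torch.float32).unsqueeze(0)
logits = ElmanDigitRecognizer()(feats)
```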


2021 · Vol 2021 · pp. 1-12
Author(s): Zhonghua Xia, Jinming Xing, Changzai Wang, Xiaofeng Li

Current gesture recognition algorithms for human motion targets suffer from problems such as low classification accuracy and overlap ratio, low recognition accuracy and recall, and long recognition times. A gesture recognition algorithm for human motion based on a deep neural network is therefore proposed. First, a Kinect interface device was used to collect the coordinates of human skeleton joints and extract the features of motion gesture nodes, and a deep neural network was used to construct the overall structure of the key-node network. Second, a local recognition region was introduced to generate a high-dimensional feature map, and a sampling kernel function was defined; the minimum space-time domain of the node structure map was located by sampling in the space-time domain. Finally, a deep neural network classifier was constructed to integrate and classify the gesture features of the human motion target, realizing recognition of the human motion target. The results show that the proposed algorithm achieves high classification accuracy and overlap ratio for human motion gestures, with recognition accuracy as high as 93%, recall as high as 88%, and a recognition time of 17.8 s, effectively improving the recognition of human motion gestures.
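
As a hedged illustration of the first step only, skeleton joint coordinates from the Kinect can be normalized into node features; the paper does not give its exact feature definition, so the re-centering and scaling below are common choices, not the authors':

```python
import numpy as np

def joint_features(skeleton: np.ndarray, ref_idx: int = 0) -> np.ndarray:
    """Toy node-feature extraction from Kinect skeleton joints.

    skeleton: (n_joints, 3) array of (x, y, z) joint coordinates.
    Joints are re-centered on a reference joint and scaled by body size,
    a common normalization; the paper's actual features are not specified.
    """
    centered = skeleton - skeleton[ref_idx]            # translation invariance
    scale = np.linalg.norm(centered, axis=1).max()     # body-size invariance
    return (centered / max(scale, 1e-8)).ravel()       # flat feature vector

frame = np.random.rand(25, 3)    # 25 joints, as in Kinect v2
feats = joint_features(frame)    # per-frame node features for the DNN
```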


2019 · pp. 47-52
Author(s): R. Yu. Belorutsky, S. V. Zhitnik

The problem of recognizing human speech, in the form of the digits one through ten recorded with a dictaphone, is considered. The spectrogram of the sound signal is recognized by means of convolutional neural networks. Algorithms for preliminary processing of the input data, network training, and word recognition are implemented. The recognition accuracy is estimated for different numbers of convolution layers; an appropriate number is determined and a neural network structure is proposed. Recognition accuracy with a spectrogram as the network input is compared against accuracy with the first two formants as input. The recognition algorithm is tested on male and female voices with different pronunciation durations.
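
A minimal sketch of the spectrogram-plus-CNN setup follows; the FFT size, hop length, and the two-convolution-layer stack are placeholders (the paper determines the layer count empirically):

```python
import torch
import torch.nn as nn
import torchaudio

# Spectrogram front end (FFT size and hop length are assumed values).
spec = torchaudio.transforms.Spectrogram(n_fft=512, hop_length=128)

cnn = nn.Sequential(                       # two conv layers as a placeholder;
    nn.Conv2d(1, 16, 3, padding=1),        # the paper tunes this count
    nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 32, 3, padding=1),
    nn.ReLU(), nn.AdaptiveAvgPool2d(1),
    nn.Flatten(), nn.Linear(32, 10),       # digits one..ten -> 10 classes
)

wave = torch.randn(1, 16000)               # stand-in for a dictaphone clip
logits = cnn(spec(wave).unsqueeze(0))      # (1, 10) class scores
```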


2020 · Vol 2020 · pp. 1-8
Author(s): Wei Wang, Yutao Li, Ting Zou, Xin Wang, Jieyu You, …

As a lightweight deep neural network, MobileNet has few parameters and high classification accuracy. To further reduce the number of network parameters and improve classification accuracy, the dense blocks proposed in DenseNets are introduced into MobileNet. In Dense-MobileNet models, convolution layers whose input feature maps have the same size are treated as dense blocks, and dense connections are made within each block. The new network structure makes full use of the output feature maps generated by previous convolution layers in a dense block, generating a large number of feature maps with fewer convolution kernels and reusing features repeatedly. By setting a small growth rate, the network further reduces the parameters and the computation cost. Two Dense-MobileNet models, Dense1-MobileNet and Dense2-MobileNet, are designed. Experiments show that Dense2-MobileNet achieves higher recognition accuracy than MobileNet with fewer parameters and lower computation cost.
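
The core idea, dense connectivity among depthwise-separable convolution layers whose feature maps share a spatial size, can be sketched as follows. This is not the paper's Dense1/Dense2 configuration; the growth rate and layer count are assumptions:

```python
import torch
import torch.nn as nn

class DepthwiseSeparable(nn.Module):
    """MobileNet's basic unit: depthwise conv followed by 1x1 pointwise conv."""
    def __init__(self, c_in, c_out):
        super().__init__()
        self.dw = nn.Conv2d(c_in, c_in, 3, padding=1, groups=c_in)
        self.pw = nn.Conv2d(c_in, c_out, 1)
        self.act = nn.ReLU6()
    def forward(self, x):
        return self.act(self.pw(self.act(self.dw(x))))

class DenseMobileBlock(nn.Module):
    """Dense block over separable convs: each layer reads the concatenation
    of all earlier outputs (growth rate of 12 and 3 layers are assumed)."""
    def __init__(self, c_in, growth=12, n_layers=3):
        super().__init__()
        self.layers = nn.ModuleList(
            DepthwiseSeparable(c_in + i * growth, growth) for i in range(n_layers)
        )
    def forward(self, x):
        feats = [x]
        for layer in self.layers:
            feats.append(layer(torch.cat(feats, dim=1)))
        return torch.cat(feats, dim=1)

out = DenseMobileBlock(32)(torch.randn(1, 32, 56, 56))  # (1, 32 + 3*12, 56, 56)
```

Because each layer only has to produce `growth` new channels while reading all earlier ones, the per-layer kernel count stays small, which is how a small growth rate trades parameters for feature reuse.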


2021 · Vol 3 (3)
Author(s): Marc Vander Ghinst, Mathieu Bourguignon, Vincent Wens, Gilles Naeije, Cecile Ducène, …

Impaired speech perception in noise despite normal peripheral auditory function is a common problem in young adults. Despite a growing body of research, the pathophysiology of this impairment remains unknown. This magnetoencephalography study characterizes the cortical tracking of speech in a multi-talker background in a group of highly selected adult subjects with impaired speech perception in noise without peripheral auditory dysfunction. Magnetoencephalographic signals were recorded from 13 subjects with impaired speech perception in noise (six females, mean age: 30 years) and matched healthy subjects while they listened to 5 different recordings of stories merged with a multi-talker background at different signal-to-noise ratios (No Noise, +10, +5, 0 and −5 dB). The cortical tracking of speech was quantified as the coherence between magnetoencephalographic signals and the temporal envelope of (i) the global auditory scene (i.e. the attended speech stream and the multi-talker background noise), (ii) the attended speech stream only and (iii) the multi-talker background noise. Functional connectivity was then estimated between brain areas showing altered cortical tracking of speech in noise in subjects with impaired speech perception in noise and the rest of the brain. All participants demonstrated a selective cortical representation of the attended speech stream in noisy conditions, but subjects with impaired speech perception in noise displayed reduced cortical tracking of speech at the syllable rate (i.e. 4–8 Hz) in all noisy conditions. In subjects with impaired speech perception in noise, increased functional connectivity was observed in both the noiseless and the speech-in-noise conditions between supratemporal auditory cortices and left-dominant brain areas involved in semantic and attention processes. The difficulty in understanding speech in a multi-talker background in subjects with impaired speech perception in noise thus appears to be related to inaccurate auditory cortex tracking of speech at the syllable rate. The increased functional connectivity between supratemporal auditory cortices and language/attention-related neocortical areas probably serves to support speech perception and subsequent recognition in adverse auditory scenes. Overall, this study argues for a central origin of impaired speech perception in noise in the absence of any peripheral auditory dysfunction.
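
For readers unfamiliar with envelope coherence, the quantity can be sketched as follows; this is a simplified sensor-level illustration (with an assumed analytic-signal envelope), not the study's source-space pipeline:

```python
import numpy as np
from scipy.signal import coherence, hilbert

def speech_envelope(audio: np.ndarray) -> np.ndarray:
    """Temporal envelope of a speech stream via the analytic signal."""
    return np.abs(hilbert(audio))

def cortical_tracking(meg: np.ndarray, audio: np.ndarray, fs: float):
    """Coherence between one MEG signal and the speech envelope, restricted
    to the syllable-rate band of interest (4-8 Hz). Both signals are assumed
    to be resampled to the same rate fs and aligned in time."""
    f, coh = coherence(meg, speech_envelope(audio), fs=fs, nperseg=int(2 * fs))
    band = (f >= 4) & (f <= 8)
    return f[band], coh[band]
```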


Complexity · 2021 · Vol 2021 · pp. 1-5
Author(s): Huafeng Chen, Maosheng Zhang, Zhengming Gao, Yunhong Zhao

Current chaos-based action recognition methods for video rely on hand-crafted features, which limits recognition accuracy. In this paper, we extend ChaosNet into a deep neural network and apply it to action recognition. First, we extend ChaosNet to a deep ChaosNet for extracting action features. Then, we feed the features to a low-level LSTM encoder and a high-level LSTM encoder to obtain low-level coding outputs and high-level coding results, respectively. The agent is a behavior recognizer that produces the recognition results; the manager is a hidden layer responsible for setting behavioral segmentation targets at the high level. Our experiments are conducted on two standard action datasets, UCF101 and HMDB51. The experimental results show that the proposed algorithm outperforms the state of the art.
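
The low-level/high-level encoder pair can be sketched as below; the feature dimension, hidden sizes, and read-out are assumptions (UCF101's 101 classes set the output size), and the deep-ChaosNet feature extractor is represented only by its output tensor:

```python
import torch
import torch.nn as nn

class TwoLevelEncoder(nn.Module):
    """Sketch of the low-level/high-level LSTM pair described above.
    Sizes are assumptions, not the paper's values."""
    def __init__(self, feat_dim=256, low_hidden=128, high_hidden=64, n_actions=101):
        super().__init__()
        self.low = nn.LSTM(feat_dim, low_hidden, batch_first=True)
        self.high = nn.LSTM(low_hidden, high_hidden, batch_first=True)
        self.agent = nn.Linear(high_hidden, n_actions)   # behavior recognizer
    def forward(self, feats):             # feats: (batch, T, feat_dim) from deep ChaosNet
        low_out, _ = self.low(feats)      # low-level coding output
        high_out, _ = self.high(low_out)  # high-level coding result
        return self.agent(high_out[:, -1])  # per-clip action logits

logits = TwoLevelEncoder()(torch.randn(2, 30, 256))  # 2 clips, 30 frames each
```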


Author(s): Teddy Surya Gunawan, Ahmad Fakhrur Razi Mohd Noor, Mira Kartiwi

Due to advances in GPUs and CPUs, deep neural networks (DNNs) have become popular in recent years both for feature extraction and for classification. This paper aims to develop an offline handwriting recognition system using a DNN. First, two popular English digit and letter databases, MNIST and EMNIST, were selected to provide datasets for the training and testing phases of the DNN. Altogether, there are 10 digits [0-9] and 52 letters [a-z, A-Z]. The proposed DNN uses two stacked autoencoder layers and one softmax layer. Recognition accuracy for English digits and letters is 97.7% and 88.8%, respectively. Performance comparison with other neural network structures revealed weighted average recognition rates of 80.3% for patternnet, 68.3% for feedforwardnet, and 90.4% for the proposed DNN. This shows that the proposed system can recognize handwritten English digits and letters with high accuracy.
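
The described structure, two stacked autoencoder encoding layers topped by a softmax layer, can be sketched as follows; layer widths are assumptions, and the 62 outputs cover the 10 digits plus 52 letters:

```python
import torch
import torch.nn as nn

class StackedAEClassifier(nn.Module):
    """Two autoencoder-style encoding layers plus a softmax output layer,
    mirroring the structure described above (hidden sizes are assumed)."""
    def __init__(self, n_inputs=784, h1=200, h2=100, n_classes=62):
        super().__init__()
        self.enc1 = nn.Sequential(nn.Linear(n_inputs, h1), nn.Sigmoid())
        self.enc2 = nn.Sequential(nn.Linear(h1, h2), nn.Sigmoid())
        self.softmax_layer = nn.Linear(h2, n_classes)  # softmax applied in the loss
    def forward(self, x):
        return self.softmax_layer(self.enc2(self.enc1(x)))

# In the stacked-autoencoder recipe, enc1 and enc2 are first pretrained to
# reconstruct their inputs, then the whole stack is fine-tuned end to end
# with cross-entropy on the labeled digits and letters.
model = StackedAEClassifier()
logits = model(torch.rand(16, 784))   # 16 flattened 28x28 images
```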


Author(s): Asha

Optimization significantly improves the solutions of complex problems. Reducing feature dimensionality is critically important for removing redundant features and improving system accuracy. In this paper, an amalgamation of different concepts is proposed to optimize the features and improve classification. The experiment is performed on facial expression detection by combining deep neural network models with variants of the gravitational search algorithm. Facial expressions are movements of facial components such as the lips, nose, and eyes, which are treated as the features for classifying human emotions into different classes. Initial feature extraction is performed with the local binary pattern. The extracted feature set is optimized with variants of the gravitational search algorithm (GSA): the standard gravitational search algorithm (SGSA), the binary gravitational search algorithm (BGSA), and the fast discrete gravitational search algorithm (FDGSA). Two deep neural network models, a deep convolutional neural network (DCNN) and an extended deep convolutional neural network (EDCNN), are employed to classify emotions in the JAFFE and KDEF image datasets. Fixed-pose images from both datasets are used, and a comparison based on average recognition accuracy is performed. Comparative analysis of the mentioned techniques and state-of-the-art techniques illustrates the superior recognition accuracy of FDGSA combined with EDCNN.
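
As a sketch of the initial feature-extraction stage only (the GSA variants and the DCNN/EDCNN classifiers are beyond a short example), a uniform LBP histogram descriptor might be computed as follows; the neighborhood parameters are assumptions:

```python
import numpy as np
from skimage.feature import local_binary_pattern

def lbp_features(gray_face: np.ndarray, P: int = 8, R: float = 1.0) -> np.ndarray:
    """Local binary pattern descriptor for one grayscale face image.

    P neighbors on a circle of radius R are assumed values. A histogram of
    the uniform LBP codes serves as the feature vector, which the GSA
    variants would then optimize before classification.
    """
    codes = local_binary_pattern(gray_face, P, R, method="uniform")
    hist, _ = np.histogram(codes, bins=np.arange(P + 3), density=True)
    return hist   # (P + 2)-dimensional descriptor

face = np.random.rand(64, 64)   # stand-in for a JAFFE/KDEF face crop
feats = lbp_features(face)
```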

