Acoustic Scenery Recognition Using CWT and Deep Neural Network

2021 ◽  
Author(s):  
Francisco Mondragon ◽  
Jonathan Jimenez ◽  
Mariko Nakano ◽  
Toru Nakashika ◽  
Hector Perez-Meana

The development of acoustic scene recognition systems has been a topic of extensive research due to its applications in several fields of science and engineering. This paper proposes an acoustic scene recognition system in which a time-frequency representation is first obtained using the Continuous Wavelet Transform (CWT). The time-frequency representation is then rendered as a color image using the Viridis color map and fed into a Deep Neural Network (DNN) to carry out the classification task. Evaluation results using several public databases show that the proposed scheme provides better classification performance than previously proposed schemes.
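The CWT front end described above can be sketched in a few lines of numpy. This is an illustrative implementation, not the authors' code: the Morlet parameter, scale range, and toy signal are assumptions, and in practice the normalized scalogram would be passed through a colormap such as Viridis to produce the RGB input image.

```python
import numpy as np

def morlet_cwt(signal, scales, w0=6.0):
    """Continuous wavelet transform with a complex Morlet mother wavelet.
    Returns a (len(scales), len(signal)) scalogram of magnitudes."""
    out = np.empty((len(scales), len(signal)))
    for i, s in enumerate(scales):
        # Discretize the wavelet over roughly +/- 4 envelope widths.
        t = np.arange(-4 * s, 4 * s + 1)
        psi = np.exp(1j * w0 * t / s) * np.exp(-0.5 * (t / s) ** 2) / np.sqrt(s)
        out[i] = np.abs(np.convolve(signal, psi, mode="same"))
    return out

# Toy acoustic frame: two tones, so energy concentrates at two scales.
fs = 1000
t = np.arange(0, 1, 1 / fs)
x = np.sin(2 * np.pi * 50 * t) + 0.5 * np.sin(2 * np.pi * 200 * t)
scalogram = morlet_cwt(x, scales=np.arange(2, 32))

# Normalize to [0, 1]; applying a colormap (e.g. Viridis) to this matrix
# yields the color image that is fed to the DNN classifier.
img = (scalogram - scalogram.min()) / (scalogram.max() - scalogram.min())
```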

Biometrics provides greater security and usability than conventional personal authentication methods. Fingerprints, facial images and voice are among the features that biometric systems can use. To improve biometric authentication, the proposed method takes iris and fingerprint images as input; first, pre-processing is performed through histogram equalization on all input images to enhance image quality. Feature extraction is then performed using a modified Local Binary Pattern (MLBP), GLCM with orientation transformation, and DWT, and the extracted features are combined. The optimal features are then selected from the MLBP, GLCM and DWT sets with the Rider Optimization Algorithm (ROA). Finally, a Deep Neural Network (DNN) performs the proposed authentication, deciding whether an input image is recognized or not. A DNN is a multilayered artificial neural network between the input and output layers; it finds the mathematical manipulation needed to turn the input into the output. The quality of the proposed process is measured in terms of recognition reliability. The proposed approach is implemented on the MATLAB platform.
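Of the three descriptors above, the Local Binary Pattern is the simplest to illustrate. The sketch below implements the basic 8-neighbour LBP histogram in numpy; the paper's MLBP, GLCM and DWT features and the ROA selection step are not reproduced here, and the random test image is a stand-in for a pre-processed iris or fingerprint image.

```python
import numpy as np

def lbp_histogram(img):
    """Basic 8-neighbour Local Binary Pattern histogram (the classic
    variant; the paper's MLBP adds further modifications on top)."""
    h, w = img.shape
    center = img[1:-1, 1:-1]
    code = np.zeros_like(center, dtype=np.uint8)
    # Clockwise neighbour offsets; each contributes one bit of the code.
    shifts = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
              (1, 1), (1, 0), (1, -1), (0, -1)]
    for bit, (dy, dx) in enumerate(shifts):
        neigh = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        code |= (neigh >= center).astype(np.uint8) << bit
    hist, _ = np.histogram(code, bins=256, range=(0, 256))
    return hist / hist.sum()  # normalized 256-bin feature vector

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(64, 64)).astype(np.uint8)
features = lbp_histogram(img)
```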


Entropy ◽  
2019 ◽  
Vol 21 (12) ◽  
pp. 1199 ◽  
Author(s):  
Hyeon Kyu Lee ◽  
Young-Seok Choi

The motor imagery-based brain-computer interface (BCI) using electroencephalography (EEG) has been receiving attention from neural engineering researchers and is being applied to various rehabilitation applications. However, the performance degradation caused by motor imagery EEG with a very low signal-to-noise ratio raises several issues for the practical use of a BCI system. In this paper, we propose a novel motor imagery classification scheme based on the continuous wavelet transform and the convolutional neural network. The continuous wavelet transform with three mother wavelets is used to construct a highly informative EEG image by combining time-frequency content and electrode location. A convolutional neural network is then designed both to classify motor imagery tasks and to reduce computational complexity. The proposed method was validated using two public BCI datasets, BCI competition IV dataset 2b and BCI competition II dataset III. The proposed method achieved improved classification performance compared with existing methods, showcasing the feasibility of motor imagery BCI.
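The key construction here is the EEG image that combines time-frequency content with electrode location. A minimal sketch, assuming a Morlet mother wavelet (the paper uses three mother wavelets), a hypothetical 3-electrode trial (e.g. C3, Cz, C4) and an arbitrary scale range: per-channel scalograms are stacked vertically so electrode position becomes a spatial axis of the CNN input.

```python
import numpy as np

def morlet_scalogram(x, scales, w0=6.0):
    """Magnitude CWT of one EEG channel with a complex Morlet wavelet."""
    out = np.empty((len(scales), len(x)))
    for i, s in enumerate(scales):
        t = np.arange(-4 * s, 4 * s + 1)
        psi = np.exp(1j * w0 * t / s) * np.exp(-0.5 * (t / s) ** 2) / np.sqrt(s)
        out[i] = np.abs(np.convolve(x, psi, mode="same"))
    return out

# Hypothetical 3-channel motor imagery trial, 500 samples per channel.
rng = np.random.default_rng(0)
trial = rng.standard_normal((3, 500))
scales = np.arange(2, 18)  # 16 scales per channel

# Stack per-electrode scalograms vertically: the resulting image encodes
# both time-frequency structure and electrode location for the CNN.
image = np.vstack([morlet_scalogram(ch, scales) for ch in trial])
```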


2017 ◽  
Vol 1 (4) ◽  
pp. 271-277 ◽  
Author(s):  
Abdullah Caliskan ◽  
Mehmet Emin Yuksel

Abstract: In this study, a deep neural network classifier is proposed for the classification of coronary artery disease (CAD) medical data sets. The proposed classifier is tested on reference CAD data sets from the literature and compared with popular representative classification methods with regard to classification performance. Experimental results show that the deep neural network classifier offers much better accuracy, sensitivity and specificity rates than the other methods. The proposed method presents itself as an easily accessible and cost-effective alternative to currently existing methods for the diagnosis of CAD, and it can be applied to easily check whether a given subject under examination has at least one occluded coronary artery or not.
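The classifier described above maps a vector of clinical attributes to a binary occlusion label. A minimal sketch with scikit-learn, under stated assumptions: the feature matrix is a synthetic stand-in (13 attributes, echoing the common Cleveland CAD layout), the hidden-layer sizes are arbitrary, and the label rule is artificial, so only the pipeline shape mirrors the paper.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

rng = np.random.RandomState(0)
# Synthetic stand-in for 13-attribute CAD feature vectors; the label
# marks whether at least one coronary artery is occluded.
X = rng.randn(400, 13)
y = (X[:, 0] + 0.5 * X[:, 3] > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# A small multilayer network between the input and output layers,
# standing in for the paper's deep neural network classifier.
clf = MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=500,
                    random_state=0)
clf.fit(X_tr, y_tr)
acc = clf.score(X_te, y_te)
```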


Author(s):  
Yong Du ◽  
Yangyang Xu ◽  
Taizhong Ye ◽  
Qiang Wen ◽  
Chufeng Xiao ◽  
...  

Color dimensionality reduction is generally believed to be a non-invertible process, as re-colorization results in perceptually noticeable and unrecoverable distortion. In this article, we propose to convert a color image into a grayscale image that can fully recover its original colors, and more importantly, the encoded information is discriminative and sparse, which saves storage capacity. In particular, we design an invertible deep neural network for color encoding and decoding purposes. This network learns to generate a residual image that encodes color information, which is then combined with a base grayscale image for color recovery. In this way, the non-differentiable compression process (e.g., JPEG) of the base grayscale image can be integrated into the network in an end-to-end manner. To further reduce the size of the residual image, we present a specific layer to enhance Sparsity Enforcing Priors (SEP), leading to negligible storage space. The proposed method allows color embedding on a sparse residual image while keeping a high PSNR of 35 dB on average. Extensive experiments demonstrate that the proposed method outperforms state-of-the-art methods in terms of image quality and tolerability to compression.
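The base-plus-residual decomposition at the heart of this scheme can be illustrated with plain arithmetic. In the sketch below the residual is obtained by naive subtraction from a Rec. 601 luminance image, whereas in the paper it is produced (and sparsified) by the learned invertible network; the sketch only shows why base + residual recovers the colors exactly.

```python
import numpy as np

rng = np.random.default_rng(0)
rgb = rng.integers(0, 256, size=(8, 8, 3)).astype(np.int16)

# Base grayscale image: standard Rec. 601 luminance, rounded to integers.
gray = np.round(0.299 * rgb[..., 0] + 0.587 * rgb[..., 1]
                + 0.114 * rgb[..., 2]).astype(np.int16)

# Residual image: everything the grayscale projection lost. The paper's
# network learns a sparse version of this signal instead of computing it
# by subtraction.
residual = rgb - gray[..., None]

# Decoding: adding the residual back to the base recovers the original
# colors exactly, which is what makes the encoding invertible.
recovered = gray[..., None] + residual
```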


Passive acoustic target classification is an exceptionally challenging problem due to the complex phenomena associated with the channel and the relatively low Signal to Noise Ratio (SNR) caused by the pervasive ambient noise field. Inspired by the overwhelming success of Deep Neural Networks (DNNs) in many such hard problems, a network carefully crafted specifically for the target recognition application has been employed in this work. Although deep neural networks can learn characteristic features or representations directly from the raw observations, domain-specific intermediate representations can mitigate the computational requirements as well as the sample complexity required to achieve an acceptable prediction error rate. As sonar target records are essentially time series, spectro-temporal representations can make the intricate relationship between time and spectral components more explicit. In a passive sonar target recognition scenario, since most of the defining spectral components reside in the lower part of the spectrum, a nonlinear dilated spectral scale with an emphasis on low frequencies is highly desirable. This can easily be achieved using a filterbank-based time-frequency decomposition, which allows more filters to be positioned at the frequency ranges of interest. In this work, a rigorous analysis of the performance of time-frequency representations initialized at various frequency scales is conducted, both independently and in combination. A convolutional neural network based spectro-temporal feature learner is utilized as the initial layers, while a deep stack of Long Short-Term Memories (LSTMs) with residual connections is used for learning the intricate temporal relationships hidden in the intermediate representations.
The experimental results show that, in the single-feature configuration, a linear-scale spectrogram achieves accuracies of 92.4% and 90.2% on the validation and test sets, respectively, whereas the gammatone spectrogram attains 96.7% and 96.1% on the same sets. In the multi-feature setup, the accuracy reaches 97.3% and 96.6%, respectively, which reveals that a combination of properly initialized intermediate representations can significantly improve the classification performance.
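The low-frequency emphasis of the gammatone filterbank comes from placing filter centers on the ERB scale rather than a linear one. A sketch of that spacing, assuming the common Glasberg and Moore constants and Slaney's formulation (the frequency range and filter count below are arbitrary, not taken from the paper):

```python
import numpy as np

def erb_center_freqs(f_low, f_high, n_filters):
    """Gammatone-style center frequencies on the ERB scale: filters
    crowd toward the low end of the spectrum, where most passive sonar
    target signatures reside."""
    ear_q, min_bw = 9.26449, 24.7  # Glasberg & Moore constants
    c = ear_q * min_bw
    i = np.arange(1, n_filters + 1)
    # Centers are geometrically spaced in (f + c); i = n_filters lands
    # exactly on f_low, small i approaches f_high.
    cf = -c + np.exp(i * (np.log(f_low + c) - np.log(f_high + c))
                     / n_filters) * (f_high + c)
    return cf[::-1]  # ascending order

cf = erb_center_freqs(50.0, 8000.0, 64)

# Unlike a linear spectrogram scale, neighboring filters are far closer
# together at low frequencies than at high frequencies.
low_spacing = cf[1] - cf[0]
high_spacing = cf[-1] - cf[-2]
```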

