Robust Feature Learning on Long-Duration Sounds for Acoustic Scene Classification

2021 ◽  
Author(s):  
Yuzhong Wu ◽  
Tan Lee

Acoustic scene classification (ASC) aims to identify the type of scene (environment) in which a given audio signal is recorded. The log-mel feature and the convolutional neural network (CNN) have recently become the most popular time-frequency (TF) feature representation and classifier in ASC. An audio signal recorded in a scene may include various sounds overlapping in time and frequency. A previous study suggests that separately considering long-duration and short-duration sounds in the CNN may improve ASC accuracy. This study addresses the generalization ability of acoustic scene classifiers. In practice, the characteristics of acoustic scene signals may be affected by various factors, such as the choice of recording device and the change of recording location. When an established ASC system predicts scene classes on audio recorded in unseen scenarios, its accuracy may drop significantly. Long-duration sounds not only contain domain-independent acoustic scene information but also carry channel information determined by the recording conditions, which is prone to over-fitting. For a more robust ASC system, we propose a robust feature learning (RFL) framework to train the CNN. The RFL framework down-weights CNN learning specifically on long-duration sounds. The proposed method trains an auxiliary classifier with only long-duration sound information as input, using an auxiliary loss function that assigns less learning weight to poorly classified examples than the standard cross-entropy loss. The experimental results show that the proposed RFL framework yields an acoustic scene classifier that is more robust to unseen devices and cities.
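
The auxiliary loss is only characterized qualitatively above. A minimal PyTorch sketch of one weighting consistent with that description, assuming a reversed focal-style exponent `gamma` (both the form and the exponent are our assumptions, not the paper's exact formulation):

```python
import torch
import torch.nn.functional as F

def auxiliary_loss(logits, targets, gamma=2.0):
    # Per-example log-probability of the true class under the auxiliary classifier.
    log_p = F.log_softmax(logits, dim=-1)
    log_p_t = log_p.gather(1, targets.unsqueeze(1)).squeeze(1)
    p_t = log_p_t.exp()
    # Reversed focal-style weight: p_t**gamma is SMALL for poorly classified
    # examples, so they contribute less than under plain cross-entropy.
    weight = p_t.detach() ** gamma
    return -(weight * log_p_t).mean()
```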

Sensors ◽  
2020 ◽  
Vol 20 (12) ◽  
pp. 3496
Author(s):  
Jiacan Xu ◽  
Hao Zheng ◽  
Jianhui Wang ◽  
Donglin Li ◽  
Xiaoke Fang

Recognition of motor imagery intention is one of the current research focuses of brain-computer interface (BCI) studies. It can help patients with physical dyskinesia convey their movement intentions. In recent years, breakthroughs have been made in recognizing motor imagery tasks with deep learning, but ignoring the important features related to motor imagery may degrade the recognition performance of the algorithm. This paper proposes a new deep multi-view feature learning method for the classification of motor imagery electroencephalogram (EEG) signals. To obtain more representative motor imagery features from EEG signals, we introduce a multi-view feature representation based on the characteristics of EEG signals and the differences between different features. Different feature extraction methods are used to extract the time-domain, frequency-domain, time-frequency-domain, and spatial features of EEG signals, so that they cooperate and complement each other. Then, a deep restricted Boltzmann machine (RBM) network improved by t-distributed stochastic neighbor embedding (t-SNE) is adopted to learn the multi-view features of EEG signals, so that the algorithm removes feature redundancy while taking into account the global characteristics of the multi-view feature sequence, reduces the dimension of the multi-view features, and enhances their recognizability. Finally, a support vector machine (SVM) is chosen to classify the deep multi-view features. Applying the proposed method to the BCI Competition IV 2a dataset, we obtained excellent classification results. The results show that the deep multi-view feature learning method further improves the classification accuracy of motor imagery tasks.
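
A minimal scikit-learn sketch of the pipeline's shape, with illustrative time- and frequency-domain views (the exact extractors, the deep RBM stack, and the t-SNE refinement are simplified to a single `BernoulliRBM` here; all names and parameters are assumptions):

```python
import numpy as np
from scipy.signal import welch
from sklearn.neural_network import BernoulliRBM
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

def multi_view_features(trial, fs=250):
    """Concatenate simple time- and frequency-domain views of one EEG
    trial (channels x samples). Illustrative feature choices only."""
    time_view = np.concatenate([trial.mean(axis=1), trial.std(axis=1)])
    _, psd = welch(trial, fs=fs, nperseg=fs)
    freq_view = psd.mean(axis=0)            # average PSD across channels
    return np.concatenate([time_view, freq_view])

# The RBM compresses the stacked views; the SVM classifies, as in the paper.
model = Pipeline([
    ("scale", MinMaxScaler()),              # BernoulliRBM expects inputs in [0, 1]
    ("rbm", BernoulliRBM(n_components=64, learning_rate=0.05, n_iter=20)),
    ("svm", SVC(kernel="rbf", C=1.0)),
])
# Usage: X = np.stack([multi_view_features(t) for t in trials]); model.fit(X, y)
```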


2021 ◽  
Vol 13 (3) ◽  
pp. 433
Author(s):  
Junge Shen ◽  
Tong Zhang ◽  
Yichen Wang ◽  
Ruxin Wang ◽  
Qi Wang ◽  
...  

Remote sensing images contain complex backgrounds and multi-scale objects, which make scene classification a challenging task. The performance depends heavily on the capacity of the scene representation as well as the discriminability of the classifier. Although multiple models possess better properties than a single model in these respects, the fusion strategy for these models is a key component for maximizing the final accuracy. In this paper, we construct a novel dual-model architecture with a grouping-attention-fusion strategy to improve the performance of scene classification. Specifically, the model employs two different convolutional neural networks (CNNs) for feature extraction, and the grouping-attention-fusion strategy fuses the features of the CNNs in a fine-grained, multi-scale manner. In this way, the resultant feature representation of the scene is enhanced. Moreover, to address the issue of similar appearances between different scenes, we develop a loss function that encourages small intra-class diversity and large inter-class distances. Extensive experiments are conducted on four scene classification datasets: the UCM land-use dataset, the WHU-RS19 dataset, the AID dataset, and the OPTIMAL-31 dataset. The experimental results demonstrate the superiority of the proposed method in comparison with the state of the art.
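
The loss is described only by its two objectives. A center-loss-style PyTorch sketch that matches them, assuming learnable class centers and a margin between centers (both are assumptions, not the paper's exact formulation):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DiscriminativeLoss(nn.Module):
    """Encourages small intra-class diversity (pull features toward their
    class centers) and large inter-class distances (push centers apart)."""
    def __init__(self, num_classes, feat_dim, margin=1.0, lam=0.5):
        super().__init__()
        self.centers = nn.Parameter(torch.randn(num_classes, feat_dim))
        self.margin, self.lam = margin, lam

    def forward(self, feats, labels):
        # Intra-class term: squared distance to each sample's own class center.
        intra = (feats - self.centers[labels]).pow(2).sum(dim=1).mean()
        # Inter-class term: penalize pairs of centers closer than `margin`
        # (the diagonal is masked out with a large constant).
        dist = torch.cdist(self.centers, self.centers)
        masked = dist + torch.eye(len(self.centers), device=dist.device) * 1e6
        inter = F.relu(self.margin - masked).pow(2).mean()
        return intra + self.lam * inter
```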


Author(s):  
Yan Bai ◽  
Yihang Lou ◽  
Yongxing Dai ◽  
Jun Liu ◽  
Ziqian Chen ◽  
...  

Vehicle re-identification (ReID) has attracted substantial research effort due to its great significance to public security. In vehicle ReID, we aim to learn features that are powerful in discriminating the subtle differences between visually similar vehicles and also robust to different orientations of the same vehicle. However, these two characteristics are hard to encapsulate in a single feature representation simultaneously with unified supervision. Here we propose a Disentangled Feature Learning Network (DFLNet) to learn orientation-specific and common features concurrently, which are discriminative at the level of details and invariant to orientation, respectively. Moreover, to use these two types of features effectively for ReID, we further design a feature metric alignment scheme to ensure the consistency of the metric scales. The experiments show the effectiveness of our method, which achieves state-of-the-art performance on three challenging datasets.
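
The metric alignment scheme is not spelled out above. One plausible reading, sketched in PyTorch, is to L2-normalize each feature type so its cosine distances share a scale before mixing (the normalization choice and the mixing weight `alpha` are assumptions):

```python
import torch.nn.functional as F

def aligned_distance(common_a, orient_a, common_b, orient_b, alpha=0.5):
    """Combine common and orientation-specific features on a consistent
    metric scale: normalize each type, then mix the cosine distances."""
    ca, cb = F.normalize(common_a, dim=-1), F.normalize(common_b, dim=-1)
    oa, ob = F.normalize(orient_a, dim=-1), F.normalize(orient_b, dim=-1)
    d_common = 1.0 - (ca * cb).sum(-1)      # cosine distance, range [0, 2]
    d_orient = 1.0 - (oa * ob).sum(-1)      # same scale after normalization
    return alpha * d_common + (1 - alpha) * d_orient
```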


2021 ◽  
Vol 13 (12) ◽  
pp. 168781402110670
Author(s):  
Yanxiang Chen ◽  
Zuxing Zhao ◽  
Euiyoul Kim ◽  
Haiyang Liu ◽  
Juan Xu ◽  
...  

As wheels are important components of train operation, diagnosing and predicting wheel faults is essential to ensure the reliability of rail transit. Existing studies typically deal separately with the two main types of wheel faults, namely wheel radius difference and wheel flat, even though both are reflected by wheel radius changes. Moreover, traditional diagnostic methods, such as mechanical methods or combinations of data analysis methods, have limited ability to extract data features efficiently. Deep learning models have become useful tools for automatically learning features from raw vibration signals. However, improving the feature-learning capability of such models under noise interference, so as to yield higher wheel diagnostic accuracy, has not yet been studied. In this paper, a unified training framework with the same model architecture and loss function is established for the two homologous wheel faults. After selecting deep residual networks (ResNets) as the backbone, we add a squeeze-and-excitation (SE) module based on a multichannel attention mechanism to the backbone network to learn the global relationships among feature channels. The influence of noise-interference features is thereby reduced while the extraction of useful information is enhanced, improving the feature-learning ability of the ResNet. To obtain a further improved feature representation, we introduce a supervised contrastive loss (SCL) on top of ResNet + SE, which enlarges the feature distances between different fault classes through comparison between positive and negative examples under label supervision, yielding better class separation and higher diagnostic accuracy. We also perform a regression task to predict the fault degrees of wheel radius difference and wheel flat without changing the network architecture. Extensive experimental results show that the proposed model achieves high accuracy in diagnosing and predicting the two types of wheel faults.
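
The SE module itself is a standard, well-documented block. A compact PyTorch version adapted to 1D vibration features (the reduction ratio of 16 and the 1D pooling are conventional choices, not taken from the paper):

```python
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-excitation: global average pooling (squeeze) followed by
    a two-layer gate (excitation) that re-weights feature channels, letting
    the network emphasize informative channels and suppress noisy ones."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):                    # x: (batch, channels, length)
        w = x.mean(dim=-1)                   # squeeze: global average pool
        w = self.fc(w).unsqueeze(-1)         # excitation: per-channel gate
        return x * w                         # re-weight the channels
```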


2021 ◽  
Author(s):  
Shahrzad Esmaili

This research focuses on the application of joint time-frequency (TF) analysis for watermarking and classifying different audio signals. Time-frequency analysis, which originated in the 1930s, has often been used to model the non-stationary behaviour of speech and audio signals. By taking into consideration the human auditory system, with its many non-linear effects and masking properties, we can extract efficient features from the TF domain to watermark or classify signals. The proposed audio watermarking scheme is based on spread-spectrum techniques and uses content-based analysis to detect the instantaneous mean frequency (IMF) of the input signal. The watermark is embedded in this perceptually significant region so that it resists attacks. Audio watermarking offers a solution to data privacy and helps protect the rights of artists and copyright holders. Using the IMF, we aim to keep the watermark imperceptible while maximizing its robustness. In this case, 25 bits are embedded and recovered within a 5 s sample of an audio signal. The scheme has been shown to be robust against various signal processing attacks, including filtering, MP3 compression, additive noise, and resampling, with a bit error rate in the range of 0-13%. In addition, content-based classification is performed using TF analysis to classify sounds into six music groups including rock, classical, folk, jazz, and pop. The extracted features include entropy, centroid, centroid ratio, bandwidth, silence ratio, energy ratio, and the frequency locations of minimum and maximum energy. Using a database of 143 signals, a set of 10 time-frequency features is extracted, and a classification accuracy of around 93.0% using regular linear discriminant analysis, or 92.3% using the leave-one-out method, is achieved.
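
A few of the listed features are simple to state concretely. An illustrative NumPy/SciPy sketch of three of them (these implementations are our assumptions; the thesis's exact definitions may differ):

```python
import numpy as np
from scipy.signal import spectrogram

def tf_features(x, fs=44100):
    """Spectral centroid, bandwidth, and silence ratio from a spectrogram."""
    f, _, S = spectrogram(x, fs=fs)          # S: (freq bins, time frames), power
    total = S.sum() + 1e-12
    # Centroid: power-weighted mean frequency over the whole TF plane.
    centroid = (f[:, None] * S).sum() / total
    # Bandwidth: power-weighted spread of frequency around the centroid.
    bandwidth = np.sqrt((((f[:, None] - centroid) ** 2) * S).sum() / total)
    # Silence ratio: fraction of frames with energy below 1% of the peak frame.
    frame_energy = S.sum(axis=0)
    silence_ratio = (frame_energy < 0.01 * frame_energy.max()).mean()
    return {"centroid": float(centroid), "bandwidth": float(bandwidth),
            "silence_ratio": float(silence_ratio)}
```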


Complexity ◽  
2019 ◽  
Vol 2019 ◽  
pp. 1-8
Author(s):  
Ming-xin Jiang ◽  
Xian-xian Luo ◽  
Tao Hai ◽  
Hai-yan Wang ◽  
Song Yang ◽  
...  

Visual object tracking is a fundamental component of many computer vision applications. Extracting robust object features is one of the most important steps in tracking. As trackers formulated only on RGB data are usually affected by occlusions and appearance or illumination variations, we propose a novel RGB-D tracking method based on genetic feature learning in this paper. Our approach addresses feature learning as an optimization problem. Owing to the advantage of parallel computing, the genetic algorithm (GA) converges quickly and has excellent global optimization performance. At the same time, unlike handcrafted features and deep learning methods, a GA can be employed to solve the feature representation problem without prior knowledge, and it does not require a large number of parameters to be learned. A candidate solution in the RGB or depth modality is represented as an encoding of an image in the GA, and genetic features are learned through population initialization, fitness evaluation, selection, crossover, and mutation. The proposed RGB-D tracker is evaluated on a popular benchmark dataset, and the experimental results indicate that our method achieves higher accuracy and faster tracking speed.
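
The GA steps named here follow the textbook loop. A generic NumPy sketch over binary feature masks (the encoding, operators, and hyperparameters are illustrative assumptions; the paper encodes images rather than masks):

```python
import numpy as np

rng = np.random.default_rng(0)

def evolve(fitness, dim, pop_size=32, gens=50, p_mut=0.02):
    """Run the five named GA steps and return the fittest individual."""
    pop = rng.integers(0, 2, size=(pop_size, dim))       # population initialization
    for _ in range(gens):
        scores = np.array([fitness(ind) for ind in pop]) # fitness evaluation
        order = np.argsort(scores)[::-1]
        parents = pop[order[: pop_size // 2]]            # truncation selection
        children = []
        for _ in range(pop_size - len(parents)):
            a, b = parents[rng.integers(len(parents), size=2)]
            cut = rng.integers(1, dim)                   # one-point crossover
            child = np.concatenate([a[:cut], b[cut:]])
            flip = rng.random(dim) < p_mut               # bit-flip mutation
            child[flip] ^= 1
            children.append(child)
        pop = np.vstack([parents, children])
    return pop[np.argmax([fitness(ind) for ind in pop])]

# Usage: best_mask = evolve(lambda m: m.sum(), dim=128)  # toy fitness function
```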

