Efficient Classification of Environmental Sounds through Multiple Features Aggregation and Data Enhancement Techniques for Spectrogram Images

Symmetry ◽  
2020 ◽  
Vol 12 (11) ◽  
pp. 1822
Author(s):  
Zohaib Mushtaq ◽  
Shun-Feng Su

Over the past few years, environmental sound classification (ESC) has attracted considerable attention because of the intricate nature of environmental sounds. This paper reports our study of acoustic feature aggregation and data enhancement approaches for the effective classification of environmental sounds. The proposed data augmentation techniques mix the reinforcement, aggregation, and combination of distinct acoustic features. These features, known as spectrogram image features (SIFs), are retrieved by different audio feature extraction techniques. All audio features used in this manuscript fall into two groups: general features and Mel filter bank-based acoustic features. Two novel features based on the logarithmic scale of the Mel spectrogram (Mel), Log(Log-Mel) and Log(Log(Log-Mel)), denoted L2M and L3M, are introduced in this paper. Three prevailing ESC benchmark datasets are used: ESC-10, ESC-50, and Urbansound8k (Us8k). Most of the audio clips in these datasets are not completely filled with sound and include silent segments; silence trimming is therefore applied as one of the pre-processing techniques. Training uses the transfer learning model DenseNet-161, which is further fine-tuned with individually optimal learning rates based on the discriminative learning technique. The proposed methodologies attain state-of-the-art results on all ESC datasets used: 99.22% for ESC-10, 98.52% for ESC-50, and 97.98% for Us8k. This work also evaluates the performance and efficiency of the proposed techniques on real-time audio data, where they achieve competitive results.
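
The silence-trimming step mentioned in the abstract could look roughly like the sketch below. The abstract does not describe the authors' procedure, so everything here is an assumption: the function name `trim_silence`, the peak-amplitude criterion, the `threshold` and `frame_len` values, and the choice to trim only leading and trailing silence.

```python
def trim_silence(samples, threshold=0.01, frame_len=4):
    """Drop leading and trailing frames whose peak amplitude is below threshold.

    Hypothetical helper: the threshold, frame length, and peak criterion are
    illustrative assumptions, not values taken from the paper.
    """
    # Split the signal into consecutive fixed-length frames.
    frames = [samples[i:i + frame_len] for i in range(0, len(samples), frame_len)]
    # Indices of frames that contain at least one sample at or above the threshold.
    loud = [i for i, f in enumerate(frames) if max(abs(s) for s in f) >= threshold]
    if not loud:
        return []  # the whole clip is silent
    start = loud[0] * frame_len
    end = min((loud[-1] + 1) * frame_len, len(samples))
    return samples[start:end]


clip = [0.0] * 8 + [0.5, -0.4, 0.3, 0.2] + [0.0] * 4
trimmed = trim_silence(clip)
```

Silence in the middle of a clip is left untouched here; removing it as well would need a per-frame filter rather than a single start and end index.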

Author(s):  
Jinfang Zeng ◽  
Youming Li ◽  
Yu Zhang ◽  
Da Chen

Environmental sound classification (ESC) is a challenging problem due to the complexity of sounds. To date, a variety of signal processing and machine learning techniques have been applied to the ESC task, including matrix factorization, dictionary learning, wavelet filterbanks and deep neural networks. Features extracted from deeper networks tend to achieve higher performance than those extracted from shallow networks. However, in the ESC task, only deep convolutional neural networks (CNNs) with several layers have been used and residual networks have been ignored, which leads to degraded performance. Meanwhile, a possible explanation for the limited exploration of CNNs and the difficulty of improving on simpler models is the relative scarcity of labeled data for ESC. In this paper, a residual network called EnvResNet is proposed for the ESC task. In addition, we propose to use audio data augmentation to overcome the problem of data scarcity. The experiments are performed on the ESC-50 database. Combined with data augmentation, the proposed model outperforms baseline implementations relying on mel-frequency cepstral coefficients and achieves results comparable to other state-of-the-art approaches in terms of classification accuracy.
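
As a reminder of what distinguishes a residual network from a plain deep CNN, the sketch below shows a single residual (skip) connection on a small vector. It is a generic illustration and not the EnvResNet architecture, which the abstract does not describe. The fully connected layers, the weight matrices `w1` and `w2`, and all function names are assumptions.

```python
def relu(v):
    """Element-wise rectified linear unit."""
    return [max(0.0, x) for x in v]


def matvec(w, v):
    """Multiply matrix w (list of rows) by vector v."""
    return [sum(wi * vi for wi, vi in zip(row, v)) for row in w]


def residual_block(x, w1, w2):
    """Compute y = relu(x + W2 * relu(W1 * x)).

    The input x is added back to the transformed signal, so the block only
    has to learn a correction to the identity mapping.
    """
    h = relu(matvec(w1, x))
    f = matvec(w2, h)
    return relu([xi + fi for xi, fi in zip(x, f)])


zero = [[0.0, 0.0], [0.0, 0.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
passthrough = residual_block([1.0, 2.0], zero, zero)
doubled = residual_block([1.0, 2.0], identity, identity)
```

With all-zero weights the block passes a non-negative input through unchanged, which is the property that makes very deep stacks of such blocks easier to train.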


Author(s):  
Sridharan Naveen Venkatesh ◽  
Vaithiyanathan Sugumaran

Fault diagnosis plays a significant role in enhancing the useful lifetime, power output, and reliability of photovoltaic modules (PVM). Visual faults such as burn marks, delamination, discoloration, glass breakage, and snail trails are difficult to detect under harsh environmental conditions. Several researchers have attempted to identify visual faults in a PVM, but most previous studies centered on the identification and analysis of a limited number of faults. This article presents the use of a deep convolutional neural network (CNN) to extract image features, followed by machine learning (ML) algorithms that classify the faults. In contrast to existing work, five different fault conditions were considered in the study. The proposed solution analyzes PVM defects in three phases. First, the module images are acquired using unmanned aerial vehicles (UAVs) and data augmentation is performed to generate a uniform dataset. Afterward, a pre-trained deep CNN is adopted for image feature extraction. Finally, the extracted image features are classified with various ML classifiers. The final results show the effectiveness of the pre-trained deep CNN and the accurate performance of the ML classifiers. The best-in-class ML classifier for multiple fault classification is suggested based on the performance comparison.


2019 ◽  
Vol 22 (64) ◽  
pp. 14-35
Author(s):  
José Antonio Alves Menezes ◽  
Giordano Cabral ◽  
Bruno Gomes ◽  
Paulo Pereira

Choosing audio features has long been a topic of interest for audio classification experts, who regard this step as probably the most important effort in solving the classification problem. Feature learning techniques can generate new features that suit the classification model better than conventional features. These techniques are generally domain-agnostic and can be applied to various types of raw data, whereas less agnostic approaches learn knowledge restricted to the area studied, and audio data requires a specific type of knowledge. Many techniques seek to improve the performance of newly generated acoustic features, among which stands out the use of evolutionary algorithms to explore the analytical space of functions. However, the efforts made so far leave room for improvement. The purpose of this work is to propose and evaluate a multi-objective alternative for the exploration of analytical audio features. Experiments were arranged to validate the method with the help of a computational prototype implementing the proposed solution. The results confirmed the effectiveness of the model and showed that there is still room for improvement in the chosen segment.
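
The core operation of most multi-objective evolutionary methods is selecting the non-dominated (Pareto-optimal) candidates. The sketch below illustrates that step only. The two objectives (classification accuracy and negated feature complexity, both maximized) and the function names are assumptions, since the abstract does not state which objectives the authors optimize.

```python
def dominates(a, b):
    """True if a is at least as good as b on every objective and strictly
    better on at least one (all objectives are maximized)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(candidates):
    """Return the candidates that no other candidate dominates."""
    return [c for c in candidates if not any(dominates(o, c) for o in candidates)]


# Hypothetical candidate features scored as (accuracy, -complexity).
scores = [(0.9, -5), (0.8, -2), (0.7, -4), (0.9, -6)]
front = pareto_front(scores)
```

Here `(0.7, -4)` is dominated by `(0.8, -2)` and `(0.9, -6)` by `(0.9, -5)`, so only two candidates survive into the next generation.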


2021 ◽  
Vol 52 (4) ◽  
Author(s):  
José F. Reyes ◽  
Elías Contreras ◽  
Christian Correa ◽  
Pedro Melin

An image analysis algorithm for the real-time classification of cherries by processing their digitalized colour images was developed and tested. A set of five digitalized colour-pattern images, corresponding to the five colour classes defined for commercial cherries, was characterized. The algorithm segments the cherry image by rejecting the background pixels and keeping the image features corresponding to the coloured area of the fruit. A histogram analysis was carried out for the RGB and HSV colour spaces, in which the Red and Hue components showed differences between each of the specified colour patterns of the exporting reference system. This information led to the development of a hybrid Bayesian classification algorithm based on the R and H components. Its accuracy was tested with a set of cherry samples within the colour range of interest. The algorithm was implemented as real-time C++ code in the Microsoft Visual Studio environment. In testing, the algorithm was 100% effective in classifying a sample set of cherries into the five standardized cherry classes. The components of the hardware-software system for implementing the methodology are low cost, thus ensuring an affordable commercial deployment.
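
A Bayesian classifier on the R and H components could be sketched as below, in Python rather than the authors' C++. The abstract does not specify the "hybrid" algorithm, so this is a plain Gaussian naive Bayes with equal priors, used as a stand-in. The function names, the class labels, the sample colours, and the variance floor `1e-6` are all assumptions.

```python
import colorsys
import math


def features(rgb):
    """Map an 8-bit (R, G, B) colour to the pair (red, hue), both in [0, 1]."""
    r, g, b = (c / 255.0 for c in rgb)
    h, _, _ = colorsys.rgb_to_hsv(r, g, b)
    # Hue is circular: reds just below 1.0 and just above 0.0 are neighbours.
    # This sketch ignores the wrap-around for simplicity.
    return (r, h)


def fit(samples_by_class):
    """Estimate a per-class mean and variance for each of the two features."""
    model = {}
    for label, colours in samples_by_class.items():
        feats = [features(c) for c in colours]
        stats = []
        for dim in range(2):
            vals = [f[dim] for f in feats]
            mean = sum(vals) / len(vals)
            var = sum((v - mean) ** 2 for v in vals) / len(vals) + 1e-6
            stats.append((mean, var))
        model[label] = stats
    return model


def log_likelihood(x, mean, var):
    """Log density of a univariate Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def classify(model, rgb):
    """Pick the class with the highest summed log-likelihood (equal priors)."""
    x = features(rgb)
    return max(model, key=lambda lbl: sum(
        log_likelihood(xi, m, v) for xi, (m, v) in zip(x, model[lbl])))


# Made-up training colours for two hypothetical classes.
model = fit({
    "light": [(230, 60, 50), (220, 70, 55)],
    "dark": [(90, 20, 15), (80, 25, 20)],
})
```

A production version would work on the segmented fruit pixels of each image and handle the hue wrap-around, for example by shifting hue by 0.5 modulo 1 before fitting.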


Author(s):  
Behzad Javaheri

Herein, we compare the performance of SVM and MLP in emotion recognition using the speech and song channels of the RAVDESS dataset. We extracted various audio features and identified the optimal scaling strategy and hyperparameters for our models. To increase sample size, we performed audio data augmentation and addressed data imbalance using SMOTE. Our data indicate that the optimised SVM outperforms the MLP, with an accuracy of 82% compared to 75%. Following data augmentation, the performance of both algorithms was identical at ~79%; however, overfitting was evident for the SVM. Our final exploration indicated that SVM and MLP performed similarly, with both yielding lower accuracy for the speech channel than for the song channel. Our findings suggest that both SVM and MLP are powerful classifiers for emotion recognition in a vocal-dependent manner.
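
SMOTE addresses class imbalance by creating synthetic minority samples on the line segment between a minority sample and one of its nearest minority neighbours. The sketch below shows that interpolation in pure Python. It is a simplified illustration, not the library implementation the authors may have used. The function name, `k=2`, and the fixed seed are assumptions.

```python
import random


def smote(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic points from a list of minority-class tuples.

    Requires at least two minority samples.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # The k nearest other minority samples by squared Euclidean distance.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p)))
        neighbour = rng.choice(others[:k])
        # Interpolate at a random position between x and the neighbour.
        u = rng.random()
        synthetic.append(tuple(a + u * (b - a) for a, b in zip(x, neighbour)))
    return synthetic


minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
new_points = smote(minority, n_new=5)
```

Because every synthetic point is a convex combination of two real samples, it stays inside the region already occupied by the minority class.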


2002 ◽  
Vol 7 (1) ◽  
pp. 31-42
Author(s):  
J. Šaltytė ◽  
K. Dučinskas

The Bayesian classification rule for observations of (second-order) stationary Gaussian random fields with different means and common factorised covariance matrices is investigated. The influence of augmenting the observed data on the Bayesian risk is examined for three different widely applicable nonlinear spatial correlation models. The explicit expression of the Bayesian risk for the classification of augmented data is derived. A numerical comparison of these models by the variability of the Bayesian risk is performed for the first-order neighbourhood scheme.
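
For orientation, the textbook two-class version of this rule is given below. It covers a single Gaussian observation $x$ with class means $\mu_1, \mu_2$, common covariance $\Sigma$, and prior probabilities $\pi_1, \pi_2$. This notation is an assumption and is not taken from the paper, whose spatial setting with augmented data leads to a more involved expression.

```latex
% Linear discriminant: assign x to class 1 when W(x) >= 0, otherwise to class 2
W(x) = \Bigl(x - \tfrac{1}{2}(\mu_1 + \mu_2)\Bigr)^{\top} \Sigma^{-1} (\mu_1 - \mu_2) + \gamma,
\qquad \gamma = \ln\frac{\pi_1}{\pi_2}

% Squared Mahalanobis distance between the class means
\Delta^2 = (\mu_1 - \mu_2)^{\top} \Sigma^{-1} (\mu_1 - \mu_2)

% Bayes risk (probability of misclassification), with \Phi the standard normal CDF
R = \pi_1 \, \Phi\!\Bigl(-\frac{\Delta}{2} - \frac{\gamma}{\Delta}\Bigr)
  + \pi_2 \, \Phi\!\Bigl(-\frac{\Delta}{2} + \frac{\gamma}{\Delta}\Bigr)
```

The risk follows because $W(x)$ is normally distributed with variance $\Delta^2$ and mean $\gamma + \Delta^2/2$ under class 1 and $\gamma - \Delta^2/2$ under class 2. With equal priors it reduces to $\Phi(-\Delta/2)$, so the risk falls as the class means move apart.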


2019 ◽  
Vol 2019 ◽  
pp. 1-14 ◽  
Author(s):  
Yong He ◽  
Hong Zeng ◽  
Yangyang Fan ◽  
Shuaisheng Ji ◽  
Jianjian Wu

In this paper, we propose a deep learning approach to detecting oilseed rape pests that raises the mean average precision (mAP) to 77.14%, an improvement of 9.7% over the original model. We port this model to a mobile platform so that every farmer can use the program, which diagnoses pests in real time and provides suggestions on pest control. We built an oilseed rape pest imaging database covering 12 typical oilseed rape pests and compared the performance of five models; SSD w/Inception was chosen as the optimal model. Moreover, to achieve a high mAP, we used data augmentation (DA) and added a dropout layer. The experiments were performed on the Android application we developed, and the results show that our approach clearly surpasses the original model and is helpful for integrated pest management. Compared with past work, this application improves environmental adaptability, response speed, and accuracy, and it has the advantages of low cost and simple operation, which make it suitable for pest monitoring missions using drones and the Internet of Things (IoT).
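
The mAP metric reported above averages a per-class average precision (AP). The sketch below uses the non-interpolated definition of AP. Detection benchmarks often use interpolated variants, and the abstract does not say which one the authors used, so treat the definition, the function names, and the input format as assumptions.

```python
def average_precision(detections, n_ground_truth):
    """Non-interpolated AP for one class.

    detections: list of (confidence, is_true_positive) pairs.
    n_ground_truth: number of annotated objects of this class.
    """
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)
    true_positives = 0
    precision_sum = 0.0
    for rank, (_, is_tp) in enumerate(ranked, start=1):
        if is_tp:
            true_positives += 1
            # Precision at the rank of each correctly detected object.
            precision_sum += true_positives / rank
    return precision_sum / n_ground_truth if n_ground_truth else 0.0


def mean_average_precision(per_class):
    """Average the AP over classes; per_class is a list of (detections, n_gt)."""
    return sum(average_precision(d, n) for d, n in per_class) / len(per_class)


class_a = ([(0.9, True), (0.8, False), (0.7, True)], 2)
class_b = ([(0.5, True)], 1)
ap_a = average_precision(*class_a)
m_ap = mean_average_precision([class_a, class_b])
```

Deciding whether a detection counts as a true positive (typically by an overlap threshold against the annotated box) happens before this step and is not shown.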


2021 ◽  
pp. 1-11
Author(s):  
Yaning Liu ◽  
Lin Han ◽  
Hexiang Wang ◽  
Bo Yin

Papillary thyroid carcinoma (PTC) is a common carcinoma of the thyroid. Many benign thyroid nodules have a papillary structure that is easily confused with PTC in morphology. Pathologists therefore spend a lot of time on the differential diagnosis of PTC and must rely on personal diagnostic experience, which makes the diagnosis subjective and consistency among observers difficult to obtain. To address this issue, we applied deep learning to the differential diagnosis of PTC and proposed a histological image classification method for PTC based on the Inception Residual convolutional neural network (IRCNN) and support vector machine (SVM). First, to expand the dataset and solve the problem of histological image color inconsistency, a pre-processing module was constructed that included color transfer and mirror transform. Then, to alleviate overfitting of the deep learning model, we optimized the convolutional neural network by combining the Inception Network and the Residual Network to extract image features. Finally, the SVM was trained on the image features extracted by the IRCNN to perform the classification task. Experimental results show the effectiveness of the proposed method in the classification of PTC histological images.


1989 ◽  
Vol 32 (7) ◽  
pp. 862-871 ◽  
Author(s):  
Clement Yu ◽  
Wei Sun ◽  
Dina Bitton ◽  
Qi Yang ◽  
Richard Bruno ◽  
...  
