Animal Sound Classification Using Dissimilarity Spaces

Author(s):  
Loris Nanni ◽  
Sheryl Brahnam ◽  
Alessandra Lumini ◽  
Gianluca Maguolo

The classifier system proposed in this work combines the dissimilarity spaces produced by a set of Siamese neural networks (SNNs) designed using four different backbones with different clustering techniques for training SVMs for automated animal audio classification. The system is evaluated on two animal audio datasets: one of cat and one of bird vocalizations. Different clustering methods reduce the spectrograms in the dataset to a set of centroids that generate (in both a supervised and an unsupervised fashion) the dissimilarity space through the Siamese networks. In addition to feeding the SNNs with spectrograms, additional experiments process the spectrograms using the heterogeneous auto-similarities of characteristics. Once the similarity spaces are computed, a vector space representation of each pattern is generated, which is then used to train a support vector machine (SVM) that classifies a spectrogram by its dissimilarity vector. Results demonstrate that the proposed approach performs competitively (without ad hoc optimization of the clustering methods) on both animal vocalization datasets. To further demonstrate the power of the proposed system, the best stand-alone approach is also evaluated on the challenging Dataset for Environmental Sound Classification (ESC-50). The MATLAB code used in this study is available at https://github.com/LorisNanni.

2020 ◽  
Vol 10 (23) ◽  
pp. 8578
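The dissimilarity-space pipeline described above can be sketched as follows. This is a minimal illustration, not the paper's MATLAB implementation: random blobs stand in for the SNN embeddings of spectrograms, a tiny k-means produces the centroids, and each pattern's descriptor is its vector of distances to those centroids.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for Siamese-network embeddings of spectrograms
# (in the paper these come from the four SNN backbones).
embeddings = np.vstack([
    rng.normal(loc=0.0, scale=0.3, size=(20, 8)),
    rng.normal(loc=2.0, scale=0.3, size=(20, 8)),
])

def kmeans(X, k, iters=20, seed=0):
    """Minimal k-means: returns k centroids of X."""
    r = np.random.default_rng(seed)
    centroids = X[r.choice(len(X), size=k, replace=False)].copy()
    for _ in range(iters):
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = X[labels == j].mean(axis=0)
    return centroids

centroids = kmeans(embeddings, k=4)

def dissimilarity_vector(x, centroids):
    """Project one pattern into the dissimilarity space:
    one distance per centroid."""
    return np.linalg.norm(centroids - x, axis=1)

# Each 8-D embedding becomes a 4-D dissimilarity descriptor.
D = np.array([dissimilarity_vector(x, centroids) for x in embeddings])
print(D.shape)  # (40, 4)
```

In the paper these descriptors are what the SVM is trained on; any standard SVM implementation (e.g. scikit-learn's `SVC`) could consume `D` directly.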

2020 ◽  
Vol 10 (12) ◽  
pp. 4176 ◽  
Author(s):  
Loris Nanni ◽  
Andrea Rigo ◽  
Alessandra Lumini ◽  
Sheryl Brahnam

In this work, we combine a Siamese neural network and different clustering techniques to generate a dissimilarity space that is then used to train an SVM for automated animal audio classification. The animal audio datasets used are (i) bird and (ii) cat sounds, both freely available. We exploit different clustering methods to reduce the spectrograms in the dataset to a number of centroids that are used to generate the dissimilarity space through the Siamese network. Once computed, we use the dissimilarity space to generate a vector space representation of each pattern, which is then fed into a support vector machine (SVM) to classify a spectrogram by its dissimilarity vector. Our study shows that the proposed approach based on dissimilarity space performs well on both classification problems without ad hoc optimization of the clustering methods. Moreover, results show that the fusion of CNN-based approaches applied to the animal audio classification problem works better than the stand-alone CNNs.
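The spectrograms that both this entry and the previous one feed into the networks can be computed with a short-time Fourier transform. A minimal sketch with illustrative window and hop sizes (the papers' actual parameters are not given in the abstracts):

```python
import numpy as np

def spectrogram(signal, win=256, hop=128):
    """Magnitude spectrogram via a Hann-windowed short-time FFT.
    Rows are frequency bins, columns are time frames."""
    window = np.hanning(win)
    frames = []
    for start in range(0, len(signal) - win + 1, hop):
        frame = signal[start:start + win] * window
        frames.append(np.abs(np.fft.rfft(frame)))
    return np.array(frames).T

# A 1 kHz test tone sampled at 8 kHz; with a 256-point FFT the energy
# should sit at bin 1000 / (8000 / 256) = 32.
sr = 8000
t = np.arange(sr) / sr
S = spectrogram(np.sin(2 * np.pi * 1000 * t))
print(S.shape, S[:, 0].argmax())
```

The resulting matrix `S` (optionally log-scaled) is the image-like input a CNN or Siamese backbone would consume.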


Sensors ◽  
2021 ◽  
Vol 21 (5) ◽  
pp. 1573
Author(s):  
Loris Nanni ◽  
Giovanni Minchio ◽  
Sheryl Brahnam ◽  
Gianluca Maguolo ◽  
Alessandra Lumini

Traditionally, classifiers are trained to predict patterns within a feature space. The image classification system presented here trains classifiers to predict patterns within a vector space by combining the dissimilarity spaces generated by a large set of Siamese Neural Networks (SNNs). A set of centroids from the patterns in the training data sets is calculated with supervised k-means clustering. The centroids are used to generate the dissimilarity space via the Siamese networks. The vector space descriptors are extracted by projecting patterns onto the similarity spaces, and SVMs classify an image by its dissimilarity vector. The versatility of the proposed approach in image classification is demonstrated by evaluating the system on different types of images across two domains: two medical data sets and two animal audio data sets with vocalizations represented as images (spectrograms). Results show that the proposed system is competitive with the best-performing methods in the literature, obtaining state-of-the-art performance on one of the medical data sets, and it does so without ad hoc optimization of the clustering methods on the tested data sets.
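One common reading of "supervised k-means" is to run k-means separately within each class and pool the per-class centroids, so every class contributes centroids to the dissimilarity space. A sketch under that assumption (the toy data and the helper name `supervised_centroids` are illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy 2-class data standing in for SNN embeddings of training patterns.
X = np.vstack([
    rng.normal(0.0, 0.3, size=(30, 8)),
    rng.normal(3.0, 0.3, size=(30, 8)),
])
y = np.array([0] * 30 + [1] * 30)

def kmeans(X, k, iters=20, seed=0):
    """Minimal k-means returning k centroids."""
    r = np.random.default_rng(seed)
    c = X[r.choice(len(X), size=k, replace=False)].copy()
    for _ in range(iters):
        labels = np.linalg.norm(X[:, None, :] - c[None, :, :], axis=2).argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                c[j] = X[labels == j].mean(axis=0)
    return c

def supervised_centroids(X, y, k_per_class=2):
    """'Supervised' clustering: k-means run separately on each class's
    patterns, pooling the per-class centroids."""
    return np.vstack([kmeans(X[y == c], k_per_class) for c in np.unique(y)])

C = supervised_centroids(X, y)
print(C.shape)  # 2 centroids per class x 2 classes -> (4, 8)
```

Unsupervised clustering, by contrast, would run a single k-means over all patterns regardless of label.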


Author(s):  
Sergey Voronin ◽  
Alexander Grushin

We describe a multi-resolution approach for audio classification and illustrate its application to the open data set for environmental sound classification. The proposed approach utilizes a multi-resolution based ensemble consisting of targeted feature extraction of approximation (coarse scale) and detail (fine scale) portions of the signal under the action of multiple transforms. This is paired with an automatic machine learning engine for algorithm and parameter selection and the LSTM algorithm, capable of mapping several sequences of features to a predicted class membership probability distribution. A conditional probability approach is outlined for combining the predictions of different classifiers, trained over distinct scale feature sets. Initial results show an improvement in multi-class classification accuracy.
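Two of this entry's ingredients can be sketched briefly: the coarse/fine split of the signal (here a single-level Haar wavelet transform, one possible choice of transform) and the combination of per-scale classifier posteriors. The abstract does not state the exact conditional-probability rule, so a product rule under a conditional-independence assumption is used here purely as an illustration:

```python
import numpy as np

def haar_split(signal):
    """One level of the Haar wavelet transform: approximation (coarse
    scale) and detail (fine scale) coefficients."""
    x = np.asarray(signal, dtype=float)
    x = x[: len(x) // 2 * 2].reshape(-1, 2)  # pairs of adjacent samples
    approx = (x[:, 0] + x[:, 1]) / np.sqrt(2)
    detail = (x[:, 0] - x[:, 1]) / np.sqrt(2)
    return approx, detail

def fuse_posteriors(p_coarse, p_fine):
    """Combine per-scale class posteriors with a product rule
    (conditional-independence assumption), renormalized."""
    p = p_coarse * p_fine
    return p / p.sum()

approx, detail = haar_split(np.arange(8.0))
p = fuse_posteriors(np.array([0.7, 0.2, 0.1]),
                    np.array([0.5, 0.4, 0.1]))
print(len(approx), p.argmax())
```

In the described system, separate feature extractors and LSTM classifiers would operate on the approximation and detail streams before their output distributions are fused.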



2021 ◽  
Vol 11 (13) ◽  
pp. 5796
Author(s):  
Loris Nanni ◽  
Gianluca Maguolo ◽  
Sheryl Brahnam ◽  
Michelangelo Paci

Research in sound classification and recognition is rapidly advancing in the field of pattern recognition. One important area in this field is environmental sound recognition, whether it concerns the identification of endangered species in different habitats or the type of interfering noise in urban environments. Since environmental audio datasets are often limited in size, a robust model able to perform well across different datasets is of strong research interest. In this paper, ensembles of classifiers are combined that exploit six data augmentation techniques and four signal representations for retraining five pre-trained convolutional neural networks (CNNs); these ensembles are tested on three freely available environmental audio benchmark datasets: (i) bird calls, (ii) cat sounds, and (iii) the Environmental Sound Classification (ESC-50) database for identifying sources of noise in environments. To the best of our knowledge, this is the most extensive study investigating ensembles of CNNs for audio classification. The best-performing ensembles are compared and shown to either outperform or perform comparably to the best methods reported in the literature on these datasets, including on the challenging ESC-50 dataset. We obtained 97% accuracy on the bird dataset, 90.51% on the cat dataset, and 88.65% on ESC-50 using different approaches. In addition, the same ensemble model trained on the three datasets managed to reach the same results on the bird and cat datasets while losing only 0.1% on ESC-50. Thus, we have managed to create an off-the-shelf ensemble that can be trained on different datasets and reach performances competitive with the state of the art.
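Combining the member CNNs of such an ensemble is typically done by fusing their per-class softmax scores; the abstract does not specify the rule, so the sketch below uses the common sum (average) rule with hypothetical score matrices:

```python
import numpy as np

def sum_rule(score_matrices):
    """Fuse an ensemble by averaging per-model softmax score matrices
    (rows: samples, columns: classes), then taking the per-sample argmax."""
    avg = np.mean(score_matrices, axis=0)
    return avg.argmax(axis=1)

# Hypothetical softmax outputs of three CNNs for 2 samples over 3 classes.
scores = [
    np.array([[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]]),
    np.array([[0.5, 0.4, 0.1], [0.1, 0.7, 0.2]]),
    np.array([[0.7, 0.2, 0.1], [0.3, 0.4, 0.3]]),
]
print(sum_rule(scores))  # -> [0 1]: sample 0 -> class 0, sample 1 -> class 1
```

Each member network would differ in backbone, signal representation, or data augmentation scheme, which is what gives the averaged scores their diversity.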

