Combining visual and acoustic features for audio classification tasks

2017 ◽  
Vol 88 ◽  
pp. 49-56 ◽  
Author(s):  
L. Nanni ◽  
Y.M.G. Costa ◽  
D.R. Lucio ◽  
C.N. Silla ◽  
S. Brahnam

Author(s):  
Simone Scardapane ◽  
Danilo Comminiello ◽  
Michele Scarpiniti ◽  
Raffaele Parisi ◽  
Aurelio Uncini

Author(s):  
Ching-Hua Chuan

This paper presents an audio classification and retrieval system that uses wavelets to extract low-level acoustic features. The author performs multiple-level decomposition with the discrete wavelet transform to extract acoustic features from audio recordings at different scales and times, and translates the extracted features into a compact vector representation. Gaussian mixture models, trained with the expectation-maximization algorithm, are used to build models for audio classes and for individual audio examples. The system is evaluated on three audio classification tasks: speech/music, male/female speech, and music genre. The author also shows how wavelets and Gaussian mixture models can be used for class-based audio retrieval in two approaches: indexing using only wavelets versus indexing by Gaussian components. Evaluating the system through 10-fold cross-validation, the author demonstrates the promising capability of wavelets and Gaussian mixture models for audio classification and retrieval, and compares how parameters including frame size, wavelet level, number of Gaussian components, and sampling size affect the performance of the Gaussian models.
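The abstract above outlines a pipeline of wavelet-based frame features followed by per-class Gaussian mixture models. The following is a minimal sketch of that kind of pipeline, assuming PyWavelets and scikit-learn; the frame size, wavelet family, decomposition level, and number of Gaussian components are placeholder values, not the paper's settings.

```python
# Hedged sketch: wavelet features + per-class GMMs for audio classification.
# Library choices (PyWavelets, scikit-learn) and all parameter values are
# illustrative assumptions, not taken from the paper.
import numpy as np
import pywt
from sklearn.mixture import GaussianMixture

def wavelet_features(signal, frame_size=2048, wavelet="db4", level=5):
    """Frame the signal and summarize each DWT level by its mean absolute
    value and standard deviation, yielding one compact vector per frame."""
    feats = []
    for start in range(0, len(signal) - frame_size + 1, frame_size):
        frame = signal[start:start + frame_size]
        coeffs = pywt.wavedec(frame, wavelet, level=level)
        vec = [stat for c in coeffs for stat in (np.mean(np.abs(c)), np.std(c))]
        feats.append(vec)
    return np.asarray(feats)

def train_class_models(signals_by_class, n_components=8):
    """Fit one GMM (trained with EM) on the pooled frame features of each class."""
    models = {}
    for label, signals in signals_by_class.items():
        X = np.vstack([wavelet_features(s) for s in signals])
        models[label] = GaussianMixture(n_components=n_components).fit(X)
    return models

def classify(signal, models):
    """Assign the class whose GMM gives the highest average frame log-likelihood."""
    X = wavelet_features(signal)
    return max(models, key=lambda label: models[label].score(X))
```

Class-based retrieval along the lines described could then rank stored examples by the log-likelihood that a query's class model assigns to their feature vectors.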


2021 ◽  
Author(s):  
Khaled Koutini ◽  
Hamid Eghbal-zadeh ◽  
Florian Henkel ◽  
Jan Schlüter ◽  
Gerhard Widmer

Convolutional Neural Networks (CNNs) have been dominating classification tasks in various domains, such as machine vision, machine listening, and natural language processing. In machine listening, while generally exhibiting very good generalization capabilities, CNNs are sensitive to the specific audio recording device used, which has been recognized as a substantial problem in the acoustic scene classification (DCASE) community. In this study, we investigate the relationship between the over-parameterization of acoustic scene classification models and their resulting generalization abilities. Our results indicate that increasing width improves generalization to unseen devices, even without an increase in the number of parameters.
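One way to make the "wider network, same parameter count" idea concrete is to scale the number of convolution groups together with the channel count. The sketch below is only an illustration of that general trade-off, assuming PyTorch; it is not the architecture used in the study, and the layer sizes are made up.

```python
# Hedged sketch: widening a convolutional layer while holding its parameter
# count fixed by scaling the number of groups. Illustration only; layer sizes
# and the grouped-convolution trick are assumptions, not the paper's design.
import torch.nn as nn

def count_params(module):
    return sum(p.numel() for p in module.parameters())

# Baseline layer: 64 -> 64 channels, 3x3 kernel, ordinary (groups=1) convolution.
base = nn.Conv2d(64, 64, kernel_size=3, padding=1, groups=1, bias=False)

# Twice as wide (128 channels), with groups scaled by the square of the widening
# factor (1 * 2**2 = 4), so the weight tensor keeps the same number of
# parameters: out * (in / groups) * k * k = 128 * 32 * 9 = 64 * 64 * 9.
wide = nn.Conv2d(128, 128, kernel_size=3, padding=1, groups=4, bias=False)

assert count_params(base) == count_params(wide)  # both 36,864 parameters
print(count_params(base), count_params(wide))
```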


2004 ◽  
Author(s):  
Lyle E. Bourne ◽  
Alice F. Healy ◽  
James A. Kole ◽  
William D. Raymond

2012 ◽  
Author(s):  
Eitan Menahem ◽  
Lior Rokach ◽  
Yuval Elovici