machine listening
Recently Published Documents


TOTAL DOCUMENTS: 34 (five years: 13)

H-INDEX: 4 (five years: 1)

2021
Author(s): Javier Naranjo-Alcazar, Sergi Perez-Castanos, Maximo Cobos, Francesc J. Ferri, Pedro Zuccarello

Acoustic scene classification (ASC) is one of the most popular problems in the field of machine listening. The objective is to classify an audio clip into one of a set of predefined scenes using only the audio data. The problem has progressed considerably over the years through the successive editions of the DCASE challenge, which usually comprises several subtasks that allow it to be tackled with different approaches. The subtask presented in this report corresponds to an ASC problem constrained both by the complexity of the model and by audio recorded with different, mismatched devices (real and simulated). The work presented in this report follows the research line carried out by the team in previous years. Specifically, a two-step system is proposed: a two-dimensional representation of the audio using a Gammatone filter bank, followed by a convolutional neural network using squeeze-excitation techniques. The presented system outperforms the baseline by about 17 percentage points.
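A minimal PyTorch sketch of the squeeze-excitation idea the report builds on: per-channel gates recalibrate a two-dimensional time-frequency representation such as a Gammatone spectrogram. The layer sizes, reduction ratio, and input shape below are illustrative assumptions, not the authors' exact configuration.

```python
import torch
import torch.nn as nn

class SqueezeExcitation(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        # "Squeeze": global average pooling collapses each feature map to one value.
        self.pool = nn.AdaptiveAvgPool2d(1)
        # "Excitation": a bottleneck MLP produces per-channel gates in (0, 1).
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        gates = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * gates  # reweight the channels of the 2-D audio representation

# Hypothetical batch of Gammatone-spectrogram feature maps: (batch, channels, bands, frames).
features = torch.randn(8, 64, 128, 431)
se = SqueezeExcitation(channels=64)
print(se(features).shape)  # torch.Size([8, 64, 128, 431])
```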


2021
Author(s): Khaled Koutini, Hamid Eghbal-zadeh, Florian Henkel, Jan Schlüter, Gerhard Widmer

Convolutional Neural Networks (CNNs) have been dominating classification tasks in various domains, such as machine vision, machine listening, and natural language processing. In machine listening, while CNNs generally exhibit very good generalization capabilities, they are sensitive to the specific audio recording device used, which has been recognized as a substantial problem in the acoustic scene classification (DCASE) community. In this study, we investigate the relationship between the over-parameterization of acoustic scene classification models and their resulting generalization abilities. Our results indicate that increasing width improves generalization to unseen devices, even without an increase in the number of parameters.
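The width-scaling experiment can be made concrete with a channel multiplier applied uniformly to a CNN. The sketch below uses an assumed toy architecture, not the authors' model, and serves only to show how width and parameter count are varied.

```python
import torch.nn as nn

def make_cnn(num_classes: int = 10, width: float = 1.0) -> nn.Sequential:
    # Widen every convolutional layer by the same factor.
    c1, c2 = int(32 * width), int(64 * width)
    return nn.Sequential(
        nn.Conv2d(1, c1, 3, padding=1), nn.BatchNorm2d(c1), nn.ReLU(),
        nn.MaxPool2d(2),
        nn.Conv2d(c1, c2, 3, padding=1), nn.BatchNorm2d(c2), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        nn.Linear(c2, num_classes),
    )

for w in (0.5, 1.0, 2.0):
    n_params = sum(p.numel() for p in make_cnn(width=w).parameters())
    print(f"width x{w}: {n_params:,} parameters")
```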


Author(s): Domenico Napolitano, Renato Grieco

The paper investigates new machine listening technologies through a comparison of phenomenological and empirical/media-archeological approaches. While phenomenology associates listening with subjectivity, empiricism takes into account the technical operations involved with listening processes in both human and non-human apparatuses. Based on this theoretical framework, the paper undertakes a media-archeological investigation of two algorithms employed in copyright detection: "acoustic fingerprinting" and "audio watermarking". In the technical operations of sound recognition algorithms, empirical analysis suggests the coexistence of a multiplicity of spatialities: from the "sound event", which occurs in three-dimensional physical space, to its mathematical representation in vector space, and to the one-dimensional informational space of data processing and machine-to-machine communication. Recalling Deleuze's definition of "the fold", we define these coexistent spatial dimensions in techno-culturally mediated sound as "the folded space" of machine listening. We go on to argue that the issue of space in machine listening consists of the virtually infinite variability of the sound event being subjected to automatic recognition. The difficulty lies in reconciling the theoretically enduring information transmitted by sound with the contingent manifestation of sound affected by space. To make machines able to deal with the site-specificity of sound, recognition algorithms need to reconstruct the three-dimensional space on a signal processing level, in a sort of reverse-engineering of the sound phenomenon that recalls the concept of "implicit sonicity" defined by Wolfgang Ernst. While the metaphors and social representations adopted to describe machine listening are often anthropomorphic, and the very term "listening", when referring to numerical operations, can be seen as a metaphor in itself, we argue that both human listening and machine listening are co-defined in a socio-technical network, in which the listening space no longer coincides with the position of the listening subject, but is negotiated between human and nonhuman agencies.
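The paper treats acoustic fingerprinting as a concrete technical operation. Purely as an assumed illustration (a simplified, Shazam-style landmark scheme, not the specific systems the authors analyse), a fingerprint can be built by hashing pairs of spectrogram peaks so that matching depends on relative rather than absolute timing:

```python
import numpy as np
from scipy import signal

def fingerprint(audio: np.ndarray, sr: int = 22050) -> set:
    """Hash pairs of spectral peaks into time-shift-invariant landmarks."""
    _, _, spec = signal.spectrogram(audio, fs=sr, nperseg=1024)
    peaks = np.log1p(spec).argmax(axis=0)  # strongest frequency bin per frame
    hashes = set()
    for i in range(len(peaks)):
        for dt in (1, 2, 3):  # pair each anchor with a few later frames
            if i + dt < len(peaks):
                # (anchor bin, paired bin, frame gap): the hash discards
                # absolute time, abstracting the sound event away from its
                # contingent spatio-temporal manifestation.
                hashes.add((int(peaks[i]), int(peaks[i + dt]), dt))
    return hashes

query = np.random.randn(22050)  # one second of noise as a stand-in signal
print(len(fingerprint(query)), "landmark hashes")
```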


2021
Vol 25, pp. 233121652110461
Author(s): Björn Schuller, Alice Baird, Alexander Gebhard, Shahin Amiriparian, Gil Keren, ...

Computer audition (i.e., intelligent audio) has made great strides in recent years; however, it is still far from achieving holistic hearing abilities that more appropriately mimic human-like understanding. Within an audio scene, a human listener can quickly interpret layers of sound at a single time-point, with each layer varying in characteristics such as location, state, and trait. Current integrated machine listening approaches, on the other hand, mainly recognise only single events. In this context, this contribution aims to provide key insights and approaches that can be applied in computer audition to achieve a more holistic intelligent understanding system, and to identify the challenges in reaching this goal. We first summarise the state of the art in traditional signal-processing-based audio pre-processing and feature representation, as well as automated learning such as by deep neural networks, concerning in particular audio interpretation, decomposition, understanding, and ontologisation. We then present an agent-based approach for integrating these concepts into a holistic audio understanding system. We conclude by outlining avenues towards the ambitious goal of 'holistic human-parity' machine listening abilities.
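The gap between single-event recognition and layered scene understanding can be illustrated with the output head alone: a softmax forces one winning label, while independent sigmoid scores admit any subset of coexisting layers. A minimal sketch under assumed class names and logits, not the agent-based system the paper describes:

```python
import torch

classes = ["speech", "music", "traffic", "birdsong"]
logits = torch.tensor([[2.3, 1.7, -0.4, 0.9]])  # hypothetical network output

# Single-event view: argmax picks exactly one label per time-point.
single_event = classes[logits.argmax(dim=1).item()]

# Layered view: independent sigmoids let several labels be active at once.
layered = [c for c, p in zip(classes, torch.sigmoid(logits)[0]) if p > 0.5]

print("single-event view:", single_event)  # speech
print("layered view:", layered)            # ['speech', 'music', 'birdsong']
```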


2020
pp. 69-78
Author(s): Stefano Fasciani

Expressive sonic interaction with sound synthesizers requires the control of a continuous, high-dimensional space, and the relationship between synthesis variables and the timbre of the generated sound is typically complex or unknown to users. In previous work, we presented an unsupervised mapping method based on machine listening and machine learning techniques which addresses these challenges by providing a low-dimensional, perceptually related timbre control space. The mapping maximizes the breadth of the explorable sonic space covered by the sound synthesizer and minimizes the timbre losses incurred by low-dimensional control. The mapping is generated automatically by a system requiring little input from users. In this paper we present an improved method and an optimized implementation that drastically reduce the time needed for timbre analysis and mapping computation. We introduce the use of extreme learning machines for the regression from control to timbre spaces, and an interactive approach for analysing the synthesizer's sonic response as users explore the parameters of the instrument. This work is implemented in a generic, open-source tool that enables the computation of ad hoc synthesis mappings through timbre spaces, facilitating and speeding up the workflow of building a customized sonic control system.
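Extreme learning machines keep the regression step cheap because only a linear readout is fitted. A minimal numpy sketch under toy dimensions (4 synthesis parameters mapped to a 2-D timbre space), which are assumptions for illustration rather than the tool's actual configuration:

```python
import numpy as np

rng = np.random.default_rng(0)

def elm_fit(X, Y, hidden=256):
    # Random input weights and biases stay fixed; no backpropagation needed.
    W = rng.standard_normal((X.shape[1], hidden))
    b = rng.standard_normal(hidden)
    H = np.tanh(X @ W + b)
    # Train only the readout, in closed form, by least squares.
    beta, *_ = np.linalg.lstsq(H, Y, rcond=None)
    return W, b, beta

def elm_predict(X, W, b, beta):
    return np.tanh(X @ W + b) @ beta

# Toy mapping: 4 synthesis parameters -> 2-D perceptual timbre coordinates.
X = rng.uniform(size=(500, 4))
Y = np.stack([np.sin(X @ np.array([1., 2., 0., 0.])),
              np.cos(X @ np.array([0., 0., 3., 1.]))], axis=1)
W, b, beta = elm_fit(X, Y)
print("mean absolute fit error:", np.abs(elm_predict(X, W, b, beta) - Y).mean())
```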


2020
Vol 24 (7), pp. 2082-2092
Author(s): Fengquan Dong, Kun Qian, Zhao Ren, Alice Baird, Xinjian Li, ...

Author(s): Saumitra Mishra, Emmanouil Benetos, Bob L. T. Sturm, Simon Dixon

2020
Vol 1, pp. 6
Author(s): Jörn Anemüller

Multi-channel acoustic source localization evaluates direction-dependent inter-microphone differences in order to estimate the position of an acoustic source embedded in an interfering sound field. We here investigate a deep neural network (DNN) approach to source localization that improves on previous work with learned, linear support-vector-machine localizers. DNNs with depths between 4 and 15 layers were trained to predict the azimuth direction of target speech in 72 directional bins of width 5 degrees, embedded in an isotropic, multi-speech-source noise field. Several system parameters were varied; in particular, the number of microphones in the bilateral hearing aid scenario was set to 2, 4, and 6, respectively. Results show that DNNs provide a clear improvement in localization performance over a linear classifier reference system. Increasing the number of microphones from 2 to 4 results in a larger increase of performance for the DNNs than for the linear system. However, 6 microphones provide only a small additional gain. The DNN architectures perform better with 4 microphones than the linear approach does with 6 microphones, thus indicating that location-specific information in source-interference scenarios is encoded non-linearly in the sound field.
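The classification setup described above reduces to a network with 72 output logits, one per 5-degree azimuth bin. A minimal PyTorch sketch follows; the feature dimensionality and hidden width are assumptions, with the depth chosen from the 4-to-15-layer range the study explores:

```python
import torch
import torch.nn as nn

def make_localizer(n_features: int = 512, depth: int = 4,
                   hidden: int = 512, n_bins: int = 72) -> nn.Sequential:
    layers, d_in = [], n_features
    for _ in range(depth):
        layers += [nn.Linear(d_in, hidden), nn.ReLU()]
        d_in = hidden
    layers.append(nn.Linear(d_in, n_bins))  # one logit per 5-degree bin
    return nn.Sequential(*layers)

net = make_localizer()
features = torch.randn(1, 512)  # e.g. inter-microphone difference features
azimuth_bin = net(features).argmax(dim=1).item()
print(f"estimated azimuth: {azimuth_bin * 5} degrees")
```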

