FaVoA: Face-Voice Association Favours Ambiguous Speaker Detection

Mapping Intimacies ◽

10.1007/978-3-030-86362-3_36 ◽

2021 ◽

pp. 439-450

Author(s):

Hugo Carneiro ◽

Cornelius Weber ◽

Stefan Wermter

Keyword(s):

Speaker Detection

Simulating Realistically-Spatialised Simultaneous Speech Using Video-Driven Speaker Detection and the CHiME-5 Dataset

10.21437/interspeech.2020-2807 ◽

2020

Author(s):

Jack Deadman ◽

Jon Barker

Keyword(s):

Speaker Detection

Bio-Inspired Modality Fusion for Active Speaker Detection

Applied Sciences ◽

10.3390/app11083397 ◽

2021 ◽

Vol 11 (8) ◽

pp. 3397

Author(s):

Gustavo Assunção ◽

Nuno Gonçalves ◽

Paulo Menezes

Keyword(s):

Superior Colliculus ◽

Visual Information ◽

Human Beings ◽

Validation Process ◽

Detection Approach ◽

Speaker Detection ◽

The One ◽

The Brain

Human beings have developed fantastic abilities to integrate information from various sensory sources exploring their inherent complementarity. Perceptual capabilities are therefore heightened, enabling, for instance, the well-known "cocktail party" and McGurk effects, i.e., speech disambiguation from a panoply of sound signals. This fusion ability is also key in refining the perception of sound source location, as in distinguishing whose voice is being heard in a group conversation. Furthermore, neuroscience has successfully identified the superior colliculus region in the brain as the one responsible for this modality fusion, with a handful of biological models having been proposed to approach its underlying neurophysiological process. Deriving inspiration from one of these models, this paper presents a methodology for effectively fusing correlated auditory and visual information for active speaker detection. Such an ability can have a wide range of applications, from teleconferencing systems to social robotics. The detection approach initially routes auditory and visual information through two specialized neural network structures. The resulting embeddings are fused via a novel layer based on the superior colliculus, whose topological structure emulates spatial neuron cross-mapping of unimodal perceptual fields. The validation process employed two publicly available datasets, with achieved results confirming and greatly surpassing initial expectations.

Speaker detection using multi-speaker audio files for both enrollment and test

2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03). ◽

10.1109/icassp.2003.1202298 ◽

2003

Author(s):

J.-F. Bonastre ◽

S. Meignier ◽

T. Merlin

Keyword(s):

Speaker Detection

Local Normalization and Delayed Decision Making in Speaker Detection and Tracking

Digital Signal Processing ◽

10.1006/dspr.1999.0357 ◽

2000 ◽

Vol 10 (1-3) ◽

pp. 113-132

Author(s):

Johan Koolwaaij ◽

Lou Boves

Keyword(s):

Decision Making ◽

Detection And Tracking ◽

Speaker Detection

Combining cohort and UBM models in open set speaker detection

Multimedia Tools and Applications ◽

10.1007/s11042-009-0381-x ◽

2009 ◽

Vol 48 (1) ◽

pp. 141-159

Author(s):

Anthony Brew ◽

Pádraig Cunningham

Keyword(s):

Open Set ◽

Speaker Detection

Online Clustering of Narrowband Position Estimates with Application to Multi-speaker Detection and Tracking

Lecture Notes in Electrical Engineering - Advances in Machine Learning and Signal Processing ◽

10.1007/978-3-319-32213-1_6 ◽

2016 ◽

pp. 59-69

Author(s):

Maja Taseska ◽

Gleni Lamani ◽

Emanuël A. P. Habets

Keyword(s):

Online Clustering ◽

Detection And Tracking ◽

Speaker Detection

Speaker Detection

Encyclopedia of Biometrics ◽

10.1007/978-0-387-73003-5_662 ◽

2009 ◽

pp. 1253-1253

Keyword(s):

Speaker Detection

Latent Space Representation for Multi-Target Speaker Detection and Identification with a Sparse Dataset Using Triplet Neural Networks

2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) ◽

10.1109/asru46091.2019.9003922 ◽

2019

Author(s):

Kin Wai Cheuk ◽

B T Balamurali ◽

Gemma Roig ◽

Dorien Herremans

Keyword(s):

Neural Networks ◽

Space Representation ◽

Detection And Identification ◽

Latent Space ◽

Speaker Detection

Conference system with automatic speaker detection and speaker unit

The Journal of the Acoustical Society of America ◽

10.1121/1.413305 ◽

1995 ◽

Vol 98 (5) ◽

pp. 2400-2400

Author(s):

Cornelis P. Janse ◽

Johannes M. Meijer

Keyword(s):

Speaker Detection ◽

Conference System

Supplementary Material: AVA-ActiveSpeaker: An Audio-Visual Dataset for Active Speaker Detection

2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW) ◽

10.1109/iccvw.2019.00460 ◽

2019

Author(s):

Joseph Roth ◽

Sourish Chaudhuri ◽

Ondrej Klejch ◽

Radhika Marvin ◽

Andrew Gallagher ◽

...

Keyword(s):

Supplementary Material ◽

Speaker Detection
