Towards a Speaker Diarization System for the CHiME 2020 Dinner Party Transcription

2020
Author(s): Christoph Boeddeker, Tobias Cord-Landwehr, Jens Heitkaemper, Cătălin Zorilă, Reinhold Haeb-Umbach
Author(s): Beatriz Martínez-González, José M. Pardo, José A. Vallejo-Pinto, Rubén San-Segundo, Javier Ferreiros

Abstract: There has been little work in the literature on the speaker diarization of meetings with multiple distant microphones since the 2012 publications related to the last National Institute of Standards and Technology (NIST) Rich Transcription Evaluation Campaign, held in 2009 (RT09). More recently, the Second DIHARD Challenge Evaluation has also covered diarization of dinner party meetings recorded with multiple distant microphones. Dinner party meetings are somewhat harder than office meetings because the participants can move freely around the room. In this paper, we study some of the speaker diarization algorithms for meetings with multiple distant microphones used in the NIST Rich Transcription Evaluation Campaign in 2007 (RT07) and in RT09, and provide clear improvements. On the one hand, little or no attention has been paid to the problem of penalizing or favoring transitions between speakers, other than proposing a minimum duration of a speaker turn or calculating the speakers' probabilities using Variational Bayes (VB). We have studied this issue and determined that a transition penalty term is needed, and that it should be independent of both the number of active speakers and the minimum duration of speaker turns. On the other hand, a method to automatically select the right number of parameters is crucial for developing good speaker models. Previous studies have proposed selecting the number of parameters dynamically, based on the duration of each speaker's speech, with mixed performance when tested on meetings with a single distant microphone or with multiple distant microphones.
In this paper, we propose a new method that uses the duration of a speaker's speech to determine a minimum number of parameters and the risk of overfitting to determine a maximum number, while also taking computation time into account in order to reduce it. We have carried out experiments to support our findings, and we have been able to improve on our baseline speaker error rate for meetings with multiple distant microphones. Both methods improve on the baseline. The first method obtains a 21.6% relative decrease in speaker error on the development set and a 4.6% relative decrease on the test set (RT09). The second method obtains a 46.47% relative decrease in speaker error on the development set and a 17.54% relative decrease on the test set. The two methods complement each other: applied in combination, they yield a 47.2% relative decrease in speaker error on the development set and a 22.02% relative decrease on the test set. With these simple modifications, the performance of our proposal is outstanding on some subsets of the development set, such as NIST RT07, and among the best for RT09. Furthermore, our algorithm reduces computation time without jeopardizing performance. Results on a different publicly available database, Augmented Multiparty Interaction (AMI), show a 28.44% relative decrease in speaker error, confirming the validity of our methods. Preliminary experiments with a single feature stream (MFCC) support our findings. In comparisons with an x-vector system, our system delivers superior performance on unseen test data.
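The transition penalty mentioned in the abstract can be illustrated with a small decoding sketch. This is not the authors' implementation: the function name `viterbi_with_switch_penalty`, the `switch_penalty` parameter, and the toy log-likelihoods are all hypothetical, and the sketch only assumes that a fixed cost is subtracted whenever the decoded speaker changes, regardless of how many speakers there are.

```python
def viterbi_with_switch_penalty(log_likes, switch_penalty):
    """Return the best speaker index per frame.

    log_likes[t][s] is the (hypothetical) log-likelihood of frame t under
    speaker s. switch_penalty is a non-negative constant subtracted from the
    path score each time the speaker changes; it does not depend on the
    number of speakers.
    """
    n_speakers = len(log_likes[0])
    score = list(log_likes[0])  # best path score ending in each speaker
    backptr = []                # backptr[t-1][s]: previous speaker on best path

    for frame in log_likes[1:]:
        # With a constant penalty, the best speaker to switch *from* is simply
        # the one with the highest score so far.
        best_prev = max(range(n_speakers), key=lambda s: score[s])
        new_score, ptr = [], []
        for s in range(n_speakers):
            stay = score[s]
            switch = score[best_prev] - switch_penalty
            if stay >= switch:
                new_score.append(stay + frame[s])
                ptr.append(s)
            else:
                new_score.append(switch + frame[s])
                ptr.append(best_prev)
        score = new_score
        backptr.append(ptr)

    # Trace back from the best final speaker.
    state = max(range(n_speakers), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(backptr):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path


# Toy example: frame 2 weakly favors speaker 1, all other frames favor speaker 0.
toy = [[0, -5], [0, -5], [-3, 0], [0, -5], [0, -5]]
no_penalty = viterbi_with_switch_penalty(toy, 0.0)     # follows every frame
with_penalty = viterbi_with_switch_penalty(toy, 10.0)  # suppresses the brief switch
```

In the toy example, a zero penalty lets the decoder jump to speaker 1 for a single frame, while a larger penalty keeps the whole segment with speaker 0. The abstract's point is that such a penalty needs to be tuned independently of the number of active speakers and of any minimum-duration constraint.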


2020
Author(s): Ivan Medennikov, Maxim Korenevsky, Tatiana Prisyach, Yuri Khokhlov, Mariya Korenevskaya, ...

Author(s): Michael Harris

What do pure mathematicians do, and why do they do it? Looking beyond the conventional answers, this book offers an eclectic panorama of the lives and values and hopes and fears of mathematicians in the twenty-first century, assembling material from a startlingly diverse assortment of scholarly, journalistic, and pop culture sources. Drawing on the author's personal experiences as well as the thoughts and opinions of mathematicians from Archimedes and Omar Khayyám to such contemporary giants as Alexander Grothendieck and Robert Langlands, the book reveals the charisma and romance of mathematics as well as its darker side. In this portrait of mathematics as a community united around a set of common intellectual, ethical, and existential challenges, the book touches on a wide variety of questions, such as: Are mathematicians to blame for the 2008 financial crisis? How can we talk about the ideas we were born too soon to understand? And how should you react if you are asked to explain number theory at a dinner party? The book takes readers on an unapologetic guided tour of the mathematical life, from the philosophy and sociology of mathematics to its reflections in film and popular music, with detours through the mathematical and mystical traditions of Russia, India, medieval Islam, the Bronx, and beyond.


Author(s): Edward L. Campbell, Gabriel Hernandez, José R. Calvo de Lara

Author(s): Alicia Lozano-Diez, Beltran Labrador, Diego de Benito, Pablo Ramirez, Doroteo T. Toledano

Author(s): Xianhong Chen, Liang He, Can Xu, Yi Liu, Tianyu Liang, ...
