speaker information
Recently Published Documents

TOTAL DOCUMENTS: 37 (five years: 6)
H-INDEX: 4 (five years: 1)

2021 · Vol 11 (18) · pp. 8521
Author(s): Ignacio Viñals, Alfonso Ortega, Antonio Miguel, Eduardo Lleida

The demand for high-quality metadata for the available multimedia content requires the development of new techniques able to correctly extract more and more information, including the speaker information. The task known as speaker attribution aims to identify all or part of the speakers in the audio under analysis. In this work, we carry out a study of the speaker attribution problem in the broadcast domain. Through our experiments, we illustrate the positive impact of diarization on the final performance. Additionally, we show the influence of the variability present in broadcast data, depicting the broadcast domain as a collection of subdomains with particular characteristics. Taking these two factors into account, we also propose alternative approaches that are robust against domain mismatch. These include a semisupervised alternative as well as a fully unsupervised new hybrid solution fusing diarization and speaker assignment. Thanks to these two approaches, performance improves by around 50% relative. The analysis has been carried out using the corpus for the Albayzín 2020 challenge, a diarization and speaker attribution evaluation working with broadcast data. These data, provided by Radio Televisión Española (RTVE), the Spanish public Radio and TV Corporation, include multiple shows and genres to analyze the impact of new speech technologies in real-world scenarios.
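The pipeline the abstract describes, diarization followed by speaker assignment, can be sketched in miniature. This is a hypothetical illustration, not the authors' system: the greedy clustering, the cosine thresholds, and the toy embeddings are all invented for the example.

```python
# Hypothetical sketch of diarization + speaker assignment:
# 1) cluster segment embeddings (greedy agglomerative merging),
# 2) label each cluster with the closest enrolled speaker.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def diarize(segments, threshold=0.8):
    """Merge each segment embedding into the most similar existing
    cluster if similarity exceeds the threshold, else open a new cluster."""
    clusters = []  # each cluster is a list of segment embeddings
    for emb in segments:
        best, best_sim = None, threshold
        for c in clusters:
            centroid = [sum(col) / len(c) for col in zip(*c)]
            sim = cosine(emb, centroid)
            if sim > best_sim:
                best, best_sim = c, sim
        if best is None:
            clusters.append([emb])
        else:
            best.append(emb)
    return clusters

def assign(clusters, enrolled, min_sim=0.5):
    """Label each cluster with the nearest enrolled speaker, or 'unknown'
    when no enrolled model is close enough (the unsupervised case)."""
    labels = []
    for c in clusters:
        centroid = [sum(col) / len(c) for col in zip(*c)]
        scored = [(name, cosine(centroid, emb)) for name, emb in enrolled.items()]
        name, sim = max(scored, key=lambda t: t[1])
        labels.append(name if sim >= min_sim else "unknown")
    return labels
```

Running diarization first, as the abstract notes, gives speaker assignment cleaner per-cluster centroids to score instead of noisy per-segment embeddings.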


2021
Author(s): Benjamin van Niekerk, Leanne Nortje, Matthew Baas, Herman Kamper

Sensors · 2021 · Vol 21 (11) · pp. 3765
Author(s): Juan Manuel Espín López, Alberto Huertas Celdrán, Javier G. Marín-Blázquez, Francisco Esquembre, Gregorio Martínez Pérez

Continuous authentication systems have been proposed as a promising solution to authenticate users on smartphones in a non-intrusive way. However, current systems have important weaknesses related to the amount of data or time needed to build precise user profiles, together with high rates of false alerts. Voice is a powerful dimension for identifying subjects, but its suitability and importance have not been deeply analyzed regarding its inclusion in continuous authentication systems. This work presents the S3 platform, an artificial intelligence-enabled continuous authentication system that combines data from sensors, application usage statistics, and voice to authenticate users on smartphones. Experiments have tested the relevance of each kind of data, explored different strategies to combine them, and determined how many days of training are needed to obtain sufficiently accurate profiles. Results showed that voice is much more relevant than sensors and application statistics when building a precise authentication system, and that combining individual models was the best strategy. Finally, the S3 platform reached good performance with only five days of use available for training the users' profiles. As an additional contribution, a dataset with 21 volunteers interacting freely with their smartphones for more than sixty days has been created and made available to the community.
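The winning strategy reported here, combining individual per-modality models, is commonly implemented as late score fusion. A minimal sketch, assuming each modality model emits a match score in [0, 1]; the weights and acceptance threshold below are illustrative, not values from the paper:

```python
# Hypothetical sketch: late fusion of per-modality authentication scores
# (voice, sensors, app-usage statistics) into one accept/reject decision.

def fuse_scores(scores, weights):
    """Weighted average of per-modality match scores in [0, 1]."""
    total = sum(weights.values())
    return sum(scores[m] * w for m, w in weights.items()) / total

def authenticate(scores, weights, threshold=0.6):
    """Accept the current user if the fused score clears the threshold."""
    return fuse_scores(scores, weights) >= threshold
```

Weighting voice above the other modalities is consistent with the abstract's finding that voice is the most relevant signal for a precise profile.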


2020 · Vol 1 (1)
Author(s): Leon O H Kroczek, Thomas C Gunter

Abstract: Effective natural communication requires listeners to incorporate not only general linguistic principles acquired over a lifetime but also other information, such as the specific individual language use of a particular interlocutor. Traditionally, research has focused on the general linguistic rules, and brain science has shown a left-hemispheric fronto-temporal brain network related to this processing. The present fMRI research explores speaker-specific individual language use, because it is unknown whether this processing is supported by similar or distinct neural structures. Twenty-eight participants listened to sentences from speakers who used easier or more difficult language. This was done by manipulating the proportion of easy SOV vs. complex OSV sentences for each speaker. Furthermore, ambiguous probe sentences were included to test top-down influences of speaker information in the absence of syntactic structure information. We observed distinct neural processing for syntactic complexity and speaker-specific language use. Syntactic complexity correlated with left frontal and posterior temporal regions. Speaker-specific processing correlated with bilateral (right-dominant) fronto-parietal brain regions. Finally, the top-down influence of speaker information was found in frontal and striatal brain regions, suggesting a mechanism for controlled syntactic processing. These findings show distinct neural networks related to general language principles as well as speaker-specific individual language use.


2018 · Vol 2018 · pp. 1-9
Author(s): Hajime Murai

To analyse the characteristics of utterances in Japanese novels, several attributes (e.g., the speaker, listener, relationship between the speaker and listener, and gender of the speaker) were added to a randomly extracted Japanese novel corpus. A total of 887 data sets, with 5632 annotated utterances, were prepared. Based on the attribute-annotated utterance corpus, the characteristics of utterance styles were extracted quantitatively. A chi-square test was applied to particles and auxiliary verbs to extract utterance characteristics reflecting the genders of, and relationships between, the speakers and listeners. Results revealed that the use of imperative words was higher among male characters than among their female counterparts, who used more particle verbs, and that auxiliaries of politeness were used more frequently towards 'coworkers' and 'superior authorities'. In addition, utterances varied between close and intimate relationships between the speaker and listener. Moreover, repeated factor analyses of 7576 data sets in the BCCWJ speaker-information corpus revealed ten typical utterance styles (neutral, frank, dialect, polite, feminine, crude, aged, interrogative, approval, and dandy). The factor scores indicated relationships between various utterance styles and fundamental attributes of speakers. Thus, the results of this study would be utilisable in speaker identification tasks, automatic speech generation tasks, and the scientific interpretation of stories and characters.
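The chi-square test used here checks whether a linguistic feature's frequency is independent of a speaker attribute. A minimal sketch of the statistic on a contingency table; the counts below are made up for illustration (e.g., imperative vs. non-imperative utterances by speaker gender), not figures from the study:

```python
# Hypothetical sketch: Pearson chi-square statistic of independence for an
# r x c contingency table of utterance counts.

def chi_square(table):
    """Sum of (observed - expected)^2 / expected over all cells, where the
    expected count is (row total * column total) / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand
            stat += (observed - expected) ** 2 / expected
    return stat
```

For a 2x2 table (one degree of freedom), a statistic above the critical value 3.84 indicates dependence at the 0.05 level, which is how a gender-linked difference in imperative use would register.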


2018 · Vol 7 (2.7) · pp. 594
Author(s): Kasiprasad Mannepalli, Suman Maloji, Panyam Narahari Sastry, Swetha Danthala, Durgaprasad Mannepalli

Human speech conveys different types of information about the speaker and the message. From the production side, the speech signal carries linguistic information, such as the meaningful message and the language, as well as the speaker's emotional state, geographical origin, and physiological characteristics. This paper focuses on automatically identifying the emotion of a speaker given a sample of speech. The speech signals considered in this work were collected from Telugu speakers. The features used include pitch, pitch-related prosodic features, energy, and formants. The overall recognition accuracy obtained in this work is 72%.
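A feature-based emotion recognizer of this kind reduces to extracting frame-level acoustics and comparing a feature vector against per-emotion references. The sketch below is purely illustrative: the abstract does not specify the classifier, so the short-time energy computation and the nearest-centroid rule, along with the toy centroids, are assumptions for the example.

```python
# Hypothetical sketch: short-time energy extraction plus a nearest-centroid
# classifier over (pitch, energy, ...) feature vectors.
import math

def frame_energy(samples, frame_len):
    """Mean squared amplitude per non-overlapping frame."""
    return [sum(s * s for s in samples[i:i + frame_len]) / frame_len
            for i in range(0, len(samples) - frame_len + 1, frame_len)]

def classify(features, centroids):
    """Return the emotion whose centroid is nearest in Euclidean distance."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return min(centroids, key=lambda emo: dist(features, centroids[emo]))
```

In practice, pitch and formant tracks would come from a signal-processing front end; only the decision rule is shown here.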

