speaker information
Recently Published Documents

TOTAL DOCUMENTS: 37 (five years: 6)
H-INDEX: 4 (five years: 1)

2021 · Vol 11 (18) · pp. 8521
Author(s): Ignacio Viñals, Alfonso Ortega, Antonio Miguel, Eduardo Lleida

The demand for high-quality metadata for the available multimedia content requires the development of new techniques able to correctly extract more and more information, including the speaker information. The task known as speaker attribution aims to identify all or part of the speakers in the audio under analysis. In this work, we carry out a study of the speaker attribution problem in the broadcast domain. Through our experiments, we illustrate the positive impact of diarization on the final performance. Additionally, we show the influence of the variability present in broadcast data, depicting the broadcast domain as a collection of subdomains with particular characteristics. Taking these two factors into account, we also propose alternative approaches that are robust against domain mismatch. These include a semisupervised alternative as well as a fully unsupervised new hybrid solution fusing diarization and speaker assignment. Thanks to these two approaches, performance improves by around 50% relative. The analysis has been carried out using the corpus for the Albayzín 2020 challenge, a diarization and speaker attribution evaluation working with broadcast data. These data, provided by Radio Televisión Española (RTVE), the Spanish public Radio and TV Corporation, include multiple shows and genres to analyze the impact of new speech technologies in real-world scenarios.
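The pipeline the abstract describes, diarization followed by speaker assignment, can be sketched in miniature. This is a hypothetical illustration, not the authors' system: the greedy clustering, the cosine thresholds, and the toy embeddings are all invented for the example.

```python
# Hypothetical sketch of diarization + speaker assignment:
# 1) cluster segment embeddings (greedy agglomerative merging),
# 2) label each cluster with the closest enrolled speaker.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def diarize(segments, threshold=0.8):
    """Merge each segment embedding into the most similar existing
    cluster if similarity exceeds the threshold, else open a new cluster."""
    clusters = []  # each cluster is a list of segment embeddings
    for emb in segments:
        best, best_sim = None, threshold
        for c in clusters:
            centroid = [sum(col) / len(c) for col in zip(*c)]
            sim = cosine(emb, centroid)
            if sim > best_sim:
                best, best_sim = c, sim
        if best is None:
            clusters.append([emb])
        else:
            best.append(emb)
    return clusters

def assign(clusters, enrolled, min_sim=0.5):
    """Label each cluster with the nearest enrolled speaker, or 'unknown'
    when no enrolled model is close enough (the unsupervised case)."""
    labels = []
    for c in clusters:
        centroid = [sum(col) / len(c) for col in zip(*c)]
        scored = [(name, cosine(centroid, emb)) for name, emb in enrolled.items()]
        name, sim = max(scored, key=lambda t: t[1])
        labels.append(name if sim >= min_sim else "unknown")
    return labels
```

Running diarization first, as the abstract notes, gives speaker assignment cleaner per-cluster centroids to score instead of noisy per-segment embeddings.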


2021
Author(s): Benjamin van Niekerk, Leanne Nortje, Matthew Baas, Herman Kamper

Sensors · 2021 · Vol 21 (11) · pp. 3765
Author(s): Juan Manuel Espín López, Alberto Huertas Celdrán, Javier G. Marín-Blázquez, Francisco Esquembre, Gregorio Martínez Pérez

Continuous authentication systems have been proposed as a promising solution to authenticate users on smartphones in a non-intrusive way. However, current systems have important weaknesses related to the amount of data or time needed to build precise user profiles, together with high rates of false alerts. Voice is a powerful dimension for identifying subjects, but its suitability and importance have not been deeply analyzed regarding its inclusion in continuous authentication systems. This work presents the S3 platform, an artificial intelligence-enabled continuous authentication system that combines data from sensors, application usage statistics, and voice to authenticate users on smartphones. Experiments have tested the relevance of each kind of data, explored different strategies to combine them, and determined how many days of training are needed to obtain sufficiently accurate profiles. Results showed that voice is much more relevant than sensors and application statistics when building a precise authentication system, and that combining individual models was the best strategy. Finally, the S3 platform reached good performance with only five days of use available for training the users' profiles. As an additional contribution, a dataset with 21 volunteers interacting freely with their smartphones for more than sixty days has been created and made available to the community.
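The winning strategy reported here, combining individual per-modality models, is commonly implemented as late score fusion. A minimal sketch, assuming each modality model emits a match score in [0, 1]; the weights and acceptance threshold below are illustrative, not values from the paper:

```python
# Hypothetical sketch: late fusion of per-modality authentication scores
# (voice, sensors, app-usage statistics) into one accept/reject decision.

def fuse_scores(scores, weights):
    """Weighted average of per-modality match scores in [0, 1]."""
    total = sum(weights.values())
    return sum(scores[m] * w for m, w in weights.items()) / total

def authenticate(scores, weights, threshold=0.6):
    """Accept the current user if the fused score clears the threshold."""
    return fuse_scores(scores, weights) >= threshold
```

Weighting voice above the other modalities is consistent with the abstract's finding that voice is the most relevant signal for a precise profile.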


2020 · Vol 1 (1)
Author(s): Leon O H Kroczek, Thomas C Gunter

Abstract: Effective natural communication requires listeners to incorporate not only general linguistic principles acquired over a lifetime but also other information, such as the specific individual language use of a particular interlocutor. Traditionally, research has focused on the general linguistic rules, and brain science has shown a left-hemispheric fronto-temporal brain network related to this processing. The present fMRI research explores speaker-specific individual language use, because it is unknown whether this processing is supported by similar or distinct neural structures. Twenty-eight participants listened to sentences from speakers who used easier or more difficult language. This was done by manipulating the proportion of easy SOV vs. complex OSV sentences for each speaker. Furthermore, ambiguous probe sentences were included to test top-down influences of speaker information in the absence of syntactic structure information. We observed distinct neural processing for syntactic complexity and speaker-specific language use. Syntactic complexity correlated with left frontal and posterior temporal regions. Speaker-specific processing correlated with bilateral (right-dominant) fronto-parietal brain regions. Finally, the top-down influence of speaker information was found in frontal and striatal brain regions, suggesting a mechanism for controlled syntactic processing. These findings show distinct neural networks related to general language principles as well as speaker-specific individual language use.


2018 · Vol 2018 · pp. 1-9
Author(s): Hajime Murai

To analyse the characteristics of utterances in Japanese novels, several attributes (e.g., the speaker, listener, relationship between the speaker and listener, and gender of the speaker) were added to a randomly extracted Japanese novel corpus. A total of 887 data sets, with 5632 annotated utterances, were prepared. Based on the attribute-annotated utterance corpus, the characteristics of utterance styles were extracted quantitatively. A chi-square test was applied to particles and auxiliary verbs to extract utterance characteristics reflecting the genders of, and relationships between, the speakers and listeners. Results revealed that the use of imperative words was higher among male characters than among their female counterparts, who used more particle verbs, and that auxiliaries of politeness were used more frequently towards 'coworkers' and 'superior authorities'. In addition, utterances varied between close and intimate relationships between the speaker and listener. Moreover, repeated factor analyses of 7576 data sets in the BCCWJ speaker-information corpus revealed ten typical utterance styles (neutral, frank, dialect, polite, feminine, crude, aged, interrogative, approval, and dandy). The factor scores indicated relationships between various utterance styles and fundamental attributes of speakers. Thus, the results of this study would be utilisable in speaker identification tasks, automatic speech generation tasks, and the scientific interpretation of stories and characters.
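The chi-square test used here checks whether a linguistic feature's frequency is independent of a speaker attribute. A minimal sketch of the statistic on a contingency table; the counts below are made up for illustration (e.g., imperative vs. non-imperative utterances by speaker gender), not figures from the study:

```python
# Hypothetical sketch: Pearson chi-square statistic of independence for an
# r x c contingency table of utterance counts.

def chi_square(table):
    """Sum of (observed - expected)^2 / expected over all cells, where the
    expected count is (row total * column total) / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand
            stat += (observed - expected) ** 2 / expected
    return stat
```

For a 2x2 table (one degree of freedom), a statistic above the critical value 3.84 indicates dependence at the 0.05 level, which is how a gender-linked difference in imperative use would register.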


2018 · Vol 7 (2.7) · pp. 594
Author(s): Kasiprasad Mannepalli, Suman Maloji, Panyam Narahari Sastry, Swetha Danthala, Durgaprasad Mannepalli

Human speech conveys different types of information about the speaker and the message. From the production side, the speech signal carries linguistic information, such as the meaningful message and the language, as well as the speaker's emotional state, geographical origin, and physiological characteristics. This paper focuses on automatically identifying the emotion of a speaker given a sample of speech. The speech signals considered in this work were collected from Telugu speakers. The features used include pitch, pitch-related prosodic features, energy, and formants. The overall recognition accuracy obtained in this work is 72%.
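A feature-based emotion recognizer of this kind reduces to extracting frame-level acoustics and comparing a feature vector against per-emotion references. The sketch below is purely illustrative: the abstract does not specify the classifier, so the short-time energy computation and the nearest-centroid rule, along with the toy centroids, are assumptions for the example.

```python
# Hypothetical sketch: short-time energy extraction plus a nearest-centroid
# classifier over (pitch, energy, ...) feature vectors.
import math

def frame_energy(samples, frame_len):
    """Mean squared amplitude per non-overlapping frame."""
    return [sum(s * s for s in samples[i:i + frame_len]) / frame_len
            for i in range(0, len(samples) - frame_len + 1, frame_len)]

def classify(features, centroids):
    """Return the emotion whose centroid is nearest in Euclidean distance."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return min(centroids, key=lambda emo: dist(features, centroids[emo]))
```

In practice, pitch and formant tracks would come from a signal-processing front end; only the decision rule is shown here.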

