Acoustic Similarity
Recently Published Documents

TOTAL DOCUMENTS: 135 (five years: 26)
H-INDEX: 21 (five years: 2)

Sensors ◽ 2021 ◽ Vol 21 (24) ◽ pp. 8313
Author(s): Łukasz Lepak, Kacper Radzikowski, Robert Nowak, Karol J. Piczak

Models for keyword spotting in continuous recordings can significantly improve the experience of navigating vast libraries of audio recordings. In this paper, we describe the development of such a keyword spotting system for detecting regions of interest in Polish call centre conversations. Unfortunately, in spite of recent advancements in automatic speech recognition systems, the human-level transcription accuracy reported on English benchmarks does not reflect the performance achievable in low-resource languages such as Polish. Therefore, in this work, we shift our focus from complete speech-to-text conversion to acoustic similarity matching, in the hope of reducing the demand for data annotation. As our primary approach, we evaluate Siamese and prototypical neural networks trained on several datasets of English and Polish recordings. While we obtain usable results in English, our models’ performance remains unsatisfactory when applied to Polish speech, after both mono- and cross-lingual training. This performance gap shows that generalisation with limited training resources is a significant obstacle for actual deployments in low-resource languages. As a potential countermeasure, we implement a detector using audio embeddings generated with a generic pre-trained model provided by Google. It has a much more favourable profile when applied in a cross-lingual setup to detect Polish audio patterns. Nevertheless, despite these promising results, its performance on out-of-distribution data is still far from stellar. This indicates that, in spite of the richness of the internal representations created by more generic models, such speech embeddings are not readily amenable to cross-language transfer.
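For illustration, a minimal sketch of the embedding-based detector described above: the keyword template and sliding windows of the recording are embedded with a generic pre-trained model, and a window is flagged when its cosine similarity to the template exceeds a threshold. The abstract does not name the Google model; YAMNet is assumed here purely as a plausible stand-in, and the window length and threshold are illustrative choices.

```python
# Sketch of keyword detection by acoustic similarity matching over embeddings.
# YAMNet is an assumption (the paper only says "a generic pre-trained model
# provided by Google"); window length and threshold are illustrative.
import numpy as np
import tensorflow_hub as hub

yamnet = hub.load("https://tfhub.dev/google/yamnet/1")  # expects 16 kHz mono float32

def embed(waveform: np.ndarray) -> np.ndarray:
    """Mean-pool YAMNet's per-patch embeddings into one 1024-dim vector."""
    _scores, embeddings, _spectrogram = yamnet(waveform.astype(np.float32))
    return embeddings.numpy().mean(axis=0)

def detect_keyword(recording, template, sr=16000, win_s=1.0, hop_s=0.25, thr=0.8):
    """Return (start_s, end_s, similarity) for windows of `recording` whose
    embedding has cosine similarity >= `thr` with the keyword `template`."""
    ref = embed(template)
    ref = ref / np.linalg.norm(ref)
    win, hop = int(win_s * sr), int(hop_s * sr)
    hits = []
    for start in range(0, max(1, len(recording) - win + 1), hop):
        e = embed(recording[start:start + win])
        sim = float(e @ ref / (np.linalg.norm(e) + 1e-9))
        if sim >= thr:
            hits.append((start / sr, (start + win) / sr, sim))
    return hits
```

Mean-pooling the patch embeddings gives a fixed-length vector per window; the Siamese and prototypical networks evaluated in the paper would instead learn the embedding function directly from labelled keyword pairs.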


2021
Author(s): Francesca Ronchini, Romain Serizel, Nicolas Turpault, Samuele Cornell

The Detection and Classification of Acoustic Scenes and Events (DCASE) 2021 Challenge Task 4 uses a heterogeneous dataset that includes both recorded and synthetic soundscapes. Until recently, only target sound events were considered when synthesizing the soundscapes. However, recorded soundscapes often contain a substantial amount of non-target events that may affect performance. In this paper, we focus on the impact of these non-target events in the synthetic soundscapes. Firstly, we investigate to what extent using non-target events during only the training phase, only the validation phase, or neither helps the system correctly detect target events. Secondly, we analyze to what extent adjusting the signal-to-noise ratio between target and non-target events during training improves sound event detection performance. The results show that using both target and non-target events for only one of the phases (validation or training) helps the system properly detect sound events, outperforming the baseline (which uses non-target events in both phases). The paper also reports the results of a preliminary study on evaluating the system on clips that contain only non-target events. This opens questions for future work on the non-target subset and on acoustic similarity between target and non-target events, which might confuse the system.
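To make the SNR adjustment concrete, below is a minimal sketch of mixing a non-target event into a soundscape at a chosen target-to-non-target SNR. The function and parameter names are illustrative, not taken from the challenge's synthesis tooling.

```python
import numpy as np

def mix_at_snr(target: np.ndarray, non_target: np.ndarray, snr_db: float) -> np.ndarray:
    """Mix `non_target` into `target` after scaling it so that the
    target-to-non-target power ratio equals `snr_db`. Both signals are
    assumed mono, equal length, and at the same sample rate."""
    p_target = float(np.mean(target ** 2))
    p_non_target = float(np.mean(non_target ** 2)) + 1e-12
    # Want: 10 * log10(p_target / (scale**2 * p_non_target)) == snr_db
    scale = np.sqrt(p_target / (p_non_target * 10.0 ** (snr_db / 10.0)))
    return target + scale * non_target
```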


2021 ◽ Vol 176 ◽ pp. 77-86
Author(s): Ian P. Thomas, Stéphanie M. Doucet, D. Ryan Norris, Amy E.M. Newman, Heather Williams, ...

2021 ◽ Vol 175 ◽ pp. 111-121
Author(s): Yingtong Wu, Anna L. Petrosky, Nicolas A. Hazzi, Rebecca Lynn Woodward, Luis Sandoval

2021 ◽ pp. 026765832110089
Author(s): Daniel J Olson

Featural approaches to second language phonetic acquisition posit that the development of new phonetic norms relies on sub-phonemic features, expressed through a constellation of articulatory gestures and their corresponding acoustic cues, which may be shared across multiple phonemes. Within featural approaches, largely supported by research in speech perception, debate remains as to the fundamental scope or ‘size’ of featural units. The current study examines potential featural relationships between voiceless and voiced stop consonants, as expressed through the voice onset time (VOT) cue. Native English-speaking learners of Spanish received targeted training on Spanish voiceless stop consonant production through a visual feedback paradigm. Analysis focused on the change in VOT, for both voiceless (i.e. trained) and voiced (i.e. non-trained) phonemes, across the pretest, posttest, and delayed posttest. The results demonstrated a significant improvement (i.e. reduction) in VOT for the voiceless stops, which were subject to the training paradigm. In contrast, there was no significant change in the non-trained voiced stop consonants. These results suggest a limited featural relationship, with independent VOT cues for voiceless and voiced phonemes. Possible underlying mechanisms that limit feature generalization in second language (L2) phonetic production, including gestural considerations and acoustic similarity, are discussed.
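As a rough illustration of the cue in question, the sketch below estimates VOT from a waveform as the interval between the stop burst (a sharp energy rise) and voicing onset (high energy combined with a low zero-crossing rate). This heuristic is purely illustrative, with made-up thresholds; a study like this one would rely on standard acoustic annotation rather than such an automatic measure.

```python
import numpy as np

def estimate_vot(x: np.ndarray, sr: int, win_s=0.010, hop_s=0.002):
    """Crude VOT estimate: seconds from the stop burst (first sharp energy
    rise) to voicing onset (first later frame combining substantial energy
    with a low zero-crossing rate, a rough signature of periodic voicing).
    All thresholds here are illustrative assumptions."""
    win, hop = int(win_s * sr), int(hop_s * sr)
    starts = range(0, len(x) - win, hop)
    energy = np.array([np.mean(x[s:s + win] ** 2) for s in starts])
    zcr = np.array([np.mean(np.abs(np.diff(np.sign(x[s:s + win])))) / 2.0
                    for s in starts])
    burst = int(np.argmax(energy > 0.1 * energy.max()))   # first energy spike
    voiced = np.where((energy > 0.3 * energy.max()) & (zcr < 0.1))[0]
    voiced = voiced[voiced > burst]                        # voicing after burst
    return float((voiced[0] - burst) * hop_s) if voiced.size else None
```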


2021 ◽ Vol 1 (4) ◽ pp. 040802
Author(s): David J. Forman, Tracianne B. Neilsen, David F. Van Komen, David P. Knobles

Languages ◽ 2021 ◽ Vol 6 (1) ◽ pp. 44
Author(s): Jaydene Elvin, Daniel Williams, Jason A. Shaw, Catherine T. Best, Paola Escudero

This study tests whether Australian English (AusE) and European Spanish (ES) listeners differ in their categorisation and discrimination of Brazilian Portuguese (BP) vowels. In particular, we investigate two theoretically relevant measures of vowel category overlap (acoustic vs. perceptual categorisation) as predictors of non-native discrimination difficulty. We also investigate whether the individual listener’s own native vowel productions predict non-native vowel perception better than group averages. The results showed comparable performance for AusE and ES participants in their perception of the BP vowels. In particular, discrimination patterns were largely dependent on contrast-specific learning scenarios, which were similar across AusE and ES. We also found that the acoustic similarity between individuals’ own native productions and the BP stimuli was largely consistent with the participants’ patterns of non-native categorisation. Furthermore, the results indicated that both acoustic and perceptual overlap successfully predict discrimination performance. However, accuracy in discrimination was better explained by perceptual similarity for ES listeners and by acoustic similarity for AusE listeners. Interestingly, we also found that for ES listeners, group averages explained discrimination accuracy better than predictions based on individual production data, whereas the AusE group showed no such difference.
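As an illustration of an acoustic-categorisation measure of this kind, the sketch below assigns a non-native stimulus to the acoustically closest native vowel category using Euclidean distance in a z-scored F1/F2 space. The metric and features are assumptions; the study may well have used additional cues such as duration or F3.

```python
import numpy as np

def acoustic_categorisation(native_tokens: dict, stimulus) -> str:
    """Assign a non-native vowel `stimulus` (F1, F2 in Hz) to the
    acoustically closest native category, measured as Euclidean distance
    to each category's mean in a z-scored formant space.

    native_tokens maps vowel label -> (n, 2) array of (F1, F2) values
    measured from the listener's own productions."""
    pooled = np.vstack(list(native_tokens.values()))
    mu, sd = pooled.mean(axis=0), pooled.std(axis=0)
    z = lambda v: (np.asarray(v, dtype=float) - mu) / sd
    dist = {label: float(np.linalg.norm(z(stimulus) - z(tokens).mean(axis=0)))
            for label, tokens in native_tokens.items()}
    return min(dist, key=dist.get)

# Hypothetical usage with two native categories (formant values invented):
# tokens = {"i": np.array([[300, 2300], [310, 2250]]),
#           "e": np.array([[450, 2000], [470, 1950]])}
# acoustic_categorisation(tokens, (430, 2050))  # -> "e"
```

Applied per listener, this yields the individual-production-based predictions that the abstract compares against group-average predictions.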

