HMM topology for boundary refinement in automatic speech segmentation

Abstract. The Rapid Serial Visual Presentation procedure is a method widely used in visual perception research. In this paper we propose an adaptation of this method which can be used with auditory material and enables assessment of statistical learning in speech segmentation. Adult participants were exposed to an artificial speech stream composed of statistically defined trisyllabic nonsense words. They were subsequently instructed to perform a detection task in a Rapid Serial Auditory Presentation (RSAP) stream in which they had to detect a syllable in a short speech stream. Results showed that reaction times varied as a function of the statistical predictability of the syllable: second and third syllables of each word were responded to faster than first syllables. This result suggests that the RSAP procedure provides a reliable and sensitive indirect measure of auditory statistical learning.

Bridging the gap between speech segmentation and word-to-world mappings: Evidence from an audiovisual statistical learning task

PsycEXTRA Dataset ◽

10.1037/e512592013-037 ◽

2011 ◽

Author(s):

T. Cunillera

Keyword(s):

Statistical Learning ◽

Learning Task ◽

Speech Segmentation

Visual speech segmentation: Using facial cues to locate word boundaries in continuous speech

PsycEXTRA Dataset ◽

10.1037/e520592012-436 ◽

2010 ◽

Author(s):

Aaron D. Mitchel ◽

Daniel J. Weiss

Keyword(s):

Visual Speech ◽

Speech Segmentation ◽

Continuous Speech ◽

Facial Cues ◽

Word Boundaries

Trochaic rhythm in speech segmentation

PsycEXTRA Dataset ◽

10.1037/e536982012-705 ◽

1997 ◽

Cited By ~ 2

Author(s):

Jean Vroomen ◽

Beatrice de Gelder

Keyword(s):

Speech Segmentation

Utility-based evaluation metrics for models of language acquisition: A look at speech segmentation

10.3115/v1/w15-1108 ◽

2015 ◽

Cited By ~ 1

Author(s):

Lawrence Phillips ◽

Lisa Pearl

Keyword(s):

Language Acquisition ◽

Speech Segmentation ◽

Evaluation Metrics

Statistical Speech Segmentation in Tone Languages: The Role of Lexical Tones

Language and Speech ◽

10.1177/0023830917706529 ◽

2017 ◽

Vol 61 (1) ◽

pp. 84-96 ◽

Cited By ~ 7

Author(s):

David M. Gómez ◽

Peggy Mok ◽

Mikhail Ordin ◽

Jacques Mehler ◽

Marina Nespor

Keyword(s):

Speech Processing ◽

Additional Data ◽

Native Speakers ◽

Speech Segmentation ◽

Lexical Tones ◽

Tone Languages ◽

Transitional Probabilities ◽

Lexical Processes

Research has demonstrated distinct roles for consonants and vowels in speech processing. For example, consonants have been shown to support lexical processes, such as the segmentation of speech based on transitional probabilities (TPs), more effectively than vowels. Theory and data so far, however, have considered only non-tone languages, that is to say, languages that lack contrastive lexical tones. In the present work, we provide a first investigation of the role of consonants and vowels in statistical speech segmentation by native speakers of Cantonese, as well as assessing how tones modulate the processing of vowels. Results show that Cantonese speakers are unable to use statistical cues carried by consonants for segmentation, but they can use cues carried by vowels. This difference becomes more evident when considering tone-bearing vowels. Additional data from speakers of Russian and Mandarin suggest that the ability of Cantonese speakers to segment streams with statistical cues carried by tone-bearing vowels extends to other tone languages, but is much reduced in speakers of non-tone languages.
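As a rough illustration of what segmentation based on transitional probabilities involves, the sketch below computes forward TPs over a syllable stream and places a word boundary wherever the TP dips. The three-word lexicon, the stream length, and the 0.9 threshold are hypothetical choices made for this example; they are not the stimuli or the analysis used in the study, and the code does not model the consonant, vowel, or tone manipulations described above.

```python
from collections import Counter
import random


def transitional_probabilities(syllables):
    """Forward TP(x -> y) = count(x followed by y) / count(x followed by anything)."""
    pair_counts = Counter(zip(syllables, syllables[1:]))
    first_counts = Counter(syllables[:-1])
    return {(x, y): n / first_counts[x] for (x, y), n in pair_counts.items()}


def segment(syllables, tps, threshold):
    """Place a boundary between adjacent syllables whose TP is below threshold."""
    words, current = [], [syllables[0]]
    for x, y in zip(syllables, syllables[1:]):
        if tps[(x, y)] < threshold:
            words.append("".join(current))
            current = []
        current.append(y)
    words.append("".join(current))
    return words


# Hypothetical artificial language: three trisyllabic nonsense words.
lexicon = [("tu", "pi", "ro"), ("go", "la", "bu"), ("bi", "da", "ku")]
rng = random.Random(0)
stream = [syllable for _ in range(300) for syllable in rng.choice(lexicon)]

tps = transitional_probabilities(stream)
recovered = set(segment(stream, tps, threshold=0.9))
print(sorted(recovered))
```

Because every syllable in this toy lexicon belongs to exactly one word, within-word TPs are 1.0 while TPs across word boundaries are close to 1/3, so a single fixed threshold separates them. Real stimuli rarely separate this cleanly, and a local-minimum criterion is a common alternative to a fixed threshold.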

Which came first: Infants learning language or motherese?

Behavioral and Brain Sciences ◽

10.1017/s0140525x04240110 ◽

2004 ◽

Vol 27 (4) ◽

pp. 505-506 ◽

Cited By ~ 1

Author(s):

Heather Bortfeld

Keyword(s):

Language Acquisition ◽

Word Recognition ◽

Language Learning ◽

Learning Process ◽

Building Blocks ◽

Speech Segmentation ◽

Learning Language

Although motherese may facilitate language acquisition, recent findings indicate that not all aspects of motherese are necessary for word recognition and speech segmentation, the building blocks of language learning. Rather, exposure to input that has prosodic, phonological, and statistical consistencies is sufficient to jump-start the learning process. In light of this, the infant-directedness of the input might be considered superfluous, at least insofar as language acquisition is concerned.

Robust interactive image segmentation with automatic boundary refinement

2010 IEEE International Conference on Image Processing ◽

10.1109/icip.2010.5652012 ◽

2010 ◽

Cited By ~ 5

Author(s):

Dingding Liu ◽

Yingen Xiong ◽

Linda Shapiro ◽

Kari Pulli

Keyword(s):

Image Segmentation ◽

Interactive Image Segmentation ◽

Boundary Refinement

Automatic speech segmentation using throat-acoustic correlation coefficients

Open Engineering ◽

10.1515/eng-2016-0039 ◽

2016 ◽

Vol 6 (1) ◽

Cited By ~ 2

Author(s):

Rustam Rafikovich Mussabayev ◽

Maksat N. Kalimoldayev ◽

Yedilkhan N. Amirgaliyev ◽

Timur R. Mussabayev

Keyword(s):

Large Volume ◽

Automatic Segmentation ◽

Correlation Coefficients ◽

High Accuracy ◽

Speech Segmentation ◽

Preliminary Training ◽

Efficiency ◽

Segmentation Accuracy ◽

New Type ◽

The Given

Abstract. This work considers one approach to the automatic segmentation of discrete speech signals. The aim is to construct an algorithm that meets the following requirements: segmentation of a signal into acoustically homogeneous segments; high accuracy and segmentation speed; unambiguous and reproducible segmentation results; and no need for preliminary training on a special set of manually segmented signals. The development of such an algorithm was motivated by the need to build large, automatically segmented speech databases. This article presents one new approach to that task. For this purpose we use a new type of informative feature, TAC-coefficients (Throat-Acoustic Correlation coefficients), which provide sufficient segmentation accuracy and efficiency.

Sequential speech segmentation based on the spectral ARMA transition measure

Circuits Systems and Signal Processing ◽

10.1007/bf01187694 ◽

1996 ◽

Vol 15 (1) ◽

pp. 71-92 ◽

Cited By ~ 1

Author(s):

Srbijanka R. Turajlić ◽

Zoran M. Šarić

Keyword(s):

Speech Segmentation ◽

Transition Measure
