Visual speech segmentation: Using facial cues to locate word boundaries in continuous speech

2010 · Author(s): Aaron D. Mitchel, Daniel J. Weiss



2001 · Vol 27 (3) · pp. 351-372 · Author(s): Anand Venkataraman

A statistical model for segmentation and word discovery in continuous speech is presented, along with an incremental unsupervised learning algorithm that infers word boundaries based on this model. Results of empirical tests are also presented, showing that the algorithm is competitive with other models that have been used for similar tasks.
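As a concrete illustration of this family of statistical segmenters, the sketch below posits word boundaries at local dips in transitional probability (TP) between syllables. It is a deliberately minimal relative of such models, not a reimplementation of the paper's algorithm; the toy lexicon and the local-minimum boundary rule are assumptions made for illustration only.

```python
# Minimal sketch: segment a syllable stream at local minima of transitional
# probability. NOT the paper's model; lexicon and rule are illustrative.
from collections import Counter
import random

def transitional_probs(syllables):
    """Estimate P(next | current) from bigram counts over the stream."""
    pair_counts = Counter(zip(syllables, syllables[1:]))
    unigram_counts = Counter(syllables[:-1])
    return {(a, b): c / unigram_counts[a] for (a, b), c in pair_counts.items()}

def segment(syllables, tps):
    """Posit a boundary wherever TP dips below both neighboring transitions."""
    seq = [tps[(a, b)] for a, b in zip(syllables, syllables[1:])]
    words, current = [], [syllables[0]]
    for i in range(1, len(syllables)):
        left = seq[i - 1]
        prev_tp = seq[i - 2] if i >= 2 else 1.0
        next_tp = seq[i] if i < len(seq) else 1.0
        if left < prev_tp and left < next_tp:  # local TP minimum -> boundary
            words.append("".join(current))
            current = []
        current.append(syllables[i])
    words.append("".join(current))
    return words

# Toy stream: three nonce words in random order, so within-word TP is 1.0
# and between-word TP is roughly 1/3.
random.seed(0)
lexicon = [["ba", "do", "ku"], ["tu", "pi", "ro"], ["go", "la", "bu"]]
stream = [syl for _ in range(200) for syl in random.choice(lexicon)]
print(segment(stream, transitional_probs(stream))[:8])
```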



1997 · Vol 33 (2) · pp. 111-153 · Author(s): Paul Cairns, Richard Shillcock, Nick Chater, Joe Levy


2021 · Vol 12 · Author(s): Theresa Matzinger, Nikolaus Ritt, W. Tecumseh Fitch

A prerequisite for spoken language learning is segmenting continuous speech into words. Among the many possible cues to word boundaries, listeners can use both transitional probabilities between syllables and various prosodic cues. However, the relative importance of these cues remains unclear, and previous experiments have not directly compared the effects of contrasting multiple prosodic cues. We used artificial language learning experiments, in which native German-speaking participants extracted meaningless trisyllabic “words” from a continuous speech stream, to evaluate these factors. We compared a baseline condition (statistical cues only) to five test conditions in which word-final syllables were either (a) followed by a pause, (b) lengthened, (c) shortened, (d) changed to a lower pitch, or (e) changed to a higher pitch. To evaluate robustness and generality, we used three tasks varying in difficulty. Overall, pauses and final lengthening were perceived as converging with the statistical cues and facilitated speech segmentation, with pauses helping most. Final-syllable shortening hindered baseline speech segmentation, indicating that when cues conflict, prosodic cues can override statistical cues. Surprisingly, pitch cues had little effect, suggesting that in our study context duration may be more relevant for speech segmentation than pitch. We discuss our findings with regard to the respective contributions of language-universal boundary cues and language-specific stress patterns to speech segmentation.
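To make the manipulation concrete, here is a hedged sketch of how such a familiarization stream might be assembled: trisyllabic nonce words concatenated in random order, with word-final syllables lengthened, shortened, or followed by a pause depending on condition. The lexicon, the base duration, and the 1.5×/0.5× scaling factors are illustrative assumptions, not the study's actual stimulus parameters.

```python
# Hedged sketch of a familiarization stream with prosodic boundary cues.
# All durations and items are invented; real stimuli were acoustic.
import random

LEXICON = [("to", "ki", "bu"), ("gi", "la", "mo"), ("pa", "du", "se")]
BASE_MS = 250  # assumed per-syllable duration in milliseconds

def build_stream(n_words, condition="baseline", seed=1):
    rng = random.Random(seed)
    events = []  # (syllable, duration_ms) pairs; "" marks silence
    for _ in range(n_words):
        word = rng.choice(LEXICON)
        for i, syl in enumerate(word):
            dur = BASE_MS
            if i == len(word) - 1:              # word-final syllable
                if condition == "lengthened":
                    dur = int(BASE_MS * 1.5)    # converging duration cue
                elif condition == "shortened":
                    dur = int(BASE_MS * 0.5)    # conflicting duration cue
            events.append((syl, dur))
        if condition == "pause":
            events.append(("", 100))            # silent gap after each word
    return events

print(build_stream(2, condition="pause"))
```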



2012 · Vol 36 (12) · pp. 3740-3748 · Author(s): Antoine J. Shahin, Mark A. Pitt


RELC Journal · 2020 · Article 003368822096663 · Author(s): Debra M Hardison, Martha C Pennington

This article reviews research findings on visual input in speech processing, in the form of facial cues and co-speech gestures, for second-language (L2) learners, and provides pedagogical implications for the teaching of listening and speaking. It traces the foundations of auditory–visual speech research and explores the role of a speaker’s facial cues in L2 perception training and of gestural cues in listening comprehension. There is a strong role for pedagogy in maximizing the salience of multimodal cues for L2 learners. Visible articulatory gestures that precede the acoustic signal, as well as the preparation phase of a hand gesture that precedes the acoustic onset of a word, have a priming effect: they direct perceivers’ attention to upcoming information and facilitate processing. Visible gestures that co-occur with speech aid ongoing processing and comprehension. L2 learners benefit from an awareness of these visual cues and from exposure to such input.



2019 · Author(s): Rodrigo Dal Ben, Débora de Hollanda Souza, Jessica Hay

Statistical regularities in linguistic input shape early language development and second language acquisition. For example, both transitional probability and phonotactic probability play a role in speech segmentation; however, it remains unclear whether or how these statistics are combined when small differences in phonotactic probability are present. We conducted two experiments to investigate the effects of transitional and phonotactic probabilities on speech segmentation by Brazilian-Portuguese-speaking adults. Four pseudo-languages, with six words each, were created. The transitional probabilities between words’ biphones were high, whereas the probabilities between part-words’ biphones were lower. Although the within- and between-word phonotactic probabilities were always high, they varied slightly across the familiarization languages and the test words/part-words. Languages 1 and 2 had familiarization words with unbalanced phonotactics, but the target words and part-words used at test were phonotactically balanced. Languages 3 and 4 had familiarization words with balanced phonotactics, but phonotactics were unbalanced across test items: in Language 3, words had slightly lower phonotactic probabilities than part-words; the reverse was true for Language 4. Eighty-one Brazilian-Portuguese-speaking adults were divided into four groups. Each group was familiarized with one version of the language and then tested on two-alternative forced-choice trials. Participants familiarized with Languages 1, 2, and 4 preferred words to part-words at test. However, participants who heard Language 3 did not select words above chance. There was no significant difference in word selection between Language 4 and Languages 1 and 2, despite the fact that phonotactic probabilities were higher during both familiarization and test for words from the fourth language. These findings indicate that phonotactic and transitional information can be tracked and combined to facilitate or impair speech segmentation. Furthermore, they suggest that subtle differences in phonotactics are more informative about word boundaries than congruence between high phonotactic and transitional probability cues.
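The biphone statistic at issue can be sketched as follows: estimate the corpus probability of each adjacent phone pair and average over an item's pairs, so that a "word" and a "part-word" can be compared on phonotactic grounds. The toy corpus and test items below are invented; the study's phonotactic estimates came from Brazilian Portuguese, and a real implementation would use positional, corpus-derived probabilities rather than these raw counts.

```python
# Illustrative biphone phonotactic probability from a toy corpus.
# Corpus and items are invented; this is not the study's estimation method.
from collections import Counter

def biphone_probability(item, corpus):
    """Mean corpus probability of each adjacent phone pair in `item`."""
    biphones = Counter(pair for w in corpus for pair in zip(w, w[1:]))
    total = sum(biphones.values())
    pairs = list(zip(item, item[1:]))
    return sum(biphones[p] / total for p in pairs) / len(pairs)

toy_corpus = ["batelo", "dipaku", "golabe", "batedi", "lobaku"]
for item in ["batelo", "lodipa"]:  # a "word" vs. a "part-word"
    print(item, round(biphone_probability(item, toy_corpus), 4))
```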



2017 · Author(s): Toben Herbert Mintz, Rachel L. Walker, Celeste Kidd, Ashlee Welday

A critical part of infants’ ability to acquire any language involves segmenting continuous speech input into discrete word-forms. Certain properties of words could provide infants with reliable cues to word boundaries. Here we investigate the potential utility of vowel harmony (VH), a phonological property whereby vowels within a word systematically exhibit similarity (“harmony”) in some aspect of the way they are pronounced. We present evidence that infants with no experience of VH in their native language nevertheless actively use these patterns to generate hypotheses about where words begin and end in the speech stream. In two experiments, we exposed infants learning English, a language without VH, to a continuous speech stream in which the only systematic patterns available as cues to word boundaries came from syllable sequences that showed VH or vowel disharmony (dissimilarity). After hearing less than one minute of the streams, infants showed evidence of sensitivity to VH cues. These results suggest that infants have an experience-independent sensitivity to VH and are predisposed to segment speech according to harmony patterns. We also found that when the VH patterns were more subtle (Experiment 2), infants required more exposure to the speech stream before they segmented based on VH, consistent with previous work on infants’ preferences relating to processing load. Our findings provide evidence for a previously unknown mechanism by which infants could discover the words of their language, and they shed light on the perceptual mechanisms that might be responsible for the emergence of vowel harmony as an organizing principle for the sound structure of words in many languages.
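One way to picture VH as a segmentation cue is the sketch below, which splits a syllable stream wherever adjacent syllables disagree in vowel backness. The front/back vowel classes, the neutral treatment of other vowels, and the symbolic stream are illustrative assumptions; the experiments presented controlled acoustic stimuli to infants, not symbolic input.

```python
# Illustrative sketch of vowel harmony as a boundary cue: split the stream
# where adjacent syllables disagree in vowel backness. Classes are invented.
FRONT, BACK = set("ie"), set("ou")

def vowel_class(syllable):
    """Return 'front' or 'back' for the first classifiable vowel, else None."""
    for ch in syllable:
        if ch in FRONT:
            return "front"
        if ch in BACK:
            return "back"
    return None  # other vowels treated as neutral in this sketch

def segment_by_harmony(syllables):
    words, current = [], [syllables[0]]
    for prev, syl in zip(syllables, syllables[1:]):
        a, b = vowel_class(prev), vowel_class(syl)
        if a and b and a != b:          # disharmony -> hypothesized boundary
            words.append("".join(current))
            current = []
        current.append(syl)
    words.append("".join(current))
    return words

# "gidemi" is front-harmonic, "dokugo" is back-harmonic.
stream = ["gi", "de", "mi", "do", "ku", "go", "gi", "de", "mi"]
print(segment_by_harmony(stream))  # ['gidemi', 'dokugo', 'gidemi']
```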


