How Looking While Listening Affects Speech Segmentation

Author(s):  
Jaime Leung

This study examines the mechanisms by which people learn the words of a new language. Syllables that occur within a word are more likely to follow one another than syllables that span a word boundary. Both infants and adults use these transitional probabilities to extract words from speech. Previous research, however, has examined speech segmentation when learners are presented with speech alone. In natural contexts, we look while we listen, and what we see is correlated with what we hear. The goal of my study was to explore how visual context affects adult speech segmentation. To do so, we used three conditions: one in which adults were presented with only a word stream, one in which adults listening to the stream saw animations that corresponded to the words they heard, and one in which the animations did not correspond to the words they heard. One hypothesis is that participants in the audio-visual conditions perform better at the segmentation task because the statistical boundaries in the audio are reinforced by the visual boundaries between animations. However, it is also possible that the visual information impairs performance because learners engage in learning the meanings of words in addition to segmenting the speech. Preliminary results support the latter hypothesis.
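
The transitional-probability mechanism the study builds on is easy to make concrete. The sketch below is a minimal illustration, not the study's actual materials: it computes forward transitional probabilities, TP(x → y) = freq(xy) / freq(x), over a syllable stream generated from an invented three-word lexicon, and posits a word boundary wherever the TP falls to a local minimum.

```python
from collections import Counter
import random

def transitional_probabilities(syllables):
    """Forward TP(x -> y) = freq(xy) / freq(x) over a syllable stream."""
    unigrams = Counter(syllables)
    bigrams = Counter(zip(syllables, syllables[1:]))
    return {pair: n / unigrams[pair[0]] for pair, n in bigrams.items()}

def segment_at_tp_dips(syllables, tps):
    """Posit a word boundary wherever the incoming TP is a strict local minimum."""
    pair_tps = [tps[pair] for pair in zip(syllables, syllables[1:])]
    words, current = [], [syllables[0]]
    for i, syl in enumerate(syllables[1:]):
        left = pair_tps[i - 1] if i > 0 else 1.0
        right = pair_tps[i + 1] if i + 1 < len(pair_tps) else 1.0
        if pair_tps[i] < left and pair_tps[i] < right:
            words.append("".join(current))
            current = []
        current.append(syl)
    words.append("".join(current))
    return words

# Invented lexicon of three trisyllabic "words" (not the study's stimuli).
random.seed(0)
lexicon = [("tu", "pi", "ro"), ("go", "la", "bu"), ("da", "ko", "ti")]
stream = [syl for _ in range(200) for syl in random.choice(lexicon)]

tps = transitional_probabilities(stream)
print(segment_at_tp_dips(stream[:15], tps))  # recovers the trisyllabic words
```

In a stream like this, within-word transitions approach a probability of 1 while transitions spanning a word boundary sit near 1/3; that dip is the statistical boundary the animations would either reinforce or distract from.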

2017
Vol 61 (1)
pp. 84-96
Author(s):  
David M. Gómez
Peggy Mok
Mikhail Ordin
Jacques Mehler
Marina Nespor

Research has demonstrated distinct roles for consonants and vowels in speech processing. For example, consonants have been shown to support lexical processes, such as the segmentation of speech based on transitional probabilities (TPs), more effectively than vowels. Theory and data so far, however, have considered only non-tone languages, that is, languages that lack contrastive lexical tones. In the present work, we provide a first investigation of the role of consonants and vowels in statistical speech segmentation by native speakers of Cantonese, as well as an assessment of how tones modulate the processing of vowels. Results show that Cantonese speakers are unable to use statistical cues carried by consonants for segmentation, but they can use cues carried by vowels. This difference becomes more evident when considering tone-bearing vowels. Additional data from speakers of Russian and Mandarin suggest that the ability of Cantonese speakers to segment streams with statistical cues carried by tone-bearing vowels extends to other tone languages, but is much reduced in speakers of non-tone languages.
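
One way to picture the consonant/vowel asymmetry is to compute the statistics over each tier separately: strip a stream of CV syllables down to a consonant tier and a (possibly tone-bearing) vowel tier, and measure transitional probabilities on each. The sketch below is an illustrative reconstruction using invented syllables and tone marks, not the study's stimulus code.

```python
from collections import Counter

def tier_tps(units):
    """Forward transitional probabilities over a single tier."""
    unigrams = Counter(units)
    bigrams = Counter(zip(units, units[1:]))
    return {pair: n / unigrams[pair[0]] for pair, n in bigrams.items()}

# Invented CV syllables with tone digits (1 = high, 4 = low); not real stimuli.
stream = ["ka1", "mo4", "pi1", "ka1", "mo4", "pi1", "tu4", "se1", "no4"]

consonant_tier = [s[0] for s in stream]     # 'k', 'm', 'p', ...
vowel_tier = [s[1] for s in stream]         # 'a', 'o', 'i', ...
tonal_vowel_tier = [s[1:] for s in stream]  # 'a1', 'o4', 'i1', ...

# Segmentation cues can be carried on one tier while the other stays flat.
print(tier_tps(consonant_tier))
print(tier_tps(tonal_vowel_tier))
```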


2020
Vol 94 (2)
pp. 305-322
Author(s):  
Henrik Lagerlund

In this article, I present two virtually unknown sixteenth-century views of human freedom: those of Bartolomaeus de Usingen (1465–1532) and Jodocus Trutfetter (1460–1519) on the one hand, and of John Mair (1470–1550) on the other. Their views serve as a natural context and partial background to the more famous debate on human freedom between Martin Luther (1483–1546) and Erasmus of Rotterdam (1466–1536) from 1524–1526. Usingen and Trutfetter were Luther’s philosophy teachers in Erfurt. In a passage from Book III of John Mair’s commentary on Aristotle’s Nicomachean Ethics from 1530, he seems to defend a view of human freedom on which we can will evil for the sake of evil. Very few thinkers in the history of philosophy have defended such a view; the most famous medieval thinker to do so is William Ockham (1288–1347). To illustrate how radical this view is, I place Mair in the historical context of such thinkers as Plato, Augustine, Buridan, and Descartes.


Author(s):  
Louise Goyet
Séverine Millotte
Anne Christophe
Thierry Nazzi

The present chapter focuses on fluent speech segmentation abilities in early language development. We first review studies exploring infants’ early use of major prosodic boundary cues, which allow them to cut full utterances into smaller sequences such as clauses or phrases. We then summarize studies showing that word segmentation abilities emerge around 8 months and rely on infants’ processing of various bottom-up word boundary cues and top-down known-word recognition cues. Given that most of these cues are specific to the language infants are acquiring, we emphasize how the development of these abilities varies cross-linguistically, and we explore their developmental origin. In particular, we focus on two cues that might allow bootstrapping of these abilities: transitional probabilities and rhythmic units.


2021
Vol 12
Author(s):  
Theresa Matzinger
Nikolaus Ritt
W. Tecumseh Fitch

A prerequisite for spoken language learning is segmenting continuous speech into words. Amongst the many possible cues to word boundaries, listeners can use both transitional probabilities between syllables and various prosodic cues. However, the relative importance of these cues remains unclear, and previous experiments have not directly compared the effects of multiple contrasting prosodic cues. We used artificial language learning experiments, in which native German-speaking participants extracted meaningless trisyllabic “words” from a continuous speech stream, to evaluate these factors. We compared a baseline condition (statistical cues only) to five test conditions, in which word-final syllables were either (a) followed by a pause, (b) lengthened, (c) shortened, (d) changed to a lower pitch, or (e) changed to a higher pitch. To evaluate robustness and generality, we used three tasks varying in difficulty. Overall, pauses and final lengthening were perceived as converging with the statistical cues and facilitated speech segmentation, with pauses helping most. Final-syllable shortening hindered baseline speech segmentation, indicating that when cues conflict, prosodic cues can override statistical cues. Surprisingly, pitch cues had little effect, suggesting that duration may be more relevant than pitch for speech segmentation in our study context. We discuss our findings with regard to the contributions of language-universal boundary cues and language-specific stress patterns to speech segmentation.
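
As a rough sketch of how such a stream and its test conditions might be assembled, the snippet below concatenates invented trisyllabic nonce words and applies one prosodic manipulation to every word-final syllable. The lexicon, durations, and pitch values are illustrative assumptions, not the experiment's actual stimulus parameters.

```python
import random

# Hypothetical trisyllabic nonce lexicon; not the experiment's actual items.
LEXICON = [("ba", "du", "ke"), ("mi", "to", "ga"), ("pu", "se", "no")]

BASE_DURATION_MS = 200  # assumed syllable duration
BASE_PITCH_HZ = 200     # assumed monotone baseline

def build_stream(n_words, condition, seed=1):
    """Return a list of (syllable, duration_ms, pitch_hz, pause_ms) tuples.

    `condition` applies one manipulation to every word-final syllable:
    'baseline', 'pause', 'lengthened', 'shortened', 'low_pitch', 'high_pitch'.
    """
    rng = random.Random(seed)
    stream = []
    for _ in range(n_words):
        word = rng.choice(LEXICON)
        for pos, syl in enumerate(word):
            duration, pitch, pause = BASE_DURATION_MS, BASE_PITCH_HZ, 0
            if pos == len(word) - 1:  # word-final syllable
                if condition == "pause":
                    pause = 100                      # assumed 100 ms gap
                elif condition == "lengthened":
                    duration = int(duration * 1.5)
                elif condition == "shortened":
                    duration = int(duration * 0.67)
                elif condition == "low_pitch":
                    pitch = 150                      # assumed lower target
                elif condition == "high_pitch":
                    pitch = 250                      # assumed higher target
            stream.append((syl, duration, pitch, pause))
    return stream

print(build_stream(2, "lengthened"))
```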


2019
Vol 46 (6)
pp. 1169-1201
Author(s):  
Andrew Caines
Emma Altmann-Richer
Paula Buttery

We select three word segmentation models with psycholinguistic foundations – transitional probabilities, the diphone-based segmenter, and PUDDLE – which track phoneme co-occurrence and positional frequencies in input strings and, in the case of PUDDLE, build lexical and diphone inventories. The models are evaluated on caregiver utterances in 132 CHILDES corpora representing 28 languages and 11.9 million words. PUDDLE shows the best performance overall, albeit with wide cross-linguistic variation. We explore the reasons for this variation, fitting regression models to performance scores with linguistic properties that capture lexico-phonological characteristics of the input: word length, utterance length, diversity in the lexicon, the frequency of one-word utterances, the regularity of phoneme patterns at word boundaries, and the distribution of diphones in each language. Together these properties explain four-tenths of the observed variation in segmentation performance, a strong outcome and a solid foundation for studying further variables that make the segmentation task difficult.
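
The scoring logic behind such evaluations can be illustrated with a generic boundary-detection metric: compare each model's predicted word boundaries against the gold-standard segmentation of the same utterances and report precision, recall, and F1. The sketch below is a standard formulation of that metric, not the paper's own evaluation code.

```python
def boundary_positions(words):
    """Utterance-internal boundary indices (in phonemes) of a segmentation."""
    positions, offset = set(), 0
    for word in words[:-1]:
        offset += len(word)
        positions.add(offset)
    return positions

def boundary_scores(gold_utterances, predicted_utterances):
    """Boundary precision, recall, and F1 over parallel segmented corpora."""
    tp = fp = fn = 0
    for gold, pred in zip(gold_utterances, predicted_utterances):
        g, p = boundary_positions(gold), boundary_positions(pred)
        tp += len(g & p)
        fp += len(p - g)
        fn += len(g - p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Toy data (characters standing in for phonemes): one over- and one
# under-segmented prediction against the gold segmentation.
gold = [["the", "dog"], ["a", "big", "dog"]]
pred = [["the", "d", "og"], ["a", "bigdog"]]
print(boundary_scores(gold, pred))  # (0.67, 0.67, 0.67), roughly
```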


2002
Vol 45 (3)
pp. 519-530
Author(s):  
Lisa D. Sanders
Helen J. Neville
Marty G. Woldorff

Varying degrees of plasticity in different subsystems of language have been demonstrated by studies showing that some aspects of language are processed similarly by native speakers and late-learners whereas other aspects are processed differently by the two groups. The study of speech segmentation provides a means by which the ability to process different types of linguistic information can be measured within the same task, because lexical, syntactic, and stress-pattern information can all indicate where one word ends and the next begins in continuous speech. In this study, native Japanese and native Spanish late-learners of English (as well as near-monolingual Japanese and Spanish speakers) were asked to determine whether specific sounds fell at the beginning or in the middle of words in English sentences. Similar to native English speakers, late-learners employed lexical information to perform the segmentation task. However, nonnative speakers did not use syntactic information to the same extent as native English speakers. Although both groups of late-learners of English used stress pattern as a segmentation cue, the extent to which this cue was relied upon depended on the stress-pattern characteristics of their native language. These findings support the hypothesis that learning a second language later in life has differential effects on subsystems within language.


1993
Vol 20 (2)
pp. 229-252
Author(s):  
Jan V. Goodsitt
James L. Morgan
Patricia K. Kuhl

Previous work has suggested that infants may segment continuous speech by a BRACKETING STRATEGY that segregates portions of the speech stream based on prosodic cues to their endpoints. The two present studies were designed to assess whether infants can also deploy a CLUSTERING STRATEGY that exploits asymmetries in transitional probabilities between successive elements, aggregating elements with high transitional probabilities and identifying points of low transitional probability as boundaries between units. These studies examined effects of the structure and redundancy of speech context on infants' discrimination of two target syllables using an operant head-turning procedure. After discrimination training on the target syllables in isolation, discrimination maintenance was tested when the target syllables were embedded in one of three contexts. Invariant Order contexts were structured to promote clustering, whereas the Redundant and Variable Order contexts were not. Thirty-six seven-month-olds were tested in Experiment 1, in which stimuli were produced with varying intonation contours; 36 eight-month-olds were tested in Experiment 2, in which stimuli were produced with comparable flat pitch contours. In both experiments, performance of the three groups was equivalent in an initial 20-trial test. However, in a second 20-trial test, significant improvements in performance were shown by infants in the Invariant Order condition; no such gains were shown by infants in the other two conditions. These studies suggest that clustering may complement bracketing in infants' discovery of units of language.


Perception
1997
Vol 26 (3)
pp. 287-300
Author(s):  
Yann Coello
Madeleine A Grealy

The aim of this study was to analyse the effects of manipulating the size and contour of the visual field on the accuracy of an aiming task. Subjects were required to perform pointing movements without seeing their moving hand. The target was displayed in either a wide structured visual field (control condition), a narrow visual field with an orthogonal frame, or a narrow visual field with a circular frame. The visual information surrounding the target was always provided prior to movement onset, but was available during the execution of the movement on only half of the trials. Overall, the results showed that undershooting was a common performance characteristic in all of the conditions. In comparison to the control performance, an increase in the degree of undershoot was found when the target was displayed inside a narrower visual field. An additional radial error was found when the contour of the visual scene was circular, but only when the visual context was available during the movement. The same pattern of results was observed for variable error. However, angular errors were not found to vary over the different conditions. Overall, the findings suggested that the visual context contributed to the assessment of target location and to subsequent motor programming. Furthermore, visual information aided the on-line control of the unseen hand, but the extent of this aid was dependent on the size and shape of the frame denoting the visual scene. Finally, in the absence of any unexpected perturbation, the en-route amendment of the arm trajectory based on visual information processing seemed to be related more to distance than to azimuth control.
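
For readers unfamiliar with these measures, the sketch below shows one conventional way to decompose two-dimensional pointing endpoints into radial (distance) and angular (azimuth) errors relative to the target, with variable error taken as the dispersion of the radial component. The coordinates are invented, and the decomposition is a textbook convention rather than the authors' analysis procedure.

```python
import math

def pointing_errors(target_xy, endpoints_xy):
    """Decompose 2-D pointing endpoints into radial and angular errors.

    Radial error: signed difference between movement amplitude and target
    distance (negative = undershoot). Angular error: azimuth deviation in
    degrees. Variable error: standard deviation of the signed radial errors.
    """
    tx, ty = target_xy
    target_dist = math.hypot(tx, ty)
    target_azim = math.degrees(math.atan2(ty, tx))
    radial = [math.hypot(x, y) - target_dist for x, y in endpoints_xy]
    angular = [math.degrees(math.atan2(y, x)) - target_azim
               for x, y in endpoints_xy]
    n = len(radial)
    mean_radial = sum(radial) / n
    variable = (sum((r - mean_radial) ** 2 for r in radial) / n) ** 0.5
    mean_angular = sum(angular) / n
    return mean_radial, variable, mean_angular

# Invented endpoints that undershoot a target 300 mm straight ahead.
target = (0.0, 300.0)
endpoints = [(2.0, 280.0), (-3.0, 275.0), (1.0, 290.0)]
print(pointing_errors(target, endpoints))  # negative radial error = undershoot
```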


1994
Vol 37 (5)
pp. 1086-1099
Author(s):  
Nancy L. Records

The contribution of a visual source of contextual information to speech perception was measured in 12 listeners with aphasia. The three experimental conditions were: Visual-Only (referential gesture), Auditory-Only (computer-edited speech), and Audio-Visual. In a two-alternative, forced-choice task, subjects indicated which picture had been requested. The stimuli were first validated with listeners without brain damage. The listeners with aphasia were subgrouped as having high or low language comprehension based on standardized test scores. Results showed a significantly larger contribution of gestural information to the responses of the lower-comprehension subgroup. The contribution of gesture was significantly correlated with the amount of ambiguity experienced with the auditory-only information. These results show that as the auditory information becomes more ambiguous, individuals with impaired language comprehension make greater use of the visual information. The results support clinical observations that speech information received without visual context is perceived differently than when received with visual context.

