The Relative Weight of Statistical and Prosodic Cues in Speech Segmentation: A Matter of Language-(In)dependency and of Signal Quality

2011 ◽  
Vol 10 (1) ◽  
pp. 87 ◽  
Author(s):  
Tânia Fernandes ◽  
Paulo Ventura ◽  
Régine Kolinsky
2015 ◽  
Vol 6 ◽  
Author(s):  
Ruth de Diego-Balaguer ◽  
Antoni Rodríguez-Fornells ◽  
Anne-Catherine Bachoud-Lévi

2021 ◽  
Vol 12 ◽  
Author(s):  
Theresa Matzinger ◽  
Nikolaus Ritt ◽  
W. Tecumseh Fitch

A prerequisite for spoken language learning is segmenting continuous speech into words. Among the many possible cues to word boundaries, listeners can use both transitional probabilities between syllables and various prosodic cues. However, the relative importance of these cues remains unclear, and previous experiments have not directly compared the effects of multiple contrasting prosodic cues. We used artificial language learning experiments, in which native German-speaking participants extracted meaningless trisyllabic “words” from a continuous speech stream, to evaluate these factors. We compared a baseline condition (statistical cues only) to five test conditions, in which word-final syllables were either (a) followed by a pause, (b) lengthened, (c) shortened, (d) changed to a lower pitch, or (e) changed to a higher pitch. To evaluate robustness and generality, we used three tasks varying in difficulty. Overall, pauses and final lengthening were perceived as converging with the statistical cues and facilitated speech segmentation, with pauses helping most. Final-syllable shortening hindered speech segmentation relative to baseline, indicating that when cues conflict, prosodic cues can override statistical cues. Surprisingly, pitch cues had little effect, suggesting that duration may be more relevant for speech segmentation than pitch in our study context. We discuss our findings with regard to the contribution to speech segmentation of language-universal boundary cues vs. language-specific stress patterns.
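The statistical cue in the baseline condition can be illustrated with a minimal Python sketch. It is not the authors' procedure: the four trisyllabic "words", the stream length, the random seed, and the rule that a word never immediately repeats are all assumptions made for illustration. The sketch builds a continuous syllable stream and estimates the transitional probability of each syllable pair, which is high inside a word and lower across a word boundary.

```python
import random
from collections import Counter

# Hypothetical trisyllabic "words"; the actual stimuli are not given in the abstract.
WORDS = [("tu", "pi", "ro"), ("go", "la", "bu"), ("bi", "da", "ku"), ("pa", "do", "ti")]


def make_stream(words, n_tokens, seed=0):
    """Concatenate randomly chosen words into one continuous syllable stream.

    Assumption: a word is never immediately repeated.
    """
    rng = random.Random(seed)
    stream, previous = [], None
    for _ in range(n_tokens):
        word = rng.choice([w for w in words if w != previous])
        stream.extend(word)
        previous = word
    return stream


def transitional_probabilities(stream):
    """Estimate P(next syllable | current syllable) for every adjacent pair."""
    pair_counts = Counter(zip(stream, stream[1:]))
    first_counts = Counter(stream[:-1])  # the last syllable has no successor
    return {(a, b): c / first_counts[a] for (a, b), c in pair_counts.items()}


stream = make_stream(WORDS, n_tokens=300)
tp = transitional_probabilities(stream)

within_word = tp[("tu", "pi")]  # transition inside a word
across_boundary = max(p for (a, _), p in tp.items() if a == "ro")  # word-final syllable
print(f"within word: {within_word:.2f}, highest across boundary: {across_boundary:.2f}")
```

With only statistical cues available, a listener would have to rely on this contrast between within-word and across-boundary transitions; the prosodic manipulations in the test conditions add a second cue at the same positions.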


1993 ◽  
Vol 20 (2) ◽  
pp. 229-252 ◽  
Author(s):  
Jan V. Goodsitt ◽  
James L. Morgan ◽  
Patricia K. Kuhl

Abstract. Previous work has suggested that infants may segment continuous speech by a BRACKETING STRATEGY that segregates portions of the speech stream based on prosodic cues to their endpoints. The two present studies were designed to assess whether infants can also deploy a CLUSTERING STRATEGY that exploits asymmetries in transitional probabilities between successive elements, aggregating elements with high transitional probabilities and identifying points of low transitional probabilities as boundaries between units. These studies examined effects of the structure and redundancy of speech context on infants' discrimination of two target syllables using an operant head-turning procedure. After discrimination training on the target syllables in isolation, discrimination maintenance was tested when the target syllables were embedded in one of three contexts. Invariant Order contexts were structured to promote clustering, whereas the Redundant and Variable Order contexts were not. Thirty-six seven-month-olds were tested in Experiment 1, in which stimuli were produced with varying intonation contours; 36 eight-month-olds were tested in Experiment 2, in which stimuli were produced with comparable flat pitch contours. In both experiments, performance of the three groups was equivalent in an initial 20-trial test. However, in a second 20-trial test, significant improvements in performance were shown by infants in the Invariant Order condition. No such gains were shown by infants in the other two conditions. These studies suggest that clustering may complement bracketing in infants' discovery of units of language.
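The clustering strategy described above can be sketched as a simple procedure: keep successive elements together while the transitional probability between them is high, and place a boundary wherever it drops. The following Python sketch is an illustration only, not the infants' mechanism or the study's materials; the three made-up units, their ordering, and the threshold of 0.9 are assumptions.

```python
from collections import Counter


def transitional_probabilities(stream):
    """Estimate P(next element | current element) for every adjacent pair."""
    pair_counts = Counter(zip(stream, stream[1:]))
    first_counts = Counter(stream[:-1])  # the last element has no successor
    return {(a, b): c / first_counts[a] for (a, b), c in pair_counts.items()}


def segment_at_low_tp(stream, threshold):
    """Group elements joined by high-probability transitions.

    A boundary is placed before any element whose transition from the
    previous element has a probability below `threshold`.
    Assumes a non-empty stream.
    """
    tp = transitional_probabilities(stream)
    units, current = [], [stream[0]]
    for a, b in zip(stream, stream[1:]):
        if tp[(a, b)] < threshold:
            units.append(tuple(current))
            current = []
        current.append(b)
    units.append(tuple(current))
    return units


# Hypothetical units presented in varying order, so that only the
# within-unit transitions are fully predictable.
A, B, C = ("ka", "ti", "mo"), ("be", "su", "ra"), ("do", "fi", "nu")
order = [A, B, C, A, C, B, A, B, C, B, A, C]
stream = [syllable for unit in order for syllable in unit]

units = segment_at_low_tp(stream, threshold=0.9)
print(units[:3])
```

Because every within-unit transition in this toy stream is fully predictable and every between-unit transition is not, the threshold recovers the original units. Real speech would not separate so cleanly, which is one reason a complementary bracketing strategy based on prosodic cues is plausible.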


2018 ◽  
Vol 26 (4) ◽  
pp. 1647 ◽  
Author(s):  
Alessandro Panunzi ◽  
Valentina Saccone

Abstract: This work presents a pilot study for a prosodic analysis of two different structures in spoken Italian within the theoretical framework of the Language into Act Theory (L-AcT): (i) chains of two or more Bound Comments (COB) that do not form a compositional informative and prosodic unit; (ii) compositional Information Units formed by two or more Multiple Comments (CMM), linked together by a conventional prosodic model that implements specific meta-illocutive structures. This work analyzes COBs and CMMs from the DB-IPIC Italian Minicorpus. Different prosodic cues are taken into account: f0 reset, pauses, final lengthening, intensity lowering and initial rush. The distinctive feature of COBs is a flat f0 trend before the boundary, with a low number of f0 resets, while CMMs vary between different f0 shapes. Vowel elongation and an unrushed speech rate together contribute to the perception that one COB is prolonged into the next. Initial rush is a characteristic feature of CMMs, while the lengthening of the last vowel of the unit is easier to find at the end of a COB than in a CMM.

Keywords: prosody; spontaneous speech segmentation; non-terminal breaks; L-AcT.


2015 ◽  
Vol 19 (2) ◽  
pp. 400-414 ◽  
Author(s):  
SARA PETERS ◽  
KATHRYN WILSON ◽  
TIMOTHY W. BOITEAU ◽  
CARLOS GELORMINI-LEZAMA ◽  
AMIT ALMOR

Context and prosody are the main cues native-English speakers rely on to detect and interpret sarcastic irony within spoken discourse. The importance of each type of cue for detecting sarcasm has not been fully investigated in native speakers and has not been examined at all in adult English learners. Here, we compare the extent to which native-English speakers and Arabic-speaking English learners rely on contextual and prosodic cues to identify sarcasm in spoken English, situating these findings within current cross-linguistic effects literature. We show Arabic speakers utilize the cues to a different extent than native speakers: they tend not to utilize prosodic information, focusing on contextual semantic information. These results help clarify the relative weight of contextual and prosodic cues in native-English speakers and support theories that suggest that prosody and emotion could transfer separately in second language learning such that one could transfer while the other does not.


Diagnostica ◽  
2019 ◽  
Vol 65 (3) ◽  
pp. 133-141 ◽  
Author(s):  
Andreas Hirschi ◽  
Madeleine Hänggli ◽  
Noemi Nagy ◽  
Franziska Baumeler ◽  
Claire Johnston ◽  
...  

Abstract. The existing literature proposes a large number of potential predictors of career success, too many to be assessed economically. To address this, Hirschi, Nagy, Baumeler, Johnston, and Spurk (2018) developed the Career Resources Questionnaire (CRQ) and validated it in an English-language version. Based on an integration of theoretical and meta-analytic research, the questionnaire measures 13 distinct factors representing 4 higher-order dimensions: knowledge and skills, motivation, environment, and career-related activities. The present study validates the German-language version with N = 1,666 people (students and employees). The results confirm the reliability and the factor structure of the questionnaire. Relative weight analyses further showed the importance of different factors for different types of career success. The instrument offers researchers and practitioners an economical, reliable, and valid way to assess key factors of career success.


2009 ◽  
Vol 23 (2) ◽  
pp. 63-76 ◽  
Author(s):  
Silke Paulmann ◽  
Sarah Jessen ◽  
Sonja A. Kotz

The multimodal nature of human communication has been well established. Yet few empirical studies have systematically examined the widely held belief that this form of perception is facilitated in comparison to unimodal or bimodal perception. In the current experiment we first explored the processing of unimodally presented facial expressions. Furthermore, auditory (prosodic and/or lexical-semantic) information was presented together with the visual information to investigate the processing of bimodal (facial and prosodic cues) and multimodal (facial, lexical, and prosodic cues) human communication. Participants engaged in an identity identification task, while event-related potentials (ERPs) were being recorded to examine early processing mechanisms as reflected in the P200 and N300 components. While the former component has repeatedly been linked to physical property stimulus processing, the latter has been linked to more evaluative “meaning-related” processing. A direct relationship between P200 and N300 amplitude and the number of information channels present was found. The multimodal-channel condition elicited the smallest amplitude in the P200 and N300 components, followed by an increased amplitude in each component for the bimodal-channel condition. The largest amplitude was observed for the unimodal condition. These data suggest that multimodal information induces clear facilitation in comparison to unimodal or bimodal information. The advantage of multimodal perception as reflected in the P200 and N300 components may thus reflect one of the mechanisms allowing for fast and accurate information processing in human communication.


Author(s):  
Ana Franco ◽  
Julia Eberlen ◽  
Arnaud Destrebecqz ◽  
Axel Cleeremans ◽  
Julie Bertels

Abstract. The Rapid Serial Visual Presentation procedure is a method widely used in visual perception research. In this paper we propose an adaptation of this method which can be used with auditory material and enables assessment of statistical learning in speech segmentation. Adult participants were exposed to an artificial speech stream composed of statistically defined trisyllabic nonsense words. They were subsequently instructed to perform a detection task in a Rapid Serial Auditory Presentation (RSAP) stream in which they had to detect a syllable in a short speech stream. Results showed that reaction times varied as a function of the statistical predictability of the syllable: second and third syllables of each word were responded to faster than first syllables. This result suggests that the RSAP procedure provides a reliable and sensitive indirect measure of auditory statistical learning.


2012 ◽  
Author(s):  
Vanessa M. Lammers ◽  
Deborah Lee ◽  
Jenna C. Cox ◽  
Kathleen Frye ◽  
Jeffrey R. Labrador ◽  
...  
