Prosodic Cues for Automatic Phrase Boundary Detection in ASR

Author(s):  
Klára Vicsi, György Szaszák
2021, Vol 1 (1), pp. 15-30


Author(s):
Carolina Garcia de Carvalho Silva, Maria Cristina Name

We investigate the role of phonological phrase boundary cues in syntactic parsing by adult native speakers of Brazilian Portuguese (henceforth, BP). We assume that speech is organized into a hierarchy of prosodic constituents that may relate to constituents of other components of the grammar (Nespor and Vogel, 1986). Although this is not necessarily a one-to-one relationship, a mapping is possible between constituents of each component, such as between phonological phrases and certain syntactic units. Whether speakers produce reliable prosodic cues in spontaneous speech is controversial. For instance, Snedeker and Trueswell (2003) propose that only expert speakers produce disambiguating prosodic cues, whereas for Kraljic and Brennan (2005), even naïve speakers produce prosodic cues that help listeners. Millotte et al. (2007) found that native French speakers produced reliable prosodic cues (phrase-final lengthening and pitch rise) when they read pairs of ambiguous sentences that differed in their prosodic structure. The authors also found that native listeners were able to use these cues to assign the ambiguous words to their correct lexical categories. Thus, phonological phrase boundary cues may help native listeners to analyze ambiguous sentences correctly. Motivated by the French results, we conducted two experiments to test the influence of prosody on syntactic analysis by BP adults. In the first experiment, a sentence-reading task, participants produced different prosodic patterns for ambiguous words (verb or adjective) in different syntactic structures. Duration, pitch and energy values of the segments around the phonological phrase boundaries were measured and revealed that (i) phonological phrase boundaries were marked by reliable acoustic cues; and (ii) the lexical categories N, V and ADJ behave differently in the prosodic structure.
Figure A: Example of Noun + ambiguous word: Adj [garota MUDA] 'the mute girl' (left) and V [garota] [MUDA…] 'the girl changes…' (right).

In the second experiment, listeners were asked to complete auditory ambiguous sentences that had been cut off immediately after the target word (Eu acho que a menina LIMPA…, 'I think the clean girl…' / 'I think the girl cleans…'). Participants gave more verb responses in the Verb condition and more adjective responses in the Adjective condition. These results suggest that BP adults can use phonological phrase boundary cues to decide whether an ambiguous word is a verb or an adjective and, in turn, to constrain syntactic analysis. We discuss the implications of these results for models of online syntactic analysis and language acquisition.

Figure B: Experiment 2. Mean number of adjective and verb responses given to adjective and verb sentences (out of 4 possible responses for each sentence type).
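The abstract reports that duration, pitch and energy were measured around the phonological phrase boundaries but does not describe the procedure. As an illustration only, the sketch below computes those three measures for a word segment and compares a pre-boundary token of a word with a phrase-medial one. The function names, the autocorrelation pitch estimator, the assumed 75 to 400 Hz pitch range and the synthetic test tones are all assumptions of this sketch, not the authors' method.

```python
import math

# Hypothetical helpers for illustration only; not the authors' analysis pipeline.


def duration_s(samples, sr):
    """Duration of a segment in seconds."""
    return len(samples) / sr


def rms_energy_db(samples):
    """RMS energy in dB relative to a full-scale amplitude of 1.0."""
    rms = math.sqrt(sum(x * x for x in samples) / len(samples))
    return 20 * math.log10(rms) if rms > 0 else float("-inf")


def f0_autocorr(samples, sr, f0_min=75.0, f0_max=400.0):
    """Crude F0 estimate in Hz: the lag with the highest (unnormalized)
    autocorrelation inside the assumed pitch range. Returns None if no
    positive peak is found."""
    lag_min = int(sr / f0_max)
    lag_max = min(int(sr / f0_min), len(samples) - 1)
    best_lag, best_r = None, 0.0
    for lag in range(lag_min, lag_max + 1):
        r = sum(samples[i] * samples[i + lag] for i in range(len(samples) - lag))
        if r > best_r:
            best_lag, best_r = lag, r
    return sr / best_lag if best_lag else None


def boundary_cues(pre_boundary, medial, sr):
    """Compare the same word produced before a phrase boundary vs. phrase-medially."""
    f0_pre, f0_med = f0_autocorr(pre_boundary, sr), f0_autocorr(medial, sr)
    return {
        "duration_ratio": duration_s(pre_boundary, sr) / duration_s(medial, sr),
        "energy_diff_db": rms_energy_db(pre_boundary) - rms_energy_db(medial),
        "f0_diff_hz": None if f0_pre is None or f0_med is None else f0_pre - f0_med,
    }


if __name__ == "__main__":
    sr = 8000

    def tone(freq, seconds):
        # Synthetic stand-in for a recorded vowel: a pure sine tone.
        return [math.sin(2 * math.pi * freq * i / sr) for i in range(round(sr * seconds))]

    # Longer, higher "pre-boundary" token vs. shorter "medial" token (made-up values).
    print(boundary_cues(tone(250, 0.15), tone(200, 0.10), sr))
```

With real recordings the words would first have to be segmented (for example from a forced alignment), and a dedicated pitch tracker such as Praat's would normally replace the crude estimator; the pure-tone inputs here only check that the arithmetic behaves as expected.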


2009, Vol 23 (2), pp. 63-76
Author(s):
Silke Paulmann, Sarah Jessen, Sonja A. Kotz

The multimodal nature of human communication is well established. Yet few empirical studies have systematically examined the widely held belief that multimodal perception is facilitated compared with unimodal or bimodal perception. In the current experiment, we first explored the processing of unimodally presented facial expressions. We then presented auditory (prosodic and/or lexical-semantic) information together with the visual information to investigate the processing of bimodal (facial and prosodic cues) and multimodal (facial, lexical, and prosodic cues) human communication. Participants performed an identity identification task while event-related potentials (ERPs) were recorded to examine early processing mechanisms as reflected in the P200 and N300 components. While the former component has repeatedly been linked to the processing of physical stimulus properties, the latter has been linked to more evaluative, “meaning-related” processing. A relationship was found between P200 and N300 amplitude and the number of information channels present: the more channels, the smaller the amplitude. The multimodal condition elicited the smallest P200 and N300 amplitudes, the bimodal condition elicited larger amplitudes in both components, and the unimodal condition elicited the largest. These data suggest that multimodal information induces clear facilitation compared with unimodal or bimodal information. The advantage of multimodal perception, as reflected in the P200 and N300 components, may thus be one of the mechanisms allowing for fast and accurate information processing in human communication.


2009, Vol 35 (2), pp. 132-136
Author(s):
Xiao-Mao LI, Lin-Lin ZHU, Yan-Dong TANG
