Prosodic Cues for Automatic Phrase Boundary Detection in ASR

We investigate the role of phonological phrase boundary cues in syntactic parsing by native adult speakers of Brazilian Portuguese (henceforth, BP). It is assumed that speech is organized in a hierarchy of prosodic constituents that may relate to constituents of other components of grammar (Nespor and Vogel, 1986). Although this is not necessarily a one-to-one relationship, a mapping is possible between the constituents of each component, such as between phonological phrases and certain syntactic units. Whether speakers produce reliable prosodic cues in spontaneous speech is controversial. For instance, Snedeker and Trueswell (2003) propose that only expert speakers produce disambiguating prosodic cues, whereas for Kraljic and Brennan (2005), even naïve speakers produce prosodic cues that are helpful for listeners. Millotte et al. (2007) found that native French speakers produced reliable prosodic cues (phrase-final lengthening and pitch rise) when they read pairs of ambiguous sentences that differed in their prosodic structures. The authors also found that native listeners were able to use these cues to assign the ambiguous words to their correct lexical categories. Thus, phonological phrase boundary cues may help native listeners to correctly analyze ambiguous sentences. Motivated by the results of the French experiments, we conducted two experiments to test the influence of prosody on syntactic analysis by BP adults. In the first experiment, a sentence-reading task, participants produced different prosodic patterns for ambiguous words (verb or adjective) in different syntactic structures. Duration, pitch, and energy values of the segments around the phonological phrase boundaries were measured and revealed that (i) these boundaries were marked by reliable acoustic cues, and (ii) the lexical categories N, V, and ADJ behave differently in the prosodic structure.
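The abstract does not say how duration, pitch, and energy were extracted. As a minimal sketch, assuming a mono signal stored as a NumPy array and hand-labelled segment boundaries in seconds, the three cues for one segment could be computed as below. The function names are hypothetical, and the autocorrelation F0 estimator is a crude stand-in for a dedicated pitch tracker such as Praat's.

```python
import numpy as np


def rms_db(x: np.ndarray) -> float:
    """RMS energy in dB relative to a full-scale amplitude of 1.0."""
    rms = np.sqrt(np.mean(np.square(x)))
    return float(20.0 * np.log10(max(rms, 1e-12)))  # floor avoids log10(0) on silence


def autocorr_f0(x: np.ndarray, sr: int, fmin: float = 75.0, fmax: float = 500.0) -> float:
    """Crude F0 estimate (Hz): the strongest autocorrelation lag between 1/fmax and 1/fmin s.

    Assumes the segment is voiced and spans at least one period at fmin.
    """
    x = x - np.mean(x)
    ac = np.correlate(x, x, mode="full")[len(x) - 1:]  # keep non-negative lags only
    lag_min = int(sr / fmax)
    lag_max = min(int(sr / fmin), len(ac) - 1)
    lag = lag_min + int(np.argmax(ac[lag_min:lag_max + 1]))
    return sr / lag


def segment_cues(signal: np.ndarray, sr: int, start: float, end: float) -> dict:
    """Duration (ms), F0 (Hz), and RMS energy (dB) of the segment [start, end) in seconds."""
    seg = signal[int(round(start * sr)):int(round(end * sr))]
    return {
        "duration_ms": 1000.0 * (end - start),
        "f0_hz": autocorr_f0(seg, sr),
        "energy_db": rms_db(seg),
    }


if __name__ == "__main__":
    # Synthetic check: a 200 Hz tone with amplitude 0.5, sampled at 16 kHz.
    sr = 16_000
    t = np.arange(int(0.5 * sr)) / sr
    tone = 0.5 * np.sin(2 * np.pi * 200.0 * t)
    print(segment_cues(tone, sr, 0.1, 0.3))  # roughly 200 ms, 200 Hz, -9 dB
```

To look for boundary marking, the same cues would be compared for the same segment across conditions, for example the final syllable of the noun in the verb condition, where a boundary follows it, versus the adjective condition, where it does not.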
Figure A: Example of a noun followed by an ambiguous word in the adjective condition, [garota MUDA] 'mute girl' (on the left), and in the verb condition, [garota] [MUDA…] 'girl changes…' (on the right).

In the second experiment, listeners were asked to complete auditory ambiguous sentences that were truncated right after the target word (Eu acho que a menina LIMPA…, 'I think that the clean girl…' / 'I think that the girl cleans…'). Participants gave more verb responses in the verb condition and more adjective responses in the adjective condition. These results suggest that BP adults are able to use phonological phrase boundary cues to decide whether an ambiguous word is a verb or an adjective and, thus, to constrain syntactic analysis. We discuss the implications of these results for models of online syntactic analysis and language acquisition.

Figure B: Experiment 2. Mean number of adjective and verb responses given to adjective and verb sentences (out of 4 possible responses for each sentence type).
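The per-condition means plotted for Experiment 2 can be tabulated as in the sketch below. It assumes each completion is recorded as a hypothetical (participant, condition, response) tuple and that every participant completed all sentences of every condition; the toy data are made up for illustration and are not the study's results.

```python
from collections import Counter
from typing import Iterable, Tuple

Trial = Tuple[str, str, str]  # (participant, condition, response); hypothetical record format


def mean_response_counts(trials: Iterable[Trial],
                         responses: Tuple[str, ...] = ("adjective", "verb")) -> dict:
    """Mean number of each response type per participant, for each condition.

    Assumes a within-subject design in which every participant saw every condition,
    so dividing totals by the number of participants gives a per-participant mean.
    Completions that are neither an adjective nor a verb reading are not counted.
    """
    trials = list(trials)
    participants = {p for p, _, _ in trials}
    conditions = sorted({c for _, c, _ in trials})
    totals = Counter((c, r) for _, c, r in trials)
    n = len(participants)
    return {c: {r: totals[(c, r)] / n for r in responses} for c in conditions}


if __name__ == "__main__":
    # Made-up toy data: 2 participants x 2 conditions x 4 sentences each.
    toy = (
        [("P1", "verb", r) for r in ("verb", "verb", "verb", "adjective")]
        + [("P2", "verb", r) for r in ("verb", "verb", "adjective", "adjective")]
        + [("P1", "adjective", r) for r in ("adjective", "adjective", "adjective", "verb")]
        + [("P2", "adjective", r) for r in ("adjective",) * 4]
    )
    print(mean_response_counts(toy))
    # {'adjective': {'adjective': 3.5, 'verb': 0.5}, 'verb': {'adjective': 1.5, 'verb': 2.5}}
```

Because each participant contributes at most 4 responses per sentence type, every mean lies between 0 and 4, matching the scale described for Figure B.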

On the Comparison of Different Phrase Boundary Detection Approaches Trained on Czech TTS Speech Corpora

Speech and Computer, Lecture Notes in Computer Science, 2018, pp. 255-263
DOI: 10.1007/978-3-319-99579-3_27
Author(s): Markéta Jůzová
Keyword(s): Boundary Detection, Phrase Boundary, Speech Corpora

DNN based phrase boundary detection using knowledge-based features and feature representations from CNN

2021
DOI: 10.1109/ncc52529.2021.9530147
Author(s): Pavan J Kumar, Chiranjeevi Yarra, Prasanta Kumar Ghosh
Keyword(s): Boundary Detection, Phrase Boundary, Feature Representations, Knowledge Based

Prosodic word and phrase boundary detection based on F0 contour analysis using empirical mode decomposition

2013 International Conference Oriental COCOSDA held jointly with 2013 Conference on Asian Spoken Language Research and Evaluation (O-COCOSDA/CASLRE), 2013
DOI: 10.1109/icsda.2013.6709909
Author(s): Sudipta Acharya, Shyamal Kumar Das Mandal
Keyword(s): Empirical Mode Decomposition, Boundary Detection, Contour Analysis, Phrase Boundary, Prosodic Word, Mode Decomposition

A multi-pass linear fold algorithm for sentence boundary detection using prosodic cues

2004 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2004
DOI: 10.1109/icassp.2004.1326038
Cited by ~2
Author(s): Dagen Wang, S.S. Narayanan
Keyword(s): Boundary Detection, Prosodic Cues, Sentence Boundary

Cross-language phrase boundary detection

2013 IEEE International Conference on Acoustics, Speech and Signal Processing, 2013
DOI: 10.1109/icassp.2013.6639316
Cited by ~4
Author(s): Victor Soto, Erica Cooper, Andrew Rosenberg, Julia Hirschberg
Keyword(s): Boundary Detection, Phrase Boundary, Cross Language

Investigating the Multimodal Nature of Human Communication

Journal of Psychophysiology, 2009, Vol 23 (2), pp. 63-76
DOI: 10.1027/0269-8803.23.2.63
Cited by ~41
Author(s): Silke Paulmann, Sarah Jessen, Sonja A. Kotz
Keyword(s): Visual Information, Empirical Studies, Physical Property, Event Related Potentials, Accurate Information, Human Communication, Channel Condition, Prosodic Cues, Early Processing, Related Potentials

The multimodal nature of human communication has been well established. Yet few empirical studies have systematically examined the widely held belief that this form of perception is facilitated in comparison to unimodal or bimodal perception. In the current experiment, we first explored the processing of unimodally presented facial expressions. Furthermore, auditory (prosodic and/or lexical-semantic) information was presented together with the visual information to investigate the processing of bimodal (facial and prosodic cues) and multimodal (facial, lexical, and prosodic cues) human communication. Participants engaged in an identity identification task while event-related potentials (ERPs) were recorded to examine early processing mechanisms as reflected in the P200 and N300 components. While the former component has repeatedly been linked to the processing of physical stimulus properties, the latter has been linked to more evaluative “meaning-related” processing. A systematic relationship was found between P200 and N300 amplitude and the number of information channels present: the multimodal-channel condition elicited the smallest amplitude in the P200 and N300 components, amplitudes in each component increased for the bimodal-channel condition, and the largest amplitude was observed for the unimodal condition. These data suggest that multimodal information induces clear facilitation in comparison to unimodal or bimodal information. The advantage of multimodal perception as reflected in the P200 and N300 components may thus reflect one of the mechanisms that allow fast and accurate information processing in human communication.
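The amplitude comparison across channel conditions can be sketched as averaging single-trial epochs into an ERP and taking the mean amplitude in a latency window. The abstract does not give the time windows, electrodes, or whether peak or mean amplitude was measured; the 150-250 ms window, the function names, and the synthetic data below are assumptions for illustration only.

```python
import numpy as np


def erp(epochs: np.ndarray) -> np.ndarray:
    """Average single-trial epochs of shape (n_trials, n_samples) into one ERP waveform."""
    return np.asarray(epochs, dtype=float).mean(axis=0)


def mean_amplitude(waveform: np.ndarray, times: np.ndarray, t_start: float, t_end: float) -> float:
    """Mean amplitude of a waveform within the latency window [t_start, t_end] (seconds)."""
    mask = (times >= t_start) & (times <= t_end)
    return float(waveform[mask].mean())


def synthetic_epochs(gain: float, times: np.ndarray, n_trials: int,
                     rng: np.random.Generator) -> np.ndarray:
    """Hypothetical epochs: a positive peak at 200 ms scaled by `gain`, plus Gaussian noise."""
    peak = gain * np.exp(-((times - 0.2) ** 2) / (2 * 0.03 ** 2))
    return peak + rng.normal(scale=0.5, size=(n_trials, times.size))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    times = np.arange(-0.1, 0.6, 1 / 500)  # 500 Hz sampling, -100 to 600 ms
    # Made-up gains that merely mimic the reported ordering (unimodal > bimodal > multimodal).
    gains = {"unimodal": 3.0, "bimodal": 2.0, "multimodal": 1.0}
    for name, gain in gains.items():
        wave = erp(synthetic_epochs(gain, times, 40, rng))
        print(name, round(mean_amplitude(wave, times, 0.15, 0.25), 2))
```

Averaging across trials cancels most of the zero-mean noise, so the window means recover the ordering of the underlying peaks; with real recordings the same comparison would be run per participant and electrode before any statistics.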