Statistical Pronunciation Modeling for Non-Native Speech Processing

This study was designed to determine the nature and occurrence of hesitation phenomena in spontaneous speech of native and non-native speakers, and to determine whether and to what extent the hesitation phenomena normal in spontaneous speech pose perception problems for non-native speakers. A quantitative analysis reveals that hesitation phenomena are ubiquitous in both native and non-native speech production. A qualitative analysis based on a content-processing classification framework reveals the function of hesitations. Hesitations act as overt traces of prospective and retrospective speech-processing tasks which function to forestall errors, and to permit detection and repair of errors once they are committed. Hesitations are quality control devices; native and non-native speakers are highly successful in using them to forestall errors. However, hesitation phenomena clearly pose perception problems for non-native speakers, who show little evidence of recognizing them as such. Like native speakers, non-native speakers produce hesitation phenomena. Unlike native speakers, who edit and filter out the hesitations they hear, non-native speakers attempt to assign meaning to speakers' faulty output or to parenthetical remarks. Hesitations are unpredictable in their frequency or occurrence; failure to provide training in these oral discourse features of connected speech may result in non-native speakers whose speech production vastly outstrips their perception.

When One Person's Mistake Is Another's Standard Usage: The Effect of Foreign Accent on Syntactic Processing

Journal of Cognitive Neuroscience ◽

10.1162/jocn_a_00103 ◽

2012 ◽

Vol 24 (4) ◽

pp. 878-887 ◽

Cited By ~ 102

Author(s):

Adriana Hanulíková ◽

Petra M. van Alphen ◽

Merel M. van Goch ◽

Andrea Weber

Keyword(s):

Speech Processing ◽

Native Speaker ◽

Neural Correlates ◽

Syntactic Processing ◽

Foreign Accent ◽

Gender Agreement ◽

Accented Speech ◽

Grammatical Errors ◽

Integration Problem ◽

Native Speech

How do native listeners process grammatical errors that are frequent in non-native speech? We investigated whether the neural correlates of syntactic processing are modulated by speaker identity. ERPs to gender agreement errors in sentences spoken by a native speaker were compared with the same errors spoken by a non-native speaker. In line with previous research, gender violations in native speech resulted in a P600 effect (larger P600 for violations in comparison with correct sentences), but when the same violations were produced by the non-native speaker with a foreign accent, no P600 effect was observed. Control sentences with semantic violations elicited comparable N400 effects for both the native and the non-native speaker, confirming no general integration problem in foreign-accented speech. The results demonstrate that the P600 is modulated by speaker identity, extending our knowledge about the role of speaker's characteristics on neural correlates of speech processing.

Improving pronunciation modeling for non-native speech recognition

10.21437/interspeech.2008-495 ◽

2008 ◽

Author(s):

Tien-Ping Tan ◽

Laurent Besacier

Keyword(s):

Speech Recognition ◽

Pronunciation Modeling ◽

Native Speech

Non‐native speech processing in children with phonological disorders.

The Journal of the Acoustical Society of America ◽

10.1121/1.4783558 ◽

2009 ◽

Vol 125 (4) ◽

pp. 2532-2532 ◽

Cited By ~ 1

Author(s):

Dongsun Yim

Keyword(s):

Speech Processing ◽

Phonological Disorders ◽

Native Speech

Establishing the fluency gap between native and non-native-speech

Research in Language ◽

10.1515/rela-2015-0021 ◽

2015 ◽

Vol 13 (3) ◽

pp. 230-247 ◽

Cited By ~ 5

Author(s):

Ewa Guz

Keyword(s):

Second Language ◽

Empirical Evidence ◽

Speech Processing ◽

Mother Tongue ◽

Oral Performance ◽

Native Speech ◽

The Individual ◽

Temporal Measures ◽

L1 And L2 ◽

The Relationship

Although various dimensions of speech fluency have so far generated a great deal of research interest, very few accounts have tackled the issue of the relationship between L1 and L2 fluency. Also, little empirical evidence has been provided to support the claim that language users are more fluent in their mother tongue than in a foreign/second language. This study examines the fluency gap between L1 and L2 fluency using a battery of objectively quantifiable temporal measures of speed and breakdown fluency. It also attempts to identify those temporal fluency variables which are affected by the individual way of speaking rather than the degree of automatisation of speech processing and which underlie oral performance both in L1 and L2. The analysis draws on transcriptions of elicited speech samples in L1 (Polish) and L2 (English).
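
The abstract does not list its specific measures, so the following is only a minimal sketch of temporal measures commonly used for speed fluency (speech rate, articulation rate) and breakdown fluency (pause length and frequency). The function name `temporal_fluency`, its parameters, and the 0.25-second silent-pause threshold are assumptions made for illustration, not details taken from the study.

```python
def temporal_fluency(n_syllables, total_seconds, pause_durations, min_pause=0.25):
    """Compute a few common temporal fluency measures for one speech sample.

    n_syllables     -- number of syllables produced in the sample
    total_seconds   -- total sample duration, pauses included
    pause_durations -- durations (in seconds) of all silent intervals
    min_pause       -- assumed threshold below which a silence is not a pause
    """
    if total_seconds <= 0:
        raise ValueError("total_seconds must be positive")

    # Keep only silences long enough to count as pauses.
    pauses = [p for p in pause_durations if p >= min_pause]
    pause_time = sum(pauses)
    phonation_time = total_seconds - pause_time
    if phonation_time <= 0:
        raise ValueError("pauses cannot fill the whole sample")

    return {
        # Speed fluency: syllables per second, with and without pause time.
        "speech_rate": n_syllables / total_seconds,
        "articulation_rate": n_syllables / phonation_time,
        # Breakdown fluency: how long and how often the speaker pauses.
        "mean_pause_duration": pause_time / len(pauses) if pauses else 0.0,
        "pauses_per_minute": len(pauses) * 60.0 / total_seconds,
    }


# Hypothetical one-minute sample; the 0.1 s silence falls below the threshold.
sample = temporal_fluency(120, 60.0, [0.5, 1.0, 0.1, 1.5])
```

Computing the same dictionary for a speaker's L1 and L2 samples would allow a per-measure comparison of the kind the abstract describes, under the assumptions stated above.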

ASR Systems as Models of Phonetic Category Perception in Adults

10.31234/osf.io/57d8x ◽

2017 ◽

Author(s):

Thomas Schatz ◽

Francis Bach ◽

Emmanuel Dupoux

Keyword(s):

Speech Recognition ◽

Speech Perception ◽

Automatic Speech Recognition ◽

Speech Processing ◽

Systematic Investigation ◽

Speech Sounds ◽

Phonetic Category ◽

Phonetic Perception ◽

Human Adults ◽

Native Speech

We test the potential of standard Automatic Speech Recognition (ASR) systems trained on large corpora of continuous speech as quantitative models of human speech processing. In human adults, speech perception is attuned to efficiently process native speech sounds, at the expense of difficulties in processing non-native sounds. We use ABX-discriminability measures to test whether ASR models can account for the patterns of confusion between speech sounds observed in humans. We show that ASR models reproduce some well-documented effects in non-native phonetic perception. Beyond the immediate results, our methodology opens up the possibility of a more systematic investigation of phonetic category perception in humans.
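
As a rough illustration of the idea behind an ABX-discriminability measure, the sketch below scores how often a token X from category A lies closer to another token of A than to a token of B. It assumes each token is a fixed-length feature vector compared with Euclidean distance; the authors' actual pipeline (features, distance, alignment of variable-length tokens) is not described in the abstract, and the name `abx_score` is hypothetical.

```python
from math import dist  # Euclidean distance, Python 3.8+


def abx_score(cat_a, cat_b):
    """Return the fraction of (a, b, x) triplets in which x is closer to a.

    a and x are distinct tokens of category A, b is a token of category B.
    Ties count as half a success. 1.0 means the two categories are perfectly
    discriminable under this distance; 0.5 is chance level.
    """
    successes = 0.0
    n_triplets = 0
    for i, a in enumerate(cat_a):
        for j, x in enumerate(cat_a):
            if i == j:
                continue  # a and x must be different tokens
            d_ax = dist(a, x)
            for b in cat_b:
                d_bx = dist(b, x)
                if d_ax < d_bx:
                    successes += 1.0
                elif d_ax == d_bx:
                    successes += 0.5
                n_triplets += 1
    if n_triplets == 0:
        raise ValueError("need at least two tokens in A and one in B")
    return successes / n_triplets


# Toy 2-D "features": two well separated categories.
well_separated = abx_score([(0.0, 0.0), (0.1, 0.0)], [(5.0, 5.0)])  # 1.0
```

A score near 0.5 for a given sound pair would indicate that the representation confuses the two categories, which is the kind of pattern that can then be compared with human confusion data.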

Pre-fortis shortening in Czech English: A production and reaction-time study

Research in Language ◽

10.1515/rela-2016-0005 ◽

2016 ◽

Vol 14 (1) ◽

pp. 1-14 ◽

Cited By ~ 3

Author(s):

Radek Skarnitzl ◽

Pavel Šturm

Keyword(s):

Reaction Time ◽

Target Word ◽

Speech Processing ◽

Reaction Times ◽

Large Degree ◽

Time Study ◽

Monitoring Task ◽

Accented Speech ◽

Frequency Of Use ◽

Native Speech

This study focuses on the production and perception of English words with a fortis vs. lenis obstruent in the syllable coda. The contrast is mostly cued by the duration of the preceding vowel, which is shorter before fortis than before lenis sounds in native speech. In the first experiment we analyzed the production of 10 Czech speakers of English and compared them to two native controls. The results showed that the Czech speakers did not sufficiently exploit duration to cue the identity of the word-final obstruent. In the second experiment we manipulated C and V durations in target words to transplant the native ratios onto the Czech-accented speech, enhancing the fortis–lenis contrast, and vice versa. 108 listeners took part in a word-monitoring task in which reaction times were measured. The hypothesized advantage to items in which the target word (with a fortis or lenis obstruent) was semantically congruent with the following context was not confirmed, and subsequent analyses showed that the words’ frequency of use and the collocations they enter into strongly affect speech processing and correlate to a large degree with the reaction times.
