Statistical Pronunciation Modeling for Non-Native Speech Processing

Author(s):  
Rainer E. Gruhn ◽  
Wolfgang Minker ◽  
Satoshi Nakamura


2000 ◽  
Vol 182 (3) ◽  
pp. 72-97 ◽  
Author(s):  
Marnie Reed

This study was designed to determine the nature and occurrence of hesitation phenomena in spontaneous speech of native and non-native speakers, and to determine whether and to what extent the hesitation phenomena normal in spontaneous speech pose perception problems for non-native speakers. A quantitative analysis reveals that hesitation phenomena are ubiquitous in both native and non-native speech production. A qualitative analysis based on a content-processing classification framework reveals the function of hesitations. Hesitations act as overt traces of prospective and retrospective speech-processing tasks which function to forestall errors, and to permit detection and repair of errors once they are committed. Hesitations are quality control devices; native and non-native speakers are highly successful in using them to forestall errors. However, hesitation phenomena clearly pose perception problems for non-native speakers, who show little evidence of recognizing them as such. Like native speakers, non-native speakers produce hesitation phenomena. Unlike native speakers, who edit and filter out the hesitations they hear, non-native speakers attempt to assign meaning to speakers' faulty output or to parenthetical remarks. Hesitations are unpredictable in their frequency or occurrence; failure to provide training in these oral discourse features of connected speech may result in non-native speakers whose speech production vastly outstrips their perception.


2012 ◽  
Vol 24 (4) ◽  
pp. 878-887 ◽  
Author(s):  
Adriana Hanulíková ◽  
Petra M. van Alphen ◽  
Merel M. van Goch ◽  
Andrea Weber

How do native listeners process grammatical errors that are frequent in non-native speech? We investigated whether the neural correlates of syntactic processing are modulated by speaker identity. ERPs to gender agreement errors in sentences spoken by a native speaker were compared with the same errors spoken by a non-native speaker. In line with previous research, gender violations in native speech resulted in a P600 effect (larger P600 for violations in comparison with correct sentences), but when the same violations were produced by the non-native speaker with a foreign accent, no P600 effect was observed. Control sentences with semantic violations elicited comparable N400 effects for both the native and the non-native speaker, confirming no general integration problem in foreign-accented speech. The results demonstrate that the P600 is modulated by speaker identity, extending our knowledge about the role of speaker characteristics in the neural correlates of speech processing.


2015 ◽  
Vol 13 (3) ◽  
pp. 230-247 ◽  
Author(s):  
Ewa Guz

Although various dimensions of speech fluency have so far generated a great deal of research interest, very few accounts have tackled the issue of the relationship between L1 and L2 fluency. Also, little empirical evidence has been provided to support the claim that language users are more fluent in their mother tongue than in a foreign/second language. This study examines the gap between L1 and L2 fluency using a battery of objectively quantifiable temporal measures of speed and breakdown fluency. It also attempts to identify those temporal fluency variables which are affected by the individual way of speaking rather than the degree of automatisation of speech processing and which underlie oral performance both in L1 and L2. The analysis draws on transcriptions of elicited speech samples in L1 (Polish) and L2 (English).


2017 ◽  
Author(s):  
Thomas Schatz ◽  
Francis Bach ◽  
Emmanuel Dupoux

We test the potential of standard Automatic Speech Recognition (ASR) systems trained on large corpora of continuous speech as quantitative models of human speech processing. In human adults, speech perception is attuned to efficiently process native speech sounds, at the expense of difficulties in processing non-native sounds. We use ABX-discriminability measures to test whether ASR models can account for the patterns of confusion between speech sounds observed in humans. We show that ASR models reproduce some well-documented effects in non-native phonetic perception. Beyond the immediate results, our methodology opens up the possibility of a more systematic investigation of phonetic category perception in humans.


2016 ◽  
Vol 14 (1) ◽  
pp. 1-14 ◽  
Author(s):  
Radek Skarnitzl ◽  
Pavel Šturm

This study focuses on the production and perception of English words with a fortis vs. lenis obstruent in the syllable coda. The contrast is mostly cued by the duration of the preceding vowel, which is shorter before fortis than before lenis sounds in native speech. In the first experiment we analyzed the production of 10 Czech speakers of English and compared them to two native controls. The results showed that the Czech speakers did not sufficiently exploit duration to cue the identity of the word-final obstruent. In the second experiment we manipulated C and V durations in target words to transplant the native ratios onto the Czech-accented speech, enhancing the fortis–lenis contrast, and vice versa. A total of 108 listeners took part in a word-monitoring task in which reaction times were measured. The hypothesized advantage for items in which the target word (with a fortis or lenis obstruent) was semantically congruent with the following context was not confirmed, and subsequent analyses showed that the words’ frequency of use and the collocations they enter into strongly affect speech processing and correlate to a large degree with the reaction times.

