Modeling the Development of Pronunciation in Infant Speech Acquisition

Motor Control, 2011, Vol 15 (1), pp. 85-117
Author(s): Ian S. Howard, Piers Messum

Pronunciation is an important part of speech acquisition, but little attention has been given to the mechanism or mechanisms by which it develops. Speech sound qualities, for example, have typically been assumed to develop by simple imitation, and in most accounts this imitation is assumed to work by acoustic matching, with the infant comparing his output to that of his caregiver. There are theoretical and empirical problems with both of these assumptions, and we present a computational model, Elija, that does not learn to pronounce speech sounds this way. Elija starts by exploring the sound-making capabilities of his vocal apparatus. He then uses the natural responses he gets from a caregiver to learn equivalence relations between his vocal actions and his caregiver's speech. We show that Elija progresses from a babbling stage to learning the names of objects, which demonstrates the viability of a non-imitative mechanism in learning to pronounce.
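To make the non-imitative mechanism concrete, here is a minimal sketch in Python of the kind of equivalence-learning loop the abstract describes. It is not the authors' implementation of Elija; the class, the caregiver callable, and all names are hypothetical stand-ins.

```python
import random
from collections import defaultdict

class NonImitativeLearner:
    """Learns action-word equivalences from caregiver responses, never by
    acoustically matching its own output against the caregiver's."""

    def __init__(self, motor_actions):
        self.motor_actions = motor_actions      # articulatory gestures the learner can perform
        self.equivalences = defaultdict(list)   # caregiver token -> actions that evoked it

    def babble(self, caregiver, n_trials=1000):
        """Exploration stage: try actions and record the caregiver's reformulations."""
        for _ in range(n_trials):
            action = random.choice(self.motor_actions)
            token = caregiver(action)           # e.g. caregiver responds "ba"
            if token is not None:
                self.equivalences[token].append(action)

    def say(self, caregiver_word):
        """Production: reuse an action the caregiver previously treated as this word."""
        actions = self.equivalences.get(caregiver_word)
        return random.choice(actions) if actions else None
```

The design point the sketch isolates is that the learner stores which caregiver response each of its own motor actions reliably evokes, so production never requires comparing its own acoustics to the caregiver's.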

1946, Vol 11 (1), pp. 2-2

In the article “Infant Speech Sounds and Intelligence” by Orvis C. Irwin and Han Piao Chen, in the December 1945 issue of the Journal, the paragraph which begins at the bottom of the left-hand column on page 295 should have been placed immediately below the first paragraph at the top of the right-hand column on page 296. To the authors we express our sincere apologies.


2018, Vol 15 (2), pp. 104-110
Author(s): Shohei Kato, Akira Homma, Takuto Sakuma

Objective: This study presents a novel approach for early detection of cognitive impairment in the elderly. The approach incorporates speech sound analysis, multivariate statistics, and data-mining techniques. We have developed a speech prosody-based cognitive impairment rating (SPCIR) that can distinguish between cognitively normal controls and elderly people with mild Alzheimer's disease (mAD) or mild cognitive impairment (MCI), using prosodic signals extracted from elderly speech while administering a questionnaire. Two hundred and seventy-three Japanese subjects (73 males and 200 females between the ages of 65 and 96) participated in this study. The authors collected speech sounds from segments of dialogue during a revised Hasegawa's dementia scale (HDS-R) examination and from conversation about topics related to hometown, childhood, and school. The segments correspond to speech sounds from answers to questions regarding birthdate (T1), the name of the subject's elementary school (T2), time orientation (Q2), and repetition of three-digit numbers backward (Q6). As many prosodic features as possible were extracted from each of the speech sounds, including fundamental frequency, formant, and intensity features and mel-frequency cepstral coefficients. These features were refined using principal component analysis and/or feature selection, and an SPCIR was calculated using multiple linear regression analysis. Conclusion: This study also proposes a binary discrimination model of SPCIR using multivariate logistic regression and model selection with receiver operating characteristic (ROC) curve analysis, and reports the sensitivity and specificity of SPCIR for diagnosis (control vs. MCI/mAD). The reported discriminative performance is good, suggesting that the proposed approach may be an effective tool for screening the elderly for mAD and MCI.
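A hypothetical sketch of such an analysis pipeline, using scikit-learn; the actual feature set, regression target, and model-selection procedure are the authors'. Here X is an (n_subjects, n_features) prosodic-feature matrix, y_score a clinical rating to regress the SPCIR against, and y_label codes control (0) vs. MCI/mAD (1); all of these names are assumptions for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def fit_spcir(X, y_score):
    """Refine prosodic features with PCA, then fit an SPCIR by linear regression."""
    model = make_pipeline(StandardScaler(),
                          PCA(n_components=0.95),   # keep components explaining 95% of variance
                          LinearRegression())
    return model.fit(X, y_score)

def discriminate(spcir, y_label):
    """Binary control vs. MCI/mAD model on SPCIR, evaluated by ROC analysis."""
    spcir = np.asarray(spcir).reshape(-1, 1)
    clf = LogisticRegression().fit(spcir, y_label)
    prob = clf.predict_proba(spcir)[:, 1]
    fpr, tpr, _ = roc_curve(y_label, prob)
    j = int(np.argmax(tpr - fpr))                   # Youden's J picks an operating point
    sensitivity, specificity = tpr[j], 1.0 - fpr[j]
    return clf, roc_auc_score(y_label, prob), sensitivity, specificity
```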


1974, Vol 17 (3), pp. 352-366
Author(s): Lorraine M. Monnin, Dorothy A. Huntington

Normal-speaking and speech-defective children were compared on a speech-sound identification task which included sounds the speech-defective subjects misarticulated and sounds they articulated correctly. The identification task included four tests: [r]-[w] contrasts, acoustically similar contrasts, acoustically dissimilar contrasts, and vowel contrasts. The speech sounds were presented on a continuum from undistorted to severely distorted signals, under conditions which have caused confusion among adults. Subjects included 15 normal-speaking kindergarten children, 15 kindergarten children with defective [r]s, and 15 preschool-age children. The procedure was designed to test each sound under study in depth and to minimize extraneous variables. The speech-sound identification deficit of the speech-defective subjects was found to be specific rather than general, indicating a positive relationship between production and identification ability.


Author(s): Aidan Kehoe, Flaithri Neff, Ian Pitt

There are numerous challenges to accessing user assistance information in mobile and ubiquitous computing scenarios. For example, there may be little or no display real estate on which to present information visually, the user's eyes may be busy with another task (e.g., driving), and it can be difficult to read text while moving. Speech, together with non-speech sounds and haptic feedback, can be used to make assistance information available to users in these situations. Non-speech sounds and haptic feedback can cue information that is about to be presented via speech, ensuring that the listener is prepared and that leading words are not missed. In this chapter, we report on two studies that examine user perception of the duration of the pause between a cue (a non-speech sound, a haptic effect, or a combined non-speech sound plus haptic effect) and the subsequent delivery of assistance information using speech. Based on these user studies, we recommend cue-pause intervals in the range of 600 ms to 800 ms, as sketched below.
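A minimal sketch of the recommended sequencing; only the pause interval comes from the studies above, while play_cue and speak are hypothetical stand-ins for a platform's non-speech-audio/haptic and text-to-speech APIs.

```python
import time

CUE_PAUSE_S = 0.7   # within the recommended 600-800 ms range

def present_assistance(play_cue, speak, message):
    """Cue the listener, pause so leading words are not missed, then speak."""
    play_cue()                # non-speech sound, haptic effect, or both
    time.sleep(CUE_PAUSE_S)   # cue-to-speech pause interval
    speak(message)
```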


1990, Vol 55 (4), pp. 779-798
Author(s): Ann Bosma Smit, Linda Hand, J. Joseph Freilinger, John E. Bernthal, Ann Bird

The purpose of the Iowa Articulation Norms Project and its Nebraska replication was to provide normative information about speech sound acquisition in these two states. An assessment instrument consisting of photographs and a checklist form for narrow phonetic transcription was administered by school-based speech-language pathologists to stratified samples of children in the age range 3–9 years. The resulting data were not influenced by the demographic variables of population density (rural/urban), SES (based on parental education), or state of residence (Iowa/Nebraska); however, sex of the child exerted a significant influence in some of the preschool age groups. The criteria used to determine acceptability of a production appeared to influence outcomes for some speech sounds. Acquisition curves were plotted for individual phoneme targets or groups of targets. These curves were used to develop recommended ages of acquisition for the tested speech sounds, with recommendations based generally on a 90% level of acquisition. Special considerations were required for the phonemes /n s z/.
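A hypothetical sketch of how a recommended age of acquisition can be read off an acquisition curve under the 90% criterion described above; the curve values below are invented for illustration, not the Iowa-Nebraska norms.

```python
def age_of_acquisition(curve, criterion=0.90):
    """curve: (age_in_years, proportion_acceptable) pairs sorted by age.
    Returns the youngest tested age meeting the criterion, else None."""
    for age, proportion in curve:
        if proportion >= criterion:
            return age
    return None  # criterion not reached within the ages tested

# Invented acquisition curve for a single phoneme target:
example_curve = [(3, 0.31), (4, 0.52), (5, 0.68), (6, 0.78), (7, 0.86), (8, 0.92), (9, 0.95)]
print(age_of_acquisition(example_curve))  # -> 8
```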


2011, Vol 23 (4), pp. 1003-1014
Author(s): Ying Huang, Jingyu Li, Xuefei Zou, Tianshu Qu, Xihong Wu, et al.

To discriminate and recognize sound sources in a noisy, reverberant environment, listeners need to perceptually integrate the direct wave of each sound source with its reflections. When the delay between a direct sound wave and its reflected wave is sufficiently short, the two waves are perceptually fused into a single sound image heard as coming from the source location, and this fusion has been confirmed to help listeners recognize a speech sound in a simulated reverberant environment with disrupting sound sources. Interestingly, compared with nonspeech sounds such as clicks and noise bursts, speech sounds have a much larger perceptual fusion tendency. This study investigated why the fusion tendency for speech sounds is so large. Here we show that when the temporal amplitude fluctuation of speech was artificially time reversed, the large perceptual fusion tendency of speech sounds disappeared, regardless of whether the speech acoustic carrier was in normal or reversed temporal order. Moreover, perceptual fusion of normal-order speech, but not that of time-reversed speech, was accompanied by increased coactivation of attention-control-related, spatial-processing-related, and speech-processing-related cortical areas. Thus, speech-like acoustic carriers modulated by speech amplitude fluctuation selectively activate a cortical network for top-down modulations of speech processing, leading to an enhancement of perceptual fusion of speech sounds. This mechanism represents a perceptual-grouping strategy for unmasking speech under adverse conditions.
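A sketch of the envelope manipulation the abstract describes, under the assumption that the temporal amplitude fluctuation is extracted with a Hilbert transform; the authors' exact signal-processing chain may differ. The two functions cover the two stimulus conditions mentioned: reversed envelope on a normal carrier, and normal envelope on a reversed carrier.

```python
import numpy as np
from scipy.signal import hilbert

def reversed_envelope(signal, eps=1e-12):
    """Normal-order carrier with a time-reversed amplitude envelope."""
    envelope = np.abs(hilbert(signal))      # temporal amplitude fluctuation
    carrier = signal / (envelope + eps)     # fine structure, envelope removed
    return carrier * envelope[::-1]

def reversed_carrier(signal, eps=1e-12):
    """Time-reversed carrier with the normal-order amplitude envelope."""
    envelope = np.abs(hilbert(signal))
    carrier = signal / (envelope + eps)
    return carrier[::-1] * envelope
```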


Infant Speech Sounds and Intelligence

1945, Vol 10 (4), pp. 293-296
Author(s): Orvis C. Irwin, Han Piao Chen