Acoustic and auditory phonetics: the adaptive design of speech sound systems

2007 ◽  
Vol 363 (1493) ◽  
pp. 965-978 ◽  
Author(s):  
Randy L Diehl

Speech perception is remarkably robust. This paper examines how acoustic and auditory properties of vowels and consonants help to ensure intelligibility. First, the source–filter theory of speech production is briefly described, and the relationship between vocal-tract properties and formant patterns is demonstrated for some commonly occurring vowels. Next, two accounts of the structure of preferred sound inventories, quantal theory and dispersion theory, are described and some of their limitations are noted. Finally, it is suggested that certain aspects of quantal and dispersion theories can be unified in a principled way so as to achieve reasonable predictive accuracy.
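
The link between vocal-tract properties and formant patterns mentioned above can be illustrated with the standard textbook idealization of a neutral vowel: a uniform tube closed at the glottis and open at the lips, whose resonances fall at odd multiples of c/4L. The sketch below is an illustration under that assumption, not the paper's own demonstration; the function name, the 17.5 cm tract length, and the rounded speed of sound are chosen here for convenience.

```python
# Assumption: the vocal tract is a uniform tube, closed at the glottis and
# open at the lips (a quarter-wavelength resonator). Real vowels deviate from this.
SPEED_OF_SOUND_CM_S = 35000.0  # approximate speed of sound in warm, moist air


def tube_formants(length_cm, count=3):
    """Return the first `count` resonances (Hz) of a uniform closed-open tube.

    F_n = (2n - 1) * c / (4 * L)
    """
    return [
        (2 * n - 1) * SPEED_OF_SOUND_CM_S / (4.0 * length_cm)
        for n in range(1, count + 1)
    ]


# Hypothetical 17.5 cm tract, a commonly quoted adult length.
formants = tube_formants(17.5)
print(formants)
```

Shortening the tube raises every resonance proportionally, which is one simple way to see why shorter vocal tracts tend to yield higher formants.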

2021 ◽  
Vol 15 ◽  
Author(s):  
Hung-Shao Cheng ◽  
Caroline A. Niziolek ◽  
Adam Buchwald ◽  
Tara McAllister

Several studies have demonstrated that individuals’ ability to perceive a speech sound contrast is related to the production of that contrast in their native language. The theoretical account for this relationship is that speech perception and production have a shared multimodal representation in relevant sensory spaces (e.g., auditory and somatosensory domains). This gives rise to a prediction that individuals with more narrowly defined targets will produce greater separation between contrasting sounds, as well as lower variability in the production of each sound. However, empirical studies that tested this hypothesis, particularly with regard to variability, have reported mixed outcomes. The current study investigates the relationship between perceptual ability and production ability, focusing on the auditory domain. We examined whether individuals’ categorical labeling consistency for the American English /ɛ/–/æ/ contrast, measured using a perceptual identification task, is related to the distance between the centroids of vowel categories in acoustic space (i.e., vowel contrast distance) and to two measures of production variability: the overall distribution of repeated tokens for the vowels (i.e., area of the ellipse) and the proportional within-trial decrease in variability, defined as the magnitude of self-correction relative to the initial acoustic variation of each token (i.e., centering ratio). No significant associations were found between categorical labeling consistency and vowel contrast distance, between categorical labeling consistency and area of the ellipse, or between categorical labeling consistency and centering ratio. These null results suggest that the perception-production relation may not be as robust as suggested by a widely adopted theoretical framing in terms of the size of auditory target regions. However, the present results may also be attributable to choices in implementation (e.g., the use of model talkers instead of continua derived from the participants’ own productions) that should be subject to further investigation.
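
Two of the production measures named in this abstract can be sketched in a few lines. The code below is an assumption-laden illustration, not the authors' analysis pipeline: it treats each token as an (F1, F2) pair, takes vowel contrast distance as the Euclidean distance between category centroids, and takes the ellipse area as the area of a 95% coverage ellipse from the sample covariance. The function names, the coverage level, and the token values are hypothetical. Centering ratio is left out because it requires within-trial measurements that the abstract does not specify in detail.

```python
import math
from statistics import mean


def centroid(tokens):
    """Mean (F1, F2) of a list of (F1, F2) tokens."""
    return (mean(t[0] for t in tokens), mean(t[1] for t in tokens))


def contrast_distance(tokens_a, tokens_b):
    """Euclidean distance between the centroids of two vowel categories."""
    ca, cb = centroid(tokens_a), centroid(tokens_b)
    return math.hypot(ca[0] - cb[0], ca[1] - cb[1])


def ellipse_area(tokens, coverage=0.95):
    """Area of the covariance ellipse covering `coverage` of a bivariate normal.

    Assumption: area = pi * k * sqrt(det(S)), where S is the sample covariance
    and k = -2 * ln(1 - coverage) is the chi-square quantile for 2 degrees of freedom.
    """
    n = len(tokens)
    c1, c2 = centroid(tokens)
    s11 = sum((f1 - c1) ** 2 for f1, _ in tokens) / (n - 1)
    s22 = sum((f2 - c2) ** 2 for _, f2 in tokens) / (n - 1)
    s12 = sum((f1 - c1) * (f2 - c2) for f1, f2 in tokens) / (n - 1)
    det = s11 * s22 - s12 ** 2
    scale = -2.0 * math.log(1.0 - coverage)
    return math.pi * scale * math.sqrt(max(det, 0.0))


# Made-up formant values (Hz), for illustration only.
eh_tokens = [(600, 1800), (620, 1750), (580, 1850), (610, 1790)]
ae_tokens = [(750, 1700), (770, 1650), (730, 1720), (760, 1690)]
print(contrast_distance(eh_tokens, ae_tokens))
print(ellipse_area(eh_tokens), ellipse_area(ae_tokens))
```

Under the prediction described in the abstract, talkers with more consistent labeling would show a larger distance and smaller areas; the study reports no such association.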


1982 ◽  
Vol 3 (3) ◽  
pp. 243-261 ◽  
Author(s):  
Amy Sheldon ◽  
Winifred Strange

This study examines the relationship between the production and perception of English /r/ and /l/ by native Japanese adults learning English in the United States. For some subjects, production of the contrast was more accurate than their perception of it, replicating and extending a previous finding reported by Goto (1971) in Japan. The difficulty in perception of the liquid contrast varied with its position in the word. Prevocalic /r/ and /l/ in consonant clusters yielded the greatest perceptual errors, while word-final liquids were accurately perceived. This pattern of errors is not predictable on the basis of contrastive phonological analysis, but might be the result of acoustic-phonetic factors. Implications for second language pedagogy are discussed.


2014 ◽  
Vol 24 (1) ◽  
pp. 7-20
Author(s):  
Brad H. Story

Models that take the form of artificial talkers and speech synthesis systems have long been used as a means of understanding both speech production and speech perception. The article begins with a brief history of two artificial speaking devices that exemplify the representation of speech production as a system of modulations. The development of a recent airway modulation model is then described that simulates the time-varying changes of the vocal tract and acoustic wave propagation. The result is a type of artificial talker that can be used to study various aspects of how sound is generated by humans and how that sound is perceived by a listener.


1989 ◽  
Vol 69 (2) ◽  
pp. 435-441 ◽  
Author(s):  
Linda I. Shuster ◽  
Robert Allen Fox

This study investigated the relationship between speech perception and speech production. An experimental technique called motor-motor adaptation was devised. Subjects produced a speech token repeatedly (20 to 40 repetitions), then produced a second token once. These tokens all contained stop consonants and were subsequently analyzed for voice onset time. The results paralleled previous findings obtained with the perceptuomotor adaptation procedure. The present study supports the notion of a perception-production link.


Author(s):  
Larissa Cristina Berti ◽  
Mayara Ferreira de Assis ◽  
Elissa Cremasco ◽  
Ana Cláudia Vieira Cardoso

Author(s):  
Isao Tokuda

In the source–filter theory, the mechanism of speech production is described as a two-stage process: (a) The airflow coming from the lungs induces tissue vibrations of the vocal folds (i.e., two small muscular folds located in the larynx) and generates the “source” sound. Turbulent airflows are also created at the glottis or in the vocal tract to generate noisy sound sources. (b) The spectral structures of these source sounds are shaped by the vocal tract “filter.” Through the filtering process, frequency components corresponding to the vocal tract resonances are amplified, while the other frequency components are attenuated. The source sound mainly determines the vocal pitch (i.e., fundamental frequency), while the filter forms the timbre. The source–filter theory provides a very accurate description of normal speech production and has been applied successfully to speech analysis, synthesis, and processing. Separate control of the source (phonation) and the filter (articulation) is advantageous for acoustic communication, especially for human language, which requires the expression of various phonemes realized by flexible maneuvering of the vocal tract configuration. Based on this idea, articulatory phonetics focuses on the positions of the vocal organs to describe the produced speech sounds. The source–filter theory also elucidates the mechanism of “resonance tuning,” a specialized singing technique. To increase the efficiency of vocalization, soprano singers adjust the vocal tract filter to tune one of the resonances to the vocal pitch. Consequently, the main source sound is strongly amplified to produce a loud voice, which is well perceived in a large concert hall over the orchestra. It should be noted that the source–filter theory assumes that the source and the filter are independent of each other. Under certain conditions, however, the two interact: the source sound is influenced by the vocal tract geometry and by the acoustic feedback from the vocal tract. Such source–filter interaction induces various voice instabilities, for example, sudden pitch jumps, subharmonics, resonance, quenching, and chaos.
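
The two-stage description above can be made concrete with a very small synthesis sketch. It is a simplified illustration, assuming an impulse train as the voiced source and a cascade of two-pole resonators as the vocal tract filter, with source and filter fully independent (so none of the interaction effects mentioned at the end of the abstract appear). The function names, sampling rate, and formant values are choices made here, not taken from the article.

```python
import math


def impulse_train(f0_hz, duration_s, fs):
    """Source: one unit impulse per glottal cycle (a crude stand-in for voicing)."""
    n = int(duration_s * fs)
    period = int(round(fs / f0_hz))
    return [1.0 if i % period == 0 else 0.0 for i in range(n)]


def resonator(signal, freq_hz, bandwidth_hz, fs):
    """Filter: two-pole resonator centered near freq_hz, unity gain at 0 Hz."""
    r = math.exp(-math.pi * bandwidth_hz / fs)
    a1 = 2.0 * r * math.cos(2.0 * math.pi * freq_hz / fs)
    a2 = -r * r
    gain = 1.0 - a1 - a2
    y1 = y2 = 0.0
    out = []
    for x in signal:
        y = gain * x + a1 * y1 + a2 * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def synthesize(f0_hz, formants, fs=16000, duration_s=0.5):
    """Pass the source through one resonator per (frequency, bandwidth) pair."""
    signal = impulse_train(f0_hz, duration_s, fs)
    for freq_hz, bandwidth_hz in formants:
        signal = resonator(signal, freq_hz, bandwidth_hz, fs)
    return signal


# Hypothetical neutral-vowel resonances (Hz) with assumed bandwidths.
vowel = synthesize(100, [(500, 80), (1500, 90), (2500, 120)])
```

Changing `f0_hz` alters the pitch without moving the resonances, and changing the formant list alters the timbre without touching the pitch, which is the separation of phonation and articulation that the abstract describes.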

