Voice Production and Perception

Author(s):  
Roza G. Kamiloğlu ◽  
Disa A. Sauter

The voice is a prime channel of communication in humans and other animals. Voices convey many kinds of information, including physical characteristics like body size and sex, and provide cues to the vocalizing individual’s identity and emotional state. Vocalizations are produced by dynamic modifications of the physiological vocal production system. The source-filter theory explains how vocalizations are produced in two stages: (a) the production of a sound source in the larynx, and (b) the filtering of that sound by the vocal tract. This two-stage process largely applies to all primate vocalizations. However, there are some differences between the vocal production apparatus of humans and that of nonhuman primates, such as the lower position of the larynx and lack of air sacs in humans. Thanks to our flexible vocal apparatus, humans can produce a range of different types of vocalizations, including spoken language, nonverbal vocalizations, whispering, and singing. A comprehensive understanding of vocal communication takes both production and perception of vocalizations into account. Internal processes are expressed in the form of specific acoustic patterns in the producer’s voice. In order to communicate information in vocalizations, those acoustic patterns must be registered by listeners via auditory perception mechanisms. Both production and perception of vocalizations are affected by psychobiological mechanisms as well as sociocultural factors. Furthermore, vocal production and perception can be impaired by a range of different disorders. Vocal production and hearing disorders, as well as mental disorders including autism spectrum disorder, depression, and schizophrenia, affect vocal communication.
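The two-stage source-filter account described above can be illustrated with a minimal sketch. It is not taken from the chapter: the impulse train is a crude stand-in for the laryngeal source, each two-pole resonator stands in for one vocal tract resonance, and the function names, sample rate, and formant values are all illustrative assumptions.

```python
import math

def impulse_train(f0_hz, duration_s, sample_rate):
    """Stage (a): crude stand-in for the laryngeal source, one pulse per cycle."""
    n = int(duration_s * sample_rate)
    period = int(round(sample_rate / f0_hz))  # samples per vibration cycle
    return [1.0 if i % period == 0 else 0.0 for i in range(n)]

def resonator(signal, center_hz, bandwidth_hz, sample_rate):
    """Stage (b): one two-pole resonance, standing in for a single formant."""
    r = math.exp(-math.pi * bandwidth_hz / sample_rate)  # pole radius (< 1, stable)
    theta = 2.0 * math.pi * center_hz / sample_rate      # pole angle
    a1 = 2.0 * r * math.cos(theta)
    a2 = -r * r
    out = []
    y1 = y2 = 0.0
    for x in signal:
        y = x + a1 * y1 + a2 * y2
        out.append(y)
        y2, y1 = y1, y
    return out

def synthesize(f0_hz, formants, duration_s=0.5, sample_rate=16000):
    """Pass the source through a cascade of resonators (the 'filter')."""
    signal = impulse_train(f0_hz, duration_s, sample_rate)
    for center_hz, bandwidth_hz in formants:
        signal = resonator(signal, center_hz, bandwidth_hz, sample_rate)
    return signal

# Same source, two different filters (formant values are assumed, for illustration).
vowel_a = synthesize(120, [(700, 80), (1200, 90)])
vowel_b = synthesize(120, [(300, 80), (2300, 90)])
```

Keeping the source fixed while swapping the resonator settings mirrors the separation the theory draws between what the larynx produces and how the vocal tract shapes it.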

Author(s):  
Marina Gilman

“Voice” is the complex interaction of the vocal mechanism with the rest of the body used to produce speech and song (melody, rhythm, and language) as well as so-called vegetative sounds, such as coughing, crying, screaming, and so on. Vocal production for speech or singing involves far more than the lungs, larynx, and vocal tract depicted in many vocal pedagogy texts. Balance, as well as increased muscle tension of the neck, shoulders, and torso, can disrupt the finely tuned coordination of respiration, phonation, and resonance necessary for speaking, singing, and swallowing. Understanding of this exquisite mechanism has increased a hundredfold since the middle of the twentieth century due to improved technology and research opportunities in laryngeal imaging and acoustic analysis.


Animals ◽  
2018 ◽  
Vol 8 (10) ◽  
pp. 167 ◽  
Author(s):  
Anton Baotic ◽  
Maxime Garcia ◽  
Markus Boeckle ◽  
Angela Stoeger

African savanna elephants live in dynamic fission–fusion societies and exhibit a sophisticated vocal communication system. Their most frequent call-type is the ‘rumble’, with a fundamental frequency (which refers to the lowest vocal fold vibration rate when producing a vocalization) near or in the infrasonic range. Rumbles are used in a wide variety of behavioral contexts, for short- and long-distance communication, and convey contextual and physical information. For example, maturity (age and size) is encoded in male rumbles by formant frequencies (the resonance frequencies of the vocal tract), which have the most informative power. As sound propagates, however, its spectral and temporal structures degrade progressively. Our study used manipulated and resynthesized male social rumbles to simulate large and small individuals (based on different formant values) to quantify whether this phenotypic information efficiently transmits over long distances. To examine transmission efficiency and the potential influences of ecological factors, we broadcast and re-recorded rumbles at distances of up to 1.5 km in two different habitats at the Addo Elephant National Park, South Africa. Our results show that rumbles were affected by spectral–temporal degradation over distance. Interestingly and unlike previous findings, the transmission of formants was better than that of the fundamental frequency. Our findings demonstrate the importance of formant frequencies for the efficiency of rumble propagation and the transmission of information content in a savanna elephant’s natural habitat.


2020 ◽  
Vol 27 (2) ◽  
pp. 237-265 ◽  
Author(s):  
Roza G. Kamiloğlu ◽  
Agneta H. Fischer ◽  
Disa A. Sauter

Researchers examining nonverbal communication of emotions are becoming increasingly interested in differentiations between different positive emotional states like interest, relief, and pride. But despite the importance of the voice in communicating emotion in general and positive emotion in particular, there is to date no systematic review of what characterizes vocal expressions of different positive emotions. Furthermore, integration and synthesis of current findings are lacking. In this review, we comprehensively review studies (N = 108) investigating acoustic features relating to specific positive emotions in speech prosody and nonverbal vocalizations. We find that happy voices are generally loud with considerable variability in loudness, have high and variable pitch, and are high in the first two formant frequencies. When specific positive emotions are directly compared with each other, pitch mean, loudness mean, and speech rate differ across positive emotions, with patterns mapping onto clusters of emotions, so-called emotion families. For instance, pitch is higher for epistemological emotions (amusement, interest, relief), moderate for savouring emotions (contentment and pleasure), and lower for a prosocial emotion (admiration). Some, but not all, of the differences in acoustic patterns also map on to differences in arousal levels. We end by pointing to limitations in extant work and making concrete proposals for future research on positive emotions in the voice.


Author(s):  
MAK KABOUDAN ◽  
MARK CONOVER

Forecasts of the San Diego and San Francisco S&P/Case-Shiller Home Price Indices through December 2012 are obtained using a multi-agent system that utilizes data from January 2002 to June 2011. Agents employ genetic programming (GP) and neural networks (NN) in a three-stage process to produce fits and forecasts. First, GP and NN compete to provide independent predictions. In the second stage, they cooperate by fitting the first-stage competitor's residuals. Outputs from the first two stages then become inputs to produce two final GP and NN outputs. The third-stage NN output from this combined method produces better forecasts than the three-stage GP output and than either method used alone. The proposed methodology serves as an example of how combining more than one estimation/forecasting technique may lead to more accurate forecasts.
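The first two stages described above (compete, then fit the competitor's residuals) can be sketched on toy data. This is only an illustration of residual fitting, not the authors' system: a least-squares line and a per-cycle-position mean are swapped in for GP and NN, the third stage is left out, and every name and number below is hypothetical.

```python
def fit_line(ts, ys):
    """Least-squares straight line (stand-in learner A); returns t -> prediction."""
    n = len(ts)
    mean_t = sum(ts) / n
    mean_y = sum(ys) / n
    var_t = sum((t - mean_t) ** 2 for t in ts)
    cov = sum((t - mean_t) * (y - mean_y) for t, y in zip(ts, ys))
    slope = cov / var_t if var_t else 0.0
    return lambda t: mean_y + slope * (t - mean_t)

def fit_cycle(ts, ys, period=4):
    """Mean of y at each position of a repeating cycle (stand-in learner B)."""
    buckets = {}
    for t, y in zip(ts, ys):
        buckets.setdefault(t % period, []).append(y)
    means = {k: sum(v) / len(v) for k, v in buckets.items()}
    overall = sum(ys) / len(ys)
    return lambda t: means.get(t % period, overall)

def sse(ts, ys, predict):
    """Sum of squared errors of a predictor on the data."""
    return sum((y - predict(t)) ** 2 for t, y in zip(ts, ys))

def two_stage(ts, ys):
    # Stage 1: both learners fit the series independently (competition).
    line1 = fit_line(ts, ys)
    cycle1 = fit_cycle(ts, ys)
    # Stage 2: each learner fits the residuals left by the other (cooperation).
    line2 = fit_line(ts, [y - cycle1(t) for t, y in zip(ts, ys)])
    cycle2 = fit_cycle(ts, [y - line1(t) for t, y in zip(ts, ys)])
    return {
        "line_only": line1,
        "cycle_only": cycle1,
        "line_plus_cycle": lambda t: line1(t) + cycle2(t),
        "cycle_plus_line": lambda t: cycle1(t) + line2(t),
    }

# Toy series (made up): linear trend plus a repeating four-step pattern.
ts = list(range(24))
ys = [100 + 2 * t + [3, -1, -2, 0][t % 4] for t in ts]
models = two_stage(ts, ys)
```

On this toy series each residual-corrected model fits the data more closely than the corresponding first-stage model, which is the general intuition behind letting one method model what the other missed.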


2005 ◽  
Vol 14 (3) ◽  
pp. 126-130 ◽  
Author(s):  
Klaus Zuberbühler

The anatomy of the nonhuman primate vocal tract is not fundamentally different from the human one. Notwithstanding, nonhuman primates are remarkably unskillful at controlling vocal production and at combining basic call units into more complex strings. Instead, their vocal behavior is linked to specific psychological states, which are evoked by events in their social or physical environment. Humans are the only primates that have evolved the ability to produce elaborate and willfully controlled vocal signals, although this may have been a fairly recent invention. Despite their expressive limitations, nonhuman primates have demonstrated a surprising degree of cognitive complexity when responding to other individuals' vocalizations, suggesting that, as recipients, crucial linguistic abilities are part of primate cognition. Pivotal aspects of language comprehension, particularly the ability to process semantic content, may thus be part of our primate heritage. The strongest evidence currently comes from Old World monkeys, but recent work indicates that these capacities may also be present in our closest relatives, the chimpanzees.


Author(s):  
Gillyanne Kayes

Key structural aspects of the vocal mechanism and the physiology of vocal function are presented and discussed in relation to the singing voice. Details of anatomical structure and physiological function are given for the regions of the vocal tract and respiratory system under the broad headings of respiration, phonation (the larynx), and resonation. Use of voice in singing is examined in terms of breath use, control of pitch and loudness, and shaping of resonance for change of timbre. Key developmental stages during the lifecycle are given, including infancy, childhood, voice mutation in adolescence, and the impact of hormonal change on the voice. Differences between the genders in adulthood are discussed in the light of current research knowledge of voice.


Author(s):  
Johan Sundberg

The function of the voice organ is basically the same in classical singing as in speech. However, loud orchestral accompaniment has necessitated the use of the voice in an economical way. As a consequence, the vowel sounds tend to deviate considerably from those in speech. Male voices cluster formants three, four, and five, so that a marked peak is produced in the spectrum envelope near 3,000 Hz. This helps them to be heard through a loud orchestral accompaniment. They seem to achieve this effect by widening the lower pharynx, which makes the vowels more centralized than in speech. Singers often sing at fundamental frequencies higher than the normal first formant frequency of the vowel in the lyrics. In such cases they raise the first formant frequency so that it gets somewhat higher than the fundamental frequency. This is achieved by reducing the degree of vocal tract constriction or by widening the lip and jaw openings, constricting the vocal tract at the pharyngeal end and widening it in the mouth. These deviations from speech cause difficulties in vowel identification, particularly at high fundamental frequencies. Actually, vowel identification is almost impossible above 700 Hz (pitch F5). Another great difference between vocal sound produced in speech and the classical singing tradition concerns female voices, which need to reduce the timbral differences between voice registers. Females normally speak in modal or chest register, and the transition to falsetto tends to happen somewhere above 350 Hz. The great timbral differences between these registers are avoided by establishing control over the register function, that is, over the vocal fold vibration characteristics, so that seamless transitions are achieved. In many other respects, there are more or less close similarities between speech and singing. Thus, marking phrase structure, emphasizing important events, and emotional coloring are common principles, which may make vocal artists deviate considerably from the score’s nominal description of fundamental frequency and syllable duration.
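The correspondence between frequencies and pitch names quoted above (700 Hz as pitch F5) can be checked with a small conversion. The sketch assumes twelve-tone equal temperament with A4 = 440 Hz, which the abstract does not state, and the function name is hypothetical.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def nearest_note(freq_hz, a4_hz=440.0):
    """Nearest equal-tempered pitch name for a frequency (assumes A4 = a4_hz)."""
    # MIDI note 69 is A4; each semitone is a factor of 2 ** (1 / 12).
    midi = round(69 + 12 * math.log2(freq_hz / a4_hz))
    name = NOTE_NAMES[midi % 12]
    octave = midi // 12 - 1  # MIDI note 60 is C4
    return f"{name}{octave}"
```

Under these assumptions, nearest_note(700) gives "F5", in line with the pitch named in the abstract, and nearest_note(350) gives "F4", one octave lower.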

