The neural control of volitional vocal production—from speech to identity, from social meaning to song

The networks of cortical and subcortical fields that contribute to speech production have benefitted from many years of detailed study, and have been used as a framework for human volitional vocal production more generally. In this article, I will argue that we need to consider speech production as an expression of the human voice in a more general sense. I will also argue that the neural control of the voice can and should be considered to be a flexible system, into which more right hemispheric networks are differentially recruited, based on the factors that are modulating vocal production. I will explore how this flexible network is recruited to express aspects of non-verbal information in the voice, such as identity and social traits. Finally, I will argue that we need to widen out the kinds of vocal behaviours that we explore, if we want to understand the neural underpinnings of the true range of sound-making capabilities of the human voice. This article is part of the theme issue ‘Voice modulation: from origin and mechanism to social impact (Part II)’.

Vocal modulation in human mating and competition

Philosophical Transactions of the Royal Society B Biological Sciences ◽

10.1098/rstb.2020.0388 ◽

2021 ◽

Vol 376 (1840) ◽

Cited By ~ 1

Author(s):

Susan M. Hughes ◽

David A. Puts

Keyword(s):

Social Interactions ◽

Social Impact ◽

Social Contexts ◽

Future Studies ◽

Human Voice ◽

Vocal Behaviour ◽

Courtship Success ◽

Human Mating ◽

Voice Modulation ◽

Romantic Interest

The human voice is dynamic, and people modulate their voices across different social interactions. This article presents a review of the literature examining natural vocal modulation in social contexts relevant to human mating and intrasexual competition. Altering acoustic parameters during speech, particularly pitch, in response to mating and competitive contexts can influence social perception and indicate certain qualities of the speaker. For instance, a lowered voice pitch is often used to exert dominance, display status and compete with rivals. Changes in voice can also serve as a salient medium for signalling a person's attraction to another, and there is evidence to support the notion that attraction and/or romantic interest can be distinguished through vocal tones alone. Individuals can purposely change their vocal behaviour in an attempt to sound more attractive and to facilitate courtship success. Several findings also point to the effectiveness of vocal change as a mechanism for communicating relationship status. As future studies continue to explore vocal modulation in the arena of human mating, we will gain a better understanding of how and why vocal modulation varies across social contexts and its impact on receiver psychology. This article is part of the theme issue ‘Voice modulation: from origin and mechanism to social impact (Part I)’.

Voice Pathology Identification using Deep Neural Networks

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.d5316.118419 ◽

2019 ◽

Vol 8 (4) ◽

pp. 7447-7450

Keyword(s):

Deep Neural Networks ◽

External Factors ◽

Mel Frequency Cepstral Coefficients ◽

Vocal Cords ◽

Voice Change ◽

Human Voice ◽

Quality Of Voice ◽

The Voice ◽

Voice Modulation

The human voice production system is a complex biological mechanism capable of changing pitch and volume. Internal or external factors frequently damage the vocal cords, changing the quality of the voice or altering voice modulation. These effects are reflected in the expression of speech and in how well listeners understand what the speaker says. It is therefore important to examine the problem at the early stages of voice change so that it can be addressed. Machine learning (ML) plays a major role in identifying whether a voice is pathological or normal. Voice features are extracted using the Mel-frequency cepstral coefficients (MFCC) method and passed to a convolutional neural network (CNN) to identify the category of voice.
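
To make the MFCC step in this abstract concrete, here is a minimal sketch of computing MFCCs for a single audio frame using only the Python standard library. It is an illustration under stated assumptions, not the authors' implementation: the abstract gives no parameters, so the frame length (256 samples), sample rate (8000 Hz), number of mel filters (12), number of coefficients (6), the Hamming window and the synthetic test tones are all hypothetical choices. The CNN classification stage is not shown.

```python
import math

def hz_to_mel(f_hz):
    # Common mel-scale formula (an assumed choice; variants exist).
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def power_spectrum(frame):
    """Power spectrum (bins 0..N/2) of a Hamming-windowed frame via a naive DFT."""
    n = len(frame)
    windowed = [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        re = sum(x * math.cos(2 * math.pi * k * i / n) for i, x in enumerate(windowed))
        im = -sum(x * math.sin(2 * math.pi * k * i / n) for i, x in enumerate(windowed))
        spectrum.append((re * re + im * im) / n)
    return spectrum

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters whose edges are evenly spaced on the mel scale."""
    max_mel = hz_to_mel(sample_rate / 2)
    mel_points = [max_mel * i / (n_filters + 1) for i in range(n_filters + 2)]
    bins = [int(math.floor((n_fft + 1) * mel_to_hz(m) / sample_rate)) for m in mel_points]
    bank = []
    for j in range(1, n_filters + 1):
        left, centre, right = bins[j - 1], bins[j], bins[j + 1]
        row = [0.0] * (n_fft // 2 + 1)
        for k in range(left, centre):      # rising edge of the triangle
            row[k] = (k - left) / (centre - left)
        for k in range(centre, right):     # falling edge of the triangle
            row[k] = (right - k) / (right - centre)
        bank.append(row)
    return bank

def mfcc_frame(frame, sample_rate, n_filters=12, n_coeffs=6):
    """MFCCs of one frame: power spectrum -> mel filterbank -> log -> DCT-II."""
    spectrum = power_spectrum(frame)
    bank = mel_filterbank(n_filters, len(frame), sample_rate)
    # Floor the energies so the logarithm is always defined.
    log_energies = [math.log(max(sum(w * p for w, p in zip(row, spectrum)), 1e-10))
                    for row in bank]
    return [sum(e * math.cos(math.pi * c * (j + 0.5) / n_filters)
                for j, e in enumerate(log_energies))
            for c in range(n_coeffs)]

def tone(freq_hz, sample_rate=8000, n_samples=256):
    # Synthetic sine frame standing in for a real voice recording.
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(n_samples)]

coeffs_low = mfcc_frame(tone(200.0), 8000)
coeffs_high = mfcc_frame(tone(1500.0), 8000)
```

In a pipeline like the one described, such coefficients would be computed for many overlapping frames and stacked into a two-dimensional array (coefficients by time) before being passed to the classifier. A real system would normally use an FFT-based library rather than the naive DFT shown here.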

Even violins can cry: specifically vocal emotional behaviours also drive the perception of emotions in non-vocal music

Philosophical Transactions of the Royal Society B Biological Sciences ◽

10.1098/rstb.2020.0396 ◽

2021 ◽

Vol 376 (1840) ◽

Cited By ~ 1

Author(s):

D. Bedoya ◽

P. Arias ◽

L. Rachman ◽

M. Liuni ◽

C. Canonne ◽

...

Keyword(s):

Social Impact ◽

Computational Models ◽

Emotional Responses ◽

Vocal Music ◽

Singing Voice ◽

Theme Issue ◽

Human Voice ◽

Musical Background ◽

Vocal Tremor ◽

Voice Modulation

A wealth of theoretical and empirical arguments have suggested that music triggers emotional responses by resembling the inflections of expressive vocalizations, but have done so using low-level acoustic parameters (pitch, loudness, speed) that, in fact, may not be processed by the listener in reference to human voice. Here, we take the opportunity of the recent availability of computational models that allow the simulation of three specifically vocal emotional behaviours: smiling, vocal tremor and vocal roughness. When applied to musical material, we find that these three acoustic manipulations trigger emotional perceptions that are remarkably similar to those observed on speech and scream sounds, and identical across musician and non-musician listeners. Strikingly, this not only applied to singing voice with and without musical background, but also to purely instrumental material. This article is part of the theme issue ‘Voice modulation: from origin and mechanism to social impact (Part I)’.

The bouba/kiki effect is robust across cultures and writing systems

Philosophical Transactions of the Royal Society B Biological Sciences ◽

10.1098/rstb.2020.0390 ◽

2021 ◽

Vol 377 (1841) ◽

Author(s):

Aleksandra Ćwiek ◽

Susanne Fuchs ◽

Christoph Draxler ◽

Eva Liina Asu ◽

Dan Dediu ◽

...

Keyword(s):

Social Impact ◽

Spoken Language ◽

Writing Systems ◽

Theme Issue ◽

Round Shape ◽

Crossmodal Correspondence ◽

Visual Shape ◽

Visual Properties ◽

The Voice ◽

Voice Modulation

The bouba/kiki effect—the association of the nonce word bouba with a round shape and kiki with a spiky shape—is a type of correspondence between speech sounds and visual properties with potentially deep implications for the evolution of spoken language. However, there is debate over the robustness of the effect across cultures and the influence of orthography. We report an online experiment that tested the bouba/kiki effect across speakers of 25 languages representing nine language families and 10 writing systems. Overall, we found strong evidence for the effect across languages, with bouba eliciting more congruent responses than kiki. Participants who spoke languages with Roman scripts were only marginally more likely to show the effect, and analysis of the orthographic shape of the words in different scripts showed that the effect was no stronger for scripts that use rounder forms for bouba and spikier forms for kiki. These results confirm that the bouba/kiki phenomenon is rooted in crossmodal correspondence between aspects of the voice and visual shape, largely independent of orthography. They provide the strongest demonstration to date that the bouba/kiki effect is robust across cultures and writing systems. This article is part of the theme issue ‘Voice modulation: from origin and mechanism to social impact (Part II)’.

Voice modulatory cues to structure across languages and species

Philosophical Transactions of the Royal Society B Biological Sciences ◽

10.1098/rstb.2020.0393 ◽

2021 ◽

Vol 376 (1840) ◽

Cited By ~ 1

Author(s):

Theresa Matzinger ◽

W. Tecumseh Fitch

Keyword(s):

Social Impact ◽

Vocal Communication ◽

Structural Characteristics ◽

Key Factors ◽

Vocal Production ◽

Cognitive Constraints ◽

Vocal Signals ◽

Efficient Communication ◽

Shared Ancestry ◽

Voice Modulation

Voice modulatory cues such as variations in fundamental frequency, duration and pauses are key factors for structuring vocal signals in human speech and vocal communication in other tetrapods. Voice modulation physiology is highly similar in humans and other tetrapods due to shared ancestry and shared functional pressures for efficient communication. This has led to similarly structured vocalizations across humans and other tetrapods. Nonetheless, in their details, structural characteristics may vary across species and languages. Because data concerning voice modulation in non-human tetrapod vocal production and especially perception are relatively scarce compared to human vocal production and perception, this review focuses on voice modulatory cues used for speech segmentation across human languages, highlighting comparative data where available. Cues that are used similarly across many languages may help indicate which cues may result from physiological or basic cognitive constraints, and which cues may be employed more flexibly and are shaped by cultural evolution. This suggests promising candidates for future investigation of cues to structure in non-human tetrapod vocalizations. This article is part of the theme issue ‘Voice modulation: from origin and mechanism to social impact (Part I)’.

Voice modulation: from origin and mechanism to social impact

Philosophical Transactions of the Royal Society B Biological Sciences ◽

10.1098/rstb.2020.0386 ◽

2021 ◽

Vol 376 (1840) ◽

Author(s):

Juan David Leongómez ◽

Katarzyna Pisanski ◽

David Reby ◽

Disa Sauter ◽

Nadine Lavan ◽

...

Keyword(s):

Social Impact ◽

Theme Issue ◽

Social Functions ◽

Factors Affecting ◽

Evolutionary Origins ◽

Vocal Control ◽

Species Comparisons ◽

Physical Traits ◽

The Voice ◽

Voice Modulation

Research on within-individual modulation of vocal cues is surprisingly scarce outside of human speech. Yet, voice modulation serves diverse functions in human and nonhuman nonverbal communication, from dynamically signalling motivation and emotion, to exaggerating physical traits such as body size and masculinity, to enabling song and musicality. The diversity of anatomical, neural, cognitive and behavioural adaptations necessary for the production and perception of voice modulation makes it a critical target for research on the origins and functions of acoustic communication. This diversity also implicates voice modulation in numerous disciplines and technological applications. In this two-part theme issue comprising 21 articles from leading and emerging international researchers, we highlight the multidisciplinary nature of the voice sciences. Every article addresses at least two, if not several, critical topics: (i) development and mechanisms driving vocal control and modulation; (ii) cultural and other environmental factors affecting voice modulation; (iii) evolutionary origins and adaptive functions of vocal control including cross-species comparisons; (iv) social functions and real-world consequences of voice modulation; and (v) state-of-the-art in multidisciplinary methodologies and technologies in voice modulation research. With this collection of works, we aim to facilitate cross-talk across disciplines to further stimulate the burgeoning field of voice modulation. This article is part of the theme issue ‘Voice modulation: from origin and mechanism to social impact (Part I)’.

The Auto-Tuned Self: Modulating Voice and Gender in Digital Media Ecologies

Camera Obscura Feminism Culture and Media Studies ◽

10.1215/02705346-9052802 ◽

2021 ◽

Vol 36 (2) ◽

pp. 65-97

Author(s):

Lisa Åkervall

Keyword(s):

Digital Media ◽

Medical Systems ◽

Human Voice ◽

Viral Video ◽

Point Of Entry ◽

History Of ◽

And Gender ◽

And Control ◽

The Voice ◽

Voice Modulation

This essay takes the auto-tuned viral video “Can't Hug Every Cat” as a point of entry for a broader analysis of how modulation decisively shapes politics, aesthetics, and gendering in contemporary digital ecologies. It uncovers how the exaggerated exhibitions of feminine vocal modulation in “Can't Hug Every Cat” entangle with generational feminist anxieties over gendered forms of articulation such as “sexy baby voice” and “upspeak.” It argues that the problematic of the modulated voice is both technologically and thematically central to political, technological, aesthetic, and gendered genealogies of media-technical modulation. The modulated voice given such extraordinary staging in “Can't Hug Every Cat” is therefore restored to the longer history of voice modulation, which is itself closely tied to the rise of control societies and digital media. In this perspective, techniques of voice modulation and social modulation are tandem technologies. The voice modulation that has figured prominently in media cultures in recent decades—from the music of Cher to T-Pain and beyond—is not merely a consequence of digital media and control societies but is also integral to their conditions of possibility. In this light, the rise of technologies for the modulation of the human voice since the nineteenth century is intertwined with the rise of new economic, political, and medical systems of control.

Singing and Emotion

The Oxford Handbook of Singing ◽

10.1093/oxfordhb/9780199660773.013.006 ◽

2014 ◽

pp. 296-314

Author(s):

Eduardo Coutinho ◽

Klaus R. Scherer ◽

Nicola Dibben

Keyword(s):

Emotional Expression ◽

Affective State ◽

Acoustic Cues ◽

Singing Voice ◽

Vocal Production ◽

Human Voice ◽

Interdisciplinary Approaches ◽

Artistic Interpretation ◽

Expressed Emotions ◽

The Voice

In this chapter the authors discuss the emotional power of the singing voice. The chapter begins by providing an overview of the process of externalization of emotions by the human voice. Then, the authors discuss some fundamental determinants of emotional expression in singing, namely the ‘emotional script’, the artistic interpretation, and the singer’s affective state. Next, they describe the manner in which expressed emotions are encoded in the voice by singers and recognized by listeners, and compare it with vocal expression in everyday life. Finally, they identify various methodologies that can enhance understanding of the physiology of vocal production and the acoustic cues fundamental to perception and production of expressive sung performance. The authors propose that the knowledge gained from application of these methodologies can inform singing practice, and that interdisciplinary approaches and cooperation are central aspects of a fruitful and sustainable study of the expressive powers of the singing voice.

Emotional authenticity modulates affective and social trait inferences from voices

Philosophical Transactions of the Royal Society B Biological Sciences ◽

10.1098/rstb.2020.0402 ◽

2021 ◽

Vol 376 (1840) ◽

Cited By ~ 1

Author(s):

Ana P. Pinheiro ◽

Andrey Anikin ◽

Tatiana Conde ◽

João Sarzedas ◽

Sinead Chen ◽

...

Keyword(s):

Mixed Models ◽

Social Impact ◽

Communication Studies ◽

Acoustic Features ◽

Spectral Variability ◽

Communicative Acts ◽

Human Voice ◽

Trait Inferences ◽

Voice Modulation ◽

Social Trait

The human voice is a primary tool for verbal and nonverbal communication. Studies on laughter emphasize a distinction between spontaneous laughter, which reflects a genuinely felt emotion, and volitional laughter, associated with more intentional communicative acts. Listeners can reliably differentiate the two. It remains unclear, however, if they can detect authenticity in other vocalizations, and whether authenticity determines the affective and social impressions that we form about others. Here, 137 participants listened to laughs and cries that could be spontaneous or volitional and rated them on authenticity, valence, arousal, trustworthiness and dominance. Bayesian mixed models indicated that listeners detect authenticity similarly well in laughter and crying. Speakers were also perceived to be more trustworthy, and in a higher arousal state, when their laughs and cries were spontaneous. Moreover, spontaneous laughs were evaluated as more positive than volitional ones, and we found that the same acoustic features predicted perceived authenticity and trustworthiness in laughter: high pitch, spectral variability and less voicing. For crying, associations between acoustic features and ratings were less reliable. These findings indicate that emotional authenticity shapes affective and social trait inferences from voices, and that the ability to detect authenticity in vocalizations is not limited to laughter. This article is part of the theme issue ‘Voice modulation: from origin and mechanism to social impact (Part I)’.

Extraordinary voices: Helen Keller, music and the limits of oralism

Journal of Interdisciplinary Voice Studies ◽

10.1386/jivs_00002_1 ◽

2019 ◽

Vol 4 (2) ◽

pp. 139-156

Author(s):

Michael Accinno

Keyword(s):

World View ◽

Musical Culture ◽

Vocal Production ◽

Cultural Logic ◽

Human Voice ◽

Shared Identity ◽

Helen Keller ◽

Voice Teacher ◽

Musical Practices ◽

Opera Singers

This article examines iconic American deafblind writer Helen Keller's entrée into musical culture, culminating in her studies with voice teacher Charles A. White. In 1909, Keller began weekly lessons with White, who deepened her understanding of breathing and vocal production. Keller routinely made the acquaintance of opera singers in the 1910s and the 1920s, including sopranos Georgette Leblanc and Minnie Saltzman-Stevens, and tenor Enrico Caruso. Guided by the cultural logic of oralism, Keller nurtured a lively interest in music throughout her life. Although a voice-centred world-view enhanced Keller's cultural standing among hearing Americans, it did little to promote the growth of a shared identity rooted in deaf or deafblind experience. The subsequent growth of Deaf culture challenges us to reconsider the limits of Keller's musical practices and to question anew her belief in the extraordinary power of the human voice.
