The Oxford Handbook of Voice Perception
Latest Publications

Total documents: 40 (last five years: 0) · H-index: 1 (last five years: 0)

Published by Oxford University Press · ISBN 9780198743187

Author(s): Natacha Paquette, Emmanuelle Dionne-Dostie, Maryse Lassonde, Anne Gallagher

Human voice perception plays a significant role in day-to-day interactions. However, little is known about how newborns and infants perceive and process this information. Yet the ability to perceive vocal cues is crucial, not only for speech and language learning, but also for the development of key social skills such as perceiving other people’s emotions. It is therefore important to understand how typically developing infants perceive and process this information in the first few months of life. The aim of this chapter is to provide a better understanding of the early development of these abilities, as well as an overview of key recent behavioural and neuroimaging studies in fetuses, newborns, and infants. The chapter describes and discusses how newborns and infants perceive human voices; how they extract and learn social cues from vocalizations; and how they use this information to learn language.


Author(s): Yuanyuan Wang, Derek M. Houston, Amanda Seidl

Language acquisition is a complex process that involves an interaction between learning mechanisms and the input to the child. An important component of infants’ input is infant-directed speech (IDS)—a unique speech register that caregivers use when talking to infants. IDS differs from adult-directed speech (ADS) in a variety of dimensions. This chapter examines empirical research on the acoustic properties of IDS and the role that IDS may play in supporting infant language learning. Taking the discussion of IDS function in language development further, the chapter then considers the mechanisms by which IDS may promote language learning, as well as caregivers’ intentions in using this speech register. Theoretical and practical implications of this body of work are discussed and areas for future research are highlighted.


Author(s): David I. Leitman, Sarah M. Haigh

Communication of social-affective intent through vocal modulation (prosody) has received increasing attention from cognitive neuroscientists. Clinically, dysprosodia is a cardinal feature of schizophrenia and may also be present in bipolar disorder. This chapter summarizes the state of knowledge regarding dysprosodia in schizophrenia and bipolar disorder, examining how it is measured and the neural mechanisms that underlie its disturbance. The authors argue that in schizophrenia, rather than reflecting generalized emotional dysfunction, affective prosody deficits are better explained from an information-theoretic perspective as impaired audio-linguistic signal processing (ALSP), beginning with basic impairments in simple pitch perception that, together with higher-order cognitive impairments, generate dysprosodia. The ALSP model engenders specific theoretical and clinical implications, which the chapter also details. Finally, the chapter outlines the limitations of the ALSP model and of current approaches that examine dysprosodia in single individuals, advocating that future research study prosody from a communicative and linguistic perspective that examines interpersonal communication.


Author(s): Bernd J. Kröger

This chapter outlines a comprehensive neurocomputational model of voice and speech perception based on (i) already established computational models and (ii) neurophysiological data on the underlying neural processes. Neurocomputational models of speech perception comprise auditory as well as cognitive modules, in order to extract sound features as well as linguistic information (linguistic content). A model of voice and speech perception additionally needs to process paralinguistic information, such as the gender, age, and emotional or affective state of the speaker. It is argued here that the modules of a neurocomputational model of voice and speech perception need to interact with modules that go beyond unimodal auditory processing because, for example, the processing of paralinguistic information is closely related to other sensory channels such as visual facial perception. Thus, this chapter describes neural modelling of voice and speech perception in relation to general communication and social-interaction processes, which makes it necessary to develop a hypermodal processing approach.
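To make the notion of interacting modules concrete, the structural sketch below treats each processing stage as a function passing feature dictionaries to the next stage, with an optional visual input merged into paralinguistic processing. The module boundaries, the toy feature values, and the way facial information is integrated are illustrative assumptions only, not the neurocomputational model presented in the chapter.

```python
import numpy as np

def auditory_module(signal, sample_rate):
    """Toy stand-in for auditory processing: extract a few sound features.
    (sample_rate is kept only to show the interface; it is unused here.)"""
    return {
        "energy": float(np.mean(signal ** 2)),
        "zero_crossings": int(np.sum(np.diff(np.sign(signal)) != 0)),
    }

def cognitive_module(sound_features):
    """Toy stand-in for decoding linguistic content from sound features."""
    return {"linguistic_content": "<decoded word sequence>"}

def paralinguistic_module(sound_features, facial_features=None):
    """Toy stand-in for decoding speaker attributes; optionally integrates
    visual facial information (the hypermodal aspect)."""
    cues = dict(sound_features)
    if facial_features is not None:
        cues.update(facial_features)  # cross-modal integration point
    return {"speaker_state": "neutral", "speaker_identity": "unknown"}

def perceive(signal, sample_rate, facial_features=None):
    """Run the auditory module, then the cognitive and paralinguistic modules."""
    sound = auditory_module(signal, sample_rate)
    return {
        **cognitive_module(sound),
        **paralinguistic_module(sound, facial_features),
    }

# Example call with one second of synthetic audio and a hypothetical facial cue.
print(perceive(np.random.randn(16000), 16000, facial_features={"smile": 0.7}))
```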


Author(s): Benjamin Kreifelts, Thomas Ethofer

More often than not, emotion perception is a process guided by several sensory channels, accompanied by multimodal integration of emotional information. This process appears vital for effective social communication. This chapter provides an overview of recent studies describing the crossmodal integration of non-verbal emotional cues communicated via voices, faces, and bodies. The first parts of the chapter deal with the behavioural and neural correlates of multimodal integration in the healthy population using psychophysiological, electrophysiological, and neuroimaging measures, highlighting a network of brain areas involved in this process and discussing different methodological approaches. The final parts of the chapter, in contrast, are dedicated to the alterations of the multisensory integration of non-verbal emotional signals in states of psychiatric disease, with the main focus on schizophrenia and autism spectrum disorders.


Author(s): Samantha Carouso Peck, Michael H. Goldstein

The social environment plays an important role in vocal development. In songbirds, social interactions that promote vocal learning are often characterized by contingent responses of adults to early, immature vocalizations. Parallel processes have been discovered in the early speech development of human infants. Why does contingent social feedback facilitate vocal learning so effectively? Answers may be found by connecting the neural mechanisms of vocal learning and control with those involved in processing social reward. This chapter extends the idea of Newman’s social behaviour network, a tightly interconnected system of limbic areas across which social behaviour and motivation are distributed, to an avian social/vocal control network. It explores anatomical and functional overlaps between song circuitry and social-motivational circuitry, describing how circuitry linking the basal ganglia with cortical areas serves to integrate social reward with vocal control and may underlie socially guided vocal learning. In species that have evolved socially guided vocal learning, a unique link has been forged between social circuitry and vocal learning systems, such that learning is driven by social motivation.


Author(s): Claudia Roswandowitz, Corrina Maguinness, Katharina von Kriegstein

The voice contains elementary social communication cues, conveying speech, as well as paralinguistic information pertaining to the emotional state and the identity of the speaker. In contrast to vocal-speech and vocal-emotion processing, voice-identity processing has been less explored. This seems surprising, given the day-to-day significance of person recognition by voice. A valuable approach to unravel how voice-identity processing is accomplished is to investigate people who have a selective deficit in recognizing voices. Such a deficit has been termed phonagnosia. This chapter provides a systematic overview of studies on phonagnosia and how they relate to current neurocognitive models of person recognition. It reviews studies that have characterized people who suffer from phonagnosia following brain damage (i.e. acquired phonagnosia) and also studies which have examined phonagnosia cases without apparent brain lesion (i.e. developmental phonagnosia). Based on the reviewed literature, the chapter emphasizes the need for a careful behavioural characterization of phonagnosia cases by taking into consideration the multistage nature of voice-identity processing and the resulting behavioural phonagnosia subtypes.


Author(s): Klaus R. Scherer

Starting with evolutionary considerations, this chapter provides a comprehensive overview of vocal emotion communication. On the production/encoding side, the effects of the physiological changes accompanying different emotions are described, highlighting the functional aspects of the acoustic patterns that are determined by emotion-antecedent appraisals and the consequent behavioural tendencies. Special effort is made to examine the stability of the acoustic patterning for different emotions based on the available reports in the literature, with some attention to the underlying phonatory-articulatory mechanisms. A brief excursion examines the acoustic parameters characterizing emotions in singing. Next, the transmission of vocal signals from sender/encoder to receiver/decoder is briefly described, and the literature on emotion inference by receivers (vocal emotion recognition) is reviewed. In this context, the use of path models to investigate the communication process as a whole is discussed and illustrated. The important issue of language and cultural differences in vocal emotion encoding and decoding is considered in the light of recent evidence. A discussion of applied aspects of the acoustic analysis of vocal emotion expression and recognition concludes the chapter, with specific attention paid to the role of voice analysis in clinical diagnostics, for example in the case of depression, and in the detection of stress.


Author(s): Maximilian Schmitt, Björn W. Schuller

Machines are able to obtain rich information from the human voice with a certain degree of reliability. This can comprise information about the speaker’s affective or mental state, but also about stable speaker traits. This chapter introduces the technical steps needed in such intelligent voice analysis. Typically, the first step involves extraction of meaningful acoustic features, which are then transformed into a suitable representation. The acoustic information can be augmented by linguistic features originating from a speech-to-text transcription. The features are finally decoded at different levels using machine-learning methods. Recently, ‘deep learning’ has received growing interest, where deep artificial neural networks are used to decode the information. From this, end-to-end learning has evolved, in which even the feature-extraction step is learned seamlessly, through to the decoding step, mimicking the recognition process in the human brain. Following a description of these and other frequently encountered methods, the chapter concludes with a perspective on future developments.
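The pipeline outlined above (acoustic feature extraction, a fixed-length representation, machine-learning decoding) can be illustrated with a minimal Python sketch. The library choices (librosa, scikit-learn), the mean/standard-deviation functionals, the file names, and the calm/stressed labels are illustrative assumptions only, not the methods or tooling described in the chapter.

```python
import librosa
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def extract_features(wav_path):
    """Extract frame-level MFCCs and summarize them into one utterance-level vector."""
    signal, sr = librosa.load(wav_path, sr=16000)
    mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=13)  # shape: (13, n_frames)
    # Simple functionals (mean and standard deviation over time) give a
    # fixed-length representation of the whole recording.
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

# Hypothetical labelled training recordings.
train_paths = ["calm_01.wav", "stressed_01.wav"]
train_labels = ["calm", "stressed"]

# Decode the affective state with a standard classifier.
X = np.vstack([extract_features(p) for p in train_paths])
clf = make_pipeline(StandardScaler(), SVC())
clf.fit(X, train_labels)

print(clf.predict(extract_features("unknown.wav").reshape(1, -1)))
```

In an end-to-end deep-learning setup, as mentioned above, both extract_features and the classifier would be replaced by a single neural network trained directly on the waveform (or spectrogram), so that the representation itself is learned rather than hand-crafted.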


Author(s): Sarah Stevenage

Are we reliable earwitnesses? Imagine a telephone call in which all you hear is the single word ‘Hello … ?’. From that single word you can tell a considerable amount about the caller. You can tell their gender and their likely age range. You can detect a rising intonation that indicates uncertainty or a question. You can discern their accent and thus perhaps their nationality. Last, but most importantly, you can perhaps tell their identity. Voice perception is a highly sophisticated ability, drawing on numerous characteristics of the voice signal and yielding numerous judgements or estimations about the speaker. This chapter focuses on the reliability of those voice-perception abilities, with the express purpose of establishing our capacity as earwitnesses. Across the chapter, the latest scientific findings are reviewed, allowing identification of the decisions that earwitnesses will be good at, and those that may be less reliable.

