Talker and accent familiarity yield advantages for voice identity perception: a voice sorting study

Familiar and unfamiliar voice perception are often understood as being distinct from each other. For identity perception, theoretical work has proposed that listeners use acoustic information in different ways to perceive identity from familiar and unfamiliar voices: Unfamiliar voices are thought to be processed based on close comparisons of acoustic properties, while familiar voices are processed based on diagnostic acoustic features that activate a stored person-specific representation of that voice. To date no empirical study has directly examined whether and how familiar and unfamiliar listeners differ in their use of acoustic information for identity perception. Here, we tested this theoretical claim by linking listeners’ judgements in voice identity tasks to complex acoustic representations—spectral similarity of the heard voice recordings. Participants (N=150) who were either familiar or unfamiliar with a set of voices completed an identity discrimination task (Experiment 1) or an identity sorting task (Experiment 2). In both experiments, identity judgements for familiar and unfamiliar voices alike were guided by spectral similarity: Pairs of recordings with greater acoustic similarity were more likely to be perceived as belonging to the same voice identity. However, while there were no differences in how familiar and unfamiliar listeners used acoustic information for identity discrimination, differences were apparent for identity sorting. Our study therefore challenges proposals that view familiar and unfamiliar voice perception as being at all times distinct and suggests a critical role of the listening situation in which familiar and unfamiliar voices are evaluated.

Download Full-text

Unimodal and cross-modal identity judgements using an audio-visual sorting task: Evidence for independent processing of faces and voices

Memory & Cognition ◽

10.3758/s13421-021-01198-7 ◽

2021 ◽

Author(s):

Nadine Lavan ◽

Harriet M. J. Smith ◽

Carolyn McGettigan

Keyword(s):

Sorting Task ◽

Matching Task ◽

Mutual Benefit ◽

Sources Of Information ◽

Visual Training ◽

Voice Perception ◽

Identity Information ◽

Identity Matching ◽

Unfamiliar Face ◽

Identity Perception

AbstractUnimodal and cross-modal information provided by faces and voices contribute to identity percepts. To examine how these sources of information interact, we devised a novel audio-visual sorting task in which participants were required to group video-only and audio-only clips into two identities. In a series of three experiments, we show that unimodal face and voice sorting were more accurate than cross-modal sorting: While face sorting was consistently most accurate followed by voice sorting, cross-modal sorting was at chancel level or below. In Experiment 1, we compared performance in our novel audio-visual sorting task to a traditional identity matching task, showing that unimodal and cross-modal identity perception were overall moderately more accurate than the traditional identity matching task. In Experiment 2, separating unimodal from cross-modal sorting led to small improvements in accuracy for unimodal sorting, but no change in cross-modal sorting performance. In Experiment 3, we explored the effect of minimal audio-visual training: Participants were shown a clip of the two identities in conversation prior to completing the sorting task. This led to small, nonsignificant improvements in accuracy for unimodal and cross-modal sorting. Our results indicate that unfamiliar face and voice perception operate relatively independently with no evidence of mutual benefit, suggesting that extracting reliable cross-modal identity information is challenging.

Download Full-text

Unimodal and cross-modal identity judgements using an audiovisual sorting task: Evidence for independent processing of faces and voices

10.31234/osf.io/y5zbh ◽

2020 ◽

Author(s):

Nadine Lavan ◽

Harriet M J Smith ◽

Carolyn McGettigan

Keyword(s):

Sorting Task ◽

Matching Task ◽

Mutual Benefit ◽

Sources Of Information ◽

Voice Perception ◽

Identity Matching ◽

Identity Perception

Unimodal and cross-modal information provided by faces and voices can contribute to identity percepts. To examine how these unimodal and cross-modal sources of information interact, we devised a novel audiovisual identity sorting task in which participants were required to group video-only and audio-only clips into two identities. In a series of three experiments, we show that unimodal face and voice sorting were more accurate than cross-modal sorting accuracy: While face sorting was consistently most accurate followed by voice sorting, cross-modal sorting was at chancel-level or below. In Experiment 1, we contextualised performance in our novel audiovisual sorting task by comparing it to a traditional identity matching task. Here we found that unimodal and cross-modal identity perception were more accurate in the matching task. In Experiment 2, we separated unimodal from cross-modal sorting, which led to small improvements in accuracy for unimodal sorting, but no change in cross-modal sorting performance. Finally, in Experiment 3 we explored the effect of minimal audiovisual training: Participants were shown an audiovisual clip of the two identities in conversation prior to completing the sorting task. This minimal training led to small but non-significant improvements in accuracy for both unimodal and cross-modal sorting. Our results indicate that, for unfamiliar people, face and voice perception operate relatively independently with no evidence of mutual benefit. We also show that extracting reliable redundant cross-modal information for identity judgements is challenging.

Download Full-text

Breaking voice identity perception: Expressive voices are more confusable for listeners

Quarterly Journal of Experimental Psychology ◽

10.1177/1747021819836890 ◽

2019 ◽

Vol 72 (9) ◽

pp. 2240-2248 ◽

Cited By ~ 8

Author(s):

Nadine Lavan ◽

Luke FK Burston ◽

Paayal Ladwa ◽

Siobhan E Merriman ◽

Sarah Knight ◽

...

Keyword(s):

The Other ◽

Acoustic Properties ◽

Breaking Bad ◽

Human Voice ◽

Complex Interactions ◽

Flexible Instrument ◽

Vocal Signals ◽

Identity Perception ◽

Tv Show ◽

Identity Processing

The human voice is a highly flexible instrument for self-expression, yet voice identity perception is largely studied using controlled speech recordings. Using two voice-sorting tasks with naturally varying stimuli, we compared the performance of listeners who were familiar and unfamiliar with the TV show Breaking Bad. Listeners organised audio clips of speech with (1) low-expressiveness and (2) high-expressiveness into perceived identities. We predicted that increased expressiveness (e.g., shouting, strained voice) would significantly impair performance. Overall, while unfamiliar listeners were less able to generalise identity across exemplars, the two groups performed equivalently well when telling voices apart when dealing with low-expressiveness stimuli. However, high vocal expressiveness significantly impaired telling apart in both the groups: this led to increased misidentifications, where sounds from one character were assigned to the other. These misidentifications were highly consistent for familiar listeners but less consistent for unfamiliar listeners. Our data suggest that vocal flexibility has powerful effects on identity perception, where changes in the acoustic properties of vocal signals introduced by expressiveness lead to effects apparent in familiar and unfamiliar listeners alike. At the same time, expressiveness appears to have affected other aspects of voice identity processing selectively in one listener group but not the other, thus revealing complex interactions of stimulus properties and listener characteristics (i.e., familiarity) in identity processing.

Download Full-text

How many voices did you hear? Natural variability disrupts identity perception in unfamiliar listeners.

10.31234/osf.io/cye6t ◽

2018 ◽

Cited By ~ 4

Author(s):

Nadine Lavan ◽

Luke Burston ◽

Lucia Garrido

Keyword(s):

Theoretical Models ◽

Natural Variability ◽

Striking Feature ◽

Voice Perception ◽

Single Identity ◽

Identity Perception ◽

Tv Show

Within-person variability is a striking feature of human voices: our voices sound different depending on the context (laughing vs. talking to a child vs. giving a speech). When perceiving speaker identities, listeners therefore need to not only "tell people apart" (perceiving exemplars from two different speakers as separate identities) but also "tell people together" (perceiving different exemplars from the same speaker as a single identity). In the current study, we investigated how such natural within-person variability affects voice identity perception. Using voices from a popular TV show, listeners, who were either familiar or unfamiliar with the show, sorted naturally-varying voice clips from 2 speakers into clusters to represent perceived identities. Across three independent participant samples, unfamiliar listeners perceived more identities than familiar listeners and frequently mistook exemplars from the same speaker to be different identities. These findings point towards a selective failure in "telling people together". Our study highlights within-person variability as a key feature of voices that has striking effects on (unfamiliar) voice identity perception. Our findings not only open up a new line of enquiry in the field of voice perception but also call for a re-evaluation of theoretical models to account for natural variability during identity perception.

Download Full-text

Flexible voices: identity perception from variable vocal signals

10.31234/osf.io/pczvm ◽

2017 ◽

Cited By ~ 1

Author(s):

Nadine Lavan ◽

A Mike Burton ◽

Sophie K Scott ◽

Carolyn McGettigan

Keyword(s):

Essential Feature ◽

Diagnostic Information ◽

Vocal Cues ◽

Vocal Signals ◽

Person Identity ◽

Identity Perception ◽

Identity Processing

Human voices are extremely variable: The same person can sound very different depending on whether they are speaking, laughing, shouting or whispering. In order to successfully recognise someone from their voice, a listener needs to be able to generalise across these different vocal signals ('telling people together'). However, in most studies of voice identity processing to date, the substantial within-person variability has been eliminated through the use of highly controlled stimuli, thus focussing on how we tell people apart. We argue that this obscures our understanding of voice identity processing by controlling away an essential feature of vocal stimuli that may include diagnostic information. In this paper, we propose that we need to extend the focus of voice identity research to account for both 'telling people together' as well as 'telling people apart'. That is, we must account for whether, and to what extent, listeners can overcome within-person variability to obtain a stable percept of person identity from vocal cues. To do this, our theoretical and methodological frameworks need to be adjusted to explicitly include the study of within-person variability.

Download Full-text

Facial expression processing in developmental prosopagnosia

10.26686/wgtn.17068415 ◽

2021 ◽

Author(s):

◽

Lauren Clare Bell

Keyword(s):

Facial Expression ◽

Expression Recognition ◽

Facial Identity ◽

Video Clips ◽

Core Deficit ◽

Identity Perception ◽

Expression Processing ◽

Sequential Matching ◽

Identity Processing ◽

Inversion Effects

<p>Individuals with developmental prosopagnosia experience lifelong deficits recognising facial identity, but whether their ability to process facial expression is also impaired is unclear. Addressing this issue is key for understanding the core deficit in developmental prosopagnosia, and for advancing knowledge about the mechanisms and development of normal face processing. In this thesis, I report two online studies on facial expression processing with large samples of prosopagnosics. In Study 1, I compared facial expression and facial identity perception in 124 prosopagnosics and 133 controls. I used three perceptual tasks including simultaneous matching, sequential matching, and sorting. I also measured inversion effects to examine whether prosopagnosics rely on typical face mechanisms. Prosopagnosics showed subtle deficits with facial expression, but they performed worse with facial identity. Prosopagnosics also showed reduced inversion effects for facial identity but normal inversion effects for facial expression, suggesting they use atypical mechanisms for facial identity but normal mechanisms for facial expression. In Study 2, I extended the findings of Study 1 by assessing facial expression recognition in 78 prosopagnosics and 138 controls. I used four labelling tasks that varied on whether the facial expressions were basic (e.g., happy) or complex (e.g., elated), and whether they were displayed via static (i.e., images) or dynamic (i.e., video clips) stimuli. Prosopagnosics showed subtle deficits with basic expressions but performed normally with complex expressions. Further, prosopagnosics did not show reduced inversion effects for both types of expressions, suggesting they use similar recognition mechanisms as controls. Critically, the subtle expression deficits that prosopagnosics showed in both studies can be accounted for by autism traits, suggesting that expression deficits are not a feature of prosopagnosia per se. I also provide estimates of the prevalence of deficits in facial expression perception (7.70%) and recognition (2.56% - 5.13%) in prosopagnosia, both of which suggest that facial expression processing is normal in the majority of prosopagnosics. Overall, my thesis demonstrates that facial expression processing is not impaired in developmental prosopagnosia, and suggests that facial expression and facial identity processing rely on separate mechanisms that dissociate in development.</p>

Download Full-text

Toward a unified account of person perception from familiar and unfamiliar voices

10.31234/osf.io/shxa6 ◽

2019 ◽

Author(s):

Nadine Lavan ◽

Carolyn McGettigan

Keyword(s):

Person Perception ◽

Voice Recognition ◽

Familiar Voice ◽

Person Characteristics ◽

The Rich ◽

Identity Perception ◽

First Time ◽

Voice Processing ◽

Identity Processing ◽

Specific Identity

When we hear a voice, we instantly form rich impressions of the person it belongs to – whether we are familiar with this voice or whether we are hearing it for the first time. Despite the rich impressions we can form of both familiar and unfamiliar voices, current models of voice processing primarily focus on familiar voice identity perception only and do not explicitly account for the processing of unfamiliar voices. Where unfamiliar identity processing is described, it tends to be in the context of specific identity perception tasks, such that the extant literature is largely built on a distinction between familiar voice recognition and unfamiliar voice discrimination. We argue that the current focus of the literature is too narrow in its strong emphasis on identity-specific perception, and does not adequately reflect person perception from voices beyond experimental tasks. Here, we propose a broader, unified account of person perception from both familiar and unfamiliar voices. We suggest that listeners routinely perceive all person characteristics from voices via common recognition processes, based on representations – be those of a specific identity, speaker sex, accent, or a perceived personality trait. While explicit discrimination processes may still be used to disambiguate percepts, they are likely to play a smaller role in perception in naturalistic settings. We offer discussions of how this representation-centred person perception from voices may work, in terms of the nature of representations, their specificity and interactions of different kinds of representation.

Download Full-text

Facial expression processing in developmental prosopagnosia

10.26686/wgtn.17068415.v1 ◽

2021 ◽

Author(s):

◽

Lauren Clare Bell

Keyword(s):

Facial Expression ◽

Expression Recognition ◽

Facial Identity ◽

Video Clips ◽

Core Deficit ◽

Identity Perception ◽

Expression Processing ◽

Sequential Matching ◽

Identity Processing ◽

Inversion Effects

<p>Individuals with developmental prosopagnosia experience lifelong deficits recognising facial identity, but whether their ability to process facial expression is also impaired is unclear. Addressing this issue is key for understanding the core deficit in developmental prosopagnosia, and for advancing knowledge about the mechanisms and development of normal face processing. In this thesis, I report two online studies on facial expression processing with large samples of prosopagnosics. In Study 1, I compared facial expression and facial identity perception in 124 prosopagnosics and 133 controls. I used three perceptual tasks including simultaneous matching, sequential matching, and sorting. I also measured inversion effects to examine whether prosopagnosics rely on typical face mechanisms. Prosopagnosics showed subtle deficits with facial expression, but they performed worse with facial identity. Prosopagnosics also showed reduced inversion effects for facial identity but normal inversion effects for facial expression, suggesting they use atypical mechanisms for facial identity but normal mechanisms for facial expression. In Study 2, I extended the findings of Study 1 by assessing facial expression recognition in 78 prosopagnosics and 138 controls. I used four labelling tasks that varied on whether the facial expressions were basic (e.g., happy) or complex (e.g., elated), and whether they were displayed via static (i.e., images) or dynamic (i.e., video clips) stimuli. Prosopagnosics showed subtle deficits with basic expressions but performed normally with complex expressions. Further, prosopagnosics did not show reduced inversion effects for both types of expressions, suggesting they use similar recognition mechanisms as controls. Critically, the subtle expression deficits that prosopagnosics showed in both studies can be accounted for by autism traits, suggesting that expression deficits are not a feature of prosopagnosia per se. I also provide estimates of the prevalence of deficits in facial expression perception (7.70%) and recognition (2.56% - 5.13%) in prosopagnosia, both of which suggest that facial expression processing is normal in the majority of prosopagnosics. Overall, my thesis demonstrates that facial expression processing is not impaired in developmental prosopagnosia, and suggests that facial expression and facial identity processing rely on separate mechanisms that dissociate in development.</p>

Download Full-text

Comparing unfamiliar voice and face identity perception using identity sorting tasks

Quarterly Journal of Experimental Psychology ◽

10.1177/1747021820938659 ◽

2020 ◽

Vol 73 (10) ◽

pp. 1537-1545 ◽

Cited By ~ 2

Author(s):

Justine Johnson ◽

Carolyn McGettigan ◽

Nadine Lavan

Keyword(s):

Discrimination Performance ◽

Sorting Task ◽

Significant Relationships ◽

Face Matching ◽

Matching Test ◽

Voice Matching ◽

Sorting Tasks ◽

Identity Perception ◽

Study Participants ◽

And Task

Identity sorting tasks, in which participants sort multiple naturally varying stimuli of usually two identities into perceived identities, have recently gained popularity in voice and face processing research. In both modalities, participants who are unfamiliar with the identities tend to perceive multiple stimuli of the same identity as different people and thus fail to “tell people together.” These similarities across modalities suggest that modality-general mechanisms may underpin sorting behaviour. In this study, participants completed a voice sorting and a face sorting task. Taking an individual differences approach, we asked whether participants’ performance on voice and face sorting of unfamiliar identities is correlated. Participants additionally completed a voice discrimination (Bangor Voice Matching Test) and a face discrimination task (Glasgow Face Matching Test). Using these tasks, we tested whether performance on sorting related to explicit identity discrimination. Performance on voice sorting and face sorting tasks was correlated, suggesting that common modality-general processes underpin these tasks. However, no significant correlations were found between sorting and discrimination performance, with the exception of significant relationships for performance on “same identity” trials with “telling people together” for voices and faces. Overall, any reported relationships were however relatively weak, suggesting the presence of additional modality-specific and task-specific processes.

Download Full-text