Talker and accent familiarity yield advantages for voice identity perception: a voice sorting study

2021
Author(s):
Njie,
Nadine Lavan,
Carolyn McGettigan

Familiarity benefits in voice identity perception have been frequently described in the literature. Typically, studies have contrasted listeners who were either familiar or unfamiliar with the target voices, thus manipulating talker familiarity. In these studies, familiarity with a voice results in more accurate voice identity perception. Such talker familiarity is, however, only one way in which listeners can be familiar with the stimuli used: Another type of familiarity that has been shown to benefit voice identity perception is language or accent familiarity. In the current study, we examine and compare the effects of talker and accent familiarity in the context of a voice identity sorting task, using naturally varying voice recordings from the TV show “Derry Girls”. All voice samples were thus spoken in a regional accent of UK/Irish English (Northern Irish). We tested four listener groups: Listeners were either familiar or unfamiliar with the TV show (and therefore the talker identities) and were either highly familiar or relatively less familiar with the accent. We find that both talker and accent familiarity significantly improve the accuracy of voice identity perception. However, the effect sizes for talker familiarity are overall larger. We discuss our findings in light of existing models of voice perception, arguing that they provide evidence for interactions of speech and identity processing pathways in voice perception. We conclude that voice perception is a highly interactive process, during which listeners make use of any available information to achieve their perceptual goals.
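
As an illustration of how such a sorting task can be scored, here is a minimal sketch in Python; the clip labels, the participant's groupings, and the use of the adjusted Rand index are illustrative assumptions rather than the study's reported procedure.

```python
# A minimal sketch of scoring a two-identity voice sort (hypothetical data).
from sklearn.metrics import adjusted_rand_score

# Ground truth: which of the two talkers produced each clip.
true_speakers = ["A", "A", "A", "B", "B", "B", "A", "B"]

# One participant's sort: the perceived-identity cluster for each clip.
# Using more than two clusters reflects a failure to "tell people together".
participant_sort = [1, 1, 2, 3, 3, 3, 2, 3]

# Chance-corrected agreement between the sort and the true identities.
ari = adjusted_rand_score(true_speakers, participant_sort)
n_perceived = len(set(participant_sort))

print(f"Adjusted Rand index: {ari:.2f}")
print(f"Perceived identities: {n_perceived} (true number: 2)")
```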

2020
Author(s):
Jens Kreitewolf,
Nadine Lavan,
Jonas Obleser,
Carolyn McGettigan

Familiar and unfamiliar voice perception are often understood as being distinct from each other. For identity perception, theoretical work has proposed that listeners use acoustic information in different ways to perceive identity from familiar and unfamiliar voices: Unfamiliar voices are thought to be processed based on close comparisons of acoustic properties, while familiar voices are processed based on diagnostic acoustic features that activate a stored person-specific representation of that voice. To date, no empirical study has directly examined whether and how familiar and unfamiliar listeners differ in their use of acoustic information for identity perception. Here, we tested this theoretical claim by linking listeners’ judgements in voice identity tasks to a complex acoustic representation: the spectral similarity of the heard voice recordings. Participants (N = 150) who were either familiar or unfamiliar with a set of voices completed an identity discrimination task (Experiment 1) or an identity sorting task (Experiment 2). In both experiments, identity judgements for familiar and unfamiliar voices alike were guided by spectral similarity: Pairs of recordings with greater acoustic similarity were more likely to be perceived as belonging to the same voice identity. However, while there were no differences in how familiar and unfamiliar listeners used acoustic information for identity discrimination, differences were apparent for identity sorting. Our study therefore challenges proposals that view familiar and unfamiliar voice perception as distinct at all times and suggests a critical role for the listening situation in which familiar and unfamiliar voices are evaluated.
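
As an illustration of the general approach, the following is a minimal sketch of one way to quantify the spectral similarity of two recordings; the mel-spectrogram representation, the file names, and the cosine-similarity measure are assumptions for illustration and may differ from the exact acoustic representation used in the study.

```python
# A minimal sketch: spectral similarity of two voice recordings.
import numpy as np
import librosa

def mean_spectrum(path, sr=16000, n_mels=64):
    """Time-averaged log-mel spectrum of a recording."""
    y, sr = librosa.load(path, sr=sr)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    return np.log(mel + 1e-8).mean(axis=1)

def spectral_similarity(path_a, path_b):
    """Cosine similarity of the two mean spectra (higher = more similar)."""
    a, b = mean_spectrum(path_a), mean_spectrum(path_b)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical usage: pairs with higher similarity should be judged
# "same identity" more often.
# sim = spectral_similarity("clip_01.wav", "clip_02.wav")
```

Trial-level similarities computed in this way could then be entered as a predictor of same/different identity judgements, for example in a mixed-effects logistic regression.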


Author(s):
Nadine Lavan,
Harriet M. J. Smith,
Carolyn McGettigan

Unimodal and cross-modal information provided by faces and voices contribute to identity percepts. To examine how these sources of information interact, we devised a novel audio-visual sorting task in which participants were required to group video-only and audio-only clips into two identities. In a series of three experiments, we show that unimodal face and voice sorting were more accurate than cross-modal sorting: While face sorting was consistently most accurate, followed by voice sorting, cross-modal sorting was at chance level or below. In Experiment 1, we compared performance in our novel audio-visual sorting task to a traditional identity matching task, showing that unimodal and cross-modal identity perception were overall more accurate in the matching task. In Experiment 2, separating unimodal from cross-modal sorting led to small improvements in accuracy for unimodal sorting, but no change in cross-modal sorting performance. In Experiment 3, we explored the effect of minimal audio-visual training: Participants were shown a clip of the two identities in conversation prior to completing the sorting task. This led to small, nonsignificant improvements in accuracy for unimodal and cross-modal sorting. Our results indicate that unfamiliar face and voice perception operate relatively independently with no evidence of mutual benefit, suggesting that extracting reliable cross-modal identity information is challenging.


2020
Author(s):
Nadine Lavan,
Harriet M. J. Smith,
Carolyn McGettigan

Unimodal and cross-modal information provided by faces and voices can contribute to identity percepts. To examine how these unimodal and cross-modal sources of information interact, we devised a novel audiovisual identity sorting task in which participants were required to group video-only and audio-only clips into two identities. In a series of three experiments, we show that unimodal face and voice sorting were more accurate than cross-modal sorting: While face sorting was consistently most accurate, followed by voice sorting, cross-modal sorting was at chance level or below. In Experiment 1, we contextualised performance in our novel audiovisual sorting task by comparing it to a traditional identity matching task. Here we found that unimodal and cross-modal identity perception were more accurate in the matching task. In Experiment 2, we separated unimodal from cross-modal sorting, which led to small improvements in accuracy for unimodal sorting, but no change in cross-modal sorting performance. Finally, in Experiment 3 we explored the effect of minimal audiovisual training: Participants were shown an audiovisual clip of the two identities in conversation prior to completing the sorting task. This minimal training led to small but non-significant improvements in accuracy for both unimodal and cross-modal sorting. Our results indicate that, for unfamiliar people, face and voice perception operate relatively independently, with no evidence of mutual benefit. We also show that extracting reliable redundant cross-modal information for identity judgements is challenging.
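
To give a concrete sense of what “chance level” means in a two-identity sorting task, here is a minimal simulation sketch; the pairwise definition of accuracy and the design values (clip counts, maximum number of clusters) are illustrative assumptions rather than the study's reported analysis.

```python
# A minimal sketch: estimating the chance baseline for a two-identity sort.
import random
from itertools import combinations

def pairwise_accuracy(true_ids, sort):
    """Proportion of clip pairs whose same/different grouping matches truth."""
    pairs = list(combinations(range(len(true_ids)), 2))
    correct = sum((true_ids[i] == true_ids[j]) == (sort[i] == sort[j])
                  for i, j in pairs)
    return correct / len(pairs)

true_ids = ["A"] * 8 + ["B"] * 8   # e.g., 8 clips per identity (assumed)
# Simulate participants who sort clips into up to 4 clusters at random.
sims = [pairwise_accuracy(true_ids, [random.randrange(4) for _ in true_ids])
        for _ in range(10_000)]
print(f"Chance-level pairwise accuracy: {sum(sims) / len(sims):.3f}")
```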


2019
Vol 72 (9)
pp. 2240-2248
Author(s):
Nadine Lavan,
Luke F. K. Burston,
Paayal Ladwa,
Siobhan E. Merriman,
Sarah Knight,
...

The human voice is a highly flexible instrument for self-expression, yet voice identity perception is largely studied using controlled speech recordings. Using two voice-sorting tasks with naturally varying stimuli, we compared the performance of listeners who were familiar and unfamiliar with the TV show Breaking Bad. Listeners organised audio clips of (1) low-expressiveness and (2) high-expressiveness speech into perceived identities. We predicted that increased expressiveness (e.g., shouting, strained voice) would significantly impair performance. Overall, while unfamiliar listeners were less able to generalise identity across exemplars, the two groups performed equivalently well at telling voices apart for low-expressiveness stimuli. However, high vocal expressiveness significantly impaired telling apart in both groups: this led to increased misidentifications, where sounds from one character were assigned to the other. These misidentifications were highly consistent for familiar listeners but less consistent for unfamiliar listeners. Our data suggest that vocal flexibility has powerful effects on identity perception, where changes in the acoustic properties of vocal signals introduced by expressiveness lead to effects apparent in familiar and unfamiliar listeners alike. At the same time, expressiveness appears to have affected other aspects of voice identity processing selectively in one listener group but not the other, thus revealing complex interactions of stimulus properties and listener characteristics (i.e., familiarity) in identity processing.


2018
Author(s):
Nadine Lavan,
Luke Burston,
Lucia Garrido

Within-person variability is a striking feature of human voices: our voices sound different depending on the context (laughing vs. talking to a child vs. giving a speech). When perceiving speaker identities, listeners therefore need to not only "tell people apart" (perceiving exemplars from two different speakers as separate identities) but also "tell people together" (perceiving different exemplars from the same speaker as a single identity). In the current study, we investigated how such natural within-person variability affects voice identity perception. Listeners who were either familiar or unfamiliar with a popular TV show sorted naturally varying voice clips from two of its speakers into clusters representing perceived identities. Across three independent participant samples, unfamiliar listeners perceived more identities than familiar listeners and frequently mistook exemplars from the same speaker for different identities. These findings point towards a selective failure in "telling people together". Our study highlights within-person variability as a key feature of voices that has striking effects on (unfamiliar) voice identity perception. Our findings not only open up a new line of enquiry in the field of voice perception but also call for a re-evaluation of theoretical models to account for natural variability during identity perception.
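
A minimal sketch of how the two error types can be separated in a two-speaker sort, following the distinction drawn above; the clip labels and groupings are hypothetical.

```python
# A minimal sketch: counting "telling apart" vs "telling together" errors.
from itertools import combinations

def sorting_errors(true_ids, sort):
    """Count the two error types across all clip pairs."""
    confusions = 0  # different speakers grouped together ("telling apart" failure)
    splits = 0      # same speaker split across clusters ("telling together" failure)
    for i, j in combinations(range(len(true_ids)), 2):
        same_truth = true_ids[i] == true_ids[j]
        same_sort = sort[i] == sort[j]
        if same_truth and not same_sort:
            splits += 1
        elif not same_truth and same_sort:
            confusions += 1
    return confusions, splits

true_ids = ["A", "A", "A", "B", "B", "B"]
sort = [1, 1, 2, 3, 3, 2]
conf, spl = sorting_errors(true_ids, sort)
print(f"'Telling apart' errors: {conf}, 'telling together' errors: {spl}")
```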


2017
Author(s):
Nadine Lavan,
A. Mike Burton,
Sophie K. Scott,
Carolyn McGettigan

Human voices are extremely variable: The same person can sound very different depending on whether they are speaking, laughing, shouting or whispering. In order to successfully recognise someone from their voice, a listener needs to be able to generalise across these different vocal signals ('telling people together'). However, in most studies of voice identity processing to date, the substantial within-person variability has been eliminated through the use of highly controlled stimuli, thus focussing on how we tell people apart. We argue that this obscures our understanding of voice identity processing by controlling away an essential feature of vocal stimuli that may include diagnostic information. In this paper, we propose that we need to extend the focus of voice identity research to account for both 'telling people together' as well as 'telling people apart'. That is, we must account for whether, and to what extent, listeners can overcome within-person variability to obtain a stable percept of person identity from vocal cues. To do this, our theoretical and methodological frameworks need to be adjusted to explicitly include the study of within-person variability.


2021
Author(s):
Lauren Clare Bell

Individuals with developmental prosopagnosia experience lifelong deficits recognising facial identity, but whether their ability to process facial expression is also impaired is unclear. Addressing this issue is key for understanding the core deficit in developmental prosopagnosia, and for advancing knowledge about the mechanisms and development of normal face processing. In this thesis, I report two online studies on facial expression processing with large samples of prosopagnosics. In Study 1, I compared facial expression and facial identity perception in 124 prosopagnosics and 133 controls. I used three perceptual tasks: simultaneous matching, sequential matching, and sorting. I also measured inversion effects to examine whether prosopagnosics rely on typical face mechanisms. Prosopagnosics showed subtle deficits with facial expression, but they performed worse with facial identity. Prosopagnosics also showed reduced inversion effects for facial identity but normal inversion effects for facial expression, suggesting they use atypical mechanisms for facial identity but normal mechanisms for facial expression. In Study 2, I extended the findings of Study 1 by assessing facial expression recognition in 78 prosopagnosics and 138 controls. I used four labelling tasks that varied on whether the facial expressions were basic (e.g., happy) or complex (e.g., elated), and whether they were displayed via static (i.e., images) or dynamic (i.e., video clips) stimuli. Prosopagnosics showed subtle deficits with basic expressions but performed normally with complex expressions. Further, prosopagnosics did not show reduced inversion effects for either type of expression, suggesting they use similar recognition mechanisms as controls. Critically, the subtle expression deficits that prosopagnosics showed in both studies can be accounted for by autism traits, suggesting that expression deficits are not a feature of prosopagnosia per se. I also provide estimates of the prevalence of deficits in facial expression perception (7.70%) and recognition (2.56%–5.13%) in prosopagnosia, both of which suggest that facial expression processing is normal in the majority of prosopagnosics. Overall, my thesis demonstrates that facial expression processing is not impaired in developmental prosopagnosia, and suggests that facial expression and facial identity processing rely on separate mechanisms that dissociate in development.
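
As an illustration, prevalence estimates of this kind are commonly derived by counting cases that fall below a cutoff defined from the control distribution; the sketch below assumes a 2-standard-deviation cutoff and simulated scores, not the thesis's exact procedure.

```python
# A minimal sketch: prevalence of impairment relative to a control-based
# cutoff, using simulated (hypothetical) scores.
import numpy as np

rng = np.random.default_rng(0)
controls = rng.normal(loc=75, scale=8, size=138)        # hypothetical scores
prosopagnosics = rng.normal(loc=73, scale=9, size=78)   # hypothetical scores

# Classify as impaired anyone scoring 2 SDs below the control mean.
cutoff = controls.mean() - 2 * controls.std(ddof=1)
impaired = prosopagnosics < cutoff
print(f"Prevalence of impairment: {100 * impaired.mean():.2f}%")
```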


2019
Author(s):
Nadine Lavan,
Carolyn McGettigan

When we hear a voice, we instantly form rich impressions of the person it belongs to – whether we are familiar with this voice or whether we are hearing it for the first time. Despite the rich impressions we can form of both familiar and unfamiliar voices, current models of voice processing focus primarily on familiar voice identity perception and do not explicitly account for the processing of unfamiliar voices. Where unfamiliar identity processing is described, it tends to be in the context of specific identity perception tasks, such that the extant literature is largely built on a distinction between familiar voice recognition and unfamiliar voice discrimination. We argue that the current focus of the literature is too narrow in its strong emphasis on identity-specific perception and does not adequately reflect person perception from voices beyond experimental tasks. Here, we propose a broader, unified account of person perception from both familiar and unfamiliar voices. We suggest that listeners routinely perceive all person characteristics from voices via common recognition processes, based on representations – be they of a specific identity, speaker sex, accent, or a perceived personality trait. While explicit discrimination processes may still be used to disambiguate percepts, they are likely to play a smaller role in perception in naturalistic settings. We offer discussions of how this representation-centred person perception from voices may work, in terms of the nature of representations, their specificity, and the interactions of different kinds of representation.


2020
Vol 73 (10)
pp. 1537-1545
Author(s):
Justine Johnson,
Carolyn McGettigan,
Nadine Lavan

Identity sorting tasks, in which participants sort multiple naturally varying stimuli of usually two identities into perceived identities, have recently gained popularity in voice and face processing research. In both modalities, participants who are unfamiliar with the identities tend to perceive multiple stimuli of the same identity as different people and thus fail to “tell people together.” These similarities across modalities suggest that modality-general mechanisms may underpin sorting behaviour. In this study, participants completed a voice sorting and a face sorting task. Taking an individual differences approach, we asked whether participants’ performance on voice and face sorting of unfamiliar identities is correlated. Participants additionally completed a voice discrimination task (Bangor Voice Matching Test) and a face discrimination task (Glasgow Face Matching Test). Using these tasks, we tested whether performance on sorting related to explicit identity discrimination. Performance on the voice sorting and face sorting tasks was correlated, suggesting that common modality-general processes underpin these tasks. However, no significant correlations were found between sorting and discrimination performance, with the exception of significant relationships between performance on “same identity” trials and “telling people together” for voices and faces. All observed relationships were, however, relatively weak, suggesting the presence of additional modality-specific and task-specific processes.
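
A minimal sketch of the individual-differences logic described above, correlating per-participant scores across tasks; the simulated scores, effect size, and sample size are hypothetical.

```python
# A minimal sketch: correlating sorting performance across modalities,
# and sorting with matching, using simulated per-participant scores.
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(1)
n = 100                                  # hypothetical sample size
voice_sorting = rng.normal(size=n)
face_sorting = 0.4 * voice_sorting + rng.normal(size=n)  # built-in association
face_matching = rng.normal(size=n)       # unrelated, mirroring the null result

r, p = pearsonr(voice_sorting, face_sorting)
print(f"Voice vs face sorting: r = {r:.2f}, p = {p:.3f}")
r, p = pearsonr(face_sorting, face_matching)
print(f"Face sorting vs face matching: r = {r:.2f}, p = {p:.3f}")
```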

