scholarly journals Multisensory integration of musical emotion perception in singing

Author(s):  
Elke B. Lange ◽  
Jens Fünderich ◽  
Hartmut Grimm

AbstractWe investigated how visual and auditory information contributes to emotion communication during singing. Classically trained singers applied two different facial expressions (expressive/suppressed) to pieces from their song and opera repertoire. Recordings of the singers were evaluated by laypersons or experts, presented to them in three different modes: auditory, visual, and audio–visual. A manipulation check confirmed that the singers succeeded in manipulating the face while keeping the sound highly expressive. Analyses focused on whether the visual difference or the auditory concordance between the two versions determined perception of the audio–visual stimuli. When evaluating expressive intensity or emotional content a clear effect of visual dominance showed. Experts made more use of the visual cues than laypersons. Consistency measures between uni-modal and multimodal presentations did not explain the visual dominance. The evaluation of seriousness was applied as a control. The uni-modal stimuli were rated as expected, but multisensory evaluations converged without visual dominance. Our study demonstrates that long-term knowledge and task context affect multisensory integration. Even though singers’ orofacial movements are dominated by sound production, their facial expressions can communicate emotions composed into the music, and observes do not rely on audio information instead. Studies such as ours are important to understand multisensory integration in applied settings.

Perception ◽  
2016 ◽  
Vol 46 (5) ◽  
pp. 624-631 ◽  
Author(s):  
Andreas M. Baranowski ◽  
H. Hecht

Almost a hundred years ago, the Russian filmmaker Lev Kuleshov conducted his now famous editing experiment in which different objects were added to a given film scene featuring a neutral face. It is said that the audience interpreted the unchanged facial expression as a function of the added object (e.g., an added soup made the face express hunger). This interaction effect has been dubbed “Kuleshov effect.” In the current study, we explored the role of sound in the evaluation of facial expressions in films. Thirty participants watched different clips of faces that were intercut with neutral scenes, featuring either happy music, sad music, or no music at all. This was crossed with the facial expressions of happy, sad, or neutral. We found that the music significantly influenced participants’ emotional judgments of facial expression. Thus, the intersensory effects of music are more specific than previously thought. They alter the evaluation of film scenes and can give meaning to ambiguous situations.


2021 ◽  
Vol 39 (3) ◽  
pp. 315-327
Author(s):  
Marco Brambilla ◽  
Matteo Masi ◽  
Simone Mattavelli ◽  
Marco Biella

Face processing has mainly been investigated by presenting facial expressions without any contextual information. However, in everyday interactions with others, the sight of a face is often accompanied by contextual cues that are processed either visually or under different sensory modalities. Here, we tested whether the perceived trustworthiness of a face is influenced by the auditory context in which that face is embedded. In Experiment 1, participants evaluated trustworthiness from faces that were surrounded by either threatening or non-threatening auditory contexts. Results showed that faces were judged more untrustworthy when accompanied by threatening auditory information. Experiment 2 replicated the effect in a design that disentangled the effects of threatening contexts from negative contexts in general. Thus, perceiving facial trustworthiness involves a cross-modal integration of the face and the level of threat posed by the surrounding context.


2014 ◽  
Vol 23 (3) ◽  
pp. 132-139 ◽  
Author(s):  
Lauren Zubow ◽  
Richard Hurtig

Children with Rett Syndrome (RS) are reported to use multiple modalities to communicate although their intentionality is often questioned (Bartolotta, Zipp, Simpkins, & Glazewski, 2011; Hetzroni & Rubin, 2006; Sigafoos et al., 2000; Sigafoos, Woodyatt, Tuckeer, Roberts-Pennell, & Pittendreigh, 2000). This paper will present results of a study analyzing the unconventional vocalizations of a child with RS. The primary research question addresses the ability of familiar and unfamiliar listeners to interpret unconventional vocalizations as “yes” or “no” responses. This paper will also address the acoustic analysis and perceptual judgments of these vocalizations. Pre-recorded isolated vocalizations of “yes” and “no” were presented to 5 listeners (mother, father, 1 unfamiliar, and 2 familiar clinicians) and the listeners were asked to rate the vocalizations as either “yes” or “no.” The ratings were compared to the original identification made by the child's mother during the face-to-face interaction from which the samples were drawn. Findings of this study suggest, in this case, the child's vocalizations were intentional and could be interpreted by familiar and unfamiliar listeners as either “yes” or “no” without contextual or visual cues. The results suggest that communication partners should be trained to attend to eye-gaze and vocalizations to ensure the child's intended choice is accurately understood.


Perception ◽  
2021 ◽  
pp. 030100662110270
Author(s):  
Kennon M. Sheldon ◽  
Ryan Goffredi ◽  
Mike Corcoran

Facial expressions of emotion have important communicative functions. It is likely that mask-wearing during pandemics disrupts these functions, especially for expressions defined by activity in the lower half of the face. We tested this by asking participants to rate both Duchenne smiles (DSs; defined by the mouth and eyes) and non-Duchenne or “social” smiles (SSs; defined by the mouth alone), within masked and unmasked target faces. As hypothesized, masked SSs were rated much lower in “a pleasant social smile” and much higher in “a merely neutral expression,” compared with unmasked SSs. Essentially, masked SSs became nonsmiles. Masked DSs were still rated as very happy and pleasant, although significantly less so than unmasked DSs. Masked DSs and SSs were both rated as displaying more disgust than the unmasked versions.


2021 ◽  
pp. 003329412110184
Author(s):  
Paola Surcinelli ◽  
Federica Andrei ◽  
Ornella Montebarocci ◽  
Silvana Grandi

Aim of the research The literature on emotion recognition from facial expressions shows significant differences in recognition ability depending on the proposed stimulus. Indeed, affective information is not distributed uniformly in the face and recent studies showed the importance of the mouth and the eye regions for a correct recognition. However, previous studies used mainly facial expressions presented frontally and studies which used facial expressions in profile view used a between-subjects design or children faces as stimuli. The present research aims to investigate differences in emotion recognition between faces presented in frontal and in profile views by using a within subjects experimental design. Method The sample comprised 132 Italian university students (88 female, Mage = 24.27 years, SD = 5.89). Face stimuli displayed both frontally and in profile were selected from the KDEF set. Two emotion-specific recognition accuracy scores, viz., frontal and in profile, were computed from the average of correct responses for each emotional expression. In addition, viewing times and response times (RT) were registered. Results Frontally presented facial expressions of fear, anger, and sadness were significantly better recognized than facial expressions of the same emotions in profile while no differences were found in the recognition of the other emotions. Longer viewing times were also found when faces expressing fear and anger were presented in profile. In the present study, an impairment in recognition accuracy was observed only for those emotions which rely mostly on the eye regions.


2021 ◽  
pp. 174702182199299
Author(s):  
Mohamad El Haj ◽  
Emin Altintas ◽  
Ahmed A Moustafa ◽  
Abdel Halim Boudoukha

Future thinking, which is the ability to project oneself forward in time to pre-experience an event, is intimately associated with emotions. We investigated whether emotional future thinking can activate emotional facial expressions. We invited 43 participants to imagine future scenarios, cued by the words “happy,” “sad,” and “city.” Future thinking was video recorded and analysed with a facial analysis software to classify whether facial expressions (i.e., happy, sad, angry, surprised, scared, disgusted, and neutral facial expression) of participants were neutral or emotional. Analysis demonstrated higher levels of happy facial expressions during future thinking cued by the word “happy” than “sad” or “city.” In contrast, higher levels of sad facial expressions were observed during future thinking cued by the word “sad” than “happy” or “city.” Higher levels of neutral facial expressions were observed during future thinking cued by the word “city” than “happy” or “sad.” In the three conditions, the neutral facial expressions were high compared with happy and sad facial expressions. Together, emotional future thinking, at least for future scenarios cued by “happy” and “sad,” seems to trigger the corresponding facial expression. Our study provides an original physiological window into the subjective emotional experience during future thinking.


2021 ◽  
Vol 11 (4) ◽  
pp. 1428
Author(s):  
Haopeng Wu ◽  
Zhiying Lu ◽  
Jianfeng Zhang ◽  
Xin Li ◽  
Mingyue Zhao ◽  
...  

This paper addresses the problem of Facial Expression Recognition (FER), focusing on unobvious facial movements. Traditional methods often cause overfitting problems or incomplete information due to insufficient data and manual selection of features. Instead, our proposed network, which is called the Multi-features Cooperative Deep Convolutional Network (MC-DCN), maintains focus on the overall feature of the face and the trend of key parts. The processing of video data is the first stage. The method of ensemble of regression trees (ERT) is used to obtain the overall contour of the face. Then, the attention model is used to pick up the parts of face that are more susceptible to expressions. Under the combined effect of these two methods, the image which can be called a local feature map is obtained. After that, the video data are sent to MC-DCN, containing parallel sub-networks. While the overall spatiotemporal characteristics of facial expressions are obtained through the sequence of images, the selection of keys parts can better learn the changes in facial expressions brought about by subtle facial movements. By combining local features and global features, the proposed method can acquire more information, leading to better performance. The experimental results show that MC-DCN can achieve recognition rates of 95%, 78.6% and 78.3% on the three datasets SAVEE, MMI, and edited GEMEP, respectively.


2021 ◽  
Author(s):  
Judith M. Varkevisser ◽  
Ralph Simon ◽  
Ezequiel Mendoza ◽  
Martin How ◽  
Idse van Hijlkema ◽  
...  

AbstractBird song and human speech are learned early in life and for both cases engagement with live social tutors generally leads to better learning outcomes than passive audio-only exposure. Real-world tutor–tutee relations are normally not uni- but multimodal and observations suggest that visual cues related to sound production might enhance vocal learning. We tested this hypothesis by pairing appropriate, colour-realistic, high frame-rate videos of a singing adult male zebra finch tutor with song playbacks and presenting these stimuli to juvenile zebra finches (Taeniopygia guttata). Juveniles exposed to song playbacks combined with video presentation of a singing bird approached the stimulus more often and spent more time close to it than juveniles exposed to audio playback only or audio playback combined with pixelated and time-reversed videos. However, higher engagement with the realistic audio–visual stimuli was not predictive of better song learning. Thus, although multimodality increased stimulus engagement and biologically relevant video content was more salient than colour and movement equivalent videos, the higher engagement with the realistic audio–visual stimuli did not lead to enhanced vocal learning. Whether the lack of three-dimensionality of a video tutor and/or the lack of meaningful social interaction make them less suitable for facilitating song learning than audio–visual exposure to a live tutor remains to be tested.


Sensors ◽  
2021 ◽  
Vol 21 (6) ◽  
pp. 2003 ◽  
Author(s):  
Xiaoliang Zhu ◽  
Shihao Ye ◽  
Liang Zhao ◽  
Zhicheng Dai

As a sub-challenge of EmotiW (the Emotion Recognition in the Wild challenge), how to improve performance on the AFEW (Acted Facial Expressions in the wild) dataset is a popular benchmark for emotion recognition tasks with various constraints, including uneven illumination, head deflection, and facial posture. In this paper, we propose a convenient facial expression recognition cascade network comprising spatial feature extraction, hybrid attention, and temporal feature extraction. First, in a video sequence, faces in each frame are detected, and the corresponding face ROI (range of interest) is extracted to obtain the face images. Then, the face images in each frame are aligned based on the position information of the facial feature points in the images. Second, the aligned face images are input to the residual neural network to extract the spatial features of facial expressions corresponding to the face images. The spatial features are input to the hybrid attention module to obtain the fusion features of facial expressions. Finally, the fusion features are input in the gate control loop unit to extract the temporal features of facial expressions. The temporal features are input to the fully connected layer to classify and recognize facial expressions. Experiments using the CK+ (the extended Cohn Kanade), Oulu-CASIA (Institute of Automation, Chinese Academy of Sciences) and AFEW datasets obtained recognition accuracy rates of 98.46%, 87.31%, and 53.44%, respectively. This demonstrated that the proposed method achieves not only competitive performance comparable to state-of-the-art methods but also greater than 2% performance improvement on the AFEW dataset, proving the significant outperformance of facial expression recognition in the natural environment.


PLoS ONE ◽  
2021 ◽  
Vol 16 (1) ◽  
pp. e0245777
Author(s):  
Fanny Poncet ◽  
Robert Soussignan ◽  
Margaux Jaffiol ◽  
Baptiste Gaudelus ◽  
Arnaud Leleu ◽  
...  

Recognizing facial expressions of emotions is a fundamental ability for adaptation to the social environment. To date, it remains unclear whether the spatial distribution of eye movements predicts accurate recognition or, on the contrary, confusion in the recognition of facial emotions. In the present study, we asked participants to recognize facial emotions while monitoring their gaze behavior using eye-tracking technology. In Experiment 1a, 40 participants (20 women) performed a classic facial emotion recognition task with a 5-choice procedure (anger, disgust, fear, happiness, sadness). In Experiment 1b, a second group of 40 participants (20 women) was exposed to the same materials and procedure except that they were instructed to say whether (i.e., Yes/No response) the face expressed a specific emotion (e.g., anger), with the five emotion categories tested in distinct blocks. In Experiment 2, two groups of 32 participants performed the same task as in Experiment 1a while exposed to partial facial expressions composed of actions units (AUs) present or absent in some parts of the face (top, middle, or bottom). The coding of the AUs produced by the models showed complex facial configurations for most emotional expressions, with several AUs in common. Eye-tracking data indicated that relevant facial actions were actively gazed at by the decoders during both accurate recognition and errors. False recognition was mainly associated with the additional visual exploration of less relevant facial actions in regions containing ambiguous AUs or AUs relevant to other emotional expressions. Finally, the recognition of facial emotions from partial expressions showed that no single facial actions were necessary to effectively communicate an emotional state. In contrast, the recognition of facial emotions relied on the integration of a complex set of facial cues.


Sign in / Sign up

Export Citation Format

Share Document