Comparing supervised and unsupervised approaches to multimodal emotion recognition

2021 ◽  
Vol 7 ◽  
pp. e804
Author(s):  
Marcos Fernández Carbonell ◽  
Magnus Boman ◽  
Petri Laukka

We investigated emotion classification from brief video recordings from the GEMEP database, in which actors portrayed 18 emotions. Vocal features consisted of acoustic parameters related to frequency, intensity, spectral distribution, and duration. Facial features consisted of facial action units. We first performed a series of person-independent supervised classification experiments. Best performance (AUC = 0.88) was obtained by merging the output from the best unimodal vocal (Elastic Net, AUC = 0.82) and facial (Random Forest, AUC = 0.80) classifiers using a late fusion approach and the product rule method. All 18 emotions were recognized with above-chance recall, although recognition rates varied widely across emotions (e.g., high for amusement, anger, and disgust; low for shame). Multimodal feature patterns for each emotion are described in terms of the vocal and facial features that contributed most to classifier performance. Next, a series of exploratory unsupervised classification experiments was performed to gain more insight into how emotion expressions are organized. Solutions from traditional clustering techniques were interpreted using decision trees to explore which features underlie the clustering. Another approach paired various dimensionality reduction techniques with inspection of data visualizations. Unsupervised methods did not cluster stimuli by emotion category, but several explanatory patterns were observed. Some could be interpreted in terms of valence and arousal, but actor- and gender-specific aspects also contributed to clustering. Identifying explanatory patterns holds great potential as a meta-heuristic when unsupervised methods are used in complex classification tasks.
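The late fusion step described above can be made concrete with a short sketch. The code below trains the two unimodal classifiers named in the abstract (an Elastic Net-penalized logistic regression on vocal features and a Random Forest on facial features) and merges their per-class probabilities with the product rule. It is a minimal sketch assuming scikit-learn; the feature matrices, hyperparameters, and splits are placeholders rather than the authors' actual pipeline.

```python
# Minimal sketch of late fusion with the product rule, assuming
# pre-extracted vocal and facial feature matrices for the same stimuli.
# Hyperparameters and variable names are illustrative placeholders.
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression


def product_rule_fusion(X_vocal, X_facial, y, X_vocal_test, X_facial_test):
    # Unimodal classifiers, one per modality, as in the abstract.
    vocal_clf = LogisticRegression(penalty="elasticnet", solver="saga",
                                   l1_ratio=0.5, max_iter=5000)
    facial_clf = RandomForestClassifier(n_estimators=500, random_state=0)
    vocal_clf.fit(X_vocal, y)
    facial_clf.fit(X_facial, y)

    # Product rule: multiply per-class posteriors element-wise,
    # then renormalize each row to sum to one.
    p = (vocal_clf.predict_proba(X_vocal_test)
         * facial_clf.predict_proba(X_facial_test))
    return p / p.sum(axis=1, keepdims=True)
```

Because the product rule multiplies posteriors, a class must receive reasonable support from both modalities to score highly, which is a common motivation for choosing it over sum-rule averaging.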

2015 ◽  
Vol 294 ◽  
pp. 553-564 ◽  
Author(s):  
Manuel Domínguez ◽  
Serafín Alonso ◽  
Antonio Morán ◽  
Miguel A. Prada ◽  
Juan J. Fuertes

2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Van Hoan Do ◽  
Stefan Canzar

Emerging single-cell technologies profile multiple types of molecules within individual cells. A fundamental step in the analysis of the produced high-dimensional data is their visualization using dimensionality reduction techniques such as t-SNE and UMAP. We introduce j-SNE and j-UMAP as their natural generalizations to the joint visualization of multimodal omics data. Our approach automatically learns the relative contribution of each modality to a concise representation of cellular identity that promotes discriminative features but suppresses noise. On eight datasets, j-SNE and j-UMAP produce unified embeddings that better agree with known cell types and that harmonize RNA and protein velocity landscapes.
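j-SNE and j-UMAP learn the per-modality weights automatically; the sketch below only illustrates the underlying joint-embedding idea with fixed, hand-set weights and is not the authors' implementation. It assumes the umap-learn and scipy packages, and the parameters w_rna and w_protein are hypothetical stand-ins for the learned contributions.

```python
# Rough illustration of joint embedding across two modalities:
# combine per-modality distance matrices with fixed weights and
# embed the result with UMAP on a precomputed metric. j-UMAP itself
# learns these weights; here they are hand-set placeholders.
import umap
from scipy.spatial.distance import pdist, squareform


def joint_umap(X_rna, X_protein, w_rna=0.5, w_protein=0.5, seed=0):
    # Per-modality Euclidean distances, rescaled so that neither
    # modality dominates purely through its measurement scale.
    d_rna = squareform(pdist(X_rna))
    d_protein = squareform(pdist(X_protein))
    d_rna /= d_rna.mean()
    d_protein /= d_protein.mean()

    # The weighted sum defines a joint metric over cells.
    d_joint = w_rna * d_rna + w_protein * d_protein
    return umap.UMAP(metric="precomputed",
                     random_state=seed).fit_transform(d_joint)
```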


2019 ◽  
Vol 165 ◽  
pp. 104-111 ◽  
Author(s):  
S. Velliangiri ◽  
S. Alagumuthukrishnan ◽  
S. Iwin Thankumar Joseph

2016 ◽  
Vol 85 ◽  
pp. 241-248 ◽  
Author(s):  
A. Vinay ◽  
Vikkram Vasuki ◽  
Shreyas Bhat ◽  
K.S. Jayanth ◽  
K.N. Balasubramanya Murthy ◽  
...  

2021 ◽  
Author(s):  
Nicole X Han ◽  
Puneeth N. Chakravarthula ◽  
Miguel P. Eckstein

Face processing is fast and efficient because of its evolutionary and social importance. A majority of people direct their first eye movement to a featureless point just below the eyes, a location that maximizes accuracy in recognizing a person's identity and gender. Yet the exact properties or features of the face that guide the first eye movements and reduce fixational variability are unknown. Here, we manipulated the presence of facial features and their spatial configuration to investigate their effect on the location and variability of first and second fixations to peripherally presented faces. Results showed that observers can utilize the face outline, individual facial features, and the spatial configuration of features to guide their first eye movements to the preferred point of fixation. The eyes play a preferential role in guiding the first eye movements and reducing fixation variability. Eliminating the eyes or altering their position had the greatest influence on the location and variability of fixations and resulted in the largest detriment to face identification performance. The other internal features (nose and mouth) also contribute to reducing fixation variability. A subsequent experiment measuring detection of single features showed that the eyes have the highest detectability (relative to the other features) in the visual periphery, providing a strong sensory signal to guide the oculomotor system. Together, the results suggest a flexible multiple-cue strategy, with a preferential role for the eyes, that may offer a robust way to cope with real-world variation in eccentricity and its effect on the ability to resolve individual feature properties.

