Context-Aware Attention Network for Human Emotion Recognition in Video

2020 ◽  
Vol 2020 ◽  
pp. 1-10
Author(s):  
Xiaodong Liu ◽  
Miao Wang

Recognition of human emotion from facial expressions is affected by distortions in image quality and facial pose, which traditional video emotion recognition methods often ignore. Context information can also provide additional clues, to varying degrees, that further improve recognition accuracy. In this paper, we first build a video dataset with seven categories of human emotion, named Human Emotion in the Video (HEIV). With the HEIV dataset, we train a context-aware attention network (CAAN) to recognize human emotion. The network consists of two subnetworks that process face and context information. Features from facial expression and context clues are fused to represent the emotion of each video frame and are then passed through an attention network to generate emotion scores. The emotion features of all frames are then aggregated according to their emotion scores. Experimental results show that the proposed method is effective on the HEIV dataset.
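The fuse-then-aggregate step described in this abstract can be sketched as follows. This is only an illustrative sketch, not the authors' implementation: the abstract does not say how features are fused or how scores are normalized, so concatenation for fusion and a softmax over the per-frame scores are assumptions, and the names `fuse_frame` and `aggregate_frames` are hypothetical.

```python
import math


def softmax(scores):
    """Turn raw scores into non-negative weights that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_frame(face_features, context_features):
    """Fuse face and context features of one frame (assumption: concatenation)."""
    return list(face_features) + list(context_features)


def aggregate_frames(frame_features, frame_scores):
    """Weighted sum of per-frame features, weighted by normalized emotion scores."""
    if not frame_features or len(frame_features) != len(frame_scores):
        raise ValueError("need one score per frame and at least one frame")
    weights = softmax(frame_scores)  # assumption: softmax normalization
    dim = len(frame_features[0])
    return [
        sum(w * f[d] for w, f in zip(weights, frame_features))
        for d in range(dim)
    ]
```

With equal scores the aggregate reduces to the plain mean of the frame features; a frame with a much higher score dominates the result, which is the intended effect of down-weighting low-quality frames.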

2021 ◽  
Vol 2021 ◽  
pp. 1-11
Author(s):  
Xiaodong Liu ◽  
Songyang Li ◽  
Miao Wang

Context, such as scenes and objects, plays an important role in video emotion recognition, and recognition accuracy can be further improved when context information is incorporated. Although previous research has considered context information, it often ignores that the emotional clues contained in different images may differ. To address the difference in emotional content across modalities and across images, this paper proposes a hierarchical attention-based multimodal fusion network for video emotion recognition, which consists of a multimodal feature extraction module and a multimodal feature fusion module. The multimodal feature extraction module has three subnetworks that extract features of facial, scene, and global images. Each subnetwork consists of two branches: the first extracts the features of its modality, and the second generates an emotion score for each image. The features and emotion scores of all images in a modality are aggregated to generate the emotion feature of that modality. The fusion module takes the multimodal features as input and generates an emotion score for each modality. Finally, the features and emotion scores of the modalities are aggregated to produce the final emotion representation of the video. Experimental results show that the proposed method is effective on the emotion recognition dataset.
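The two-level aggregation in this abstract (images within a modality first, then modalities within the video) can be sketched as below. This is a hedged illustration only: softmax weighting at both levels and a shared feature dimension across modalities are assumptions, and `attend` and `video_representation` are hypothetical names, not taken from the paper.

```python
import math


def softmax(scores):
    """Turn raw scores into non-negative weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(features, scores):
    """Score-weighted sum of equally sized feature vectors."""
    weights = softmax(scores)
    dim = len(features[0])
    return [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]


def video_representation(modalities, modality_scores):
    """Aggregate images per modality, then aggregate the modalities.

    modalities: dict mapping a modality name to (image_features, image_scores).
    modality_scores: dict mapping the same names to one score each.
    """
    names = sorted(modalities)
    # Level 1: one emotion feature per modality.
    per_modality = [attend(*modalities[name]) for name in names]
    # Level 2: one emotion representation for the whole video.
    return attend(per_modality, [modality_scores[name] for name in names])
```

In the paper the scores come from learned branches; here they are plain inputs so that the aggregation logic can be read in isolation.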


Electronics ◽  
2020 ◽  
Vol 9 (12) ◽  
pp. 2038
Author(s):  
Xi Shao ◽  
Xuan Zhang ◽  
Guijin Tang ◽  
Bingkun Bao

We propose a new end-to-end scene recognition framework, the Recurrent Memorized Attention Network (RMAN) model, which performs object-based scene classification by recurrently locating and memorizing objects in the image. Based on this framework, we introduce a multi-task mechanism that contiguously attends to the different essential objects in a scene image and recurrently fuses in memory the features of the objects focused on by an attention model, improving scene recognition accuracy. The experimental results show that the RMAN model achieves better classification performance on the constructed dataset and two public scene datasets, surpassing state-of-the-art image scene recognition approaches.


Algorithms ◽  
2019 ◽  
Vol 13 (1) ◽  
pp. 14
Author(s):  
Jianjian Ji ◽  
Gang Yang

Existing image completion methods mostly assume missing regions that are small or located in the middle of the image. When the regions to be completed are large or near the edge of the image, the lack of context information tends to make the completion results blurred or distorted, and a large blank area remains in the final results. In addition, the unstable training of the generative adversarial network is prone to causing pseudo-color in the completion results. To address these two problems, a method for completing images with large or edge-missing areas is proposed, with improved network structures. On the one hand, it overcomes the lack of context information, which ensures the realism of the generated texture details; on the other hand, it suppresses the generation of pseudo-color, which guarantees the consistency of the whole image in both appearance and content. The experimental results show that the proposed method achieves better results when completing large or edge-missing areas.


2011 ◽  
Vol 26 (S2) ◽  
pp. 1399-1399
Author(s):  
S. Herrera ◽  
M. Bardón ◽  
C. Fernández ◽  
V. Ángeles ◽  
G. Lahera Forteza ◽  
...  

Introduction: Patients with schizophrenia show a deficit in emotion recognition through facial expression, and a low sense of familiarity may be a factor involved. However, facial expression of emotion in the families of patients could also be disturbed and be another factor related to the deficit in emotion recognition and in sense of familiarity in schizophrenia. Objectives: To assess facial expression of emotion in a sample of 21 families of patients with schizophrenia and families of healthy controls. Methods: 22 healthy volunteers, all of them mental health professionals, were assessed with the Ekman Test of emotion recognition using unfamiliar people who were photographed expressing Ekman's 6 basic emotions. The task was composed of 42 pictures, half of them from families of patients and the other half from families of healthy controls. Results: Volunteers recognized emotions less accurately in relatives of patients than in relatives of the control group, and this difference was statistically significant (Wilcoxon W = -4.13; p = .001). The average proportion of pictures correctly recognized was lower for families of patients than for families of the control group (54.28% vs. 82%). Conclusions: Facial expression of emotion in families of patients with schizophrenia seems worse than in families of healthy controls. It could be a factor involved in the face emotion recognition deficit in schizophrenia.


2017 ◽  
Vol 2017 ◽  
pp. 1-9
Author(s):  
Wenming Ma ◽  
Junfeng Shi ◽  
Ruidong Zhao

Item-based collaborative filtering algorithms play an important role in modern commercial recommendation systems (RSs). To improve recommendation performance, normalization is commonly used as a basic component of the predictor models. Among the many normalization methods, subtracting the baseline predictor (BLP) is the most popular. However, the BLP uses a statistical constant without considering the context. We found that slightly scaling the different components of the BLP separately could dramatically improve performance. This paper proposes several normalization methods based on baseline predictors scaled according to different context information. The experimental results show that using a context-aware scaled baseline predictor for normalization indeed yields better recommendation performance in terms of RMSE, MAE, precision, recall, and nDCG.
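A minimal sketch of baseline-predictor normalization with separately scaled components is given below. It uses the commonly cited baseline form (global mean plus user bias plus item bias) with biases estimated as simple averages of residuals. The scale factors `alpha` and `beta` and every function name are hypothetical, and the abstract does not specify how the paper derives its scales from context, so fixed constants stand in for them here.

```python
from collections import defaultdict


def fit_baseline(ratings):
    """Estimate global mean, item biases, and user biases from (user, item, rating)."""
    mu = sum(r for _, _, r in ratings) / len(ratings)
    item_residuals = defaultdict(list)
    for _, item, r in ratings:
        item_residuals[item].append(r - mu)
    item_bias = {i: sum(v) / len(v) for i, v in item_residuals.items()}
    user_residuals = defaultdict(list)
    for user, item, r in ratings:
        user_residuals[user].append(r - mu - item_bias[item])
    user_bias = {u: sum(v) / len(v) for u, v in user_residuals.items()}
    return mu, user_bias, item_bias


def scaled_baseline(mu, user_bias, item_bias, user, item, alpha=1.0, beta=1.0):
    """Baseline with separately scaled components; alpha = beta = 1 is the plain BLP."""
    return mu + alpha * user_bias.get(user, 0.0) + beta * item_bias.get(item, 0.0)


def normalize(ratings, mu, user_bias, item_bias, alpha=1.0, beta=1.0):
    """Subtract the (scaled) baseline from every rating."""
    return [
        (u, i, r - scaled_baseline(mu, user_bias, item_bias, u, i, alpha, beta))
        for u, i, r in ratings
    ]
```

The normalized residuals would then be fed to the item-based similarity and prediction steps, with the baseline added back to the predicted residual at the end.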


2021 ◽  
Vol 25 ◽  
pp. 233121652110453
Author(s):  
Minke J. de Boer ◽  
Tim Jürgens ◽  
Deniz Başkent ◽  
Frans W. Cornelissen

Since emotion recognition involves integration of the visual and auditory signals, it is likely that sensory impairments worsen emotion recognition. In emotion recognition, young adults can compensate for unimodal sensory degradations if the other modality is intact. However, most sensory impairments occur in the elderly population and it is unknown whether older adults are similarly capable of compensating for signal degradations. As a step towards studying potential effects of real sensory impairments, this study examined how degraded signals affect emotion recognition in older adults with normal hearing and vision. The degradations were designed to approximate some aspects of sensory impairments. Besides emotion recognition accuracy, we recorded eye movements to capture perceptual strategies for emotion recognition. Overall, older adults were as good as younger adults at integrating auditory and visual information and at compensating for degraded signals. However, accuracy was lower overall for older adults, indicating that aging leads to a general decrease in emotion recognition. In addition to decreased accuracy, older adults showed smaller adaptations of perceptual strategies in response to video degradations. Concluding, this study showed that emotion recognition declines with age, but that integration and compensation abilities are retained. In addition, we speculate that the reduced ability of older adults to adapt their perceptual strategies may be related to the increased time it takes them to direct their attention to scene aspects that are relatively far away from fixation.


2012 ◽  
Vol 601 ◽  
pp. 325-331
Author(s):  
Shu Gao ◽  
Hua Huang ◽  
Bing Ge

Traditional service discovery returns many web services that do not meet the user's requirements, and its efficiency is very low. Moreover, current service directory specifications do not address context awareness. In this paper, a novel, enhanced context-aware model for web service discovery is proposed, in which context information and domain information are integrated to filter and sort services during the discovery process. In this way, the precision and efficiency of service discovery can be improved.


Electronics ◽  
2020 ◽  
Vol 9 (11) ◽  
pp. 1881
Author(s):  
Yuhui Chang ◽  
Jiangtao Xu ◽  
Zhiyuan Gao

To improve the accuracy of stereo matching, the multi-scale dense attention network (MDA-Net) is proposed. The network introduces two novel modules in the feature extraction stage to better exploit context information: a dual-path upsampling (DU) block and an attention-guided context-aware pyramid feature extraction (ACPFE) block. The DU block fuses feature maps of different scales; it uses sub-pixel convolution to compensate for the information lost by traditional interpolation-based upsampling. The ACPFE block extracts multi-scale context information: pyramid atrous convolution is adopted to exploit multi-scale features, and channel attention is used to fuse them. The proposed network has been evaluated on several benchmark datasets. The three-pixel error evaluated over all ground-truth pixels is 2.10% on the KITTI 2015 dataset. The experimental results show that MDA-Net achieves state-of-the-art accuracy on the KITTI 2012 and 2015 datasets.
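Sub-pixel convolution upsamples by letting a convolution produce r*r times as many channels and then rearranging those channels into an r-times larger spatial grid. The sketch below shows only that rearrangement step (often called pixel shuffle) on nested lists. It is an illustration of the general technique, not of the DU block itself; the convolution that precedes it is omitted, and the function name `pixel_shuffle` is chosen here for readability.

```python
def pixel_shuffle(x, r):
    """Rearrange a [C*r*r][H][W] nested list into [C][H*r][W*r].

    Output pixel (y*r + i, col*r + j) of channel c is taken from
    input channel c*r*r + i*r + j at position (y, col).
    """
    channels_in = len(x)
    if r < 1 or channels_in % (r * r) != 0:
        raise ValueError("channel count must be a multiple of r*r")
    channels_out = channels_in // (r * r)
    height, width = len(x[0]), len(x[0][0])
    out = [
        [[0.0] * (width * r) for _ in range(height * r)]
        for _ in range(channels_out)
    ]
    for c in range(channels_out):
        for i in range(r):
            for j in range(r):
                src = x[c * r * r + i * r + j]
                for y in range(height):
                    for col in range(width):
                        out[c][y * r + i][col * r + j] = src[y][col]
    return out
```

Because every output pixel is copied from a learned feature channel rather than interpolated from its neighbours, no information is averaged away during upsampling, which is the motivation the abstract gives for preferring it over interpolation.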

