Classifying web videos using a global video descriptor

2012 ◽  
Vol 24 (7) ◽  
pp. 1473-1485 ◽  
Author(s):  
Berkan Solmaz ◽  
Shayan Modiri Assari ◽  
Mubarak Shah

2014 ◽  
Vol 5 (3) ◽  
pp. 1-22 ◽  
Author(s):  
Yicheng Song ◽  
Yongdong Zhang ◽  
Juan Cao ◽  
Jinhui Tang ◽  
Xingyu Gao ◽  
...  

2018 ◽  
Vol 28 (10) ◽  
pp. 3019-3029 ◽  
Author(s):  
Nicolas Chesneau ◽  
Karteek Alahari ◽  
Cordelia Schmid

2014 ◽  
Vol 29 (5) ◽  
pp. 785-798 ◽  
Author(s):  
Zhi-Neng Chen ◽  
Chong-Wah Ngo ◽  
Wei Zhang ◽  
Juan Cao ◽  
Yu-Gang Jiang

Author(s):  
Felix Weninger ◽  
Claudia Wagner ◽  
Martin Wöllmer ◽  
Björn Schuller ◽  
Louis-Philippe Morency

Author(s):  
Jens Eder

Affective image operations are attempts to influence behaviour and stimulate action by evoking affects through images. The chapter explores their forms and uses in political conflict, from video activism to war propaganda. Drawing together interdisciplinary research, it develops a theoretical framework for analysing the affective and political force of still and moving images, arguing that the affective structure of images has four layers: political affects and emotions are triggered by the specific interplay of visual forms, worlds, messages, and reflections. On the basis of this framework, several frequent types of affective image operations can be distinguished and illustrated through brief case studies of political web videos.


Symmetry ◽  
2019 ◽  
Vol 11 (1) ◽  
pp. 52 ◽  
Author(s):  
Xianzhang Pan ◽  
Wenping Guo ◽  
Xiaoying Guo ◽  
Wenshu Li ◽  
Junjie Xu ◽  
...  

The proposed method has 30 streams: 15 spatial streams and 15 temporal streams, with each spatial stream paired with a temporal stream; this pairing is what relates the work to the symmetry concept. Classifying facial expressions in video is difficult owing to the gap between visual descriptors and emotions. To bridge this gap, a new video descriptor for facial expression recognition is presented that aggregates spatial and temporal convolutional features across the entire extent of a video. The designed framework integrates a state-of-the-art 30-stream network with a trainable spatial–temporal feature aggregation layer and is end-to-end trainable for video-based facial expression recognition. It can therefore avoid overfitting to the limited emotional video datasets, and the trainable aggregation strategy learns a better representation of an entire video. Different schemas for pooling spatial–temporal features are investigated, and the spatial and temporal streams are best aggregated with the proposed method. Extensive experiments on two public databases, BAUM-1s and eNTERFACE05, show that the framework performs well and outperforms state-of-the-art strategies.
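The central idea, a single trainable layer that pools features from all spatial and temporal streams so the whole pipeline can be trained end to end, can be sketched roughly as follows. This is a minimal illustration assuming PyTorch and attention-style pooling; the 30-stream count follows the abstract, but the feature dimension, the number of emotion classes, and the pooling scheme itself are assumptions for illustration, not the paper's exact layer.

```python
# Hypothetical sketch of a trainable spatial-temporal aggregation layer.
# The 30 streams (15 spatial + 15 temporal) follow the abstract; the
# attention-style pooling, feature size, and class count are assumptions.
import torch
import torch.nn as nn


class StreamAggregator(nn.Module):
    """Pools per-stream features with learned attention weights, then classifies."""

    def __init__(self, feat_dim=512, num_classes=6):
        super().__init__()
        # One scalar attention score per stream, computed from its feature vector.
        self.attention = nn.Linear(feat_dim, 1)
        self.classifier = nn.Linear(feat_dim, num_classes)

    def forward(self, stream_feats):
        # stream_feats: (batch, num_streams, feat_dim)
        scores = self.attention(stream_feats)           # (batch, num_streams, 1)
        weights = torch.softmax(scores, dim=1)          # normalized over streams
        pooled = (weights * stream_feats).sum(dim=1)    # (batch, feat_dim)
        return self.classifier(pooled)                  # (batch, num_classes)


# Example: 8 videos, 30 streams (15 spatial + 15 temporal), 512-d features each.
feats = torch.randn(8, 30, 512)
logits = StreamAggregator()(feats)
print(logits.shape)  # torch.Size([8, 6])
```

Because the pooling weights are learned jointly with the classifier, such a layer can emphasise the most informative streams for a given video, which is the kind of behaviour the abstract attributes to its trainable aggregation.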


2010 ◽  
Vol 48 (6) ◽  
pp. 430-431
Author(s):  
Robert Ehrlich
