Hallucinating Optical Flow Features for Video Classification

Author(s):  
Yongyi Tang ◽  
Lin Ma ◽  
Lianqiang Zhou

Appearance and motion are two key components used to depict and characterize video content. Currently, two-stream models achieve state-of-the-art performance on video classification. However, extracting motion information, specifically in the form of optical flow features, is extremely computationally expensive, especially for large-scale video classification. In this paper, we propose a motion hallucination network, namely MoNet, to imagine optical flow features from appearance features, without relying on optical flow computation. Specifically, MoNet models the temporal relationships of the appearance features and exploits the contextual relationships of the optical flow features with concurrent connections. Extensive experimental results demonstrate that the proposed MoNet can effectively and efficiently hallucinate optical flow features, which, together with the appearance features, consistently improve video classification performance. Moreover, MoNet can cut almost half of the computational and data-storage burden of two-stream video classification. Our code is available at: https://github.com/YongyiTang92/MoNet-Features
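For intuition, a minimal sketch of such a hallucination module is shown below. The LSTM temporal model, feature dimensions, and regression loss are illustrative assumptions; the actual MoNet architecture is described in the paper and the linked repository.

```python
# Minimal sketch of a motion-hallucination module (hypothetical; the real
# MoNet architecture differs; see https://github.com/YongyiTang92/MoNet-Features).
import torch
import torch.nn as nn

class MoNetSketch(nn.Module):
    def __init__(self, appearance_dim=1024, flow_dim=1024, hidden_dim=512):
        super().__init__()
        # Temporal model over the appearance features (assumption: an LSTM).
        self.temporal = nn.LSTM(appearance_dim, hidden_dim, batch_first=True)
        # Project the temporal states to pseudo optical-flow features.
        self.project = nn.Linear(hidden_dim, flow_dim)

    def forward(self, appearance_feats):
        # appearance_feats: (batch, time, appearance_dim)
        states, _ = self.temporal(appearance_feats)
        # Hallucinated flow features: (batch, time, flow_dim)
        return self.project(states)

# Training target: regress toward precomputed optical-flow features so that,
# at inference time, optical flow extraction can be skipped entirely.
model = MoNetSketch()
appearance = torch.randn(2, 16, 1024)       # e.g. 16 frames of CNN features
real_flow_feats = torch.randn(2, 16, 1024)  # teacher features (training only)
loss = nn.functional.mse_loss(model(appearance), real_flow_feats)
```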

Author(s):  
Hehe Fan ◽  
Zhongwen Xu ◽  
Linchao Zhu ◽  
Chenggang Yan ◽  
Jianjun Ge ◽  
...  

We aim to significantly reduce the computational cost of classifying temporally untrimmed videos while retaining similar accuracy. Existing video classification methods sample frames at a predefined frequency over the entire video. In contrast, we propose an end-to-end deep reinforcement learning approach that enables an agent to classify videos by watching only a very small portion of frames, much as humans do. We make two main contributions. First, information is not distributed equally across video frames over time. An agent needs to watch more carefully when a clip is informative and skip frames that are redundant or irrelevant. The proposed approach enables the agent to adapt its sampling rate to the video content and skip most of the frames without loss of information. Second, the number of frames an agent must watch to reach a confident decision varies greatly from one video to another. We incorporate an adaptive stop network that measures a confidence score and generates a timely trigger to stop the agent from watching further, which improves efficiency without loss of accuracy. Our approach reduces the computational cost significantly on the large-scale YouTube-8M dataset, while the accuracy remains the same.
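A toy sketch of the watch/skip/stop loop described above follows. The GRU agent, skip head, stop head, and confidence threshold are placeholders for illustration, not the authors' implementation.

```python
# Toy sketch of an agent that adapts its sampling rate and stops early.
# All module names and the threshold are illustrative assumptions.
import torch
import torch.nn as nn

class Agent(nn.Module):
    def __init__(self, feat_dim=256, n_classes=10, max_skip=8):
        super().__init__()
        self.rnn = nn.GRUCell(feat_dim, 256)
        self.classifier = nn.Linear(256, n_classes)
        self.skip_head = nn.Linear(256, max_skip)  # how many frames to jump
        self.stop_head = nn.Linear(256, 1)         # confidence to stop watching

def classify_video(agent, frame_feats, stop_threshold=0.9):
    """frame_feats: (num_frames, feat_dim) precomputed per-frame features.
    Assumes the video has at least one frame."""
    h = torch.zeros(1, 256)
    t = 0
    while t < frame_feats.size(0):
        h = agent.rnn(frame_feats[t:t + 1], h)
        logits = agent.classifier(h)
        # Adaptive stop: trigger once the confidence score is high enough.
        if torch.sigmoid(agent.stop_head(h)).item() > stop_threshold:
            break
        # Adaptive sampling: skip 1..max_skip frames based on content.
        t += 1 + agent.skip_head(h).argmax().item()
    return logits.argmax(dim=1)
```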


2019 ◽  
Vol 29 (01) ◽  
pp. 2050006 ◽  
Author(s):  
Qiuyu Li ◽  
Jun Yu ◽  
Toru Kurihara ◽  
Haiyan Zhang ◽  
Shu Zhan

A micro-expression is a brief, involuntary facial movement; it indicates that a person is consciously hiding their true emotion. Micro-expression recognition has various potential applications in public security and clinical medicine. Research focuses on automatic micro-expression recognition because micro-expressions are hard for people to recognize unaided. This research proposes a novel algorithm for automatic micro-expression recognition that combines a deep multi-task convolutional network for detecting facial landmarks with a fused deep convolutional network for estimating the optical flow features of the micro-expression. First, the deep multi-task convolutional network detects facial landmarks, together with manifold-related tasks, to divide the facial region. Then, the fused convolutional network extracts optical flow features from the facial regions that contain muscle changes when a micro-expression appears. Because each video clip has many frames, the original optical flow features of the whole clip are high-dimensional and redundant, so the method revises them to reduce the redundant dimensions. Finally, the revised optical flow features refine the feature information, and a support vector machine classifier recognizes the micro-expression. The main contributions of this work are combining the deep multi-task learning network with the fused optical flow network for micro-expression recognition, and revising the optical flow features to reduce redundant dimensions. Experiments on two spontaneous micro-expression databases show that the method achieves competitive performance in micro-expression recognition.
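The final stage of this pipeline, reducing the high-dimensional flow features and classifying with an SVM, might look like the sketch below. PCA stands in for the paper's feature-revision step, which is an assumption; the paper defines its own revision procedure.

```python
# Sketch of the final recognition stage: reduce the high-dimensional
# per-clip optical-flow features, then classify with an SVM.
# PCA is an assumed stand-in for the paper's revision step.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5000))   # flow features: 200 clips x 5000 dims (toy)
y = rng.integers(0, 3, size=200)   # 3 micro-expression classes (toy labels)

clf = make_pipeline(
    StandardScaler(),
    PCA(n_components=50),          # drop redundant dimensions
    SVC(kernel="linear"),
)
clf.fit(X[:150], y[:150])
print("accuracy:", clf.score(X[150:], y[150:]))
```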


2020 ◽  
Vol 14 (13) ◽  
pp. 1845-1854
Author(s):  
Yuan Hu ◽  
Hubert P. H. Shum ◽  
Edmond S. L. Ho

Author(s):  
Hongwei Ren ◽  
Chenghao Li ◽  
Xinyi Zhang ◽  
Chenchen Ding ◽  
Changhai Man ◽  
...  

Sensors ◽  
2019 ◽  
Vol 19 (23) ◽  
pp. 5142 ◽  
Author(s):  
Dong Liang ◽  
Jiaxing Pan ◽  
Han Sun ◽  
Huiyu Zhou

Foreground detection is an important theme in video surveillance. Conventional background modeling approaches build sophisticated temporal statistical models to detect the foreground from low-level features, whereas modern semantic/instance segmentation approaches generate high-level foreground annotations but ignore the temporal relevance among consecutive frames. In this paper, we propose a Spatio-Temporal Attention Model (STAM) for cross-scene foreground detection. To fill the semantic gap between low-level and high-level features, appearance and optical flow features are synthesized by attention modules during feature learning. Experimental results on the CDnet 2014 benchmark validate the model, which outperforms many state-of-the-art methods on seven evaluation metrics. With the attention modules and optical flow, the F-measure increases by 9% and 6%, respectively. Without any tuning, the model demonstrates cross-scene generalization on the Wallflower and PETS datasets. The processing speed is 10.8 fps at a frame size of 256 × 256.
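A schematic of attention-weighted fusion of appearance and optical-flow feature maps follows; the gating design is a generic assumption rather than the exact STAM modules from the paper.

```python
# Schematic attention fusion of appearance and optical-flow feature maps.
# The gating design is a generic assumption, not the exact STAM modules.
import torch
import torch.nn as nn

class AttentionFusion(nn.Module):
    def __init__(self, channels=64):
        super().__init__()
        # Predict a per-pixel weight from the concatenated streams.
        self.gate = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, 1, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, appearance, flow):
        # appearance, flow: (batch, channels, H, W)
        a = self.gate(torch.cat([appearance, flow], dim=1))
        return a * appearance + (1 - a) * flow  # attention-weighted blend

fusion = AttentionFusion()
fused = fusion(torch.randn(1, 64, 64, 64), torch.randn(1, 64, 64, 64))
```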

