An Efficient Video Prediction Recurrent Network using Focal Loss and Decomposed Tensor Train for Imbalance Dataset

Author(s): Mingshuo Liu, Kevin Han, Shiyi Luo, Mingze Pan, Mousam Hossain, ...

Author(s): Shijie Yang, Liang Li, Shuhui Wang, Dechao Meng, Qingming Huang, ...

2021
Author(s): Xiaojie Gao, Yueming Jin, Qi Dou, Chi-Wing Fu, Pheng-Ann Heng

2020 ◽ Vol 34 (07) ◽ pp. 13098-13105
Author(s): Linchao Zhu, Du Tran, Laura Sevilla-Lara, Yi Yang, Matt Feiszli, ...

Typical video classification methods often divide a video into short clips, perform inference on each clip independently, and then aggregate the clip-level predictions to produce the video-level result. However, processing visually similar clips independently ignores the temporal structure of the video sequence and increases the computational cost at inference time. In this paper, we propose a novel framework named FASTER, i.e., Feature Aggregation for Spatio-TEmporal Redundancy. FASTER aims to leverage the redundancy between neighboring clips and reduce the computational cost by learning to aggregate the predictions from models of different complexities. The FASTER framework can integrate high-quality representations from expensive models, which capture subtle motion information, with lightweight representations from cheap models, which cover scene changes in the video. A new recurrent network (i.e., FAST-GRU) is designed to aggregate this mixture of representations. Compared with existing approaches, FASTER reduces FLOPs by over 10× while maintaining state-of-the-art accuracy on popular datasets such as Kinetics, UCF-101, and HMDB-51.
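
As a rough illustration of this clip-level aggregation idea, the sketch below uses a plain PyTorch GRUCell to fuse clip descriptors, running a stand-in "expensive" encoder on every fourth clip and a stand-in "cheap" encoder on the rest. The encoders, feature dimensions, and the 1-in-4 schedule are assumptions made for illustration, not the paper's actual backbones or FAST-GRU design.

# Minimal sketch (assumed PyTorch), not the authors' implementation.
import torch
import torch.nn as nn

class ClipAggregator(nn.Module):
    def __init__(self, feat_dim=512, hidden_dim=512, num_classes=400, period=4):
        super().__init__()
        self.period = period
        # Stand-ins for the clip encoders: in the paper these would be a heavy
        # 3D-CNN ("expensive") and a lightweight model ("cheap").
        self.expensive = nn.Linear(feat_dim, feat_dim)
        self.cheap = nn.Linear(feat_dim, feat_dim)
        self.gru = nn.GRUCell(feat_dim, hidden_dim)      # recurrent aggregator over clips
        self.classifier = nn.Linear(hidden_dim, num_classes)

    def forward(self, clips):                            # clips: (batch, num_clips, feat_dim)
        h = clips.new_zeros(clips.size(0), self.gru.hidden_size)
        for t in range(clips.size(1)):
            # Run the costly encoder only on every `period`-th clip; reuse the
            # cheap one for the visually redundant clips in between.
            enc = self.expensive if t % self.period == 0 else self.cheap
            h = self.gru(enc(clips[:, t]), h)            # fold the clip into the running video state
        return self.classifier(h)                        # single video-level prediction

# Usage: 2 videos, 8 clips each, pre-pooled 512-d clip descriptors (assumed shapes).
logits = ClipAggregator()(torch.randn(2, 8, 512))

The point of the sketch is only the control flow: a recurrent cell carries a single video-level state across clips, so the expensive representation need not be recomputed for every visually similar clip.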


2020 ◽ Vol 10 (22) ◽ pp. 8288
Author(s): Kun Fan, Chungin Joung, Seungjun Baek

Video prediction, which maps a sequence of past video frames to realistic future frames, is a challenging task because it is difficult to generate realistic frames and to model the coherent relationship between consecutive frames. In this paper, we propose a hierarchical sequence-to-sequence prediction approach to address this challenge. We present an end-to-end trainable architecture in which the frame generator automatically encodes input frames into different levels of latent Convolutional Neural Network (CNN) features, and then recursively generates future frames conditioned on the estimated hierarchical CNN features and its previous predictions. The design is intended to automatically learn hierarchical representations of video and their temporal dynamics. Convolutional Long Short-Term Memory (ConvLSTM) units are combined with skip connections to separately capture the sequential structure at each level of the feature hierarchy. We adopt scheduled sampling to train the recurrent network, which facilitates convergence and yields high-quality sequence predictions. We evaluate our method on the Bouncing Balls, Moving MNIST, and KTH human action datasets, and report favorable results compared with existing methods.
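
The following is a minimal single-layer sketch of this kind of recurrent frame predictor with scheduled sampling, written in PyTorch. The single ConvLSTM layer, channel sizes, and fixed teacher-forcing probability are simplifying assumptions; the paper's model stacks multiple such layers over hierarchical CNN features with skip connections.

# Minimal sketch (assumed PyTorch), not the authors' implementation.
import random
import torch
import torch.nn as nn

class ConvLSTMCell(nn.Module):
    def __init__(self, in_ch, hid_ch, k=3):
        super().__init__()
        self.hid_ch = hid_ch
        # One convolution produces the input, forget, output, and candidate gates.
        self.gates = nn.Conv2d(in_ch + hid_ch, 4 * hid_ch, k, padding=k // 2)

    def forward(self, x, state):
        h, c = state
        i, f, o, g = torch.chunk(self.gates(torch.cat([x, h], dim=1)), 4, dim=1)
        c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
        h = torch.sigmoid(o) * torch.tanh(c)
        return h, c

class Predictor(nn.Module):
    def __init__(self, channels=1, hid_ch=32):
        super().__init__()
        self.cell = ConvLSTMCell(channels, hid_ch)
        self.to_frame = nn.Conv2d(hid_ch, channels, 1)   # decode hidden state back to a frame

    def forward(self, frames, n_future, teacher_prob=1.0):
        b, t, ch, hgt, wid = frames.shape
        h = frames.new_zeros(b, self.cell.hid_ch, hgt, wid)
        c = torch.zeros_like(h)
        x, outputs = frames[:, 0], []
        for step in range(1, t + n_future):
            h, c = self.cell(x, (h, c))
            pred = torch.sigmoid(self.to_frame(h))
            outputs.append(pred)
            if step < t and random.random() < teacher_prob:
                x = frames[:, step]    # scheduled sampling: sometimes feed the ground-truth frame
            else:
                x = pred               # otherwise feed the model's own prediction back in
        return torch.stack(outputs, dim=1)

# Usage: 10 context frames of 64x64 Moving-MNIST-like input, predict 10 more.
# In training, teacher_prob would be annealed toward 0 over epochs.
preds = Predictor()(torch.rand(2, 10, 1, 64, 64), n_future=10, teacher_prob=0.8)

Annealing teacher_prob is what makes scheduled sampling useful here: early in training the network sees mostly ground-truth inputs and converges quickly, while later it learns to consume its own (imperfect) predictions, which is the regime it faces at test time.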


1997
Author(s): William T. Farrar, Guy C. Van Orden

2020 ◽ Vol 28 (7) ◽ pp. 1480-1484
Author(s): Yun-hong LI, Hong-hao LI, Da WEN, Fan-su WEI, ...
