4D Effect Video Classification with Shot-Aware Frame Selection and Deep Neural Networks

Author(s): Thomhert S. Siadari, Mikyong Han, Hyunjin Yoon

Author(s): Rukiye Savran Kızıltepe, John Q. Gan, Juan José Escobar

Abstract: Combining convolutional neural networks (CNNs) and recurrent neural networks (RNNs) produces a powerful architecture for video classification, as spatial and temporal information can be processed simultaneously and effectively. Using transfer learning, this paper presents a comparative study of how temporal information can be exploited to improve video classification performance when CNNs and RNNs are combined in various architectures. To further improve the best-performing CNN-RNN combination, a novel action-template-based keyframe extraction method is proposed, which identifies the informative region of each frame and selects keyframes based on the similarity between those regions. Extensive experiments with ConvLSTM-based video classifiers have been conducted on the KTH and UCF-101 datasets. The experimental results, evaluated using one-way analysis of variance, show that the proposed keyframe extraction method significantly improves video classification accuracy.
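The abstract gives no implementation details, but the keyframe-selection idea can be illustrated with a short sketch. The code below is a minimal, simplified stand-in for the proposed action-template method, not the authors' implementation: the "informative region" of each frame is approximated by a frame-difference motion mask, and frames are chosen greedily so that the selected masks are maximally dissimilar. The function name and all parameters are illustrative assumptions.

```python
import numpy as np

def select_keyframes(frames: np.ndarray, k: int = 16) -> np.ndarray:
    """Pick k keyframes from a (T, H, W) grayscale clip.

    Sketch only: the informative region of each frame is approximated
    by a frame-difference motion mask, and frames are chosen greedily
    so the selected masks are maximally dissimilar (least redundant).
    """
    T = frames.shape[0]
    f = frames.astype(np.float32)
    # motion mask per frame: absolute difference from the previous frame
    masks = np.abs(f - np.roll(f, 1, axis=0)).reshape(T, -1)
    masks[0] = masks[1]                      # frame 0 has no predecessor
    norms = np.linalg.norm(masks, axis=1, keepdims=True)
    masks /= np.maximum(norms, 1e-8)         # unit vectors -> cosine similarity

    selected = [0]                           # always keep the first frame
    while len(selected) < min(k, T):
        sims = masks @ masks[selected].T     # (T, |selected|) similarities
        redundancy = sims.max(axis=1)        # similarity to nearest keyframe
        redundancy[selected] = np.inf        # never re-pick a chosen frame
        selected.append(int(redundancy.argmin()))
    return frames[np.sort(selected)]
```

A greedy farthest-point selection like this is one common way to trade coverage against redundancy; the paper's actual template construction and similarity measure may well differ.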


2020
Author(s): Alexander M. Conway, Ian N. Durbach, Alistair McInnes, Robert N. Harris

Abstract: Video data are widely collected in ecological studies, but manual annotation is a challenging and time-consuming task that has become a bottleneck for scientific research. Classification models based on convolutional neural networks (CNNs) have proved successful for annotating images, but few applications have extended these to video classification. We demonstrate an approach that combines a standard CNN, which summarizes each video frame, with a recurrent neural network (RNN) that models the temporal component of video. The approach is illustrated using two datasets: one collected by static video cameras detecting seal activity inside coastal salmon nets, and another collected by animal-borne cameras deployed on African penguins, used to classify behaviour. For penguins, the combined RNN-CNN raised test-set classification accuracy from 80% to 85% over an image-only model (a 25% relative reduction in error) and substantially improved classification precision or recall for four of six behaviour classes (by 12–17%). Image-only and video models classified seal activity with equally high accuracy (90%). Temporal patterns related to movement provide valuable information about animal behaviour, and classifiers benefit from including them explicitly. We recommend including temporal information whenever manual inspection suggests that movement is predictive of class membership.
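The general CNN+RNN pattern described here (a CNN summarizing each frame, an RNN modelling the sequence of frame features) can be sketched in a few lines of PyTorch. The backbone, RNN size, and clip shape below are assumptions made for illustration, not the architecture used in the study.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

class CnnRnnClassifier(nn.Module):
    """A CNN summarizes each frame; a GRU models the temporal sequence.
    Generic sketch of the CNN+RNN pattern, not the study's exact model."""

    def __init__(self, num_classes: int, hidden: int = 256):
        super().__init__()
        backbone = resnet18(weights=None)   # per-frame feature extractor (assumed)
        backbone.fc = nn.Identity()         # expose the 512-d pooled features
        self.cnn = backbone
        self.rnn = nn.GRU(input_size=512, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, num_classes)

    def forward(self, clips: torch.Tensor) -> torch.Tensor:
        # clips: (batch, time, 3, H, W)
        b, t = clips.shape[:2]
        feats = self.cnn(clips.flatten(0, 1)).view(b, t, -1)  # (b, t, 512)
        _, h = self.rnn(feats)              # h: (num_layers, batch, hidden)
        return self.head(h[-1])             # class logits per clip

# Usage example: 2 clips of 8 frames each, 6 behaviour classes
logits = CnnRnnClassifier(num_classes=6)(torch.randn(2, 8, 3, 224, 224))
```

Using the final hidden state of the RNN as the clip representation is the simplest choice; pooling over all timesteps or attending to them are common alternatives when behaviour cues are spread across the clip.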


Author(s): Alex Hernández-García, Johannes Mehrer, Nikolaus Kriegeskorte, Peter König, Tim C. Kietzmann

2018
Author(s): Chi Zhang, Xiaohan Duan, Ruyuan Zhang, Li Tong
