4D Effect Video Classification with Shot-Aware Frame Selection and Deep Neural Networks

Author(s): Thomhert S. Siadari, Mikyong Han, Hyunjin Yoon

Author(s): Rukiye Savran Kızıltepe, John Q. Gan, Juan José Escobar

Abstract: Combining convolutional neural networks (CNNs) and recurrent neural networks (RNNs) produces a powerful architecture for video classification, as spatial and temporal information can be processed simultaneously and effectively. Using transfer learning, this paper presents a comparative study of how temporal information can be exploited to improve video classification performance when CNNs and RNNs are combined in various architectures. To further improve the best-performing CNN-RNN combination, a novel action-template-based keyframe extraction method is proposed, which identifies the informative region of each frame and selects keyframes based on the similarity between those regions. Extensive experiments with ConvLSTM-based video classifiers have been conducted on the KTH and UCF-101 datasets. The experimental results, evaluated using one-way analysis of variance, show that the proposed keyframe extraction method significantly improves video classification accuracy.
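The abstract gives no implementation details, but the keyframe-selection idea can be illustrated with a short sketch. The code below is a minimal, simplified stand-in for the proposed action-template method, not the authors' implementation: the "informative region" of each frame is approximated by a frame-difference motion mask, and frames are chosen greedily so that the selected masks are maximally dissimilar. The function name and all parameters are illustrative assumptions.

```python
import numpy as np

def select_keyframes(frames: np.ndarray, k: int = 16) -> np.ndarray:
    """Pick k keyframes from a (T, H, W) grayscale clip.

    Sketch only: the informative region of each frame is approximated
    by a frame-difference motion mask, and frames are chosen greedily
    so the selected masks are maximally dissimilar (least redundant).
    """
    T = frames.shape[0]
    f = frames.astype(np.float32)
    # motion mask per frame: absolute difference from the previous frame
    masks = np.abs(f - np.roll(f, 1, axis=0)).reshape(T, -1)
    masks[0] = masks[1]                      # frame 0 has no predecessor
    norms = np.linalg.norm(masks, axis=1, keepdims=True)
    masks /= np.maximum(norms, 1e-8)         # unit vectors -> cosine similarity

    selected = [0]                           # always keep the first frame
    while len(selected) < min(k, T):
        sims = masks @ masks[selected].T     # (T, |selected|) similarities
        redundancy = sims.max(axis=1)        # similarity to nearest keyframe
        redundancy[selected] = np.inf        # never re-pick a chosen frame
        selected.append(int(redundancy.argmin()))
    return frames[np.sort(selected)]
```

A greedy farthest-point selection like this is one common way to trade coverage against redundancy; the paper's actual template construction and similarity measure may well differ.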


2020
Author(s): Alexander M. Conway, Ian N. Durbach, Alistair McInnes, Robert N. Harris

Abstract: Video data are widely collected in ecological studies, but manual annotation is a challenging and time-consuming task that has become a bottleneck for scientific research. Classification models based on convolutional neural networks (CNNs) have proved successful for annotating images, but few applications have extended these to video classification. We demonstrate an approach that combines a standard CNN, which summarizes each video frame, with a recurrent neural network (RNN) that models the temporal component of video. The approach is illustrated using two datasets: one collected by static video cameras detecting seal activity inside coastal salmon nets, and another collected by animal-borne cameras deployed on African penguins, used to classify behaviour. For penguins, the combined RNN-CNN raised test-set classification accuracy from 80% to 85% over an image-only model (a 25% relative reduction in error) and substantially improved classification precision or recall for four of six behaviour classes (by 12–17%). Image-only and video models classified seal activity with equally high accuracy (90%). Temporal patterns related to movement provide valuable information about animal behaviour, and classifiers benefit from including them explicitly. We recommend including temporal information whenever manual inspection suggests that movement is predictive of class membership.
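The general CNN+RNN pattern described here (a CNN summarizing each frame, an RNN modelling the sequence of frame features) can be sketched in a few lines of PyTorch. The backbone, RNN size, and clip shape below are assumptions made for illustration, not the architecture used in the study.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

class CnnRnnClassifier(nn.Module):
    """A CNN summarizes each frame; a GRU models the temporal sequence.
    Generic sketch of the CNN+RNN pattern, not the study's exact model."""

    def __init__(self, num_classes: int, hidden: int = 256):
        super().__init__()
        backbone = resnet18(weights=None)   # per-frame feature extractor (assumed)
        backbone.fc = nn.Identity()         # expose the 512-d pooled features
        self.cnn = backbone
        self.rnn = nn.GRU(input_size=512, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, num_classes)

    def forward(self, clips: torch.Tensor) -> torch.Tensor:
        # clips: (batch, time, 3, H, W)
        b, t = clips.shape[:2]
        feats = self.cnn(clips.flatten(0, 1)).view(b, t, -1)  # (b, t, 512)
        _, h = self.rnn(feats)              # h: (num_layers, batch, hidden)
        return self.head(h[-1])             # class logits per clip

# Usage example: 2 clips of 8 frames each, 6 behaviour classes
logits = CnnRnnClassifier(num_classes=6)(torch.randn(2, 8, 3, 224, 224))
```

Using the final hidden state of the RNN as the clip representation is the simplest choice; pooling over all timesteps or attending to them are common alternatives when behaviour cues are spread across the clip.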


Author(s): Alex Hernández-García, Johannes Mehrer, Nikolaus Kriegeskorte, Peter König, Tim C. Kietzmann

2018
Author(s): Chi Zhang, Xiaohan Duan, Ruyuan Zhang, Li Tong
