Exploring Convolutional Recurrent architectures for anomaly detection in videos: a comparative study
Abstract
Convolutional Recurrent architectures are currently preferred over 3D convolutional networks for spatio-temporal learning tasks in videos, because the latter carry a heavy computational burden. It is therefore important to understand how different architectural configurations behave. Most current work on visual learning, especially for video anomaly detection, predominantly employs ConvLSTM networks and pays less attention to other Convolutional Recurrent variants for temporal learning. This motivates a study of the possible variants, so that informed design choices can be made according to the nature of the application at hand. We explore a variety of Convolutional Recurrent architectures and the influence of hyper-parameters on their performance for the task of anomaly detection. We also aim to quantify the efficiency of the architectures in terms of the trade-off between their performance and computational complexity. With comprehensive quantitative and visual evidence, we establish that ConvGRU-based configurations are the most effective and outperform the popular ConvLSTM configurations on video anomaly detection tasks, in contrast to what the literature suggests.