Multi-scale and multi-column convolutional neural network for crowd density estimation

In this paper, we propose a novel congested crowd counting network for crowd density estimation, i.e., the Adaptive Multi-scale Context Aggregation Network (MSCANet). MSCANet efficiently leverages the spatial context information to accomplish crowd density estimation in a complicated crowd scene. To achieve this, a multi-scale context learning block, called the Multi-scale Context Aggregation module (MSCA), is proposed to first extract different scale information and then adaptively aggregate it to capture the full scale of the crowd. Employing multiple MSCAs in a cascaded manner, the MSCANet can deeply utilize the spatial context information and modulate preliminary features into more distinguishing and scale-sensitive features, which are finally applied to a 1 × 1 convolution operation to obtain the crowd density results. Extensive experiments on three challenging crowd counting benchmarks showed that our model yielded compelling performance against the other state-of-the-art methods. To thoroughly prove the generality of MSCANet, we extend our method to two relevant tasks: crowd localization and remote sensing object counting. The extension experiment results also confirmed the effectiveness of MSCANet.

Download Full-text

Crowd counting via Multi-Scale Adversarial Convolutional Neural Networks

Journal of Intelligent Systems ◽

10.1515/jisys-2019-0157 ◽

2020 ◽

Vol 30 (1) ◽

pp. 180-191

Author(s):

Liping Zhu ◽

Hong Zhang ◽

Sikandar Ali ◽

Baoli Yang ◽

Chengyang Li

Keyword(s):

Neural Network ◽

Convolutional Neural Network ◽

Density Estimation ◽

Large Scale ◽

Receptive Fields ◽

Crowd Counting ◽

Multi Scale ◽

Training Scheme ◽

Joint Training ◽

End To End

Abstract The purpose of crowd counting is to estimate the number of pedestrians in crowd images. Crowd counting or density estimation is an extremely challenging task in computer vision, due to large scale variations and dense scene. Current methods solve these issues by compounding multi-scale Convolutional Neural Network with different receptive fields. In this paper, a novel end-to-end architecture based on Multi-Scale Adversarial Convolutional Neural Network (MSA-CNN) is proposed to generate crowd density and estimate the amount of crowd. Firstly, a multi-scale network is used to extract the globally relevant features in the crowd image, and then fractionally-strided convolutional layers are designed for up-sampling the output to recover the loss of crucial details caused by the earlier max pooling layers. An adversarial loss is directly employed to shrink the estimated value into the realistic subspace to reduce the blurring effect of density estimation. Joint training is performed in an end-to-end fashion using a combination of Adversarial loss and Euclidean loss. The two losses are integrated via a joint training scheme to improve density estimation performance.We conduct some extensive experiments on available datasets to show the significant improvements and supremacy of the proposed approach over the available state-of-the-art approaches.

Download Full-text