3D Crowd Counting via Multi-View Fusion with 3D Gaussian Kernels

2020 ◽  
Vol 34 (07) ◽  
pp. 12837-12844
Author(s):  
Qi Zhang ◽  
Antoni B. Chan

Crowd counting has been studied for decades, and many methods have achieved good performance, especially DNN-based density map estimation methods. Most existing crowd counting works focus on single-view counting, while few have studied multi-view counting for large and wide scenes, where multiple cameras are used. Recently, an end-to-end multi-view crowd counting method called multi-view multi-scale (MVMS) was proposed, which fuses multiple camera views using a CNN to predict a 2D scene-level density map on the ground plane. Unlike MVMS, we propose to solve the multi-view crowd counting task through 3D feature fusion with 3D scene-level density maps instead of 2D ground-plane ones. Compared to 2D fusion, 3D fusion extracts more information about the people along the z-dimension (height), which helps to address the scale variations across multiple views. The 3D density maps preserve the property of 2D density maps that the sum is the count, while also providing 3D information about the crowd density. We also explore the projection consistency between the 3D prediction and the ground truth in the 2D views to further enhance the counting performance. The proposed method is tested on three multi-view counting datasets and achieves better or comparable counting performance to the state of the art.
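The property that a 3D density map sums to the count can be illustrated with a minimal sketch. It assumes an isotropic Gaussian with a fixed `sigma`, a small voxel grid, and per-person normalisation over the grid; the function names, grid size, and head positions are hypothetical and are not taken from the paper.

```python
import math

def gaussian_3d_density(points, shape, sigma=1.0):
    """Build a 3D density map indexed [x][y][z] with one unit-mass Gaussian per point."""
    nx, ny, nz = shape
    density = [[[0.0] * nz for _ in range(ny)] for _ in range(nx)]
    for (px, py, pz) in points:
        kernel = {}
        total = 0.0
        for x in range(nx):
            for y in range(ny):
                for z in range(nz):
                    d2 = (x - px) ** 2 + (y - py) ** 2 + (z - pz) ** 2
                    w = math.exp(-d2 / (2.0 * sigma ** 2))
                    kernel[(x, y, z)] = w
                    total += w
        # Normalise so that each person contributes exactly one unit of mass.
        for (x, y, z), w in kernel.items():
            density[x][y][z] += w / total
    return density

def total_count(density):
    """Sum every voxel; this equals the number of annotated people."""
    return sum(v for plane in density for row in plane for v in row)

def project_to_ground(density):
    """Collapse the z (height) axis to get a 2D ground-plane density map."""
    return [[sum(column) for column in plane] for plane in density]

# Hypothetical head positions (x, y, z) in voxel coordinates.
heads = [(2, 3, 4), (5, 5, 4), (7, 1, 3)]
density = gaussian_3d_density(heads, shape=(10, 8, 6), sigma=1.0)
ground = project_to_ground(density)
count_3d = total_count(density)
count_2d = sum(sum(row) for row in ground)
```

Summing over the height axis gives a ground-plane map with the same total, which is why the 3D representation keeps the counting property of the 2D one.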

2021 ◽  
Vol 2021 ◽  
pp. 1-12
Author(s):  
Pengfei Li ◽  
Min Zhang ◽  
Jian Wan ◽  
Ming Jiang

The most advanced method for crowd counting uses a fully convolutional network that extracts image features and then generates a crowd density map. However, this process often encounters multiscale and contextual loss problems. To address these problems, we propose a multiscale aggregation network (MANet) that includes a feature extraction encoder (FEE) and a density map decoder (DMD). The FEE uses a cascaded scale pyramid network to extract multiscale features and obtains contextual features through dense connections. The DMD uses deconvolution and fusion operations to generate features containing detailed information. These features can be further converted into high-quality density maps to accurately calculate the number of people in a crowd. An empirical comparison using four mainstream datasets (ShanghaiTech, WorldExpo’10, UCF_CC_50, and SmartCity) shows that the proposed method is more effective in terms of the mean absolute error and mean squared error. The source code is available at https://github.com/lpfworld/MANet.
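The two evaluation metrics named above can be sketched as follows. Crowd counting papers often report the root of the mean squared error under the name "MSE"; whether this paper does so is an assumption here, so the function is named explicitly. The counts in the example are made up for illustration.

```python
import math

def mean_absolute_error(predicted, actual):
    """Average absolute difference between predicted and true counts."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

def root_mean_squared_error(predicted, actual):
    """Square root of the average squared count difference."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))

# Hypothetical per-image counts, not results from any paper.
predicted_counts = [105, 48, 310]
true_counts = [100, 50, 300]
mae = mean_absolute_error(predicted_counts, true_counts)
rmse = root_mean_squared_error(predicted_counts, true_counts)
```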


2018 ◽  
Vol 8 (12) ◽  
pp. 2367 ◽  
Author(s):  
Hongling Luo ◽  
Jun Sang ◽  
Weiqun Wu ◽  
Hong Xiang ◽  
Zhili Xiang ◽  
...  

In recent years, trampling events caused by overcrowding have occurred frequently, which creates a demand for crowd counting in high-density environments. At present, there are few studies on monitoring crowds in large-scale crowded environments, and the existing technology has drawbacks and lacks mature systems. To solve the crowd counting problem at high density in complex environments, this paper proposes FF-CNN (Feature Fusion of Convolutional Neural Network), a deep convolutional neural network method based on feature fusion. FF-CNN maps a crowd image to its crowd density map and then obtains the head count by integration. Geometry-adaptive kernels are used to generate high-quality density maps, which serve as ground truths for network training. Deconvolution is used to fuse high-level and low-level features into richer features, and two loss functions, i.e., density map loss and absolute count loss, are used for joint optimization. To increase sample diversity, the original images are randomly cropped at each iteration. The experimental results of FF-CNN on the ShanghaiTech public dataset show that fusing low-level and high-level features extracts richer features, which improves the precision of density map estimation and, in turn, the accuracy of crowd counting.
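A minimal sketch of geometry-adaptive ground-truth generation and a joint loss, under stated assumptions: each head gets a Gaussian whose spread is `beta` times the mean distance to its `k` nearest neighbours, which is the common formulation of geometry-adaptive kernels; the values `k=3` and `beta=0.3`, the pixel-wise squared error, and `count_weight` are illustrative choices, not parameters reported for FF-CNN.

```python
import math

def adaptive_sigmas(heads, k=3, beta=0.3, default_sigma=4.0):
    """One sigma per head: beta times the mean distance to its k nearest neighbours."""
    sigmas = []
    for i, (xi, yi) in enumerate(heads):
        dists = sorted(math.hypot(xi - xj, yi - yj)
                       for j, (xj, yj) in enumerate(heads) if j != i)
        nearest = dists[:k]
        sigmas.append(beta * sum(nearest) / len(nearest) if nearest else default_sigma)
    return sigmas

def density_map(heads, height, width, sigmas):
    """Sum of per-head Gaussians, each normalised to unit mass (heads assumed on the grid)."""
    dmap = [[0.0] * width for _ in range(height)]
    for (hx, hy), s in zip(heads, sigmas):
        weights = [[math.exp(-((x - hx) ** 2 + (y - hy) ** 2) / (2.0 * s * s))
                    for x in range(width)] for y in range(height)]
        total = sum(map(sum, weights))
        for y in range(height):
            for x in range(width):
                dmap[y][x] += weights[y][x] / total
    return dmap

def joint_loss(predicted, target, count_weight=0.01):
    """Pixel-wise squared error plus a weighted absolute count error (weight is hypothetical)."""
    pixels = sum(len(row) for row in target)
    density_loss = sum((p - t) ** 2 for pr, tr in zip(predicted, target)
                       for p, t in zip(pr, tr)) / pixels
    count_loss = abs(sum(map(sum, predicted)) - sum(map(sum, target)))
    return density_loss + count_weight * count_loss

# Hypothetical (x, y) head annotations: three close together, one isolated.
heads = [(5, 5), (7, 5), (6, 8), (20, 20)]
sigmas = adaptive_sigmas(heads)
target = density_map(heads, height=24, width=24, sigmas=sigmas)
head_count = sum(map(sum, target))
```

Heads in the dense cluster receive a smaller sigma than the isolated head, and integrating (summing) the map recovers the head count.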


2020 ◽  
Author(s):  
Liu Bai ◽  
Cheng Wu ◽  
Yufeng Lin ◽  
Jin Zhang ◽  
Jie Sheng ◽  
...  

With the rapid growth of the world's population and the rapid development of urbanization, the safety of crowd gatherings has aroused widespread concern in society. Extensive video surveillance systems provide rich data support for dense crowd management. Video-based crowd counting and density estimation methods are the core technologies for ensuring the safety of crowd gatherings. Unlike single-view video analysis, cross-source multi-view multi-granularity video contains more cross-information. The complementary sharing of this information helps to solve problems such as occlusion in current crowd counting. Therefore, this article proposes a crowd counting method based on cross-source multi-view and multi-granularity video distributed information fusion. By establishing a distributed structure across different cameras that matches low-altitude and high-altitude views, it uses the fine-grained, high-resolution local information from low-altitude images to correct and supplement the global information from high-altitude images, so as to calculate a more accurate global number and density of people. The method is applied to the landmark building of Suzhou Life City Square. The changes in the number of people and their movement during evacuation are analyzed and evaluated, and good results are obtained.


2020 ◽  
Vol 31 (8) ◽  
pp. 2705-2715 ◽  
Author(s):  
Xiaoheng Jiang ◽  
Li Zhang ◽  
Pei Lv ◽  
Yibo Guo ◽  
Ruijie Zhu ◽  
...  

2021 ◽  
Vol 2021 ◽  
pp. 1-10
Author(s):  
Jingfan Tang ◽  
Meijia Zhou ◽  
Pengfei Li ◽  
Min Zhang ◽  
Ming Jiang

Current crowd counting methods rely on a fully convolutional network to generate a density map, which can achieve good performance. However, due to crowd occlusion and perspective distortion in the image, the directly generated density map usually neglects scale information and spatial contact information. To address this, we propose MDPDNet (Multiresolution Density maps and Parallel Dilated convolutions’ Network) to reduce the influence of occlusion and distortion on crowd estimation. This network is composed of two modules: (1) the parallel dilated convolution module (PDM), which combines three dilated convolutions in parallel to obtain deep features over a larger receptive field with fewer parameters while reducing the loss of multiscale information; (2) the multiresolution density map module (MDM), which contains a three-branch network for extracting spatial contact information from three different low-resolution density maps as the feature input of the final crowd density map. Experiments show that MDPDNet achieved excellent results on three mainstream datasets (ShanghaiTech, UCF_CC_50, and UCF-QNRF).
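The idea of running dilated convolutions in parallel can be sketched in one dimension. This is an illustration only: the dilation rates (1, 2, 3), the all-ones kernel, and fusion by element-wise sum are assumptions and do not describe the actual PDM design.

```python
def effective_kernel_size(kernel_size, dilation):
    """Span covered by a dilated kernel; the number of weights stays kernel_size."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)

def dilated_conv1d_same(signal, kernel, dilation):
    """Zero-padded 1D correlation with dilated taps; output length equals input length.

    Assumes an odd kernel length so the padding is symmetric.
    """
    pad = (len(kernel) - 1) * dilation // 2
    padded = [0.0] * pad + list(signal) + [0.0] * pad
    return [sum(kernel[j] * padded[i + j * dilation] for j in range(len(kernel)))
            for i in range(len(signal))]

def parallel_dilated(signal, kernel, dilations=(1, 2, 3)):
    """Run one branch per dilation rate and fuse the branches by element-wise sum."""
    branches = [dilated_conv1d_same(signal, kernel, d) for d in dilations]
    return [sum(values) for values in zip(*branches)]

# A unit impulse shows which positions each branch reaches.
impulse = [0.0] * 11
impulse[5] = 1.0
kernel = [1.0, 1.0, 1.0]
branch_d2 = dilated_conv1d_same(impulse, kernel, 2)
fused = parallel_dilated(impulse, kernel)
spans = [effective_kernel_size(3, d) for d in (1, 2, 3)]
```

Each branch uses the same three weights, but the span it covers grows with the dilation rate, which is how a larger receptive field is obtained without extra parameters.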


Author(s):  
Rodolfo Quispe ◽  
Darwin Ttito ◽  
Adín Rivera ◽  
Helio Pedrini

Crowd scene analysis has received a lot of attention recently due to a wide variety of applications, e.g., forensic science, urban planning, surveillance and security. In this context, a challenging task is crowd counting [1–6], whose main purpose is to estimate the number of people present in a single image. A multi-stream convolutional neural network is developed and evaluated in this paper, which receives an image as input and produces a density map that represents the spatial distribution of people in an end-to-end fashion. To address complex crowd counting issues, such as extremely unconstrained scale and perspective changes, the network architecture uses receptive fields with different filter sizes for each stream. In addition, we investigate the influence of the two most common approaches to ground-truth generation and propose a hybrid method based on tiny face detection and scale interpolation. Experiments conducted on two challenging datasets, UCF-CC-50 and ShanghaiTech, demonstrate that our ground-truth generation methods achieve superior results.


Author(s):  
Hui Lin ◽  
Xiaopeng Hong ◽  
Zhiheng Ma ◽  
Xing Wei ◽  
Yunfeng Qiu ◽  
...  

Traditional crowd counting approaches usually use a Gaussian assumption to generate pseudo density ground truth, which suffers from problems such as inaccurate estimation of the Gaussian kernel sizes. In this paper, we propose a new measure-based counting approach that regresses the predicted density maps directly to the scattered point-annotated ground truth. First, crowd counting is formulated as a measure matching problem. Second, we derive a semi-balanced form of Sinkhorn divergence, based on which a Sinkhorn counting loss is designed for measure matching. Third, we propose a self-supervised mechanism by devising a Sinkhorn scale consistency loss to resist scale changes. Finally, an efficient optimization method is provided to minimize the overall loss function. Extensive experiments on four challenging crowd counting datasets, namely ShanghaiTech, UCF-QNRF, JHU++ and NWPU, have validated the proposed method.
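To make the measure matching idea concrete, the sketch below runs plain balanced Sinkhorn iterations for entropy-regularised optimal transport between a predicted density (mass on pixels) and point annotations (unit mass per person). This is a simplification, not the semi-balanced Sinkhorn divergence derived in the paper: it requires both measures to have equal total mass, and `eps`, `iters`, and the toy data are assumptions.

```python
import math

def sinkhorn_plan(a, b, cost, eps=0.5, iters=200):
    """Entropy-regularised transport plan between histograms a and b (equal total mass)."""
    n, m = len(a), len(b)
    gibbs = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(iters):
        # Alternately rescale rows and columns to match the two marginals.
        u = [a[i] / sum(gibbs[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(gibbs[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * gibbs[i][j] * v[j] for j in range(m)] for i in range(n)]

# Toy 1D example: five pixels with predicted mass, two annotated points.
pixel_pos = [0.0, 0.25, 0.5, 0.75, 1.0]
predicted_mass = [0.9, 0.1, 0.0, 0.2, 0.8]   # sums to 2.0
point_pos = [0.0, 1.0]
point_mass = [1.0, 1.0]                       # one unit per person

cost = [[(x - y) ** 2 for y in point_pos] for x in pixel_pos]
plan = sinkhorn_plan(predicted_mass, point_mass, cost)
transport_cost = sum(plan[i][j] * cost[i][j]
                     for i in range(len(pixel_pos)) for j in range(len(point_pos)))
row_sums = [sum(row) for row in plan]
col_sums = [sum(plan[i][j] for i in range(len(pixel_pos))) for j in range(len(point_pos))]
```

The transport cost is small when predicted mass sits close to the annotated points, so it can act as a loss without first blurring the annotations into a Gaussian pseudo ground truth.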


Author(s):  
H.A. Cohen ◽  
T.W. Jeng ◽  
W. Chiu

This tutorial will discuss the methodology of low dose electron diffraction and imaging of crystalline biological objects, the problems of data interpretation for two-dimensional projected density maps of glucose embedded protein crystals, the factors to be considered in combining tilt data from three-dimensional crystals, and finally, the prospects of achieving a high resolution three-dimensional density map of a biological crystal. This methodology will be illustrated using two proteins under investigation in our laboratory, the T4 DNA helix destabilizing protein gp32*I and the crotoxin complex crystal.


2020 ◽  
Vol 2020 (10) ◽  
pp. 64-1-64-5
Author(s):  
Mustafa I. Jaber ◽  
Christopher W. Szeto ◽  
Bing Song ◽  
Liudmila Beziaeva ◽  
Stephen C. Benz ◽  
...  

In this paper, we propose a patch-based system to classify non-small cell lung cancer (NSCLC) diagnostic whole slide images (WSIs) into two major histopathological subtypes: adenocarcinoma (LUAD) and squamous cell carcinoma (LUSC). Classifying patients accurately is important for prognosis and therapy decisions. The proposed system was trained and tested on 876 subtyped NSCLC gigapixel-resolution diagnostic WSIs from 805 patients – 664 in the training set and 141 in the test set. The algorithm has modules for: 1) auto-generated tumor/non-tumor masking using a trained residual neural network (ResNet34), 2) cell-density map generation (based on color deconvolution, local drain segmentation, and watershed transformation), 3) patch-level feature extraction using a pre-trained ResNet34, 4) a tower of linear SVMs for different cell ranges, and 5) a majority voting module for aggregating subtype predictions in unseen testing WSIs. The proposed system was trained and tested on several WSI magnifications ranging from x4 to x40 with a best ROC AUC of 0.95 and an accuracy of 0.86 in test samples. This fully-automated histopathology subtyping method outperforms similar published state-of-the-art methods for diagnostic WSIs.
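The final aggregation step, majority voting over patch-level predictions, can be sketched as below. The function name and the tie-breaking rule (the label seen first wins) are assumptions; the abstract does not say how ties or per-classifier votes are handled.

```python
from collections import Counter

def majority_vote(patch_predictions):
    """Return the most frequent label; on a tie, the label encountered first wins."""
    counts = Counter(patch_predictions)
    label, _ = counts.most_common(1)[0]
    return label

# Hypothetical patch-level subtype predictions for one whole slide image.
patch_labels = ["LUAD", "LUSC", "LUAD", "LUAD", "LUSC"]
slide_label = majority_vote(patch_labels)
```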

