Multi-Stream Networks and Ground Truth Generation for Crowd Counting

Author(s):  
Rodolfo Quispe ◽  
Darwin Ttito ◽  
Adín Rivera ◽  
Helio Pedrini

Crowd scene analysis has received considerable attention recently due to a wide variety of applications, e.g., forensic science, urban planning, surveillance and security. In this context, a challenging task is crowd counting [1–6], whose main purpose is to estimate the number of people present in a single image. In this paper, we develop and evaluate a multi-stream convolutional neural network that receives an image as input and, in an end-to-end fashion, produces a density map representing the spatial distribution of people. To address complex crowd counting issues, such as extremely unconstrained scale and perspective changes, the network architecture uses receptive fields with different filter sizes for each stream. In addition, we investigate the influence of the two most common approaches to ground truth generation and propose a hybrid method based on tiny face detection and scale interpolation. Experiments conducted on two challenging datasets, UCF-CC-50 and ShanghaiTech, demonstrate that our ground truth generation methods achieve superior results.
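The density-map ground truths discussed above are typically built by stamping a unit-mass Gaussian on each annotated head, so the map sums to the person count. A minimal fixed-sigma sketch in plain Python (the function name and the 3-sigma truncation are illustrative choices, not taken from the paper):

```python
import math

def gaussian_density_map(heads, h, w, sigma=2.0):
    """Place a unit-mass Gaussian at each annotated head position so
    that the resulting map sums to the person count."""
    dmap = [[0.0] * w for _ in range(h)]
    r = int(3 * sigma)                       # truncate the kernel at 3 sigma
    for cy, cx in heads:
        stamp, total = [], 0.0
        for y in range(max(0, cy - r), min(h, cy + r + 1)):
            for x in range(max(0, cx - r), min(w, cx + r + 1)):
                g = math.exp(-((y - cy) ** 2 + (x - cx) ** 2) / (2 * sigma ** 2))
                stamp.append((y, x, g))
                total += g
        for y, x, g in stamp:                # normalize each head to mass 1
            dmap[y][x] += g / total
    return dmap

dmap = gaussian_density_map([(10, 10), (10, 30), (25, 20)], 40, 40)
count = sum(v for row in dmap for v in row)   # ≈ 3.0, one unit per head
```

Normalizing each stamp to unit mass is what makes "integrate the map" equal "count the people", even near image borders where the kernel is clipped.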

Symmetry ◽  
2021 ◽  
Vol 13 (4) ◽  
pp. 703
Author(s):  
Jun Zhang ◽  
Jiaze Liu ◽  
Zhizhong Wang

Owing to the increased use of urban rail transit, the flow of passengers on metro platforms tends to increase sharply during peak periods. Monitoring passenger flow in such areas is important for security-related reasons. In this paper, to solve the problem of metro platform passenger flow detection, we propose a CNN (convolutional neural network)-based network called the MP (metro platform)-CNN to accurately count people on metro platforms. The proposed method is composed of three major components: a group of convolutional neural networks on the front end extracts image features, a multiscale feature extraction module enhances multiscale features, and transposed convolution performs upsampling to generate a high-quality density map. Existing crowd-counting datasets do not adequately cover all of the challenging situations considered in this study. Therefore, we collected images from surveillance videos of a metro platform to form a dataset containing 627 images with 9243 annotated heads. The results of extensive experiments showed that our method performed well on the self-built dataset, achieving the lowest estimation error. Moreover, the proposed method is competitive with other methods on four standard crowd-counting datasets.
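The transposed-convolution upsampling stage follows the usual size arithmetic, run in reverse relative to an ordinary convolution. A minimal sketch (the kernel-4/stride-2/padding-1 setting is a common doubling choice, not necessarily the paper's):

```python
def tconv_out(n, kernel, stride, padding=0, output_padding=0):
    """Output length of a transposed convolution along one axis --
    the inverse of the standard convolution size formula."""
    return (n - 1) * stride - 2 * padding + kernel + output_padding

# A common upsampling setting doubles the feature map on the way back
# toward a full-resolution density map, e.g. 28 -> 56.
doubled = tconv_out(28, kernel=4, stride=2, padding=1)
```

Stacking such layers lets a network recover the input resolution lost to pooling before emitting the final density map.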


2021 ◽  
Vol 7 (2) ◽  
pp. 21
Author(s):  
Roland Perko ◽  
Manfred Klopschitz ◽  
Alexander Almer ◽  
Peter M. Roth

Many scientific studies deal with person counting and density estimation from single images. Recently, convolutional neural networks (CNNs) have been applied to these tasks. Even though better results are often reported, it is frequently unclear where the improvements come from and whether the proposed approaches would generalize. Thus, the main goal of this paper was to identify the critical aspects of these tasks and to show how they limit state-of-the-art approaches. Based on these findings, we show how to mitigate these limitations. To this end, we implemented a CNN-based baseline approach, which we extended to deal with the identified problems. These include bias in the reference datasets, ambiguity in ground truth generation, and a mismatch between the evaluation metrics and the training loss function. The experimental results show that our modifications significantly outperform the baseline in terms of the accuracy of person counts and density estimation. In this way, we get a deeper understanding of CNN-based person density estimation beyond the network architecture. Furthermore, our insights can help advance the field of person density estimation in general by highlighting current limitations in its evaluation protocols.
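The metric/loss mismatch the authors identify can be made concrete: a pixel-wise MSE training loss can rate two predictions identically while their image-level count errors differ sharply. A small sketch with invented numbers:

```python
def count_mae(pred_counts, gt_counts):
    """Evaluation metric: mean absolute error of image-level counts."""
    return sum(abs(p - g) for p, g in zip(pred_counts, gt_counts)) / len(gt_counts)

def pixel_mse(pred_map, gt_map):
    """Typical training loss: mean squared error per density-map pixel."""
    pairs = [(p, g) for pr, gr in zip(pred_map, gt_map) for p, g in zip(pr, gr)]
    return sum((p - g) ** 2 for p, g in pairs) / len(pairs)

gt    = [[0.1] * 10]          # ground-truth map, total count 1.0
over  = [[0.2] * 10]          # +0.1 everywhere: count off by 1.0
mixed = [[0.2, 0.0] * 5]      # errors of +/-0.1 that cancel: count error ~0

loss_over, loss_mixed = pixel_mse(over, gt), pixel_mse(mixed, gt)  # equal
mae_over  = count_mae([sum(over[0])],  [sum(gt[0])])               # ~1.0
mae_mixed = count_mae([sum(mixed[0])], [sum(gt[0])])               # ~0.0
```

The loss treats both predictions the same, yet only one produces the right count, which is exactly the evaluation-protocol gap the paper highlights.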


2017 ◽  
Vol 2017 ◽  
pp. 1-11 ◽  
Author(s):  
Siqi Tang ◽  
Zhisong Pan ◽  
Xingyu Zhou

This paper proposes an accurate crowd counting method based on a convolutional neural network and low-rank and sparse structure. To this end, we first propose an effective deep-fusion convolutional neural network to improve density map regression accuracy. Furthermore, we observe that most existing CNN-based crowd counting methods obtain the overall count by direct integration of the estimated density map, which limits counting accuracy. Instead of direct integration, we adopt a regression method based on a low-rank and sparse penalty to improve the accuracy of the projection from density map to global count. Experiments demonstrate the importance of this regression step in improving crowd counting performance. The proposed low-rank and sparse based deep-fusion convolutional neural network (LFCNN) outperforms existing crowd counting methods and achieves state-of-the-art performance.
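As a toy illustration of why a learned projection can beat the direct integral, here is a plain least-squares correction; it is a much simpler stand-in for the paper's low-rank-and-sparse regression, and the systematic 10% undercount is invented for the example:

```python
def fit_affine(integrals, counts):
    """Closed-form least-squares fit of count ~ w*integral + b, a
    minimal substitute for a learned density-map-to-count projection."""
    n = len(integrals)
    mx, my = sum(integrals) / n, sum(counts) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(integrals, counts))
    var = sum((x - mx) ** 2 for x in integrals)
    w = cov / var
    return w, my - w * mx

# Density maps that systematically undercount by 10% on training images:
w, b = fit_affine([9.0, 18.0, 27.0], [10.0, 20.0, 30.0])
corrected = w * 45.0 + b   # ~50, vs. the raw integral of 45
```

Any consistent bias in the density map is absorbed by the regression, which the direct integral cannot do.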


Author(s):  
Han Jia ◽  
Xuecheng Zou

A major problem in counting high-density crowded scenes is the lack of flexibility and robustness of existing methods; almost all recent state-of-the-art methods show good estimation errors and density map quality only on select datasets. The biggest challenge faced by these methods is distinguishing similar features between the crowd and the background, as well as overlaps between individuals. Hence, we propose a light and easy-to-train network for congestion cognition based on dilated convolution, which can exponentially enlarge the receptive field, preserve the original resolution, and generate a high-quality density map. With dilated convolutional layers, counting accuracy is enhanced because the feature map keeps its original resolution. By removing fully connected layers, the network architecture becomes more concise, significantly reducing resource consumption. The improvements in flexibility and robustness over previous methods were validated by varying the data size and overlap levels of existing open-source datasets. Experimental results showed that the proposed network is suitable for transfer learning across datasets and enhances crowd counting in highly congested scenes. Therefore, the network is expected to have broader applications, for example in the Internet of Things and on portable devices.
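The "exponentially enlarge the receptive field" claim follows from standard convolution arithmetic: each stride-1 layer adds (k-1)*d to the receptive field, so doubling the dilation rate per layer grows the field geometrically while the feature map keeps its resolution. A quick sketch:

```python
def receptive_field(kernel_sizes, dilations):
    """Receptive field of a stack of stride-1 convolutions: each
    layer with kernel k and dilation d adds (k - 1) * d."""
    rf = 1
    for k, d in zip(kernel_sizes, dilations):
        rf += (k - 1) * d
    return rf

# Three 3x3 layers: plain vs. dilation rates 1, 2, 4.
plain   = receptive_field([3, 3, 3], [1, 1, 1])   # 7
dilated = receptive_field([3, 3, 3], [1, 2, 4])   # 15
```

With rates 1, 2, 4, ..., 2^(n-1), the field after n layers is 2^(n+1) - 1, so depth buys exponential context at no resolution cost.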


2018 ◽  
Vol 8 (12) ◽  
pp. 2367 ◽  
Author(s):  
Hongling Luo ◽  
Jun Sang ◽  
Weiqun Wu ◽  
Hong Xiang ◽  
Zhili Xiang ◽  
...  

In recent years, trampling events due to overcrowding have occurred frequently, creating a demand for crowd counting in high-density environments. At present, there are few studies on monitoring crowds in large-scale crowded environments, and existing technology has drawbacks and lacks mature systems. To solve the crowd counting problem in high-density, complex environments, this paper proposes a feature fusion-based deep convolutional neural network method, FF-CNN (Feature Fusion Convolutional Neural Network). The proposed FF-CNN maps a crowd image to its crowd density map and then obtains the head count by integration. Geometry-adaptive kernels are adopted to generate the high-quality density maps used as ground truths for network training. Deconvolution is used to fuse high-level and low-level features into richer features, and two loss functions, i.e., density map loss and absolute count loss, are used for joint optimization. To increase sample diversity, the original images are randomly cropped at each iteration. Experimental results of FF-CNN on the ShanghaiTech public dataset show that fusing low-level and high-level features extracts richer features, improving the precision of density map estimation and, in turn, the accuracy of crowd counting.
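Geometry-adaptive kernels are commonly computed (e.g., in the MCNN line of work; this paper does not spell out its variant) as a per-head Gaussian bandwidth proportional to the mean distance to the k nearest annotated neighbours, so dense regions get narrower kernels. A minimal sketch with the commonly assumed defaults k=3, beta=0.3:

```python
import math

def adaptive_sigmas(points, k=3, beta=0.3):
    """Per-head Gaussian sigma = beta * mean distance to the k
    nearest other head annotations (geometry-adaptive kernels)."""
    sigmas = []
    for i, (y, x) in enumerate(points):
        dists = sorted(math.hypot(y - py, x - px)
                       for j, (py, px) in enumerate(points) if j != i)
        sigmas.append(beta * sum(dists[:k]) / k)
    return sigmas

# Four heads packed together plus one isolated head (invented layout):
heads = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
sigmas = adaptive_sigmas(heads)
# Heads in the dense cluster get a much narrower kernel than the isolated one.
</imports```

This adapts the ground-truth blur to the local crowd scale, which is what makes the resulting density maps "high quality" for training.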


Sensors ◽  
2019 ◽  
Vol 19 (20) ◽  
pp. 4434 ◽  
Author(s):  
Sangwon Kim ◽  
Jaeyeal Nam ◽  
Byoungchul Ko

Depth estimation is a crucial and fundamental problem in computer vision. Conventional methods reconstruct scenes using feature points extracted from multiple images; however, these approaches require multiple images and thus are not easily deployed in real-time applications. Moreover, the special equipment required by hardware-based approaches using 3D sensors is expensive. Therefore, software-based methods that estimate depth from a single image using machine learning or deep learning are emerging as alternatives. In this paper, we propose an algorithm that generates a depth map in real time from a single image using an optimized lightweight efficient neural network (L-ENet) instead of physical equipment such as an infrared sensor or multi-view camera. Because depth values are continuous and can produce locally ambiguous results, pixel-wise prediction with ordinal depth range classification is applied in this study. In addition, our method applies various convolution techniques to extract a dense feature map, and the number of parameters is greatly reduced by reducing the number of network layers. Using the proposed L-ENet, an accurate depth map can be generated quickly from a single image, with depth values close to the ground truth and small errors. Experiments confirmed that the proposed L-ENet achieves significantly improved performance over state-of-the-art algorithms for single-image depth estimation.
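Ordinal depth range classification starts by discretizing the continuous depth range into ordered bins. A common choice in this line of work (DORN-style spacing-increasing discretization, assumed here rather than taken from this abstract) spaces bin edges uniformly in log depth, so nearby depths get finer bins than distant ones; the 1–80 m range below is an invented example:

```python
import math

def si_bin_edges(d_min, d_max, n_bins):
    """Spacing-increasing discretization: edges uniform in log depth,
    giving finer bins near the camera and coarser bins far away."""
    lo, hi = math.log(d_min), math.log(d_max)
    return [math.exp(lo + i * (hi - lo) / n_bins) for i in range(n_bins + 1)]

def ordinal_label(depth, edges):
    """Ordinal class = number of interior edges the depth exceeds."""
    return sum(1 for e in edges[1:-1] if depth >= e)

edges = si_bin_edges(1.0, 80.0, 8)
# Bin widths grow with distance, and labels are ordered with depth.
```

Predicting "how many thresholds are exceeded" per pixel turns the ambiguous continuous regression into an ordered classification, which is the core of the ordinal formulation.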


2020 ◽  
Vol 34 (07) ◽  
pp. 12837-12844
Author(s):  
Qi Zhang ◽  
Antoni B. Chan

Crowd counting has been studied for decades and many works have achieved good performance, especially DNN-based density map estimation methods. Most existing crowd counting works focus on single-view counting, while few have studied multi-view counting for large and wide scenes, where multiple cameras are used. Recently, an end-to-end multi-view crowd counting method called multi-view multi-scale (MVMS) was proposed, which fuses multiple camera views using a CNN to predict a 2D scene-level density map on the ground plane. Unlike MVMS, we propose to solve the multi-view crowd counting task through 3D feature fusion with 3D scene-level density maps, instead of 2D ground-plane ones. Compared to 2D fusion, 3D fusion extracts more information about the people along the z-dimension (height), which helps handle the scale variations across views. The 3D density maps preserve the property of 2D density maps that the sum equals the count, while also providing 3D information about the crowd density. We also exploit projection consistency between the 3D prediction and the ground truth in the 2D views to further enhance counting performance. The proposed method is tested on three multi-view counting datasets and achieves better or comparable counting performance to the state of the art.
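The sum-is-the-count property carries over from 2D to 3D because collapsing the height axis commutes with summing the whole volume. A tiny sketch with invented numbers:

```python
def project_z(dmap3d):
    """Collapse the z (height) axis of a 3D density map; the 2D
    projection and the 3D volume sum to the same person count."""
    return [[sum(col) for col in row] for row in dmap3d]

# 2x2 ground plane, 3 height voxels; two people, each with unit mass
# spread along z (illustrative numbers only):
d3 = [[[0.5, 0.3, 0.2], [0.0, 0.0, 0.0]],
      [[0.0, 0.0, 0.0], [0.2, 0.5, 0.3]]]
d2 = project_z(d3)
count3d = sum(v for row in d3 for col in row for v in col)   # 2.0
count2d = sum(v for row in d2 for v in row)                  # 2.0
```

The extra z-dimension therefore adds height information for free, without breaking the counting-by-integration convention.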


Author(s):  
Ravinath Kausik ◽  
Augustin Prado ◽  
Vasileios-Marios Gkortsas ◽  
Lalitha Venkataramanan ◽  
...  

The computation of permeability is vital for reservoir characterization because it is a key parameter in the reservoir models used for estimating and optimizing hydrocarbon production. Permeability is routinely predicted as a correlation from near-wellbore formation properties measured through wireline logs. Several such correlations, namely Schlumberger-Doll Research (SDR) permeability and Timur-Coates permeability models using nuclear magnetic resonance (NMR) measurements, K-lambda using mineralogy, and other variants, have often been used, with moderate success. In addition to permeability, the determination of the uncertainties, both epistemic (model) and aleatoric (data), are important for interpreting variations in the predictions of the reservoir models. In this paper, we demonstrate a novel dual deep neural network framework encompassing a Bayesian neural network (BNN) and an artificial neural network (ANN) for determining accurate permeability values along with associated uncertainties. Deep-learning techniques have been shown to be effective for regression problems but quantifying the uncertainty of their predictions and separating them into the epistemic and aleatoric fractions is still considered challenging. This is especially vital for petrophysical answer products because these algorithms need the ability to flag data from new geological formations that the model was not trained on as “out of distribution” and assign them higher uncertainty. Additionally, the model outputs need sensitivity to heteroscedastic aleatoric noise in the feature space arising due to tool and geological origins. Reducing these uncertainties is key to designing intelligent logging tools and applications, such as automated log interpretation. In this paper, we train a BNN with NMR and mineralogy data to determine permeability with associated epistemic uncertainty, obtained by determining the posterior weight distributions of the network by using variational inference. 
This gives us the ability to differentiate between in- and out-of-distribution predictions, thereby identifying whether the trained models are suitable for application in new geological formations. The errors in the BNN's predictions are fed into a second ANN trained to correlate the predicted uncertainty with the error of the first BNN. Both networks are trained simultaneously and therefore optimized together to estimate permeability and its associated uncertainty. The machine-learning permeability model is trained on a "ground-truth" core database and demonstrates considerable improvement over traditional SDR and Timur-Coates permeability models on wells from the Ivar Aasen Field. We also demonstrate the value of information (VOI) of different logging measurements by replacing the logs with their median values from nearby wells and studying the increase in the mean square errors.
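The epistemic/aleatoric split described above can be illustrated with the law of total variance over posterior weight draws; this is a generic sketch, not the authors' exact formulation, and the numbers are invented:

```python
def decompose_uncertainty(draws):
    """Each posterior weight draw predicts (mean, variance).
    Law of total variance: epistemic = variance of the means across
    draws; aleatoric = average of the predicted variances."""
    n = len(draws)
    mu = sum(m for m, _ in draws) / n
    epistemic = sum((m - mu) ** 2 for m, _ in draws) / n
    aleatoric = sum(v for _, v in draws) / n
    return epistemic, aleatoric

# In-distribution: posterior draws agree; out-of-distribution: they disagree.
ep_in,  al_in  = decompose_uncertainty([(5.0, 0.1), (5.1, 0.1), (4.9, 0.1)])
ep_ood, al_ood = decompose_uncertainty([(2.0, 0.1), (8.0, 0.1), (5.0, 0.1)])
```

Disagreement between draws (high epistemic uncertainty) is what lets the model flag new formations as out of distribution, even when the per-draw noise estimate (aleatoric) is unchanged.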


2019 ◽  
Author(s):  
Henry Pinkard ◽  
Zachary Phillips ◽  
Arman Babakhani ◽  
Daniel A. Fletcher ◽  
Laura Waller

Maintaining an in-focus image over long time scales is an essential and non-trivial task for a variety of microscopic imaging applications. Here, we present an autofocusing method that is inexpensive, fast, and robust. It requires only the addition of one or a few off-axis LEDs to a conventional transmitted light microscope. Defocus distance can be estimated and corrected based on a single image under this LED illumination using a neural network that is small enough to be trained on a desktop CPU in a few hours. In this work, we detail the procedure for generating data and training such a network, explore practical limits, and describe relevant design principles governing the illumination source and network architecture.

