Direct Measure Matching for Crowd Counting

Author(s):  
Hui Lin ◽  
Xiaopeng Hong ◽  
Zhiheng Ma ◽  
Xing Wei ◽  
Yunfeng Qiu ◽  
...  

Traditional crowd counting approaches usually rely on a Gaussian assumption to generate pseudo density ground truth, which suffers from problems such as inaccurate estimation of the Gaussian kernel sizes. In this paper, we propose a new measure-based counting approach that regresses the predicted density maps directly to the scattered point-annotated ground truth. First, crowd counting is formulated as a measure matching problem. Second, we derive a semi-balanced form of the Sinkhorn divergence, based on which a Sinkhorn counting loss is designed for measure matching. Third, we propose a self-supervised mechanism by devising a Sinkhorn scale consistency loss to resist scale changes. Finally, an efficient optimization method is provided to minimize the overall loss function. Extensive experiments on four challenging crowd counting datasets, namely ShanghaiTech, UCF-QNRF, JHU++ and NWPU, validate the proposed method.
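As a rough illustration of the measure-matching idea, the sketch below computes a plain entropy-regularized Sinkhorn transport cost between a predicted density map (viewed as a discrete measure on the pixel grid) and point annotations. This is a simplified stand-in, not the paper's semi-balanced Sinkhorn divergence or its optimization scheme, and all parameter values are illustrative.

```python
import numpy as np

def sinkhorn_loss(pred_density, points, img_shape, eps=0.1, n_iter=50):
    """Entropy-regularized Sinkhorn matching between a predicted density map
    and point annotations given as (row, col) coordinates (illustrative only)."""
    H, W = img_shape
    ys, xs = np.mgrid[0:H, 0:W]
    grid = np.stack([ys.ravel(), xs.ravel()], axis=1).astype(float)
    scale = float(max(H, W))
    grid /= scale                                  # normalize coordinates for numerical stability
    pts = np.asarray(points, dtype=float) / scale
    a = pred_density.ravel().astype(float)
    a = a / (a.sum() + 1e-8)                       # predicted mass, normalized
    b = np.full(len(pts), 1.0 / len(pts))          # uniform mass on annotated heads
    C = ((grid[:, None, :] - pts[None, :, :]) ** 2).sum(-1)   # squared-distance cost
    K = np.exp(-C / eps)
    u = np.ones_like(a)
    for _ in range(n_iter):                        # Sinkhorn fixed-point iterations
        v = b / (K.T @ u + 1e-12)
        u = a / (K @ v + 1e-12)
    P = u[:, None] * K * v[None, :]                # entropic transport plan
    return float((P * C).sum())                    # transport cost used as a matching loss

# toy usage: a 16x16 predicted density map and two annotated points
pred = np.random.rand(16, 16)
pts = [[4.0, 5.0], [10.0, 12.0]]
print(sinkhorn_loss(pred, pts, (16, 16)))
```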

2020 ◽  
Vol 34 (07) ◽  
pp. 12837-12844
Author(s):  
Qi Zhang ◽  
Antoni B. Chan

Crowd counting has been studied for decades, and many works have achieved good performance, especially DNN-based density map estimation methods. Most existing crowd counting works focus on single-view counting, while few have studied multi-view counting for large and wide scenes where multiple cameras are used. Recently, an end-to-end multi-view crowd counting method called multi-view multi-scale (MVMS) has been proposed, which fuses multiple camera views using a CNN to predict a 2D scene-level density map on the ground plane. Unlike MVMS, we propose to solve the multi-view crowd counting task through 3D feature fusion with 3D scene-level density maps, instead of the 2D ground-plane ones. Compared to 2D fusion, 3D fusion extracts more information about the people along the z-dimension (height), which helps to handle scale variations across multiple views. The 3D density maps preserve the property of 2D density maps that the sum gives the count, while also providing 3D information about the crowd density. We also explore the projection consistency between the 3D prediction and the ground truth in the 2D views to further enhance counting performance. The proposed method is tested on three multi-view counting datasets and achieves better or comparable counting performance to the state of the art.
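As a toy illustration of the "sum is the count" property and the flavor of a projection-consistency term, the snippet below marginalizes a 3D density volume along the height axis and compares the result to a 2D map. Real multi-view fusion would project the volume through each camera's geometry, which is omitted here; the ground-truth map is a random placeholder.

```python
import numpy as np

density_3d = np.random.rand(32, 32, 8) * 0.01     # (x, y, z) scene-level density volume
count = density_3d.sum()                          # total count is the sum over the volume

ground_plane = density_3d.sum(axis=2)             # marginalize z -> 2D ground-plane map
assert np.isclose(ground_plane.sum(), count)      # the projection preserves the count

gt_2d = np.random.rand(32, 32) * 0.01             # hypothetical 2D ground-truth density
consistency_loss = np.mean((ground_plane - gt_2d) ** 2)   # MSE-style consistency term
print(count, consistency_loss)
```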


Drones ◽  
2020 ◽  
Vol 4 (1) ◽  
pp. 2 ◽  
Author(s):  
Anne-Flore Perrin ◽  
Vassilios Krassanakis ◽  
Lu Zhang ◽  
Vincent Ricordel ◽  
Matthieu Perreira Da Silva ◽  
...  

The fast and tremendous evolution of unmanned aerial vehicle (UAV) imagery has given rise to a multiplication of applications in various fields such as military and civilian surveillance, delivery services, and wildlife monitoring. Combining UAV imagery with the study of dynamic saliency further extends the number of future applications. Indeed, considerations of visual attention open the door to new avenues in a number of scientific fields such as compression, retargeting, and decision-making tools. To conduct saliency studies, we identified the need for new large-scale eye-tracking datasets for visual saliency in UAV content. Therefore, we address this need by introducing the dataset EyeTrackUAV2. It consists of precise binocular gaze information (1000 Hz) collected over 43 videos (RGB, 30 fps, 1280 × 720 or 720 × 480). Thirty participants observed stimuli under both free-viewing and task conditions. Fixations and saccades were then computed with the dispersion-threshold identification (I-DT) algorithm, while gaze density maps were calculated by filtering eye positions with a Gaussian kernel. An analysis of the collected gaze positions provides recommendations for visual saliency ground-truth generation. It also sheds light on variations of saliency biases in UAV videos when compared to conventional content, especially regarding the center bias.
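A minimal sketch of how gaze density maps of this kind can be produced, assuming fixation coordinates in pixels and an illustrative Gaussian sigma; the dataset's exact smoothing parameters (often tied to about one degree of visual angle) may differ.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def gaze_density_map(fixations, frame_shape, sigma=30.0):
    """Accumulate eye positions into a heat map and smooth with a Gaussian kernel."""
    h, w = frame_shape
    heat = np.zeros((h, w), dtype=float)
    for x, y in fixations:                       # fixation coordinates in pixels (x=col, y=row)
        xi, yi = int(round(x)), int(round(y))
        if 0 <= yi < h and 0 <= xi < w:
            heat[yi, xi] += 1.0                  # one unit of mass per gaze sample
    heat = gaussian_filter(heat, sigma=sigma)    # spatial smoothing
    s = heat.sum()
    return heat / s if s > 0 else heat           # normalize to a density

# toy usage on a 720p frame
fix = [(640.0, 360.0), (650.2, 355.7), (300.0, 200.0)]
density = gaze_density_map(fix, (720, 1280))
print(density.max(), density.sum())
```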


Author(s):  
Brahim Boussidi ◽  
Peter Cornillon ◽  
Gavino Puggioni ◽  
Chelle Gentemann

This study was undertaken to derive and analyze the Advanced Microwave Scanning Radiometer - EOS (AMSR-E) sea surface temperature (SST) footprint associated with the Remote Sensing Systems (RSS) Level-2 (L2) product. The footprint, in this case, is characterized by the weight attributed to each 4 × 4 km square contributing to the SST value of a given AMSR-E pixel. High-resolution L2 SST fields obtained from the MODerate-resolution Imaging Spectroradiometer (MODIS), carried on the same spacecraft as AMSR-E, are used as the sub-resolution "ground truth" from which the AMSR-E footprint is determined. Mathematically, the approach is equivalent to a linear inversion problem, and its solution is pursued by means of a constrained least-squares approximation based on a bootstrap sampling procedure. The method yielded an elliptic-like Gaussian kernel with an aspect ratio of 1.58, very close to the 1.7 aspect ratio of the AMSR-E 6.93 GHz channel. (The 6.93 GHz channel is the primary spectral frequency used to determine SST.) The semi-major axis of the estimated footprint is found to be aligned with the instantaneous field of view of the sensor, as expected from the geometric characteristics of AMSR-E. Footprints were also analyzed year-by-year and as a function of latitude and found to be stable, with no dependence on latitude or time. Precise knowledge of the footprint is central for any satellite-derived product characterization and, in particular, for efforts to deconvolve the heavily oversampled AMSR-E SST fields and for studies devoted to product validation and comparison. A preliminary analysis suggests that use of the derived footprint will reduce the variance between AMSR-E and MODIS fields compared to previously obtained results.
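The linear-inversion idea can be sketched as follows, with simulated data standing in for matched MODIS/AMSR-E observations: each AMSR-E value is modeled as a weighted sum of sub-resolution cells, and non-negative least squares with bootstrap resampling recovers the footprint weights. Grid size, noise level, and the choice of non-negativity as the constraint are illustrative, not the paper's exact setup.

```python
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(0)
n_obs, n_cells = 500, 49                      # matched AMSR-E pixels, 7x7 grid of 4-km cells
true_w = np.exp(-np.linspace(-2, 2, n_cells) ** 2)
true_w /= true_w.sum()                        # hypothetical Gaussian-like footprint

modis = rng.normal(15.0, 2.0, size=(n_obs, n_cells))      # MODIS sub-pixel SSTs (deg C)
amsre = modis @ true_w + rng.normal(0.0, 0.1, n_obs)      # simulated AMSR-E SSTs + noise

boot_w = []
for _ in range(200):                          # bootstrap over matched observations
    idx = rng.integers(0, n_obs, n_obs)
    w, _ = nnls(modis[idx], amsre[idx])       # constrained (non-negative) least squares
    boot_w.append(w / w.sum())                # normalize weights to sum to one
w_hat = np.mean(boot_w, axis=0)
print(np.abs(w_hat - true_w).max())           # footprint recovery error
```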


2021 ◽  
Vol 2021 ◽  
pp. 1-12
Author(s):  
Pengfei Li ◽  
Min Zhang ◽  
Jian Wan ◽  
Ming Jiang

The most advanced methods for crowd counting use a fully convolutional network that extracts image features and then generates a crowd density map. However, this process often suffers from multi-scale variation and the loss of contextual information. To address these problems, we propose a multiscale aggregation network (MANet) that includes a feature extraction encoder (FEE) and a density map decoder (DMD). The FEE uses a cascaded scale pyramid network to extract multiscale features and obtains contextual features through dense connections. The DMD uses deconvolution and fusion operations to generate features containing detailed information. These features can be further converted into high-quality density maps to accurately calculate the number of people in a crowd. An empirical comparison using four mainstream datasets (ShanghaiTech, WorldExpo’10, UCF_CC_50, and SmartCity) shows that the proposed method is more effective in terms of the mean absolute error and mean squared error. The source code is available at https://github.com/lpfworld/MANet.
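For reference, a minimal sketch of the evaluation protocol implied here: the crowd count is the sum of the density map, and models are compared by mean absolute error and (root) mean squared error over per-image counts. The numbers below are made up.

```python
import numpy as np

def count_from_density(density_map):
    return float(density_map.sum())           # the "sum is the count" property

def mae_mse(pred_counts, gt_counts):
    pred, gt = np.asarray(pred_counts, float), np.asarray(gt_counts, float)
    mae = np.mean(np.abs(pred - gt))
    mse = np.sqrt(np.mean((pred - gt) ** 2))  # commonly reported as "MSE" (actually RMSE)
    return mae, mse

# toy usage with hypothetical per-image counts
print(mae_mse([102.3, 48.7, 250.1], [100, 50, 245]))
```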


Author(s):  
Alessandro Graziosi ◽  
Giulio Iannello ◽  
Valerio Lapadula ◽  
Mario Merone ◽  
Marco Sabatini ◽  
...  

2019 ◽  
Vol 9 (2) ◽  
pp. 361-422
Author(s):  
Martin Genzel ◽  
Alexander Stollenwerk

This work theoretically studies the problem of estimating a structured high-dimensional signal $\boldsymbol{x}_0 \in {\mathbb{R}}^n$ from noisy $1$-bit Gaussian measurements. Our recovery approach is based on a simple convex program which uses the hinge loss function as data fidelity term. While such a risk minimization strategy is very natural to learn binary output models, such as in classification, its capacity to estimate a specific signal vector is largely unexplored. A major difficulty is that the hinge loss is just piecewise linear, so that its ‘curvature energy’ is concentrated in a single point. This is substantially different from other popular loss functions considered in signal estimation, e.g. the square or logistic loss, which are at least locally strongly convex. It is therefore somewhat unexpected that we can still prove very similar types of recovery guarantees for the hinge loss estimator, even in the presence of strong noise. More specifically, our non-asymptotic error bounds show that stable and robust reconstruction of $\boldsymbol{x}_0$ can be achieved with the optimal oversampling rate $O(m^{-1/2})$ in terms of the number of measurements $m$. Moreover, we permit a wide class of structural assumptions on the ground truth signal, in the sense that $\boldsymbol{x}_0$ can belong to an arbitrary bounded convex set $K \subset {\mathbb{R}}^n$. The proofs of our main results rely on some recent advances in statistical learning theory due to Mendelson. In particular, we invoke an adapted version of Mendelson’s small ball method that allows us to establish a quadratic lower bound on the error of the first-order Taylor approximation of the empirical hinge loss function.
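As a hedged illustration of the kind of convex program studied, the sketch below minimizes the empirical hinge loss over the Euclidean unit ball (one admissible choice of $K$) with a projected subgradient method on simulated 1-bit data. The paper's analysis covers arbitrary bounded convex sets and does not prescribe this particular solver; step sizes and iteration count are ad hoc.

```python
import numpy as np

# min_{x in K} (1/m) * sum_i max(0, 1 - y_i <a_i, x>),  K = Euclidean unit ball

rng = np.random.default_rng(1)
n, m = 50, 2000
x0 = rng.normal(size=n); x0 /= np.linalg.norm(x0)        # ground-truth signal on the sphere
A = rng.normal(size=(m, n))                              # Gaussian measurement vectors
y = np.sign(A @ x0 + 0.1 * rng.normal(size=m))           # noisy 1-bit measurements

x = np.zeros(n)
for t in range(1, 501):
    margins = y * (A @ x)
    active = margins < 1.0                                # samples in the linear part of the hinge
    grad = -(A[active] * y[active, None]).sum(0) / m      # subgradient of the empirical loss
    x = x - (1.0 / np.sqrt(t)) * grad                     # diminishing step size
    norm = np.linalg.norm(x)
    if norm > 1.0:                                        # project back onto the unit ball K
        x = x / norm

print(np.linalg.norm(x / np.linalg.norm(x) - x0))         # direction recovery error
```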


2019 ◽  
Vol 9 (5) ◽  
pp. 951 ◽  
Author(s):  
Yong Li ◽  
Guofeng Tong ◽  
Xiance Du ◽  
Xiang Yang ◽  
Jianjun Zhang ◽  
...  

3D point cloud classification has wide applications in the field of scene understanding. Point-based classification can more accurately segment the boundary region between adjacent objects. In this paper, a point cloud classification algorithm based on single-point multilevel feature fusion and pyramid neighborhood optimization is proposed for Airborne Laser Scanning (ALS) point clouds. First, the proposed algorithm determines the neighborhood region of each point, after which the features of each single point are extracted. Tailored to the characteristics of ALS point clouds, two new feature descriptors are proposed, namely a normal angle distribution histogram and a latitude sampling histogram. Following this, multilevel features of a single point are constructed from multiple resolutions of the point cloud and multiple neighborhood spaces. Next, a Support Vector Machine with a Gaussian kernel function is trained on these features, and the points are classified by the trained model. Finally, the classification results are refined using a multi-scale pyramid neighborhood constructed from the multi-resolution point cloud. In the experiment, the algorithm is tested on a public dataset. The experimental results show that the proposed algorithm can effectively classify large-scale ALS point clouds. Compared with existing algorithms, the proposed algorithm achieves better classification performance.
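A minimal sketch of the classification stage, assuming per-point feature vectors have already been extracted (random placeholders below) and using scikit-learn's RBF-kernel SVM as the Gaussian-kernel classifier; feature dimension, class set, and hyperparameters are illustrative.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(2)
X_train = rng.normal(size=(300, 24))          # 24-D feature vector per point (placeholder)
y_train = rng.integers(0, 4, size=300)        # e.g. ground / vegetation / building / other

clf = SVC(kernel="rbf", C=10.0, gamma="scale")   # SVM with a Gaussian (RBF) kernel
clf.fit(X_train, y_train)

X_test = rng.normal(size=(5, 24))
print(clf.predict(X_test))                    # per-point labels before neighborhood optimization
```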


Author(s):  
Y Cao ◽  
J Mao ◽  
H Ching ◽  
J Yang

Using the quality loss function developed by Taguchi, the manufacturing time and cost of a product can be reduced to improve a factory's competitiveness. However, fuzziness in the quality loss has not been considered in the Taguchi method. This article presents a fuzzy quality loss function model. First, fuzzy logic is used to describe the semantics of quality, and the quality level is divided into several grades. Then the fuzzy quality loss function is developed using the loss in monetary terms, which indicates the quality loss of each quality level, and the normalized expected probability of each quality grade. Moreover, a new optimization model for tolerance design under the fuzzy quality loss function is established. An example is used to illustrate the validity of the proposed model. The result shows that the proposed method is more flexible and can balance quality and cost in tolerance design. It can be easily applied in practical engineering applications.
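A small numeric sketch of the underlying idea, with all monetary values and membership degrees invented for illustration: Taguchi's classical loss is quadratic in the deviation from the target, and the fuzzy variant replaces the single value with an expected loss over quality grades weighted by normalized membership degrees.

```python
def taguchi_loss(y, target, k):
    """Classical Taguchi quadratic quality loss L(y) = k * (y - target)^2."""
    return k * (y - target) ** 2

# fuzzy version: monetary loss per grade and normalized expected probability per grade
grade_losses = [0.0, 2.0, 5.0, 12.0]          # loss for grades "excellent" .. "poor" (illustrative)
memberships = [0.5, 0.3, 0.15, 0.05]          # fuzzy membership degrees, already normalized

expected_fuzzy_loss = sum(p * l for p, l in zip(memberships, grade_losses))
print(taguchi_loss(y=10.2, target=10.0, k=50.0), expected_fuzzy_loss)
```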


2021 ◽  
Vol 5 (4 (113)) ◽  
pp. 45-54
Author(s):  
Alexander Nechaev ◽  
Vasily Meltsov ◽  
Dmitry Strabykin

Many advanced recommender models are implemented using matrix factorization algorithms. Experiments show that the quality of their performance depends significantly on the selected hyperparameters. We analyzed the effectiveness of various methods for solving this hyperparameter optimization problem. The analysis showed that classical Bayesian optimization, which treats the model as a "black box", remains the standard solution. However, models based on matrix factorization have a number of characteristic features, and exploiting them makes it possible to modify the optimization process so that the sought points are found in less time without losing quality. We propose a modification of the Gaussian process kernel that is used as the surrogate model for the loss function during Bayesian optimization. At the first iterations, the described modification increases the variance of the values predicted by the Gaussian process over a given region of the hyperparameter space. In some cases, this makes it possible to obtain more information about the real shape of the investigated loss function in less time. Experiments were carried out using well-known datasets for recommender systems. The total optimization time with the modification was reduced by 16% (263 seconds) in the best case and remained the same in the worst case (less than a one-second difference). At the same time, the expected error of the recommender model did not change (the absolute difference is two orders of magnitude smaller than the error reduction achieved during optimization). Thus, the proposed modification helps to find a better set of hyperparameters in less time without loss of quality.
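A rough sketch of the general mechanism (not the paper's exact formulation): a standard RBF kernel is wrapped so that, at early iterations, the prior covariance is inflated over a chosen region of the hyperparameter space, which raises the surrogate's predicted variance there. The boost schedule, region, and kernel form are illustrative assumptions.

```python
import numpy as np

def rbf_kernel(X1, X2, length_scale=1.0):
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / length_scale ** 2)

def boosted_kernel(X1, X2, iteration, region_center, region_radius=0.3, boost=2.0):
    """RBF kernel with an iteration-dependent variance boost inside a region."""
    K = rbf_kernel(X1, X2)
    decay = max(0.0, 1.0 - iteration / 10.0)           # boost fades after ~10 iterations
    in1 = (np.linalg.norm(X1 - region_center, axis=1) < region_radius).astype(float)
    in2 = (np.linalg.norm(X2 - region_center, axis=1) < region_radius).astype(float)
    scale = 1.0 + boost * decay * np.outer(in1, in2)   # inflate covariance inside the region
    return K * scale                                   # elementwise product keeps PSD (Schur product)

X = np.random.rand(5, 2)                               # 5 candidate hyperparameter points in 2-D
print(boosted_kernel(X, X, iteration=1, region_center=np.array([0.5, 0.5])))
```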


Author(s):  
Y. Xun ◽  
W. Q. Yu

As one of the important sources of meteorological information, satellite nephograms are playing an increasingly important role in the detection and forecasting of disastrous weather. Timely predictions of cloud movement and transformation can enhance the practicability of satellite nephograms. Based on generative adversarial networks in unsupervised learning, we propose a prediction model for time-series nephograms, which accurately constructs an internal representation of cloud evolution and realizes nephogram prediction for the next several hours. We improve the traditional generative adversarial network by constructing the generator and discriminator with multi-scale convolutional networks. After the scale transformation, convolutions at different scales operate in parallel and their features are then merged. This structure alleviates the problem of long-term dependence in the traditional network while taking both global and detailed features into account. Then, according to the network structure and the practical application, we define a new loss function, combined with the adversarial loss, to accelerate model convergence and sharpen predictions, which further preserves their effectiveness. Our method requires neither stacked mathematical calculations nor manual operations, which greatly enhances feasibility and efficiency. The results show that the model can reasonably describe the basic characteristics and evolution trend of cloud clusters, and the predicted nephograms are highly similar to the ground-truth nephograms.
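As a hedged sketch of what such a combined objective can look like, the snippet below mixes an adversarial term with a pixel-wise L1 term that keeps predictions close to the ground-truth nephogram and discourages blur. The weights and the exact terms are illustrative assumptions; the paper defines its own loss.

```python
import numpy as np

def generator_loss(pred_frames, gt_frames, disc_scores_on_pred, lam_adv=0.05, lam_l1=1.0):
    """Combined objective: adversarial term + pixel-wise fidelity term (illustrative)."""
    adv = -np.mean(np.log(disc_scores_on_pred + 1e-8))      # fool-the-discriminator term
    l1 = np.mean(np.abs(pred_frames - gt_frames))           # pixel-wise fidelity / sharpening term
    return lam_adv * adv + lam_l1 * l1

# toy usage with random tensors standing in for predicted / true cloud images
pred = np.random.rand(4, 64, 64)
gt = np.random.rand(4, 64, 64)
scores = np.random.rand(4) * 0.5 + 0.25                     # discriminator outputs in (0, 1)
print(generator_loss(pred, gt, scores))
```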

