A High-robustness and Low Resource-consumption Crowd Counting Model

International Journal of Circuits, Systems and Signal Processing ◽

10.46300/9106.2021.15.6 ◽

2021 ◽

Vol 15 ◽

pp. 46-54

Author(s):

Han Jia ◽

Xuecheng Zou

Keyword(s):

Network Architecture ◽

Resource Consumption ◽

Portable Devices ◽

Estimation Errors ◽

Crowd Counting ◽

Density Map ◽

Almost All ◽

Fully Connected ◽

Crowded Scenes ◽

Counting Model

A major problem of counting high-density crowded scenes is the lack of flexibility and robustness exhibited by existing methods, and almost all recent state-of-the-art methods only show good performance in estimation errors and density map quality for select datasets. The biggest challenge faced by these methods is the analysis of similar features between the crowd and background, as well as overlaps between individuals. Hence, we propose a light and easy-to-train network for congestion cognition based on dilated convolution, which can exponentially enlarge the receptive field, preserve original resolution, and generate a high-quality density map. With the dilated convolutional layers, the counting accuracy can be enhanced as the feature map keeps its original resolution. By removing fully-connected layers, the network architecture becomes more concise, thereby reducing resource consumption significantly. The flexibility and robustness improvements of the proposed network compared to previous methods were validated using the variance of data size and different overlap levels of existing open source datasets. Experimental results showed that the proposed network is suitable for transfer learning on different datasets and enhances crowd counting in highly congested scenes. Therefore, the network is expected to have broader applications, for example in Internet of Things and portable devices.
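The receptive-field claim in the abstract above can be checked with a small calculation. The sketch below compares four stacked 3x3 convolutions without dilation against four with dilation rates that double at each layer. The rates (1, 2, 4, 8), the kernel size, and the function name `receptive_field` are illustrative assumptions, not the configuration reported in the paper.

```python
def receptive_field(kernel_size, dilations):
    """Receptive field along one axis of stacked stride-1 convolutions."""
    rf = 1
    for d in dilations:
        # A kernel of size k with dilation d spans (k - 1) * d + 1 input pixels,
        # so each layer widens the receptive field by (k - 1) * d.
        rf += (kernel_size - 1) * d
    return rf


# Assumed schedules: four plain layers versus four layers with doubling dilation.
plain = receptive_field(3, [1, 1, 1, 1])
dilated = receptive_field(3, [1, 2, 4, 8])
print(plain, dilated)  # 9 31
```

With stride 1 and suitable padding, neither stack reduces the spatial resolution of the feature map, which is why dilation can widen context without pooling.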

MH-MetroNet—A Multi-Head CNN for Passenger-Crowd Attendance Estimation

Journal of Imaging ◽

10.3390/jimaging6070062 ◽

2020 ◽

Vol 6 (7) ◽

pp. 62

Author(s):

Pier Luigi Mazzeo ◽

Riccardo Contino ◽

Paolo Spagnolo ◽

Cosimo Distante ◽

Ettore Stella ◽

...

Keyword(s):

Network Architecture ◽

Network Performance ◽

State Of The Art ◽

Mean Absolute Error ◽

Absolute Error ◽

Input Image ◽

Mean Square ◽

Metro Station ◽

Crowd Counting ◽

Density Map

An accurate estimate of passenger attendance in each metro car contributes to the safe coordination and sorting of passenger crowds in each metro station. In this work we propose a multi-head Convolutional Neural Network (CNN) architecture trained to estimate passenger attendance in a metro car. The proposed network architecture consists of two main parts: a convolutional backbone, which extracts features over the whole input image, and multi-head layers that estimate a density map, needed to predict the number of people within the crowd image. The network performance is first evaluated on publicly available crowd counting datasets, including the ShanghaiTech part_A, ShanghaiTech part_B and UCF_CC_50, and then trained and tested on our dataset acquired in subway cars in Italy. In both cases a comparison is made against the most relevant and latest state-of-the-art crowd counting architectures, showing that our proposed MH-MetroNet architecture outperforms them in terms of Mean Absolute Error (MAE), Mean Square Error (MSE), and prediction of the number of passengers.
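The two error metrics named in the abstract above can be sketched as follows. The counts are made-up illustrative values, not results from the paper, and the function names are hypothetical. Some crowd-counting papers report the square root of the mean squared error under the name "MSE"; the abstract does not say which convention applies, so both are shown.

```python
import math


def mae(predicted, actual):
    """Mean absolute error between predicted and true counts."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def mse(predicted, actual):
    """Mean squared error between predicted and true counts."""
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)


# Illustrative values only: predicted counts (e.g. density-map sums) and ground truth.
predicted = [12.0, 30.0, 7.0]
actual = [10, 33, 7]

print(mae(predicted, actual))             # (2 + 3 + 0) / 3
print(mse(predicted, actual))             # (4 + 9 + 0) / 3
print(math.sqrt(mse(predicted, actual)))  # root variant used by some papers
```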

Multi-Stream Networks and Ground Truth Generation for Crowd Counting

International journal of electrical and computer engineering systems ◽

10.32985/ijeces.11.1.4 ◽

2020 ◽

Vol 11 (1) ◽

pp. 33-41

Author(s):

Rodolfo Quispe ◽

Darwin Ttito ◽

Adín Rivera ◽

Helio Pedrini

Keyword(s):

Neural Network ◽

Network Architecture ◽

Receptive Fields ◽

Ground Truth ◽

Scene Analysis ◽

Stream Networks ◽

Single Image ◽

Crowd Counting ◽

Ground Truth Generation ◽

Density Map

Crowd scene analysis has received a lot of attention recently due to a wide variety of applications, e.g., forensic science, urban planning, surveillance and security. In this context, a challenging task is crowd counting [1–6], whose main purpose is to estimate the number of people present in a single image. A multi-stream convolutional neural network is developed and evaluated in this paper, which receives an image as input and produces a density map that represents the spatial distribution of people in an end-to-end fashion. In order to address complex crowd counting issues, such as extremely unconstrained scale and perspective changes, the network architecture utilizes receptive fields with different size filters for each stream. In addition, we investigate the influence of the two most common approaches to ground truth generation and propose a hybrid method based on tiny face detection and scale interpolation. Experiments conducted on two challenging datasets, UCF-CC-50 and ShanghaiTech, demonstrate that the use of our ground truth generation methods achieves superior results.
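To make the idea of a ground-truth density map concrete, here is a minimal sketch of one common scheme: a Gaussian of fixed width is placed at each annotated head position and normalised to unit mass, so that the map sums to the number of people. This is not the hybrid method proposed in the paper (which relies on tiny face detection and scale interpolation); the fixed `sigma`, the map size, and the head coordinates are assumptions for illustration.

```python
import math


def density_map(height, width, heads, sigma=1.5):
    """Sum one unit-mass Gaussian per annotated head position (row, col)."""
    dmap = [[0.0] * width for _ in range(height)]
    for r0, c0 in heads:
        # Unnormalised Gaussian weights centred on the head annotation.
        weights = [
            [math.exp(-((r - r0) ** 2 + (c - c0) ** 2) / (2 * sigma ** 2))
             for c in range(width)]
            for r in range(height)
        ]
        total = sum(map(sum, weights))
        # Normalise over the in-bounds pixels so each head contributes exactly 1.
        for r in range(height):
            for c in range(width):
                dmap[r][c] += weights[r][c] / total
    return dmap


heads = [(2, 3), (5, 5), (6, 1)]       # hypothetical annotations
dmap = density_map(8, 8, heads)
count = sum(map(sum, dmap))            # integrating the map recovers the count
print(round(count, 6))  # 3.0
```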

Convolutional Neural Network for Crowd Counting on Metro Platforms

Symmetry ◽

10.3390/sym13040703 ◽

2021 ◽

Vol 13 (4) ◽

pp. 703

Author(s):

Jun Zhang ◽

Jiaze Liu ◽

Zhizhong Wang

Keyword(s):

Neural Network ◽

Convolutional Neural Network ◽

Estimation Error ◽

Image Features ◽

Urban Rail Transit ◽

Crowd Counting ◽

Passenger Flow ◽

Urban Rail ◽

Density Map ◽

Flow Detection

Owing to the increased use of urban rail transit, the flow of passengers on metro platforms tends to increase sharply during peak periods. Monitoring passenger flow in such areas is important for security-related reasons. In this paper, in order to solve the problem of metro platform passenger flow detection, we propose a CNN (convolutional neural network)-based network called the MP (metro platform)-CNN to accurately count people on metro platforms. The proposed method is composed of three major components: a group of convolutional neural networks is used on the front end to extract image features, a multiscale feature extraction module is used to enhance multiscale features, and transposed convolution is used for upsampling to generate a high-quality density map. Currently, existing crowd-counting datasets do not adequately cover all of the challenging situations considered in this study. Therefore, we collected images from surveillance videos of a metro platform to form a dataset containing 627 images, with 9243 annotated heads. The results of the extensive experiments showed that our method performed well on the self-built dataset and achieved the lowest estimation error. Moreover, the proposed method could compete with other methods on four standard crowd-counting datasets.
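The upsampling role of transposed convolution mentioned above can be illustrated in one dimension. In this sketch each input value is multiplied by the kernel and added into the output at a stride-spaced offset, giving an output of length (n - 1) * stride + k. The kernel values and stride are assumptions chosen so that the result is easy to verify by hand; in a trained network such as the one described, the kernel would be learned and two-dimensional.

```python
def transposed_conv1d(signal, kernel, stride=2):
    """1-D transposed convolution without padding."""
    out_len = (len(signal) - 1) * stride + len(kernel)
    out = [0.0] * out_len
    for i, x in enumerate(signal):
        for j, w in enumerate(kernel):
            # Each input sample scatters a scaled copy of the kernel into the output.
            out[i * stride + j] += x * w
    return out


# With a kernel of ones and stride 2, every input value is simply repeated twice.
print(transposed_conv1d([1.0, 2.0, 3.0], [1.0, 1.0], stride=2))
# [1.0, 1.0, 2.0, 2.0, 3.0, 3.0]
```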

Hybrid Graph Neural Networks for Crowd Counting

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i07.6839 ◽

2020 ◽

Vol 34 (07) ◽

pp. 11693-11700 ◽

Cited By ~ 2

Author(s):

Ao Luo ◽

Fan Yang ◽

Xin Li ◽

Dong Nie ◽

Zhicheng Jiao ◽

...

Keyword(s):

Network Architecture ◽

Message Passing ◽

Large Scale ◽

State Of The Art ◽

Density Variation ◽

Feature Maps ◽

Crowd Counting ◽

Multi Scale ◽

Crowd Density ◽

Graph Neural Networks

Crowd counting is an important yet challenging task due to the large scale and density variation. Recent investigations have shown that distilling rich relations among multi-scale features and exploiting useful information from the auxiliary task, i.e., localization, are vital for this task. Nevertheless, how to comprehensively leverage these relations within a unified network architecture is still a challenging problem. In this paper, we present a novel network structure called Hybrid Graph Neural Network (HyGnn), which aims to alleviate the problem by interweaving the multi-scale features for crowd density and its auxiliary task (localization) and performing joint reasoning over a graph. Specifically, HyGnn integrates a hybrid graph to jointly represent the task-specific feature maps of different scales as nodes, and two types of relations as edges: (i) multi-scale relations capturing the feature dependencies across scales and (ii) mutually beneficial relations building bridges for the cooperation between counting and localization. Thus, through message passing, HyGnn can capture and distill richer relations between nodes to obtain more powerful representations, providing robust and accurate results. Our HyGnn performs well on four challenging datasets: ShanghaiTech Part A, ShanghaiTech Part B, UCF_CC_50 and UCF_QNRF, outperforming the state-of-the-art algorithms by a large margin.
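As a rough illustration of what one round of message passing over such a graph involves, the sketch below lets every node replace its feature with the mean of its own value and its neighbours' values. This is a generic, simplified update and not HyGnn's: the scalar features, the mean aggregation, and the node names are all assumptions, whereas the paper operates on feature maps with learned message and update functions.

```python
def message_passing_step(features, edges):
    """One round: each node averages its own feature with its neighbours'."""
    neighbours = {node: [] for node in features}
    for a, b in edges:
        # Edges are treated as undirected.
        neighbours[a].append(b)
        neighbours[b].append(a)
    updated = {}
    for node, value in features.items():
        incoming = [features[n] for n in neighbours[node]]
        updated[node] = (value + sum(incoming)) / (1 + len(incoming))
    return updated


# Hypothetical nodes: counting features at two scales and one localization feature.
features = {"count_s1": 1.0, "count_s2": 3.0, "loc_s1": 5.0}
edges = [
    ("count_s1", "count_s2"),  # multi-scale relation
    ("count_s1", "loc_s1"),    # counting/localization relation
]
print(message_passing_step(features, edges))
# {'count_s1': 3.0, 'count_s2': 2.0, 'loc_s1': 3.0}
```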

Low-Rank and Sparse Based Deep-Fusion Convolutional Neural Network for Crowd Counting

Mathematical Problems in Engineering ◽

10.1155/2017/5046727 ◽

2017 ◽

Vol 2017 ◽

pp. 1-11 ◽

Cited By ~ 2

Author(s):

Siqi Tang ◽

Zhisong Pan ◽

Xingyu Zhou

Keyword(s):

Neural Network ◽

Convolutional Neural Network ◽

State Of The Art ◽

Regression Method ◽

Low Rank ◽

Counting Method ◽

Direct Integral ◽

Crowd Counting ◽

Counting Methods ◽

Density Map

This paper proposes an accurate crowd counting method based on a convolutional neural network and a low-rank and sparse structure. To this end, we first propose an effective deep-fusion convolutional neural network to improve the accuracy of density map regression. Furthermore, we observe that most existing CNN-based crowd counting methods obtain the overall count by directly integrating the estimated density map, which limits counting accuracy. Instead of direct integration, we adopt a regression method based on a low-rank and sparse penalty to improve the accuracy of the projection from density map to global count. Experiments demonstrate the importance of this regression process in improving crowd counting performance. The proposed low-rank and sparse based deep-fusion convolutional neural network (LFCNN) outperforms existing crowd counting methods and achieves state-of-the-art performance.

Multiscale Aggregate Networks with Dense Connections for Crowd Counting

Computational Intelligence and Neuroscience ◽

10.1155/2021/9996232 ◽

2021 ◽

Vol 2021 ◽

pp. 1-12

Author(s):

Pengfei Li ◽

Min Zhang ◽

Jian Wan ◽

Ming Jiang

Keyword(s):

Mean Squared Error ◽

Absolute Error ◽

Image Features ◽

Convolutional Network ◽

Crowd Counting ◽

Squared Error ◽

Crowd Density ◽

Density Maps ◽

Density Map ◽

Map Decoder

The most advanced method for crowd counting uses a fully convolutional network that extracts image features and then generates a crowd density map. However, this process often encounters multiscale and contextual loss problems. To address these problems, we propose a multiscale aggregation network (MANet) that includes a feature extraction encoder (FEE) and a density map decoder (DMD). The FEE uses a cascaded scale pyramid network to extract multiscale features and obtains contextual features through dense connections. The DMD uses deconvolution and fusion operations to generate features containing detailed information. These features can be further converted into high-quality density maps to accurately calculate the number of people in a crowd. An empirical comparison using four mainstream datasets (ShanghaiTech, WorldExpo’10, UCF_CC_50, and SmartCity) shows that the proposed method is more effective in terms of the mean absolute error and mean squared error. The source code is available at https://github.com/lpfworld/MANet.

Diagnosis of hearing deficiency using EEG based AEP signals: CWT and improved-VGG16 pipeline

PeerJ Computer Science ◽

10.7717/peerj-cs.638 ◽

2021 ◽

Vol 7 ◽

pp. e638

Author(s):

Md Nahidul Islam ◽

Norizam Sulaiman ◽

Fahmid Al Farid ◽

Jia Uddin ◽

Salem A. Alyami ◽

...

Keyword(s):

Hearing Loss ◽

Network Architecture ◽

Auditory Evoked Potential ◽

Human Communication ◽

Time Frequency ◽

The Neural Network ◽

Cortex Area ◽

Wide Range ◽

Functional Reliability ◽

Fully Connected

Hearing deficiency is the world’s most common sensory impairment and impedes human communication and learning. Early and precise hearing diagnosis using electroencephalogram (EEG) is referred to as the optimum strategy to deal with this issue. Among a wide range of EEG control signals, the most relevant modality for hearing loss diagnosis is auditory evoked potential (AEP), which is produced in the brain’s cortex area through an auditory stimulus. This study aims to develop a robust intelligent auditory sensation system utilizing a pre-trained deep learning framework by analyzing and evaluating the functional reliability of hearing based on the AEP response. First, the raw AEP data is transformed into time-frequency images through the wavelet transformation. Then, lower-level functionality is eliminated using a pre-trained network. Here, an improved-VGG16 architecture has been designed by removing some convolutional layers and adding new layers in the fully connected block. Subsequently, the higher levels of the neural network architecture are fine-tuned using the labelled time-frequency images. Finally, the proposed method’s performance has been validated on a reputed publicly available AEP dataset, recorded from sixteen subjects while they heard specific auditory stimuli in the left or right ear. The proposed method outperforms state-of-the-art studies by improving the classification accuracy to 96.87% (from 57.375%), which indicates that the proposed improved-VGG16 architecture can deal effectively with the AEP response in early hearing loss diagnosis.

Validation of a Mexican food photograph album as a tool to visually estimate food amounts in adolescents

British Journal Of Nutrition ◽

10.1017/s0007114512002127 ◽

2012 ◽

Vol 109 (5) ◽

pp. 944-952 ◽

Cited By ~ 14

Author(s):

M. Fernanda Bernal-Orozco ◽

Barbara Vizmanos-Lamotte ◽

Norma P. Rodríguez-Rocha ◽

Gabriela Macedo-Ojeda ◽

María Orozco-Valerio ◽

...

Keyword(s):

High Schools ◽

Error Estimation ◽

Error Rate ◽

Lower Percentage ◽

Percentage Error ◽

Public High Schools ◽

Estimation Errors ◽

Portion Sizes ◽

The Mean ◽

Almost All

The aim of the present study was to validate a food photograph album (FPA) as a tool to visually estimate food amounts, and to compare this estimation with that attained through the use of measuring cups (MC) and food models (FM). We tested 163 foods over fifteen sessions (thirty subjects/session; 10–12 foods presented in two portion sizes, 20–24 plates/session). In each session, subjects estimated food amounts with the assistance of FPA, MC and FM. We compared (by portion and method) the mean estimated weight and the mean real weight. We also compared the percentage error estimation for each portion, and the mean food percentage error estimation between methods. In addition, we determined the percentage error estimation of each method. We included 463 adolescents from three public high schools (mean age 17·1 (SD 1·2) years, 61·8 % females). All foods were assessed using FPA, 53·4 % of foods were assessed using MC, and FM was used for 18·4 % of foods. The mean estimated weight with all methods was statistically different compared with the mean real weight for almost all foods. However, a lower percentage error estimation was observed using FPA (2·3 v. 56·9 % for MC and 325 % for FM, P < 0·001). Also, when analysing error rate ranges between methods, there were more observations (P < 0·001) with estimation errors higher than 40 % with the MC (56·1 %), than with the FPA (27·5 %) and FM (44·9 %). In conclusion, although differences between estimated and real weight were statistically significant for almost all foods, comparisons between methods showed FPA to be the most accurate tool for estimating food amounts.
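The abstract above does not spell out how the percentage error was computed. A minimal sketch, assuming the usual signed definition (estimated minus real weight, relative to the real weight), is shown below. The function names and the weights in grams are hypothetical and are not data from the study.

```python
def percentage_error(estimated, real):
    """Signed percentage error of an estimated weight (assumed definition)."""
    return (estimated - real) * 100.0 / real


def mean_percentage_error(pairs):
    """Average the signed percentage error over (estimated, real) pairs."""
    return sum(percentage_error(e, r) for e, r in pairs) / len(pairs)


# Hypothetical (estimated, real) weights in grams for three plates.
pairs = [(110.0, 100.0), (45.0, 50.0), (210.0, 200.0)]
print([percentage_error(e, r) for e, r in pairs])  # [10.0, -10.0, 5.0]
print(mean_percentage_error(pairs))
```

Because over- and underestimates cancel in a signed mean, a study might instead average absolute errors; which variant was used here is not stated in the abstract.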
