A Deep-Fusion Network for Crowd Counting in High-Density Crowded Scenes

Author(s):  
Sultan Daud Khan ◽  
Yasir Salih ◽  
Basim Zafar ◽  
Abdulfattah Noorwali

Abstract: People counting has been investigated extensively as a tool to increase individuals' safety and to avoid crowd hazards in public places. It is a challenging task, especially in high-density environments such as Hajj and Umrah, where millions of people gather in a constrained space to perform rituals, because of the large variations in the scale of people across different scenes. A simple and effective solution to the scale problem is to use an image pyramid; however, processing multiple levels of the pyramid requires a heavy computational cost. To overcome this issue, we propose a deep-fusion model that efficiently and effectively leverages the hierarchical features that exist in the various convolutional layers of a deep neural network. Specifically, we propose a network that combines multiscale features from the shallow to the deep layers of the network and maps the input image to a density map. The summation of the peaks in the density map provides the final crowd count. To assess the effectiveness of the proposed deep network, we perform experiments on three benchmark datasets, namely UCF_CC_50, ShanghaiTech, and UCF-QNRF. The experimental results show that the proposed framework outperforms other state-of-the-art methods by achieving lower Mean Absolute Error (MAE) and Mean Square Error (MSE) values.
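The core idea, fusing features from several depths and integrating a predicted density map to obtain a count, can be illustrated with a minimal PyTorch sketch. The layer sizes, the single 1×1 fusion head, and the interpolation scheme below are illustrative assumptions, not the authors' exact deep-fusion architecture.

```python
# Minimal sketch of a density-map crowd counter that fuses shallow and deep
# features. Channel sizes and layer depths are illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DeepFusionCounter(nn.Module):
    def __init__(self):
        super().__init__()
        self.block1 = nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
                                    nn.MaxPool2d(2))                    # shallow features
        self.block2 = nn.Sequential(nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
                                    nn.MaxPool2d(2))                    # mid-level features
        self.block3 = nn.Sequential(nn.Conv2d(64, 128, 3, padding=1), nn.ReLU())  # deep features
        # 1x1 convolution maps the fused multiscale features to a 1-channel density map
        self.head = nn.Conv2d(32 + 64 + 128, 1, kernel_size=1)

    def forward(self, x):
        f1 = self.block1(x)                  # 1/2 resolution
        f2 = self.block2(f1)                 # 1/4 resolution
        f3 = self.block3(f2)                 # 1/4 resolution
        size = f2.shape[-2:]
        fused = torch.cat([F.interpolate(f1, size=size), f2, f3], dim=1)
        return F.relu(self.head(fused))      # non-negative density map

img = torch.rand(1, 3, 256, 256)
density = DeepFusionCounter()(img)
count = density.sum().item()                 # integrating the density map gives the crowd count
print(f"estimated count: {count:.1f}")
```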

2020 ◽  
Vol 6 (7) ◽  
pp. 62
Author(s):  
Pier Luigi Mazzeo ◽  
Riccardo Contino ◽  
Paolo Spagnolo ◽  
Cosimo Distante ◽  
Ettore Stella ◽  
...  

An accurate estimate of passenger attendance in each metro car contributes to the safe coordination and sorting of passenger crowds in each metro station. In this work, we propose a multi-head Convolutional Neural Network (CNN) architecture trained to estimate passenger attendance in a metro car. The proposed network architecture consists of two main parts: a convolutional backbone, which extracts features over the whole input image, and multi-head layers that estimate a density map, needed to predict the number of people within the crowd image. The network's performance is first evaluated on publicly available crowd counting datasets, including ShanghaiTech part_A, ShanghaiTech part_B, and UCF_CC_50, and the network is then trained and tested on our dataset acquired in subway cars in Italy. In both cases, a comparison is made against the most relevant and latest state-of-the-art crowd counting architectures, showing that our proposed MH-MetroNet architecture outperforms them in terms of Mean Absolute Error (MAE), Mean Square Error (MSE), and passenger count prediction.
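The MAE and MSE figures cited in both abstracts above are computed over per-image count errors. The helper below is a generic evaluation sketch, not code from the MH-MetroNet paper, and it assumes the common crowd-counting convention that "MSE" denotes the root of the mean squared count error.

```python
import numpy as np

def count_errors(pred_counts, true_counts):
    """Return (MAE, MSE) over predicted vs. ground-truth people counts per image."""
    pred = np.asarray(pred_counts, dtype=float)
    true = np.asarray(true_counts, dtype=float)
    err = pred - true
    mae = np.abs(err).mean()
    mse = np.sqrt((err ** 2).mean())  # crowd-counting benchmarks conventionally report the root of the mean squared error as "MSE"
    return mae, mse

print(count_errors([102.3, 48.7, 230.1], [100, 50, 215]))
```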


2020 ◽  
Author(s):  
Jingbai Li ◽  
Patrick Reiser ◽  
André Eberhard ◽  
Pascal Friederich ◽  
Steven Lopez

Photochemical reactions are being increasingly used to construct complex molecular architectures with mild and straightforward reaction conditions. Computational techniques are increasingly important for understanding the reactivities and chemoselectivities of photochemical isomerization reactions because they offer molecular bonding information along the excited state(s) of the photodynamics. These photodynamics simulations are resource-intensive and are typically limited to 1–10 picoseconds and 1,000 trajectories due to their high computational cost. Most organic photochemical reactions have excited-state lifetimes exceeding 1 picosecond, which places them outside the reach of such computational studies. Westermeyr et al. demonstrated that a machine learning approach could significantly lengthen photodynamics simulation times for a model system, the methylenimmonium cation (CH2NH2+).

We have developed a Python-based code, Python Rapid Artificial Intelligence Ab Initio Molecular Dynamics (PyRAI2MD), to accomplish the unprecedented 10 ns cis-trans photodynamics of trans-hexafluoro-2-butene (CF3–CH=CH–CF3) in 3.5 days. The same simulation would take approximately 58 years with ground-truth multiconfigurational dynamics. We proposed an innovative scheme combining Wigner sampling, geometrical interpolations, and short-time quantum chemical trajectories to effectively sample the initial data, facilitating adaptive sampling to generate an informative and data-efficient training set of 6,232 data points. Our neural networks achieved chemical accuracy (mean absolute error of 0.032 eV). Our 4,814 trajectories reproduced the S1 half-life (60.5 fs) and the photochemical product ratio (trans:cis = 2.3:1), and autonomously discovered a pathway towards a carbene. The neural networks have also shown the capability of generalizing the full potential energy surface from chemically incomplete data (trans → cis but not cis → trans pathways), which may enable future automated photochemical reaction discoveries.
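As a small illustration of one quantity mentioned above, an excited-state half-life such as the reported 60.5 fs S1 value can be obtained by fitting an exponential decay to the trajectory-averaged S1 population. The synthetic population data and single-exponential model below are assumptions for illustration, not output of PyRAI2MD.

```python
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(0)
t = np.linspace(0, 500, 101)                     # time grid in femtoseconds
true_tau = 60.5 / np.log(2)                      # lifetime corresponding to a 60.5 fs half-life
pop_s1 = np.exp(-t / true_tau) + rng.normal(0, 0.02, t.size)   # noisy S1 population

def exp_decay(t, tau):
    return np.exp(-t / tau)

(tau_fit,), _ = curve_fit(exp_decay, t, pop_s1, p0=[50.0])
print(f"fitted S1 half-life: {tau_fit * np.log(2):.1f} fs")
```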


2018 ◽  
Author(s):  
Roman Zubatyuk ◽  
Justin S. Smith ◽  
Jerzy Leszczynski ◽  
Olexandr Isayev

Atomic and molecular properties can be evaluated from the fundamental Schrödinger equation and therefore represent different modalities of the same quantum phenomena. Here we present AIMNet, a modular and chemically inspired deep neural network potential. We used AIMNet with multitarget training to learn multiple modalities of the state of the atom in a molecular system. The resulting model shows state-of-the-art accuracy on several benchmark datasets, comparable to the results of DFT methods that are orders of magnitude more expensive. It can simultaneously predict several atomic and molecular properties without an increase in computational cost. With AIMNet we show a new dimension of transferability: the ability to learn new targets by utilizing multimodal information from previous training. The model can learn an implicit solvation energy (akin to SMD) using only a fraction of the original training data, achieving a mean absolute deviation (MAD) of 1.1 kcal/mol against experimental solvation free energies in the MNSol database.
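The multitarget training idea, one shared representation feeding several property heads, can be sketched as follows. The two-head layout, feature size, and loss weights are illustrative assumptions and do not reproduce AIMNet's actual architecture.

```python
import torch
import torch.nn as nn

class MultiTargetHead(nn.Module):
    def __init__(self, n_features=128):
        super().__init__()
        self.shared = nn.Sequential(nn.Linear(n_features, 64), nn.SiLU())
        self.energy_head = nn.Linear(64, 1)    # e.g. atomic energy contribution
        self.charge_head = nn.Linear(64, 1)    # e.g. partial atomic charge

    def forward(self, atom_features):
        h = self.shared(atom_features)
        return self.energy_head(h), self.charge_head(h)

model = MultiTargetHead()
feats = torch.randn(32, 128)                   # 32 atoms, 128 learned features each
e_ref, q_ref = torch.randn(32, 1), torch.randn(32, 1)
e_pred, q_pred = model(feats)
loss = nn.functional.mse_loss(e_pred, e_ref) + 0.5 * nn.functional.mse_loss(q_pred, q_ref)
loss.backward()                                # gradients flow through the shared trunk from both targets
```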


Entropy ◽  
2021 ◽  
Vol 23 (8) ◽  
pp. 991
Author(s):  
Yuta Nakahara ◽  
Toshiyasu Matsushima

In information theory, lossless compression of general data is based on an explicit assumption of a stochastic generative model for the target data. In lossless image compression, however, researchers have mainly focused on the coding procedure that outputs the coded sequence from the input image, and the assumption of a stochastic generative model remains implicit. In these studies, it is difficult to discuss the difference between the expected code length and the entropy of the stochastic generative model. We resolve this difficulty for a class of images that exhibit non-stationarity among segments. In this paper, we propose a novel stochastic generative model of images by redefining the implicit stochastic generative model of a previous coding procedure. Our model is based on a quadtree, so it effectively represents variable-block-size segmentation of images. We then construct the Bayes code optimal for the proposed stochastic generative model. This requires a summation over all possible quadtrees weighted by their posterior probabilities, whose computational cost in general increases exponentially with the image size. However, we introduce an efficient algorithm that calculates it in polynomial order of the image size without loss of optimality. As a result, the derived algorithm has a better average coding rate than that of JBIG.
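The quadtree weighting that makes the Bayes code tractable can be sketched recursively: the probability of a block is a mixture of "model the block as a single segment" and "split it into four quadrants", evaluated so that each block is visited once. The Bernoulli/Krichevsky-Trofimov block model and the 1/2 split prior below are simplifying assumptions, not the exact model of the paper.

```python
import numpy as np

def kt_prob(block):
    """Krichevsky-Trofimov probability of a binary block treated as one i.i.d. segment."""
    ones = int(block.sum())
    zeros = block.size - ones
    p, n0, n1 = 1.0, 0, 0
    for bit in ([1] * ones + [0] * zeros):     # KT probability depends only on the counts
        p *= ((n1 if bit else n0) + 0.5) / (n0 + n1 + 1.0)
        n1, n0 = n1 + bit, n0 + (1 - bit)
    return p

def quadtree_weighted_prob(block):
    """Mixture of 'keep the block whole' and 'split into four quadrants', applied recursively."""
    if block.shape[0] == 1:                    # single pixels cannot be split further
        return kt_prob(block)
    h = block.shape[0] // 2
    quads = [block[:h, :h], block[:h, h:], block[h:, :h], block[h:, h:]]
    split_p = np.prod([quadtree_weighted_prob(q) for q in quads])
    return 0.5 * kt_prob(block) + 0.5 * split_p

img = (np.random.default_rng(1).random((8, 8)) > 0.8).astype(int)   # toy binary image
prob = quadtree_weighted_prob(img)
print(f"ideal Bayes code length: {-np.log2(prob):.1f} bits")
```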


Author(s):  
Chen Qi ◽  
Shibo Shen ◽  
Rongpeng Li ◽  
Zhifeng Zhao ◽  
Qing Liu ◽  
...  

Abstract: Nowadays, deep neural networks (DNNs) are being rapidly deployed to realize a number of functionalities such as sensing, imaging, classification, and recognition. However, the computation-intensive requirements of DNNs make them difficult to deploy on resource-limited Internet of Things (IoT) devices. In this paper, we propose a novel pruning-based paradigm that aims to reduce the computational cost of DNNs by uncovering a more compact structure and learning the effective weights therein, without compromising the expressive capability of the DNN. In particular, our algorithm achieves efficient end-to-end training that transfers a redundant neural network directly into a compact one with a specifically targeted compression rate. We comprehensively evaluate our approach on various representative benchmark datasets and compare it against typical advanced convolutional neural network (CNN) architectures. The experimental results verify the superior performance and robust effectiveness of our scheme. For example, when pruning VGG on CIFAR-10, our proposed scheme reduces its FLOPs (floating-point operations) and number of parameters by 76.2% and 94.1%, respectively, while still maintaining a satisfactory accuracy. To sum up, our scheme could facilitate the integration of DNNs into the common machine-learning-based IoT framework and enable distributed training of neural networks in both the cloud and at the edge.
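A common baseline that connects a target compression rate to a concrete weight mask is global magnitude pruning, sketched below. It is not the authors' end-to-end pruning algorithm; the keep ratio shown simply mirrors the 94.1% parameter reduction quoted above.

```python
import torch
import torch.nn as nn

def prune_to_compression(model, keep_ratio=0.059):    # keep ~5.9% of weights == 94.1% parameter reduction
    weights = torch.cat([p.detach().abs().flatten()
                         for p in model.parameters() if p.dim() > 1])
    k = max(1, int(keep_ratio * weights.numel()))
    threshold = torch.topk(weights, k, largest=True).values.min()
    masks = {}
    for name, p in model.named_parameters():
        if p.dim() > 1:                                # prune conv/linear weights, keep biases
            masks[name] = (p.detach().abs() >= threshold).float()
            p.data.mul_(masks[name])                   # zero out pruned weights
    return masks                                       # masks can be reapplied after each training step

model = nn.Sequential(nn.Conv2d(3, 16, 3), nn.ReLU(), nn.Flatten(), nn.Linear(16 * 30 * 30, 10))
masks = prune_to_compression(model)
total = sum(m.numel() for m in masks.values())
kept = sum(int(m.sum()) for m in masks.values())
print(f"kept {kept}/{total} weights ({100 * kept / total:.1f}%)")
```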


2020 ◽  
Vol 176 (2) ◽  
pp. 183-203
Author(s):  
Santosh Chapaneri ◽  
Deepak Jayaswal

Modeling the music mood has wide applications in music categorization, retrieval, and recommendation systems; however, it is challenging to computationally model the affective content of music due to its subjective nature. In this work, a structured regression framework is proposed to model the valence and arousal mood dimensions of music using a single regression model at linear computational cost. To tackle the subjectivity phenomenon, a confidence-interval-based estimated consensus is computed by modeling the behavior of various annotators (e.g., biased, adversarial) and is shown to perform better than using the average annotation values. For a compact feature representation of music clips, variational Bayesian inference is used to learn a Gaussian mixture model representation of the acoustic features, and chord-related features are used to improve valence estimation by probing the chord progressions between chroma frames. The dimensionality of the features is further reduced using an adaptive version of kernel PCA. Using an efficient implementation of the twin Gaussian process for structured regression, the proposed work achieves a significant improvement in R2 for the arousal and valence dimensions relative to state-of-the-art techniques on two benchmark datasets for music mood estimation.
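The feature-summarization step can be illustrated with scikit-learn's variational Bayesian Gaussian mixture: frame-level acoustic features of one clip are summarized into a fixed-length descriptor. The feature dimensionality and component count are assumptions, and the sketch omits the chord features, kernel PCA, and twin Gaussian process stages.

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(0)
frames = rng.normal(size=(500, 20))            # 500 frames x 20 acoustic features for one clip

gmm = BayesianGaussianMixture(n_components=8, covariance_type="diag",
                              max_iter=200, random_state=0).fit(frames)

# Concatenate the mixture weights and means into one fixed-length clip descriptor,
# which can then feed a regressor for the valence/arousal dimensions.
clip_descriptor = np.concatenate([gmm.weights_, gmm.means_.ravel()])
print(clip_descriptor.shape)                   # (8 + 8*20,) = (168,)
```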


2021 ◽  
Vol 7 (7) ◽  
pp. 112
Author(s):  
Domonkos Varga

The goal of no-reference image quality assessment (NR-IQA) is to evaluate the perceptual quality of digital images without using their distortion-free, pristine counterparts. NR-IQA is an important part of multimedia signal processing since digital images can undergo a wide variety of distortions during storage, compression, and transmission. In this paper, we propose a novel architecture that extracts deep features from the input image at multiple scales to improve the effectiveness of feature extraction for NR-IQA using convolutional neural networks. Specifically, the proposed method extracts deep activations for local patches at multiple scales and maps them onto perceptual quality scores with the help of trained Gaussian process regressors. Extensive experiments demonstrate that the introduced algorithm performs favorably against state-of-the-art methods on three large benchmark datasets with authentic distortions (LIVE In the Wild, KonIQ-10k, and SPAQ).
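The second stage, mapping patch features to quality scores with Gaussian process regression, can be sketched with scikit-learn. The random feature vectors stand in for deep CNN activations, and the kernel choice is an assumption rather than the paper's configuration.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)
train_feats = rng.normal(size=(200, 64))          # stand-in for deep activations of training patches
train_mos = rng.uniform(1, 5, size=200)           # mean opinion scores of the source images

gpr = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True)
gpr.fit(train_feats, train_mos)

test_patches = rng.normal(size=(12, 64))          # patches of one test image at multiple scales
patch_scores = gpr.predict(test_patches)
print(f"predicted image quality: {patch_scores.mean():.2f}")   # image score = mean over patch scores
```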


2014 ◽  
Vol 18 (2) ◽  
pp. 407-416 ◽  
Author(s):  
I. Vandecasteele ◽  
A. Bianchi ◽  
F. Batista e Silva ◽  
C. Lavalle ◽  
O. Batelaan

Abstract. In Europe, public water withdrawals make up on average 30%, and in some cases up to 60%, of total water withdrawals. These withdrawals are becoming increasingly important with growing population density; hence there is a need to understand the spatial and temporal trends involved. Pan-European public/municipal water withdrawals and consumption were mapped for 2006 and forecasted for 2030. Population and tourism density were assumed to be the main driving factors for withdrawals. Country-level statistics on public water withdrawals were disaggregated to a combined population and tourism density map (the "user" density map) computed for 2006. The methodology was validated using actual regional withdrawal statistics from France for 2006. The calculated total absolute error (TAE) was shown to be reduced by taking tourism density into account in addition to population density. In order to forecast the map to 2030, we considered a reference scenario in which per capita withdrawals were kept constant in time. Although there are large variations from region to region, this resulted in a European average increase in water withdrawals of 16%. If we instead extrapolate the average reduction in per capita withdrawals seen between 2000 and 2008, we forecast a reduction in average total water withdrawals of 4%. Considering a scenario in which all countries converge to an optimal water use efficiency, we see an average decrease of 28%.
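The disaggregation step can be illustrated in a few lines: a national withdrawal total is spread over grid cells in proportion to a combined population-plus-tourism "user" density. The tourist weighting factor and the grid values below are illustrative assumptions, not values from the study.

```python
import numpy as np

rng = np.random.default_rng(0)
population = rng.integers(0, 5000, size=(10, 10)).astype(float)   # residents per grid cell
tourists = rng.integers(0, 500, size=(10, 10)).astype(float)      # tourist presence per grid cell

user_density = population + 0.5 * tourists         # assumed tourist weight of 0.5
national_withdrawal = 2.0e8                         # m^3 per year for the whole country (illustrative)

# Each cell receives a share of the national total proportional to its user density.
cell_withdrawal = national_withdrawal * user_density / user_density.sum()
assert np.isclose(cell_withdrawal.sum(), national_withdrawal)     # totals are preserved by construction
print(f"max cell withdrawal: {cell_withdrawal.max():.0f} m^3/yr")
```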


2021 ◽  
pp. 3790-3803
Author(s):  
Heba Kh. Abbas ◽  
Haidar J. Mohamad

In this paper, a fuzzy logic method was implemented to detect and recognize English numerals. The features extracted by this method make the detection easy and accurate. These features depend on the crossing points of two vertical lines with one horizontal line, which are used by the fuzzy logic method, as shown by the Matlab code in this study. The font types are Times New Roman, Arial, Calabria, Arabic, and Andalus, with font sizes of 10, 16, 22, 28, 36, 42, 50, and 72. The numerals are isolated automatically with the designed algorithm, for which the code is also presented. Each numeral's image is tested with the fuzzy algorithm using only six block properties. Groups of regions (High, Medium, and Low) for each numeral showed unique behavior that allows any numeral to be recognized. The Normalized Absolute Error (NAE) equation was used to evaluate the error percentage of the suggested algorithm; the lowest error was 0.001% compared with the real numeral. The data were checked with the support vector machine (SVM) algorithm to confirm the quality and efficiency of the suggested method, and the matching between the data of the suggested method and the SVM was found to be 100%. The six properties offer a new way to build a rule-based feature extraction technique for different applications and to recognize any text with a low computational cost.
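The crossing-point features can be sketched as counts of foreground transitions along two vertical scan lines and one horizontal scan line of a binarized numeral image. The scan-line positions and the toy bitmap below are assumptions for illustration; they do not reproduce the paper's six block properties or its fuzzy rule base.

```python
import numpy as np

digit = np.zeros((9, 7), dtype=int)    # toy binary image of the digit "1"
digit[1:8, 3] = 1

def crossings(line):
    """Number of 0 -> 1 transitions along a scan line."""
    return int(np.sum((line[1:] == 1) & (line[:-1] == 0)))

h, w = digit.shape
features = [crossings(digit[:, w // 3]),        # left vertical scan line
            crossings(digit[:, 2 * w // 3]),    # right vertical scan line
            crossings(digit[h // 2, :])]        # horizontal scan line through the middle
print(features)    # crossing counts feed the fuzzy (High/Medium/Low) rules
```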


Electronics ◽  
2021 ◽  
Vol 10 (23) ◽  
pp. 2892
Author(s):  
Kyungjun Lee ◽  
Seungwoo Wee ◽  
Jechang Jeong

Salient object detection is a method of finding an object within an image that a person would determine to be important and be expected to focus on. Various features are used to compute visual saliency, and, in general, the color and luminance of the scene are the most widely used spatial features. However, humans perceive the same color and luminance differently depending on the influence of the surrounding environment. As the human visual system (HVS) operates through a very complex mechanism, both neurobiological and psychological aspects must be considered for the accurate detection of salient objects. To reflect this characteristic in the saliency detection process, we have proposed two pre-processing methods to apply to the input image. First, we applied a bilateral filter to improve the segmentation results by smoothing the image so that only the overall context remains while preserving its important borders. Second, even when the amount of light is the same, it can be perceived with a difference in brightness owing to the influence of the surrounding environment. Therefore, we applied oriented difference-of-Gaussians (ODOG) and locally normalized ODOG (LODOG) filters that adjust the input image by predicting the brightness as perceived by humans. Experiments on five public benchmark datasets for which ground truth exists show that our proposed method further improves the performance of previous state-of-the-art methods.
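The two pre-processing steps can be approximated with OpenCV as below: a bilateral filter for edge-preserving smoothing, followed by a plain difference-of-Gaussians used here as a simplified stand-in for the oriented ODOG/LODOG filters, which are more elaborate than shown.

```python
import cv2
import numpy as np

# Stand-in for the input image; a real pipeline would load a photograph instead.
img = np.random.default_rng(0).integers(0, 256, (240, 320, 3), dtype=np.uint8)

# Edge-preserving smoothing keeps important borders while removing fine texture.
smooth = cv2.bilateralFilter(img, d=9, sigmaColor=75, sigmaSpace=75)

# Simple (non-oriented) difference-of-Gaussians as a rough brightness-contrast response.
gray = cv2.cvtColor(smooth, cv2.COLOR_BGR2GRAY).astype(np.float32)
dog = cv2.GaussianBlur(gray, (0, 0), sigmaX=1.0) - cv2.GaussianBlur(gray, (0, 0), sigmaX=4.0)

# The response can be fed to the subsequent segmentation/saliency stages.
out = cv2.normalize(dog, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
cv2.imwrite("preprocessed.png", out)
```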

