Hybrid Graph Neural Networks for Crowd Counting

Ao Luo; Fan Yang; Xin Li; Dong Nie; Zhicheng Jiao; Shangchen Zhou; Hong Cheng

doi:10.1609/aaai.v34i07.6839

Hybrid Graph Neural Networks for Crowd Counting

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i07.6839 ◽

2020 ◽

Vol 34 (07) ◽

pp. 11693-11700 ◽

Cited By ~ 2

Author(s):

Ao Luo ◽

Fan Yang ◽

Xin Li ◽

Dong Nie ◽

Zhicheng Jiao ◽

...

Keyword(s):

Network Architecture ◽

Message Passing ◽

Large Scale ◽

State Of The Art ◽

Density Variation ◽

Feature Maps ◽

Crowd Counting ◽

Multi Scale ◽

Crowd Density ◽

Graph Neural Networks

Crowd counting is an important yet challenging task due to the large scale and density variation. Recent investigations have shown that distilling rich relations among multi-scale features and exploiting useful information from the auxiliary task, i.e., localization, are vital for this task. Nevertheless, how to comprehensively leverage these relations within a unified network architecture is still a challenging problem. In this paper, we present a novel network structure called Hybrid Graph Neural Network (HyGnn) which targets to relieve the problem by interweaving the multi-scale features for crowd density as well as its auxiliary task (localization) together and performing joint reasoning over a graph. Specifically, HyGnn integrates a hybrid graph to jointly represent the task-specific feature maps of different scales as nodes, and two types of relations as edges: (i) multi-scale relations capturing the feature dependencies across scales and (ii) mutual beneficial relations building bridges for the cooperation between counting and localization. Thus, through message passing, HyGnn can capture and distill richer relations between nodes to obtain more powerful representations, providing robust and accurate results. Our HyGnn performs significantly well on four challenging datasets: ShanghaiTech Part A, ShanghaiTech Part B, UCF_CC_50 and UCF_QNRF, outperforming the state-of-the-art algorithms by a large margin.

Download Full-text

SketchGNN: Semantic Sketch Segmentation with Graph Neural Networks

ACM Transactions on Graphics ◽

10.1145/3450284 ◽

2021 ◽

Vol 40 (3) ◽

pp. 1-13

Author(s):

Lumin Yang ◽

Jiajie Zhuang ◽

Hongbo Fu ◽

Xiangzhi Wei ◽

Kun Zhou ◽

...

Keyword(s):

Neural Network ◽

Neural Networks ◽

Network Architecture ◽

Large Scale ◽

State Of The Art ◽

Semantic Segmentation ◽

Structure Information ◽

Graph Neural Networks ◽

Node Labels ◽

Point Level

We introduce SketchGNN , a convolutional graph neural network for semantic segmentation and labeling of freehand vector sketches. We treat an input stroke-based sketch as a graph with nodes representing the sampled points along input strokes and edges encoding the stroke structure information. To predict the per-node labels, our SketchGNN uses graph convolution and a static-dynamic branching network architecture to extract the features at three levels, i.e., point-level, stroke-level, and sketch-level. SketchGNN significantly improves the accuracy of the state-of-the-art methods for semantic sketch segmentation (by 11.2% in the pixel-based metric and 18.2% in the component-based metric over a large-scale challenging SPG dataset) and has magnitudes fewer parameters than both image-based and sequence-based methods.

Download Full-text

Congested Crowd Counting via Adaptive Multi-Scale Context Learning

Sensors ◽

10.3390/s21113777 ◽

2021 ◽

Vol 21 (11) ◽

pp. 3777

Author(s):

Yani Zhang ◽

Huailin Zhao ◽

Zuodong Duan ◽

Liangjun Huang ◽

Jiahao Deng ◽

...

Keyword(s):

Density Estimation ◽

State Of The Art ◽

Spatial Context ◽

Context Information ◽

Crowd Counting ◽

Multi Scale ◽

Context Learning ◽

Crowd Density Estimation ◽

Crowd Density ◽

Density Results

In this paper, we propose a novel congested crowd counting network for crowd density estimation, i.e., the Adaptive Multi-scale Context Aggregation Network (MSCANet). MSCANet efficiently leverages the spatial context information to accomplish crowd density estimation in a complicated crowd scene. To achieve this, a multi-scale context learning block, called the Multi-scale Context Aggregation module (MSCA), is proposed to first extract different scale information and then adaptively aggregate it to capture the full scale of the crowd. Employing multiple MSCAs in a cascaded manner, the MSCANet can deeply utilize the spatial context information and modulate preliminary features into more distinguishing and scale-sensitive features, which are finally applied to a 1 × 1 convolution operation to obtain the crowd density results. Extensive experiments on three challenging crowd counting benchmarks showed that our model yielded compelling performance against the other state-of-the-art methods. To thoroughly prove the generality of MSCANet, we extend our method to two relevant tasks: crowd localization and remote sensing object counting. The extension experiment results also confirmed the effectiveness of MSCANet.

Download Full-text

Coloring Graph Neural Networks for Node Disambiguation

Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2020/294 ◽

2020 ◽

Author(s):

George Dasoulas ◽

Ludovic Dos Santos ◽

Kevin Scaman ◽

Aladin Virmaux

Keyword(s):

Neural Networks ◽

Message Passing ◽

State Of The Art ◽

Structural Characteristics ◽

Expressive Power ◽

Continuous Functions ◽

Graph Classification ◽

Node Attributes ◽

Graph Neural Networks ◽

Coloring Graph

In this paper, we show that a simple coloring scheme can improve, both theoretically and empirically, the expressive power of Message Passing Neural Networks (MPNNs). More specifically, we introduce a graph neural network called Colored Local Iterative Procedure (CLIP) that uses colors to disambiguate identical node attributes, and show that this representation is a universal approximator of continuous functions on graphs with node attributes. Our method relies on separability, a key topological characteristic that allows to extend well-chosen neural networks into universal representations. Finally, we show experimentally that CLIP is capable of capturing structural characteristics that traditional MPNNs fail to distinguish, while being state-of-the-art on benchmark graph classification datasets.

Download Full-text

GlobalTrack: A Simple and Strong Baseline for Long-Term Tracking

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i07.6758 ◽

2020 ◽

Vol 34 (07) ◽

pp. 11037-11044

Author(s):

Lianghua Huang ◽

Xin Zhao ◽

Kaiqi Huang

Keyword(s):

Online Learning ◽

Success Rate ◽

Large Scale ◽

State Of The Art ◽

Temporal Consistency ◽

Post Processing ◽

Two Stage ◽

Multi Scale ◽

Instance Search

A key capability of a long-term tracker is to search for targets in very large areas (typically the entire image) to handle possible target absences or tracking failures. However, currently there is a lack of such a strong baseline for global instance search. In this work, we aim to bridge this gap. Specifically, we propose GlobalTrack, a pure global instance search based tracker that makes no assumption on the temporal consistency of the target's positions and scales. GlobalTrack is developed based on two-stage object detectors, and it is able to perform full-image and multi-scale search of arbitrary instances with only a single query as the guide. We further propose a cross-query loss to improve the robustness of our approach against distractors. With no online learning, no punishment on position or scale changes, no scale smoothing and no trajectory refinement, our pure global instance search based tracker achieves comparable, sometimes much better performance on four large-scale tracking benchmarks (i.e., 52.1% AUC on LaSOT, 63.8% success rate on TLP, 60.3% MaxGM on OxUvA and 75.4% normalized precision on TrackingNet), compared to state-of-the-art approaches that typically require complex post-processing. More importantly, our tracker runs without cumulative errors, i.e., any type of temporary tracking failures will not affect its performance on future frames, making it ideal for long-term tracking. We hope this work will be a strong baseline for long-term tracking and will stimulate future works in this area.

Download Full-text

Discriminative Deep Hashing for Scalable Face Image Retrieval

Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2017/315 ◽

2017 ◽

Cited By ~ 14

Author(s):

Jie Lin ◽

Zechao Li ◽

Jinhui Tang

Keyword(s):

Image Retrieval ◽

Large Scale ◽

State Of The Art ◽

Face Image ◽

Superior Performance ◽

Prediction Errors ◽

Unified Framework ◽

Multi Scale ◽

Deep Hashing ◽

Hash Codes

With the explosive growth of images containing faces, scalable face image retrieval has attracted increasing attention. Due to the amazing effectiveness, deep hashing has become a popular hashing method recently. In this work, we propose a new Discriminative Deep Hashing (DDH) network to learn discriminative and compact hash codes for large-scale face image retrieval. The proposed network incorporates the end-to-end learning, the divide-and-encode module and the desired discrete code learning into a unified framework. Specifically, a network with a stack of convolution-pooling layers is proposed to extract multi-scale and robust features by merging the outputs of the third max pooling layer and the fourth convolutional layer. To reduce the redundancy among hash codes and the network parameters simultaneously, a divide-and-encode module to generate compact hash codes. Moreover, a loss function is introduced to minimize the prediction errors of the learned hash codes, which can lead to discriminative hash codes. Extensive experiments on two datasets demonstrate that the proposed method achieves superior performance compared with some state-of-the-art hashing methods.

Download Full-text

SafeNet: Scale-normalization and Anchor-based Feature Extraction Network for Person Re-identification

Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2018/156 ◽

2018 ◽

Author(s):

Kun Yuan ◽

Qian Zhang ◽

Chang Huang ◽

Shiming Xiang ◽

Chunhong Pan

Keyword(s):

Large Scale ◽

State Of The Art ◽

Spatial Distributions ◽

Body Parts ◽

Retrieval Task ◽

Multi Scale ◽

Aspect Ratios ◽

Scale Normalization ◽

Full Body

Person Re-identification (ReID) is a challenging retrieval task that requires matching a person's image across non-overlapping camera views. The quality of fulfilling this task is largely determined on the robustness of the features that are used to describe the person. In this paper, we show the advantage of jointly utilizing multi-scale abstract information to learn powerful features over full body and parts. A scale normalization module is proposed to balance different scales through residual-based integration. To exploit the information hidden in non-rigid body parts, we propose an anchor-based method to capture the local contents by stacking convolutions of kernels with various aspect ratios, which focus on different spatial distributions. Finally, a well-defined framework is constructed for simultaneously learning the representations of both full body and parts. Extensive experiments conducted on current challenging large-scale person ReID datasets, including Market1501, CUHK03 and DukeMTMC, demonstrate that our proposed method achieves the state-of-the-art results.

Download Full-text

Collaborative Reflection-Augmented Autoencoder Network for Recommender Systems

ACM Transactions on Information Systems ◽

10.1145/3467023 ◽

2022 ◽

Vol 40 (1) ◽

pp. 1-22

Author(s):

Lianghao Xia ◽

Chao Huang ◽

Yong Xu ◽

Huance Xu ◽

Xiang Li ◽

...

Keyword(s):

Collaborative Filtering ◽

Information Fusion ◽

Network Architecture ◽

State Of The Art ◽

Feature Space ◽

Collaborative Reflection ◽

Pairwise Preference ◽

Learning Techniques ◽

Benchmark Datasets ◽

Graph Neural Networks

As the deep learning techniques have expanded to real-world recommendation tasks, many deep neural network based Collaborative Filtering (CF) models have been developed to project user-item interactions into latent feature space, based on various neural architectures, such as multi-layer perceptron, autoencoder, and graph neural networks. However, the majority of existing collaborative filtering systems are not well designed to handle missing data. Particularly, in order to inject the negative signals in the training phase, these solutions largely rely on negative sampling from unobserved user-item interactions and simply treating them as negative instances, which brings the recommendation performance degradation. To address the issues, we develop a C ollaborative R eflection-Augmented A utoencoder N etwork (CRANet), that is capable of exploring transferable knowledge from observed and unobserved user-item interactions. The network architecture of CRANet is formed of an integrative structure with a reflective receptor network and an information fusion autoencoder module, which endows our recommendation framework with the ability of encoding implicit user’s pairwise preference on both interacted and non-interacted items. Additionally, a parametric regularization-based tied-weight scheme is designed to perform robust joint training of the two-stage CRANetmodel. We finally experimentally validate CRANeton four diverse benchmark datasets corresponding to two recommendation tasks, to show that debiasing the negative signals of user-item interactions improves the performance as compared to various state-of-the-art recommendation techniques. Our source code is available at https://github.com/akaxlh/CRANet.

Download Full-text

UniGNN: a Unified Framework for Graph and Hypergraph Neural Networks

Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2021/353 ◽

2021 ◽

Author(s):

Jing Huang ◽

Jie Yang

Keyword(s):

Neural Networks ◽

Message Passing ◽

State Of The Art ◽

Representation Learning ◽

Graph Representation ◽

Challenging Problem ◽

Unified Framework ◽

Real World Datasets ◽

Graph Neural Networks ◽

Research Domains

Hypergraph, an expressive structure with flexibility to model the higher-order correlations among entities, has recently attracted increasing attention from various research domains. Despite the success of Graph Neural Networks (GNNs) for graph representation learning, how to adapt the powerful GNN-variants directly into hypergraphs remains a challenging problem. In this paper, we propose UniGNN, a unified framework for interpreting the message passing process in graph and hypergraph neural networks, which can generalize general GNN models into hypergraphs. In this framework, meticulously-designed architectures aiming to deepen GNNs can also be incorporated into hypergraphs with the least effort. Extensive experiments have been conducted to demonstrate the effectiveness of UniGNN on multiple real-world datasets, which outperform the state-of-the-art approaches with a large margin. Especially for the DBLP dataset, we increase the accuracy from 77.4% to 88.8% in the semi-supervised hypernode classification task. We further prove that the proposed message-passing based UniGNN models are at most as powerful as the 1-dimensional Generalized Weisfeiler-Leman (1-GWL) algorithm in terms of distinguishing non-isomorphic hypergraphs. Our code is available at https://github.com/OneForward/UniGNN.

Download Full-text

Multi-Scale Collaborative Network for Human Pose Estimation

International Journal of Humanoid Robotics ◽

10.1142/s0219843619410032 ◽

2019 ◽

Vol 16 (04) ◽

pp. 1941003

Author(s):

Chunsheng Guo ◽

Jialuo Zhou ◽

Wenlong Du ◽

Xuguang Zhang

Keyword(s):

Loss Function ◽

Pose Estimation ◽

Large Scale ◽

Human Pose Estimation ◽

Small Scale ◽

Feature Maps ◽

Multi Scale ◽

Human Pose ◽

Processing Network ◽

Weight Coefficients

Human pose estimation is a fundamental but challenging task in computer vision. The estimation of human pose mainly depends on the global information of the keypoint type and the local information of the keypoint location. However, the consistency of the cascading process makes it difficult for each stacking network to form a differentiation and collaboration mechanism. In order to solve these problems, this paper introduces a new human pose estimation framework called Multi-Scale Collaborative (MSC) network. The pre-processing network forms feature maps of different sizes, and dispatches them to various locations of the stack network, with small-scale features reaching the front-end stacking network and large-scale features reaching the back-end stacking network. A new loss function is proposed for MSC network. Different keypoints have different weight coefficients of loss function at different scales, and the keypoint weight coefficients are dynamically adjusted from the top hourglass network to the bottom hourglass network. Experimental results show that the proposed method is competitive in MPII and LSP challenge leaderboard among the state-of-the-art methods.

Download Full-text

MH-MetroNet—A Multi-Head CNN for Passenger-Crowd Attendance Estimation

Journal of Imaging ◽

10.3390/jimaging6070062 ◽

2020 ◽

Vol 6 (7) ◽

pp. 62

Author(s):

Pier Luigi Mazzeo ◽

Riccardo Contino ◽

Paolo Spagnolo ◽

Cosimo Distante ◽

Ettore Stella ◽

...

Keyword(s):

Network Architecture ◽

Network Performance ◽

State Of The Art ◽

Mean Absolute Error ◽

Absolute Error ◽

Input Image ◽

Mean Square ◽

Metro Station ◽

Crowd Counting ◽

Density Map

Knowing an accurate passengers attendance estimation on each metro car contributes to the safely coordination and sorting the crowd-passenger in each metro station. In this work we propose a multi-head Convolutional Neural Network (CNN) architecture trained to infer an estimation of passenger attendance in a metro car. The proposed network architecture consists of two main parts: a convolutional backbone, which extracts features over the whole input image, and a multi-head layers able to estimate a density map, needed to predict the number of people within the crowd image. The network performance is first evaluated on publicly available crowd counting datasets, including the ShanghaiTech part_A, ShanghaiTech part_B and UCF_CC_50, and then trained and tested on our dataset acquired in subway cars in Italy. In both cases a comparison is made against the most relevant and latest state of the art crowd counting architectures, showing that our proposed MH-MetroNet architecture outperforms in terms of Mean Absolute Error (MAE) and Mean Square Error (MSE) and passenger-crowd people number prediction.

Download Full-text