Relation Selective Graph Convolutional Network for Skeleton-Based Action Recognition

Graph convolutional networks (GCNs) have made significant progress in the skeletal action recognition task. However, the graphs constructed by these methods are too densely connected, and the same graphs are used repeatedly among channels. Redundant connections will blur the useful interdependencies of joints, and the overly repetitive graphs among channels cannot handle changes in joint relations between different actions. In this work, we propose a novel relation selective graph convolutional network (RS-GCN). We also design a trainable relation selection mechanism. It encourages the model to choose solid edges to work and build a stable and sparse topology of joints. The channel-wise graph convolution and multiscale temporal convolution are proposed to strengthening the model’s representative power. Furthermore, we introduce an asymmetrical module named the spatial-temporal attention module for more stable context modeling. Combining those changes, our model achieves state-of-the-art performance on three public benchmarks, namely NTU-RGB+D, NTU-RGB+D 120, and Northwestern-UCLA.

Download Full-text

Shallow Graph Convolutional Network for Skeleton-Based Action Recognition

Sensors ◽

10.3390/s21020452 ◽

2021 ◽

Vol 21 (2) ◽

pp. 452

Author(s):

Wenjie Yang ◽

Jianlin Zhang ◽

Jingju Cai ◽

Zhiyong Xu

Keyword(s):

Action Recognition ◽

State Of The Art ◽

Computational Cost ◽

Receptive Fields ◽

Recognition Task ◽

Convolutional Network ◽

Convolutional Networks ◽

Spatial Graph ◽

Graph Size ◽

Skeleton Graph

Graph convolutional networks (GCNs) have brought considerable improvement to the skeleton-based action recognition task. Existing GCN-based methods usually use the fixed spatial graph size among all the layers. It severely affects the model’s abilities to exploit the global and semantic discriminative information due to the limits of receptive fields. Furthermore, the fixed graph size would cause many redundancies in the representation of actions, which is inefficient for the model. The redundancies could also hinder the model from focusing on beneficial features. To address those issues, we proposed a plug-and-play channel adaptive merging module (CAMM) specific for the human skeleton graph, which can merge the vertices from the same part of the skeleton graph adaptively and efficiently. The merge weights are different across the channels, so every channel has its flexibility to integrate the joints. Then, we build a novel shallow graph convolutional network (SGCN) based on the module, which achieves state-of-the-art performance with less computational cost. Experimental results on NTU-RGB+D and Kinetics-Skeleton illustrates the superiority of our methods.

Download Full-text

Action Recognition Based on the Fusion of Graph Convolutional Networks with High Order Features

Applied Sciences ◽

10.3390/app10041482 ◽

2020 ◽

Vol 10 (4) ◽

pp. 1482 ◽

Cited By ~ 2

Author(s):

Jiuqing Dong ◽

Yongbin Gao ◽

Hyo Jong Lee ◽

Heng Zhou ◽

Yifan Yao ◽

...

Keyword(s):

Neural Network ◽

Action Recognition ◽

Feature Fusion ◽

State Of The Art ◽

Recognition Task ◽

High Order ◽

Recognition Method ◽

Related Research ◽

Convolutional Networks ◽

Temporal Features

Skeleton-based action recognition is a widely used task in action related research because of its clear features and the invariance of human appearances and illumination. Furthermore, it can also effectively improve the robustness of the action recognition. Graph convolutional networks have been implemented on those skeletal data to recognize actions. Recent studies have shown that the graph convolutional neural network works well in the action recognition task using spatial and temporal features of skeleton data. The prevalent methods to extract the spatial and temporal features purely rely on a deep network to learn from primitive 3D position. In this paper, we propose a novel action recognition method applying high-order spatial and temporal features from skeleton data, such as velocity features, acceleration features, and relative distance between 3D joints. Meanwhile, a method of multi-stream feature fusion is adopted to fuse these high-order features we proposed. Extensive experiments on Two large and challenging datasets, NTU-RGBD and NTU-RGBD-120, indicate that our model achieves the state-of-the-art performance.

Download Full-text

Part-Level Graph Convolutional Network for Skeleton-Based Action Recognition

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i07.6759 ◽

2020 ◽

Vol 34 (07) ◽

pp. 11045-11052

Author(s):

Linjiang Huang ◽

Yan Huang ◽

Wanli Ouyang ◽

Liang Wang

Keyword(s):

Action Recognition ◽

State Of The Art ◽

The State ◽

Body Parts ◽

Convolutional Network ◽

Joint Level ◽

Convolutional Networks ◽

Level Information ◽

Benchmark Datasets ◽

High Level

Recently, graph convolutional networks have achieved remarkable performance for skeleton-based action recognition. In this work, we identify a problem posed by the GCNs for skeleton-based action recognition, namely part-level action modeling. To address this problem, a novel Part-Level Graph Convolutional Network (PL-GCN) is proposed to capture part-level information of skeletons. Different from previous methods, the partition of body parts is learnable rather than manually defined. We propose two part-level blocks, namely Part Relation block (PR block) and Part Attention block (PA block), which are achieved by two differentiable operations, namely graph pooling operation and graph unpooling operation. The PR block aims at learning high-level relations between body parts while the PA block aims at highlighting the important body parts in the action. Integrating the original GCN with the two blocks, the PL-GCN can learn both part-level and joint-level information of the action. Extensive experiments on two benchmark datasets show the state-of-the-art performance on skeleton-based action recognition and demonstrate the effectiveness of the proposed method.

Download Full-text

Efficient End-to-End Sentence-Level Lipreading with Temporal Convolutional Networks

Applied Sciences ◽

10.3390/app11156975 ◽

2021 ◽

Vol 11 (15) ◽

pp. 6975

Author(s):

Tao Zhang ◽

Lun He ◽

Xudong Li ◽

Guoqing Feng

Keyword(s):

Performance Improvement ◽

State Of The Art ◽

Error Rates ◽

Convolutional Network ◽

Convolutional Networks ◽

Sentence Level ◽

End To End ◽

High Level ◽

Improved Accuracy ◽

Talking Face

Lipreading aims to recognize sentences being spoken by a talking face. In recent years, the lipreading method has achieved a high level of accuracy on large datasets and made breakthrough progress. However, lipreading is still far from being solved, and existing methods tend to have high error rates on the wild data and have the defects of disappearing training gradient and slow convergence. To overcome these problems, we proposed an efficient end-to-end sentence-level lipreading model, using an encoder based on a 3D convolutional network, ResNet50, Temporal Convolutional Network (TCN), and a CTC objective function as the decoder. More importantly, the proposed architecture incorporates TCN as a feature learner to decode feature. It can partly eliminate the defects of RNN (LSTM, GRU) gradient disappearance and insufficient performance, and this yields notable performance improvement as well as faster convergence. Experiments show that the training and convergence speed are 50% faster than the state-of-the-art method, and improved accuracy by 2.4% on the GRID dataset.

Download Full-text

MR-GCN: Multi-Relational Graph Convolutional Networks based on Generalized Tensor Product

Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2020/175 ◽

2020 ◽

Author(s):

Zhichao Huang ◽

Xutao Li ◽

Yunming Ye ◽

Michael K. Ng

Keyword(s):

Tensor Product ◽

Convolution Operator ◽

State Of The Art ◽

Single Type ◽

Convolutional Network ◽

Convolutional Networks ◽

Node Classification ◽

Relational Graphs ◽

Eigen Decomposition ◽

Single Relation

Graph Convolutional Networks (GCNs) have been extensively studied in recent years. Most of existing GCN approaches are designed for the homogenous graphs with a single type of relation. However, heterogeneous graphs of multiple types of relations are also ubiquitous and there is a lack of methodologies to tackle such graphs. Some previous studies address the issue by performing conventional GCN on each single relation and then blending their results. However, as the convolutional kernels neglect the correlations across relations, the strategy is sub-optimal. In this paper, we propose the Multi-Relational Graph Convolutional Network (MR-GCN) framework by developing a novel convolution operator on multi-relational graphs. In particular, our multi-dimension convolution operator extends the graph spectral analysis into the eigen-decomposition of a Laplacian tensor. And the eigen-decomposition is formulated with a generalized tensor product, which can correspond to any unitary transform instead of limited merely to Fourier transform. We conduct comprehensive experiments on four real-world multi-relational graphs to solve the semi-supervised node classification task, and the results show the superiority of MR-GCN against the state-of-the-art competitors.

Download Full-text

Spatial Temporal Attention Graph Convolutional Networks with Mechanics-Stream for Skeleton-Based Action Recognition

Computer Vision – ACCV 2020 - Lecture Notes in Computer Science ◽

10.1007/978-3-030-69541-5_21 ◽

2021 ◽

pp. 341-357

Author(s):

Katsutoshi Shiraki ◽

Tsubasa Hirakawa ◽

Takayoshi Yamashita ◽

Hironobu Fujiyoshi

Keyword(s):

Action Recognition ◽

Temporal Attention ◽

Convolutional Networks

Download Full-text

Topology Optimization based Graph Convolutional Network

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2019/563 ◽

2019 ◽

Cited By ~ 2

Author(s):

Liang Yang ◽

Zesheng Kang ◽

Xiaochun Cao ◽

Di Jin ◽

Bo Yang ◽

...

Keyword(s):

Topology Optimization ◽

Network Topology ◽

State Of The Art ◽

Convolutional Network ◽

Topological Information ◽

Convolutional Networks ◽

The Past ◽

Attributed Network ◽

Fully Connected ◽

The Given

In the past few years, semi-supervised node classification in attributed network has been developed rapidly. Inspired by the success of deep learning, researchers adopt the convolutional neural network to develop the Graph Convolutional Networks (GCN), and they have achieved surprising classification accuracy by considering the topological information and employing the fully connected network (FCN). However, the given network topology may also induce a performance degradation if it is directly employed in classification, because it may possess high sparsity and certain noises. Besides, the lack of learnable filters in GCN also limits the performance. In this paper, we propose a novel Topology Optimization based Graph Convolutional Networks (TO-GCN) to fully utilize the potential information by jointly refining the network topology and learning the parameters of the FCN. According to our derivations, TO-GCN is more flexible than GCN, in which the filters are fixed and only the classifier can be updated during the learning process. Extensive experiments on real attributed networks demonstrate the superiority of the proposed TO-GCN against the state-of-the-art approaches.

Download Full-text

Improved Action Recognition with Separable Spatio-Temporal Attention Using Alternative Skeletal and Video Pre-Processing

Sensors ◽

10.3390/s21031005 ◽

2021 ◽

Vol 21 (3) ◽

pp. 1005

Author(s):

Pau Climent-Pérez ◽

Francisco Florez-Revuelta

Keyword(s):

Activities Of Daily Living ◽

Action Recognition ◽

Assisted Living ◽

Daily Living ◽

State Of The Art ◽

Temporal Attention ◽

Attention Network ◽

Full Activity ◽

Potential Benefits ◽

Spatio Temporal

The potential benefits of recognising activities of daily living from video for active and assisted living have yet to be fully untapped. These technologies can be used for behaviour understanding, and lifelogging for caregivers and end users alike. The recent publication of realistic datasets for this purpose, such as the Toyota Smarthomes dataset, calls for pushing forward the efforts to improve action recognition. Using the separable spatio-temporal attention network proposed in the literature, this paper introduces a view-invariant normalisation of skeletal pose data and full activity crops for RGB data, which improve the baseline results by 9.5% (on the cross-subject experiments), outperforming state-of-the-art techniques in this field when using the original unmodified skeletal data in dataset. Our code and data are available online.

Download Full-text

Multi-Stage Self-Supervised Learning for Graph Convolutional Networks on Graphs with Few Labeled Nodes

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i04.6048 ◽

2020 ◽

Vol 34 (04) ◽

pp. 5892-5899

Author(s):

Ke Sun ◽

Zhouchen Lin ◽

Zhanxing Zhu

Keyword(s):

Supervised Learning ◽

State Of The Art ◽

Difficult Problem ◽

Superior Performance ◽

Learning Approach ◽

Training Algorithm ◽

Convolutional Network ◽

Convolutional Networks ◽

Learning Tasks ◽

Multi Stage

Graph Convolutional Networks (GCNs) play a crucial role in graph learning tasks, however, learning graph embedding with few supervised signals is still a difficult problem. In this paper, we propose a novel training algorithm for Graph Convolutional Network, called Multi-Stage Self-Supervised (M3S) Training Algorithm, combined with self-supervised learning approach, focusing on improving the generalization performance of GCNs on graphs with few labeled nodes. Firstly, a Multi-Stage Training Framework is provided as the basis of M3S training method. Then we leverage DeepCluster technique, a popular form of self-supervised learning, and design corresponding aligning mechanism on the embedding space to refine the Multi-Stage Training Framework, resulting in M3S Training Algorithm. Finally, extensive experimental results verify the superior performance of our algorithm on graphs with few labeled nodes under different label rates compared with other state-of-the-art approaches.

Download Full-text

Multi-Class Imbalanced Graph Convolutional Network Learning

Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2020/398 ◽

2020 ◽

Author(s):

Min Shi ◽

Yufei Tang ◽

Xingquan Zhu ◽

David Wilson ◽

Jianxun Liu

Keyword(s):

State Of The Art ◽

Representation Learning ◽

Convolutional Network ◽

Network Learning ◽

Convolutional Networks ◽

Space Experiments ◽

Latent Distribution ◽

Adversarial Training ◽

Quality Of Fit

Networked data often demonstrate the Pareto principle (i.e., 80/20 rule) with skewed class distributions, where most vertices belong to a few majority classes and minority classes only contain a handful of instances. When presented with imbalanced class distributions, existing graph embedding learning tends to bias to nodes from majority classes, leaving nodes from minority classes under-trained. In this paper, we propose Dual-Regularized Graph Convolutional Networks (DR-GCN) to handle multi-class imbalanced graphs, where two types of regularization are imposed to tackle class imbalanced representation learning. To ensure that all classes are equally represented, we propose a class-conditioned adversarial training process to facilitate the separation of labeled nodes. Meanwhile, to maintain training equilibrium (i.e., retaining quality of fit across all classes), we force unlabeled nodes to follow a similar latent distribution to the labeled nodes by minimizing their difference in the embedding space. Experiments on real-world imbalanced graphs demonstrate that DR-GCN outperforms the state-of-the-art methods in node classification, graph clustering, and visualization.

Download Full-text