Relation-Aware Pedestrian Attribute Recognition with Graph Convolutional Networks

2020 ◽  
Vol 34 (07) ◽  
pp. 12055-12062
Author(s):  
Zichang Tan ◽  
Yang Yang ◽  
Jun Wan ◽  
Guodong Guo ◽  
Stan Z. Li

In this paper, we propose a new end-to-end network, named Joint Learning of Attribute and Contextual relations (JLAC), to solve the task of pedestrian attribute recognition. It includes two novel modules: the Attribute Relation Module (ARM) and the Contextual Relation Module (CRM). For ARM, we construct an attribute graph from attribute-specific features learned with constrained losses, and then use a Graph Convolutional Network (GCN) to explore the correlations among multiple attributes. For CRM, we first propose a graph projection scheme that projects the 2-D feature map into a set of nodes from different image regions, and then employ a GCN to explore the contextual relations among those regions. Since the relation information in the two modules is correlated and complementary, we incorporate them into a unified framework and learn both together. Experiments on three benchmarks, the PA-100K, RAP, and PETA attribute datasets, demonstrate the effectiveness of the proposed JLAC.
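
As a rough illustration of the graph convolution step mentioned in the abstract, the sketch below applies one layer of the widely used GCN propagation rule, ReLU(D^{-1/2}(A + I)D^{-1/2} H W), to a toy attribute graph. This is the generic rule and not necessarily the exact formulation used in JLAC; the graph, features, and weights (`ADJ`, `FEATS`, `WEIGHT`) are hypothetical values chosen only for the example.

```python
import math


def normalize_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2} for a square, non-negative adjacency matrix."""
    n = len(adj)
    # Add self-loops so every node keeps part of its own feature.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def gcn_layer(adj, features, weight):
    """One propagation step: ReLU(A_norm @ H @ W)."""
    a_norm = normalize_adjacency(adj)
    out = matmul(matmul(a_norm, features), weight)
    return [[max(0.0, v) for v in row] for row in out]


# Hypothetical toy graph: 3 attribute nodes, node 0 linked to nodes 1 and 2.
ADJ = [[0, 1, 1], [1, 0, 0], [1, 0, 0]]
FEATS = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # one 2-D feature per attribute
WEIGHT = [[1.0, -1.0], [0.5, 0.5]]            # untrained, illustrative weights
UPDATED = gcn_layer(ADJ, FEATS, WEIGHT)
```

Each row of `UPDATED` is an attribute feature after mixing with its neighbors; in a trained model `WEIGHT` would be learned rather than fixed.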

Author(s):  
Shengsheng Qian ◽  
Jun Hu ◽  
Quan Fang ◽  
Changsheng Xu

In this article, we focus on the fake news detection task and aim to automatically identify fake news among the vast amount of social media posts. To date, many approaches have been proposed to detect fake news, including traditional learning methods and deep learning-based models. However, three challenges remain: (i) how to represent social media posts effectively, since post content is varied and highly complicated; (ii) how to design a data-driven method that makes the model flexible enough to deal with samples in different contexts and news backgrounds; and (iii) how to fully utilize the auxiliary information of posts (background knowledge and multi-modal information) for better representation learning. To tackle these challenges, we propose novel Knowledge-aware Multi-modal Adaptive Graph Convolutional Networks (KMAGCN), which capture semantic representations by jointly modeling textual information, knowledge concepts, and visual information in a unified framework for fake news detection. We model posts as graphs and use a knowledge-aware multi-modal adaptive graph learning principle for effective feature learning. Compared with existing methods, the proposed KMAGCN addresses the challenges from three aspects: (1) it models posts as graphs to capture non-consecutive and long-range semantic relations; (2) it proposes a novel adaptive graph convolutional network to handle the variability of graph data; and (3) it leverages textual information, knowledge concepts, and visual information jointly for model learning. We have conducted extensive experiments on three public real-world datasets, and the superior results demonstrate the effectiveness of KMAGCN compared with other state-of-the-art algorithms.


2021 ◽  
Vol 11 (15) ◽  
pp. 6975
Author(s):  
Tao Zhang ◽  
Lun He ◽  
Xudong Li ◽  
Guoqing Feng

Lipreading aims to recognize sentences being spoken by a talking face. In recent years, lipreading methods have achieved a high level of accuracy on large datasets and made breakthrough progress. However, lipreading is still far from being solved: existing methods tend to have high error rates on in-the-wild data and suffer from vanishing training gradients and slow convergence. To overcome these problems, we propose an efficient end-to-end sentence-level lipreading model, using an encoder based on a 3D convolutional network, ResNet50, and a Temporal Convolutional Network (TCN), with a CTC objective function as the decoder. More importantly, the proposed architecture incorporates the TCN as a feature learner to decode features. This partly mitigates the vanishing gradients and insufficient performance of RNNs (LSTM, GRU), and yields a notable performance improvement as well as faster convergence. Experiments show that training and convergence are 50% faster than with the state-of-the-art method, and that accuracy improves by 2.4% on the GRID dataset.
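
To make the role of CTC concrete, the sketch below shows greedy (best-path) CTC decoding: take the most likely symbol per frame, merge consecutive repeats, then drop blanks. The abstract only states that a CTC objective is used, so greedy decoding at inference time is an assumption here, and the alphabet and per-frame scores are hypothetical.

```python
def ctc_greedy_decode(frame_scores, alphabet, blank=0):
    """Best-path decoding: argmax per frame, merge repeats, drop blanks."""
    best_path = [max(range(len(scores)), key=scores.__getitem__) for scores in frame_scores]
    decoded = []
    previous = None
    for index in best_path:
        # Emit a symbol only when it differs from the previous frame and is not blank.
        if index != previous and index != blank:
            decoded.append(alphabet[index])
        previous = index
    return "".join(decoded)


# Hypothetical alphabet; index 0 is reserved for the CTC blank symbol.
ALPHABET = ["-", "a", "b"]
SCORES = [
    [0.1, 0.8, 0.1],    # a
    [0.1, 0.7, 0.2],    # a again, merged with the previous frame
    [0.9, 0.05, 0.05],  # blank
    [0.2, 0.7, 0.1],    # a, kept because a blank separates it from the first a
    [0.1, 0.2, 0.7],    # b
]
DECODED = ctc_greedy_decode(SCORES, ALPHABET)  # "aab"
```

The blank symbol is what lets CTC represent genuinely doubled characters: without the blank frame in the middle, the two runs of `a` would collapse into one.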


2020 ◽  
pp. 1-1
Author(s):  
Haonan Fan ◽  
Hai-Miao Hu ◽  
Shuailing Liu ◽  
Weiqing Lu ◽  
Shiliang Pu

Author(s):  
Zhichao Huang ◽  
Xutao Li ◽  
Yunming Ye ◽  
Michael K. Ng

Graph Convolutional Networks (GCNs) have been extensively studied in recent years. Most existing GCN approaches are designed for homogeneous graphs with a single type of relation. However, heterogeneous graphs with multiple types of relations are also ubiquitous, and there is a lack of methodologies to tackle such graphs. Some previous studies address the issue by performing a conventional GCN on each single relation and then blending the results. However, because the convolutional kernels neglect the correlations across relations, this strategy is sub-optimal. In this paper, we propose the Multi-Relational Graph Convolutional Network (MR-GCN) framework by developing a novel convolution operator on multi-relational graphs. In particular, our multi-dimension convolution operator extends graph spectral analysis to the eigen-decomposition of a Laplacian tensor. The eigen-decomposition is formulated with a generalized tensor product, which can correspond to any unitary transform rather than being limited to the Fourier transform. We conduct comprehensive experiments on four real-world multi-relational graphs to solve the semi-supervised node classification task, and the results show the superiority of MR-GCN over state-of-the-art competitors.
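
For context, the single-relation spectral graph convolution that the abstract describes MR-GCN as extending is commonly written as below. This is the standard textbook formulation for one adjacency matrix, given here only as background; it is not the paper's Laplacian-tensor operator.

```latex
L = I - D^{-1/2} A D^{-1/2} = U \Lambda U^{\top},
\qquad
g_{\theta} \star x = U \, g_{\theta}(\Lambda) \, U^{\top} x
```

Here $A$ is the adjacency matrix, $D$ the diagonal degree matrix, $U$ the matrix of eigenvectors of the normalized Laplacian $L$ (the graph Fourier basis), $\Lambda$ the diagonal matrix of eigenvalues, $x$ a signal on the nodes, and $g_{\theta}(\Lambda)$ a diagonal spectral filter with learnable parameters $\theta$.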


Author(s):  
Parian Haghighat ◽  
Aden Prince ◽  
Heejin Jeong

The growth in self-fitness mobile applications has encouraged people to turn to personal fitness, which entails integrating self-tracking applications with exercise motion data to reduce fatigue and mitigate the risk of injury. Advancements in computer vision and motion capture technologies hold great promise for improving exercise classification performance. This study investigates the performance of a supervised deep learning model, a Graph Convolutional Network (GCN), in classifying three workouts using motion data from the Azure Kinect device. The model defines the skeleton as a graph and combines GCN layers, a readout layer, and multi-layer perceptrons to build an end-to-end framework for graph classification. The model achieves an accuracy of 95.86% in classifying 19,442 frames. The current model exchanges feature information only between each joint and its 1-nearest neighbor, and the impact of this exchange fades in graph-level classification. Therefore, a future study on improved feature utilization could enhance the model's performance in classifying inter-user exercise variation.
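
The pipeline described above (neighbor exchange on a skeleton graph, a readout layer, then a classifier) can be sketched roughly as follows. This is a simplified stand-in under stated assumptions: plain neighbor averaging replaces the learned GCN layers, a single linear scoring head replaces the multi-layer perceptrons, and the 4-joint chain, features, and weights are hypothetical.

```python
def neighbor_average(edges, joint_features):
    """Average each joint's feature vector with those of its 1-hop neighbors."""
    n = len(joint_features)
    neighbors = {i: {i} for i in range(n)}  # each joint also counts itself
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    dim = len(joint_features[0])
    return [
        [sum(joint_features[j][d] for j in neighbors[i]) / len(neighbors[i]) for d in range(dim)]
        for i in range(n)
    ]


def mean_readout(node_features):
    """Collapse per-joint features into a single graph-level vector."""
    n = len(node_features)
    return [sum(col) / n for col in zip(*node_features)]


def classify(graph_vector, class_weights):
    """Linear scoring head standing in for the multi-layer perceptrons."""
    scores = [sum(w * x for w, x in zip(row, graph_vector)) for row in class_weights]
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical 4-joint chain with 2-D features per joint.
EDGES = [(0, 1), (1, 2), (2, 3)]
JOINTS = [[0.0, 1.0], [1.0, 1.0], [2.0, 0.0], [3.0, 0.0]]
WEIGHTS = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]  # three workout classes, untrained

graph_vec = mean_readout(neighbor_average(EDGES, JOINTS))
predicted_class = classify(graph_vec, WEIGHTS)
```

Because the readout averages over all joints, local differences introduced by the neighbor exchange are smoothed out at the graph level, which is consistent with the limitation the abstract notes.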


2020 ◽  
Vol 34 (02) ◽  
pp. 1342-1350 ◽  
Author(s):  
Uttaran Bhattacharya ◽  
Trisha Mittal ◽  
Rohan Chandra ◽  
Tanmay Randhavane ◽  
Aniket Bera ◽  
...  

We present a novel classifier network called STEP to classify perceived human emotion from gaits, based on a Spatial Temporal Graph Convolutional Network (ST-GCN) architecture. Given an RGB video of an individual walking, our formulation implicitly exploits the gait features to classify the perceived emotion of the human into one of four emotions: happy, sad, angry, or neutral. We train STEP on annotated real-world gait videos, augmented with annotated synthetic gaits generated using a novel generative network called STEP-Gen, built on an ST-GCN based Conditional Variational Autoencoder (CVAE). We incorporate a novel push-pull regularization loss in the CVAE formulation of STEP-Gen to generate realistic gaits and improve the classification accuracy of STEP. We also release a novel dataset (E-Gait), which consists of 4,227 human gaits annotated with perceived emotions along with thousands of synthetic gaits. In practice, STEP can learn the affective features and exhibits a classification accuracy of 88% on E-Gait, which is 14–30% more accurate than prior methods.


Proceedings ◽  
2018 ◽  
Vol 2 (18) ◽  
pp. 1174 ◽  
Author(s):  
Isaac Fernández-Varela ◽  
Elena Hernández-Pereira ◽  
Vicente Moret-Bonillo

The classification of sleep stages is a crucial task in the context of sleep medicine. It involves the analysis of multiple signals, which makes it tedious and complex. Even for a trained physician, scoring a whole-night sleep study can take several hours. Most automatic methods that try to solve this problem use human-engineered features biased toward a specific dataset. In this work, we use deep learning to avoid human bias. We propose an ensemble of 5 convolutional networks that achieves a kappa index of 0.83 when classifying 500 sleep studies.
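
The kappa index reported above measures agreement corrected for chance. Assuming it refers to Cohen's kappa, (p_o - p_e) / (1 - p_e), a minimal sketch of the computation looks like this; the stage labels and the two label sequences are hypothetical and far shorter than a real sleep study.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa: (observed - expected agreement) / (1 - expected agreement)."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and equal in length")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    # Chance agreement from each rater's marginal label frequencies.
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical epoch labels: W = wake, N = non-REM, R = REM.
EXPERT = ["W", "W", "N", "N", "N", "R", "R", "N"]
MODEL = ["W", "N", "N", "N", "N", "R", "R", "R"]
kappa = cohen_kappa(EXPERT, MODEL)  # 0.6 for these toy labels
```

In this toy case the raw agreement is 0.75, but kappa is lower because a quarter of the epochs would be expected to match by chance given how often each label is used.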


2019 ◽  
Vol 11 (2) ◽  
pp. 42 ◽  
Author(s):  
Sheeraz Arif ◽  
Jing Wang ◽  
Tehseen Ul Hassan ◽  
Zesong Fei

Human activity recognition is an active field of research in computer vision with numerous applications. Recently, deep convolutional networks and recurrent neural networks (RNNs) have received increasing attention in multimedia studies and have yielded state-of-the-art results. In this research work, we propose a new framework that combines 3D-CNN and LSTM networks. First, we integrate discriminative information from a video into a map called a ‘motion map’ by using a deep 3-dimensional convolutional network (C3D). A motion map and the next video frame can be integrated into a new motion map, and this technique can be trained by increasing the training video length iteratively; the final network can then be used to generate the motion map of the whole video. Next, a linear weighted fusion scheme is used to fuse the network feature maps into spatio-temporal features. Finally, we use a Long Short-Term Memory (LSTM) encoder-decoder for the final predictions. This method is simple to implement and retains discriminative and dynamic information. The improved results on public benchmark datasets demonstrate the effectiveness and practicability of the proposed method.

