Relation-Aware Pedestrian Attribute Recognition with Graph Convolutional Networks

In this paper, we propose a new end-to-end network, named Joint Learning of Attribute and Contextual relations (JLAC), for pedestrian attribute recognition. It includes two novel modules: the Attribute Relation Module (ARM) and the Contextual Relation Module (CRM). For ARM, we construct an attribute graph from attribute-specific features, which are learned with constrained losses, and then use a Graph Convolutional Network (GCN) to explore the correlations among multiple attributes. For CRM, we first propose a graph projection scheme that projects the 2-D feature map into a set of nodes from different image regions, and then employ a GCN to explore the contextual relations among those regions. Since the relation information in the two modules is correlated and complementary, we incorporate them into a unified framework and learn both together. Experiments on three benchmarks, the PA-100K, RAP, and PETA attribute datasets, demonstrate the effectiveness of the proposed JLAC.
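To make the attribute-graph idea concrete, here is a minimal sketch of one graph convolution step over a small attribute graph. It assumes the widely used symmetric-normalized propagation rule ReLU(D^{-1/2}(A + I)D^{-1/2} H W); the abstract does not state which GCN variant JLAC uses, and the toy graph, features, and weights below are hypothetical.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def normalize_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2} for a symmetric adjacency matrix."""
    n = len(adj)
    # Add self-loops so each node keeps part of its own feature.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]

def gcn_layer(adj, features, weight):
    """One propagation step: ReLU(A_norm @ H @ W)."""
    propagated = matmul(matmul(normalize_adjacency(adj), features), weight)
    return [[max(0.0, v) for v in row] for row in propagated]

# Hypothetical toy graph: 3 attributes, attribute 0 linked to attributes 1 and 2.
adj = [[0.0, 1.0, 1.0],
       [1.0, 0.0, 0.0],
       [1.0, 0.0, 0.0]]
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # one 2-D feature per attribute node
weight = [[1.0, 0.0], [0.0, 1.0]]                # identity weight, for readability only
updated = gcn_layer(adj, features, weight)
```

After one step, each attribute's feature is a degree-weighted mix of its own feature and those of related attributes, which is the mechanism a GCN uses to share information along graph edges.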


Knowledge-aware Multi-modal Adaptive Graph Convolutional Networks for Fake News Detection

ACM Transactions on Multimedia Computing Communications and Applications ◽ 10.1145/3451215 ◽ 2021 ◽ Vol 17 (3) ◽ pp. 1-23

Author(s): Shengsheng Qian ◽ Jun Hu ◽ Quan Fang ◽ Changsheng Xu

Keyword(s): Social Media ◽ Visual Information ◽ Representation Learning ◽ Fake News ◽ Unified Framework ◽ Model Learning ◽ Convolutional Network ◽ Textual Information ◽ Convolutional Networks ◽ Real World Datasets

In this article, we focus on the fake news detection task and aim to automatically identify fake news among the vast amount of social media posts. To date, many approaches have been proposed to detect fake news, including traditional learning methods and deep learning-based models. However, three challenges remain: (i) how to represent social media posts effectively, since post content is varied and highly complex; (ii) how to design a data-driven method that makes the model flexible enough to handle samples from different contexts and news backgrounds; and (iii) how to fully utilize the auxiliary information of posts (background knowledge and multi-modal information) for better representation learning. To tackle these challenges, we propose novel Knowledge-aware Multi-modal Adaptive Graph Convolutional Networks (KMAGCN), which capture semantic representations by jointly modeling textual information, knowledge concepts, and visual information in a unified framework for fake news detection. We model posts as graphs and use a knowledge-aware multi-modal adaptive graph learning principle for effective feature learning. Compared with existing methods, the proposed KMAGCN addresses the challenges from three aspects: (1) it models posts as graphs to capture non-consecutive and long-range semantic relations; (2) it proposes a novel adaptive graph convolutional network to handle the variability of graph data; and (3) it leverages textual information, knowledge concepts, and visual information jointly for model learning. We have conducted extensive experiments on three public real-world datasets, and the superior results demonstrate the effectiveness of KMAGCN compared with other state-of-the-art algorithms.


Joint Learning of Dictionary and Convolutional Network for Pedestrian Attribute Recognition

2019 IEEE Visual Communications and Image Processing (VCIP) ◽ 10.1109/vcip47243.2019.8965801 ◽ 2019

Author(s): Yan Sha ◽ Congyan Lang ◽ Peixi Peng ◽ Junliang Xing ◽ Danxia Li

Keyword(s): Convolutional Network ◽ Joint Learning ◽ Attribute Recognition


Efficient End-to-End Sentence-Level Lipreading with Temporal Convolutional Networks

Applied Sciences ◽ 10.3390/app11156975 ◽ 2021 ◽ Vol 11 (15) ◽ pp. 6975

Author(s): Tao Zhang ◽ Lun He ◽ Xudong Li ◽ Guoqing Feng

Keyword(s): Performance Improvement ◽ State Of The Art ◽ Error Rates ◽ Convolutional Network ◽ Convolutional Networks ◽ Sentence Level ◽ End To End ◽ High Level ◽ Improved Accuracy ◽ Talking Face

Lipreading aims to recognize sentences being spoken by a talking face. In recent years, lipreading methods have achieved a high level of accuracy on large datasets and made breakthrough progress. However, lipreading is still far from solved: existing methods tend to have high error rates on in-the-wild data and suffer from vanishing gradients during training and slow convergence. To overcome these problems, we propose an efficient end-to-end sentence-level lipreading model, using an encoder based on a 3D convolutional network, ResNet50, and a Temporal Convolutional Network (TCN), with a CTC objective function as the decoder. More importantly, the proposed architecture incorporates the TCN as a feature learner to decode features. This partly eliminates the vanishing-gradient and limited-performance problems of RNNs (LSTM, GRU), and yields a notable performance improvement as well as faster convergence. Experiments show that training and convergence are 50% faster than with the state-of-the-art method, and that accuracy improves by 2.4% on the GRID dataset.
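The two building blocks named in the abstract can be sketched in a few lines. This is an illustrative sketch, not the authors' implementation: it shows a causal dilated 1-D convolution, the operation a TCN layer is typically built from, and the standard greedy collapse used to read a label sequence out of a CTC-trained model at inference time. The kernel, dilation, and label IDs are hypothetical.

```python
def causal_dilated_conv1d(signal, kernel, dilation):
    """Causal 1-D convolution: output[t] uses only signal[t], signal[t - d], ..."""
    out = []
    for t in range(len(signal)):
        total = 0.0
        for k, w in enumerate(kernel):
            idx = t - k * dilation
            if idx >= 0:  # positions before the start act as zero padding
                total += w * signal[idx]
        out.append(total)
    return out

def ctc_greedy_collapse(frame_labels, blank=0):
    """Merge repeated frame labels, then drop blanks (greedy CTC decoding)."""
    decoded = []
    previous = None
    for label in frame_labels:
        if label != previous and label != blank:
            decoded.append(label)
        previous = label
    return decoded

# Hypothetical per-frame feature values and per-frame argmax labels.
features = [1.0, 2.0, 3.0, 4.0, 5.0]
smoothed = causal_dilated_conv1d(features, kernel=[0.5, 0.5], dilation=2)
tokens = ctc_greedy_collapse([0, 3, 3, 0, 3, 5, 5, 0])
```

Because every output position is computed independently of the others, a convolutional layer like this can process all frames in parallel, unlike a recurrent layer that must step through time.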


Correlation Graph Convolutional Network for Pedestrian Attribute Recognition

IEEE Transactions on Multimedia ◽ 10.1109/tmm.2020.3045286 ◽ 2020 ◽ pp. 1-1

Author(s): Haonan Fan ◽ Hai-Miao Hu ◽ Shuailing Liu ◽ Weiqing Lu ◽ Shiliang Pu

Keyword(s): Convolutional Network ◽ Correlation Graph ◽ Attribute Recognition


MR-GCN: Multi-Relational Graph Convolutional Networks based on Generalized Tensor Product

Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence ◽ 10.24963/ijcai.2020/175 ◽ 2020

Author(s): Zhichao Huang ◽ Xutao Li ◽ Yunming Ye ◽ Michael K. Ng

Keyword(s): Tensor Product ◽ Convolution Operator ◽ State Of The Art ◽ Single Type ◽ Convolutional Network ◽ Convolutional Networks ◽ Node Classification ◽ Relational Graphs ◽ Eigen Decomposition ◽ Single Relation

Graph Convolutional Networks (GCNs) have been extensively studied in recent years. Most existing GCN approaches are designed for homogeneous graphs with a single type of relation. However, heterogeneous graphs with multiple types of relations are also ubiquitous, and there is a lack of methodologies to tackle such graphs. Some previous studies address the issue by performing a conventional GCN on each single relation and then blending the results. However, because the convolutional kernels neglect the correlations across relations, this strategy is sub-optimal. In this paper, we propose the Multi-Relational Graph Convolutional Network (MR-GCN) framework by developing a novel convolution operator on multi-relational graphs. In particular, our multi-dimension convolution operator extends graph spectral analysis to the eigen-decomposition of a Laplacian tensor. The eigen-decomposition is formulated with a generalized tensor product, which can correspond to any unitary transform rather than being limited to the Fourier transform. We conduct comprehensive experiments on four real-world multi-relational graphs for the semi-supervised node classification task, and the results show the superiority of MR-GCN over state-of-the-art competitors.
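For contrast, the baseline strategy that the abstract calls sub-optimal (process each relation separately, then blend) might look like the sketch below. This is not MR-GCN's tensor-based operator; it uses a simple neighbor-mean aggregation in place of a full GCN layer, and the relations, features, and blending weights are hypothetical.

```python
def mean_aggregate(adj, features):
    """Average each node's feature with those of its neighbors under one relation."""
    n = len(adj)
    dim = len(features[0])
    out = []
    for i in range(n):
        members = [i] + [j for j in range(n) if adj[i][j] and j != i]
        out.append([sum(features[j][d] for j in members) / len(members)
                    for d in range(dim)])
    return out

def blend_relations(adjs, features, relation_weights):
    """Aggregate per relation independently, then take a weighted sum of the results."""
    per_relation = [mean_aggregate(adj, features) for adj in adjs]
    n, dim = len(features), len(features[0])
    return [[sum(w * rel[i][d] for w, rel in zip(relation_weights, per_relation))
             for d in range(dim)] for i in range(n)]

# Hypothetical graph with 3 nodes and 2 relation types.
relation_a = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]  # edge 0-1
relation_b = [[0, 0, 1], [0, 0, 0], [1, 0, 0]]  # edge 0-2
features = [[1.0], [3.0], [5.0]]
blended = blend_relations([relation_a, relation_b], features, [0.5, 0.5])
```

Note that `mean_aggregate` never sees more than one relation at a time, so any interaction between relations can only enter through the final blending weights. That is the limitation the abstract points to.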


A Unified Framework Integrating Recurrent Fully-Convolutional Networks and Optical Flow for Segmentation of the Left Ventricle in Echocardiography Data

Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support - Lecture Notes in Computer Science ◽ 10.1007/978-3-030-00889-5_4 ◽ 2018 ◽ pp. 29-37 ◽ Cited By ~ 9

Author(s): Mohammad H. Jafari ◽ Hany Girgis ◽ Zhibin Liao ◽ Delaram Behnami ◽ Amir Abdi ◽ ...

Keyword(s): Left Ventricle ◽ Optical Flow ◽ Unified Framework ◽ Convolutional Networks ◽ Fully Convolutional Networks


Graph Convolutional Networks for Exercise Motion Classification

Proceedings of the Human Factors and Ergonomics Society Annual Meeting ◽ 10.1177/1071181321651255 ◽ 2021 ◽ Vol 65 (1) ◽ pp. 685-689

Author(s): Parian Haghighat ◽ Aden Prince ◽ Heejin Jeong

Keyword(s): Nearest Neighbor ◽ Current Model ◽ Model Performance ◽ Classification Performance ◽ Great Promise ◽ Graph Classification ◽ Convolutional Network ◽ Motion Data ◽ Convolutional Networks ◽ Personal Fitness

The growth in self-fitness mobile applications has encouraged people to turn to personal fitness, which entails integrating self-tracking applications with exercise motion data to reduce fatigue and mitigate the risk of injury. Advances in computer vision and motion capture technologies hold great promise for improving exercise classification performance. This study investigates the performance of a supervised deep learning model, a Graph Convolutional Network (GCN), in classifying three workouts using motion data from the Azure Kinect device. The model defines the skeleton as a graph and combines GCN layers, a readout layer, and multi-layer perceptrons to build an end-to-end framework for graph classification. The model achieves an accuracy of 95.86% in classifying 19,442 frames. The current model exchanges feature information between each joint and its 1-nearest neighbor, whose impact fades in graph-level classification. Therefore, a future study on improved feature utilization could enhance the model's performance in classifying inter-user exercise variation.
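The pipeline described above (neighbor exchange on a skeleton graph, a readout layer, then a classifier) can be sketched as follows. This is a simplified illustration under stated assumptions: the exchange step is a plain neighbor average rather than a learned GCN layer, the readout is a mean over joints, the classifier is a single linear layer instead of a multi-layer perceptron, and the 3-joint skeleton and all weights are hypothetical.

```python
def exchange_with_neighbors(edges, joint_features):
    """Each joint averages its own feature with those of directly connected joints."""
    n = len(joint_features)
    neighbors = {j: [j] for j in range(n)}
    for a, b in edges:
        neighbors[a].append(b)
        neighbors[b].append(a)
    dim = len(joint_features[0])
    return [[sum(joint_features[k][d] for k in neighbors[j]) / len(neighbors[j])
             for d in range(dim)] for j in range(n)]

def mean_readout(node_features):
    """Collapse per-joint features into one graph-level vector."""
    n = len(node_features)
    return [sum(column) / n for column in zip(*node_features)]

def classify(graph_vector, class_weights):
    """Linear scores per class; return the index of the highest score."""
    scores = [sum(w * x for w, x in zip(row, graph_vector)) for row in class_weights]
    return scores.index(max(scores))

# Hypothetical 3-joint chain (0-1-2) with 2-D features per joint, and 3 workout classes.
edges = [(0, 1), (1, 2)]
joints = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
class_weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
graph_vector = mean_readout(exchange_with_neighbors(edges, joints))
predicted = classify(graph_vector, class_weights)
```

The mean readout also hints at why single-hop exchange can fade at the graph level: averaging over all joints washes out much of the local structure that one round of neighbor exchange adds.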


STEP: Spatial Temporal Graph Convolutional Networks for Emotion Perception from Gaits

Proceedings of the AAAI Conference on Artificial Intelligence ◽ 10.1609/aaai.v34i02.5490 ◽ 2020 ◽ Vol 34 (02) ◽ pp. 1342-1350 ◽ Cited By ~ 3

Author(s): Uttaran Bhattacharya ◽ Trisha Mittal ◽ Rohan Chandra ◽ Tanmay Randhavane ◽ Aniket Bera ◽ ...

Keyword(s): Real World ◽ Classification Accuracy ◽ Emotion Perception ◽ Convolutional Network ◽ Human Emotion ◽ Convolutional Networks ◽ Variational Autoencoder ◽ Gait Features ◽ Temporal Graph ◽ Perceived Emotion

We present a novel classifier network called STEP to classify perceived human emotion from gaits, based on a Spatial Temporal Graph Convolutional Network (ST-GCN) architecture. Given an RGB video of an individual walking, our formulation implicitly exploits the gait features to classify the perceived emotion of the human into one of four emotions: happy, sad, angry, or neutral. We train STEP on annotated real-world gait videos, augmented with annotated synthetic gaits generated using a novel generative network called STEP-Gen, built on an ST-GCN-based Conditional Variational Autoencoder (CVAE). We incorporate a novel push-pull regularization loss in the CVAE formulation of STEP-Gen to generate realistic gaits and improve the classification accuracy of STEP. We also release a novel dataset (E-Gait), which consists of 4,227 human gaits annotated with perceived emotions along with thousands of synthetic gaits. In practice, STEP can learn the affective features and exhibits a classification accuracy of 88% on E-Gait, which is 14–30% more accurate than prior methods.


A Convolutional Network for the Classification of Sleep Stages

Proceedings ◽ 10.3390/proceedings2181174 ◽ 2018 ◽ Vol 2 (18) ◽ pp. 1174 ◽ Cited By ~ 1

Author(s): Isaac Fernández-Varela ◽ Elena Hernández-Pereira ◽ Vicente Moret-Bonillo

Keyword(s): Sleep Medicine ◽ Sleep Stages ◽ Kappa Index ◽ Night Sleep ◽ Convolutional Network ◽ Sleep Study ◽ Convolutional Networks ◽ Sleep Studies ◽ Automatic Methods

The classification of sleep stages is a crucial task in sleep medicine. It involves the analysis of multiple signals, which makes it tedious and complex. Even for a trained physician, scoring a whole-night sleep study can take several hours. Most automatic methods that try to solve this problem use human-engineered features biased toward a specific dataset. In this work, we use deep learning to avoid human bias. We propose an ensemble of 5 convolutional networks that achieves a kappa index of 0.83 when classifying 500 sleep studies.
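The kappa index measures agreement corrected for chance. The sketch below assumes the abstract means Cohen's kappa, (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e the agreement expected from the label frequencies alone. The stage labels and the eight-epoch example are hypothetical and are not data from the paper.

```python
from collections import Counter

def cohen_kappa(reference, predicted):
    """Cohen's kappa between two equally long label sequences."""
    n = len(reference)
    # Observed agreement: fraction of epochs where both scorers give the same stage.
    observed = sum(r == p for r, p in zip(reference, predicted)) / n
    # Chance agreement: product of the two marginal frequencies, summed over labels.
    ref_counts = Counter(reference)
    pred_counts = Counter(predicted)
    expected = sum(ref_counts[label] * pred_counts[label] for label in ref_counts) / (n * n)
    return (observed - expected) / (1.0 - expected)

# Hypothetical expert scoring vs. model output for 8 epochs.
expert = ["W", "W", "N2", "N2", "REM", "REM", "N2", "W"]
model = ["W", "W", "N2", "N2", "REM", "N2", "N2", "W"]
kappa = cohen_kappa(expert, model)
```

Here the raw agreement is 7 of 8 epochs, but kappa comes out lower because some of that agreement would be expected by chance given how often each stage occurs.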


3D-CNN-Based Fused Feature Maps with LSTM Applied to Action Recognition

Future Internet ◽ 10.3390/fi11020042 ◽ 2019 ◽ Vol 11 (2) ◽ pp. 42 ◽ Cited By ~ 5

Author(s): Sheeraz Arif ◽ Jing Wang ◽ Tehseen Ul Hassan ◽ Zesong Fei

Keyword(s): Short Term Memory ◽ Research Work ◽ Video Frame ◽ Feature Maps ◽ Convolutional Network ◽ Convolutional Networks ◽ Spatio Temporal ◽ 3D Cnn ◽ Public Datasets ◽ Motion Map

Human activity recognition is an active field of research in computer vision with numerous applications. Recently, deep convolutional networks and recurrent neural networks (RNNs) have received increasing attention in multimedia studies and have yielded state-of-the-art results. In this research work, we propose a new framework that combines 3D-CNN and LSTM networks. First, we integrate discriminative information from a video into a map called a ‘motion map’ by using a deep 3-dimensional convolutional network (C3D). A motion map and the next video frame can be integrated into a new motion map, and the network can be trained by iteratively increasing the length of the training video; the final network can then be used to generate the motion map of the whole video. Next, a linear weighted fusion scheme is used to fuse the network feature maps into spatio-temporal features. Finally, we use a Long Short-Term Memory (LSTM) encoder-decoder for the final predictions. This method is simple to implement and retains discriminative and dynamic information. The improved results on public benchmark datasets demonstrate the effectiveness and practicability of the proposed method.
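The linear weighted fusion step mentioned above can be illustrated with a short sketch. It assumes fusion means an element-wise weighted sum of same-shaped feature maps with weights normalized to sum to 1; the abstract does not specify how the weights are chosen, so the maps and weights here are hypothetical.

```python
def weighted_fusion(feature_maps, weights):
    """Fuse same-shaped 2-D maps as sum_i (w_i / sum(w)) * map_i, element by element."""
    total = sum(weights)
    rows, cols = len(feature_maps[0]), len(feature_maps[0][0])
    return [[sum(w / total * fmap[r][c] for w, fmap in zip(weights, feature_maps))
             for c in range(cols)] for r in range(rows)]

# Hypothetical 2x2 feature maps from two network branches.
map_a = [[1.0, 2.0], [3.0, 4.0]]
map_b = [[5.0, 6.0], [7.0, 8.0]]
fused = weighted_fusion([map_a, map_b], weights=[1.0, 3.0])
```

With weights 1 and 3, the second map contributes three quarters of every fused value, so the fused map stays in the same numeric range as its inputs.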
