Bilateral Filtering Graph Convolutional Network for Multi-relational Social Recommendation in the Power-law Networks

2022 ◽  
Vol 40 (2) ◽  
pp. 1-24
Author(s):  
Minghao Zhao ◽  
Qilin Deng ◽  
Kai Wang ◽  
Runze Wu ◽  
Jianrong Tao ◽  
...  

In recent years, advances in Graph Convolutional Networks (GCNs) have given new insights into the development of social recommendation. However, many existing GCN-based social recommendation methods directly apply GCN to capture user-item and user-user interactions, an approach with two main limitations: (a) due to the power-law property of the degree distribution, the vanilla GCN with a static normalized adjacency matrix has limitations in learning node representations, especially for long-tail nodes; (b) the multi-typed social relationships between users that are ubiquitous in the real world are rarely considered. In this article, we propose a novel Bilateral Filtering Heterogeneous Attention Network (BFHAN), which improves long-tail node representations and leverages multi-typed social relationships between user nodes. First, we propose a novel graph convolutional filter for the user-item bipartite network and extend it to the user-user homogeneous network. We then theoretically analyze the correlation between the convergence values of different graph convolutional filters and node degrees after stacking multiple layers. Second, we model multi-relational social interactions between users as a multiplex network and propose a multiplex attention network to capture distinctive inter-layer influences on user representations. Last but not least, the experimental results demonstrate that our proposed method outperforms several state-of-the-art GCN-based methods on social recommendation tasks.
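
As an illustration of the first idea, here is a minimal sketch (not the authors' code; the exponent `beta` is a hypothetical knob) of a degree-aware convolutional filter: in D^{-beta} A D^{-(1-beta)}, beta = 0.5 recovers the vanilla symmetric normalization, while other values rebalance how much signal low-degree (long-tail) nodes aggregate.

```python
import torch

def degree_aware_propagate(adj: torch.Tensor, x: torch.Tensor,
                           beta: float = 0.5) -> torch.Tensor:
    """One propagation step x' = D^{-beta} A D^{-(1-beta)} x."""
    deg = adj.sum(dim=1).clamp(min=1.0)           # node degrees
    d_left = deg.pow(-beta)                       # D^{-beta}
    d_right = deg.pow(-(1.0 - beta))              # D^{-(1-beta)}
    return d_left.unsqueeze(1) * (adj @ (d_right.unsqueeze(1) * x))

# Toy user-item bipartite graph folded into one adjacency matrix.
adj = torch.tensor([[0., 1., 1.],
                    [1., 0., 0.],
                    [1., 0., 0.]])
x = torch.randn(3, 8)                             # 8-dim node embeddings
print(degree_aware_propagate(adj, x, beta=0.3).shape)  # torch.Size([3, 8])
```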

2021 ◽  
Vol 11 (15) ◽  
pp. 6975
Author(s):  
Tao Zhang ◽  
Lun He ◽  
Xudong Li ◽  
Guoqing Feng

Lipreading aims to recognize the sentences spoken by a talking face. In recent years, lipreading methods have achieved high accuracy on large datasets and made breakthrough progress. However, lipreading is still far from solved: existing methods tend to have high error rates on in-the-wild data and suffer from vanishing training gradients and slow convergence. To overcome these problems, we propose an efficient end-to-end sentence-level lipreading model whose encoder combines a 3D convolutional network, ResNet50, and a Temporal Convolutional Network (TCN), with a CTC objective function as the decoder. More importantly, the proposed architecture incorporates the TCN as a feature learner to decode features. This partly eliminates the vanishing-gradient and performance shortcomings of RNNs (LSTM, GRU), yielding a notable performance improvement as well as faster convergence. Experiments show that training and convergence are 50% faster than the state-of-the-art method, and accuracy improves by 2.4% on the GRID dataset.
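
For concreteness, here is a simplified sketch of that encoder-decoder pipeline, with toy dimensions and the ResNet50 trunk omitted (both are assumptions, not the paper's configuration):

```python
import torch
import torch.nn as nn

class LipreadTCN(nn.Module):
    def __init__(self, vocab: int = 28, ch: int = 32):
        super().__init__()
        # 3D conv front-end over (channels, time, height, width)
        self.front = nn.Sequential(
            nn.Conv3d(3, ch, kernel_size=(3, 5, 5), padding=(1, 2, 2)),
            nn.ReLU(),
            nn.AdaptiveAvgPool3d((None, 1, 1)),   # keep time, pool space
        )
        # Temporal Convolutional Network: dilated 1D convs over time
        self.tcn = nn.Sequential(
            nn.Conv1d(ch, ch, 3, padding=1, dilation=1), nn.ReLU(),
            nn.Conv1d(ch, ch, 3, padding=2, dilation=2), nn.ReLU(),
        )
        self.head = nn.Linear(ch, vocab)          # per-frame char logits

    def forward(self, video):                     # (B, 3, T, H, W)
        h = self.front(video).squeeze(-1).squeeze(-1)   # (B, ch, T)
        h = self.tcn(h).transpose(1, 2)                 # (B, T, ch)
        return self.head(h).log_softmax(-1)             # (B, T, vocab)

model = LipreadTCN()
video = torch.randn(2, 3, 40, 48, 48)             # 2 clips, 40 frames each
log_probs = model(video).transpose(0, 1)          # CTC expects (T, B, V)
targets = torch.randint(1, 28, (2, 12))           # dummy transcripts
loss = nn.CTCLoss(blank=0)(log_probs, targets,
                           torch.full((2,), 40), torch.full((2,), 12))
print(loss.item())
```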


Author(s):  
Shengsheng Qian ◽  
Jun Hu ◽  
Quan Fang ◽  
Changsheng Xu

In this article, we focus on the fake news detection task and aim to automatically identify fake news among the vast number of social media posts. To date, many approaches have been proposed to detect fake news, including traditional learning methods and deep learning-based models. However, three challenges remain: (i) how to represent social media posts effectively, since post content is varied and highly complicated; (ii) how to design a data-driven method flexible enough to handle samples from different contexts and news backgrounds; and (iii) how to fully utilize the auxiliary information of posts (background knowledge and multi-modal information) for better representation learning. To tackle these challenges, we propose a novel Knowledge-aware Multi-modal Adaptive Graph Convolutional Network (KMAGCN) that captures semantic representations by jointly modeling textual information, knowledge concepts, and visual information in a unified framework for fake news detection. We model posts as graphs and use a knowledge-aware multi-modal adaptive graph learning principle for effective feature learning. Compared with existing methods, the proposed KMAGCN addresses the challenges from three aspects: (1) it models posts as graphs to capture non-consecutive and long-range semantic relations; (2) it proposes a novel adaptive graph convolutional network to handle the variability of graph data; and (3) it leverages textual information, knowledge concepts, and visual information jointly for model learning. We have conducted extensive experiments on three public real-world datasets, and the superior results demonstrate the effectiveness of KMAGCN compared with other state-of-the-art algorithms.
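
The "adaptive" graph convolution can be sketched roughly as follows (this is one plausible reading, not the authors' code): alongside a fixed post graph, an adjacency is inferred from the node features themselves, so each post gets a graph adapted to its own context.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptiveGraphConv(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.q = nn.Linear(dim, dim)   # projections used to score
        self.k = nn.Linear(dim, dim)   # pairwise node affinity
        self.w = nn.Linear(dim, dim)   # feature transform

    def forward(self, x, fixed_adj, alpha: float = 0.5):
        # Data-driven adjacency from feature similarity
        learned_adj = F.softmax(self.q(x) @ self.k(x).transpose(-1, -2)
                                / x.size(-1) ** 0.5, dim=-1)
        # Blend the fixed post graph with the learned one
        adj = alpha * fixed_adj + (1 - alpha) * learned_adj
        return F.relu(adj @ self.w(x))

# Toy post: 5 word/concept nodes with 16-dim embeddings.
x = torch.randn(5, 16)
fixed = torch.eye(5)                   # placeholder word graph
print(AdaptiveGraphConv(16)(x, fixed).shape)   # torch.Size([5, 16])
```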


Author(s):  
Zhichao Huang ◽  
Xutao Li ◽  
Yunming Ye ◽  
Michael K. Ng

Graph Convolutional Networks (GCNs) have been extensively studied in recent years. Most existing GCN approaches are designed for homogeneous graphs with a single type of relation. However, heterogeneous graphs with multiple types of relations are also ubiquitous, and there is a lack of methodologies to tackle them. Some previous studies address the issue by performing a conventional GCN on each single relation and then blending the results; however, because the convolutional kernels neglect the correlations across relations, this strategy is sub-optimal. In this paper, we propose the Multi-Relational Graph Convolutional Network (MR-GCN) framework by developing a novel convolution operator on multi-relational graphs. In particular, our multi-dimensional convolution operator extends graph spectral analysis to the eigen-decomposition of a Laplacian tensor. The eigen-decomposition is formulated with a generalized tensor product, which can correspond to any unitary transform rather than being limited to the Fourier transform. We conduct comprehensive experiments on four real-world multi-relational graphs for the semi-supervised node classification task, and the results show the superiority of MR-GCN over state-of-the-art competitors.
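
A rough sketch of convolution under a generalized tensor product (slice-wise propagation and the toy dimensions are assumptions; degree normalization is omitted for brevity): with M set to the DFT matrix this reduces to the usual t-product, and any other unitary M is equally valid, which is the point the paper makes.

```python
import numpy as np

def mr_propagate(adjs, x, m):
    """adjs: (R, N, N) relation slices; x: (N, F, R); m: (R, R) unitary."""
    x_hat = x @ m.T                                  # transform relation mode
    a_hat = np.einsum('sr,rnm->snm', m, adjs)        # same transform on slices
    out = np.stack([a_hat[s] @ x_hat[:, :, s]        # propagate per slice
                    for s in range(len(adjs))], axis=-1)
    return (out @ m.conj()).real                     # inverse transform

rng = np.random.default_rng(0)
adjs = (rng.random((3, 6, 6)) > 0.6).astype(float)   # 3 relations, 6 nodes
x = rng.standard_normal((6, 4, 3))                   # 4 features, 3 relations
m = np.fft.fft(np.eye(3), norm='ortho')              # unitary DFT as M
print(mr_propagate(adjs, x, m).shape)                # (6, 4, 3)
```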


Author(s):  
Parian Haghighat ◽  
Aden Prince ◽  
Heejin Jeong

The growth in self-fitness mobile applications has encouraged people to turn to personal fitness, which entails integrating self-tracking applications with exercise motion data to reduce fatigue and mitigate the risk of injury. Advances in computer vision and motion capture technologies hold great promise for improving exercise classification performance. This study investigates the performance of a supervised deep learning model, a Graph Convolutional Network (GCN), in classifying three workouts from the Azure Kinect device's motion data. The model defines the skeleton as a graph and combines GCN layers, a readout layer, and multi-layer perceptrons into an end-to-end framework for graph classification. The model achieves an accuracy of 95.86% in classifying 19,442 frames. The current model exchanges feature information between each joint and its 1-nearest neighbor, whose impact fades in graph-level classification. A future study on improved feature utilization could therefore enhance the model's performance in classifying inter-user exercise variation.
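
A condensed sketch of that pipeline (the joint count and dimensions are toy assumptions): GCN layers over the skeleton graph, a mean-pool readout, and an MLP head producing one of three workout labels per frame.

```python
import torch
import torch.nn as nn

class SkeletonGCN(nn.Module):
    def __init__(self, in_dim=3, hid=32, classes=3):
        super().__init__()
        self.gc1 = nn.Linear(in_dim, hid)
        self.gc2 = nn.Linear(hid, hid)
        self.mlp = nn.Sequential(nn.Linear(hid, hid), nn.ReLU(),
                                 nn.Linear(hid, classes))

    def forward(self, x, adj):                 # x: (B, joints, 3)
        h = torch.relu(adj @ self.gc1(x))      # message passing, hop 1
        h = torch.relu(adj @ self.gc2(h))      # hop 2
        return self.mlp(h.mean(dim=1))         # readout: mean over joints

# Toy skeleton: 5 joints in a chain, with self-loops, row-normalized.
edges = torch.tensor([[0, 1], [1, 2], [2, 3], [3, 4]])
adj = torch.eye(5)
adj[edges[:, 0], edges[:, 1]] = adj[edges[:, 1], edges[:, 0]] = 1.0
adj = adj / adj.sum(1, keepdim=True)
frames = torch.randn(8, 5, 3)                  # 8 frames, xyz per joint
print(SkeletonGCN()(frames, adj).shape)        # torch.Size([8, 3])
```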


2020 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Xiang Chen ◽  
Yaohui Pan ◽  
Bin Luo

Purpose: One challenge for tourism recommendation systems (TRSs) is the long-tail phenomenon of ratings or popularity among tourist products. This paper aims to improve the diversity and efficiency of TRSs by exploiting the power-law distribution of long-tail data.

Design/methodology/approach: Using Sina Weibo check-in data as an example, this paper demonstrates that the long-tail phenomenon exists in user travel behaviors and fits the long-tail travel data with a power-law distribution. To address data sparsity in the long-tail part and increase the recommendation diversity of TRSs, the paper proposes a collaborative filtering (CF) recommendation algorithm that incorporates the power-law distribution. Furthermore, by combining the power-law distribution with locality-sensitive hashing (LSH), the paper optimizes the user similarity calculation to improve the calculation efficiency of TRSs.

Findings: The comparison experiments show that the proposed algorithm greatly improves recommendation diversity and calculation efficiency while maintaining high precision and recall, providing a basis for further dynamic recommendation.

Originality/value: TRSs provide a better solution to the problem of information overload in the tourism field. However, because they are based on the historical travel data of the whole population, most current TRSs tend to recommend hot and similar spots to users, lacking diversity and failing to provide personalized recommendations. Meanwhile, the large, high-dimensional, sparse data in online social networks (OSNs) incurs a huge computational cost when calculating user similarity with traditional CF algorithms. By integrating the power-law distribution of travel data with tourism recommendation technology, the authors' work solves the problem in traditional TRSs of overly narrow recommendation results that lack serendipity, providing users with a wider range of choices and hence improving the user experience. Meanwhile, using locality-sensitive hash functions, the authors' work hashes users from high-dimensional vectors to one-dimensional integers and maps similar users into the same buckets, which enables fast nearest-neighbor search in high-dimensional space and addresses the extreme sparsity of high-dimensional travel data. Furthermore, by applying the hashing results to the user similarity calculation, the paper greatly reduces computational complexity and improves the calculation efficiency of TRSs, which reduces the system load and enables TRSs to provide effective and timely recommendations.
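
The LSH step can be sketched as follows (hash width and data are toy assumptions): random-projection signatures map similar high-dimensional user vectors into the same bucket, so CF similarity is computed only within buckets rather than across all user pairs.

```python
import numpy as np
from collections import defaultdict

rng = np.random.default_rng(42)
# Toy sparse check-in matrix: 500 users x 2000 spots, ~1% nonzero
users = rng.random((500, 2000)) * (rng.random((500, 2000)) > 0.99)
planes = rng.standard_normal((2000, 16))       # 16 random hyperplanes

# SimHash: each user's signature is the pattern of hyperplane sides,
# packed into one integer that serves as the bucket id.
signatures = (users @ planes > 0) @ (1 << np.arange(16))
buckets = defaultdict(list)
for uid, sig in enumerate(signatures):
    buckets[int(sig)].append(uid)

# Similarity is now computed only within buckets, not across all pairs.
candidates = sum(len(b) * (len(b) - 1) // 2 for b in buckets.values())
print(f"candidate pairs: {candidates} / brute force: {500 * 499 // 2}")
```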


2020 ◽  
Vol 34 (02) ◽  
pp. 1342-1350 ◽  
Author(s):  
Uttaran Bhattacharya ◽  
Trisha Mittal ◽  
Rohan Chandra ◽  
Tanmay Randhavane ◽  
Aniket Bera ◽  
...  

We present a novel classifier network called STEP to classify perceived human emotion from gaits, based on a Spatial Temporal Graph Convolutional Network (ST-GCN) architecture. Given an RGB video of an individual walking, our formulation implicitly exploits gait features to classify the perceived emotion of the human as one of four emotions: happy, sad, angry, or neutral. We train STEP on annotated real-world gait videos, augmented with annotated synthetic gaits generated using a novel generative network called STEP-Gen, built on an ST-GCN-based Conditional Variational Autoencoder (CVAE). We incorporate a novel push-pull regularization loss in the CVAE formulation of STEP-Gen to generate realistic gaits and improve the classification accuracy of STEP. We also release a novel dataset (E-Gait), which consists of 4,227 human gaits annotated with perceived emotions along with thousands of synthetic gaits. In practice, STEP learns affective features and exhibits a classification accuracy of 88% on E-Gait, which is 14–30% more accurate than prior methods.
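
A speculative sketch of a push-pull regularizer (the distance choice and margin are assumptions, not the paper's exact formulation): synthetic gaits are pulled toward real gaits with the same emotion label and pushed away from real gaits with other labels.

```python
import torch

def push_pull_loss(gen, real, gen_y, real_y, margin=1.0):
    d = torch.cdist(gen, real)                   # pairwise distances
    same = gen_y.unsqueeze(1) == real_y.unsqueeze(0)
    # Pull: shrink distance to same-label real gaits
    pull = (d * same).sum() / same.sum().clamp(min=1)
    # Push: enforce a margin against other-label real gaits
    push = (torch.relu(margin - d) * ~same).sum() / (~same).sum().clamp(min=1)
    return pull + push

gen = torch.randn(8, 64)                         # generated gait embeddings
real = torch.randn(16, 64)                       # real gait embeddings
gen_y = torch.randint(0, 4, (8,))                # happy/sad/angry/neutral
real_y = torch.randint(0, 4, (16,))
print(push_pull_loss(gen, real, gen_y, real_y))
```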


Proceedings ◽  
2018 ◽  
Vol 2 (18) ◽  
pp. 1174 ◽  
Author(s):  
Isaac Fernández-Varela ◽  
Elena Hernández-Pereira ◽  
Vicente Moret-Bonillo

The classification of sleep stages is a crucial task in sleep medicine. It involves the analysis of multiple signals and is thus tedious and complex: even for a trained physician, scoring a whole-night sleep study can take several hours. Most automatic methods that address this problem use human-engineered features biased toward a specific dataset. In this work we use deep learning to avoid this human bias. We propose an ensemble of 5 convolutional networks that achieves a kappa index of 0.83 when classifying 500 sleep studies.
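
The ensembling step can be sketched as follows (the probability estimates below are random stand-ins for the five trained networks): per-epoch class probabilities are averaged, and agreement with the expert scorer is measured with Cohen's kappa.

```python
import numpy as np
from sklearn.metrics import cohen_kappa_score

rng = np.random.default_rng(0)
n_epochs, n_stages = 1000, 5                    # 30-s epochs, 5 sleep stages
labels = rng.integers(0, n_stages, n_epochs)    # expert annotations

# Stand-in for the 5 trained networks: noisy probability estimates
member_probs = [np.eye(n_stages)[labels]
                + 0.8 * rng.random((n_epochs, n_stages))
                for _ in range(5)]

ensemble = np.mean(member_probs, axis=0)        # average the 5 outputs
pred = ensemble.argmax(axis=1)
print("kappa:", round(cohen_kappa_score(labels, pred), 3))
```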


2019 ◽  
Vol 11 (2) ◽  
pp. 42 ◽  
Author(s):  
Sheeraz Arif ◽  
Jing Wang ◽  
Tehseen Ul Hassan ◽  
Zesong Fei

Human activity recognition is an active field of research in computer vision with numerous applications. Recently, deep convolutional networks and recurrent neural networks (RNNs) have received increasing attention in multimedia studies and have yielded state-of-the-art results. In this research work, we propose a new framework that intelligently combines 3D-CNN and LSTM networks. First, we integrate the discriminative information from a video into a map called a 'motion map' using a deep 3-dimensional convolutional network (C3D). A motion map and the next video frame can be integrated into a new motion map, and this technique can be trained by iteratively increasing the training video length; the final network can then generate the motion map of a whole video. Next, a linear weighted fusion scheme fuses the network feature maps into spatio-temporal features. Finally, we use a Long Short-Term Memory (LSTM) encoder-decoder for the final predictions. This method is simple to implement and retains discriminative and dynamic information. Improved results on public benchmark datasets prove the effectiveness and practicability of the proposed method.
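
A schematic sketch of that recurrence (fusion weights and dimensions are assumptions): the current motion map and the next frame's features are merged into a new motion map, and the final fused features feed an LSTM.

```python
import torch
import torch.nn as nn

class MotionMapStep(nn.Module):
    def __init__(self, ch=8):
        super().__init__()
        self.merge = nn.Conv2d(2 * ch, ch, 3, padding=1)

    def forward(self, motion_map, frame_feat):
        # Integrate the next frame's features into the running motion map
        return torch.tanh(self.merge(torch.cat([motion_map, frame_feat], 1)))

step = MotionMapStep()
feats = torch.randn(16, 8, 28, 28)               # 16 frames of C3D features
m = torch.zeros(1, 8, 28, 28)
for t in range(16):                               # iterative integration
    m = step(m, feats[t:t + 1])

# Linear weighted fusion of map + last-frame features, then an LSTM head
fused = 0.5 * m + 0.5 * feats[-1:]                # toy fusion weights
seq = fused.flatten(2).transpose(1, 2)            # (1, 784, 8) sequence
out, _ = nn.LSTM(8, 32, batch_first=True)(seq)
print(out.shape)                                  # torch.Size([1, 784, 32])
```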


2020 ◽  
Vol 34 (01) ◽  
pp. 27-34 ◽  
Author(s):  
Lei Chen ◽  
Le Wu ◽  
Richang Hong ◽  
Kun Zhang ◽  
Meng Wang

Graph Convolutional Networks (GCNs) are state-of-the-art graph-based representation learning models that iteratively stack multiple layers of convolutional aggregation operations and non-linear activation operations. Recently, in Collaborative Filtering (CF) based Recommender Systems (RS), by treating the user-item interaction behavior as a bipartite graph, some researchers have used GCNs to model higher-layer collaborative signals. These GCN-based recommender models show superior performance compared to traditional approaches. However, they suffer from training difficulty with non-linear activations on large user-item graphs. Besides, most GCN-based models cannot model deeper layers due to the over-smoothing effect of the graph convolution operation. In this paper, we revisit GCN-based CF models from two aspects. First, we empirically show that removing non-linearities enhances recommendation performance, which is consistent with the theory behind simple graph convolutional networks. Second, we propose a residual network structure specifically designed for CF with user-item interaction modeling, which alleviates the over-smoothing problem of graph convolution aggregation with sparse user-item interaction data. The proposed model is linear, easy to train, and scales to large datasets, yielding better efficiency and effectiveness on two real datasets. We publish the source code at https://github.com/newlei/LR-GCCF.
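
A compact sketch of the two changes argued for above (embedding sizes are toy values, and the layer combination here is a simple average, which may differ from the paper's exact scheme): graph convolution without non-linear activations, plus a residual connection so each layer refines rather than replaces the previous user/item embeddings.

```python
import torch

def linear_residual_gcn(adj, emb, layers=3):
    """adj: normalized user-item graph (N, N); emb: (N, d)."""
    out = emb
    for _ in range(layers):
        emb = adj @ emb          # linear propagation, no activation
        out = out + emb          # residual accumulation across layers
    return out / (layers + 1)    # average of all depths (an assumption)

n_users, n_items, d = 4, 6, 16
emb = torch.randn(n_users + n_items, d)
# Bipartite adjacency folded into one symmetric matrix, row-normalized
adj = torch.zeros(10, 10)
adj[:4, 4:] = (torch.rand(4, 6) > 0.5).float()
adj = adj + adj.T + torch.eye(10)
adj = adj / adj.sum(1, keepdim=True)
print(linear_residual_gcn(adj, emb).shape)       # torch.Size([10, 16])
```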


Sensors ◽  
2020 ◽  
Vol 20 (4) ◽  
pp. 1085
Author(s):  
Kaifeng Zhang ◽  
Dan Li ◽  
Jiayun Huang ◽  
Yifei Chen

The detection of pig behavior helps identify abnormal conditions such as diseases and dangerous movements in a timely and effective manner, which plays an important role in ensuring the health and well-being of pigs. Monitoring pig behavior by staff is time consuming, subjective, and impractical, so there is an urgent need for methods that identify pig behavior automatically. In recent years, deep learning has gradually been applied to the study of pig behavior recognition. Existing studies judge pig behavior based only on the posture of the pig in a still image frame, without considering the motion information of the behavior; optical flow, however, reflects motion information well. This study therefore took image frames and optical flow from videos as two-stream inputs to fully extract the temporal and spatial behavioral characteristics. Two-stream convolutional network models based on deep learning were proposed, including inflated 3D convnet (I3D) and temporal segment networks (TSN), whose feature extraction network is a Residual Network (ResNet) or an Inception architecture (e.g., Inception with Batch Normalization (BN-Inception), InceptionV3, InceptionV4, or InceptionResNetV2), to achieve pig behavior recognition. A standard pig video behavior dataset was created, comprising 1000 videos of five behavioral actions of pigs under natural conditions: feeding, lying, walking, scratching, and mounting. The dataset was used to train and test the proposed models, and a series of comparative experiments was conducted. The experimental results showed that the TSN model whose feature extraction network was ResNet101 recognized pig feeding, lying, walking, scratching, and mounting behaviors with a high average accuracy of 98.99%, and the average recognition time per video was 0.3163 s. The TSN model (ResNet101) is thus superior to the other models for the task of pig behavior recognition.
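
A bare-bones sketch of two-stream late fusion (the tiny backbones are stand-ins for ResNet101 or the Inception variants): one network scores an RGB frame, another scores stacked optical-flow fields, and the class scores are averaged.

```python
import torch
import torch.nn as nn

classes = 5   # feeding, lying, walking, scratching, mounting

def tiny_stream(in_ch):
    return nn.Sequential(nn.Conv2d(in_ch, 16, 3, padding=1), nn.ReLU(),
                         nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                         nn.Linear(16, classes))

rgb_net = tiny_stream(3)        # spatial stream: single RGB frame
flow_net = tiny_stream(10)      # temporal stream: 5 stacked (dx, dy) fields

rgb = torch.randn(1, 3, 224, 224)
flow = torch.randn(1, 10, 224, 224)
scores = 0.5 * rgb_net(rgb).softmax(-1) + 0.5 * flow_net(flow).softmax(-1)
print(scores.argmax(-1))        # predicted behavior class
```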

