A Novel Data Analytics Oriented Approach for Image Representation Learning in Manufacturing Systems

Journal of Sensors ◽

10.1155/2022/1807103 ◽

2022 ◽

Vol 2022 ◽

pp. 1-14

Author(s):

Yue Liu ◽

Junqi Ma ◽

Xingzhen Tao ◽

Jingyun Liao ◽

Tao Wang ◽

...

Keyword(s):

Manufacturing Systems ◽

Image Representation ◽

Image Data ◽

Representation Learning ◽

Hybrid Architecture ◽

Convolutional Network ◽

Learning Framework ◽

Learning Tasks ◽

Supervised Methods ◽

Learned Features

In the era of digital manufacturing, the huge amount of image data generated by manufacturing systems cannot be handled quickly enough to obtain valuable information, owing to the limitations (e.g., time) of traditional image processing techniques. In this paper, we propose a novel self-supervised self-attention learning framework, TriLFrame, for image representation learning. TriLFrame is based on a hybrid architecture of Convolutional Network and Transformer. Experiments show that TriLFrame outperforms state-of-the-art self-supervised methods on the ImageNet dataset and achieves competitive performance when features learned on ImageNet are transferred to other classification tasks. Moreover, TriLFrame validates the proposed hybrid architecture, which combines the powerful local convolutional operation with the long-range nonlocal self-attention operation and works effectively in image representation learning tasks.
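
The nonlocal self-attention operation mentioned in the abstract above can be illustrated with a minimal sketch. This is not TriLFrame itself: it assumes plain scaled dot-product attention with identity query/key/value projections, and the function names `softmax` and `self_attention` are hypothetical. Each row of `tokens` stands for the feature vector at one spatial position, as might come out of a convolutional stage.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(tokens):
    """Scaled dot-product self-attention with identity projections (assumption).

    Every output row is a weighted average of ALL input rows, so each
    position can draw on information from arbitrarily distant positions.
    """
    d = len(tokens[0])
    out = []
    for q in tokens:
        # Similarity of this position to every position, scaled by sqrt(d).
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        weights = softmax(scores)
        out.append([sum(weights[j] * tokens[j][c] for j in range(len(tokens)))
                    for c in range(d)])
    return out

# Two hypothetical positions with orthogonal features.
mixed = self_attention([[1.0, 0.0], [0.0, 1.0]])
```

In a hybrid design, a convolution would first summarize local neighborhoods, and an attention step of this kind would then mix the resulting features across the whole image.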

Multimodal Semisupervised Deep Graph Learning for Automatic Precipitation Nowcasting

Mathematical Problems in Engineering ◽

10.1155/2020/4018042 ◽

2020 ◽

Vol 2020 ◽

pp. 1-9

Author(s):

Kaichao Miao ◽

Wei Wang ◽

Rui Hu ◽

Lei Zhang ◽

Yali Zhang ◽

...

Keyword(s):

Information Source ◽

Nonlinear Function ◽

Observation Data ◽

Graph Structure ◽

Convolutional Network ◽

Graph Learning ◽

Learning Framework ◽

Local Areas ◽

Fully Connected ◽

Learned Features

Precipitation nowcasting plays a key role in land security and emergency management of natural calamities. A majority of existing deep learning-based techniques realize precipitation nowcasting by learning a deep nonlinear function from a single information source, e.g., weather radar. In this study, we propose a novel multimodal semisupervised deep graph learning framework for precipitation nowcasting. Unlike existing studies, different modalities of observation data (including both meteorological and nonmeteorological data) are modeled jointly, thereby benefiting each other. All information is converted into image structures; precipitation nowcasting is then treated as a computer vision task to be optimized. To handle areas with unavailable precipitation, we convert all observation information into a graph structure and introduce a semisupervised graph convolutional network with a sequence connect architecture to learn the features of all local areas. With the learned features, precipitation is predicted through a multilayer fully connected regression network. Experiments on real datasets confirm the effectiveness of the proposed method.

Group-wise Deep Co-saliency Detection

Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2017/424 ◽

2017 ◽

Cited By ~ 22

Author(s):

Lina Wei ◽

Shanshan Zhao ◽

Omar El Farouk Bourahla ◽

Xi Li ◽

Fei Wu

Keyword(s):

Collaborative Learning ◽

Saliency Detection ◽

Representation Learning ◽

Feature Representation ◽

Convolutional Network ◽

Learning Framework ◽

Detection Approach ◽

End To End ◽

Set Up ◽

End Group

In this paper, we propose an end-to-end group-wise deep co-saliency detection approach to address the co-salient object discovery problem based on the fully convolutional network (FCN) with group input and group output. The proposed approach captures the group-wise interaction information for group images by learning a semantics-aware image representation based on a convolutional neural network, which adaptively learns the group-wise features for co-saliency detection. Furthermore, the proposed approach discovers the collaborative and interactive relationships between the group-wise feature representation and the single-image individual feature representation, and models them in a collaborative learning framework. Finally, we set up a unified end-to-end deep learning scheme to jointly optimize the process of group-wise feature representation learning and the collaborative learning, leading to more reliable and robust co-saliency detection results. Experimental results demonstrate the effectiveness of our approach in comparison with the state-of-the-art approaches.

Stacked Convolutional Sparse Auto-Encoders for Representation Learning

ACM Transactions on Knowledge Discovery from Data ◽

10.1145/3434767 ◽

2021 ◽

Vol 15 (2) ◽

pp. 1-21

Author(s):

Yi Zhu ◽

Lei Li ◽

Xindong Wu

Keyword(s):

Deep Learning ◽

Image Data ◽

Representation Learning ◽

Classification Performance ◽

Support Vector ◽

Learning Models ◽

Feature Representations ◽

Learning Framework ◽

Label Information ◽

Unsupervised Deep Learning

Deep learning seeks to achieve excellent performance for representation learning in image datasets. However, supervised deep learning models such as convolutional neural networks require a large amount of labeled image data, which is often impractical to obtain in applications, while unsupervised deep learning models like the stacked denoising auto-encoder cannot employ label information. Meanwhile, the redundancy of image data degrades the representation learning performance of the aforementioned models. To address these problems, we propose a semi-supervised deep learning framework called stacked convolutional sparse auto-encoder, which can learn robust and sparse representations from image data with fewer labeled data records. More specifically, the framework is constructed by stacking layers. In each layer, higher layer feature representations are generated from the features of lower layers in a convolutional way with kernels learned by a sparse auto-encoder. Meanwhile, to solve the data redundancy problem, the algorithm of Reconstruction Independent Component Analysis is designed to train on patches for sphering the input data. The label information is encoded using a Softmax Regression model for semi-supervised learning. With this framework, higher level representations are learned by layers mapping from image data. It can boost the performance of subsequent base classifiers such as support vector machines. Extensive experiments demonstrate the superior classification performance of our framework compared to several state-of-the-art representation learning methods.

NPU RGB+D Dataset and a Feature-Enhanced LSTM-DGCN Method for Action Recognition of Basketball Players

Applied Sciences ◽

10.3390/app11104426 ◽

2021 ◽

Vol 11 (10) ◽

pp. 4426

Author(s):

Chunyan Ma ◽

Ji Fan ◽

Jinghao Yao ◽

Tao Zhang

Keyword(s):

Action Recognition ◽

Large Scale ◽

Short Term Memory ◽

Evaluation Criteria ◽

Image Data ◽

Basketball Player ◽

Basketball Players ◽

Convolutional Network ◽

Atomic Actions ◽

New Feature

Computer vision-based action recognition of basketball players in basketball training and competition has gradually become a research hotspot. However, owing to complex technical actions, diverse backgrounds, and limb occlusion, it remains a challenging task without effective solutions or public dataset benchmarks. In this study, we defined 32 kinds of atomic actions covering most of the complex actions for basketball players and built the dataset NPU RGB+D (a large scale dataset of basketball action recognition with RGB image data and Depth data captured in Northwestern Polytechnical University) for 12 kinds of actions of 10 professional basketball players with 2169 RGB+D videos and 75 thousand frames, including RGB frame sequences, depth maps, and skeleton coordinates. By extracting the spatial features of the distances and angles between the joint points of basketball players, we created a new feature-enhanced skeleton-based method called LSTM-DGCN for basketball player action recognition based on the deep graph convolutional network (DGCN) and long short-term memory (LSTM) methods. Many advanced action recognition methods were evaluated on our dataset and compared with our proposed method. The experimental results show that the NPU RGB+D dataset is very competitive with the current action recognition algorithms and that our LSTM-DGCN outperforms the state-of-the-art action recognition methods in various evaluation criteria on our dataset. Our action classifications and this NPU RGB+D dataset are valuable for basketball player action recognition techniques. The feature-enhanced LSTM-DGCN has a more accurate action recognition effect, which improves the motion expression ability of the skeleton data.
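
The distance and angle features between joint points described in the abstract above can be sketched as follows. This is only an illustration under stated assumptions: joints are taken to be 3D coordinate tuples, the joint names and coordinates are made up, and the helpers `joint_distance` and `joint_angle` are hypothetical rather than the authors' feature definitions.

```python
import math

def joint_distance(a, b):
    """Euclidean distance between two 3D joint positions."""
    return math.dist(a, b)

def joint_angle(a, b, c):
    """Angle in radians at joint b, between the segments b->a and b->c."""
    v1 = [p - q for p, q in zip(a, b)]
    v2 = [p - q for p, q in zip(c, b)]
    n1 = math.hypot(*v1)
    n2 = math.hypot(*v2)
    if n1 == 0.0 or n2 == 0.0:
        return 0.0  # degenerate case: two joints coincide
    cos = sum(x * y for x, y in zip(v1, v2)) / (n1 * n2)
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.acos(max(-1.0, min(1.0, cos)))

# Hypothetical arm joints from one skeleton frame.
shoulder = (0.0, 1.0, 0.0)
elbow = (0.0, 0.0, 0.0)
wrist = (1.0, 0.0, 0.0)

# Per-frame geometric features that could be appended to raw coordinates.
features = [
    joint_distance(shoulder, elbow),
    joint_distance(elbow, wrist),
    joint_angle(shoulder, elbow, wrist),
]
```

Features of this kind are invariant to where the player stands in the frame, which is one plausible reason to add them to raw skeleton coordinates.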

W-MMP2Vec: Topic-driven network embedding model for link prediction in content-based heterogeneous information network

Intelligent Data Analysis ◽

10.3233/ida-205168 ◽

2021 ◽

Vol 25 (3) ◽

pp. 711-738

Author(s):

Phu Pham ◽

Phuc Do

Keyword(s):

Link Prediction ◽

Representation Learning ◽

Information Network ◽

Network Embedding ◽

Heterogeneous Information Network ◽

Heterogeneous Information ◽

Learning Framework ◽

Novel Approach ◽

Proposed Model ◽

Meta Path

Link prediction on heterogeneous information networks (HINs) is considered a challenging problem due to the complexity and diversity of node and link types. Challenges remain in meta-path-based link prediction in HINs. Previous work on link prediction in HINs via network embedding has mainly focused on exploiting node features rather than existing relations, in the form of meta-paths, between nodes. In fact, predicting the existence of new links between non-linked nodes is not convincing. Moreover, recent HIN-based embedding models also lack a thorough evaluation of the topic similarity between text-based nodes along given meta-paths. To tackle these challenges, in this paper, we propose a novel topic-driven, multiple meta-path-based HIN representation learning framework, namely W-MMP2Vec. Our model improves the quality of node representations by combining multiple meta-paths as well as calculating the topic similarity weight for each meta-path during network embedding learning in content-based HINs. To validate our approach, we apply the W-MMP2Vec model to several link prediction tasks in both content-based and non-content-based HINs (DBLP, IMDB and BlogCatalog). The experimental results demonstrate the effectiveness of the proposed model, which outperforms recent state-of-the-art HIN representation learning models.

A Novel Method to Predict Drug-Target Interactions Based on Large-Scale Graph Representation Learning

Cancers ◽

10.3390/cancers13092111 ◽

2021 ◽

Vol 13 (9) ◽

pp. 2111

Author(s):

Bo-Wei Zhao ◽

Zhu-Hong You ◽

Lun Hu ◽

Zhen-Hao Guo ◽

Lei Wang ◽

...

Keyword(s):

Drug Target ◽

Large Scale ◽

Computational Models ◽

Structural Information ◽

Characteristic Curve ◽

Representation Learning ◽

Graph Representation ◽

Convolutional Network ◽

Novel Method

Identification of drug-target interactions (DTIs) is a significant step in the drug discovery or repositioning process. Compared with the time-consuming and labor-intensive in vivo experimental methods, computational models can provide high-quality DTI candidates in an instant. In this study, we propose a novel method called LGDTI to predict DTIs based on large-scale graph representation learning. LGDTI can capture the local and global structural information of the graph. Specifically, the first-order neighbor information of nodes can be aggregated by the graph convolutional network (GCN); on the other hand, the high-order neighbor information of nodes can be learned by the graph embedding method called DeepWalk. Finally, the two kinds of features are fed into the random forest classifier to train and predict potential DTIs. The results show that our method obtained an area under the receiver operating characteristic curve (AUROC) of 0.9455 and an area under the precision-recall curve (AUPR) of 0.9491 under 5-fold cross-validation. Moreover, we compare the presented method with some existing state-of-the-art methods. These results imply that LGDTI can efficiently and robustly capture undiscovered DTIs. Moreover, the proposed model is expected to bring new inspiration and provide novel perspectives to relevant researchers.
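
The two kinds of neighbor information named in the abstract above can be illustrated with a small sketch. It is not LGDTI: it assumes the commonly used symmetric-normalized GCN propagation step with self-loops and no learned weights, plus the uniform random walks that DeepWalk feeds to a skip-gram model (the skip-gram training and the random forest are not shown). The function names and the toy graph are hypothetical.

```python
import math
import random

def gcn_aggregate(adj, feats):
    """One propagation step D^-1/2 (A + I) D^-1/2 X (first-order neighbors).

    Learned weights and the nonlinearity of a real GCN layer are omitted.
    """
    n = len(adj)
    # Add self-loops so each node keeps part of its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    dim = len(feats[0])
    out = []
    for i in range(n):
        row = [0.0] * dim
        for j in range(n):
            if a_hat[i][j]:
                w = a_hat[i][j] / math.sqrt(deg[i] * deg[j])
                for k in range(dim):
                    row[k] += w * feats[j][k]
        out.append(row)
    return out

def random_walk(adj, start, length, rng):
    """Uniform random walk; such walks reach higher-order neighbors."""
    walk = [start]
    while len(walk) < length:
        neighbors = [j for j, w in enumerate(adj[walk[-1]]) if w]
        if not neighbors:
            break  # isolated node: the walk cannot continue
        walk.append(rng.choice(neighbors))
    return walk

# Toy path graph 0 - 1 - 2 with a one-dimensional feature on node 0.
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
feats = [[1.0], [0.0], [0.0]]
local = gcn_aggregate(adj, feats)
walk = random_walk(adj, start=0, length=5, rng=random.Random(0))
```

After one aggregation step, node 2 still sees nothing of node 0's feature, whereas a walk starting at node 0 can reach node 2. That contrast is the local versus global distinction the abstract draws.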

Image Representation Learning by Transformation Regression

2020 25th International Conference on Pattern Recognition (ICPR) ◽

10.1109/icpr48806.2021.9412597 ◽

2021 ◽

Author(s):

Xifeng Guo ◽

Jiyuan Liu ◽

Sihang Zhou ◽

En Zhu ◽

Shihao Dong

Keyword(s):

Image Representation ◽

Representation Learning

Knowledge-aware Multi-modal Adaptive Graph Convolutional Networks for Fake News Detection

ACM Transactions on Multimedia Computing Communications and Applications ◽

10.1145/3451215 ◽

2021 ◽

Vol 17 (3) ◽

pp. 1-23

Author(s):

Shengsheng Qian ◽

Jun Hu ◽

Quan Fang ◽

Changsheng Xu

Keyword(s):

Social Media ◽

Visual Information ◽

Representation Learning ◽

Fake News ◽

Unified Framework ◽

Model Learning ◽

Convolutional Network ◽

Textual Information ◽

Convolutional Networks ◽

Real World Datasets

In this article, we focus on the fake news detection task and aim to automatically identify fake news from the vast amount of social media posts. To date, many approaches have been proposed to detect fake news, including traditional learning methods and deep learning-based models. However, three challenges remain: (i) how to represent social media posts effectively, since post content is varied and highly complicated; (ii) how to design a data-driven method that increases the flexibility of the model to deal with samples in different contexts and news backgrounds; and (iii) how to fully utilize the additional auxiliary information (the background knowledge and multi-modal information) of posts for better representation learning. To tackle the above challenges, we propose a novel Knowledge-aware Multi-modal Adaptive Graph Convolutional Network (KMAGCN) to capture semantic representations by jointly modeling the textual information, knowledge concepts, and visual information in a unified framework for fake news detection. We model posts as graphs and use a knowledge-aware multi-modal adaptive graph learning principle for effective feature learning. Compared with existing methods, the proposed KMAGCN addresses the challenges from three aspects: (1) it models posts as graphs to capture non-consecutive and long-range semantic relations; (2) it proposes a novel adaptive graph convolutional network to handle the variability of graph data; and (3) it leverages textual information, knowledge concepts, and visual information jointly for model learning. We have conducted extensive experiments on three public real-world datasets, and the superior results demonstrate the effectiveness of KMAGCN compared with other state-of-the-art algorithms.

Driver Drowsiness Detection Using Condition-Adaptive Representation Learning Framework

IEEE Transactions on Intelligent Transportation Systems ◽

10.1109/tits.2018.2883823 ◽

2019 ◽

Vol 20 (11) ◽

pp. 4206-4218 ◽

Cited By ~ 12

Author(s):

Jongmin Yu ◽

Sangwoo Park ◽

Sangwook Lee ◽

Moongu Jeon

Keyword(s):

Representation Learning ◽

Drowsiness Detection ◽

Learning Framework ◽

Driver Drowsiness ◽

Adaptive Representation

Image Representation Learning by Deep Appearance and Spatial Coding

Computer Vision -- ACCV 2014 - Lecture Notes in Computer Science ◽

10.1007/978-3-319-16865-4_43 ◽

2015 ◽

pp. 659-672 ◽

Cited By ~ 1

Author(s):

Bingyuan Liu ◽

Jing Liu ◽

Zechao Li ◽

Hanqing Lu

Keyword(s):

Image Representation ◽

Representation Learning ◽

Spatial Coding