Active Discriminative Network Representation Learning

Most of current network representation models are learned in unsupervised fashions, which usually lack the capability of discrimination when applied to network analysis tasks, such as node classification. It is worth noting that label information is valuable for learning the discriminative network representations. However, labels of all training nodes are always difficult or expensive to obtain and manually labeling all nodes for training is inapplicable. Different sets of labeled nodes for model learning lead to different network representation results. In this paper, we propose a novel method, termed as ANRMAB, to learn the active discriminative network representations with a multi-armed bandit mechanism in active learning setting. Specifically, based on the networking data and the learned network representations, we design three active learning query strategies. By deriving an effective reward scheme that is closely related to the estimated performance measure of interest, ANRMAB uses a multi-armed bandit mechanism for adaptive decision making to select the most informative nodes for labeling. The updated labeled nodes are then used for further discriminative network representation learning. Experiments are conducted on three public data sets to verify the effectiveness of ANRMAB.

Download Full-text

Modeling multi-prototype Chinese word representation learning for word similarity

Complex & Intelligent Systems ◽

10.1007/s40747-021-00482-y ◽

2021 ◽

Author(s):

Fulian Yin ◽

Yanyan Wang ◽

Jianbo Liu ◽

Marco Tosato

Keyword(s):

Knowledge Representation ◽

Language Processing ◽

Representation Learning ◽

Word Knowledge ◽

Data Sets ◽

Chinese Word ◽

Word Similarity ◽

Public Data ◽

Word Representation ◽

The Stability

AbstractThe word similarity task is used to calculate the similarity of any pair of words, and is a basic technology of natural language processing (NLP). The existing method is based on word embedding, which fails to capture polysemy and is greatly influenced by the quality of the corpus. In this paper, we propose a multi-prototype Chinese word representation model (MP-CWR) for word similarity based on synonym knowledge base, including knowledge representation module and word similarity module. For the first module, we propose a dual attention to combine semantic information for jointly learning word knowledge representation. The MP-CWR model utilizes the synonyms as prior knowledge to supplement the relationship between words, which is helpful to solve the challenge of semantic expression due to insufficient data. As for the word similarity module, we propose a multi-prototype representation for each word. Then we calculate and fuse the conceptual similarity of two words to obtain the final result. Finally, we verify the effectiveness of our model on three public data sets with other baseline models. In addition, the experiments also prove the stability and scalability of our MP-CWR model under different corpora.

Download Full-text

An Information-Explainable Random Walk Based Unsupervised Network Representation Learning Framework on Node Classification Tasks

Mathematics ◽

10.3390/math9151767 ◽

2021 ◽

Vol 9 (15) ◽

pp. 1767

Author(s):

Xin Xu ◽

Yang Lu ◽

Yupeng Zhou ◽

Zhiguo Fu ◽

Yanjie Fu ◽

...

Keyword(s):

Random Walk ◽

Representation Learning ◽

Local Information ◽

Learning Framework ◽

Network Representation ◽

Label Node ◽

Label Information ◽

Classification Tasks ◽

Node Classification ◽

Low Dimensional

Network representation learning aims to learn low-dimensional, compressible, and distributed representational vectors of nodes in networks. Due to the expensive costs of obtaining label information of nodes in networks, many unsupervised network representation learning methods have been proposed, where random walk strategy is one of the wildly utilized approaches. However, the existing random walk based methods have some challenges, including: 1. The insufficiency of explaining what network knowledge in the walking path-samplings; 2. The adverse effects caused by the mixture of different information in networks; 3. The poor generality of the methods with hyper-parameters on different networks. This paper proposes an information-explainable random walk based unsupervised network representation learning framework named Probabilistic Accepted Walk (PAW) to obtain network representation from the perspective of the stationary distribution of networks. In the framework, we design two stationary distributions based on nodes’ self-information and local-information of networks to guide our proposed random walk strategy to learn representational vectors of networks through sampling paths of nodes. Numerous experimental results demonstrated that the PAW could obtain more expressive representation than the other six widely used unsupervised network representation learning baselines on four real-world networks in single-label and multi-label node classification tasks.

Download Full-text

TransR *: Representation learning model by flexible translation and relation matrix projection

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-202177 ◽

2021 ◽

pp. 1-9

Author(s):

Zhenghang Zhang ◽

Jinlu Jia ◽

Yalin Wan ◽

Yang Zhou ◽

Yuting Kong ◽

...

Keyword(s):

Representation Learning ◽

Learning Model ◽

Data Sets ◽

Translation Strategies ◽

Translation Strategy ◽

Public Data ◽

Improvement Effect ◽

Model Training ◽

Relationship Of ◽

The Relationship

The TransR model solves the problem that TransE and TransH models are not sufficient for modeling in public spaces, and is considered a highly potential knowledge representation model. However, TransR still adopts the translation principles based on the TransE model, and the constraints are too strict, which makes the model’s ability to distinguish between very similar entities low. Therefore, we propose a representation learning model TransR* based on flexible translation and relational matrix projection. Firstly, we separate entities and relationships in different vector spaces; secondly, we combine our flexible translation strategy to make translation strategies more flexible. During model training, the quality of generating negative triples is improved by replacing semantically similar entities, and the prior probability of the relationship is used to distinguish the relationship of similar coding. Finally, we conducted link prediction experiments on the public data sets FB15K and WN18, and conducted triple classification experiments on the WN11, FB13, and FB15K data sets to analyze and verify the effectiveness of the proposed model. The evaluation results show that our method has a better improvement effect than TransR on Mean Rank, Hits@10 and ACC indicators.

Download Full-text

Beyond Cross-Validation—Accuracy Estimation for Incremental and Active Learning Models

Machine Learning and Knowledge Extraction ◽

10.3390/make2030018 ◽

2020 ◽

Vol 2 (3) ◽

pp. 327-346

Author(s):

Christian Limberg ◽

Heiko Wersing ◽

Helge Ritter

Keyword(s):

Active Learning ◽

Cross Validation ◽

Recognition Task ◽

Training Data ◽

Data Sets ◽

Accuracy Estimation ◽

Benchmark Data ◽

Machine Learning Applications ◽

Novel Method ◽

Human Teacher

For incremental machine-learning applications it is often important to robustly estimate the system accuracy during training, especially if humans perform the supervised teaching. Cross-validation and interleaved test/train error are here the standard supervised approaches. We propose a novel semi-supervised accuracy estimation approach that clearly outperforms these two methods. We introduce the Configram Estimation (CGEM) approach to predict the accuracy of any classifier that delivers confidences. By calculating classification confidences for unseen samples, it is possible to train an offline regression model, capable of predicting the classifier’s accuracy on novel data in a semi-supervised fashion. We evaluate our method with several diverse classifiers and on analytical and real-world benchmark data sets for both incremental and active learning. The results show that our novel method improves accuracy estimation over standard methods and requires less supervised training data after deployment of the model. We demonstrate the application of our approach to a challenging robot object recognition task, where the human teacher can use our method to judge sufficient training.

Download Full-text

A Drug Target Interaction Prediction Based on LINE-RF Learning

Current Bioinformatics ◽

10.2174/1574893615666191227092453 ◽

2020 ◽

Vol 15 (7) ◽

pp. 750-757

Author(s):

Jihong Wang ◽

Yue Shi ◽

Xiaodan Wang ◽

Huiyou Chang

Keyword(s):

Network Topology ◽

Drug Target ◽

Large Scale ◽

Representation Learning ◽

New Drugs ◽

Combination Method ◽

Learning Methods ◽

Network Representation ◽

On Line ◽

Clinical Experiments

Background: At present, using computer methods to predict drug-target interactions (DTIs) is a very important step in the discovery of new drugs and drug relocation processes. The potential DTIs identified by machine learning methods can provide guidance in biochemical or clinical experiments. Objective: The goal of this article is to combine the latest network representation learning methods for drug-target prediction research, improve model prediction capabilities, and promote new drug development. Methods: We use large-scale information network embedding (LINE) method to extract network topology features of drugs, targets, diseases, etc., integrate features obtained from heterogeneous networks, construct binary classification samples, and use random forest (RF) method to predict DTIs. Results: The experiments in this paper compare the common classifiers of RF, LR, and SVM, as well as the typical network representation learning methods of LINE, Node2Vec, and DeepWalk. It can be seen that the combined method LINE-RF achieves the best results, reaching an AUC of 0.9349 and an AUPR of 0.9016. Conclusion: The learning method based on LINE network can effectively learn drugs, targets, diseases and other hidden features from the network topology. The combination of features learned through multiple networks can enhance the expression ability. RF is an effective method of supervised learning. Therefore, the Line-RF combination method is a widely applicable method.

Download Full-text

Hyperspectral Image Classification with Localized Graph Convolutional Filtering

Remote Sensing ◽

10.3390/rs13030526 ◽

2021 ◽

Vol 13 (3) ◽

pp. 526

Author(s):

Shengliang Pu ◽

Yuanfeng Wu ◽

Xu Sun ◽

Xiaotong Sun

Keyword(s):

Hyperspectral Image ◽

Principal Component ◽

Representation Learning ◽

Classification Performance ◽

Hyperspectral Data ◽

Spectral Graph Theory ◽

Feature Reduction ◽

Graph Representation ◽

Novel Method ◽

Local Graph

The nascent graph representation learning has shown superiority for resolving graph data. Compared to conventional convolutional neural networks, graph-based deep learning has the advantages of illustrating class boundaries and modeling feature relationships. Faced with hyperspectral image (HSI) classification, the priority problem might be how to convert hyperspectral data into irregular domains from regular grids. In this regard, we present a novel method that performs the localized graph convolutional filtering on HSIs based on spectral graph theory. First, we conducted principal component analysis (PCA) preprocessing to create localized hyperspectral data cubes with unsupervised feature reduction. These feature cubes combined with localized adjacent matrices were fed into the popular graph convolution network in a standard supervised learning paradigm. Finally, we succeeded in analyzing diversified land covers by considering local graph structure with graph convolutional filtering. Experiments on real hyperspectral datasets demonstrated that the presented method offers promising classification performance compared with other popular competitors.

Download Full-text

A Novel Method to Predict Drug-Target Interactions Based on Large-Scale Graph Representation Learning

Cancers ◽

10.3390/cancers13092111 ◽

2021 ◽

Vol 13 (9) ◽

pp. 2111

Author(s):

Bo-Wei Zhao ◽

Zhu-Hong You ◽

Lun Hu ◽

Zhen-Hao Guo ◽

Lei Wang ◽

...

Keyword(s):

Drug Target ◽

Large Scale ◽

Computational Models ◽

Structural Information ◽

Characteristic Curve ◽

Representation Learning ◽

Graph Representation ◽

Convolutional Network ◽

Novel Method

Identification of drug-target interactions (DTIs) is a significant step in the drug discovery or repositioning process. Compared with the time-consuming and labor-intensive in vivo experimental methods, the computational models can provide high-quality DTI candidates in an instant. In this study, we propose a novel method called LGDTI to predict DTIs based on large-scale graph representation learning. LGDTI can capture the local and global structural information of the graph. Specifically, the first-order neighbor information of nodes can be aggregated by the graph convolutional network (GCN); on the other hand, the high-order neighbor information of nodes can be learned by the graph embedding method called DeepWalk. Finally, the two kinds of feature are fed into the random forest classifier to train and predict potential DTIs. The results show that our method obtained area under the receiver operating characteristic curve (AUROC) of 0.9455 and area under the precision-recall curve (AUPR) of 0.9491 under 5-fold cross-validation. Moreover, we compare the presented method with some existing state-of-the-art methods. These results imply that LGDTI can efficiently and robustly capture undiscovered DTIs. Moreover, the proposed model is expected to bring new inspiration and provide novel perspectives to relevant researchers.

Download Full-text

The Network Representation Learning Algorithm Based on Semi-Supervised Random Walk

IEEE Access ◽

10.1109/access.2020.3044367 ◽

2020 ◽

Vol 8 ◽

pp. 222956-222965

Author(s):

Dong Liu ◽

Qinpeng Li ◽

Yan Ru ◽

Jun Zhang

Keyword(s):

Random Walk ◽

Learning Algorithm ◽

Representation Learning ◽

Network Representation

Download Full-text

Integrating Multimodal and Longitudinal Neuroimaging Data with Multi-Source Network Representation Learning

Neuroinformatics ◽

10.1007/s12021-021-09523-w ◽

2021 ◽

Author(s):

Wen Zhang ◽

B. Blair Braden ◽

Gustavo Miranda ◽

Kai Shu ◽

Suhang Wang ◽

...

Keyword(s):

Representation Learning ◽

Network Representation ◽

Neuroimaging Data

Download Full-text

MULFE: Multi-Label Learning via Label-Specific Feature Space Ensemble

ACM Transactions on Knowledge Discovery from Data ◽

10.1145/3451392 ◽

2021 ◽

Vol 16 (1) ◽

pp. 1-24

Author(s):

Yaojin Lin ◽

Qinghua Hu ◽

Jinghua Liu ◽

Xingquan Zhu ◽

Xindong Wu

Keyword(s):

Empirical Studies ◽

Feature Space ◽

Training Data ◽

Data Sets ◽

Learning Framework ◽

Feature Spaces ◽

Public Data ◽

Margin Distribution ◽

Label Correlations ◽

Label Correlation

In multi-label learning, label correlations commonly exist in the data. Such correlation not only provides useful information, but also imposes significant challenges for multi-label learning. Recently, label-specific feature embedding has been proposed to explore label-specific features from the training data, and uses feature highly customized to the multi-label set for learning. While such feature embedding methods have demonstrated good performance, the creation of the feature embedding space is only based on a single label, without considering label correlations in the data. In this article, we propose to combine multiple label-specific feature spaces, using label correlation, for multi-label learning. The proposed algorithm, mu lti- l abel-specific f eature space e nsemble (MULFE), takes consideration label-specific features, label correlation, and weighted ensemble principle to form a learning framework. By conducting clustering analysis on each label’s negative and positive instances, MULFE first creates features customized to each label. After that, MULFE utilizes the label correlation to optimize the margin distribution of the base classifiers which are induced by the related label-specific feature spaces. By combining multiple label-specific features, label correlation based weighting, and ensemble learning, MULFE achieves maximum margin multi-label classification goal through the underlying optimization framework. Empirical studies on 10 public data sets manifest the effectiveness of MULFE.

Download Full-text