Deep Reinforcement Learning Based on Link Prediction Method in Social Network Analysis

Improving the performance of link prediction plays a significant role in the evaluation of social networks. Link prediction is one of the primary tasks in recommender systems, bioinformatics, and the web. Most machine learning methods that depend on the metrics of social network analysis (SNA) models use supervised learning to develop link prediction models. However, supervised learning typically needs a large dataset to train a link prediction model to an optimal level of performance. In recent years, deep reinforcement learning (DRL) has achieved considerable success in various domains, including SNA. In this paper, we apply DRL to improve the performance and accuracy of the link prediction model on the applied datasets. The experiment shows that the dataset created by the DRL model through self-play or auto-simulation can be used to improve the link prediction model. We used three datasets: JUNANES, MAMBO, and JAKE. Experimental results show that the proposed DRL method achieves an accuracy of 85% on JUNANES, 87% on MAMBO, and 78% on JAKE. It outperforms the next-best model, the gradient boosting machine (GBM), which reaches 75%, 79%, and 71% on the same datasets respectively, with models trained for 2,500 iterations; the DRL method also scores higher on AUC. The DRL model is more efficient than traditional machine learning approaches such as random forest and GBM.

In silico Prediction of Inhibitory Constant of Thrombin Inhibitors Using Machine Learning

Combinatorial Chemistry & High Throughput Screening ◽

10.2174/1386207322666181220130232 ◽

2019 ◽

Vol 21 (9) ◽

pp. 662-669 ◽

Cited By ~ 1

Author(s):

Junnan Zhao ◽

Lu Zhu ◽

Weineng Zhou ◽

Lingfeng Yin ◽

Yuchen Wang ◽

...

Keyword(s):

Machine Learning ◽

Prediction Models ◽

Regression Tree ◽

Large Data ◽

Thrombin Inhibitors ◽

Coagulation Cascade ◽

Gradient Boosting ◽

Support Vector ◽

Data Set ◽

Descriptor Selection

Background: Thrombin is the central protease of the vertebrate blood coagulation cascade and is closely related to cardiovascular diseases. The inhibitory constant Ki is the most significant property of thrombin inhibitors. Method: This study predicted the Ki values of thrombin inhibitors from a large data set using machine learning methods. Because machine learning can find non-intuitive regularities in high-dimensional datasets, it can be used to build effective predictive models. A total of 6554 descriptors were collected for each compound, and an efficient descriptor selection method was used to find the appropriate descriptors. Four methods, multiple linear regression (MLR), K-Nearest Neighbors (KNN), Gradient Boosting Regression Tree (GBRT), and Support Vector Machine (SVM), were used to build prediction models with the selected descriptors. Results: The SVM model performed best, with R² = 0.84 and MSE = 0.55 on the training set and R² = 0.83 and MSE = 0.56 on the test set. Several validation methods, such as the y-randomization test and applicability domain evaluation, were used to assess the robustness and generalization ability of the model. The final model shows excellent stability and predictive ability and can be used for rapid estimation of the inhibitory constant, which should help in designing novel thrombin inhibitors.
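
The R² and MSE figures quoted for the SVM model follow the standard regression definitions. The minimal sketch below computes both from scratch; the values are hypothetical, not data from the study, and the function names are illustrative only.

```python
def mse(y_true, y_pred):
    """Mean squared error between observed and predicted values."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true, y_pred):
    """Coefficient of determination: R^2 = 1 - SS_res / SS_tot."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    mean_t = sum(y_true) / len(y_true)
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observed values are equal")
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return 1.0 - ss_res / ss_tot


# Hypothetical observed and predicted values, not data from the study.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
print(f"MSE = {mse(observed, predicted):.3f}, R^2 = {r2_score(observed, predicted):.3f}")
```

Reporting both metrics on the training and test sets, as the abstract does, shows whether the model generalizes: similar values on both sets suggest little overfitting.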

Hidden Link Prediction in Criminal Networks Using the Deep Reinforcement Learning Technique

Computers ◽

10.3390/computers8010008 ◽

2019 ◽

Vol 8 (1) ◽

pp. 8 ◽

Cited By ~ 7

Author(s):

Marcus Lim ◽

Azween Abdullah ◽

NZ Jhanjhi ◽

Mahadevan Supramaniam

Keyword(s):

Machine Learning ◽

Reinforcement Learning ◽

Network Analysis ◽

Prediction Model ◽

Supervised Learning ◽

Link Prediction ◽

Supervised Machine Learning ◽

Criminal Networks ◽

Criminal Network ◽

Learning Technique

Criminal network activities, which are usually secret and stealthy, present certain difficulties in conducting criminal network analysis (CNA) because complete datasets are lacking. The collection of criminal activity data in these networks tends to be incomplete and inconsistent, which is reflected structurally in the criminal network in the form of missing nodes (actors) and links (relationships). Criminal networks are commonly analyzed using social network analysis (SNA) models. Most machine learning techniques that rely on the metrics of SNA models to develop hidden or missing link prediction models use supervised learning. However, supervised learning usually requires a large dataset to train the link prediction model to an optimum performance level. Therefore, this research explores the application of deep reinforcement learning (DRL) in developing a hidden link prediction model for criminal networks from the reconstruction of a corrupted criminal network dataset. The experiment conducted on the model indicates that the dataset generated by the DRL model through self-play or self-simulation can be used to train the link prediction model. The DRL link prediction model performs better than a conventional supervised machine learning technique, such as the gradient boosting machine (GBM), trained with a relatively smaller domain dataset.
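
The abstract does not describe how the criminal network dataset was corrupted. A common way to simulate missing links is to hide a random fraction of the known edges and pair them with sampled unlinked node pairs as negatives. The sketch below assumes that protocol; the function names, the hidden fraction, and the toy network are all hypothetical.

```python
import random


def hide_links(edges, fraction, seed=0):
    """Randomly split an edge list into (observed, hidden) parts.

    The hidden edges stand in for missing links; a prediction model is
    trained on the observed (corrupted) network and asked to recover them.
    """
    if not 0.0 <= fraction < 1.0:
        raise ValueError("fraction must be in [0, 1)")
    rng = random.Random(seed)
    shuffled = list(edges)
    rng.shuffle(shuffled)
    n_hidden = round(fraction * len(shuffled))
    return shuffled[n_hidden:], shuffled[:n_hidden]


def sample_non_edges(nodes, edges, k, seed=0):
    """Draw k distinct unlinked node pairs to serve as negative examples.

    Assumes a simple undirected graph with no self-loops.
    """
    rng = random.Random(seed)
    nodes = list(nodes)
    existing = {frozenset(e) for e in edges}
    n_free = len(nodes) * (len(nodes) - 1) // 2 - len(existing)
    if k > n_free:
        raise ValueError("not enough unlinked pairs")
    negatives = set()
    while len(negatives) < k:
        pair = frozenset(rng.sample(nodes, 2))
        if pair not in existing:
            negatives.add(pair)
    return [tuple(sorted(p)) for p in negatives]


# Hypothetical 5-node network, not one of the study's datasets.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (1, 4), (0, 4), (2, 4)]
observed, hidden = hide_links(edges, fraction=0.25)
negatives = sample_non_edges(range(5), edges, k=len(hidden))
print("observed:", observed)
print("hidden positives:", hidden)
print("sampled negatives:", negatives)
```

Sampling negatives from pairs that are unlinked in the full network, rather than only in the corrupted one, keeps hidden positives from being mislabelled as negatives.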

Application of Machine-Learning-Based Fusion Model in Visibility Forecast: A Case Study of Shanghai, China

Remote Sensing ◽

10.3390/rs13112096 ◽

2021 ◽

Vol 13 (11) ◽

pp. 2096

Author(s):

Zhongqi Yu ◽

Yuanhao Qu ◽

Yunxin Wang ◽

Jinghui Ma ◽

Yu Cao

Keyword(s):

Machine Learning ◽

Prediction Models ◽

Eastern China ◽

Prediction Method ◽

Sampling Technique ◽

Environmental Modeling ◽

Gradient Boosting ◽

Fusion Model ◽

Light Gradient ◽

Extreme Gradient Boosting

A visibility forecast model called a boosting-based fusion model (BFM) was established in this study. The model is a fusion machine learning model based on multisource data, including air pollutants, meteorological observations, moderate resolution imaging spectroradiometer (MODIS) aerosol optical depth (AOD) data, and outputs of an operational regional atmospheric environmental modeling system for eastern China (RAEMS). Extreme gradient boosting (XGBoost), a light gradient boosting machine (LightGBM), and a numerical prediction method, RAEMS, were fused to establish the prediction model. Three prediction models were used to conduct visibility prediction tasks: BFM, LightGBM based on multisource data (LGBM), and RAEMS. The training set covered 1 January 2015 to 31 December 2018, and several data pre-processing methods were applied, including synthetic minority over-sampling technique (SMOTE) resampling, a loss function adjustment, and 10-fold cross-validation. In addition to the basic features (variables), further spatial and temporal gradient features were considered. The testing set, covering 1 January to 31 December 2019, was used to validate the feasibility of BFM, LGBM, and RAEMS. Statistical indicators confirmed that the machine learning methods improved on the RAEMS forecast significantly and consistently. The root mean square error and correlation coefficient of BFM for the next 24/48 h were 5.01/5.47 km and 0.80/0.77, respectively, considerably better than those of RAEMS. The statistics and binary score analysis for different areas in Shanghai also confirmed the reliability and accuracy of BFM, particularly in low-visibility forecasting. Overall, BFM is a suitable tool for visibility prediction: it provides more accurate visibility forecasts for the next 24 and 48 h in Shanghai than LGBM and RAEMS. The results of this study support real-time operational visibility forecasting.
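
SMOTE, one of the pre-processing steps listed above, creates synthetic minority samples by interpolating between a minority sample and one of its nearest minority neighbours. The minimal sketch below shows that core idea on hypothetical two-feature points. It is not the study's pipeline, and the function name and parameters are illustrative.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic minority samples by SMOTE-style interpolation.

    Each synthetic point lies on the line segment between a randomly chosen
    minority sample and one of its k nearest minority-class neighbours.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # indices of the k nearest other minority samples (Euclidean distance)
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(x, minority[j]),
        )[:k]
        z = minority[rng.choice(neighbours)]
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, z)))
    return synthetic


# Hypothetical 2-feature minority class (e.g. scaled low-visibility cases).
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_points = smote(minority, n_new=6, k=2)
print(new_points)
```

In practice, the `SMOTE` class in imbalanced-learn's `imblearn.over_sampling` module (used via `fit_resample(X, y)`) is the usual choice. Resampling should be applied only to training folds so that synthetic points do not leak into validation data.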

Follower Link Prediction using the XGBoost Classification Model with Multiple Graph Features

10.21203/rs.3.rs-239295/v1 ◽

2021 ◽

Author(s):

Dayal Kumar Behera ◽

Madhabananda Dash ◽

Subhra Swetanisha ◽

Janmenjoy Nayak ◽

S Vimal ◽

...

Keyword(s):

Machine Learning ◽

Social Media ◽

Supervised Learning ◽

Link Prediction ◽

Prediction Models ◽

Binary Classification ◽

Prediction Problem ◽

Global Features ◽

Proposed Model ◽

Real World Datasets

Follower link prediction is an emerging application used by social networking sites to grow their user networks. It helps find potential unseen individuals and can be used to identify relationships between nodes in a social network. With the rapid growth in the number of social media users, deciding which users to follow leads to information overload. Previous work on the link prediction problem is generally based on local and global graph features and is limited to smaller datasets. The number of social media users is increasing at an extraordinary rate, and generating features for supervised learning from a large user network is challenging. In this paper, a supervised learning model (LPXGB) using XGBoost is proposed that treats link prediction as a binary classification problem. Many hybrid graph feature techniques are used to represent the dataset in a form suitable for machine learning. The efficiency of the LPXGB model is tested on three real-world datasets: Karate, Polblogs, and Facebook. The proposed model is compared with various machine learning classifiers and with traditional link prediction models. Experimental results show that the proposed model achieves higher classification accuracy and AUC.
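
Treating link prediction as binary classification means turning each candidate node pair into a feature vector. The abstract does not list LPXGB's exact features. The sketch below assumes four common local ones (common neighbours, Jaccard coefficient, Adamic-Adar, preferential attachment) computed with the standard library on a hypothetical toy graph.

```python
import math
from collections import defaultdict


def build_adjacency(edges):
    """Undirected adjacency sets from an edge list."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def pair_features(adj, u, v):
    """Neighbourhood-based features for the candidate link (u, v)."""
    nu, nv = adj.get(u, set()), adj.get(v, set())
    common = nu & nv
    union = nu | nv
    return [
        len(common),                                 # common neighbours
        len(common) / len(union) if union else 0.0,  # Jaccard coefficient
        sum(1.0 / math.log(len(adj[w])) for w in common if len(adj[w]) > 1),  # Adamic-Adar
        len(nu) * len(nv),                           # preferential attachment
    ]


# Hypothetical toy graph; (0, 3) and (1, 3) are candidate links.
adj = build_adjacency([(0, 1), (0, 2), (1, 2), (2, 3)])
X = [pair_features(adj, u, v) for u, v in [(0, 3), (1, 3)]]
print(X)
```

Rows like these, labelled 1 for true links and 0 for sampled non-links, could then be passed to a classifier such as `xgboost.XGBClassifier`. For positive pairs, remove the edge itself from the graph before computing features; otherwise the label leaks into the features.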

Constructing the social network prediction model based on data mining and link prediction analysis

Library Hi Tech ◽

10.1108/lht-11-2018-0179 ◽

2019 ◽

Vol 38 (2) ◽

pp. 320-333

Author(s):

Yuxian Gao

Keyword(s):

Social Network ◽

Social Network Analysis ◽

Network Analysis ◽

Link Prediction ◽

Prediction Method ◽

Analysis Method ◽

Data Set ◽

Content Type ◽

Community Mining ◽

The Relationship

Purpose: The purpose of this paper is to apply link prediction to community mining and to clarify the role of link prediction in improving the performance of social network analysis. Design/methodology/approach: In this study, the 2009 version of the Enron e-mail data set provided by Carnegie Mellon University was first selected as the research object, and bibliometric analysis and citation analysis methods were adopted to compare the differences between various studies. Second, based on the impact of various interpersonal relationships, the link model was adopted to analyze the relationships among people. Finally, matrix factorization was adopted to obtain the characteristics of the research object, so as to predict unknown relationships. Findings: The experimental results show that predictions obtained by considering multiple relationships are more accurate than those obtained by considering only one relationship. Research limitations/implications: Because the data set contains a limited number of objects, the link prediction method has not been tested on a large-scale data set, and its validity and correctness need to be further verified with larger data. In addition, algorithm complexity and optimization, including the storage of sparse matrices, need further study. Moreover, when the data are extremely sparse, the accuracy of the link prediction method declines considerably, so sparse data require further research and discussion. Practical implications: This research focuses on link prediction in social network analysis. Traditional prediction models predict and analyze on the basis of a single relationship between objects, but in real life, relationships between people are diverse, and different relationships interact.
Therefore, in this study, a graph model is used to express different kinds of relations, and the influence between different kinds of relations is considered in the actual prediction process. Finally, experiments on real data sets demonstrate the effectiveness and accuracy of this method. In addition, link prediction, as an important part of social network analysis, is also of great significance for other applications of social network analysis. By applying link prediction to community mining, this study attempts to show that link prediction helps improve the performance of social network analysis. Originality/value: This study adopts a variety of methods, such as link prediction, data mining, literature analysis and citation analysis. The research direction is relatively new, and the experimental results have a certain degree of credibility, so they may serve as a reference for subsequent related research.
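
The abstract says matrix factorization was used to obtain latent characteristics and predict unknown relationships, but it gives no formulation. The sketch below assumes a generic latent-factor model fitted by stochastic gradient descent on observed entries only; the rank, learning rate, regularization, and toy data are hypothetical choices, not the paper's.

```python
import random


def score(P, Q, i, j):
    """Predicted strength of the relationship i -> j."""
    return sum(p * q for p, q in zip(P[i], Q[j]))


def factorize(observed, n_nodes, rank=2, lr=0.05, reg=0.01, epochs=500, seed=0):
    """Fit a[i][j] ~ dot(P[i], Q[j]) to observed (i, j, a) triples by SGD."""
    rng = random.Random(seed)
    P = [[rng.gauss(0.0, 0.1) for _ in range(rank)] for _ in range(n_nodes)]
    Q = [[rng.gauss(0.0, 0.1) for _ in range(rank)] for _ in range(n_nodes)]
    entries = list(observed)
    for _ in range(epochs):
        rng.shuffle(entries)
        for i, j, a in entries:
            err = a - score(P, Q, i, j)
            for f in range(rank):
                p_if, q_jf = P[i][f], Q[j][f]
                # gradient step on squared error with L2 regularisation
                P[i][f] += lr * (err * q_jf - reg * p_if)
                Q[j][f] += lr * (err * p_if - reg * q_jf)
    return P, Q


# Hypothetical 6-person network: 1 = observed link, 0 = observed non-link.
observed = [(0, 1, 1.0), (1, 0, 1.0), (1, 2, 1.0), (2, 1, 1.0),
            (3, 4, 1.0), (4, 3, 1.0), (4, 5, 1.0), (5, 4, 1.0),
            (0, 3, 0.0), (3, 0, 0.0), (2, 5, 0.0), (5, 2, 0.0)]
P, Q = factorize(observed, n_nodes=6)
# Unobserved pairs such as (0, 2) and (0, 5) can now be ranked by score.
print(round(score(P, Q, 0, 2), 3), round(score(P, Q, 0, 5), 3))
```

Only observed entries enter the loss, so unknown pairs are not treated as non-links. Because the data are sparse, this distinction matters for the accuracy decline noted above.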

RLPath: a knowledge graph link prediction method using reinforcement learning based attentive relation path searching and representation learning

Applied Intelligence ◽

10.1007/s10489-021-02672-0 ◽

2021 ◽

Author(s):

Ling Chen ◽

Jun Cui ◽

Xing Tang ◽

Yuntao Qian ◽

Yansheng Li ◽

...

Keyword(s):

Reinforcement Learning ◽

Link Prediction ◽

Prediction Method ◽

Representation Learning ◽

Knowledge Graph ◽

Graph Link

Predicting hospitalization following psychiatric crisis care using machine learning

BMC Medical Informatics and Decision Making ◽

10.1186/s12911-020-01361-1 ◽

2020 ◽

Vol 20 (1) ◽

Author(s):

Matthijs Blankers ◽

Louk F. M. van der Post ◽

Jack J. M. Dekker

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Prediction Models ◽

Learning Algorithms ◽

Nearest Neighbors ◽

Machine Learning Algorithms ◽

Gradient Boosting ◽

Ensemble Model ◽

K Nearest Neighbors ◽

Crisis Care

Background: Accurate prediction models for whether patients on the verge of a psychiatric crisis need hospitalization are lacking, and machine learning methods may help improve the accuracy of psychiatric hospitalization prediction models. In this paper we evaluate the accuracy of ten machine learning algorithms, including the generalized linear model (GLM/logistic regression), in predicting psychiatric hospitalization in the first 12 months after a psychiatric crisis care contact. We also evaluate an ensemble model to optimize accuracy, and we explore individual predictors of hospitalization. Methods: Data from 2084 patients in the longitudinal Amsterdam Study of Acute Psychiatry with at least one reported psychiatric crisis care contact were included. The target variable for the prediction models was whether the patient was hospitalized in the 12 months following inclusion. The predictive power of 39 variables related to patients' socio-demographics, clinical characteristics and previous mental health care contacts was evaluated. The accuracy and area under the receiver operating characteristic curve (AUC) of the machine learning algorithms were compared, and we also estimated the relative importance of each predictor variable. The best- and worst-performing algorithms were compared with GLM/logistic regression using net reclassification improvement analysis, and the five best-performing algorithms were combined in an ensemble model using stacking. Results: All models performed above chance level. Gradient Boosting was the best-performing algorithm (AUC = 0.774) and K-Nearest Neighbors the worst-performing (AUC = 0.702). The performance of GLM/logistic regression (AUC = 0.76) was slightly above average among the tested algorithms. In a Net Reclassification Improvement analysis, Gradient Boosting outperformed GLM/logistic regression by 2.9% and K-Nearest Neighbors by 11.3%. GLM/logistic regression outperformed K-Nearest Neighbors by 8.7%.
Nine of the ten most important predictor variables were related to previous mental health care use. Conclusions: Gradient Boosting led to the highest predictive accuracy and AUC, while GLM/logistic regression performed about average among the tested algorithms. Although statistically significant, the differences between the machine learning algorithms were in most cases modest in magnitude. The results show that predictive accuracy similar to that of the best-performing model can be achieved by combining multiple algorithms in an ensemble model.
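
The AUC values compared above can be read as the probability that a randomly chosen positive case (here, a hospitalized patient) receives a higher predicted score than a randomly chosen negative case. The sketch below computes AUC directly from that pairwise (Mann-Whitney) definition. The labels and scores are hypothetical, and the quadratic loop is meant for clarity, not speed.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative.

    Ties count as half a win (Mann-Whitney U formulation).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = hospitalized) and model scores.
print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]))
```

An AUC of 0.5 corresponds to chance level, which is the baseline that all ten algorithms exceeded. For real data, `sklearn.metrics.roc_auc_score(y_true, y_score)` gives the same quantity more efficiently.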

Exploiting node metadata to predict interactions in large networks using graph embedding and neural networks

10.1101/2021.06.10.447991 ◽

2021 ◽

Author(s):

Rogini Runghen ◽

Daniel B Stouffer ◽

Giulio Valentino Dalla Riva

Keyword(s):

Machine Learning ◽

Neural Networks ◽

Link Prediction ◽

Graph Embedding ◽

Feature Space ◽

Machine Learning Techniques ◽

Large Networks ◽

Data Set ◽

Learning Techniques ◽

Low Dimensional

Collecting network interaction data is difficult. Non-exhaustive sampling and complex hidden processes often result in an incomplete data set. Thus, identifying potentially present but unobserved interactions is crucial both for understanding the structure of large-scale data and for predicting how previously unseen elements will interact. Recent studies in network analysis have shown that accounting for metadata (such as node attributes) can improve both our understanding of how nodes interact with one another and the accuracy of link prediction. However, the dimension of the object we need to learn to predict interactions in a network grows quickly with the number of nodes, which makes the problem computationally and conceptually challenging for large networks. Here, we present a new predictive procedure that combines a graph embedding method with machine learning techniques to predict interactions on the basis of nodes' metadata. Graph embedding methods project the nodes of a network onto a low-dimensional latent feature space. The positions of the nodes in the latent feature space can then be used to predict interactions between nodes. Learning a mapping from the nodes' metadata to their positions in a latent feature space is a classic, low-dimensional machine learning problem. In our current study, we used the Random Dot Product Graph model to estimate the embedding of an observed network, and we tested different neural network architectures to predict the positions of nodes in the latent feature space. Flexible machine learning techniques for mapping nodes onto their latent positions allow us to account for multivariate and possibly complex node metadata. To illustrate the utility of the proposed procedure, we apply it to a large dataset of tourist visits to destinations across New Zealand.
We found that our procedure accurately predicts interactions for both existing nodes and nodes newly added to the network, while remaining computationally feasible even for very large networks. Overall, our study highlights that by exploiting the properties of a well-understood statistical model for complex networks and combining it with standard machine learning techniques, we can simplify the link prediction problem when incorporating multivariate node metadata. Our procedure can be immediately applied to different types of networks and to a wide variety of data from different systems. As such, from both a network science and a data science perspective, our work offers a flexible and generalisable procedure for link prediction.
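
Under a Random Dot Product Graph, the probability that two nodes interact is the dot product of their latent positions. Once a model maps a new node's metadata to a position, its likely partners can be ranked directly. In the sketch below, a linear map stands in for the study's neural networks, whose architectures the abstract does not specify. The weights, positions, and function names are hypothetical, and estimating the positions themselves (commonly by adjacency spectral embedding) is not shown.

```python
def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def rdpg_edge_probability(x_i, x_j):
    """Random Dot Product Graph: P(i and j interact) = <x_i, x_j>.

    Estimated positions can give dot products outside [0, 1], so the
    value is clipped to remain a valid probability.
    """
    return min(1.0, max(0.0, dot(x_i, x_j)))


def metadata_to_position(metadata, weights):
    """Hypothetical stand-in for the trained neural network: a linear map."""
    return tuple(dot(row, metadata) for row in weights)


def rank_partners(new_position, known_positions):
    """Rank existing nodes by predicted interaction probability with a new node."""
    probs = {n: rdpg_edge_probability(new_position, p) for n, p in known_positions.items()}
    return sorted(probs.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical 2-D latent positions for two destinations.
destinations = {"a": (0.6, 0.2), "b": (0.1, 0.7)}
# Hypothetical metadata for a newly added tourist and hypothetical weights.
position = metadata_to_position((1.0, 0.0), weights=((0.8, 0.0), (0.2, 0.0)))
print(rank_partners(position, destinations))
```

Because a new node only needs a predicted position, no re-embedding of the whole network is required. This is what keeps the procedure feasible for newly added nodes in large networks.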

Benchmarking of Machine Learning Models to Assist the Prognosis of Tuberculosis

10.20944/preprints202103.0284.v2 ◽

2021 ◽

Author(s):

Maicon Herverton Lino Ferreira da Silva Barros ◽

Geovanne Oliveira Alves ◽

Lubnnia Morais Florêncio Souza ◽

Élisson da Silva Rocha ◽

João Fausto Lorenzato de Oliveira ◽

...

Keyword(s):

Machine Learning ◽

Clinical Symptoms ◽

Treatment Decision ◽

Gradient Boosting ◽

Original Form ◽

Learning Models ◽

Data Set ◽

Risk Of Death ◽

Increased Risk ◽

Machine Learning Models

Tuberculosis (TB) is an airborne infectious disease caused by organisms in the Mycobacterium tuberculosis (Mtb) complex. In many low- and middle-income countries, TB remains a major cause of morbidity and mortality. Once a patient has been diagnosed with TB, it is critical that healthcare workers make the most appropriate treatment decision, given the individual conditions of the patient and the likely course of the disease based on medical experience. Depending on the prognosis, delayed or inappropriate treatment can lead to unsatisfactory outcomes, including exacerbation of clinical symptoms, poor quality of life, and increased risk of death. This work benchmarks machine learning models to aid TB prognosis using a Brazilian health database of confirmed cases and deaths related to TB in the State of Amazonas. The goal is to predict the probability of death from TB, thus aiding TB prognosis and the associated treatment decision-making process. In its original form, the data set comprised 36,228 records and 130 fields but suffered from missing, incomplete, or incorrect data. Following data cleaning and preprocessing, a revised data set was generated comprising 24,015 records and 38 fields, including 22,876 reported cured TB patients and 1,139 deaths from TB. To explore how data imbalance affects model performance, two controlled experiments were designed using (1) imbalanced and (2) balanced data sets. The best result for predicting TB mortality is achieved by the Gradient Boosting (GB) model using the balanced data set, and the ensemble model composed of the Random Forest (RF), GB and Multi-layer Perceptron (MLP) models is the best model for predicting the cure class.
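
The abstract does not say how the balanced data set was built. Random undersampling of the majority class is one simple option; oversampling methods such as SMOTE are another. The sketch below assumes random undersampling and uses the class sizes reported above with placeholder records. The function name and record layout are hypothetical.

```python
import random


def random_undersample(rows, label_of, seed=0):
    """Balance classes by randomly discarding rows from the larger classes."""
    rng = random.Random(seed)
    by_class = {}
    for row in rows:
        by_class.setdefault(label_of(row), []).append(row)
    n_min = min(len(group) for group in by_class.values())
    balanced = []
    for group in by_class.values():
        balanced.extend(rng.sample(group, n_min))
    rng.shuffle(balanced)
    return balanced


# Class sizes from the abstract; the records themselves are placeholders.
records = [("cured", i) for i in range(22_876)] + [("death", i) for i in range(1_139)]
balanced = random_undersample(records, label_of=lambda r: r[0])
print(len(balanced))  # 2 * 1,139 = 2,278 records
```

Under this assumption, the balanced set discards most cured records. This is one reason results on imbalanced and balanced data can differ by class, as the abstract reports for the mortality and cure classes.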

Graph-Based Semi-Supervised Learning With Big Data

Cognitive Analytics ◽

10.4018/978-1-7998-2460-2.ch012 ◽

2020 ◽

pp. 214-244

Author(s):

Prithish Banerjee ◽

Mark Vere Culp ◽

Kenneth Joseph Ryan ◽

George Michailidis

Keyword(s):

Machine Learning ◽

Big Data ◽

Supervised Learning ◽

Prior Knowledge ◽

Linear Algebra ◽

Real Data ◽

Data Set ◽

Regression Problems ◽

Classification And Regression ◽

Empirical Demonstration

This chapter presents some popular graph-based semi-supervised approaches. These techniques apply to classification and regression problems and can be extended to big data problems using recently developed anchor graph enhancements. The background needed to understand this chapter includes linear algebra and optimization; no prior knowledge of machine learning methods is necessary. An empirical demonstration of these methods on real benchmark data sets is also provided.
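
A representative graph-based semi-supervised technique is harmonic label propagation. Labelled nodes keep their values, and each unlabelled node repeatedly takes the average of its neighbours until the scores settle. The minimal sketch below is not necessarily one of the chapter's specific methods and omits the anchor graph enhancements used for big data. The graph and seed labels are hypothetical.

```python
def label_propagation(adj, seeds, n_iter=100):
    """Iterative harmonic label propagation for binary labels.

    adj: dict node -> list of neighbours (undirected graph).
    seeds: dict node -> known label in {0.0, 1.0}; these stay clamped.
    Returns a dict node -> score in [0, 1]; threshold at 0.5 to classify.
    """
    f = {v: seeds.get(v, 0.5) for v in adj}
    for _ in range(n_iter):
        new_f = {}
        for v, nbrs in adj.items():
            if v in seeds:
                new_f[v] = seeds[v]  # labelled nodes are clamped
            elif nbrs:
                new_f[v] = sum(f[u] for u in nbrs) / len(nbrs)
            else:
                new_f[v] = f[v]  # isolated node keeps its prior
        f = new_f
    return f


# Hypothetical path graph 0-1-2-3-4 with only the end points labelled.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
scores = label_propagation(path, seeds={0: 1.0, 4: 0.0})
print({v: round(s, 3) for v, s in scores.items()})
```

The same update yields real-valued predictions when the seeds are continuous, which is how such methods cover regression as well as classification. Each iteration touches every edge, so the cost on large graphs is what anchor graph approximations aim to reduce.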
