An Efficient Link Prediction Model Using Supervised Machine Learning

Author(s):  
Praveen Kumar Bhanodia ◽  
Aditya Khamparia ◽  
Babita Pandey
Computers ◽  
2019 ◽  
Vol 8 (1) ◽  
pp. 8
Author(s):  
Marcus Lim ◽  
Azween Abdullah ◽  
NZ Jhanjhi ◽  
Mahadevan Supramaniam

Criminal network activities, which are usually secret and stealthy, make criminal network analysis (CNA) difficult because complete datasets are rarely available. Data collected on criminal activities in these networks tend to be incomplete and inconsistent, which is reflected structurally in the criminal network as missing nodes (actors) and links (relationships). Criminal networks are commonly analyzed using social network analysis (SNA) models. Most machine learning techniques that rely on the metrics of SNA models to build hidden or missing link prediction models use supervised learning. However, supervised learning usually requires a large dataset to train the link prediction model to an optimum performance level. This research therefore explores the application of deep reinforcement learning (DRL) to develop a hidden-link prediction model for criminal networks by reconstructing a corrupted criminal network dataset. The experiment conducted on the model indicates that the dataset generated by the DRL model through self-play or self-simulation can be used to train the link prediction model. The DRL link prediction model performs better than a conventional supervised machine learning technique, such as the gradient boosting machine (GBM), trained with a relatively smaller domain dataset.
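To make the supervised setup described above concrete, the following is a minimal sketch of how SNA-style neighborhood metrics can be turned into features for candidate links. The abstract does not say which metrics the authors used; common neighbors and the Jaccard coefficient are assumed here as typical examples, and the toy network, node names, and the `hidden` link are hypothetical.

```python
from itertools import combinations

def neighbor_map(edges):
    """Build an undirected adjacency map {node: set of neighbors}."""
    nbrs = {}
    for u, v in edges:
        nbrs.setdefault(u, set()).add(v)
        nbrs.setdefault(v, set()).add(u)
    return nbrs

def pair_features(nbrs, u, v):
    """Two standard neighborhood metrics for a candidate link (u, v)."""
    common = nbrs[u] & nbrs[v]
    union = nbrs[u] | nbrs[v]
    jaccard = len(common) / len(union) if union else 0.0
    return {"common_neighbors": len(common), "jaccard": jaccard}

# Hypothetical toy network: the observed links after one true link was hidden.
observed = [("a", "b"), ("a", "c"), ("b", "c"), ("b", "d"), ("c", "d"), ("d", "e")]
hidden = {("a", "d")}  # assumed ground-truth link missing from the data

nbrs = neighbor_map(observed)

# Every unlinked node pair is a candidate; the label marks the truly hidden links.
candidates = [p for p in combinations(sorted(nbrs), 2) if p[1] not in nbrs[p[0]]]
rows = [(p, pair_features(nbrs, *p), int(p in hidden)) for p in candidates]

for pair, feats, label in rows:
    print(pair, feats, label)
```

Rows of this shape (features plus a 0/1 label) are what a supervised classifier such as a GBM would be trained on; with few known links, the number of positive rows is small, which is the data limitation the abstract points to.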


Sepsis is a life-threatening disease that causes tissue damage and organ failure and results in the death of millions of people. It is among the highest-risk diseases identified globally. A large proportion of these deaths occur in developing countries because hospitals are inaccessible or resources are lacking. Blood samples are taken to confirm sepsis, but this requires a laboratory and is time-consuming. The aim of this study is to develop a practical, non-invasive sepsis prediction model that detects sepsis using supervised machine learning algorithms. For this retrospective analysis, we used data available from the PhysioNet database.


2021 ◽  
Author(s):  
Constanza L Andaur Navarro ◽  
Johanna AA Damen ◽  
Toshihiko Takada ◽  
Steven WJ Nijman ◽  
Paula Dhiman ◽  
...  

Objective: While many studies have consistently found incomplete reporting of regression-based prediction model studies, evidence is lacking for machine learning-based prediction model studies. Our aim is to systematically review the adherence of Machine Learning (ML)-based prediction model studies to the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) Statement. Study design and setting: We included articles reporting on development or external validation of a multivariable prediction model (either diagnostic or prognostic) developed using supervised ML for individualized predictions across all medical fields (PROSPERO, CRD42019161764). We searched PubMed from 1 January 2018 to 31 December 2019. Data extraction was performed using the 22-item checklist for reporting of prediction model studies (www.TRIPOD-statement.org). We measured the overall adherence per article and per TRIPOD item. Results: Our search identified 24 814 articles, of which 152 articles were included: 94 (61.8%) prognostic and 58 (38.2%) diagnostic prediction model studies. Overall, articles adhered to a median of 38.7% (IQR 31.0-46.4) of TRIPOD items. No articles fully adhered to complete reporting of the abstract and very few reported the flow of participants (3.9%, 95% CI 1.8 to 8.3), appropriate title (4.6%, 95% CI 2.2 to 9.2), blinding of predictors (4.6%, 95% CI 2.2 to 9.2), model specification (5.2%, 95% CI 2.4 to 10.8), and model's predictive performance (5.9%, 95% CI 3.1 to 10.9). There was often complete reporting of source of data (98.0%, 95% CI 94.4 to 99.3) and interpretation of the results (94.7%, 95% CI 90.0 to 97.3). Conclusion: Similar to studies using conventional statistical techniques, the completeness of reporting is poor. Essential information to decide to use the model (i.e. model specification and its performance) is rarely reported. However, some items and sub-items of TRIPOD might be less suitable for ML-based prediction model studies and thus, TRIPOD requires extensions. Overall, there is an urgent need to improve the reporting quality and usability of research to avoid research waste.
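The adherence percentages and confidence intervals above can be sanity-checked with a short calculation. The sketch below assumes a Wilson score interval and assumes the underlying counts are 6 and 7 of 152 articles (inferred from 3.9% and 4.6%); the abstract states neither the interval method nor the counts, so both are assumptions that merely happen to be consistent with the reported figures.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 for ~95%)."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half

n_articles = 152  # included articles, as reported in the abstract

# Assumed counts, back-calculated from the reported percentages.
for label, count in [("flow of participants", 6), ("appropriate title", 7)]:
    low, high = wilson_interval(count, n_articles)
    print(f"{label}: {100 * count / n_articles:.1f}% "
          f"(95% CI {100 * low:.1f} to {100 * high:.1f})")
```

Under these assumptions the computed intervals round to the values quoted in the abstract, which suggests, but does not confirm, that a Wilson-type interval was used.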


2021 ◽  
Vol 11 (14) ◽  
pp. 6364
Author(s):  
Chun-Te Huang ◽  
Rong-Ching Chang ◽  
Yi-Lu Tsai ◽  
Kai-Chih Pai ◽  
Tsai-Jung Wang ◽  
...  

Acute kidney injury (AKI) is a rapid decline in kidney function, manifested by decreasing urine output or abnormal blood tests (elevated serum creatinine). Electronic health records (EHRs) are fundamental for clinicians and machine learning algorithms to predict the clinical outcomes of patients in the intensive care unit (ICU). Early prediction of AKI could automatically warn clinicians to review possible risk factors and act in advance to prevent it. However, the enormous amount of patient data is usually relatively incomplete, which makes supervised machine learning very challenging. In this paper, we propose an entropy-based feature engineering framework for vital signs based on how frequently they are recorded. In particular, we address the missing at random (MAR) and missing not at random (MNAR) types of missing data according to different clinical scenarios. To demonstrate its applicability, we used it to build a prediction model for future AKI in ICU patients using 4278 ICU admissions from a tertiary hospital. Our results show that the proposed entropy-based features are feasible for use in the AKI prediction model and that its performance improves as data availability increases. In addition, we study the performance of the AKI prediction model by comparing different time gaps and feature windows with the proposed vital sign entropy features. This work could serve as guidance for feature window selection and missing data processing during the development of prediction models in the ICU.
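The abstract does not define its entropy feature, so the following is only one plausible reading, offered as a sketch: split the feature window into equal time bins, count how many times a vital sign was recorded in each bin, and take the Shannon entropy of that distribution. The function name `recording_entropy`, the one-hour bin width, and the example timestamps are all hypothetical.

```python
import math
from collections import Counter

def recording_entropy(timestamps_h, window_h, bin_h=1.0):
    """Shannon entropy (bits) of how a vital sign's records spread over time bins.

    timestamps_h: record times in hours from the start of the feature window.
    window_h:     length of the feature window in hours.
    bin_h:        bin width in hours (an assumed default, not from the paper).
    """
    n_bins = math.ceil(window_h / bin_h)
    counts = Counter(
        min(int(t // bin_h), n_bins - 1)  # clamp t == window_h into the last bin
        for t in timestamps_h
        if 0 <= t <= window_h
    )
    total = sum(counts.values())
    if total == 0:
        return 0.0  # no records at all in the window
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Evenly charted heart rate versus records clustered in the first hour.
regular = recording_entropy([0.5, 1.5, 2.5, 3.5], window_h=4)
clustered = recording_entropy([0.1, 0.2, 0.3, 0.4], window_h=4)
print(regular, clustered)
```

In this reading, evenly spaced charting gives high entropy and bursty or sparse charting gives low entropy, so the feature carries information about the recording pattern itself rather than requiring missing values to be imputed first.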


2021 ◽  
Vol 14 (1) ◽  
Author(s):  
Gianluca Moro ◽  
Marco Masseroli

Background: Structured biological information about genes and proteins is a valuable resource to improve discovery and understanding of complex biological processes via machine learning algorithms. Gene Ontology (GO) controlled annotations describe, in a structured form, features and functions of genes and proteins of many organisms. However, such valuable annotations are not always reliable and are sometimes incomplete, especially for rarely studied organisms. Here, we present GeFF (Gene Function Finder), a novel cross-organism ensemble learning method able to reliably predict new GO annotations of a target organism from the GO annotations of another, evolutionarily related and better studied, source organism. Results: Using a supervised method, GeFF predicts unknown annotations from random perturbations of existing annotations. The perturbation consists of randomly deleting a fraction of known annotations in order to produce a reduced annotation set. The key idea is to train a supervised machine learning algorithm with the reduced annotation set to predict, namely to rebuild, the original annotations. The resulting prediction model, in addition to accurately rebuilding the original known annotations for an organism from their perturbed version, also effectively predicts new unknown annotations for the organism. Moreover, the prediction model is also able to discover new unknown annotations in different target organisms without retraining. We combined our novel method with different ensemble learning approaches and compared them to each other and to an equivalent single model technique. We tested the method with five different organisms using their GO annotations: Homo sapiens, Mus musculus, Bos taurus, Gallus gallus and Dictyostelium discoideum. The outcomes demonstrate the effectiveness of the cross-organism ensemble approach, which can be customized with a trade-off between the desired number of predicted new annotations and their precision. A Web application to browse both the input annotations used and the predicted ones, choosing the ensemble prediction method to use, is publicly available at http://tiny.cc/geff/. Conclusions: Our novel cross-organism ensemble learning method provides reliable predicted novel gene annotations, i.e., functions, ranked according to an associated likelihood value. They are very valuable both to speed up annotation curation, focusing it on the prioritized new annotations predicted, and to complement the known annotations available.
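The perturbation step described in the Results can be sketched in a few lines. This is an illustration of the general idea only, not GeFF's implementation: the function name `perturb_annotations`, the fixed seed, and the toy gene and term identifiers are hypothetical, and the abstract does not state what deletion fraction the authors used.

```python
import random

def perturb_annotations(annotations, fraction, seed=0):
    """Randomly delete a fraction of known (gene, term) annotations.

    Returns (reduced, deleted): the reduced annotation set used as model input
    and the deleted annotations the model should learn to rebuild.
    """
    rng = random.Random(seed)          # seeded for reproducibility
    known = sorted(annotations)        # sort so sampling is deterministic
    n_delete = int(len(known) * fraction)
    deleted = set(rng.sample(known, n_delete))
    reduced = set(known) - deleted
    return reduced, deleted

# Hypothetical toy annotation set: (gene, GO term) pairs with placeholder names.
original = {
    ("geneA", "term1"), ("geneA", "term2"), ("geneB", "term1"),
    ("geneB", "term3"), ("geneC", "term2"), ("geneC", "term3"),
    ("geneD", "term1"), ("geneD", "term4"),
}

reduced, deleted = perturb_annotations(original, fraction=0.25)

# Supervised targets: 1 if the pair is annotated in the ORIGINAL set, else 0.
genes = sorted({g for g, _ in original})
terms = sorted({t for _, t in original})
targets = {(g, t): int((g, t) in original) for g in genes for t in terms}
print(len(reduced), len(deleted), sum(targets.values()))
```

A model trained to map the reduced set back to these targets is rewarded for recovering deleted annotations; pairs it scores highly that are absent even from the original set are then candidates for new, previously unknown annotations.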

