The Polypharmacology Browser PPB2: Target Prediction Combining Nearest Neighbors with Machine Learning

Mapping Intimacies ◽

10.26434/chemrxiv.6895646.v1 ◽

2018 ◽

Cited By ~ 1

Author(s):

Mahendra Awale ◽

Jean-Louis Reymond

Keyword(s):

Machine Learning ◽

Cross Validation ◽

Nearest Neighbor ◽

Target Prediction ◽

Molecular Shape ◽

Public Access ◽

Molecular Fingerprints ◽

Small Molecule Drug ◽

Fold Cross Validation

<div>Here we report PPB2 as a target prediction tool assigning targets to a query molecule based on ChEMBL data. PPB2 computes ligand similarities using molecular fingerprints encoding composition (MQN), molecular shape and pharmacophores (Xfp), and substructures (ECfp4), and features an unprecedented combination of nearest neighbor (NN) searches and Naïve Bayes (NB) machine learning, together with simple NN searches, NB and Deep Neural Network (DNN) machine learning models as further options. Although NN(ECfp4) gives the best results in terms of recall in a 10-fold cross-validation study, combining NN searches with NB machine learning provides superior precision statistics, as well as better results in a case study predicting off-targets of a recently reported TRPV6 calcium channel inhibitor, illustrating the value of this combined approach. PPB2 is available to assess possible off-targets of small molecule drug-like compounds by public access at ppb2.gdb.tools.</div>

Download Full-text

The Polypharmacology Browser PPB2: Target Prediction Combining Nearest Neighbors with Machine Learning

10.26434/chemrxiv.6895646 ◽

2018 ◽

Author(s):

Mahendra Awale ◽

Jean-Louis Reymond

Keyword(s):

Machine Learning ◽

Cross Validation ◽

Nearest Neighbor ◽

Target Prediction ◽

Molecular Shape ◽

Public Access ◽

Molecular Fingerprints ◽

Small Molecule Drug ◽

Fold Cross Validation

Download Full-text

Perbandingan Akurasi dan Waktu Proses Algoritma K-NN dan SVM dalam Analisis Sentimen Twitter

Jurnal Informatika ◽

10.31311/ji.v6i2.5129 ◽

2019 ◽

Vol 6 (2) ◽

pp. 226-235

Author(s):

Muhammad Rangga Aziz Nasution ◽

Mardhiya Hayaty

Keyword(s):

Machine Learning ◽

Support Vector Machine ◽

Unsupervised Learning ◽

Supervised Learning ◽

Cross Validation ◽

Nearest Neighbor ◽

Support Vector ◽

K Nearest Neighbor ◽

Fold Cross Validation

Salah satu cabang ilmu komputer yaitu pembelajaran mesin (machine learning) menjadi tren dalam beberapa waktu terakhir. Pembelajaran mesin bekerja dengan memanfaatkan data dan algoritma untuk membuat model dengan pola dari kumpulan data tersebut. Selain itu, pembelajaran mesin juga mempelajari bagaimama model yang telah dibuat dapat memprediksi keluaran (output) berdasarkan pola yang ada. Terdapat dua jenis metode pembelajaran mesin yang dapat digunakan untuk analisis sentimen: supervised learning dan unsupervised learning. Penelitian ini akan membandingkan dua algoritma klasifikasi yang termasuk dari supervised learning: algoritma K-Nearest Neighbor dan Support Vector Machine, dengan cara membuat model dari masing-masing algoritma dengan objek teks sentimen. Perbandingan dilakukan untuk mengetahui algoritma mana lebih baik dalam segi akurasi dan waktu proses. Hasil pada perhitungan akurasi menunjukkan bahwa metode Support Vector Machine lebih unggul dengan nilai 89,70% tanpa K-Fold Cross Validation dan 88,76% dengan K-Fold Cross Validation. Sedangkan pada perhitungan waktu proses metode K-Nearest Neighbor lebih unggul dengan waktu proses 0.0160s tanpa K-Fold Cross Validation dan 0.1505s dengan K-Fold Cross Validation.

Download Full-text

A New Stacking Based Ensemble Predictive Model to Predict the Mortality after Heart Surgery on a Highly Imbalanced Dataset

10.20944/preprints202109.0181.v1 ◽

2021 ◽

Author(s):

Arman Ghavidel ◽

Rouzbeh Ghousi ◽

Alireza Atashi

Keyword(s):

Machine Learning ◽

Predictive Model ◽

Heart Surgery ◽

Cross Validation ◽

Nearest Neighbor ◽

Sampling Algorithm ◽

Predicting Performance ◽

Unbalanced Dataset ◽

Under Sampling ◽

Fold Cross Validation

Nowadays, according to spectacular improvement in health care and biomedical level, a tremendous amount of data is recorded by hospitals. In addition, the most effective approach to reduce disease mortality is to diagnose it as soon as possible. As a result, data mining by applying machine learning in the field of diseases provides good opportunities to examine the hidden patterns of this collection. An exact forecast of the mortality after heart surgery will cause Successful medical treatment and fewer costs. This research wants to recommend a new stacking predictive model after utilizing the random forest feature importance method to foresee the mortality after heart surgery on a highly unbalanced dataset by using the most practical features. To solve the unbalanced data problem, a combination of the SVM-SMOTE over-sampling algorithm and the Edited-Nearest-Neighbor under-sampling algorithm is used. This research compares the introduced model with some other machine learning classifiers to ensure efficiency through shuffle hold-out and 10-fold cross-validation strategies. In order to validate the performance of the implemented machine learning methods in this research, both shuffle hold-out, and 10-fold cross-validation results indicated that our model had the highest efficiency compared to the other models. Furthermore, the Friedman statistical test is applied to survey the differences between models. The result demonstrates that the introduced stacking model reached the most accurate predicting performance after Logistic Regression.

Download Full-text

K-Nearest Neighbor with K-Fold Cross Validation and Analytic Hierarchy Process on Data Classification

International Journal of Advances in Data and Information Systems ◽

10.25008/ijadis.v2i1.1204 ◽

2021 ◽

Vol 2 (1) ◽

Author(s):

Zoelkarnain Rinanda Tembusai ◽

Herman Mawengkang ◽

Muhammad Zarlis

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Analytic Hierarchy Process ◽

Cross Validation ◽

Nearest Neighbor ◽

K Nearest Neighbor ◽

Analytic Hierarchy ◽

Machine Learning Model ◽

Hierarchy Process ◽

Fold Cross Validation

This study analyzes the performance of the k-Nearest Neighbor method with the k-Fold Cross Validation algorithm as an evaluation model and the Analytic Hierarchy Process method as feature selection for the data classification process in order to obtain the best level of accuracy and machine learning model. The best test results are in fold-3, which is getting an accuracy rate of 95%. Evaluation of the k-Nearest Neighbor model with k-Fold Cross Validation can get a good machine learning model and the Analytic Hierarchy Process as a feature selection also gets optimal results and can reduce the performance of the k-Nearest Neighbor method because it only uses features that have been selected based on the level of importance for decision making.

Download Full-text

Prediction of K562 Cells Functional Inhibitors Based on Machine Learning Approaches

Current Pharmaceutical Design ◽

10.2174/1381612825666191107092214 ◽

2020 ◽

Vol 25 (40) ◽

pp. 4296-4302 ◽

Cited By ~ 2

Author(s):

Yuan Zhang ◽

Zhenyan Han ◽

Qian Gao ◽

Xiaoyi Bai ◽

Chi Zhang ◽

...

Keyword(s):

Machine Learning ◽

Inclusion Bodies ◽

Cross Validation ◽

Independent Set ◽

K562 Cells ◽

Machine Learning Algorithms ◽

Learning Approaches ◽

Validation Test ◽

Excess Number ◽

Fold Cross Validation

Background: β thalassemia is a common monogenic genetic disease that is very harmful to human health. The disease arises is due to the deletion of or defects in β-globin, which reduces synthesis of the β-globin chain, resulting in a relatively excess number of α-chains. The formation of inclusion bodies deposited on the cell membrane causes a decrease in the ability of red blood cells to deform and a group of hereditary haemolytic diseases caused by massive destruction in the spleen. Methods: In this work, machine learning algorithms were employed to build a prediction model for inhibitors against K562 based on 117 inhibitors and 190 non-inhibitors. Results: The overall accuracy (ACC) of a 10-fold cross-validation test and an independent set test using Adaboost were 83.1% and 78.0%, respectively, surpassing Bayes Net, Random Forest, Random Tree, C4.5, SVM, KNN and Bagging. Conclusion: This study indicated that Adaboost could be applied to build a learning model in the prediction of inhibitors against K526 cells.

Download Full-text

Assessing the Relation between Mud Components and Rheology for Loss Circulation Prevention Using Polymeric Gels: A Machine Learning Approach

Energies ◽

10.3390/en14051377 ◽

2021 ◽

Vol 14 (5) ◽

pp. 1377

Author(s):

Musaab I. Magzoub ◽

Raj Kiran ◽

Saeed Salehi ◽

Ibnelwaleed A. Hussein ◽

Mustafa S. Nasser

Keyword(s):

Machine Learning ◽

Rheological Properties ◽

Nearest Neighbor ◽

Drilling Fluid ◽

Gradient Boosting ◽

K Nearest Neighbor ◽

Wide Range ◽

Machine Learning Approach ◽

Drilling Operations

The traditional way to mitigate loss circulation in drilling operations is to use preventative and curative materials. However, it is difficult to quantify the amount of materials from every possible combination to produce customized rheological properties. In this study, machine learning (ML) is used to develop a framework to identify material composition for loss circulation applications based on the desired rheological characteristics. The relation between the rheological properties and the mud components for polyacrylamide/polyethyleneimine (PAM/PEI)-based mud is assessed experimentally. Four different ML algorithms were implemented to model the rheological data for various mud components at different concentrations and testing conditions. These four algorithms include (a) k-Nearest Neighbor, (b) Random Forest, (c) Gradient Boosting, and (d) AdaBoosting. The Gradient Boosting model showed the highest accuracy (91 and 74% for plastic and apparent viscosity, respectively), which can be further used for hydraulic calculations. Overall, the experimental study presented in this paper, together with the proposed ML-based framework, adds valuable information to the design of PAM/PEI-based mud. The ML models allowed a wide range of rheology assessments for various drilling fluid formulations with a mean accuracy of up to 91%. The case study has shown that with the appropriate combination of materials, reasonable rheological properties could be achieved to prevent loss circulation by managing the equivalent circulating density (ECD).

Download Full-text

Heartbeat Detection by Laser Doppler Vibrometry and Machine Learning

Sensors ◽

10.3390/s20185362 ◽

2020 ◽

Vol 20 (18) ◽

pp. 5362 ◽

Cited By ~ 1

Author(s):

Luca Antognoli ◽

Sara Moccia ◽

Lucia Migliorelli ◽

Sara Casaccia ◽

Lorenzo Scalise ◽

...

Keyword(s):

Machine Learning ◽

Signal Analysis ◽

Cross Validation ◽

Nearest Neighbor ◽

Laser Doppler Vibrometer ◽

Laser Doppler ◽

Support Vector ◽

K Nearest Neighbor ◽

Real World Application ◽

Testing Protocol

Background: Heartbeat detection is a crucial step in several clinical fields. Laser Doppler Vibrometer (LDV) is a promising non-contact measurement for heartbeat detection. The aim of this work is to assess whether machine learning can be used for detecting heartbeat from the carotid LDV signal. Methods: The performances of Support Vector Machine (SVM), Decision Tree (DT), Random Forest (RF) and K-Nearest Neighbor (KNN) were compared using the leave-one-subject-out cross-validation as the testing protocol in an LDV dataset collected from 28 subjects. The classification was conducted on LDV signal windows, which were labeled as beat, if containing a beat, or no-beat, otherwise. The labeling procedure was performed using electrocardiography as the gold standard. Results: For the beat class, the f1-score (f1) values were 0.93, 0.93, 0.95, 0.96 for RF, DT, KNN and SVM, respectively. No statistical differences were found between the classifiers. When testing the SVM on the full-length (10 min long) LDV signals, to simulate a real-world application, we achieved a median macro-f1 of 0.76. Conclusions: Using machine learning for heartbeat detection from carotid LDV signals showed encouraging results, representing a promising step in the field of contactless cardiovascular signal analysis.

Download Full-text

A novel computational model for predicting potential LncRNA-disease associations based on both direct and indirect features of LncRNA-disease pairs

BMC Bioinformatics ◽

10.1186/s12859-020-03906-7 ◽

2020 ◽

Vol 21 (1) ◽

Author(s):

Yubin Xiao ◽

Zheng Xiao ◽

Xiang Feng ◽

Zhiping Chen ◽

Linai Kuang ◽

...

Keyword(s):

Computational Model ◽

Cross Validation ◽

State Of The Art ◽

Prediction Methods ◽

Good Prediction ◽

Average Case ◽

Comparison Results ◽

Disease Associations ◽

Fold Cross Validation

Abstract Background Accumulating evidence has demonstrated that long non-coding RNAs (lncRNAs) are closely associated with human diseases, and it is useful for the diagnosis and treatment of diseases to get the relationships between lncRNAs and diseases. Due to the high costs and time complexity of traditional bio-experiments, in recent years, more and more computational methods have been proposed by researchers to infer potential lncRNA-disease associations. However, there exist all kinds of limitations in these state-of-the-art prediction methods as well. Results In this manuscript, a novel computational model named FVTLDA is proposed to infer potential lncRNA-disease associations. In FVTLDA, its major novelty lies in the integration of direct and indirect features related to lncRNA-disease associations such as the feature vectors of lncRNA-disease pairs and their corresponding association probability fractions, which guarantees that FVTLDA can be utilized to predict diseases without known related-lncRNAs and lncRNAs without known related-diseases. Moreover, FVTLDA neither relies solely on known lncRNA-disease nor requires any negative samples, which guarantee that it can infer potential lncRNA-disease associations more equitably and effectively than traditional state-of-the-art prediction methods. Additionally, to avoid the limitations of single model prediction techniques, we combine FVTLDA with the Multiple Linear Regression (MLR) and the Artificial Neural Network (ANN) for data analysis respectively. Simulation experiment results show that FVTLDA with MLR can achieve reliable AUCs of 0.8909, 0.8936 and 0.8970 in 5-Fold Cross Validation (fivefold CV), 10-Fold Cross Validation (tenfold CV) and Leave-One-Out Cross Validation (LOOCV), separately, while FVTLDA with ANN can achieve reliable AUCs of 0.8766, 0.8830 and 0.8807 in fivefold CV, tenfold CV, and LOOCV respectively. Furthermore, in case studies of gastric cancer, leukemia and lung cancer, experiment results show that there are 8, 8 and 8 out of top 10 candidate lncRNAs predicted by FVTLDA with MLR, and 8, 7 and 8 out of top 10 candidate lncRNAs predicted by FVTLDA with ANN, having been verified by recent literature. Comparing with the representative prediction model of KATZLDA, comparison results illustrate that FVTLDA with MLR and FVTLDA with ANN can achieve the average case study contrast scores of 0.8429 and 0.8515 respectively, which are both notably higher than the average case study contrast score of 0.6375 achieved by KATZLDA. Conclusion The simulation results show that FVTLDA has good prediction performance, which is a good supplement to future bioinformatics research.

Download Full-text

Predict Epitranscriptome Targets and Regulatory Functions of N6-Methyladenosine (m6A) Writers and Erasers

Evolutionary Bioinformatics ◽

10.1177/1176934319871290 ◽

2019 ◽

Vol 15 ◽

pp. 117693431987129 ◽

Cited By ~ 5

Author(s):

Yiyou Song ◽

Qingru Xu ◽

Zhen Wei ◽

Di Zhen ◽

Jionglong Su ◽

...

Keyword(s):

Cross Validation ◽

Target Prediction ◽

Learning Approach ◽

Genomic Features ◽

Machine Learning Approach ◽

Functional Relevance ◽

A Site ◽

Site Identification ◽

Regulatory Functions ◽

Fold Cross Validation

Currently, although many successful bioinformatics efforts have been reported in the epitranscriptomics field for N6-methyladenosine (m6A) site identification, none is focused on the substrate specificity of different m6A-related enzymes, ie, the methyltransferases (writers) and demethylases (erasers). In this work, to untangle the target specificity and the regulatory functions of different RNA m6A writers (METTL3-METT14 and METTL16) and erasers (ALKBH5 and FTO), we extracted 49 genomic features along with the conventional sequence features and used the machine learning approach of random forest to predict their epitranscriptome substrates. Our method achieved reasonable performance on both the writer target prediction (as high as 0.918) and the eraser target prediction (as high as 0.888) in a 5-fold cross-validation, and results of the gene ontology analysis of their preferential targets further revealed the functional relevance of different RNA methylation writers and erasers.

Download Full-text

Evaluation and Identification of the Neuroprotective Compounds of Xiaoxuming Decoction by Machine Learning: A Novel Mode to Explore the Combination Rules in Traditional Chinese Medicine Prescription

BioMed Research International ◽

10.1155/2019/6847685 ◽

2019 ◽

Vol 2019 ◽

pp. 1-14

Author(s):

Shilun Yang ◽

Yanjia Shen ◽

Wendan Lu ◽

Yinglin Yang ◽

Haigang Wang ◽

...

Keyword(s):

Machine Learning ◽

Chinese Medicine ◽

Traditional Chinese Medicine ◽

Cross Validation ◽

Bayesian Models ◽

Machine Learning Algorithms ◽

Therapeutic Effects ◽

Test Set ◽

Screening Experiments ◽

Fold Cross Validation

Xiaoxuming decoction (XXMD), a classic traditional Chinese medicine (TCM) prescription, has been used as a therapeutic in the treatment of stroke in clinical practice for over 1200 years. However, the pharmacological mechanisms of XXMD have not yet been elucidated. The purpose of this study was to develop neuroprotective models for identifying neuroprotective compounds in XXMD against hypoxia-induced and H2O2-induced brain cell damage. In this study, a phenotype-based classification method was designed by machine learning to identify neuroprotective compounds and to clarify the compatibility of XXMD components. Four different single classifiers (AB, kNN, CT, and RF) and molecular fingerprint descriptors were used to construct stacked naïve Bayesian models. Among them, the RF algorithm had a better performance with an average MCC value of 0.725±0.014 and 0.774±0.042 from 5-fold cross-validation and test set, respectively. The probability values calculated by four models were then integrated into a stacked Bayesian model. In total, two optimal models, s-NB-1-LPFP6 and s-NB-2-LPFP6, were obtained. The two validated optimal models revealed Matthews correlation coefficients (MCC) of 0.968 and 0.993 for 5-fold cross-validation and of 0.874 and 0.959 for the test set, respectively. Furthermore, the two models were used for virtual screening experiments to identify neuroprotective compounds in XXMD. Ten representative compounds with potential therapeutic effects against the two phenotypes were selected for further cell-based assays. Among the selected compounds, two compounds significantly inhibited H2O2-induced and Na2S2O4-induced neurotoxicity simultaneously. Together, our findings suggested that machine learning algorithms such as combination Bayesian models were feasible to predict neuroprotective compounds and to preliminarily demonstrate the pharmacological mechanisms of TCM.

Download Full-text