A Machine Learning-Based Prediction Platform for P-Glycoprotein Modulators and Its Validation by Molecular Docking

Cells, 2019, Vol 8 (10), pp. 1286
Author(s): Onat Kadioglu, Thomas Efferth

P-glycoprotein (P-gp) is an important determinant of multidrug resistance (MDR) because its overexpression is associated with increased efflux of various established chemotherapy drugs in many clinically resistant and refractory tumors. This leads to insufficient therapeutic targeting of tumor populations, representing a major drawback of cancer chemotherapy. Therefore, P-gp is a target for pharmacological inhibitors to overcome MDR. In the present study, we utilized machine learning strategies to establish a model for P-gp modulators to predict whether a given compound would behave as a substrate or an inhibitor of P-gp. Random forest feature selection algorithm-based leave-one-out random sampling was used. Testing the model with an external validation set revealed high performance scores. A P-gp modulator list of compounds from the ChEMBL database was used to test the performance, and predictions from both substrate and inhibitor classes were selected for the last step of validation with molecular docking. Predicted substrates revealed docking poses similar to those of doxorubicin, and predicted inhibitors revealed docking poses similar to those of the known P-gp inhibitor elacridar, implying the validity of the predictions. We conclude that the machine-learning approach introduced in this investigation may serve as a tool for the rapid detection of P-gp substrates and inhibitors in large chemical libraries.

Author(s): Tyler F. Rooks, Andrea S. Dargie, Valeta Carol Chancey

A shortcoming of using environmental sensors for the surveillance of potentially concussive events is substantial uncertainty regarding whether the event was caused by head acceleration (“head impacts”) or sensor motion (with no head acceleration). The goal of the present study is to develop a machine learning model to classify environmental sensor data obtained in the field and evaluate the performance of the model against the performance of the proprietary classification algorithm used by the environmental sensor. Data were collected from Soldiers attending sparring sessions conducted under a U.S. Army Combatives School course. Data from one sparring session were used to train a decision tree classification algorithm to identify good and bad signals. Data from the remaining sparring sessions were kept as an external validation set. The performance of the proprietary algorithm used by the sensor was also compared to the trained algorithm performance. The trained decision tree was able to correctly classify 95% of events for internal cross-validation and 88% of events for the external validation set. Comparatively, the proprietary algorithm was only able to correctly classify 61% of the events. In general, the trained algorithm predicted whether a signal was good or bad more accurately than the proprietary algorithm. The present study shows it is possible to train a decision tree algorithm using environmental sensor data collected in the field.
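The train-on-one-session, validate-on-the-rest design can be illustrated with the smallest possible decision tree, a single-split "stump". This is only a sketch under stated assumptions: the feature (a peak acceleration value), the numbers, and the function names are made up for illustration, and the study's actual tree and features are not described in the abstract.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def fit_stump(values, labels):
    """Pick the threshold on one feature that minimizes weighted Gini impurity.

    Assumes at least two distinct feature values.
    """
    best = None
    xs = sorted(set(values))
    for lo, hi in zip(xs, xs[1:]):
        t = (lo + hi) / 2  # candidate split halfway between neighbours
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if best is None or score < best[0]:
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0]
            best = (score, t, left_label, right_label)
    return best[1], best[2], best[3]

def predict(stump, x):
    t, left_label, right_label = stump
    return left_label if x <= t else right_label

def accuracy(stump, values, labels):
    hits = sum(predict(stump, x) == y for x, y in zip(values, labels))
    return hits / len(labels)

# Hypothetical data: one training session, one held-out session.
train_x = [5.0, 7.0, 9.0, 20.0, 25.0, 30.0]
train_y = ["bad", "bad", "bad", "good", "good", "good"]
ext_x = [6.0, 8.0, 18.0, 22.0, 12.0]
ext_y = ["bad", "bad", "good", "good", "good"]

stump = fit_stump(train_x, train_y)
print("threshold:", stump[0])
print("training accuracy:", accuracy(stump, train_x, train_y))
print("external accuracy:", accuracy(stump, ext_x, ext_y))
```

As in the abstract, accuracy on the held-out session is lower than on the training data, which is why the external figure is the more honest estimate.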


Molecules, 2019, Vol 24 (10), pp. 2006
Author(s): Liadys Mora Lagares, Nikola Minovski, Marjana Novič

P-glycoprotein (P-gp) is a transmembrane protein that actively transports a wide variety of chemically diverse compounds out of the cell. It is highly associated with the ADMET (absorption, distribution, metabolism, excretion and toxicity) properties of drugs/drug candidates and contributes to decreasing toxicity by eliminating compounds from cells, thereby preventing intracellular accumulation. Therefore, in the drug discovery and toxicological assessment process it is advisable to pay attention to whether a compound under development could be transported by P-gp or not. In this study, an in silico multiclass classification model capable of predicting the probability that a compound interacts with P-gp was developed using a counter-propagation artificial neural network (CP ANN) based on a set of 2D molecular descriptors, as well as an extensive dataset of 2512 compounds (1178 P-gp inhibitors, 477 P-gp substrates and 857 P-gp non-active compounds). The model provided a good classification performance, producing non-error rate (NER) values of 0.93 for the training set and 0.85 for the test set, while the average precision (AvPr) was 0.93 for the training set and 0.87 for the test set. An external validation set of 385 compounds was used to challenge the model’s performance. On the external validation set the NER and AvPr values were both 0.70. We believe that this in silico classifier could be effectively used as a reliable virtual screening tool for identifying potential P-gp ligands.
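For readers unfamiliar with the two indices, the sketch below computes them from a multiclass confusion matrix, assuming the usual definitions: NER is the mean of the per-class sensitivities and AvPr is the mean of the per-class precisions. The matrix is invented for illustration and does not come from the paper.

```python
def ner_and_avpr(confusion):
    """Return (NER, AvPr) for a square confusion matrix.

    confusion[i][j] is the number of compounds of true class i that were
    predicted as class j. Assumes every class has at least one true member
    and at least one prediction, otherwise a division by zero occurs.
    """
    k = len(confusion)
    sensitivities = []
    precisions = []
    for c in range(k):
        row_total = sum(confusion[c])                        # true members of class c
        col_total = sum(confusion[r][c] for r in range(k))   # predictions of class c
        sensitivities.append(confusion[c][c] / row_total)
        precisions.append(confusion[c][c] / col_total)
    return sum(sensitivities) / k, sum(precisions) / k

# Hypothetical counts for three classes: inhibitor, substrate, non-active.
confusion = [
    [80, 10, 10],
    [5, 40, 5],
    [15, 10, 75],
]
ner, avpr = ner_and_avpr(confusion)
print(round(ner, 3), round(avpr, 3))
```

Because both indices average over classes rather than over compounds, a small class such as the substrates counts as much as the larger ones.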


Stroke, 2021, Vol 52 (Suppl_1)
Author(s): Lingling Ding, Zixiao Li, Yongjun Wang

Objective: We aimed to develop and validate a machine learning-based prediction model that could assess the risk of stroke-associated pneumonia (SAP) for individual patients with acute ischemic stroke (AIS). Methods: A machine-learning model incorporating A2DS2 scores and clinical features (AN-ADCS2) was developed to predict the risk of SAP in patients with AIS. Two independent datasets were used for model derivation and external validation. The area under the receiver operating characteristic curve (AUC), sensitivity, specificity, positive predictive value (PPV) and negative predictive value (NPV) were estimated. Further analysis evaluated thresholds from the training set that identified patients as low-risk, intermediate-risk and high-risk, and performance at these thresholds was compared in the external validation set. Results: The AN-ADCS2 model achieved favorable performance with a high AUC of 0.892 (95% confidence interval [CI] 0.885-0.898) in the test set and similar performance in the external validation set (AUC 0.813 [95% CI 0.812-0.814]). The AN-ADCS2 threshold identifying low-risk was 0.03, with an NPV of 97.6% (97.2-97.9%) and sensitivity of 93.5% (92.5-94.5%). The AN-ADCS2 threshold identifying high-risk was 0.65, with a PPV of 94.7% (93.9-95.6%) and specificity of 99.5% (99.5-99.6%). The AN-ADCS2 model performed better than the A2DS2 score (AUC 0.739, 95% CI [0.720-0.754]). Having a high risk of SAP classified by the AN-ADCS2 was associated with unfavorable outcomes of mortality and in-hospital stroke recurrence. Conclusions: Using machine learning, the AN-ADCS2 model provides an individualized risk prediction of SAP, which can be used as an indicator of clinical prognosis for patients with AIS.
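The threshold-based quantities reported above can be sketched as follows. The cut-offs 0.03 and 0.65 are taken from the abstract; everything else is an assumption made for illustration, including the function names, the toy probabilities, and the choice to treat a probability exactly at a cut-off as belonging to the higher group.

```python
def risk_group(p, low=0.03, high=0.65):
    """Assign a predicted probability to a risk stratum (boundary handling assumed)."""
    if p < low:
        return "low"
    if p >= high:
        return "high"
    return "intermediate"

def confusion_counts(probs, outcomes, threshold):
    """Count TP, FP, TN, FN when 'p >= threshold' is called positive."""
    tp = fp = tn = fn = 0
    for p, y in zip(probs, outcomes):
        if p >= threshold:
            if y == 1:
                tp += 1
            else:
                fp += 1
        else:
            if y == 1:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn

def diagnostic_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, PPV and NPV from the four counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }

# Hypothetical predicted SAP probabilities and observed outcomes (1 = SAP).
probs = [0.01, 0.02, 0.10, 0.40, 0.70, 0.90, 0.02, 0.80]
outcomes = [0, 0, 0, 1, 1, 1, 0, 0]

groups = [risk_group(p) for p in probs]
counts = confusion_counts(probs, outcomes, threshold=0.65)
metrics = diagnostic_metrics(*counts)
print(groups)
print(counts, metrics)
```

A low cut-off is judged mainly by NPV and sensitivity (few missed cases), and a high cut-off by PPV and specificity (few false alarms), which matches the pairs of figures the abstract reports for each threshold.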


Author(s): Aki Koivu, Mikko Sairanen

Modelling the risk of abnormal pregnancy-related outcomes such as stillbirth and preterm birth has been proposed in the past. Such models commonly utilize maternal demographic and medical history information as predictors, and they are based on conventional statistical modelling techniques. In this study, we utilize state-of-the-art machine learning methods in the task of predicting early stillbirth, late stillbirth and preterm birth pregnancies. The aim of this experimentation is to discover novel risk models that could be utilized in a clinical setting. A CDC data set of almost sixteen million observations was used to conduct feature selection, parameter optimization and verification of the proposed models. An additional NYC data set was used for external validation. Algorithms such as logistic regression, artificial neural network and gradient boosting decision tree were used to construct individual classifiers. Ensemble learning strategies of these classifiers were also experimented with. The best performing machine learning models achieved 0.76 AUC for early stillbirth, 0.63 for late stillbirth and 0.64 for preterm birth on the external NYC test data. The repeatable performance of our models demonstrates the robustness that is required in this context. Our proposed novel models provide a solid foundation for risk prediction and could be further improved with the addition of biochemical and/or biophysical markers.
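The AUC values quoted here have a simple interpretation: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one. A minimal sketch of that pairwise definition follows; the function name and the toy scores are illustrative, and real data sets of this size would need a rank-based implementation rather than this quadratic loop.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (ties count one half)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Toy example: three of the four positive/negative pairs are ordered correctly.
scores = [0.1, 0.4, 0.35, 0.8]
labels = [0, 0, 1, 1]
print(auc(scores, labels))  # 0.75
```

On this scale 0.5 corresponds to random ranking, so the 0.63 and 0.64 reported for late stillbirth and preterm birth indicate modest discrimination.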


Author(s): Renan Bandeira, Fernando Trinta, João Gomes, Marcio Maia, Alexandre Araripe

Professional sports are increasingly dependent on technological resources given the remarkable level of competitiveness faced by high-performance athletes. With such resources, it is possible to analyze matches, avoid mistakes that may be committed by the referee, or analyze the athletes’ performance. One of these sports is beach volleyball, one of the most popular sports in Brazil. In the past 12 years, the Brazilian volleyball teams have always been among the best teams in the world. The athletes’ performance during the jump movement is one of the main factors that a team needs to improve to be successful, because it is the movement performed most often during a volleyball match. There are some approaches that study the jump movement in order to calculate its height and provide evidence to improve it. Nevertheless, these solutions are expensive and are not viable for athletes with no sponsorship. With this in mind, this work presents VolleyJump, an application created to analyze beach volleyball athlete jumps using machine learning strategies to calculate the jump height and classify it as an attack or block jump. Results show that VolleyIoT makes it possible to analyze athletes’ jumps using mobile device sensors, helping them to focus on their training to improve their technique.


2018
Author(s): Maja Malkowska, Julian Zubek, Dariusz Plewczynski, Lucjan S Wyrwicz

Motivation: The identification of functional sequence variations in regulatory DNA regions is one of the major challenges of modern genetics. Here, we report results of a combined multifactor analysis of properties characterizing functional sequence variants located in promoter regions of genes. Results: We demonstrate that GC-content of the local sequence fragments and local DNA shape features play a significant role in prioritization of functional variants and outscore features related to histone modifications, transcription factor binding sites, or evolutionary conservation descriptors. Those observations allowed us to build a specialized machine learning classifier identifying functional SNPs within promoter regions – ShapeGTB. We compared our method with more general tools predicting pathogenicity of all non-coding variants. ShapeGTB outperformed them by a wide margin (AUC ROC 0.97 vs. 0.57-0.59). On the external validation set based on the ClinVar database it displayed only slightly worse performance (AUC ROC 0.92 vs. 0.74-0.81). Such results suggest unique characteristics of mutations located within promoter regions and are a promising signal for the development of more accurate variant prioritization tools in the future. Availability and implementation: The datasets and source code are publicly available at: https://github.com/zubekj/ShapeGTB.


2021, Vol 11 (1-s), pp. 86-93
Author(s): Hiba Hashim Mahgoub Mohamed, Amna Bint Wahab Elrashid Mohammed Hussien, Ahmed Elsadig Mohammed Saeed

A quantitative structure-activity relationship (QSAR) study was performed to develop a model on a series of 3,5-dimethylpyrazole derivatives containing a furan moiety, which exhibited considerable inhibitory activity against PDE4B. The obtained model has a correlation coefficient (r) of 0.934, a squared correlation coefficient (r2) of 0.872, and a leave-one-out (LOO) cross-validation coefficient (Q2) of 0.733. The predictive power of the developed model was confirmed by external validation, which gave an r2 value of 0.812. These parameters confirm the stability and robustness of the model for predicting the activity of a newly designed set of 3,5-dimethylpyrazole derivatives (I-XV). The results indicated that compounds III, V, XIII, and XV showed the strongest inhibition activity (IC50 = 0.2813, 0.5814, 0.6929, 0.6125 μM, respectively) against PDE4B compared to the reference rolipram (IC50 = 1.9 μM). Molecular docking was performed on the newly designed compounds with the PDE4B protein (3o0j). Docking results showed that compounds X and IX have high docking affinities of -36.2037 and -33.2888 kcal/mol, respectively. Keywords: QSAR, molecular docking, pyrazole derivatives, PDE4 inhibitors, anti-inflammatory.
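The difference between the fitted r2 and the leave-one-out Q2 can be made concrete with a one-descriptor linear model. This is a sketch under assumptions: the descriptor values and activities are invented, the model is a plain least-squares line rather than the paper's QSAR equation, and Q2 is computed as 1 - PRESS/SS_tot with SS_tot taken about the mean of all observations (one common convention).

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for one descriptor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def total_sum_of_squares(ys):
    mean_y = sum(ys) / len(ys)
    return sum((y - mean_y) ** 2 for y in ys)

def r_squared(xs, ys):
    """Goodness of fit on the same data used for fitting."""
    slope, intercept = fit_line(xs, ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    return 1.0 - ss_res / total_sum_of_squares(ys)

def q_squared_loo(xs, ys):
    """Leave-one-out Q2: each point is predicted by a model fitted without it."""
    press = 0.0
    for i in range(len(xs)):
        slope, intercept = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - (slope * xs[i] + intercept)) ** 2
    return 1.0 - press / total_sum_of_squares(ys)

# Hypothetical descriptor values and activities.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.1, 1.9, 3.2, 3.9, 5.1]
r2 = r_squared(xs, ys)
q2 = q_squared_loo(xs, ys)
print(round(r2, 4), round(q2, 4))
```

For a least-squares model Q2 computed this way cannot exceed r2, so a gap like the one reported above (0.872 versus 0.733) is expected; the size of the gap is what indicates how much the fit depends on individual compounds.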


2020
Author(s): Yongyue Wei, Jieyu He, Jiao Chen, Ying Zhu, Jiajin Chen, ...

Background: Novel coronavirus disease (COVID-19) is an emerging, rapidly evolving situation. At present, the prognosis of severe and critically ill patients has become an important focus of attention. We strived to develop a prognostic prediction model for severe and critically ill COVID-19 patients. Methods: To assess the factors associated with the prognosis of those patients, we retrospectively investigated the clinical and laboratory characteristics of 112 confirmed cases of COVID-19 admitted between 21 January and 6 March 2020 to Huangshi Central Hospital, Huangshi Hospital of Traditional Chinese Medicine, and Daye People’s Hospital. We applied a machine learning method (survival random forest) to select predictors for 28-day survival and took into account the dynamic trajectory of laboratory indicators. Results: Fifteen candidate prognostic features, including 11 baseline measures (platelet count (PLT), urea, creatine kinase (CK), fibrinogen, creatine kinase isoenzyme activity, aspartate aminotransferase (AST), activated partial thromboplastin time (APTT), albumin, standard deviation of erythrocyte distribution width (RBC-SD), neutrophils (%) and red blood cell count (RBC)) and 4 trajectory clusters (changes during hospitalization in the white blood cell (WBC) count, PLT large cell ratio (P-LCR), PLT distribution width (PDW) and AST), combined with covariates achieved 100% (95% CI: 99%-100%) AUC and reached 87% (95% CI: 84%-91%) AUC in an external validation set. Conclusions: Taking advantage of the random forest technique and dynamic laboratory measures, we developed a forest model to predict the survival outcome of COVID-19 patients, which achieved 87% AUC in the external validation set. Our online tool will help to facilitate the early recognition of patients at high risk.


2021, Vol 9
Author(s): Yang Wu, Haofei Hu, Jinlin Cai, Runtian Chen, Xin Zuo, ...

Purpose: We aimed to establish and validate a risk assessment system that combines demographic and clinical variables to predict the 3-year risk of incident diabetes in Chinese adults. Methods: A 3-year cohort study was performed on 15,928 Chinese adults without diabetes at baseline. All participants were randomly divided into a training set (n = 7,940) and a validation set (n = 7,988). XGBoost, a machine learning technique, was used to select the most important variables from the candidate variables. We further established a stepwise model based on the predictors chosen by the XGBoost model. The area under the receiver operating characteristic curve (AUC), decision curve and calibration analysis were used to assess discrimination, clinical use and calibration of the model, respectively. The external validation was performed on a cohort of 11,113 Japanese participants. Results: In the training and validation sets, 148 and 145 incident diabetes cases occurred, respectively. XGBoost selected the 10 most important variables from 15 candidate variables. Fasting plasma glucose (FPG), body mass index (BMI) and age were the top 3 important variables. We further established a stepwise model and a prediction nomogram. The AUCs of the stepwise model were 0.933 and 0.910 in the training and validation sets, respectively. The Hosmer-Lemeshow test showed a perfect fit between the predicted diabetes risk and the observed diabetes risk (p = 0.068 for the training set, p = 0.165 for the validation set). Decision curve analysis demonstrated the clinical use of the stepwise model over a wide range of threshold probabilities. There were almost no interactions between these predictors (most P-values for interaction >0.05). Furthermore, the AUC for the external validation set was 0.830, and the Hosmer-Lemeshow test for the external validation set showed no statistically significant difference between the predicted diabetes risk and observed diabetes risk (P = 0.824). Conclusion: We established and validated a risk assessment system for characterizing the 3-year risk of incident diabetes.
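The calibration check mentioned above compares observed and expected event counts within groups of similar predicted risk. The sketch below computes only the Hosmer-Lemeshow statistic, not its p-value, which would come from a chi-square distribution (conventionally with the number of groups minus 2 degrees of freedom). The grouping into equal-sized bins, the function name, and the toy data are assumptions for illustration.

```python
def hosmer_lemeshow_statistic(probs, outcomes, n_groups=10):
    """Sum over risk groups of (observed - expected)^2 / (n * p_bar * (1 - p_bar)).

    Observations are sorted by predicted probability and cut into n_groups
    bins of (nearly) equal size. Assumes no bin has a mean predicted
    probability of exactly 0 or 1.
    """
    pairs = sorted(zip(probs, outcomes))
    n = len(pairs)
    statistic = 0.0
    for g in range(n_groups):
        chunk = pairs[g * n // n_groups:(g + 1) * n // n_groups]
        if not chunk:
            continue  # more groups than observations
        size = len(chunk)
        expected = sum(p for p, _ in chunk)   # expected number of events
        observed = sum(y for _, y in chunk)   # observed number of events
        mean_p = expected / size
        statistic += (observed - expected) ** 2 / (size * mean_p * (1.0 - mean_p))
    return statistic

# Hypothetical predicted risks and outcomes (1 = incident diabetes), two groups.
probs = [0.8, 0.1, 0.6, 0.3]
outcomes = [1, 0, 1, 1]
hl = hosmer_lemeshow_statistic(probs, outcomes, n_groups=2)
print(round(hl, 4))
```

A small statistic, and hence a large p-value, means the test found no evidence of miscalibration; it does not by itself prove that calibration is good.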


2016, Vol 35 (1), pp. 53
Author(s): Qi Xu, Lingling Fan, Jie Xu

A quantitative structure-property relationship (QSPR) analysis of the Setschenow constants (Ksalt) of organic compounds in a sodium chloride solution was carried out using only two-dimensional (2D) descriptors as input parameters. The whole set of 101 compounds was split into a training set of 71 compounds and a validation set of 30 compounds by means of the Kennard-Stone algorithm. A general four-parameter equation, with a correlation coefficient (R) of 0.887 and a standard error of estimation (s) of 0.031, was obtained by stepwise multilinear regression analysis (MLRA) on the training set. The reliability and robustness of the present model were verified with leave-one-out cross-validation, randomization tests, and the external validation set. All of the descriptors contained in this model are calculated directly from the molecular 2D structures; thus, this model can be used to easily predict the Ksalt of other compounds not involved in the present dataset.
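The Kennard-Stone split used here selects training compounds so that they cover the descriptor space evenly: it starts from the two most distant compounds and then repeatedly adds the compound farthest from everything already selected. A minimal sketch follows, assuming Euclidean distance on raw descriptor vectors; the one-dimensional toy points and the function name are illustrative only.

```python
import math

def kennard_stone(points, n_select):
    """Return indices of n_select points chosen by the Kennard-Stone procedure.

    Assumes 2 <= n_select <= len(points). Ties are broken by lowest index.
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # Seed with the two most distant points.
    first, second = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda ij: dist[ij[0]][ij[1]],
    )
    selected = [first, second]
    remaining = [i for i in range(n) if i not in selected]
    while len(selected) < n_select:
        # Pick the candidate whose nearest selected neighbour is farthest away.
        nxt = max(remaining, key=lambda i: min(dist[i][s] for s in selected))
        selected.append(nxt)
        remaining.remove(nxt)
    return selected

# Hypothetical one-descriptor compounds.
points = [(0.0,), (1.0,), (2.0,), (10.0,), (4.0,)]
training = kennard_stone(points, 4)
validation = [i for i in range(len(points)) if i not in training]
print(training, validation)
```

Because the extremes of the descriptor space always land in the training set, the validation compounds tend to lie inside the region the model was fitted on, which should be kept in mind when reading external validation figures obtained with this split.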

