A Computational Toxicology Approach to Screen the Hepatotoxic Ingredients in Traditional Chinese Medicines: Polygonum multiflorum Thunb as a Case Study

In recent years, liver injury induced by Traditional Chinese Medicines (TCMs) has gained increasing attention worldwide. Assessing the hepatotoxicity of compounds in TCMs is essential for both doctors and regulatory agencies. However, no effective method for screening the hepatotoxic ingredients in TCMs has been available until now. In the present study, we first built a large-scale dataset of drug-induced liver injury (DILI). Then, 13 types of molecular fingerprints/descriptors and eight machine learning algorithms were used to develop single classifiers for DILI, resulting in 5416 single classifiers. Next, the NaiveBayes algorithm was adopted to integrate the best single classifier of each machine learning algorithm into a combined classifier. The accuracy, sensitivity, specificity, and area under the curve of the combined classifier were 72.798%, 0.732, 0.724, and 0.793, respectively. Compared with several prior studies, the combined classifier performed better in both cross-validation and external validation. In a prior study, we developed a herb-hepatotoxic ingredient network and a herb-induced liver injury (HILI) dataset based on pre-clinical evidence published in the scientific literature. Herein, by combining these resources with the combined classifier developed in this work, we propose the first computational toxicology approach to screen the hepatotoxic ingredients in TCMs. Polygonum multiflorum Thunb (PmT) was then used as a case study to investigate the reliability of the proposed approach. A total of 25 ingredients in PmT were identified as hepatotoxicants. The results were highly consistent with records in the literature, indicating that our computational toxicology approach is reliable and effective for screening the hepatotoxic ingredients in PmT.
The combined classifier developed in this work can be used to assess the hepatotoxic risk of both natural compounds and synthetic drugs. The computational toxicology approach presented here will assist in screening the hepatotoxic ingredients in TCMs and will lay the foundation for exploring the hepatotoxic mechanisms of TCMs. In addition, the method can be applied to research on other adverse effects of TCMs and synthetic drugs.
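The abstract states that the NaiveBayes algorithm integrated the best single classifier from each machine learning algorithm, but it does not say how the base outputs were encoded. A minimal sketch, assuming each base classifier emits a 0/1 label and that a Bernoulli naive Bayes model with Laplace smoothing is fitted on those labels (both are assumptions); `fit_nb_fusion` and `predict_nb_fusion` are hypothetical names. In practice the meta-classifier should be trained on out-of-fold base predictions so it does not learn from labels the base classifiers have already memorized.

```python
import math

def fit_nb_fusion(base_preds, labels, alpha=1.0):
    """Fit a Bernoulli naive Bayes meta-classifier on 0/1 base-classifier outputs.

    base_preds: one tuple per compound, holding one 0/1 prediction per base classifier
    labels:     true labels, 1 = hepatotoxic, 0 = non-hepatotoxic
    alpha:      Laplace smoothing constant (an assumption, not taken from the paper)
    """
    n_clf = len(base_preds[0])
    counts = {0: 0, 1: 0}
    # ones[c][j] = how many class-c compounds base classifier j labelled as 1
    ones = {0: [0] * n_clf, 1: [0] * n_clf}
    for preds, y in zip(base_preds, labels):
        counts[y] += 1
        for j, p in enumerate(preds):
            ones[y][j] += p
    total = counts[0] + counts[1]
    model = {}
    for c in (0, 1):
        log_prior = math.log((counts[c] + alpha) / (total + 2 * alpha))
        # theta[j] = smoothed P(base classifier j outputs 1 | class c)
        theta = [(ones[c][j] + alpha) / (counts[c] + 2 * alpha) for j in range(n_clf)]
        model[c] = (log_prior, theta)
    return model

def predict_nb_fusion(model, preds):
    """Return the class (0 or 1) with the higher log-posterior for one compound."""
    scores = {}
    for c, (log_prior, theta) in model.items():
        score = log_prior
        for p, t in zip(preds, theta):
            score += math.log(t if p == 1 else 1.0 - t)
        scores[c] = score
    return max(scores, key=scores.get)

# Toy example: three hypothetical base classifiers voting on six compounds
X = [(1, 1, 1), (1, 1, 0), (1, 0, 1), (0, 0, 0), (0, 1, 0), (0, 0, 1)]
y = [1, 1, 1, 0, 0, 0]
model = fit_nb_fusion(X, y)
print(predict_nb_fusion(model, (1, 1, 0)))  # → 1
```

Working in log space keeps the product of many per-classifier likelihoods from underflowing when more base classifiers are added.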

Advances in the Study of the Potential Hepatotoxic Components and Mechanism of Polygonum multiflorum

Evidence-based Complementary and Alternative Medicine, 2020, Vol 2020, pp. 1-12. DOI: 10.1155/2020/6489648

Author(s): He-Shui Yu, Lin-Lin Wang, Ying He, Li-Feng Han, Hui Ding, ...

Keyword(s): Liver Injury; Scientific Literature; Traditional Chinese Medicines; Polygonum Multiflorum; Chinese Medicines; Comprehensive Information; Liver And Kidney

The roots of Polygonum multiflorum (PM) (He Shou Wu in Chinese) are one of the most commonly used tonic traditional Chinese medicines (TCMs) in China. PM is traditionally valued for its antiaging, liver- and kidney-tonifying, and hair-blackening effects. However, an increasing number of cases of PM-induced hepatotoxicity have attracted the attention of scholars worldwide. To date, the compounds responsible for liver injury and the underlying mechanism remain uncertain. This review aims to provide comprehensive information on the potential hepatotoxic components and mechanism of PM based on the scientific literature. Perspectives for future investigations of hepatotoxic components are also discussed. This review lays a new foundation for further study of the hepatotoxic components and mechanism of PM.

Systematic literature review of machine learning methods used in the analysis of real-world data for patient-provider decision making

BMC Medical Informatics and Decision Making, 2021, Vol 21 (1). DOI: 10.1186/s12911-021-01403-2

Author(s): Alan Brnabic, Lisa M. Hess

Keyword(s): Machine Learning; Decision Making; Literature Review; Systematic Literature Review; Real World; Learning Algorithms; External Validation; Machine Learning Algorithms; Learning Methods; Machine Learning Methods

Background: Machine learning is a broad term encompassing a number of methods that allow the investigator to learn from the data. These methods may allow large real-world databases to be translated more rapidly into applications that inform patient-provider decision making. Methods: This systematic literature review was conducted to identify published observational research that employed machine learning to inform decision making at the patient-provider level. The search strategy was implemented, and studies meeting the eligibility criteria were evaluated by two independent reviewers. Relevant data on study design, statistical methods, and strengths and limitations were identified; study quality was assessed using a modified version of the Luo checklist. Results: A total of 34 publications from January 2014 to September 2020 were identified and evaluated for this review. Diverse methods, statistical packages, and approaches were used across the identified studies. The most common methods were decision tree and random forest approaches. Most studies applied internal validation, but only two conducted external validation. Most studies used a single algorithm; only eight applied multiple machine learning algorithms to the data. More than 50% of published studies failed to meet seven items on the Luo checklist. Conclusions: A wide variety of approaches, algorithms, statistical software, and validation strategies were employed in applying machine learning methods to inform patient-provider decision making. Multiple machine learning approaches should be used, the model selection strategy should be clearly defined, and both internal and external validation are necessary to ensure that decisions about patient care are made with the highest-quality evidence. Future work should routinely employ ensemble methods incorporating multiple machine learning algorithms.
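The review distinguishes internal from external validation. As a minimal illustration of internal validation, the sketch below generates k-fold cross-validation splits in plain Python; the function name `k_fold_indices` and the default k = 5 are assumptions. External validation would additionally require data from a different source that never enters any of these folds.

```python
import random

def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation.

    Each sample lands in exactly one test fold, and fold sizes differ by at most one.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    # The first n_samples % k folds receive one extra sample
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = idx[start:start + size]
        train = idx[:start] + idx[start + size:]
        yield train, test
        start += size

for train, test in k_fold_indices(10, k=3):
    print(len(train), len(test))
# prints 6 4, then 7 3, then 7 3
```

Averaging a model's score over the k test folds gives the internal estimate; repeating this for several algorithms is one simple way to compare them before building an ensemble.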

Computational Models Using Multiple Machine Learning Algorithms for Predicting Drug Hepatotoxicity with the DILIrank Dataset

2020. DOI: 10.20944/preprints202002.0178.v1

Author(s): Robert Ancuceanu, Marilena Viorica Hovanet, Adriana Iuliana Anghel, Florentina Furtunescu, Monica Neagu, ...

Keyword(s): Machine Learning; Liver Injury; Computational Models; Liver Toxicity; Learning Algorithms; Machine Learning Algorithms; Drug Induced; Reference Drug; Drug Induced Liver Injury

Drug-induced liver injury (DILI) remains one of the challenges in the safety profile of both authorized and candidate drugs, and predicting hepatotoxicity from the chemical structure of a substance remains a challenge worth pursuing, one that is also coherent with the current tendency to replace non-clinical tests with in vitro or in silico alternatives. In 2016, a group of researchers from the FDA published an improved annotated list of drugs with respect to their DILI risk, constituting “the largest reference drug list ranked by the risk for developing drug-induced liver injury in humans” (DILIrank). This paper is one of the few attempting to predict liver toxicity using the DILIrank dataset. Molecular descriptors were computed with the Dragon 7.0 software, and a variety of feature selection and machine learning algorithms were implemented in the R computing environment. Nested (double) cross-validation was used to externally validate the models selected. A total of 78 models with reasonable performance were selected and stacked through several approaches, including the building of multiple meta-models. The performance of the stacked models was slightly superior to that of other published models. The models were applied in a virtual screening exercise on over 100,000 compounds from the ZINC database, and about 20% of them were predicted to be non-hepatotoxic.
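Nested (double) cross-validation separates model selection from performance estimation. A minimal sketch of the idea, assuming a toy one-descriptor k-nearest-neighbour classifier whose only hyperparameter is k (the model, fold counts, data and all function names are hypothetical; the study itself used Dragon descriptors in R): the inner loop chooses k using only the outer-training data, and the outer loop scores that choice on a fold that played no part in the selection.

```python
import random

def make_folds(indices, n_folds, seed):
    """Shuffle indices and deal them into n_folds round-robin folds."""
    idx = list(indices)
    random.Random(seed).shuffle(idx)
    return [idx[i::n_folds] for i in range(n_folds)]

def knn_predict(train_x, train_y, x, k):
    """Hypothetical model: majority vote of the k nearest training points (one descriptor)."""
    nearest = sorted(range(len(train_x)), key=lambda j: abs(train_x[j] - x))[:k]
    votes = sum(train_y[j] for j in nearest)
    return 1 if 2 * votes > len(nearest) else 0

def accuracy(fit_idx, eval_idx, xs, ys, k):
    """Fit on fit_idx, then return the fraction of eval_idx predicted correctly."""
    tx = [xs[j] for j in fit_idx]
    ty = [ys[j] for j in fit_idx]
    hits = sum(knn_predict(tx, ty, xs[j], k) == ys[j] for j in eval_idx)
    return hits / len(eval_idx)

def nested_cv(xs, ys, k_grid=(1, 3, 5), n_outer=5, n_inner=3, seed=0):
    """Outer loop estimates performance; inner loop picks k using outer-training data only."""
    outer_scores, chosen = [], []
    for o, test in enumerate(make_folds(range(len(xs)), n_outer, seed)):
        test_set = set(test)
        train = [j for j in range(len(xs)) if j not in test_set]
        inner = make_folds(train, n_inner, seed + o + 1)

        def inner_score(k):
            total = 0.0
            for v, val in enumerate(inner):
                fit = [j for f, fold in enumerate(inner) if f != v for j in fold]
                total += accuracy(fit, val, xs, ys, k)
            return total / n_inner

        best_k = max(k_grid, key=inner_score)
        chosen.append(best_k)
        # The outer test fold was never seen during model selection
        outer_scores.append(accuracy(train, test, xs, ys, best_k))
    return sum(outer_scores) / n_outer, chosen
```

The mean of the outer scores estimates how the whole selection procedure, not one fixed model, would perform on new compounds.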

Machine Learning Algorithms to Predict Recurrence within 10 Years after Breast Cancer Surgery: A Prospective Cohort Study

Cancers, 2020, Vol 12 (12), pp. 3817. DOI: 10.3390/cancers12123817

Author(s): Shi-Jer Lou, Ming-Feng Hou, Hong-Tai Chang, Chong-Chi Chiu, Hao-Hsien Lee, ...

Keyword(s): Breast Cancer; Machine Learning; Learning Algorithms; External Validation; Model Development; Cancer Surgery; Machine Learning Algorithms; Breast Cancer Surgery; Training Dataset

No studies have examined the use of machine learning algorithms to predict recurrence within 10 years after breast cancer surgery. This study aimed to compare the accuracy of forecasting models for predicting recurrence within 10 years after breast cancer surgery and to identify significant predictors of recurrence. Registry data for breast cancer surgery patients were allocated to a training dataset (n = 798) for model development, a testing dataset (n = 171) for internal validation, and a validating dataset (n = 171) for external validation. Global sensitivity analysis was then performed to evaluate the significance of the selected predictors. Demographic characteristics, clinical characteristics, quality of care, and preoperative quality of life were significantly associated with recurrence within 10 years after breast cancer surgery (p < 0.05). Artificial neural networks had the highest prediction performance indices. Additionally, surgeon volume was the best predictor of recurrence within 10 years after breast cancer surgery, followed by hospital volume and tumor stage. Accurate prediction of recurrence within 10 years by machine learning algorithms may improve precision in managing patients after breast cancer surgery and improve understanding of the risk factors for recurrence.
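The abstract reports the sizes of the three datasets but not how patients were allocated to them. A minimal sketch, assuming a simple random allocation (the study may have used another scheme, such as stratification or a temporal split); `three_way_split` is a hypothetical helper.

```python
import random

def three_way_split(n, n_train, n_test, seed=0):
    """Randomly partition indices 0..n-1 into training, testing and validating sets."""
    if n_train + n_test > n:
        raise ValueError("n_train + n_test must not exceed n")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return idx[:n_train], idx[n_train:n_train + n_test], idx[n_train + n_test:]

# Sizes taken from the abstract: 798 + 171 + 171 = 1140 patients
train, test, valid = three_way_split(1140, 798, 171)
print(len(train), len(test), len(valid))  # → 798 171 171
```

Fixing the seed makes the partition reproducible, which matters when the validating set must stay untouched until the final model is chosen.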

Predictive model for acute respiratory distress syndrome events in ICU patients in China using machine learning algorithms: a secondary analysis of a cohort study

Journal of Translational Medicine, 2019, Vol 17 (1). DOI: 10.1186/s12967-019-2075-0. Cited by ~4

Author(s): Xian-Fei Ding, Jin-Bo Li, Huo-Yan Liang, Zong-Yu Wang, Ting-Ting Jiao, ...

Keyword(s): Machine Learning; Acute Respiratory Distress Syndrome; Cohort Study; Respiratory Distress Syndrome; Respiratory Distress; Distress Syndrome; External Validation; Secondary Analysis; Machine Learning Algorithms; Chinese Patients

Background: This study aimed to develop a machine learning model for predicting acute respiratory distress syndrome (ARDS) events using commonly available parameters, including baseline characteristics and clinical and laboratory parameters. Methods: A secondary analysis was conducted of a multi-centre prospective observational cohort study performed at five hospitals in Beijing, China, from January 1, 2011, to August 31, 2014. A total of 296 patients at risk of developing ARDS who were admitted to medical intensive care units (ICUs) were included. We applied a random forest approach to identify the best set of predictors out of 42 variables measured on day 1 of admission. Results: All patients were randomly divided into training (80%) and testing (20%) sets. Additionally, these patients were followed daily and assessed according to the Berlin definition. The model obtained an average area under the receiver operating characteristic (ROC) curve (AUC) of 0.82 and a predictive accuracy of 83%. For the first time, four new biomarkers were included in the model: decreased minimum haematocrit, glucose, and sodium, and increased minimum white blood cell (WBC) count. Conclusions: This newly established machine learning-based model shows good predictive ability in Chinese patients with ARDS. External validation studies are necessary to confirm the generalisability of the approach across populations and treatment practices.
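The abstract reports an AUC and an accuracy for the model on the held-out test set. A minimal sketch of how these two metrics can be computed from predicted probabilities, assuming a 0.5 decision threshold for accuracy (the threshold used in the study is not stated); the random forest itself is not reproduced, and the scores below are made up for illustration.

```python
def roc_auc(scores, labels):
    """AUC as the probability that a random positive case outscores a random negative one.

    Ties count as one half. This pairwise form is O(n_pos * n_neg), fine for small cohorts.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def accuracy(scores, labels, threshold=0.5):
    """Fraction of cases whose thresholded score matches the label."""
    return sum((s >= threshold) == (y == 1) for s, y in zip(scores, labels)) / len(labels)

# Hypothetical predicted ARDS probabilities for six patients in a test set
scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 0]
print(round(roc_auc(scores, labels), 3))   # → 0.889
print(round(accuracy(scores, labels), 3))  # → 0.667
```

AUC is threshold-free, whereas accuracy depends on the chosen cut-off, which is why the two numbers can rank models differently.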

Computational Models Using Multiple Machine Learning Algorithms for Predicting Drug Hepatotoxicity with the DILIrank Dataset

International Journal of Molecular Sciences, 2020, Vol 21 (6), pp. 2114. DOI: 10.3390/ijms21062114

Author(s): Robert Ancuceanu, Marilena Viorica Hovanet, Adriana Iuliana Anghel, Florentina Furtunescu, Monica Neagu, ...

Keyword(s): Machine Learning; Liver Injury; Computational Models; Liver Toxicity; Learning Algorithms; Machine Learning Algorithms; Drug Induced; Reference Drug; Drug Induced Liver Injury

Drug-induced liver injury (DILI) remains one of the challenges in the safety profile of both authorized and candidate drugs, and predicting hepatotoxicity from the chemical structure of a substance remains a task worth pursuing. Such an approach is coherent with the current trend toward replacing non-clinical tests with in vitro or in silico alternatives. In 2016, a group of researchers from the FDA published an improved annotated list of drugs with respect to their DILI risk, constituting “the largest reference drug list ranked by the risk for developing drug-induced liver injury in humans” (DILIrank). This paper is one of the few attempting to predict liver toxicity using the DILIrank dataset. Molecular descriptors were computed with the Dragon 7.0 software, and a variety of feature selection and machine learning algorithms were implemented in the R computing environment. Nested (double) cross-validation was used to externally validate the models selected. A total of 78 models with reasonable performance were selected and stacked through several approaches, including the building of multiple meta-models. The performance of the stacked models was slightly superior to that of other published models. The models were applied in a virtual screening exercise on over 100,000 compounds from the ZINC database, and about 20% of them were predicted to be non-hepatotoxic.
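The abstract says the 78 selected models were stacked through several approaches, including multiple meta-models, without naming them. One common meta-model is logistic regression over the base models' predicted probabilities. The sketch below fits one by batch gradient descent in plain Python; the learning rate, epoch count, toy data and function names are assumptions. In a real stacking setup the base probabilities should be out-of-fold predictions, not predictions on the data the base models were trained on.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_meta_logistic(base_probs, labels, lr=0.5, epochs=2000):
    """Fit logistic-regression weights on base-model probabilities by batch gradient descent.

    base_probs: one list per compound, holding one predicted probability per base model
    labels:     0/1 outcomes (1 = hepatotoxic)
    """
    n_feat = len(base_probs[0])
    n = len(labels)
    w, b = [0.0] * n_feat, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_feat, 0.0
        for x, y in zip(base_probs, labels):
            # Gradient of the mean log-loss: (prediction - label) * feature
            err = sigmoid(b + sum(wi * xi for wi, xi in zip(w, x))) - y
            for j in range(n_feat):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict_meta(w, b, x):
    """Stacked probability for one compound."""
    return sigmoid(b + sum(wi * xi for wi, xi in zip(w, x)))

# Toy data: base model 1 is informative, base model 2 is close to noise
X = [(0.9, 0.5), (0.8, 0.4), (0.7, 0.6), (0.2, 0.5), (0.3, 0.6), (0.1, 0.4)]
y = [1, 1, 1, 0, 0, 0]
w, b = fit_meta_logistic(X, y)
print(predict_meta(w, b, (0.85, 0.5)) > 0.5)  # → True
```

The learned weights show how much the meta-model trusts each base model, which is one advantage of a linear meta-model over a plain average.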

Drug-likeness Analysis of Traditional Chinese Medicines: Prediction of Drug-likeness Using Machine Learning Approaches

Molecular Pharmaceutics, 2012, Vol 9 (10), pp. 2875-2886. DOI: 10.1021/mp300198d. Cited by ~62

Author(s): Sheng Tian, Junmei Wang, Youyong Li, Xiaojie Xu, Tingjun Hou

Keyword(s): Machine Learning; Traditional Chinese Medicines; Learning Approaches; Chinese Medicines

REDIAL-2020: A Suite of Machine Learning Models to Estimate Anti-SARS-CoV-2 Activities

2020. DOI: 10.26434/chemrxiv.12915779.v2

Author(s): Govinda KC, Giovanni Bocci, Srijan Verma, Mahmudulla Hassan, Jayme Holmes, ...

Keyword(s): Machine Learning; High Throughput Screening; Web Application; External Validation; Model Development; Machine Learning Algorithms; Virus Infectivity; Learning Models; Live Virus; Machine Learning Models

Strategies for drug discovery and repositioning are urgently needed for COVID-19. We developed "REDIAL-2020", a suite of machine learning models for estimating small-molecule activity from molecular structure across a range of SARS-CoV-2-related assays. Each classifier is based on three distinct types of descriptors (fingerprint, physicochemical, and pharmacophore) for parallel model development. These models were trained with multiple categorical machine learning algorithms using high-throughput screening data from the NCATS COVID19 portal (https://opendata.ncats.nih.gov/covid19/index.html). The “best models” are combined in an ensemble consensus predictor that outperforms single models where external validation is available. This suite of machine learning models is available through the DrugCentral web portal (http://drugcentral.org/Redial). Acceptable input formats are drug name, PubChem CID, or SMILES; the output is an estimate of anti-SARS-CoV-2 activities. The web application reports estimated activity across three areas (viral entry, viral replication, and live virus infectivity) spanning six independent models, followed by a similarity search that displays the molecules in the experimentally determined data that are most similar to the query. The ML models have 60% to 74% external predictivity, based on three separate datasets. Complementing the NCATS COVID19 portal, REDIAL-2020 can serve as a rapid online tool for identifying active molecules for COVID-19 treatment. The source code and specific models are available through GitHub (https://github.com/sirimullalab/redial-2020) or via Docker Hub (https://hub.docker.com/r/sirimullalab/redial-2020) for users preferring a containerized version.
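The web application ends with a similarity search over experimentally tested molecules. The abstract does not name the similarity metric; Tanimoto similarity on binary fingerprints is a common choice and is assumed here. In this minimal sketch fingerprints are represented as sets of on-bit indices, and the compound names and bits are hypothetical. Real fingerprints would come from a cheminformatics toolkit, which is not shown.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity of two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)

def most_similar(query, library, top_n=3):
    """Return the top_n (name, similarity) pairs from library, most similar first."""
    ranked = sorted(
        ((name, tanimoto(query, fp)) for name, fp in library.items()),
        key=lambda item: item[1],
        reverse=True,
    )
    return ranked[:top_n]

# Hypothetical fingerprints for three tested compounds
library = {"cmpd_A": {1, 2, 3, 4}, "cmpd_B": {1, 2, 9}, "cmpd_C": {7, 8}}
print(most_similar({1, 2, 3}, library, top_n=2))
# → [('cmpd_A', 0.75), ('cmpd_B', 0.5)]
```

Showing the nearest experimentally tested neighbours lets a user judge whether a query lies inside the chemical space the models were trained on.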

Predicting postoperative surgical site infection with administrative data: a random forests algorithm

BMC Medical Research Methodology, 2021, Vol 21 (1). DOI: 10.1186/s12874-021-01369-9

Author(s): Yelena Petrosyan, Kednapa Thavorn, Glenys Smith, Malcolm Maclure, Roanne Preston, ...

Keyword(s): Machine Learning; Administrative Data; Risk Score; Random Forests; Learning Algorithms; External Validation; Machine Learning Algorithms; Improvement Program; Health Administrative Data; Administrative Datasets

Background: Because primary data collection can be time-consuming and expensive, surgical site infections (SSIs) could ideally be monitored using routinely collected administrative data. We derived and internally validated efficient algorithms that use machine learning to identify SSIs within 30 days after surgery from health administrative data. Methods: All patients enrolled in the National Surgical Quality Improvement Program at the Ottawa Hospital were linked to administrative datasets in Ontario, Canada. Machine learning approaches, including a random forests algorithm and high-performance logistic regression, were used to derive parsimonious models to predict SSI status. Finally, a risk score methodology was used to transform the final models into a risk score system. The SSI risk models were validated in the validation datasets. Results: Of 14,351 patients, 795 (5.5%) had an SSI. First, separate predictive models were built for three distinct administrative datasets. The final model, including hospitalization diagnostic, physician diagnostic, and procedure codes, demonstrated excellent discrimination (C statistic 0.91, 95% CI 0.90–0.92) and calibration (Hosmer-Lemeshow χ² statistic 4.531, p = 0.402). Conclusion: We demonstrated that health administrative data can be effectively used to identify SSIs. Machine learning algorithms showed a high degree of accuracy in predicting postoperative SSIs and can integrate and use large amounts of administrative data. External validation of this model is required before it can be routinely used to identify SSIs.
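Calibration is reported with the Hosmer-Lemeshow χ² statistic. A minimal sketch of that statistic in plain Python, assuming the common convention of sorting subjects by predicted risk and splitting them into equal-size groups (the number of groups used in the study is not stated; ten is a frequent default). The function name and toy data are hypothetical. The p-value would come from a χ² distribution with groups - 2 degrees of freedom, for example via `scipy.stats.chi2.sf`, which is not used here.

```python
def hosmer_lemeshow(probs, outcomes, groups=10):
    """Hosmer-Lemeshow goodness-of-fit statistic.

    Sorts subjects by predicted risk, splits them into `groups` near-equal groups, and sums
    (O - E)^2 / (E * (1 - E / n_g)) over groups, where O is the observed number of events,
    E the sum of predicted probabilities and n_g the group size.
    Returns (statistic, degrees_of_freedom) with df = groups - 2.
    """
    pairs = sorted(zip(probs, outcomes))
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        n_g = len(chunk)
        observed = sum(y for _, y in chunk)
        expected = sum(p for p, _ in chunk)
        denom = expected * (1.0 - expected / n_g)
        # Skip degenerate groups where every predicted risk is 0 or 1
        if denom > 0:
            stat += (observed - expected) ** 2 / denom
    return stat, groups - 2

# Hypothetical predicted SSI risks and observed outcomes for six patients
probs = [0.1, 0.1, 0.5, 0.5, 0.9, 0.9]
outcomes = [0, 1, 0, 1, 1, 1]
stat, df = hosmer_lemeshow(probs, outcomes, groups=3)
print(round(stat, 3), df)  # → 3.778 1
```

A small statistic relative to its χ² reference distribution (a large p-value) indicates that observed event counts track the predicted risks across groups.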
