PHOTONAI—A Python API for rapid machine learning model development

PHOTONAI is a high-level Python API designed to simplify and accelerate machine learning model development. It functions as a unifying framework that lets the user easily access and combine algorithms from different toolboxes into custom algorithm sequences. It is especially designed to support the iterative model development process and automates the repetitive training, hyperparameter optimization and evaluation tasks. Importantly, the workflow ensures unbiased performance estimates while still allowing the user to fully customize the machine learning analysis. PHOTONAI extends existing solutions with a novel pipeline implementation supporting more complex data streams, feature combinations, and algorithm selection. Metrics and results can be conveniently visualized using the PHOTONAI Explorer, and predictive models are shareable in a standardized format for further external validation or application. A growing add-on ecosystem allows researchers to offer data-modality-specific algorithms to the community and enhance machine learning in the life sciences. Its practical utility is demonstrated on an exemplary medical machine learning problem, achieving a state-of-the-art solution in a few lines of code. Source code is publicly available on GitHub, while examples and documentation can be found at www.photon-ai.com.
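One common way to obtain an unbiased performance estimate while still tuning hyperparameters is nested cross-validation: hyperparameters are selected in an inner loop, and the tuned model is scored on outer folds that played no part in the tuning. The sketch below illustrates that general idea with scikit-learn, not with PHOTONAI's own API, which is not reproduced here. The synthetic dataset, the scaler and SVC steps, and the parameter grid are all assumptions chosen for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data (assumption: any tabular classification task).
X, y = make_classification(n_samples=120, n_features=10, random_state=0)

# A custom algorithm sequence: scaling followed by a classifier.
pipeline = Pipeline([("scale", StandardScaler()), ("clf", SVC())])

# Inner loop: pick hyperparameters using only the training part of each outer fold.
inner_cv = KFold(n_splits=3, shuffle=True, random_state=0)
search = GridSearchCV(pipeline, {"clf__C": [0.1, 1.0, 10.0]}, cv=inner_cv)

# Outer loop: score the tuned pipeline on folds never used for tuning.
outer_cv = KFold(n_splits=3, shuffle=True, random_state=1)
outer_scores = cross_val_score(search, X, y, cv=outer_cv)

print("outer-fold accuracy:", outer_scores.round(3))
```

Because the scaler sits inside the pipeline, it is refit on each training split, so no information from held-out folds leaks into preprocessing.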

Differentiation of Recurrent Glioblastoma from Radiation Necrosis Using Diffusion Radiomics: Machine Learning Model Development and External Validation

Machine Learning in Clinical Neuroimaging and Radiogenomics in Neuro-oncology - Lecture Notes in Computer Science, DOI 10.1007/978-3-030-66843-3_27, 2020, pp. 276-283

Author(s): Yae Won Park, Ji Eun Park, Sung Soo Ahn, Hwiyoung Kim, Ho Sung Kim, ...

Keyword(s): Machine Learning, Radiation Necrosis, External Validation, Model Development, Learning Model, Recurrent Glioblastoma, Machine Learning Model

Development of a machine learning model for predicting pediatric mortality in the early stages of intensive care unit admission

Scientific Reports, DOI 10.1038/s41598-020-80474-z, 2021, Vol 11 (1)

Author(s): Bongjin Lee, Kyunghoon Kim, Hyejin Hwang, You Sun Kim, Eun Hee Chung, ...

Keyword(s): Machine Learning, Intensive Care Unit, Intensive Care, Validation Cohort, External Validation, Learning Model, Derivation Cohort, ICU Admission, Early Stages, Machine Learning Model

Abstract: The aim of this study was to develop a predictive model of pediatric mortality in the early stages of intensive care unit (ICU) admission using machine learning. Patients less than 18 years old who were admitted to ICUs at four tertiary referral hospitals were enrolled. Three hospitals were designated as the derivation cohort for machine learning model development and internal validation, and the other hospital was designated as the validation cohort for external validation. We developed a random forest (RF) model that predicts pediatric mortality within 72 h of ICU admission, evaluated its performance, and compared it with the Pediatric Index of Mortality 3 (PIM 3). The area under the receiver operating characteristic curve (AUROC) of the RF model was 0.942 (95% confidence interval [CI] = 0.912–0.972) in the derivation cohort and 0.906 (95% CI = 0.900–0.912) in the validation cohort. In contrast, the AUROC of PIM 3 was 0.892 (95% CI = 0.878–0.906) in the derivation cohort and 0.845 (95% CI = 0.817–0.873) in the validation cohort. The RF model showed better predictive performance than PIM 3 in both internal and external validation.
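The AUROC values reported above can be read as the probability that a randomly chosen patient who died receives a higher predicted risk than a randomly chosen survivor. The following is a minimal sketch of that pairwise definition in plain Python. The labels and scores are hypothetical toy values, not data from the study, and the confidence intervals reported in the abstract are not reproduced.

```python
def auroc(labels, scores):
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical predicted mortality risks; 1 = died, 0 = survived.
labels = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1]
toy_auroc = auroc(labels, scores)
print(f"AUROC = {toy_auroc:.3f}")
```

The pairwise loop is quadratic in the number of cases, which is fine for illustration; rank-based formulas give the same value more efficiently on large cohorts.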

Development and external validation of a deep learning-based computed tomography classification system for COVID-19

DOI 10.31219/osf.io/j6xhb, 2021

Author(s): Yuki KATAOKA

Keyword(s): Machine Learning, Computed Tomography, Interstitial Pneumonia, External Validation, Usual Interstitial Pneumonia, Evaluation Process, High Sensitivity, Learning Model, Machine Learning Model, Ablation Study

Rationale: Currently available machine learning models for diagnosing COVID-19 based on computed tomography (CT) images are limited due to concerns regarding methodological flaws or underlying biases in the evaluation process. Objectives: We aimed to develop and externally validate a novel machine learning model that can classify CT image findings as positive or negative for SARS-CoV-2 reverse transcription polymerase chain reaction (RT-PCR). Methods: We used 3128 images from a wide variety of two-gate data sources for the development and ablation study of the machine learning model. A total of 633 COVID-19 cases and 2295 non-COVID-19 cases were included in the study. We randomly divided cases into a development set and an ablation set at a ratio of 8:2. For the ablation study, we used another dataset including 150 cases of interstitial pneumonia among non-COVID-19 images. For external validation, we used 893 images from 740 consecutive patients at 11 acute care hospitals suspected of having COVID-19 at the time of diagnosis. The dataset included 343 COVID-19 patients. The reference standard was RT-PCR. Results: In the ablation study using interstitial pneumonia images, the specificity of the model was 0.986 for the usual interstitial pneumonia pattern, 0.820 for the non-specific interstitial pneumonia pattern, and 0.400 for the organizing pneumonia pattern. In the external validation study, the sensitivity and specificity of the model were 0.869 and 0.432, respectively, at the low-level cutoff, and 0.724 and 0.721, respectively, at the high-level cutoff. Conclusions: Our machine learning model exhibited high sensitivity in external validation datasets and may assist physicians in ruling out a COVID-19 diagnosis in a timely manner. Further studies are warranted to improve model specificity.
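The trade-off between the low-level and high-level cutoffs reported above follows directly from how sensitivity and specificity are defined. The sketch below computes both at two cutoffs in plain Python. The scores, labels, and cutoff values (0.35 and 0.60) are hypothetical and are not taken from the study; the convention that a score at or above the cutoff counts as positive is also an assumption.

```python
def sensitivity_specificity(labels, scores, cutoff):
    """Treat score >= cutoff as a positive call and return (sensitivity, specificity)."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= cutoff)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < cutoff)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < cutoff)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical model outputs; 1 = RT-PCR positive, 0 = RT-PCR negative.
labels = [1, 1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.95, 0.80, 0.55, 0.30, 0.70, 0.45, 0.40, 0.20, 0.10]

low = sensitivity_specificity(labels, scores, cutoff=0.35)
high = sensitivity_specificity(labels, scores, cutoff=0.60)
print("low cutoff  (sens, spec):", low)
print("high cutoff (sens, spec):", high)
```

In this toy data, lowering the cutoff catches more true positives at the cost of more false positives, which is the pattern a rule-out test relies on.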

Machine Learning Model for Outcome Prediction of Patients Suffering from Acute Diverticulitis Arriving at the Emergency Department—A Proof of Concept Study

Diagnostics, DOI 10.3390/diagnostics11112102, 2021, Vol 11 (11), pp. 2102

Author(s): Eyal Klang, Robert Freeman, Matthew A. Levin, Shelly Soffer, Yiftach Barash, ...

Keyword(s): Machine Learning, Emergency Department, Acute Diverticulitis, External Validation, White Blood Cells, Learning Model, Complicated Diverticulitis, Gradient Boosting, Proof Of Concept, Machine Learning Model

Background & Aims: We aimed to identify specific emergency department (ED) risk factors for developing complicated acute diverticulitis (AD) and to evaluate a machine learning (ML) model for predicting complicated AD. Methods: We analyzed data retrieved from unselected consecutive large bowel AD patients from five hospitals in the Mount Sinai health system, NY. The study time frame was from January 2011 through March 2021. Data were used to train and evaluate a gradient-boosting machine learning model to identify patients with complicated diverticulitis, defined as a need for invasive intervention or in-hospital mortality. The model was trained and evaluated on data from four hospitals and externally validated on held-out data from the fifth hospital. Results: The final cohort included 4997 AD visits. Of them, 129 (2.9%) visits had complicated diverticulitis. Patients with complicated diverticulitis were more likely to be men, black, and to arrive by ambulance. Regarding laboratory values, patients with complicated diverticulitis had higher levels of absolute neutrophils (AUC 0.73), white blood cells (AUC 0.70), platelet count (AUC 0.68) and lactate (AUC 0.61), and lower levels of albumin (AUC 0.69), chloride (AUC 0.64), and sodium (AUC 0.61). In the external validation cohort, the ML model showed AUC 0.85 (95% CI 0.78–0.91) for predicting complicated diverticulitis. For Youden’s index, the model showed a sensitivity of 88% with a false positive rate of 1:3.6. Conclusions: An ML model trained on clinical measures provides proof-of-concept performance in predicting complications in patients presenting to the ED with AD. Clinically, it implies that an ML model may classify low-risk patients to be discharged from the ED for further treatment in an ambulatory setting.
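Youden's index, used above to pick the operating point, is J = sensitivity + specificity - 1, and the chosen cutoff is the one that maximizes J. The sketch below shows that selection in plain Python. The labels and scores are hypothetical toy values rather than study data, and scanning only the observed score values as candidate cutoffs is an assumption made for simplicity.

```python
def youden_cutoff(labels, scores):
    """Return (cutoff, J) maximizing J = sensitivity + specificity - 1."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best_cutoff, best_j = None, -1.0
    # Candidate cutoffs are the observed scores; score >= cutoff is a positive call.
    for cutoff in sorted(set(scores)):
        tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= cutoff)
        tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < cutoff)
        j = tp / n_pos + tn / n_neg - 1.0
        if j > best_j:
            best_cutoff, best_j = cutoff, j
    return best_cutoff, best_j


# Hypothetical risk scores; 1 = complicated diverticulitis, 0 = uncomplicated.
labels = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.5, 0.7, 0.4, 0.3, 0.2, 0.1]
cutoff, j = youden_cutoff(labels, scores)
print(f"best cutoff = {cutoff}, J = {j:.2f}")
```

Youden's index weights sensitivity and specificity equally; a clinical rule-out tool may instead prefer a cutoff that favors sensitivity.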

Text and Data Formatting for Machine Learning

International Journal of Innovative Technology and Exploring Engineering - Special Issue, DOI 10.35940/ijitee.a5216.119119, 2019, Vol 9 (1), pp. 2756-2760

Keyword(s): Machine Learning, Big Data, Data Analysis, Data Preprocessing, Learning Model, Poor Quality, Big Data Analysis, Model Quality, Machine Learning Model, Learning Analysis

Machine learning is a prominent tool for extracting information from large amounts of data. Whereas a good deal of machine learning research has targeted the accuracy and efficiency of training and inference algorithms, less attention has been paid to the equally vital issue of monitoring the quality of the data fed into the machine learning model. The quality of big data is far from perfect. Recent studies have shown that poor data quality can introduce serious errors into the results of big data analysis and can prevent precise conclusions from being drawn from the data. The advantages of data preprocessing in the context of ML are earlier detection of errors, improved model quality through the use of better data, and savings in the engineering hours needed to debug issues.

Machine Learning Model Development for Screening Potential Entrepreneurs in the B40 (Bottom 40%) for Targeting Assistance

International Journal of Academic Research in Business and Social Sciences, DOI 10.6007/ijarbss/v11-i12/11896, 2021, Vol 11 (12)

Author(s): Sagaran Gopal, Sulochana Nair

Keyword(s): Machine Learning, Model Development, Learning Model, Machine Learning Model

Abstract P391: AN-ADCS2: A Novel Machine-Learning Model to Predict the Risk of Stroke-Associated Pneumonia

Stroke, DOI 10.1161/str.52.suppl_1.p391, 2021, Vol 52 (Suppl_1)

Author(s): Lingling Ding, Zixiao Li, Yongjun Wang

Keyword(s): Machine Learning, High Risk, Predictive Value, External Validation, Learning Model, Low Risk, Stroke Recurrence, Clinical Prognosis, Machine Learning Model, Validation Set

Objective: We aimed to develop and validate a machine learning-based prediction model that could assess the risk of stroke-associated pneumonia (SAP) for individual patients with acute ischemic stroke (AIS). Methods: A machine-learning model incorporating A2DS2 scores and clinical features (AN-ADCS2) was developed to predict the risk of SAP in patients with AIS. Two independent datasets were used for model derivation and external validation. The area under the receiver operating characteristic curve (AUC), sensitivity, specificity, positive predictive value (PPV) and negative predictive value (NPV) were estimated. A further analysis evaluated thresholds from the training set that identified patients as low-risk, intermediate-risk and high-risk, and performance at these thresholds was compared in the external validation set. Results: The AN-ADCS2 model achieved favorable performance with a high AUC of 0.892 (95% confidence interval [CI] 0.885-0.898) in the test set and similar performance in the external validation set (AUC 0.813 [95% CI 0.812-0.814]). The AN-ADCS2 threshold identifying low risk was 0.03, with an NPV of 97.6% (97.2-97.9%) and sensitivity of 93.5% (92.5-94.5%). The AN-ADCS2 threshold identifying high risk was 0.65, with a PPV of 94.7% (93.9-95.6%) and specificity of 99.5% (99.5-99.6%). The AN-ADCS2 model performed better than the A2DS2 score (AUC 0.739 [95% CI 0.720-0.754]). A high risk of SAP as classified by AN-ADCS2 was associated with unfavorable outcomes of mortality and in-hospital stroke recurrence. Conclusions: Using machine learning, the AN-ADCS2 model provides an individualized risk prediction of SAP, which can be used as an indicator of clinical prognosis for patients with AIS.
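The two-threshold risk stratification described above can be sketched as a simple mapping from predicted probability to a risk tier, followed by the predictive values for the extreme tiers. The thresholds 0.03 and 0.65 come from the abstract; everything else is an assumption for illustration, including the toy probabilities and outcomes, the function names, and the choice to treat a probability exactly at a threshold as belonging to the higher tier.

```python
LOW_RISK_THRESHOLD = 0.03   # reported in the abstract
HIGH_RISK_THRESHOLD = 0.65  # reported in the abstract


def risk_tier(probability):
    """Map a predicted probability to a tier (boundary handling is an assumption)."""
    if probability < LOW_RISK_THRESHOLD:
        return "low"
    if probability >= HIGH_RISK_THRESHOLD:
        return "high"
    return "intermediate"


def event_rate(outcomes):
    """Fraction of cases in which the event (1) occurred."""
    return sum(outcomes) / len(outcomes)


# Hypothetical predicted probabilities and observed outcomes (1 = pneumonia).
probabilities = [0.01, 0.02, 0.02, 0.20, 0.50, 0.70, 0.80, 0.90]
outcomes = [0, 0, 1, 0, 1, 1, 1, 0]

tiers = [risk_tier(p) for p in probabilities]
high = [o for o, t in zip(outcomes, tiers) if t == "high"]
low = [o for o, t in zip(outcomes, tiers) if t == "low"]

ppv_high = event_rate(high)       # share of high-tier patients with the event
npv_low = 1.0 - event_rate(low)   # share of low-tier patients without the event
print(tiers, round(ppv_high, 3), round(npv_low, 3))
```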

Machine Learning Model to Predict Automated Testing Adoption

International Journal of Software Innovation, DOI 10.4018/ijsi.293268, 2022, Vol 10 (1), pp. 0-0

Keyword(s): Machine Learning, Software Development, Software Testing, Model Development, Learning Model, Development Project, Automated Testing, Machine Learning Model, Manual Testing, Automation Testing

Software testing is the activity of verifying the behavior of the software under test. It has two approaches: manual testing and automation testing. Automation testing is an approach to software testing in which programming scripts are written to automate the testing process. Automated testing is suitable for some software development projects in the development phase, while others require manual testing. Suitability depends on factors such as the nature of the project requirements, the team working on the project, the technology on which the software is built, and the intended audience. In this paper we develop a machine learning model for predicting automated testing adoption. We used the chi-square test to find correlations among factors and the PART classifier for model development. The accuracy of our proposed model is 93.1624%.
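The chi-square test mentioned above checks whether a categorical factor is associated with the outcome by comparing observed counts with the counts expected under independence. The sketch below computes Pearson's chi-square statistic (without continuity correction) for a contingency table in plain Python. The 2x2 table is hypothetical, not data from the paper, and the factor shown (scripting experience) is an invented example.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    return statistic


# Hypothetical counts: rows = team has scripting experience (yes, no),
# columns = project adopted automated testing (yes, no).
table = [[30, 10],
         [10, 30]]
statistic = chi_square_statistic(table)

# A 2x2 table has 1 degree of freedom; 3.841 is the 5% critical value.
CRITICAL_VALUE_DF1_ALPHA_05 = 3.841
associated = statistic > CRITICAL_VALUE_DF1_ALPHA_05
print(f"chi-square = {statistic:.2f}, associated at 5% level: {associated}")
```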

External Validation of the Bone Metastases Ensemble Trees for Survival (BMETS) Machine Learning Model to Improve Estimation of Life Expectancy

International Journal of Radiation Oncology*Biology*Physics, DOI 10.1016/j.ijrobp.2020.07.2136, 2020, Vol 108 (3), pp. S35

Author(s): A. LaVigne, C.R. Elledge, J. Fiksel, J.L. Wright, T.R. McNutt, ...

Keyword(s): Machine Learning, Life Expectancy, Bone Metastases, External Validation, Learning Model, Machine Learning Model

Machine-learning model derived gene signature predictive of paclitaxel survival benefit in gastric cancer: results from the randomised phase III SAMIT trial

Gut, DOI 10.1136/gutjnl-2021-324060, 2021, pp. gutjnl-2021-324060

Author(s): Raghav Sundar, Nesaretnam Barr Kumarakulasinghe, Yiong Huak Chan, Kazuhiro Yoshida, Takaki Yoshikawa, ...

Keyword(s): Machine Learning, Gastric Cancer, Random Forest, Survival Benefit, Validation Cohort, External Validation, Gene Signature, Learning Model, Phase III, Machine Learning Model

Objective: To date, there are no predictive biomarkers to guide selection of patients with gastric cancer (GC) who benefit from paclitaxel. Stomach cancer Adjuvant Multi-Institutional group Trial (SAMIT) was a 2×2 factorial randomised phase III study in which patients with GC were randomised to Pac-S-1 (paclitaxel + S-1), Pac-UFT (paclitaxel + UFT), S-1 alone or UFT alone after curative surgery. Design: The primary objective of this study was to identify a gene signature that predicts survival benefit from paclitaxel chemotherapy in GC patients. SAMIT GC samples were profiled using a customised 476-gene NanoString panel. A random forest machine-learning model was applied to the NanoString profiles to develop a gene signature. An independent cohort of metastatic patients with GC treated with paclitaxel and ramucirumab (Pac-Ram) served as an external validation cohort. Results: From the SAMIT trial, 499 samples were analysed in this study. From the Pac-S-1 training cohort, the random forest model generated a 19-gene signature assigning patients to two groups: Pac-Sensitive and Pac-Resistant. In the Pac-UFT validation cohort, Pac-Sensitive patients exhibited a significant improvement in disease-free survival (DFS): 3-year DFS 66% vs 40% (HR 0.44, p=0.0029). There was no survival difference between Pac-Sensitive and Pac-Resistant in the UFT or S-1 alone arms (test of interaction p<0.001). In the external Pac-Ram validation cohort, the signature predicted benefit for Pac-Sensitive (median PFS 147 days vs 112 days, HR 0.48, p=0.022). Conclusion: Using machine-learning techniques on one of the largest GC trials (SAMIT), we identify a gene signature representing the first predictive biomarker for paclitaxel benefit. Trial registration number: UMIN Clinical Trials Registry: C000000082 (SAMIT); ClinicalTrials.gov identifier, 02628951 (South Korean trial).
