PHOTONAI—A Python API for rapid machine learning model development

PLoS ONE ◽  
2021 ◽  
Vol 16 (7) ◽  
pp. e0254062
Author(s):  
Ramona Leenings ◽  
Nils Ralf Winter ◽  
Lucas Plagwitz ◽  
Vincent Holstein ◽  
Jan Ernsting ◽  
...  

PHOTONAI is a high-level Python API designed to simplify and accelerate machine learning model development. It functions as a unifying framework that lets the user easily access and combine algorithms from different toolboxes into custom algorithm sequences. It is especially designed to support the iterative model development process and automates the repetitive training, hyperparameter optimization and evaluation tasks. Importantly, the workflow ensures unbiased performance estimates while still allowing the user to fully customize the machine learning analysis. PHOTONAI extends existing solutions with a novel pipeline implementation supporting more complex data streams, feature combinations, and algorithm selection. Metrics and results can be conveniently visualized using the PHOTONAI Explorer, and predictive models are shareable in a standardized format for further external validation or application. A growing add-on ecosystem allows researchers to offer data-modality-specific algorithms to the community and enhance machine learning in the life sciences. Its practical utility is demonstrated on an exemplary medical machine learning problem, achieving a state-of-the-art solution in a few lines of code. Source code is publicly available on GitHub, while examples and documentation can be found at www.photon-ai.com.
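The abstract does not say how unbiased performance estimates are obtained. One common approach is nested cross-validation, in which hyperparameters are chosen in an inner loop that never sees the outer test fold. The sketch below illustrates that idea in plain Python. It does not use the PHOTONAI API; the k-nearest-neighbour classifier, the function names, and the toy data are all assumptions made for illustration.

```python
def k_fold_indices(n, k):
    """Return k (train, test) index pairs using interleaved folds."""
    folds = [list(range(i, n, k)) for i in range(k)]
    return [([j for f in folds[:i] + folds[i + 1:] for j in f], folds[i])
            for i in range(k)]

def knn_predict(train_x, train_y, x, k):
    """Majority vote among the k nearest training points (1-D feature)."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
    votes = sum(train_y[i] for i in nearest)
    return 1 if 2 * votes > k else 0

def accuracy(train_idx, test_idx, xs, ys, k):
    """Accuracy on test_idx of a k-NN model that sees only train_idx."""
    tx = [xs[i] for i in train_idx]
    ty = [ys[i] for i in train_idx]
    hits = sum(knn_predict(tx, ty, xs[i], k) == ys[i] for i in test_idx)
    return hits / len(test_idx)

def nested_cv(xs, ys, candidate_ks, outer_k=4, inner_k=3):
    """Mean outer-fold accuracy; k is tuned on outer training data only."""
    outer_scores = []
    for outer_train, outer_test in k_fold_indices(len(xs), outer_k):
        def inner_score(k):
            # Inner loop: split the outer training set again to rate this k.
            splits = k_fold_indices(len(outer_train), inner_k)
            return sum(
                accuracy([outer_train[i] for i in tr],
                         [outer_train[i] for i in te], xs, ys, k)
                for tr, te in splits) / inner_k
        best_k = max(candidate_ks, key=inner_score)
        # The outer test fold is touched once, after k has been fixed.
        outer_scores.append(accuracy(outer_train, outer_test, xs, ys, best_k))
    return sum(outer_scores) / len(outer_scores)

# Hypothetical toy data: one feature, two noisy labels near the boundary.
xs = [i / 24 for i in range(24)]
ys = [1 if x > 0.5 else 0 for x in xs]
ys[10], ys[14] = 1, 0
score = nested_cv(xs, ys, candidate_ks=[1, 3, 5])
print(f"nested CV accuracy: {score:.3f}")
```

Because the outer test fold plays no part in choosing `k`, the averaged outer score is not inflated by the hyperparameter search.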

2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Bongjin Lee ◽  
Kyunghoon Kim ◽  
Hyejin Hwang ◽  
You Sun Kim ◽  
Eun Hee Chung ◽  
...  

Abstract: The aim of this study was to develop a predictive model of pediatric mortality in the early stages of intensive care unit (ICU) admission using machine learning. Patients less than 18 years old who were admitted to ICUs at four tertiary referral hospitals were enrolled. Three hospitals were designated as the derivation cohort for machine learning model development and internal validation, and the other hospital was designated as the validation cohort for external validation. We developed a random forest (RF) model that predicts pediatric mortality within 72 h of ICU admission, evaluated its performance, and compared it with the Pediatric Index of Mortality 3 (PIM 3). The area under the receiver operating characteristic curve (AUROC) of the RF model was 0.942 (95% confidence interval [CI] = 0.912–0.972) in the derivation cohort and 0.906 (95% CI = 0.900–0.912) in the validation cohort. In contrast, the AUROC of PIM 3 was 0.892 (95% CI = 0.878–0.906) in the derivation cohort and 0.845 (95% CI = 0.817–0.873) in the validation cohort. The RF model in our study showed improved predictive performance in terms of both internal and external validation and was superior even when compared to PIM 3.
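For readers unfamiliar with the metric, the AUROC can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The minimal sketch below computes it that way, by pairwise comparison. The labels and scores are invented toy values, not data from the study, and the function name is hypothetical.

```python
def auroc(labels, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical toy data: 1 = died within 72 h, 0 = survived.
labels = [0, 0, 1, 0, 1, 1]
scores = [0.10, 0.40, 0.35, 0.20, 0.80, 0.90]
value = auroc(labels, scores)
print(f"AUROC = {value:.3f}")  # 8 of 9 pairs are ranked correctly
```

The pairwise form is quadratic in the number of cases, so it suits small illustrations; rank-based formulas give the same value more efficiently on large cohorts.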


2021 ◽  
Author(s):  
Yuki KATAOKA

Rationale: Currently available machine learning models for diagnosing COVID-19 based on computed tomography (CT) images are limited due to concerns regarding methodological flaws or underlying biases in the evaluation process. Objectives: We aimed to develop and externally validate a novel machine learning model that can classify CT image findings as positive or negative for SARS-CoV-2 reverse transcription polymerase chain reaction (RT-PCR). Methods: We used 3128 images from a wide variety of two-gate data sources for the development and ablation study of the machine learning model. A total of 633 COVID-19 cases and 2295 non-COVID-19 cases were included in the study. We randomly divided cases into a development set and ablation set at a ratio of 8:2. For the ablation study, we used another dataset including 150 cases of interstitial pneumonia among non-COVID-19 images. For external validation, we used 893 images from 740 consecutive patients at 11 acute care hospitals suspected of having COVID-19 at the time of diagnosis. The dataset included 343 COVID-19 patients. The reference standard was RT-PCR. Results: In the ablation study using interstitial pneumonia images, the specificity of the model was 0.986 for the usual interstitial pneumonia pattern, 0.820 for the non-specific interstitial pneumonia pattern, and 0.400 for the organizing pneumonia pattern. In the external validation study, the sensitivity and specificity of the model were 0.869 and 0.432, respectively, at the low-level cutoff, and 0.724 and 0.721, respectively, at the high-level cutoff. Conclusions: Our machine learning model exhibited high sensitivity in external validation datasets and may help physicians rule out a COVID-19 diagnosis in a timely manner. Further studies are warranted to improve model specificity.


Diagnostics ◽  
2021 ◽  
Vol 11 (11) ◽  
pp. 2102
Author(s):  
Eyal Klang ◽  
Robert Freeman ◽  
Matthew A. Levin ◽  
Shelly Soffer ◽  
Yiftach Barash ◽  
...  

Background & Aims: We aimed to identify specific emergency department (ED) risk factors for developing complicated acute diverticulitis (AD) and to evaluate a machine learning (ML) model for predicting complicated AD. Methods: We analyzed data retrieved from unselected consecutive large bowel AD patients from five hospitals in the Mount Sinai health system, NY. The study time frame was from January 2011 through March 2021. Data were used to train and evaluate a gradient-boosting machine learning model to identify patients with complicated diverticulitis, defined as a need for invasive intervention or in-hospital mortality. The model was trained and evaluated on data from four hospitals and externally validated on held-out data from the fifth hospital. Results: The final cohort included 4997 AD visits. Of them, 129 (2.9%) visits had complicated diverticulitis. Patients with complicated diverticulitis were more likely to be men, black, and arrive by ambulance. Regarding laboratory values, patients with complicated diverticulitis had higher levels of absolute neutrophils (AUC 0.73), higher white blood cells (AUC 0.70), platelet count (AUC 0.68) and lactate (AUC 0.61), and lower levels of albumin (AUC 0.69), chloride (AUC 0.64), and sodium (AUC 0.61). In the external validation cohort, the ML model showed AUC 0.85 (95% CI 0.78–0.91) for predicting complicated diverticulitis. For Youden’s index, the model showed a sensitivity of 88% with a false positive rate of 1:3.6. Conclusions: An ML model trained on clinical measures provides proof-of-concept performance in predicting complications in patients presenting to the ED with AD. Clinically, it implies that an ML model may classify low-risk patients to be discharged from the ED for further treatment in an ambulatory setting.
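Youden's index, mentioned in the results above, is J = sensitivity + specificity - 1, and the operating point is usually taken as the threshold that maximizes J. The sketch below shows that selection on invented toy scores. The function name, the data, and the "score >= threshold means positive" convention are assumptions, not details reported by the study.

```python
def youden_threshold(labels, scores):
    """Return (threshold, sensitivity, specificity) maximizing J = sens + spec - 1."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best = None
    for t in sorted(set(scores)):
        # A case is predicted positive when its score is at least t.
        tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= t)
        tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < t)
        sens, spec = tp / n_pos, tn / n_neg
        j = sens + spec - 1
        if best is None or j > best[0]:
            best = (j, t, sens, spec)
    return best[1], best[2], best[3]

# Hypothetical toy data: 1 = complicated diverticulitis, 0 = uncomplicated.
labels = [0, 0, 0, 0, 1, 0, 1, 1]
scores = [0.05, 0.10, 0.20, 0.30, 0.35, 0.60, 0.70, 0.90]
threshold, sens, spec = youden_threshold(labels, scores)
print(threshold, sens, spec)
```

On this toy data the best threshold is 0.35, which captures all three positives while misclassifying one of the five negatives.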


Machine learning is a prominent tool for extracting knowledge from large amounts of data. While much machine learning research has focused on improving the accuracy and efficiency of training and inference algorithms, less attention has been paid to the equally important issue of monitoring the quality of the data fed into the machine learning model. The quality of big data is far from perfect. Recent studies have shown that poor data quality can introduce serious errors into the results of big data analysis and can prevent more precise results from being obtained. The advantages of data preprocessing in the context of ML include earlier detection of errors, improved model quality through the use of better data, and savings in the engineering hours spent debugging issues.


Stroke ◽  
2021 ◽  
Vol 52 (Suppl_1) ◽  
Author(s):  
Lingling Ding ◽  
Zixiao Li ◽  
Yongjun Wang

Objective: We aimed to develop and validate a machine learning-based prediction model that could assess the risk of stroke-associated pneumonia (SAP) for individual patients with acute ischemic stroke (AIS). Methods: A machine-learning model incorporating A2DS2 scores and clinical features (AN-ADCS2) was developed to predict the risk of SAP in patients with AIS. Two independent datasets were used for model derivation and external validation. The area under the receiver operating characteristic curve (AUC), sensitivity, specificity, positive predictive value (PPV) and negative predictive value (NPV) were estimated. Further analysis evaluated thresholds from the training set that identified patients as low-risk, intermediate-risk and high-risk, and performance at these thresholds was compared in the external validation set. Results: The AN-ADCS2 model achieved favorable performance with a high AUC of 0.892 (95% confidence interval [CI] 0.885-0.898) in the test set and similar performance in the external validation set (AUC 0.813 [95% CI 0.812-0.814]). The AN-ADCS2 threshold identifying low-risk was 0.03, with an NPV of 97.6% (97.2-97.9%) and sensitivity of 93.5% (92.5-94.5%). The AN-ADCS2 threshold identifying high-risk was 0.65, with a PPV of 94.7% (93.9-95.6%) and specificity of 99.5% (99.5-99.6%). The AN-ADCS2 model performed better than the A2DS2 score (AUC 0.739, 95% CI [0.720-0.754]). Having a high risk of SAP classified by the AN-ADCS2 was associated with unfavorable outcomes of mortality and in-hospital stroke recurrence. Conclusions: Using machine learning, the AN-ADCS2 model provides an individualized risk prediction of SAP, which can be used as an indicator of clinical prognosis for patients with AIS.
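The two-threshold scheme above pairs each cutoff with the metrics that matter for its purpose: NPV and sensitivity for the low-risk (rule-out) cutoff, PPV and specificity for the high-risk (rule-in) cutoff. The sketch below computes these from a confusion matrix. Only the cutoff values 0.03 and 0.65 come from the abstract; the risks, labels, and function names are invented for illustration.

```python
def ratio(numerator, denominator):
    """Safe division that returns NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")

def metrics_at(labels, risks, cutoff):
    """Confusion-matrix metrics when risk >= cutoff is called positive."""
    tp = sum(1 for y, r in zip(labels, risks) if y == 1 and r >= cutoff)
    fp = sum(1 for y, r in zip(labels, risks) if y == 0 and r >= cutoff)
    fn = sum(1 for y, r in zip(labels, risks) if y == 1 and r < cutoff)
    tn = sum(1 for y, r in zip(labels, risks) if y == 0 and r < cutoff)
    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }

# Hypothetical predicted risks and outcomes (1 = developed SAP).
risks = [0.01, 0.02, 0.20, 0.05, 0.70, 0.40, 0.80, 0.90]
labels = [0, 0, 0, 1, 0, 1, 1, 1]

rule_out = metrics_at(labels, risks, cutoff=0.03)  # below this: low-risk
rule_in = metrics_at(labels, risks, cutoff=0.65)   # at or above: high-risk
print("rule-out NPV:", rule_out["npv"], "sensitivity:", rule_out["sensitivity"])
print("rule-in PPV:", rule_in["ppv"], "specificity:", rule_in["specificity"])
```

Patients whose risk falls between the two cutoffs form the intermediate-risk group, for which neither a confident rule-out nor a confident rule-in is claimed.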


2022 ◽  
Vol 10 (1) ◽  
pp. 0-0

Software testing is the activity of evaluating the software under test. It has two approaches: manual testing and automated testing. Automated testing is an approach in which scripts are written to automate the testing process. Automated testing is suitable for some software development projects, while others require manual testing. Its suitability for a given project depends on factors such as the nature of the project requirements, the team working on the project, the technology with which the software is being developed, and the intended audience. In this paper we develop a machine learning model for predicting the adoption of automated testing. We used the chi-square test to assess the factors’ correlation and the PART classifier for model development. The accuracy of our proposed model is 93.1624%.
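As a rough illustration of the chi-square step, the sketch below computes Pearson's chi-square statistic for a contingency table of one project factor against automated-testing adoption. The counts and the factor are invented; the paper's data and exact procedure are not given in the abstract, and no continuity correction is applied here.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for a 2-D table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof

# Hypothetical counts: rows = team has scripting experience (yes, no),
# columns = project adopted automated testing (yes, no).
table = [[30, 10],
         [10, 30]]
stat, dof = chi_square_statistic(table)
print(f"chi-square = {stat:.2f}, dof = {dof}")
# For dof = 1 the 5% critical value is about 3.841, so a statistic
# above it suggests the factor and adoption are not independent.
```

Factors that show a strong association with adoption could then be passed on as candidate features to a classifier such as PART.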


Gut ◽  
2021 ◽  
pp. gutjnl-2021-324060
Author(s):  
Raghav Sundar ◽  
Nesaretnam Barr Kumarakulasinghe ◽  
Yiong Huak Chan ◽  
Kazuhiro Yoshida ◽  
Takaki Yoshikawa ◽  
...  

Objective: To date, there are no predictive biomarkers to guide selection of patients with gastric cancer (GC) who benefit from paclitaxel. Stomach cancer Adjuvant Multi-Institutional group Trial (SAMIT) was a 2×2 factorial randomised phase III study in which patients with GC were randomised to Pac-S-1 (paclitaxel + S-1), Pac-UFT (paclitaxel + UFT), S-1 alone or UFT alone after curative surgery. Design: The primary objective of this study was to identify a gene signature that predicts survival benefit from paclitaxel chemotherapy in GC patients. SAMIT GC samples were profiled using a customised 476 gene NanoString panel. A random forest machine-learning model was applied on the NanoString profiles to develop a gene signature. An independent cohort of metastatic patients with GC treated with paclitaxel and ramucirumab (Pac-Ram) served as an external validation cohort. Results: From the SAMIT trial 499 samples were analysed in this study. From the Pac-S-1 training cohort, the random forest model generated a 19-gene signature assigning patients to two groups: Pac-Sensitive and Pac-Resistant. In the Pac-UFT validation cohort, Pac-Sensitive patients exhibited a significant improvement in disease free survival (DFS): 3-year DFS 66% vs 40% (HR 0.44, p=0.0029). There was no survival difference between Pac-Sensitive and Pac-Resistant in the UFT or S-1 alone arms, test of interaction p<0.001. In the external Pac-Ram validation cohort, the signature predicted benefit for Pac-Sensitive (median PFS 147 days vs 112 days, HR 0.48, p=0.022). Conclusion: Using machine-learning techniques on one of the largest GC trials (SAMIT), we identify a gene signature representing the first predictive biomarker for paclitaxel benefit. Trial registration number: UMIN Clinical Trials Registry: C000000082 (SAMIT); ClinicalTrials.gov identifier, 02628951 (South Korean trial)

