Machine Learning-Based Prognostic Prediction Models of Non-Metastatic Colon Cancer: Analyses Based on Surveillance, Epidemiology and End Results Database and a Chinese Cohort

2022 ◽  
Vol 14 ◽  
pp. 25-35
Author(s):  
Mo Tang ◽  
Lihao Gao ◽  
Bin He ◽  
Yufei Yang
2021 ◽  
Vol 11 ◽  
Author(s):  
Le Kuai ◽  
Ying Zhang ◽  
Ying Luo ◽  
Wei Li ◽  
Xiao-dong Li ◽  
...  

Objective A proportional hazard model was applied to develop a large-scale prognostic model and nomogram incorporating clinicopathological characteristics, histological type, tumor differentiation grade, and tumor deposit count, to give clinicians and patients diagnosed with colon cancer liver metastases (CLM) a more comprehensive and practical outcome measure. Methods Following the Transparent Reporting of a multivariable prediction model for Individual Prognosis or Diagnosis (TRIPOD) guidelines, this study identified 14,697 patients diagnosed with CLM from 1975 to 2017 in the Surveillance, Epidemiology, and End Results (SEER) 21 registry database. Patients were divided by computerized randomization into a modeling group (n=9800) and an internal validation group (n=4897). An independent external validation cohort (n=60) was also obtained. Univariable and multivariable Cox analyses were performed to identify prognostic predictors of overall survival (OS). The nomogram was then constructed and validated using the area under the receiver operating characteristic curve (AUC) and calibration curves. Results Histological type, tumor differentiation grade, and tumor deposit count were independent prognostic predictors for CLM. The nomogram consisted of age, sex, primary site, T category, N category, metastasis of bone, brain or lung, surgery, and chemotherapy. The model achieved excellent predictive power in both internal validation (mean AUC=0.811) and external validation (mean AUC=0.727), both significantly higher than the American Joint Committee on Cancer (AJCC) TNM system. Conclusion This study proposes a prognostic nomogram for predicting 1- and 2-year survival based on histopathological and population-based data of CLM patients, developed using TRIPOD guidelines. Compared with the TNM stage, our nomogram has better consistency and calibration for predicting the OS of CLM patients.
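The AUC values quoted above measure discrimination. As a rough illustration only, the sketch below computes AUC for a binary outcome (for example, death by a fixed time point) as the fraction of positive/negative pairs that the model ranks correctly. The function name `auc` and the toy inputs are assumptions made for this example; the abstract does not describe the authors' implementation, and this simplified version ignores censoring.

```python
def auc(scores, labels):
    """AUC as the probability that a randomly chosen positive case
    receives a higher score than a randomly chosen negative case.
    Tied scores count as half a correct ranking."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical predicted risks and observed outcomes (1 = event).
    print(auc([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0]))
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect ranking, which is the scale on which the reported 0.811 and 0.727 should be read.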


PLoS ONE ◽  
2021 ◽  
Vol 16 (10) ◽  
pp. e0257857
Author(s):  
Ma’mon M. Hatmal ◽  
Walhan Alshaer ◽  
Ismail S. Mahmoud ◽  
Mohammad A. I. Al-Hatamleh ◽  
Hamzeh J. Al-Ameer ◽  
...  

CD36 (cluster of differentiation 36) is a membrane protein involved in lipid metabolism and has been linked to pathological conditions associated with metabolic disorders, such as diabetes and dyslipidemia. A case-control study including 177 patients with type-2 diabetes mellitus (T2DM) and 173 control subjects was conducted to study the involvement of the CD36 gene rs1761667 (G>A) and rs1527483 (C>T) polymorphisms in the pathogenesis of T2DM and dyslipidemia among the Jordanian population. Lipid profile, blood sugar, gender and age were measured and recorded, and genotyping analysis was performed for both polymorphisms. Following statistical analysis, 10 different neural networks and machine learning (ML) tools were used to predict subjects with diabetes or dyslipidemia. To further understand the role of the CD36 protein and gene in T2DM and dyslipidemia, a protein-protein interaction network and meta-analysis were carried out. For both polymorphisms, the genotypic frequencies were not significantly different between the two groups (p > 0.05). On the other hand, some ML tools, such as the multilayer perceptron, gave high prediction accuracy (≥ 0.75) and Cohen’s kappa (κ) (≥ 0.5). Interestingly, with the K-star tool, the accuracy and Cohen’s κ values improved when the genotyping results were included as inputs (0.73 and 0.46, respectively, compared to 0.67 and 0.34 without them). This study confirmed, for the first time, that there is no association between CD36 polymorphisms and T2DM or dyslipidemia among the Jordanian population. Prediction of T2DM and dyslipidemia using these extensive ML tools and based on such input data is a promising approach for developing diagnostic and prognostic prediction models for a wide spectrum of diseases, especially based on large medical databases.
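Accuracy and Cohen's κ are reported together above because κ corrects the observed agreement for the agreement expected by chance. The following is a minimal sketch of that calculation; the function name `accuracy_and_kappa` and the toy labels are assumptions for illustration and are not taken from the study.

```python
from collections import Counter


def accuracy_and_kappa(y_true, y_pred):
    """Return (accuracy, Cohen's kappa) for two equal-length label lists.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement
    and p_e is the agreement expected by chance from the label marginals.
    """
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if p_e == 1.0:
        # Both raters used a single identical label; kappa is undefined.
        return p_o, float("nan")
    return p_o, (p_o - p_e) / (1.0 - p_e)


if __name__ == "__main__":
    # Hypothetical labels: 1 = T2DM, 0 = control.
    truth = [1, 1, 1, 1, 0, 0, 0, 0]
    preds = [1, 1, 1, 0, 0, 0, 0, 1]
    print(accuracy_and_kappa(truth, preds))
```

With balanced classes, as in this toy case, chance agreement is 0.5, so an accuracy of 0.75 corresponds to κ = 0.5, the pair of thresholds the abstract uses for "high" performance.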


2020 ◽  
Vol 22 (8) ◽  
pp. 914-922
Author(s):  
B. Zhao ◽  
R. A. Gabriel ◽  
F. Vaida ◽  
S. Eisenstein ◽  
G. T. Schnickel ◽  
...  

2019 ◽  
Author(s):  
Herdiantri Sufriyana ◽  
Atina Husnayain ◽  
Ya-Lin Chen ◽  
Chao-Yang Kuo ◽  
Onkar Singh ◽  
...  

BACKGROUND Predictions in pregnancy care are complex because of interactions among multiple factors. Hence, pregnancy outcomes are not easily predicted by a single predictor using only one algorithm or modeling method. OBJECTIVE This study aims to review and compare the predictive performances between logistic regression (LR) and other machine learning algorithms for developing or validating a multivariable prognostic prediction model for pregnancy care to inform clinicians’ decision making. METHODS Research articles from MEDLINE, Scopus, Web of Science, and Google Scholar were reviewed following several guidelines for a prognostic prediction study, including a risk of bias (ROB) assessment. We report the results based on the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines. Studies were primarily framed as PICOTS (population, index, comparator, outcomes, timing, and setting): Population: men or women in procreative management, pregnant women, and fetuses or newborns; Index: multivariable prognostic prediction models using non-LR algorithms for risk classification to inform clinicians’ decision making; Comparator: the models applying an LR; Outcomes: pregnancy-related outcomes of procreation or pregnancy outcomes for pregnant women and fetuses or newborns; Timing: pre-, inter-, and peripregnancy periods (predictors), at the pregnancy, delivery, and either puerperal or neonatal period (outcome), and either short- or long-term prognoses (time interval); and Setting: primary care or hospital. The results were synthesized by reporting study characteristics and ROBs and by random effects modeling of the difference of the logit area under the receiver operating characteristic curve of each non-LR model compared with the LR model for the same pregnancy outcomes. We also reported between-study heterogeneity by using τ² and I².
RESULTS Of the 2093 records, we included 142 studies for the systematic review and 62 studies for a meta-analysis. Most prediction models used LR (92/142, 64.8%) and artificial neural networks (20/142, 14.1%) among non-LR algorithms. Only 16.9% (24/142) of studies had a low ROB. A total of 2 non-LR algorithms from low ROB studies significantly outperformed LR. The first algorithm was a random forest for preterm delivery (logit AUROC 2.51, 95% CI 1.49-3.53; I²=86%; τ²=0.77) and pre-eclampsia (logit AUROC 1.2, 95% CI 0.72-1.67; I²=75%; τ²=0.09). The second algorithm was gradient boosting for cesarean section (logit AUROC 2.26, 95% CI 1.39-3.13; I²=75%; τ²=0.43) and gestational diabetes (logit AUROC 1.03, 95% CI 0.69-1.37; I²=83%; τ²=0.07). CONCLUSIONS Prediction models with the best performances across studies were not necessarily those that used LR; random forest and gradient boosting models also performed well. We recommend a reanalysis of existing LR models for several pregnancy outcomes by comparing them with those algorithms that apply standard guidelines. CLINICALTRIAL PROSPERO (International Prospective Register of Systematic Reviews) CRD42019136106; https://www.crd.york.ac.uk/prospero/display_record.php?RecordID=136106


10.2196/16503 ◽  
2020 ◽  
Vol 8 (11) ◽  
pp. e16503
Author(s):  
Herdiantri Sufriyana ◽  
Atina Husnayain ◽  
Ya-Lin Chen ◽  
Chao-Yang Kuo ◽  
Onkar Singh ◽  
...  

Background Predictions in pregnancy care are complex because of interactions among multiple factors. Hence, pregnancy outcomes are not easily predicted by a single predictor using only one algorithm or modeling method. Objective This study aims to review and compare the predictive performances between logistic regression (LR) and other machine learning algorithms for developing or validating a multivariable prognostic prediction model for pregnancy care to inform clinicians’ decision making. Methods Research articles from MEDLINE, Scopus, Web of Science, and Google Scholar were reviewed following several guidelines for a prognostic prediction study, including a risk of bias (ROB) assessment. We report the results based on the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines. Studies were primarily framed as PICOTS (population, index, comparator, outcomes, timing, and setting): Population: men or women in procreative management, pregnant women, and fetuses or newborns; Index: multivariable prognostic prediction models using non-LR algorithms for risk classification to inform clinicians’ decision making; Comparator: the models applying an LR; Outcomes: pregnancy-related outcomes of procreation or pregnancy outcomes for pregnant women and fetuses or newborns; Timing: pre-, inter-, and peripregnancy periods (predictors), at the pregnancy, delivery, and either puerperal or neonatal period (outcome), and either short- or long-term prognoses (time interval); and Setting: primary care or hospital. The results were synthesized by reporting study characteristics and ROBs and by random effects modeling of the difference of the logit area under the receiver operating characteristic curve of each non-LR model compared with the LR model for the same pregnancy outcomes. We also reported between-study heterogeneity by using τ² and I².
Results Of the 2093 records, we included 142 studies for the systematic review and 62 studies for a meta-analysis. Most prediction models used LR (92/142, 64.8%) and artificial neural networks (20/142, 14.1%) among non-LR algorithms. Only 16.9% (24/142) of studies had a low ROB. A total of 2 non-LR algorithms from low ROB studies significantly outperformed LR. The first algorithm was a random forest for preterm delivery (logit AUROC 2.51, 95% CI 1.49-3.53; I²=86%; τ²=0.77) and pre-eclampsia (logit AUROC 1.2, 95% CI 0.72-1.67; I²=75%; τ²=0.09). The second algorithm was gradient boosting for cesarean section (logit AUROC 2.26, 95% CI 1.39-3.13; I²=75%; τ²=0.43) and gestational diabetes (logit AUROC 1.03, 95% CI 0.69-1.37; I²=83%; τ²=0.07). Conclusions Prediction models with the best performances across studies were not necessarily those that used LR; random forest and gradient boosting models also performed well. We recommend a reanalysis of existing LR models for several pregnancy outcomes by comparing them with those algorithms that apply standard guidelines. Trial Registration PROSPERO (International Prospective Register of Systematic Reviews) CRD42019136106; https://www.crd.york.ac.uk/prospero/display_record.php?RecordID=136106
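The synthesis described above pools differences in logit AUROC with a random effects model and reports τ² and I². The sketch below shows one common way to do this, the DerSimonian-Laird estimator. That choice is an assumption: the abstract does not say which estimator the authors used, and the function names and input values here are hypothetical.

```python
import math


def logit(p):
    """Logit transform, e.g. for an AUROC strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def random_effects_pool(effects, variances):
    """DerSimonian-Laird random-effects pooling (assumed estimator).

    effects:   per-study effect sizes, e.g. logit(AUROC_nonLR) - logit(AUROC_LR)
    variances: per-study sampling variances of those effects
    Returns a dict with the pooled effect, 95% CI, tau^2, and I^2 (percent).
    """
    k = len(effects)
    if k < 2 or k != len(variances):
        raise ValueError("need at least two studies with matching variances")
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    fixed_mean = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    # Cochran's Q: weighted squared deviations from the fixed-effect mean.
    q = sum(wi * (yi - fixed_mean) ** 2 for wi, yi in zip(w, effects))
    df = k - 1
    c = sum_w - sum(wi * wi for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c)
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    # Random-effects weights add the between-study variance tau^2.
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return {
        "pooled": pooled,
        "ci95": (pooled - 1.96 * se, pooled + 1.96 * se),
        "tau2": tau2,
        "i2": i2,
    }


if __name__ == "__main__":
    # Hypothetical per-study differences in logit AUROC and their variances.
    print(random_effects_pool([0.0, 1.0], [0.25, 0.25]))
```

In this toy case the two studies disagree more than their sampling variances explain, so τ² is positive and I² is 50%, meaning half of the observed variation is attributed to between-study heterogeneity.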


2021 ◽  
Author(s):  
Jialong Xiao ◽  
Miao Mo ◽  
Zezhou Wang ◽  
Changming Zhou ◽  
Jie Shen ◽  
...  

BACKGROUND Over recent years, machine learning (ML) methods have been increasingly explored in cancer prognosis prediction because of the appearance of improved machine learning algorithms. These algorithms can use censored data for modeling, such as support vector machines (SVM) for survival analysis and random survival forest (RSF). However, it is still debated whether traditional (Cox proportional hazard regression) or ML-based prognostic prediction models have better predictive performance. OBJECTIVE This study aims to use machine learning algorithms to predict the survival of breast cancer patients and to compare their predictive performance with that of traditional Cox regression. METHODS This retrospective cohort study included all patients diagnosed with breast cancer and subsequently hospitalized in Fudan University Shanghai Cancer Center (FUSCC) between January 1, 2008 and December 31, 2016. A total of 25,267 cases with 21 features were eligible for model development, and the data set was randomly split into a training set (70%) and a test set (30%) for developing four models and predicting overall survival in breast cancer patients. The discriminative ability of the models was evaluated by the concordance index (C-index) and the time-dependent area under the curve (AUC); the calibration ability of the models was evaluated by the Brier score. RESULTS The RSF model showed the best discriminative performance among the four models, with 3-year, 5-year and 10-year time-dependent AUC of 0.857, 0.838 and 0.781, respectively, and a C-index of 0.827 (0.809, 0.845), which significantly outperformed the Cox-EN model (0.816, p=0.007), the Cox model (0.814, p=0.003) and the SVM model (0.812, p<0.001). The four models' 3-year, 5-year, and 10-year Brier scores were very close, ranging from 0.027 to 0.094, which meant all models had good calibration.
In terms of feature importance, elastic net and RSF both indicated that TNM staging, neoadjuvant therapy, number of lymph node metastases, age, and tumor diameter were the top 5 important features for predicting the prognosis of breast cancer. A final online tool was developed to predict the overall survival of breast cancer patients. CONCLUSIONS The RSF model slightly outperformed the other models on discriminative ability, revealing its great potential as an effective approach for survival analysis. CLINICALTRIAL ClinicalTrials.gov, registration number: NCT04996732.
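The C-index used above generalizes AUC to censored survival data: it counts, among pairs of patients whose ordering is known, how often the patient who died earlier was assigned the higher risk. A minimal sketch of Harrell's C-index follows. The function name `harrell_c_index` and the toy cohort are assumptions for illustration, and the abstract does not state which software or variant the authors used.

```python
def harrell_c_index(times, events, risks):
    """Harrell's concordance index for right-censored data.

    times:  follow-up time for each patient
    events: 1 if the event (death) was observed, 0 if censored
    risks:  predicted risk score (higher means earlier expected event)

    A pair (i, j) is comparable when patient i had an observed event
    strictly before patient j's follow-up time. It is concordant when
    patient i also has the higher predicted risk; tied risks count 0.5.
    """
    n = len(times)
    if not (n == len(events) == len(risks)):
        raise ValueError("inputs must have equal length")
    concordant = 0.0
    comparable = 0
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot be the earlier event
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


if __name__ == "__main__":
    # Hypothetical cohort: the third patient is censored at time 3.
    print(harrell_c_index([1, 2, 3, 4], [1, 1, 0, 1], [0.9, 0.1, 0.4, 0.7]))
```

As with AUC, 0.5 indicates chance-level ranking and 1.0 perfect ranking, so the reported gap between 0.827 (RSF) and 0.814 (Cox) is a small difference in pairwise ordering.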


2006 ◽  
Vol 66 (S 01) ◽  
Author(s):  
IK Himsl ◽  
MS Lenhard ◽  
F von Koch ◽  
M Wichmann ◽  
A Schulze ◽  
...  
