NI-13 The effectiveness and limitation of survival prediction in primary glioblastoma using machine learning-based texture analysis

Abstract Introduction: Clinical application of survival prediction for primary glioblastoma (pGBM) from preoperative images remains challenging because the methods lack robustness and standardization. This research focused on validating a machine learning-based texture analysis model for this purpose using internal and external cohorts. Method: We included all cases of IDH wild-type pGBM with available preoperative MRI (T1WI, T2WI, and Gd-T1WI) from the databases of the Kansai Molecular Diagnosis Network for CNS tumors (KN) and The Cancer Genome Atlas (TCGA). Of 242 cases from KN, we assigned 137 cases to a training dataset (D1) and the remaining 105 cases to an internal validation dataset (D2). Furthermore, we extracted 96 cases from TCGA as an external validation dataset (D3). Preoperative MRI scans were semi-quantitatively analyzed, yielding 489 texture features as explanatory variables. Overall survival (OS), dichotomized at a 16.6-month cutoff, was the response variable (short/long OS). We employed Lasso regression for feature selection, and a survival prediction model constructed on D1 via cross-validation (M1) was applied to D2 and D3 to assess the model's robustness. Results: Patients predicted by M1 to have short OS had a significantly poorer prognosis in D2 (median OS 11.1 vs. 19.4 months; log-rank test, p=0.03), while there was no significant difference in D3 (median OS 14.2 vs. 11.9 months; p=0.61). In a comparative analysis using t-SNE, there was little variation in the feature distribution among the three datasets. Conclusion: We were able to validate the prediction model in the internal but not in the external cohort. The presented result supports the use of machine learning-based texture analysis for survival prediction of pGBM in a localized population or country. However, further consideration is required to achieve a universal prediction model for pGBM, irrespective of regional differences.
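The group comparison reported in the Results above (predicted short vs. long OS, log-rank test) can be illustrated with a minimal sketch. It assumes right-censored data supplied as parallel lists of times and event flags; the function name `logrank_test` and the toy survival times are hypothetical and are not taken from the study, which does not describe its implementation.

```python
import math


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi2, p_value) with 1 degree of freedom."""
    # Pool observations as (time, event_flag, group); group 0 = A, group 1 = B.
    pooled = [(t, e, 0) for t, e in zip(times_a, events_a)]
    pooled += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in pooled if e})

    observed_minus_expected = 0.0
    variance = 0.0
    for t in event_times:
        # Everyone still under observation at time t is at risk.
        at_risk = [(ti, ei, gi) for ti, ei, gi in pooled if ti >= t]
        n = len(at_risk)
        n_a = sum(1 for _, _, g in at_risk if g == 0)
        d = sum(1 for ti, ei, _ in at_risk if ti == t and ei)
        d_a = sum(1 for ti, ei, g in at_risk if ti == t and ei and g == 0)
        # Observed deaths in group A minus the number expected under the null.
        observed_minus_expected += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)

    if variance == 0:
        return 0.0, 1.0
    chi2 = observed_minus_expected ** 2 / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical toy data: months of follow-up, 1 = death observed, 0 = censored.
chi2, p_value = logrank_test(
    [3, 6, 8, 11, 12], [1, 1, 1, 1, 0],
    [9, 15, 19, 24, 30], [1, 1, 0, 1, 0],
)
print(f"chi2={chi2:.2f}, p={p_value:.3f}")
```

A production analysis would more likely rely on an established survival-analysis package; the pure-Python version is shown only to make the arithmetic behind the reported p-values visible.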


Development and validation of an online model to predict critical COVID-19 with immune-inflammatory parameters

Journal of Intensive Care ◽ 10.1186/s40560-021-00531-1 ◽ 2021 ◽ Vol 9 (1)

Author(s): Yue Gao ◽ Lingxi Chen ◽ Jianhua Chi ◽ Shaoqing Zeng ◽ Xikang Feng ◽ ...

Keyword(s): Machine Learning ◽ Critical Illness ◽ Prediction Model ◽ Validation Cohort ◽ External Validation ◽ Validation Dataset ◽ Support Vector ◽ Internal Validation ◽ Inflammatory Parameters ◽ Severity Prediction

Abstract Background Immune and inflammatory dysfunction has been reported to underpin critical COVID-19 (coronavirus disease 2019). We aimed to develop a machine learning model that enables accurate prediction of critical COVID-19 using immune-inflammatory features at admission. Methods We retrospectively collected 2076 consecutive COVID-19 patients with definite outcomes (discharge or death) between January 27, 2020 and March 30, 2020 from two hospitals in China. Critical illness was defined as admission to an intensive care unit, receiving invasive ventilation, or death. Least Absolute Shrinkage and Selection Operator (LASSO) was applied for feature selection. Five machine learning algorithms, including Logistic Regression (LR), Support Vector Machine (SVM), Gradient Boosted Decision Tree (GBDT), K-Nearest Neighbor (KNN), and Neural Network (NN), were built in a training dataset and assessed in an internal validation dataset and an external validation dataset. Results Six features (procalcitonin, [T + B + NK cell] count, interleukin 6, C reactive protein, interleukin 2 receptor, T-helper lymphocyte/T-suppressor lymphocyte) were finally used for model development. The five models showed varying but promising predictive performance. Notably, the ensemble model, SPMCIIP (severity prediction model for COVID-19 by immune-inflammatory parameters), derived from three contributive algorithms (SVM, GBDT, and NN), achieved the best performance, with an area under the curve (AUC) of 0.991 (95% confidence interval [CI] 0.979–1.000) in the internal validation cohort and 0.999 (95% CI 0.998–1.000) in the external validation cohort for identifying patients with critical COVID-19. SPMCIIP could accurately and expeditiously predict the occurrence of critical COVID-19 approximately 20 days in advance. Conclusions The developed online prediction model SPMCIIP is expected to facilitate intensive monitoring and early intervention for COVID-19 patients at high risk of critical illness.
Trial registration This study was retrospectively registered in the Chinese Clinical Trial Registry (ChiCTR2000032161).
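As a rough illustration of how an AUC such as those reported above is computed, the sketch below scores a toy ensemble. Two things are assumptions rather than details from the abstract: the ensemble is formed by simple averaging of per-patient probabilities (the abstract does not state how SVM, GBDT, and NN outputs were combined), and all names and numbers (`soft_vote`, `roc_auc`, the toy scores) are hypothetical.

```python
def soft_vote(*model_scores):
    """Average per-patient probabilities from several models (assumed combination rule)."""
    return [sum(column) / len(column) for column in zip(*model_scores)]


def roc_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data: 1 = critical illness, 0 = non-critical.
labels = [0, 0, 1, 1, 0, 1]
svm_scores = [0.10, 0.40, 0.35, 0.80, 0.20, 0.70]
gbdt_scores = [0.20, 0.30, 0.60, 0.90, 0.10, 0.65]
nn_scores = [0.15, 0.45, 0.55, 0.85, 0.25, 0.60]

ensemble_scores = soft_vote(svm_scores, gbdt_scores, nn_scores)
print(f"ensemble AUC = {roc_auc(labels, ensemble_scores):.3f}")
```

The pairwise formulation is quadratic in the number of patients, which is acceptable for a sketch; rank-based implementations scale better on cohorts of thousands.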


Machine learning-based prediction of survival prognosis in cervical cancer

BMC Bioinformatics ◽ 10.1186/s12859-021-04261-x ◽ 2021 ◽ Vol 22 (1)

Author(s): Dongyan Ding ◽ Tingyuan Lang ◽ Dongling Zou ◽ Jiawei Tan ◽ Jia Chen ◽ ...

Keyword(s): Machine Learning ◽ Cervical Cancer ◽ Survival Rate ◽ Prediction Model ◽ Clustering Algorithm ◽ The Cancer Genome Atlas ◽ Survival Prediction ◽ High Survival ◽ Survival Prognosis ◽ Cancer Management

Abstract Background Accurately forecasting prognosis could improve cervical cancer management; however, the clinical features currently in use do not provide enough information. The aim of this study is to improve forecasting capability by developing a miRNA-based machine learning survival prediction model. Results The expression characteristics of miRNAs were chosen as features for model development. The cervical cancer miRNA expression data were obtained from The Cancer Genome Atlas database. Preprocessing, including unquantified data removal, missing value imputation, sample normalization, log transformation, and feature scaling, was performed. In total, 42 survival-related miRNAs were identified by Cox Proportional-Hazards analysis. The patients were optimally clustered into four groups with three different 5-year survival outcomes (≥ 90%, ≈ 65%, ≤ 40%) by the K-means clustering algorithm based on the top 10 survival-related miRNAs. Based on the K-means clustering result, a prediction model with high performance was established. The pathway analysis indicated that the miRNAs used play roles in the regulation of cancer stem cells. Conclusion A miRNA-based machine learning cervical cancer survival prediction model was developed that robustly stratifies cervical cancer patients into high survival rate (5-year survival rate ≥ 90%), moderate survival rate (5-year survival rate ≈ 65%), and low survival rate (5-year survival rate ≤ 40%).
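The clustering step described above can be sketched with a plain Lloyd-style K-means. This is a toy illustration under stated assumptions: it uses two hypothetical features and two clusters instead of the ten miRNAs and four groups in the abstract, the initial centroids are supplied by hand, and the name `kmeans` is illustrative rather than the authors' code.

```python
def squared_distance(point, centroid):
    """Squared Euclidean distance between two equal-length coordinate sequences."""
    return sum((a - b) ** 2 for a, b in zip(point, centroid))


def kmeans(points, initial_centroids, max_iter=100):
    """Plain K-means; returns (labels, centroids)."""
    centroids = [list(c) for c in initial_centroids]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(len(centroids)), key=lambda k: squared_distance(p, centroids[k]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical scaled expression values for two features per patient.
patients = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3), (5.0, 5.1), (5.2, 4.9), (4.8, 5.0)]
labels, centroids = kmeans(patients, initial_centroids=[(0.0, 0.0), (5.0, 5.0)])
print(labels, centroids)
```

In practice the number of clusters and the starting centroids are usually chosen by a criterion such as repeated random restarts; the abstract reports only that four groups were optimal, not how that was determined.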


Machine learning‐based individualized survival prediction model for total knee replacement in osteoarthritis: Data from the Osteoarthritis Initiative

Arthritis Care & Research ◽ 10.1002/acr.24601 ◽ 2021

Author(s): Afshin Jamshidi ◽ Jean‐Pierre Pelletier ◽ Aurelie Labbe ◽ François Abram ◽ Johanne Martel‐Pelletier ◽ ...

Keyword(s): Machine Learning ◽ Total Knee Replacement ◽ Prediction Model ◽ Knee Replacement ◽ Survival Prediction ◽ Osteoarthritis Initiative ◽ Total Knee


A machine learning‐based survival prediction model of high grade glioma by integration of clinical and dose‐volume histogram parameters

Cancer Medicine ◽ 10.1002/cam4.3838 ◽ 2021 ◽ Vol 10 (8) ◽ pp. 2774-2786

Author(s): Haiyan Chen ◽ Chao Li ◽ Lin Zheng ◽ Wei Lu ◽ Yanlin Li ◽ ...

Keyword(s): Machine Learning ◽ Prediction Model ◽ High Grade Glioma ◽ Survival Prediction ◽ High Grade ◽ Dose Volume Histogram ◽ Dose Volume


Quantification of BRAF V600E alleles predicts papillary thyroid cancer progression

Endocrine Related Cancer ◽ 10.1530/erc-14-0147 ◽ 2014 ◽ Vol 21 (6) ◽ pp. 891-902 ◽ Cited By ~ 11

Author(s): Min-Hee Kim ◽ Ja Seong Bae ◽ Dong-Jun Lim ◽ Hyoungnam Lee ◽ So Ra Jeon ◽ ...

Keyword(s): Thyroid Cancer ◽ Cancer Progression ◽ Tumour Size ◽ External Validation ◽ Allelic Frequency ◽ The Cancer Genome Atlas ◽ Braf V600e ◽ Validation Dataset ◽ Papillary Thyroid ◽ Additional Information

The BRAF V600E mutation is the most common genetic alteration in thyroid cancer. However, its clinicopathological significance and clonal mutation frequency remain unclear. To clarify the inconsistent results, we investigated the association between the allelic frequency of BRAF V600E and the clinicopathological features of classic papillary thyroid carcinoma (PTC). Tumour tissues from two independent sets of patients with classic PTC were manually microdissected and analysed for the presence or absence of the BRAF mutation and the mutant allelic frequency using quantitative pyrosequencing. For external validation, the Cancer Genome Atlas (TCGA) data were analysed. The BRAF V600E mutation was found in 264 (82.2%) out of 321 classic PTCs in the training set. The presence of BRAF V600E was associated only with extrathyroidal extension and the absence of thyroiditis. In BRAF V600E-positive tumours, the mutant allelic frequency varied from 8 to 41% of the total BRAF alleles (median, 20%) and directly correlated with tumour size and the number of metastatic lymph nodes. Lymph node metastases were more frequent in PTCs with a high (≥20%) abundance of mutant alleles than in those with a low abundance of mutant alleles (P=0.010). These results were reinforced by analysis of the validation dataset (n=348) but were not reproduced in the TCGA dataset. In a population with prevalent BRAF mutations, quantitative analysis of the BRAF mutation could provide additional information regarding tumour behaviour, which is not reflected by qualitative analysis. Nonetheless, prospective studies are needed before the mutated allele percentage can be considered as a prognostic factor.


SURG-02. SURVIVAL PREDICTION AFTER NEUROSURGICAL RESECTION OF BRAIN METASTASES: A MACHINE LEARNING APPROACH

Neuro-Oncology ◽ 10.1093/neuonc/noaa215.849 ◽ 2020 ◽ Vol 22 (Supplement_2) ◽ pp. ii203-ii203

Author(s): Alexander Hulsbergen ◽ Yu Tung Lo ◽ Vasileios Kavouridis ◽ John Phillips ◽ Timothy Smith ◽ ...

Keyword(s): Machine Learning ◽ Brain Metastases ◽ External Validation ◽ Superior Performance ◽ Prognostic Models ◽ Receiver Operating Curve ◽ Gradient Boosting ◽ Survival Prediction ◽ Ensemble Model ◽ Adaptive Boosting

Abstract INTRODUCTION Survival prediction in brain metastases (BMs) remains challenging. Current prognostic models have been created and validated almost completely with data from patients receiving radiotherapy only, leaving uncertainty about surgical patients. Therefore, the aim of this study was to build and validate a model predicting 6-month survival after BM resection using different machine learning (ML) algorithms. METHODS An institutional database of 1062 patients who underwent resection for BM was split 80:20 into training and testing sets. Seven different ML algorithms were trained and assessed for performance. Moreover, an ensemble model was created incorporating random forest, adaptive boosting, gradient boosting, and logistic regression algorithms. Five-fold cross-validation was used for hyperparameter tuning. Model performance was assessed using area under the receiver-operating curve (AUC) and calibration and was compared against the diagnosis-specific graded prognostic assessment (ds-GPA), the most established prognostic model in BMs. RESULTS The ensemble model showed superior performance with an AUC of 0.81 in the hold-out test set, a calibration slope of 1.14, and a calibration intercept of -0.08, outperforming the ds-GPA (AUC 0.68). Patients were stratified into high-, medium- and low-risk groups for death at 6 months; these strata strongly predicted both 6-month and longitudinal overall survival (p < 0.001). CONCLUSIONS We developed and internally validated an ensemble ML model that accurately predicts 6-month survival after neurosurgical resection for BM, outperforms the most established model in the literature, and allows for meaningful risk stratification. Future efforts should focus on external validation of our model.
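The data partitioning described in the METHODS above (an 80:20 hold-out split, with five-fold cross-validation inside the training portion for hyperparameter tuning) can be sketched as index bookkeeping. The function names, the fixed seed, and the use of a simple random (unstratified) shuffle are assumptions for illustration; the abstract does not say how the split was drawn.

```python
import random


def train_test_split_indices(n_samples, train_fraction=0.8, seed=0):
    """Shuffle sample indices and cut them into training and hold-out test lists."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seeded so the split is reproducible
    n_train = int(n_samples * train_fraction)
    return indices[:n_train], indices[n_train:]


def k_fold_indices(indices, n_folds=5):
    """Return a list of (fit_indices, validation_indices) pairs, one per fold."""
    folds = [indices[i::n_folds] for i in range(n_folds)]
    pairs = []
    for i in range(n_folds):
        fit = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        pairs.append((fit, folds[i]))
    return pairs


# 1062 patients, as in the abstract; the hold-out test indices are never used for tuning.
train_idx, test_idx = train_test_split_indices(1062)
cv_pairs = k_fold_indices(train_idx, n_folds=5)
for fold_number, (fit_idx, val_idx) in enumerate(cv_pairs, start=1):
    print(fold_number, len(fit_idx), len(val_idx))
```

Keeping the test indices out of the cross-validation loop is what allows the hold-out AUC to be read as an internal validation estimate rather than a tuning artifact.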


Predicting lung adenocarcinoma disease progression using methylation-correlated blocks and ensemble machine learning classifiers

PeerJ ◽ 10.7717/peerj.10884 ◽ 2021 ◽ Vol 9 ◽ pp. e10884

Author(s): Xin Yu ◽ Qian Yang ◽ Dong Wang ◽ Zhaoyang Li ◽ Nianhang Chen ◽ ...

Keyword(s): Machine Learning ◽ Lung Adenocarcinoma ◽ Cox Regression ◽ Characteristic Curve ◽ The Cancer Genome Atlas ◽ Support Vector ◽ Survival Prediction ◽ Ensemble Model ◽ Training Set ◽ Cpg Sites

Applying the knowledge that methyltransferases and demethylases can modify adjacent cytosine-phosphate-guanine (CpG) sites in the same DNA strand, we found that combining multiple CpGs into a single block may improve cancer diagnosis. However, survival prediction remains a challenge. In this study, we developed a pipeline named “stacked ensemble of machine learning models for methylation-correlated blocks” (EnMCB) that combined Cox regression, support vector regression (SVR), and elastic-net models to construct signatures based on DNA methylation-correlated blocks for lung adenocarcinoma (LUAD) survival prediction. We used methylation profiles from the Cancer Genome Atlas (TCGA) as the training set, and profiles from the Gene Expression Omnibus (GEO) as validation and testing sets. First, we partitioned the genome into blocks of tightly co-methylated CpG sites, which we termed methylation-correlated blocks (MCBs). After partitioning and feature selection, we observed different diagnostic capacities for predicting patient survival across the models. We combined the multiple models into a single stacking ensemble model. The stacking ensemble model based on the top-ranked block had an area under the receiver operating characteristic curve of 0.622 in the TCGA training set, 0.773 in the validation set, and 0.698 in the testing set. When stratified by clinicopathological risk factors, the risk score predicted by the top-ranked MCB was an independent prognostic factor. Our results showed that our pipeline was a reliable tool that may facilitate MCB selection and survival prediction.


Discovery of Highly Polymorphic Organic Materials: A New Machine Learning Approach

10.26434/chemrxiv.9524219 ◽ 2019

Author(s): Zied Hosni ◽ Annalisa Riccardi ◽ Stephanie Yerdelen ◽ Alan R. G. Martin ◽ Deborah Bowering ◽ ...

Keyword(s): Machine Learning ◽ Structure Prediction ◽ External Validation ◽ New Drugs ◽ Training Dataset ◽ Validation Dataset ◽ Machine Learning Classification ◽ Novel Approach ◽ Physical Form ◽ Machine Learning Approach

Polymorphism is the capacity of a molecule to adopt different conformations or molecular packing arrangements in the solid state. This is a key property to control during pharmaceutical manufacturing because it can impact a range of properties including stability and solubility. In this study, a novel approach based on machine learning classification methods is used to predict the likelihood for an organic compound to crystallise in multiple forms. A training dataset of drug-like molecules was curated from the Cambridge Structural Database (CSD) and filtered according to entries in the Drug Bank database. The number of separate forms in the CSD for each molecule was recorded. A metaclassifier was trained using this dataset to predict the expected number of crystalline forms from the compound descriptors. This approach was used to estimate the number of crystallographic forms for an external validation dataset. These results suggest this novel methodology can be used to predict the extent of polymorphism of new drugs or not-yet experimentally screened molecules. This promising method complements expensive ab initio methods for crystal structure prediction and, as an integral part of experimental physical form screening, may identify systems with unexplored potential.


A Time-Updated, Parsimonious Model to Predict AKI in Hospitalized Children

Journal of the American Society of Nephrology ◽ 10.1681/asn.2019070745 ◽ 2020 ◽ Vol 31 (6) ◽ pp. 1348-1357 ◽ Cited By ~ 1

Author(s): Ibrahim Sandokji ◽ Yu Yamamoto ◽ Aditya Biswas ◽ Tanima Arora ◽ Ugochukwu Ugwuowo ◽ ...

Keyword(s): Machine Learning ◽ Prediction Model ◽ Receiver Operating Characteristic Curve ◽ Operating Characteristic ◽ Characteristic Curve ◽ External Validation ◽ Health Record ◽ Hospitalized Children ◽ Operating Characteristic Curve ◽ Electronic Health

Background Timely prediction of AKI in children can allow for targeted interventions, but the wealth of data in the electronic health record poses unique modeling challenges. Methods We retrospectively reviewed the electronic medical records of all children younger than 18 years old who had at least two creatinine values measured during a hospital admission from January 2014 through January 2018. We divided the study population into derivation, and internal and external validation cohorts, and used five feature selection techniques to select 10 of 720 potentially predictive variables from the electronic health records. Model performance was assessed by the area under the receiver operating characteristic curve in the validation cohorts. The primary outcome was development of AKI (per the Kidney Disease Improving Global Outcomes creatinine definition) within a moving 48-hour window. Secondary outcomes included severe AKI (stage 2 or 3), inpatient mortality, and length of stay. Results Among 8473 encounters studied, AKI occurred in 516 (10.2%), 207 (9%), and 27 (2.5%) encounters in the derivation, and internal and external validation cohorts, respectively. The highest-performing model used a machine learning-based genetic algorithm, with an overall area under the receiver operating characteristic curve in the internal validation cohort of 0.76 [95% confidence interval (CI), 0.72 to 0.79] for AKI, 0.79 (95% CI, 0.74 to 0.83) for severe AKI, and 0.81 (95% CI, 0.77 to 0.86) for neonatal AKI. To translate this prediction model into a clinical risk-stratification tool, we identified high- and low-risk threshold points. Conclusions Using various machine learning algorithms, we identified and validated a time-updated prediction model of ten readily available electronic health record variables to accurately predict imminent AKI in hospitalized children.


CPEM: Accurate cancer type classification based on somatic alterations using an ensemble of a random forest and a deep neural network

Scientific Reports ◽ 10.1038/s41598-019-53034-3 ◽ 2019 ◽ Vol 9 (1) ◽ Cited By ~ 5

Author(s): Kanggeun Lee ◽ Hyoung-oh Jeong ◽ Semin Lee ◽ Won-Ki Jeong

Keyword(s): Machine Learning ◽ Prediction Model ◽ Genomic Data ◽ The Cancer Genome Atlas ◽ Cancer Type ◽ Machine Learning Classifiers ◽ Learning Classifiers ◽ Somatic Alterations ◽ The Impact ◽ Type Classification

Abstract With recent advances in DNA sequencing technologies, fast acquisition of large-scale genomic data has become commonplace. For cancer studies, in particular, there is an increasing need for the classification of cancer type based on somatic alterations detected from sequencing analyses. However, the ever-increasing size and complexity of the data make the classification task extremely challenging. In this study, we evaluate the contributions of various input features, such as mutation profiles, mutation rates, mutation spectra and signatures, and somatic copy number alterations that can be derived from genomic data, and further utilize them for accurate cancer type classification. We introduce a novel ensemble of machine learning classifiers, called CPEM (Cancer Predictor using an Ensemble Model), which is tested on 7,002 samples representing over 31 different cancer types collected from The Cancer Genome Atlas (TCGA) database. We first systematically examined the impact of the input features. Features known to be associated with specific cancers had relatively high importance in our initial prediction model. We further investigated various machine learning classifiers and feature selection methods to derive the ensemble-based cancer type prediction model achieving up to 84% classification accuracy in the nested 10-fold cross-validation. Finally, we narrowed down the target cancers to the six most common types and achieved up to 94% accuracy.
