NI-13 The effectiveness and limitation of survival prediction in primary glioblastoma using machine learning-based texture analysis

2020 ◽  
Vol 2 (Supplement_3) ◽  
pp. ii14-ii14
Author(s):  
Toru Umehara ◽  
Manabu Kinoshita ◽  
Takahiro Sasaki ◽  
Hideyuki Arita ◽  
Ema Yoshioka ◽  
...  

Abstract Introduction: Clinical application of survival prediction of primary glioblastoma (pGBM) using preoperative images remains challenging due to a lack of robustness and standardization of the method. This research focused on validating a machine learning-based texture analysis model for this purpose using internal and external cohorts. Method: We included all cases of IDH wild-type pGBM with available preoperative MRI (T1WI, T2WI, and Gd-T1WI) from the databases of the Kansai Molecular Diagnosis Network for CNS tumors (KN) and The Cancer Genome Atlas (TCGA). Of 242 cases from KN, we assigned 137 cases as a training dataset (D1) and the remaining 105 cases as an internal validation dataset (D2). Furthermore, we extracted 96 cases from TCGA as an external validation dataset (D3). Preoperative MRI scans were semi-quantitatively analyzed, yielding 489 texture features as explanatory variables. Dichotomous overall survival (OS) with a 16.6-month cutoff was regarded as the response variable (short/long OS). We employed Lasso regression for feature selection, and a survival prediction model constructed on D1 via cross-validation (M1) was applied to D2 and D3 to test the model's robustness. Results: Patients predicted by M1 to have short OS showed a significantly poorer prognosis in D2 (median OS 11.1 vs. 19.4 months; log-rank test, p=0.03), while there was no significant difference in D3 (median OS 14.2 vs. 11.9 months; p=0.61). In the comparative analysis using t-SNE, there was little variation in the feature distribution among the three datasets. Conclusion: We were able to validate the prediction model in the internal but not in the external cohort. The presented result supports the use of machine learning-based texture analysis for survival prediction of pGBM in a localized population or country. However, further consideration is required to achieve a universal prediction model for pGBM, irrespective of regional differences.
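The Lasso feature-selection step described in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the penalty strength, the software, or the preprocessing, so the function names, the squared-error objective, the value of `alpha`, and the toy data below are all assumptions chosen for illustration. The sketch shows how the L1 penalty sets the coefficient of an uninformative feature to exactly zero, which is what makes Lasso usable as a selector.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the Lasso proximal step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, alpha, n_iter=200):
    """Minimise (1/2n)*||y - Xw||^2 + alpha*||w||_1 by cyclic coordinate descent.

    X is a list of rows; its columns and y are assumed to be centred already.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    residual = list(y)  # residual = y - Xw, and w starts at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # a constant (all-zero) column carries no information
            # Covariance of feature j with the residual that excludes j's own contribution
            rho = sum(X[i][j] * (residual[i] + X[i][j] * w[j]) for i in range(n)) / n
            new_w = soft_threshold(rho, alpha) / col_sq[j]
            delta = new_w - w[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                w[j] = new_w
    return w


# Hypothetical toy design: two centred, mutually orthogonal features; only feature 0 drives y.
X = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
y = [2.0, 2.0, -2.0, -2.0]
weights = lasso_coordinate_descent(X, y, alpha=0.5)
selected = [j for j, w in enumerate(weights) if w != 0.0]
print(weights, selected)  # weights == [1.5, 0.0], selected == [0]
```

In a setting like the one described, the same idea would be applied to hundreds of texture features, with `alpha` typically chosen by cross-validation on the training cohort only.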

2021 ◽  
Vol 9 (1) ◽  
Author(s):  
Yue Gao ◽  
Lingxi Chen ◽  
Jianhua Chi ◽  
Shaoqing Zeng ◽  
Xikang Feng ◽  
...  

Abstract Background Immune and inflammatory dysfunction was reported to underpin critical COVID-19 (coronavirus disease 2019). We aim to develop a machine learning model that enables accurate prediction of critical COVID-19 using immune-inflammatory features at admission. Methods We retrospectively collected 2076 consecutive COVID-19 patients with definite outcomes (discharge or death) between January 27, 2020 and March 30, 2020 from two hospitals in China. Critical illness was defined as admission to an intensive care unit, receiving invasive ventilation, or death. Least Absolute Shrinkage and Selection Operator (LASSO) was applied for feature selection. Five machine learning algorithms, including Logistic Regression (LR), Support Vector Machine (SVM), Gradient Boosted Decision Tree (GBDT), K-Nearest Neighbor (KNN), and Neural Network (NN), were built in a training dataset and assessed in an internal validation dataset and an external validation dataset. Results Six features (procalcitonin, [T + B + NK cell] count, interleukin 6, C reactive protein, interleukin 2 receptor, T-helper lymphocyte/T-suppressor lymphocyte) were finally used for model development. All five models displayed varying but promising predictive performance. Notably, the ensemble model, SPMCIIP (severity prediction model for COVID-19 by immune-inflammatory parameters), derived from three contributive algorithms (SVM, GBDT, and NN), achieved the best performance, with an area under the curve (AUC) of 0.991 (95% confidence interval [CI] 0.979–1.000) in the internal validation cohort and 0.999 (95% CI 0.998–1.000) in the external validation cohort for identifying patients with critical COVID-19. SPMCIIP could accurately and expeditiously predict the occurrence of critical COVID-19 approximately 20 days in advance. Conclusions The developed online prediction model SPMCIIP may facilitate intensive monitoring and early intervention in COVID-19 patients at high risk of critical illness.
Trial registration This study was retrospectively registered in the Chinese Clinical Trial Registry (ChiCTR2000032161).
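As a rough illustration of how an ensemble of several classifiers can be scored by AUC, the sketch below averages the predicted probabilities of three models (unweighted soft voting) and computes the AUC from pairwise comparisons. The abstract does not say how SPMCIIP combines its three algorithms, so the averaging rule is an assumption, and the labels and probabilities are invented toy values, not study data.

```python
def auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def soft_vote(prob_lists):
    """Average the per-patient probabilities across models (unweighted)."""
    n_models = len(prob_lists)
    return [sum(column) / n_models for column in zip(*prob_lists)]


# Hypothetical outputs of three models for six patients (1 = critical illness).
labels = [1, 1, 1, 0, 0, 0]
model_a = [0.9, 0.4, 0.7, 0.5, 0.2, 0.1]
model_b = [0.6, 0.8, 0.3, 0.2, 0.4, 0.1]
model_c = [0.8, 0.5, 0.6, 0.6, 0.3, 0.2]

ensemble = soft_vote([model_a, model_b, model_c])
individual_aucs = [auc(labels, m) for m in (model_a, model_b, model_c)]
ensemble_auc = auc(labels, ensemble)
print(individual_aucs, ensemble_auc)
```

In this toy case each model misranks a different patient, so averaging cancels the individual errors and the ensemble separates the two classes perfectly. Real data will rarely behave this cleanly.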


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Dongyan Ding ◽  
Tingyuan Lang ◽  
Dongling Zou ◽  
Jiawei Tan ◽  
Jia Chen ◽  
...  

Abstract Background Accurately forecasting prognosis could improve cervical cancer management; however, the clinical features currently in use do not provide enough information. The aim of this study is to improve forecasting capability by developing a miRNA-based machine learning survival prediction model. Results The expression characteristics of miRNAs were chosen as features for model development. The cervical cancer miRNA expression data were obtained from The Cancer Genome Atlas database. Preprocessing, including unquantified data removal, missing value imputation, sample normalization, log transformation, and feature scaling, was performed. In total, 42 survival-related miRNAs were identified by Cox Proportional-Hazards analysis. The patients were optimally clustered into four groups with three different 5-year survival outcomes (≥ 90%, ≈ 65%, ≤ 40%) by the K-means clustering algorithm based on the top 10 survival-related miRNAs. According to the K-means clustering result, a prediction model with high performance was established. The pathway analysis indicated that the miRNAs used play roles in the regulation of cancer stem cells. Conclusion A miRNA-based machine learning cervical cancer survival prediction model was developed that robustly stratifies cervical cancer patients into high survival rate (5-year survival rate ≥ 90%), moderate survival rate (5-year survival rate ≈ 65%), and low survival rate (5-year survival rate ≤ 40%).
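The K-means step mentioned above can be sketched with Lloyd's algorithm. This is a minimal illustration under stated assumptions, not the study's pipeline: the points are invented 2-D values standing in for the 10-miRNA expression profiles, Euclidean distance is assumed, and the initial centroids are hand-picked so that the run is deterministic.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, initial_centroids, n_iter=100):
    """Lloyd's algorithm from given initial centroids; returns (labels, centroids)."""
    centroids = [list(c) for c in initial_centroids]
    labels = [None] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(len(centroids)), key=lambda k: sq_dist(p, centroids[k]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, l in zip(points, labels) if l == k]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[k] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids


# Hypothetical 2-D profiles forming four well-separated groups of three patients.
points = [
    (0, 0), (0, 1), (1, 0),
    (10, 10), (10, 11), (11, 10),
    (0, 10), (1, 10), (0, 11),
    (10, 0), (11, 0), (10, 1),
]
labels, centroids = kmeans(points, [points[0], points[3], points[6], points[9]])
print(labels)
```

With real expression data the result depends on the initialisation and on feature scaling, which is why multiple random restarts are the usual practice.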


2014 ◽  
Vol 21 (6) ◽  
pp. 891-902 ◽  
Author(s):  
Min-Hee Kim ◽  
Ja Seong Bae ◽  
Dong-Jun Lim ◽  
Hyoungnam Lee ◽  
So Ra Jeon ◽  
...  

The BRAF V600E mutation is the most common genetic alteration in thyroid cancer. However, its clinicopathological significance and clonal mutation frequency remain unclear. To clarify the inconsistent results, we investigated the association between the allelic frequency of BRAF V600E and the clinicopathological features of classic papillary thyroid carcinoma (PTC). Tumour tissues from two independent sets of patients with classic PTC were manually microdissected and analysed for the presence or absence of the BRAF mutation and the mutant allelic frequency using quantitative pyrosequencing. For external validation, the Cancer Genome Atlas (TCGA) data were analysed. The BRAF V600E mutation was found in 264 (82.2%) out of 321 classic PTCs in the training set. The presence of BRAF V600E was only associated with extrathyroidal extension and the absence of thyroiditis. In BRAF V600E-positive tumours, the mutant allelic frequency varied from 8 to 41% of the total BRAF alleles (median, 20%) and directly correlated with tumour size and the number of metastatic lymph nodes. Lymph node metastases were more frequent in PTCs with a high (≥20%) abundance of mutant alleles than in those with a low abundance of mutant alleles (P=0.010). These results were reinforced by validation dataset (n=348) analysis but were not reproduced in the TCGA dataset. In a population with prevalent BRAF mutations, quantitative analysis of the BRAF mutation could provide additional information regarding tumour behaviour, which is not reflected by qualitative analysis. Nonetheless, prospective studies are needed before the mutated allele percentage can be considered as a prognostic factor.


2020 ◽  
Vol 22 (Supplement_2) ◽  
pp. ii203-ii203
Author(s):  
Alexander Hulsbergen ◽  
Yu Tung Lo ◽  
Vasileios Kavouridis ◽  
John Phillips ◽  
Timothy Smith ◽  
...  

Abstract INTRODUCTION Survival prediction in brain metastases (BMs) remains challenging. Current prognostic models have been created and validated almost completely with data from patients receiving radiotherapy only, leaving uncertainty about surgical patients. Therefore, the aim of this study was to build and validate a model predicting 6-month survival after BM resection using different machine learning (ML) algorithms. METHODS An institutional database of 1062 patients who underwent resection for BM was split into an 80:20 training and testing set. Seven different ML algorithms were trained and assessed for performance. Moreover, an ensemble model was created incorporating random forest, adaptive boosting, gradient boosting, and logistic regression algorithms. Five-fold cross validation was used for hyperparameter tuning. Model performance was assessed using area under the receiver-operating curve (AUC) and calibration and was compared against the diagnosis-specific graded prognostic assessment (ds-GPA), the most established prognostic model in BMs. RESULTS The ensemble model showed superior performance with an AUC of 0.81 in the hold-out test set, a calibration slope of 1.14, and a calibration intercept of -0.08, outperforming the ds-GPA (AUC 0.68). Patients were stratified into high-, medium- and low-risk groups for death at 6 months; these strata strongly predicted both 6-month and longitudinal overall survival (p < 0.001). CONCLUSIONS We developed and internally validated an ensemble ML model that accurately predicts 6-month survival after neurosurgical resection for BM, outperforms the most established model in the literature, and allows for meaningful risk stratification. Future efforts should focus on external validation of our model.
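The data partitioning described in the methods (an 80:20 split followed by five-fold cross-validation on the training portion) can be sketched as below. The function names, the random seed, and the rounding rule are assumptions for illustration. The abstract does not report the exact set sizes or whether the split was stratified, so the counts produced here are simply what this sketch yields for 1062 records.

```python
import random


def train_test_split_indices(n, test_fraction, seed):
    """Shuffle indices 0..n-1 and return (train, test) index lists."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_test = round(n * test_fraction)
    return indices[n_test:], indices[:n_test]


def k_fold(indices, k):
    """Yield (train, validation) index lists; fold sizes differ by at most one."""
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, validation


# The hold-out test set is set aside first and never used for tuning.
train_idx, test_idx = train_test_split_indices(1062, 0.2, seed=0)
folds = list(k_fold(train_idx, 5))
print(len(train_idx), len(test_idx), [len(v) for _, v in folds])
```

Hyperparameters would be tuned using only the five folds drawn from the training indices, with the hold-out set reserved for the final performance estimate.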


PeerJ ◽  
2021 ◽  
Vol 9 ◽  
pp. e10884
Author(s):  
Xin Yu ◽  
Qian Yang ◽  
Dong Wang ◽  
Zhaoyang Li ◽  
Nianhang Chen ◽  
...  

Applying the knowledge that methyltransferases and demethylases can modify adjacent cytosine-phosphate-guanine (CpG) sites in the same DNA strand, we found that combining multiple CpGs into a single block may improve cancer diagnosis. However, survival prediction remains a challenge. In this study, we developed a pipeline named “stacked ensemble of machine learning models for methylation-correlated blocks” (EnMCB) that combined Cox regression, support vector regression (SVR), and elastic-net models to construct signatures based on DNA methylation-correlated blocks for lung adenocarcinoma (LUAD) survival prediction. We used methylation profiles from the Cancer Genome Atlas (TCGA) as the training set, and profiles from the Gene Expression Omnibus (GEO) as validation and testing sets. First, we partitioned the genome into blocks of tightly co-methylated CpG sites, which we termed methylation-correlated blocks (MCBs). After partitioning and feature selection, we observed different diagnostic capacities for predicting patient survival across the models. We combined the multiple models into a single stacking ensemble model. The stacking ensemble model based on the top-ranked block had an area under the receiver operating characteristic curve of 0.622 in the TCGA training set, 0.773 in the validation set, and 0.698 in the testing set. When stratified by clinicopathological risk factors, the risk score predicted by the top-ranked MCB was an independent prognostic factor. Our results showed that our pipeline was a reliable tool that may facilitate MCB selection and survival prediction.


2019 ◽  
Author(s):  
Zied Hosni ◽  
Annalisa Riccardi ◽  
Stephanie Yerdelen ◽  
Alan R. G. Martin ◽  
Deborah Bowering ◽  
...  

Polymorphism is the capacity of a molecule to adopt different conformations or molecular packing arrangements in the solid state. This is a key property to control during pharmaceutical manufacturing because it can impact a range of properties including stability and solubility. In this study, a novel approach based on machine learning classification methods is used to predict the likelihood for an organic compound to crystallise in multiple forms. A training dataset of drug-like molecules was curated from the Cambridge Structural Database (CSD) and filtered according to entries in the Drug Bank database. The number of separate forms in the CSD for each molecule was recorded. A metaclassifier was trained using this dataset to predict the expected number of crystalline forms from the compound descriptors. This approach was used to estimate the number of crystallographic forms for an external validation dataset. These results suggest this novel methodology can be used to predict the extent of polymorphism of new drugs or not-yet experimentally screened molecules. This promising method complements expensive ab initio methods for crystal structure prediction and, as an integral part of experimental physical form screening, may identify systems with unexplored potential.


2020 ◽  
Vol 31 (6) ◽  
pp. 1348-1357 ◽  
Author(s):  
Ibrahim Sandokji ◽  
Yu Yamamoto ◽  
Aditya Biswas ◽  
Tanima Arora ◽  
Ugochukwu Ugwuowo ◽  
...  

Background Timely prediction of AKI in children can allow for targeted interventions, but the wealth of data in the electronic health record poses unique modeling challenges. Methods We retrospectively reviewed the electronic medical records of all children younger than 18 years old who had at least two creatinine values measured during a hospital admission from January 2014 through January 2018. We divided the study population into derivation, and internal and external validation cohorts, and used five feature selection techniques to select 10 of 720 potentially predictive variables from the electronic health records. Model performance was assessed by the area under the receiver operating characteristic curve in the validation cohorts. The primary outcome was development of AKI (per the Kidney Disease Improving Global Outcomes creatinine definition) within a moving 48-hour window. Secondary outcomes included severe AKI (stage 2 or 3), inpatient mortality, and length of stay. Results Among 8473 encounters studied, AKI occurred in 516 (10.2%), 207 (9%), and 27 (2.5%) encounters in the derivation, and internal and external validation cohorts, respectively. The highest-performing model used a machine learning-based genetic algorithm, with an overall receiver operating characteristic curve in the internal validation cohort of 0.76 [95% confidence interval (CI), 0.72 to 0.79] for AKI, 0.79 (95% CI, 0.74 to 0.83) for severe AKI, and 0.81 (95% CI, 0.77 to 0.86) for neonatal AKI. To translate this prediction model into a clinical risk-stratification tool, we identified high- and low-risk threshold points. Conclusions Using various machine learning algorithms, we identified and validated a time-updated prediction model of ten readily available electronic health record variables to accurately predict imminent AKI in hospitalized children.


2019 ◽  
Vol 9 (1) ◽  
Author(s):  
Kanggeun Lee ◽  
Hyoung-oh Jeong ◽  
Semin Lee ◽  
Won-Ki Jeong

Abstract With recent advances in DNA sequencing technologies, fast acquisition of large-scale genomic data has become commonplace. For cancer studies, in particular, there is an increasing need for the classification of cancer type based on somatic alterations detected from sequencing analyses. However, the ever-increasing size and complexity of the data make the classification task extremely challenging. In this study, we evaluate the contributions of various input features, such as mutation profiles, mutation rates, mutation spectra and signatures, and somatic copy number alterations that can be derived from genomic data, and further utilize them for accurate cancer type classification. We introduce a novel ensemble of machine learning classifiers, called CPEM (Cancer Predictor using an Ensemble Model), which is tested on 7,002 samples representing over 31 different cancer types collected from The Cancer Genome Atlas (TCGA) database. We first systematically examined the impact of the input features. Features known to be associated with specific cancers had relatively high importance in our initial prediction model. We further investigated various machine learning classifiers and feature selection methods to derive the ensemble-based cancer type prediction model achieving up to 84% classification accuracy in the nested 10-fold cross-validation. Finally, we narrowed down the target cancers to the six most common types and achieved up to 94% accuracy.

