Machine Learning-Based Analysis of Magnetic Resonance Radiomics for the Classification of Gliosarcoma and Glioblastoma

2021 ◽  
Vol 11 ◽  
Author(s):  
Zenghui Qian ◽  
Lingling Zhang ◽  
Jie Hu ◽  
Shuguang Chen ◽  
Hongyan Chen ◽  
...  

Objective: To identify optimal machine-learning methods for the radiomics-based differentiation of gliosarcoma (GSM) from glioblastoma (GBM). Materials and Methods: This retrospective study analyzed cerebral magnetic resonance imaging (MRI) data of 83 patients with pathologically diagnosed GSM (58 men, 25 women; mean age, 50.5 ± 12.9 years; range, 16-77 years) and 100 patients with GBM (58 men, 42 women; mean age, 53.4 ± 14.1 years; range, 12-77 years), who were randomly divided into a training set and a validation set. Radiomics features were extracted from the tumor mass and peritumoral edema. Three feature selection methods (the least absolute shrinkage and selection operator [LASSO], Relief, and random forest [RF]) and three classifiers (AdaBoost [Ada], support vector machine [SVM], and RF) were evaluated in terms of their performance in distinguishing GSM from GBM. The area under the receiver operating characteristic curve (AUC) and accuracy (ACC) of each method were analyzed. Results: Based on tumor mass features, LASSO feature selection combined with the SVM classifier achieved the highest AUC (0.85) and ACC (0.77) in the validation set, followed by Relief + RF (AUC = 0.84, ACC = 0.72) and LASSO + RF (AUC = 0.82, ACC = 0.75). Based on peritumoral edema features, Relief + SVM had the highest AUC (0.78) and ACC (0.73) in the validation set. Regardless of the method, tumor mass features significantly outperformed peritumoral edema features in the differentiation of GSM from GBM (P < 0.05). Furthermore, the sensitivity, specificity, and accuracy of the best radiomics model were superior to those obtained by the neuroradiologists. Conclusion: Our radiomics study identified LASSO feature selection combined with the SVM classifier as the optimal method for differentiating GSM from GBM based on tumor mass features.
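The two metrics reported in this abstract, AUC and ACC, can be computed directly from a classifier's output scores. The following is a minimal sketch and not the authors' code: the function names `auc_score` and `accuracy`, the 0.5 decision threshold, and the toy labels and scores are all assumptions made for illustration.

```python
from typing import Sequence


def auc_score(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    A tie between a positive and a negative score counts as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def accuracy(labels: Sequence[int], scores: Sequence[float],
             threshold: float = 0.5) -> float:
    """Share of cases whose thresholded score matches the true label."""
    preds = [1 if s >= threshold else 0 for s in scores]
    correct = sum(1 for y, p in zip(labels, preds) if y == p)
    return correct / len(labels)


# Hypothetical toy data (1 = GSM, 0 = GBM); not taken from the study.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]
toy_auc = auc_score(toy_labels, toy_scores)
toy_acc = accuracy(toy_labels, toy_scores)
print(f"AUC={toy_auc:.3f} ACC={toy_acc:.3f}")
```

AUC is threshold-free, whereas ACC depends on the chosen cut-off, which is one reason the two metrics can rank methods differently.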

Author(s):  
František Sabovčik ◽  
Nicholas Cauwenberghs ◽  
Dmitry Kouznetsov ◽  
Francois Haddad ◽  
Amparo Alonso-Betanzos ◽  
...  

Abstract Aims  Both left ventricular (LV) diastolic dysfunction (LVDD) and hypertrophy (LVH) as assessed by echocardiography are independent prognostic markers of future cardiovascular events in the community. However, selective screening strategies to identify individuals at risk who would benefit most from cardiac phenotyping are lacking. We, therefore, assessed the utility of several machine learning (ML) classifiers built on routinely measured clinical, biochemical, and electrocardiographic features for detecting subclinical LV abnormalities. Methods and results  We included 1407 participants (mean age, 51 years, 51% women) randomly recruited from the general population. We used echocardiographic parameters reflecting LV diastolic function and structure to define LV abnormalities (LVDD, n = 252; LVH, n = 272). Next, four supervised ML algorithms (XGBoost, AdaBoost, Random Forest (RF), Support Vector Machines, and Logistic regression) were used to build classifiers based on clinical data (67 features) to categorize LVDD and LVH. We applied a nested 10-fold cross-validation set-up. XGBoost and RF classifiers exhibited a high area under the receiver operating characteristic curve with values between 86.2% and 88.1% for predicting LVDD and between 77.7% and 78.5% for predicting LVH. Age, body mass index, different components of blood pressure, history of hypertension, antihypertensive treatment, and various electrocardiographic variables were the top selected features for predicting LVDD and LVH. Conclusion  XGBoost and RF classifiers combining routinely measured clinical, laboratory, and electrocardiographic data predicted LVDD and LVH with high accuracy. These ML classifiers might be useful to pre-select individuals in whom further echocardiographic examination, monitoring, and preventive measures are warranted.
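The nested 10-fold cross-validation set-up mentioned above can be sketched as index bookkeeping. This is an illustrative assumption about how such a scheme is typically arranged, not the study's implementation: the helper names `k_fold` and `nested_k_fold` are hypothetical, and the sketch omits the shuffling and stratification that a real analysis would usually add.

```python
from typing import Iterator, List, Tuple


def k_fold(indices: List[int], k: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists for k roughly equal folds."""
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def nested_k_fold(n_samples: int, outer_k: int = 10, inner_k: int = 10):
    """Yield (outer_train, outer_test, inner_splits) for nested CV.

    Inner splits are drawn only from the outer training indices, so the
    outer test fold never influences hyperparameter tuning.
    """
    for outer_train, outer_test in k_fold(list(range(n_samples)), outer_k):
        inner_splits = list(k_fold(outer_train, inner_k))
        yield outer_train, outer_test, inner_splits


# 1407 is the participant count given in the abstract.
splits = list(nested_k_fold(1407))
for outer_train, outer_test, inner_splits in splits[:2]:
    print(len(outer_train), len(outer_test), len(inner_splits))
```

Hyperparameters would be chosen on the inner splits and performance reported on the outer test folds only.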


2021 ◽  
Author(s):  
Huan Wang ◽  
Wei Wu ◽  
Chunxia Han ◽  
Jiaqi Zheng ◽  
Xinyu Cai ◽  
...  

BACKGROUND The absolute number of femoral neck fractures (FNFs) is increasing; however, the prediction of traumatic femoral head necrosis remains difficult. Machine learning algorithms have the potential to be superior to traditional prediction methods for the prediction of traumatic femoral head necrosis. OBJECTIVE The aim of this study is to use machine learning to construct a model for the analysis of risk factors and prediction of osteonecrosis of the femoral head (ONFH) in patients with FNF after internal fixation. METHODS We retrospectively collected preoperative, intraoperative, and postoperative clinical data of patients with FNF in 4 hospitals in Shanghai and followed up the patients for more than 2.5 years. A total of 259 patients with 43 variables were included in the study. The data were randomly divided into a training set (181/259, 69.8%) and a validation set (78/259, 30.1%). External data (n=376) were obtained from a retrospective cohort study of patients with FNF in 3 other hospitals. Least absolute shrinkage and selection operator regression and the support vector machine algorithm were used for variable selection. Logistic regression, random forest, support vector machine, and eXtreme Gradient Boosting (XGBoost) were used to develop the model on the training set. The validation set was used to tune the model hyperparameters to determine the final prediction model, and the external data were used to compare and evaluate the model performance. We compared the accuracy, discrimination, and calibration of the models to identify the best machine learning algorithm for predicting ONFH. Shapley additive explanations and local interpretable model-agnostic explanations were used to determine the interpretability of the black box model. RESULTS A total of 11 variables were selected for the models. The XGBoost model performed best on the validation set and external data. 
The accuracy, sensitivity, and area under the receiver operating characteristic curve of the model on the validation set were 0.987, 0.929, and 0.992, respectively. The accuracy, sensitivity, specificity, and area under the receiver operating characteristic curve of the model on the external data were 0.907, 0.807, 0.935, and 0.933, respectively, and the log-loss was 0.279. The calibration curve demonstrated good agreement between the predicted probability and actual risk. The interpretability of the features and individual predictions were realized using the Shapley additive explanations and local interpretable model-agnostic explanations algorithms. In addition, the XGBoost model was translated into a self-made web-based risk calculator to estimate an individual’s probability of ONFH. CONCLUSIONS Machine learning performs well in predicting ONFH after internal fixation of FNF. The 6-variable XGBoost model predicted the risk of ONFH well and had good generalization ability on the external data, which can be used for the clinical prediction of ONFH after internal fixation of FNF.
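The log-loss reported for the external data measures how well predicted probabilities match the observed outcomes. A minimal sketch of the standard binary log-loss follows; the function name `log_loss`, the clipping constant `eps`, and the toy values are assumptions for illustration and are not drawn from the study.

```python
import math
from typing import Sequence


def log_loss(labels: Sequence[int], probs: Sequence[float],
             eps: float = 1e-15) -> float:
    """Mean negative log-likelihood of binary labels under predicted probabilities."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(labels)


# Hypothetical toy predictions (1 = ONFH, 0 = no ONFH).
toy_loss = log_loss([1, 0], [0.8, 0.2])
print(f"log-loss={toy_loss:.3f}")
```

A model that always predicts 0.5 scores ln 2 (about 0.693), so lower values indicate better-calibrated, more confident correct predictions.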


2018 ◽  
Vol 26 (1) ◽  
pp. 141-155 ◽  
Author(s):  
Li Luo ◽  
Fengyi Zhang ◽  
Yao Yao ◽  
RenRong Gong ◽  
Martina Fu ◽  
...  

Surgery cancellations waste scarce operative resources and hinder patients’ access to operative services. In this study, the Wilcoxon and chi-square tests were used for predictor selection, and three machine learning models (random forest, support vector machine, and XGBoost) were used for the identification of surgeries with high risks of cancellation. The optimal performances of the identification models were as follows: sensitivity, 0.615; specificity, 0.957; positive predictive value, 0.454; negative predictive value, 0.904; accuracy, 0.647; and area under the receiver operating characteristic curve, 0.682. Of the three models, the random forest model achieved the best performance. Thus, the effective identification of surgeries with high risks of cancellation is feasible with stable performance. Models and sampling methods significantly affect the performance of identification. This study is a new application of machine learning for the identification of surgeries with high risks of cancellation and facilitation of surgery resource management.
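The threshold-based metrics listed above all derive from the four cells of a confusion matrix. The sketch below shows the standard definitions; the class name `ConfusionCounts` and the counts used are hypothetical and do not come from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    tp: int  # true positives (e.g. cancelled surgeries flagged as high risk)
    fp: int  # false positives
    tn: int  # true negatives
    fn: int  # false negatives

    @property
    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    @property
    def ppv(self) -> float:
        """Positive predictive value."""
        return self.tp / (self.tp + self.fp)

    @property
    def npv(self) -> float:
        """Negative predictive value."""
        return self.tn / (self.tn + self.fn)

    @property
    def accuracy(self) -> float:
        return (self.tp + self.tn) / (self.tp + self.fp + self.tn + self.fn)


# Hypothetical counts for illustration only.
toy = ConfusionCounts(tp=40, fp=5, tn=45, fn=10)
print(toy.sensitivity, toy.specificity, toy.ppv, toy.npv, toy.accuracy)
```

Unlike sensitivity and specificity, the predictive values depend on how common the positive class is, so they can shift when class balance changes.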


2020 ◽  
Author(s):  
Murad Megjhani ◽  
Kalijah Terilli ◽  
Ayham Alkhachroum ◽  
David J. Roh ◽  
Sachin Agarwal ◽  
...  

Objective: To develop a machine learning based tool, using routine vital signs, to assess delayed cerebral ischemia (DCI) risk over time. Methods: In this retrospective analysis, physiologic data for 540 consecutive acute subarachnoid hemorrhage patients were collected and annotated as part of a prospective observational cohort study between May 2006 and December 2014. Patients were excluded if (i) no physiologic data was available, (ii) they expired prior to the DCI onset window (< post bleed day 3) or (iii) early angiographic vasospasm was detected on admitting angiogram. DCI was prospectively labeled by consensus of treating physicians. Occurrence of DCI was classified using various machine learning approaches including logistic regression, random forest, support vector machine (linear and kernel), and an ensemble classifier, trained on vitals and subject characteristic features. Hourly risk scores were generated as the posterior probability at time t. We performed five-fold nested cross validation to tune the model parameters and to report the accuracy. All classifiers were evaluated for good discrimination using the area under the receiver operating characteristic curve (AU-ROC) and confusion matrices. Results: Of 310 patients included in our final analysis, 101 (32.6%) patients developed DCI. We achieved maximal classification of 0.81 [0.75-0.82] AU-ROC. We also predicted 74.7% of all DCI events 12 hours before typical clinical detection with a ratio of 3 true alerts for every 2 false alerts. Conclusion: A data-driven machine learning based detection tool offered hourly assessments of DCI risk and incorporated new physiologic information over time.


PeerJ ◽  
2021 ◽  
Vol 9 ◽  
pp. e10884
Author(s):  
Xin Yu ◽  
Qian Yang ◽  
Dong Wang ◽  
Zhaoyang Li ◽  
Nianhang Chen ◽  
...  

Applying the knowledge that methyltransferases and demethylases can modify adjacent cytosine-phosphate-guanine (CpG) sites in the same DNA strand, we found that combining multiple CpGs into a single block may improve cancer diagnosis. However, survival prediction remains a challenge. In this study, we developed a pipeline named “stacked ensemble of machine learning models for methylation-correlated blocks” (EnMCB) that combined Cox regression, support vector regression (SVR), and elastic-net models to construct signatures based on DNA methylation-correlated blocks for lung adenocarcinoma (LUAD) survival prediction. We used methylation profiles from the Cancer Genome Atlas (TCGA) as the training set, and profiles from the Gene Expression Omnibus (GEO) as validation and testing sets. First, we partitioned the genome into blocks of tightly co-methylated CpG sites, which we termed methylation-correlated blocks (MCBs). After partitioning and feature selection, we observed different diagnostic capacities for predicting patient survival across the models. We combined the multiple models into a single stacking ensemble model. The stacking ensemble model based on the top-ranked block had an area under the receiver operating characteristic curve of 0.622 in the TCGA training set, 0.773 in the validation set, and 0.698 in the testing set. When stratified by clinicopathological risk factors, the risk score predicted by the top-ranked MCB was an independent prognostic factor. Our results showed that our pipeline was a reliable tool that may facilitate MCB selection and survival prediction.


mBio ◽  
2020 ◽  
Vol 11 (3) ◽  
Author(s):  
Begüm D. Topçuoğlu ◽  
Nicholas A. Lesniak ◽  
Mack T. Ruffin ◽  
Jenna Wiens ◽  
Patrick D. Schloss

ABSTRACT Machine learning (ML) modeling of the human microbiome has the potential to identify microbial biomarkers and aid in the diagnosis of many diseases such as inflammatory bowel disease, diabetes, and colorectal cancer. Progress has been made toward developing ML models that predict health outcomes using bacterial abundances, but inconsistent adoption of training and evaluation methods calls the validity of these models into question. Furthermore, there appears to be a preference by many researchers to favor increased model complexity over interpretability. To overcome these challenges, we trained seven models that used fecal 16S rRNA sequence data to predict the presence of colonic screen relevant neoplasias (SRNs) (n = 490 patients, 261 controls and 229 cases). We developed a reusable open-source pipeline to train, validate, and interpret ML models. To show the effect of model selection, we assessed the predictive performance, interpretability, and training time of L2-regularized logistic regression, L1- and L2-regularized support vector machines (SVM) with linear and radial basis function kernels, a decision tree, random forest, and gradient boosted trees (XGBoost). The random forest model performed best at detecting SRNs with an area under the receiver operating characteristic curve (AUROC) of 0.695 (interquartile range [IQR], 0.651 to 0.739) but was slow to train (83.2 h) and not inherently interpretable. Despite its simplicity, L2-regularized logistic regression followed random forest in predictive performance with an AUROC of 0.680 (IQR, 0.625 to 0.735), trained faster (12 min), and was inherently interpretable. Our analysis highlights the importance of choosing an ML approach based on the goal of the study, as the choice will inform expectations of performance and interpretability. IMPORTANCE Diagnosing diseases using machine learning (ML) is rapidly being adopted in microbiome studies. 
However, the estimated performance associated with these models is likely overoptimistic. Moreover, there is a trend toward using black box models without a discussion of the difficulty of interpreting such models when trying to identify microbial biomarkers of disease. This work represents a step toward developing more-reproducible ML practices in applying ML to microbiome research. We implement a rigorous pipeline and emphasize the importance of selecting ML models that reflect the goal of the study. These concepts are not particular to the study of human health but can also be applied to environmental microbiology studies.


2020 ◽  
Vol 187 ◽  
pp. 04001
Author(s):  
Ravipat Lapcharoensuk ◽  
Kitticheat Danupattanin ◽  
Chaowarin Kanjanapornprapa ◽  
Tawin Inkawee

This research aimed to study the combination of NIR spectroscopy and machine learning for monitoring chilli sauce adulterated with papaya smoothie. The chilli sauce was produced by a famous community enterprise for chilli sauce processing in Thailand. The ingredients of the chilli sauce consisted of 45% chilli, 25% sugar, 20% garlic, 5% vinegar, and 5% salt. The chilli sauce sample was mixed with ripened papaya (Khaek Dam variety) smoothie at 9 levels from 10 to 90% w/w. The NIR spectra of pure chilli sauce, papaya smoothie, and the 9 adulterated chilli sauce samples were recorded using an FT-NIR spectrometer in the wavenumber range of 12500 to 4000 cm⁻¹. Three machine learning algorithms were applied to develop a model for monitoring adulterated chilli sauce: partial least squares regression (PLS), support vector machine (SVM), and backpropagation neural network (BPNN). All models achieved R² = 0.99 in the validation set, while the RMSEP values of PLS, SVM, and BPNN were 1.71, 2.18, and 3.27% w/w, respectively. This finding indicates that NIR spectroscopy coupled with machine learning is an alternative technique for monitoring papaya smoothie adulteration in chilli sauce in the global food industry.
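The two regression metrics quoted above can be computed as follows. This is a minimal sketch under stated assumptions: R² is taken here as the coefficient of determination (some chemometrics software reports the squared correlation instead), the function names `rmsep` and `r_squared` are hypothetical, and the toy adulteration levels and predictions are invented for illustration.

```python
import math
from typing import Sequence


def rmsep(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean square error of prediction, in the units of the response."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)


def r_squared(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Hypothetical adulteration levels (% w/w) and model predictions.
toy_actual = [10, 20, 30, 40, 50, 60, 70, 80, 90]
toy_predicted = [11, 19, 31, 39, 51, 59, 71, 79, 91]
toy_rmsep = rmsep(toy_actual, toy_predicted)
toy_r2 = r_squared(toy_actual, toy_predicted)
print(f"RMSEP={toy_rmsep:.2f}% w/w R2={toy_r2:.4f}")
```

Because R² saturates near 1 when the response spans a wide range, RMSEP is the more discriminating figure when several models all reach R² = 0.99.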


2020 ◽  
pp. 009385482096975
Author(s):  
Mehdi Ghasemi ◽  
Daniel Anvari ◽  
Mahshid Atapour ◽  
J. Stephen Wormith ◽  
Keira C. Stockdale ◽  
...  

The Level of Service/Case Management Inventory (LS/CMI) is one of the most frequently used tools to assess criminogenic risk–need in justice-involved individuals. Meta-analytic research demonstrates strong predictive accuracy for various recidivism outcomes. In this exploratory study, we applied machine learning (ML) algorithms (decision trees, random forests, and support vector machines) to a data set with nearly 100,000 LS/CMI administrations to provincial corrections clientele in Ontario, Canada, and approximately 3 years follow-up. The overall accuracies and areas under the receiver operating characteristic curve (AUCs) were comparable, although ML outperformed LS/CMI in terms of predictive accuracy for the middle scores where it is hardest to predict the recidivism outcome. Moreover, ML improved the AUCs for individual scores to near 0.60, from 0.50 for the LS/CMI, indicating that ML also improves the ability to rank individuals according to their probability of recidivating. Potential considerations, applications, and future directions are discussed.


2019 ◽  
Vol 11 (16) ◽  
pp. 1943 ◽  
Author(s):  
Omid Rahmati ◽  
Saleh Yousefi ◽  
Zahra Kalantari ◽  
Evelyn Uuemaa ◽  
Teimur Teimurian ◽  
...  

Mountainous areas are highly prone to a variety of nature-triggered disasters, which often cause disabling harm, death, destruction, and damage. In this work, an attempt was made to develop an accurate multi-hazard exposure map for a mountainous area (Asara watershed, Iran), based on state-of-the-art machine learning techniques. Hazard modeling for avalanches, rockfalls, and floods was performed using three state-of-the-art models: support vector machine (SVM), boosted regression tree (BRT), and generalized additive model (GAM). Topo-hydrological and geo-environmental factors were used as predictors in the models. A flood dataset (n = 133 flood events) was applied, which had been prepared using Sentinel-1-based processing and ground-based information. In addition, snow avalanche (n = 58) and rockfall (n = 101) data sets were used. The data set of each hazard type was randomly divided into two groups: training (70%) and validation (30%). Model performance was evaluated by the true skill score (TSS) and the area under receiver operating characteristic curve (AUC) criteria. Using an exposure map, the multi-hazard map was converted into a multi-hazard exposure map. According to both validation methods, the SVM model showed the highest accuracy for avalanches (AUC = 92.4%, TSS = 0.72) and rockfalls (AUC = 93.7%, TSS = 0.81), while BRT demonstrated the best performance for flood hazards (AUC = 94.2%, TSS = 0.80). Overall, multi-hazard exposure modeling revealed that valleys and areas close to the Chalous Road, one of the most important roads in Iran, were associated with high and very high levels of risk. The proposed multi-hazard exposure framework can be helpful in supporting decision making on mountain social-ecological systems facing multiple hazards.
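The random 70/30 split and the TSS criterion described above can be sketched briefly. The abstract does not spell out its TSS formula, so the common definition (sensitivity + specificity − 1) is assumed here; the function names `train_validation_split` and `true_skill_statistic`, the seed, and the confusion counts are likewise hypothetical.

```python
import random
from typing import List, Tuple


def train_validation_split(n_samples: int, train_fraction: float = 0.7,
                           seed: int = 0) -> Tuple[List[int], List[int]]:
    """Randomly partition sample indices into training and validation lists."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_train = round(train_fraction * n_samples)
    return indices[:n_train], indices[n_train:]


def true_skill_statistic(tp: int, fn: int, tn: int, fp: int) -> float:
    """TSS = sensitivity + specificity - 1, ranging from -1 to +1."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity + specificity - 1.0


# 133 is the flood-event count from the abstract; the counts below are invented.
flood_train, flood_val = train_validation_split(133)
toy_tss = true_skill_statistic(tp=18, fn=2, tn=16, fp=4)
print(len(flood_train), len(flood_val), round(toy_tss, 2))
```

A TSS of 0 corresponds to a model with no skill beyond chance, which makes it a useful complement to AUC when a single threshold must be chosen for mapping.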


Cancers ◽  
2020 ◽  
Vol 12 (11) ◽  
pp. 3406
Author(s):  
Elisabeth Bumes ◽  
Fro-Philip Wirtz ◽  
Claudia Fellner ◽  
Jirka Grosse ◽  
Dirk Hellwig ◽  
...  

Isocitrate dehydrogenase (IDH)-1 mutation is an important prognostic factor and a potential therapeutic target in glioma. Immunohistological and molecular diagnosis of IDH mutation status is invasive. To avoid tumor biopsy, dedicated spectroscopic techniques have been proposed to detect D-2-hydroxyglutarate (2-HG), the main metabolite of IDH, directly in vivo. However, these methods are technically challenging and not broadly available. Therefore, we explored the use of machine learning for the non-invasive, inexpensive and fast diagnosis of IDH status in standard 1H-magnetic resonance spectroscopy (1H-MRS). To this end, 30 of 34 consecutive patients with known or suspected glioma WHO grade II-IV were subjected to metabolic positron emission tomography (PET) imaging with O-(2-18F-fluoroethyl)-L-tyrosine (18F-FET) for optimized voxel placement in 1H-MRS. Routine 1H-magnetic resonance (1H-MR) spectra of tumor and contralateral healthy brain regions were acquired on a 3 Tesla magnetic resonance (3T-MR) scanner, prior to surgical tumor resection and molecular analysis of IDH status. Since 2-HG spectral signals were too overlapped for reliable discrimination of IDH mutated (IDHmut) and IDH wild-type (IDHwt) glioma, we used a nested cross-validation approach, whereby we trained a linear support vector machine (SVM) on the complete spectral information of the 1H-MRS data to predict IDH status. Using this approach, we predicted IDH status with an accuracy of 88.2%, a sensitivity of 95.5% (95% CI, 77.2–99.9%) and a specificity of 75.0% (95% CI, 42.9–94.5%), respectively. The area under the curve (AUC) amounted to 0.83. Subsequent ex vivo 1H-nuclear magnetic resonance (1H-NMR) measurements performed on metabolite extracts of resected tumor material (eight specimens) revealed myo-inositol (M-ins) and glycine (Gly) to be the major discriminators of IDH status. 
We conclude that our approach allows a reliable, non-invasive, fast and cost-effective prediction of IDH status in a standard clinical setting.

