Prediction of Radiation Pneumonitis With Machine Learning in Stage III Lung Cancer: A Pilot Study

Technology in Cancer Research & Treatment ◽

10.1177/15330338211016373 ◽

2021 ◽

Vol 20 ◽

pp. 153303382110163

Author(s):

Melek Yakar ◽

Durmus Etiz ◽

Muzaffer Metintas ◽

Guntulu Ak ◽

Ozer Celik

Keyword(s):

Machine Learning ◽

Lung Cancer ◽

Radiation Pneumonitis ◽

Stage Iii ◽

Gradient Boosting ◽

Support Vector ◽

Data Set ◽

Volume Number ◽

Light Gradient ◽

Extreme Gradient Boosting

Background: Radiation pneumonitis (RP) is a dose-limiting toxicity in lung cancer radiotherapy (RT). As risk factors in the development of RP, patient and tumor characteristics, dosimetric parameters, and treatment features are intertwined, and it is not always possible to associate RP with a single parameter. This study aimed to determine the algorithm that most accurately predicted RP development with machine learning. Methods: Of the 197 cases diagnosed with stage III lung cancer and underwent RT and chemotherapy between 2014 and 2020, 193 were evaluated. The CTCAE 5.0 grading system was used for the RP evaluation. Synthetic minority oversampling technique was used to create a balanced data set. Logistic regression, artificial neural networks, eXtreme Gradient Boosting (XGB), Support Vector Machines, Random Forest, Gaussian Naive Bayes and Light Gradient Boosting Machine algorithms were used. After the correlation analysis, a permutation-based method was utilized for as a variable selection. Results: RP was seen in 51 of the 193 cases. Parameters affecting RP were determined as, total(t)V5, ipsilateral lung Dmax, contralateral lung Dmax, total lung Dmax, gross tumor volume, number of chemotherapy cycles before RT, tumor size, lymph node localization and asbestos exposure. LGBM was found to be the algorithm that best predicted RP at 85% accuracy (confidence interval: 0.73-0.96), 97% sensitivity, and 50% specificity. Conclusion: When the clinical and dosimetric parameters were evaluated together, the LGBM algorithm had the highest accuracy in predicting RP. However, in order to use this algorithm in clinical practice, it is necessary to increase data diversity and the number of patients by sharing data between centers.

Get full-text (via PubEx)

Interpretable Machine Learning for Early Neurological Deterioration Prediction in Atrial Fibrillation-Related Stroke

10.21203/rs.3.rs-446890/v1 ◽

2021 ◽

Author(s):

Seong Hwan Kim ◽

Eun-Tae Jeon ◽

Sungwook Yu ◽

Kyungmi O ◽

Chi Kyung Kim ◽

...

Keyword(s):

Machine Learning ◽

Atrial Fibrillation ◽

Neurological Deterioration ◽

Gradient Boosting ◽

Support Vector ◽

Light Gradient ◽

Interpretable Machine Learning ◽

Extreme Gradient Boosting ◽

Early Neurological Deterioration ◽

Feature Importance

Abstract We aimed to develop a novel prediction model for early neurological deterioration (END) based on an interpretable machine learning (ML) algorithm for atrial fibrillation (AF)-related stroke and to evaluate the prediction accuracy and feature importance of ML models. Data from multi-center prospective stroke registries in South Korea were collected. After stepwise data preprocessing, we utilized logistic regression, support vector machine, extreme gradient boosting, light gradient boosting machine (LightGBM), and multilayer perceptron models. We used the Shapley additive explanations (SHAP) method to evaluate feature importance. Of the 3,623 stroke patients, the 2,363 who had arrived at the hospital within 24 hours of symptom onset and had available information regarding END were included. Of these, 318 (13.5%) had END. The LightGBM model showed the highest area under the receiver operating characteristic curve (0.778, 95% CI, 0.726 - 0.830). The feature importance analysis revealed that fasting glucose level and the National Institute of Health Stroke Scale score were the most influential factors. Among ML algorithms, the LightGBM model was particularly useful for predicting END, as it revealed new and diverse predictors. Additionally, the SHAP method can be adjusted to individualize the features’ effects on the predictive power of the model.

Get full-text (via PubEx)

Exploiting Rules to Enhance Machine Learning in Extracting Information From Multi-Institutional Prostate Pathology Reports

JCO Clinical Cancer Informatics ◽

10.1200/cci.20.00028 ◽

2020 ◽

pp. 865-874

Author(s):

Enrico Santus ◽

Tal Schuster ◽

Amir M. Tahmasebi ◽

Clara Li ◽

Adam Yala ◽

...

Keyword(s):

Machine Learning ◽

Hybrid Systems ◽

High Performance ◽

Feature Model ◽

Training Data ◽

Gradient Boosting ◽

Support Vector ◽

Data Set ◽

Extreme Gradient Boosting ◽

Pathology Reports

PURPOSE Literature on clinical note mining has highlighted the superiority of machine learning (ML) over hand-crafted rules. Nevertheless, most studies assume the availability of large training sets, which is rarely the case. For this reason, in the clinical setting, rules are still common. We suggest 2 methods to leverage the knowledge encoded in pre-existing rules to inform ML decisions and obtain high performance, even with scarce annotations. METHODS We collected 501 prostate pathology reports from 6 American hospitals. Reports were split into 2,711 core segments, annotated with 20 attributes describing the histology, grade, extension, and location of tumors. The data set was split by institutions to generate a cross-institutional evaluation setting. We assessed 4 systems, namely a rule-based approach, an ML model, and 2 hybrid systems integrating the previous methods: a Rule as Feature model and a Classifier Confidence model. Several ML algorithms were tested, including logistic regression (LR), support vector machine (SVM), and eXtreme gradient boosting (XGB). RESULTS When training on data from a single institution, LR lags behind the rules by 3.5% (F1 score: 92.2% v 95.7%). Hybrid models, instead, obtain competitive results, with Classifier Confidence outperforming the rules by +0.5% (96.2%). When a larger amount of data from multiple institutions is used, LR improves by +1.5% over the rules (97.2%), whereas hybrid systems obtain +2.2% for Rule as Feature (97.7%) and +2.6% for Classifier Confidence (98.3%). Replacing LR with SVM or XGB yielded similar performance gains. CONCLUSION We developed methods to use pre-existing handcrafted rules to inform ML algorithms. These hybrid systems obtain better performance than either rules or ML models alone, even when training data are limited.

Get full-text (via PubEx)

Interpretable Machine Learning Model to Predict Rupture of Small Intracranial Aneurysms and Facilitate Clinical Decision

10.21203/rs.3.rs-1015315/v1 ◽

2021 ◽

Author(s):

WeiGen Xiong ◽

TingTing Chen ◽

ZhiHong Zhao ◽

XueMei Li ◽

YaJie Shan ◽

...

Keyword(s):

Machine Learning ◽

Intracranial Aneurysms ◽

External Validation ◽

Maximum Size ◽

Clinical Decision ◽

Gradient Boosting ◽

Support Vector ◽

Rupture Risk ◽

Light Gradient ◽

Extreme Gradient Boosting

Abstract Estimating the rupture risk of small intracranial aneurysms (IAs) to determine whether to treat is difficult but crucial. We aimed to construct and external validation a convenient machine learning (ML) model for assessing the rupture risk of small IAs.1004 patients with small IAs recruited from two hospitals were included in our retrospective research. The patients at hospital 1 were stratified into training (70%) and internal validation set (30%) randomly, and the patients at hospital 2 were used for external validation. We selected predictive features using the least absolute shrinkage and selection operator (LASSO) method, and constructed five ML models applying diverse algorithms including random forest classifier (RFC), categorical boosting (CatBoost), support vector machine (SVM) with linear kernel, light gradient boosting machine (LightGBM) and extreme gradient boosting (XGBoost). The Shapley Additive Explanations (SHAP) analysis provided interpretation for the best ML model.The training, internal and external validation cohorts included 658, 282, and 64 IAs, respectively. The best performance was presented by SVM as AUC of 0.817 in the internal [95% confidence interval (CI), 0.769-0.866] and 0.893 in the external (95% CI, 0.808-0.979) validation cohorts, overperformed than the PHASES score significantly (all P < 0.001). SHAP analysis showed maximum size, location and irregular shape were the top three important features to predict rupture. Our SVM model based on readily accessible features presented satisfying ability of discrimination in predicting the rupture IAs with small size. Morphological parameters made important contributions to prediction result.

Get full-text (via PubEx)

A Predictive Mimicker of Fracture Behavior in Fiber Reinforced Concrete Using Machine Learning

Materials ◽

10.3390/ma14247669 ◽

2021 ◽

Vol 14 (24) ◽

pp. 7669

Author(s):

Sikandar Ali Khokhar ◽

Touqeer Ahmed ◽

Rao Arsalan Khushnood ◽

Syed Muhammad Ali ◽

Shahnawaz

Keyword(s):

Machine Learning ◽

Reinforced Concrete ◽

Fracture Behavior ◽

Fiber Reinforced Concrete ◽

Gradient Boosting ◽

Support Vector ◽

Fiber Reinforced ◽

Mixed Design ◽

Data Set ◽

Extreme Gradient Boosting

Due to the exceptional qualities of fiber reinforced concrete, its application is expanding day by day. However, its mixed design is mainly based on extensive experimentations. This study aims to construct a machine learning model capable of predicting the fracture behavior of all conceivable fiber reinforced concrete subclasses, especially strain hardening engineered cementitious composites. This study evaluates 15x input parameters that include the ingredients of the mixed design and the fiber properties. As a result, it predicts, for the first time, the post-peak fracture behavior of fiber-reinforced concrete matrices. Five machine learning models are developed, and their outputs are compared. These include artificial neural networks, the support vector machine, the classification and regression tree, the Gaussian process of regression, and the extreme gradient boosting tree. Due to the small size of the available dataset, this article employs a unique technique called the generative adversarial network to build a virtual data set to augment the data and improve accuracy. The results indicate that the extreme gradient boosting tree model has the lowest error and, therefore, the best mimicker in predicting fiber reinforced concrete properties. This article is anticipated to provide a considerable improvement in the recipe design of effective fiber reinforced concrete formulations.

Get full-text (via PubEx)

Interpretable machine learning for early neurological deterioration prediction in atrial fibrillation-related stroke

Scientific Reports ◽

10.1038/s41598-021-99920-7 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Seong-Hwan Kim ◽

Eun-Tae Jeon ◽

Sungwook Yu ◽

Kyungmi Oh ◽

Chi Kyung Kim ◽

...

Keyword(s):

Machine Learning ◽

Atrial Fibrillation ◽

Neurological Deterioration ◽

Gradient Boosting ◽

Support Vector ◽

Light Gradient ◽

Interpretable Machine Learning ◽

Extreme Gradient Boosting ◽

Early Neurological Deterioration ◽

Feature Importance

AbstractWe aimed to develop a novel prediction model for early neurological deterioration (END) based on an interpretable machine learning (ML) algorithm for atrial fibrillation (AF)-related stroke and to evaluate the prediction accuracy and feature importance of ML models. Data from multicenter prospective stroke registries in South Korea were collected. After stepwise data preprocessing, we utilized logistic regression, support vector machine, extreme gradient boosting, light gradient boosting machine (LightGBM), and multilayer perceptron models. We used the Shapley additive explanation (SHAP) method to evaluate feature importance. Of the 3,213 stroke patients, the 2,363 who had arrived at the hospital within 24 h of symptom onset and had available information regarding END were included. Of these, 318 (13.5%) had END. The LightGBM model showed the highest area under the receiver operating characteristic curve (0.772; 95% confidence interval, 0.715–0.829). The feature importance analysis revealed that fasting glucose level and the National Institute of Health Stroke Scale score were the most influential factors. Among ML algorithms, the LightGBM model was particularly useful for predicting END, as it revealed new and diverse predictors. Additionally, the effects of the features on the predictive power of the model were individualized using the SHAP method.

Get full-text (via PubEx)

Machine Learning in Prediction of Bladder Cancer on Clinical Laboratory Data

Diagnostics ◽

10.3390/diagnostics12010203 ◽

2022 ◽

Vol 12 (1) ◽

pp. 203

Author(s):

I-Jung Tsai ◽

Wen-Chi Shen ◽

Chia-Ling Lee ◽

Horng-Dar Wang ◽

Ching-Yu Lin

Keyword(s):

Machine Learning ◽

Bladder Cancer ◽

Clinical Laboratory ◽

Screening Method ◽

Laboratory Data ◽

Gradient Boosting ◽

Support Vector ◽

Urinary Cytology ◽

Light Gradient ◽

Extreme Gradient Boosting

Bladder cancer has been increasing globally. Urinary cytology is considered a major screening method for bladder cancer, but it has poor sensitivity. This study aimed to utilize clinical laboratory data and machine learning methods to build predictive models of bladder cancer. A total of 1336 patients with cystitis, bladder cancer, kidney cancer, uterus cancer, and prostate cancer were enrolled in this study. Two-step feature selection combined with WEKA and forward selection was performed. Furthermore, five machine learning models, including decision tree, random forest, support vector machine, extreme gradient boosting (XGBoost), and light gradient boosting machine (GBM) were applied. Features, including calcium, alkaline phosphatase (ALP), albumin, urine ketone, urine occult blood, creatinine, alanine aminotransferase (ALT), and diabetes were selected. The lightGBM model obtained an accuracy of 84.8% to 86.9%, a sensitivity 84% to 87.8%, a specificity of 82.9% to 86.7%, and an area under the curve (AUC) of 0.88 to 0.92 in discriminating bladder cancer from cystitis and other cancers. Our study provides a demonstration of utilizing clinical laboratory data to predict bladder cancer.

Get full-text (via PubEx)

In silico Prediction of Inhibitory Constant of Thrombin Inhibitors Using Machine Learning

Combinatorial Chemistry & High Throughput Screening ◽

10.2174/1386207322666181220130232 ◽

2019 ◽

Vol 21 (9) ◽

pp. 662-669 ◽

Cited By ~ 1

Author(s):

Junnan Zhao ◽

Lu Zhu ◽

Weineng Zhou ◽

Lingfeng Yin ◽

Yuchen Wang ◽

...

Keyword(s):

Machine Learning ◽

Prediction Models ◽

Regression Tree ◽

Large Data ◽

Thrombin Inhibitors ◽

Coagulation Cascade ◽

Gradient Boosting ◽

Support Vector ◽

Data Set ◽

Descriptor Selection

Background: Thrombin is the central protease of the vertebrate blood coagulation cascade, which is closely related to cardiovascular diseases. The inhibitory constant Ki is the most significant property of thrombin inhibitors. Method: This study was carried out to predict Ki values of thrombin inhibitors based on a large data set by using machine learning methods. Taking advantage of finding non-intuitive regularities on high-dimensional datasets, machine learning can be used to build effective predictive models. A total of 6554 descriptors for each compound were collected and an efficient descriptor selection method was chosen to find the appropriate descriptors. Four different methods including multiple linear regression (MLR), K Nearest Neighbors (KNN), Gradient Boosting Regression Tree (GBRT) and Support Vector Machine (SVM) were implemented to build prediction models with these selected descriptors. Results: The SVM model was the best one among these methods with R2=0.84, MSE=0.55 for the training set and R2=0.83, MSE=0.56 for the test set. Several validation methods such as yrandomization test and applicability domain evaluation, were adopted to assess the robustness and generalization ability of the model. The final model shows excellent stability and predictive ability and can be employed for rapid estimation of the inhibitory constant, which is full of help for designing novel thrombin inhibitors.

Get full-text (via PubEx)

Machine learning models to identify low adherence to influenza vaccination among Korean adults with cardiovascular disease

BMC Cardiovascular Disorders ◽

10.1186/s12872-021-01925-7 ◽

2021 ◽

Vol 21 (1) ◽

Author(s):

Moojung Kim ◽

Young Jae Kim ◽

Sung Jin Park ◽

Kwang Gi Kim ◽

Pyung Chun Oh ◽

...

Keyword(s):

Machine Learning ◽

Cardiovascular Disease ◽

Influenza Vaccination ◽

Machine Learning Techniques ◽

Gradient Boosting ◽

Support Vector ◽

Age Group ◽

Learning Models ◽

Extreme Gradient Boosting ◽

Machine Learning Models

Abstract Background Annual influenza vaccination is an important public health measure to prevent influenza infections and is strongly recommended for cardiovascular disease (CVD) patients, especially in the current coronavirus disease 2019 (COVID-19) pandemic. The aim of this study is to develop a machine learning model to identify Korean adult CVD patients with low adherence to influenza vaccination Methods Adults with CVD (n = 815) from a nationally representative dataset of the Fifth Korea National Health and Nutrition Examination Survey (KNHANES V) were analyzed. Among these adults, 500 (61.4%) had answered "yes" to whether they had received seasonal influenza vaccinations in the past 12 months. The classification process was performed using the logistic regression (LR), random forest (RF), support vector machine (SVM), and extreme gradient boosting (XGB) machine learning techniques. Because the Ministry of Health and Welfare in Korea offers free influenza immunization for the elderly, separate models were developed for the < 65 and ≥ 65 age groups. Results The accuracy of machine learning models using 16 variables as predictors of low influenza vaccination adherence was compared; for the ≥ 65 age group, XGB (84.7%) and RF (84.7%) have the best accuracies, followed by LR (82.7%) and SVM (77.6%). For the < 65 age group, SVM has the best accuracy (68.4%), followed by RF (64.9%), LR (63.2%), and XGB (61.4%). Conclusions The machine leaning models show comparable performance in classifying adult CVD patients with low adherence to influenza vaccination.

Get full-text (via PubEx)

Development and validation of a difficult laryngoscopy prediction model using machine learning of neck circumference and thyromental height

BMC Anesthesiology ◽

10.1186/s12871-021-01343-4 ◽

2021 ◽

Vol 21 (1) ◽

Author(s):

Jong Ho Kim ◽

Haewon Kim ◽

Ji Su Jang ◽

Sung Mi Hwang ◽

So Young Lim ◽

...

Keyword(s):

Machine Learning ◽

Random Forest ◽

Confidence Interval ◽

Neck Circumference ◽

Difficult Laryngoscopy ◽

Gradient Boosting ◽

Test Set ◽

Equal Distribution ◽

Light Gradient ◽

Extreme Gradient Boosting

Abstract Background Predicting difficult airway is challengeable in patients with limited airway evaluation. The aim of this study is to develop and validate a model that predicts difficult laryngoscopy by machine learning of neck circumference and thyromental height as predictors that can be used even for patients with limited airway evaluation. Methods Variables for prediction of difficulty laryngoscopy included age, sex, height, weight, body mass index, neck circumference, and thyromental distance. Difficult laryngoscopy was defined as Grade 3 and 4 by the Cormack-Lehane classification. The preanesthesia and anesthesia data of 1677 patients who had undergone general anesthesia at a single center were collected. The data set was randomly stratified into a training set (80%) and a test set (20%), with equal distribution of difficulty laryngoscopy. The training data sets were trained with five algorithms (logistic regression, multilayer perceptron, random forest, extreme gradient boosting, and light gradient boosting machine). The prediction models were validated through a test set. Results The model’s performance using random forest was best (area under receiver operating characteristic curve = 0.79 [95% confidence interval: 0.72–0.86], area under precision-recall curve = 0.32 [95% confidence interval: 0.27–0.37]). Conclusions Machine learning can predict difficult laryngoscopy through a combination of several predictors including neck circumference and thyromental height. The performance of the model can be improved with more data, a new variable and combination of models.

Get full-text (via PubEx)

Explainable machine learning can outperform Cox regression predictions and provide insights in breast cancer survival

Scientific Reports ◽

10.1038/s41598-021-86327-7 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Arturo Moncada-Torres ◽

Marissa C. van Maaren ◽

Mathijs P. Hendriks ◽

Sabine Siesling ◽

Gijs Geleijnse

Keyword(s):

Breast Cancer ◽

Machine Learning ◽

Explicit Knowledge ◽

Cox Regression ◽

Metastatic Breast ◽

Gradient Boosting ◽

Support Vector ◽

Netherlands Cancer Registry ◽

Extreme Gradient Boosting ◽

The Impact

AbstractCox Proportional Hazards (CPH) analysis is the standard for survival analysis in oncology. Recently, several machine learning (ML) techniques have been adapted for this task. Although they have shown to yield results at least as good as classical methods, they are often disregarded because of their lack of transparency and little to no explainability, which are key for their adoption in clinical settings. In this paper, we used data from the Netherlands Cancer Registry of 36,658 non-metastatic breast cancer patients to compare the performance of CPH with ML techniques (Random Survival Forests, Survival Support Vector Machines, and Extreme Gradient Boosting [XGB]) in predicting survival using the $$c$$ c -index. We demonstrated that in our dataset, ML-based models can perform at least as good as the classical CPH regression ($$c$$ c -index $$\sim \,0.63$$ ∼ 0.63 ), and in the case of XGB even better ($$c$$ c -index $$\sim 0.73$$ ∼ 0.73 ). Furthermore, we used Shapley Additive Explanation (SHAP) values to explain the models’ predictions. We concluded that the difference in performance can be attributed to XGB’s ability to model nonlinearities and complex interactions. We also investigated the impact of specific features on the models’ predictions as well as their corresponding insights. Lastly, we showed that explainable ML can generate explicit knowledge of how models make their predictions, which is crucial in increasing the trust and adoption of innovative ML techniques in oncology and healthcare overall.

Get full-text (via PubEx)