Development and Validation of a Novel Diagnostic Model for Initially Clinical Diagnosed Gastrointestinal Stromal Tumors Using An Extreme Gradient-Boosting Machine

Author(s):  
Bozhi Hu ◽  
Chao Wang ◽  
Kewei Jiang ◽  
Zhanlong Shen ◽  
Xiaodong Yang ◽  
...  

Abstract INTRODUCTION Gastrointestinal stromal tumor (GIST) is the most common gastrointestinal soft tissue tumor. Clinical diagnosis relies mainly on enhanced CT, endoscopy and endoscopic ultrasound (EUS), but the misdiagnosis rate remains high without fine needle aspiration biopsy. We aimed to develop a novel diagnostic model by analyzing the patients' preoperative data. METHODS We used data from patients who were initially diagnosed with gastric GIST and underwent partial gastrectomy. The patients were randomly divided into a training dataset and a test dataset at a ratio of 3 to 1. After pre-experimental screening, max depth = 2, eta = 0.1, gamma = 0.5, and nrounds = 200 were selected as the best parameters, and with these we developed the initial extreme gradient-boosting (XGBoost) model. Based on the importance of the features in the initial model, we improved the model by excluding the hematological features. This gave the final XGBoost model, which we validated using the test dataset. RESULTS In the initial XGBoost model, the hematological indicators (including inflammation and nutritional indicators) examined before surgery had little effect on the outcome, so we excluded them. Similarly, we screened the features from enhanced CT and ultrasound gastroscopy and finally determined the 6 most important predictors for GIST diagnosis: the ratio of long to short diameter on CT, the CT value of the tumor, the enhancement of the tumor in the arterial and venous phases, and the existence of liquid and calcific areas inside the tumor on EUS. Round or round-like tumors with a CT value of around 30 (25–37) and delayed enhancement, together with a liquid but not calcific area inside the tumor, best indicate the diagnosis of GIST.
CONCLUSIONS We developed a model to further differentiate GIST from other tumors in patients with an initial clinical diagnosis of gastric GIST by analyzing the results of clinical examinations that most patients should have completed before surgical resection.
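
The random 3-to-1 split and the reported parameter set can be sketched in plain Python. This is an illustration under stated assumptions, not the authors' code: the patient identifiers, the seed, and the helper name `split_3_to_1` are hypothetical, and the dictionary keys are an assumed spelling of the parameter names given in the abstract.

```python
import random

# Hyperparameters as reported in the abstract; the key spelling is an assumption.
XGB_PARAMS = {"max_depth": 2, "eta": 0.1, "gamma": 0.5, "nrounds": 200}


def split_3_to_1(items, seed=0):
    """Randomly split items into training and test lists at a 3:1 ratio."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = (len(shuffled) * 3) // 4  # first three quarters go to training
    return shuffled[:cut], shuffled[cut:]


patient_ids = list(range(100))  # hypothetical patient identifiers
train_ids, test_ids = split_3_to_1(patient_ids)
print(len(train_ids), len(test_ids))
```

Fixing the seed keeps the split reproducible, so a model refit after feature exclusion is evaluated on the same held-out patients.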

2021 ◽  
Vol 21 (1) ◽  


2020 ◽  
Vol 19 (1) ◽  
Author(s):  
Lulu Liu ◽  
Fangxiao Lu ◽  
Peipei Pang ◽  
Guoliang Shao

Abstract Background Anterior mediastinal cysts (AMC) are often misdiagnosed as thymomas and surgically resected, which causes unnecessary treatment and wastes medical resources. The purpose of this study is to explore the potential of computed tomography (CT)-based radiomics for differentiating AMC from type B1 and B2 thymomas. Methods A group of 188 patients with pathologically confirmed AMC (106 cases misdiagnosed as thymomas on CT) or thymomas (82 cases) who underwent routine chest CT from January 2010 to December 2018 were retrospectively analyzed. The lesions were manually delineated using ITK-SNAP software, and radiomics feature extraction was performed using the artificial intelligence kit (AK) software. A total of 180 tumour texture features were extracted from enhanced CT and unenhanced CT, respectively. The general test, correlation analysis, and LASSO were used for feature selection, and the radiomics signature (radscore) was then obtained. A combined model including the radscore and independent clinical factors was developed. Model performance was evaluated by discrimination and calibration curves. Results Two radscore models were constructed from the unenhanced and enhanced phases based on the selected four and three features, respectively. The AUC, sensitivity, and specificity of the enhanced radscore model were 0.928, 89.3%, and 83.8% in the training dataset and 0.899, 84.6%, and 87.5% in the test dataset (higher than the unenhanced radscore model). The combined model of enhanced CT including radiomics features and independent clinical factors yielded an AUC, sensitivity and specificity of 0.941, 82.1%, and 94.6% in the training dataset and 0.938, 92.3%, and 87.5% in the test dataset (higher than the unenhanced combined model and enhanced radscore model). Conclusions The study suggests that the combined model in enhanced CT is a potential tool to facilitate the differential diagnosis of AMC and type B1 and B2 thymomas.
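
The three discrimination measures reported above (AUC, sensitivity, specificity) can be computed from labels and model scores as in the following sketch. The labels, scores, and the 0.5 cut-off are invented toy values rather than study data, and the AUC is computed by the pairwise-ranking definition, which may differ from the software the authors used.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels coded 1 = positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def auc(y_true, scores):
    """AUC as the share of positive/negative pairs ranked correctly (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): 1 = thymoma, 0 = cyst.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed cut-off of 0.5

sens, spec = sensitivity_specificity(y_true, y_pred)
print(sens, spec, auc(y_true, scores))
```

Sensitivity and specificity depend on the chosen cut-off, whereas AUC summarizes ranking quality across all cut-offs, which is why abstracts usually report all three.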


2021 ◽  
Vol 3 (3) ◽  
pp. 63-72
Author(s):  
Wanjun Zhao ◽  

Background: We aimed to establish a novel diagnostic model for kidney diseases by combining artificial intelligence with complete mass spectrum information from urinary proteomics. Methods: We enrolled 134 patients (IgA nephropathy, membranous nephropathy, and diabetic kidney disease) and 68 healthy participants as controls, with a total of 610,102 mass spectra from their urinary proteomic profiles. The training data set (80%) was used to create a diagnostic model using XGBoost, random forest (RF), a support vector machine (SVM), and artificial neural networks (ANNs). The diagnostic accuracy was evaluated using a confusion matrix with a test dataset (20%). We also constructed receiver operating-characteristic, Lorenz, and gain curves to evaluate the diagnostic model. Results: Compared with the RF, SVM, and ANNs, the modified XGBoost model, called Kidney Disease Classifier (KDClassifier), showed the best performance. The accuracy of the XGBoost diagnostic model was 96.03%. The area under the curve of the extreme gradient boosting (XGBoost) model was 0.952 (95% confidence interval, 0.9307–0.9733). The Kolmogorov-Smirnov (KS) value of the Lorenz curve was 0.8514. The Lorenz and gain curves showed the strong robustness of the developed model. Conclusions: The KDClassifier achieved high accuracy and robustness and thus provides a potential tool for the classification of kidney diseases.
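
A KS value for a binary classifier is commonly taken as the largest gap between the cumulative true-positive and false-positive rates as the score threshold is lowered. The sketch below assumes that definition; the abstract does not say how its KS value was computed, and the labels and scores are invented toy values.

```python
def ks_statistic(y_true, scores):
    """Largest gap between true-positive and false-positive rates over all thresholds."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    best = 0.0
    for threshold in sorted(set(scores), reverse=True):
        tpr = sum(1 for t, s in zip(y_true, scores) if t == 1 and s >= threshold) / n_pos
        fpr = sum(1 for t, s in zip(y_true, scores) if t == 0 and s >= threshold) / n_neg
        best = max(best, tpr - fpr)
    return best


# Toy data (hypothetical): 1 = disease, 0 = healthy control.
ks_labels = [1, 1, 1, 0, 1, 0, 0, 0]
ks_scores = [0.95, 0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.1]
print(ks_statistic(ks_labels, ks_scores))
```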


2020 ◽  
Author(s):  
Lulu Liu ◽  
Fangxiao Lu ◽  
Peipei Pang ◽  
Guoliang Shao

Abstract Background Anterior mediastinal cysts (AMC) are often misdiagnosed as thymomas and surgically resected, which causes unnecessary treatment and wastes medical resources. The purpose of this study was to explore the potential of computed tomography (CT)-based radiomics for differentiating AMC from type B1 and B2 thymomas. Methods A group of 188 patients with pathologically confirmed AMC (106 cases misdiagnosed as thymomas on CT) or thymomas (82 cases) who underwent routine chest CT from January 2010 to December 2018 were retrospectively analyzed. The lesions were manually delineated using ITK-SNAP software, and radiomics feature extraction was performed using the Artificial Intelligence Kit (AK) software. A total of 396 tumor texture features were extracted from enhanced CT and unenhanced CT, respectively. The general test, correlation analysis and LASSO were used for feature selection, and the radiomics signature (radscore) was then obtained. A combined model including the radscore and independent clinical factors was developed. Model performance was evaluated by discrimination and calibration curves. Results Two radscore models were constructed from the unenhanced and enhanced phases based on the selected 4 and 3 features, respectively. The AUC, sensitivity, and specificity of the enhanced radscore model were 0.928, 89.3%, and 83.8% in the training dataset and 0.899, 84.6%, and 87.5% in the test dataset (higher than the unenhanced radscore model). The combined model of enhanced CT including radiomics features and independent clinical factors yielded an AUC, sensitivity and specificity of 0.941, 82.1%, and 94.6% in the training dataset and 0.938, 92.3%, and 87.5% in the test dataset (higher than the unenhanced combined model and enhanced radscore model). Conclusions The study suggests that the combined model in enhanced CT is a potential tool to facilitate the differential diagnosis of AMC and type B1 and B2 thymomas.


2021 ◽  
Vol 5 (2) ◽  
pp. 377-395
Author(s):  
Iqbal Hanif ◽  
Regita Fachri Septiani

Rating is one of the most frequently used metrics in the television industry to evaluate television programs or channels. This research attempts to develop a prediction model of television program ratings using rating data gathered from UseeTV (an internet-based television service from Telkom Indonesia). Two machine learning methods (Random Forest and Extreme Gradient Boosting) were trained on rating data from 20 television programs collected from January 2018 to August 2019 (training dataset) and evaluated using September 2019 rating data (testing dataset). The results show that Random Forest performs better than Extreme Gradient Boosting on the evaluation metrics Root Mean Square Error (RMSE), Mean Absolute Error (MAE), and Mean Absolute Percentage Error (MAPE). On the training dataset, Random Forest produced lower RMSE and MAE scores than Extreme Gradient Boosting in all programs, while on the testing dataset, Random Forest produced lower RMSE and MAE scores in 16 programs. According to the MAPE score, Random Forest produced more good-quality predictions (4 programs in the training dataset, 16 programs in the testing dataset) than Extreme Gradient Boosting (1 program in the training dataset, 12 programs in the testing dataset).
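
The three error metrics named above have standard definitions, sketched here in plain Python. The ratings are invented toy values, not UseeTV data, and MAPE is expressed as a percentage and assumes no actual value is zero.

```python
import math


def rmse(actual, predicted):
    """Root mean square error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean absolute percentage error, in percent; actual values must be non-zero."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


# Toy ratings (hypothetical).
actual_ratings = [2.0, 4.0, 5.0]
predicted_ratings = [1.0, 4.0, 7.0]
print(rmse(actual_ratings, predicted_ratings))
print(mae(actual_ratings, predicted_ratings))
print(mape(actual_ratings, predicted_ratings))
```

RMSE penalizes large misses more heavily than MAE, while MAPE is scale-free, so the three can rank two models differently for the same program.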


2021 ◽  
Vol 11 (9) ◽  
pp. 863
Author(s):  
Jeong-Myeong Choi ◽  
Soo-Young Seo ◽  
Pum-Jun Kim ◽  
Yu-Seop Kim ◽  
Sang-Hwa Lee ◽  
...  

Hemorrhagic transformation (HT) is one of the leading causes of poor prognosis after acute ischemic stroke (AIS). We compared the performance of several machine learning (ML) algorithms in predicting HT after AIS using only structured data. A total of 2028 patients with AIS who were admitted within seven days of symptom onset were included in this analysis. HT was defined based on the criteria of the European Co-operative Acute Stroke Study-II trial. The whole dataset was randomly divided into a training and a test dataset at a 7:3 ratio. Binary logistic regression, support vector machine, extreme gradient boosting, and artificial neural network (ANN) algorithms were used to predict the occurrence of HT after AIS. Five-fold cross-validation and a grid search technique were used to optimize the hyperparameters of each ML model, and performance was measured by the area under the receiver operating characteristic (AUROC) curve. Among the included AIS patients, the mean age was 69.6 years and 1183 (58.3%) were male. HT was observed in 318 subjects (15.7%). There were no significant differences in corresponding variables between the training and test datasets. Among all the ML algorithms, the ANN algorithm showed the best performance in predicting the occurrence of HT in our dataset (AUROC 0.844). Feature scaling, including standardization and normalization, and the resampling strategy did not further improve the ANN's performance. The ANN-based prediction of HT after AIS showed better performance than the conventional ML algorithms. Deep learning may be used to predict important outcomes from structured data.
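
Five-fold cross-validation combined with a grid search amounts to evaluating every hyperparameter combination on every fold and keeping the combination with the best average score. The sketch below shows only the bookkeeping; the grid values, sample count, and function name are hypothetical, since the abstract does not list the values searched.

```python
import itertools
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, val_idx) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k nearly equal, disjoint folds
    for i in range(k):
        val_idx = folds[i]
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, val_idx


# Hypothetical grid; every combination would be scored on every fold.
grid = {"learning_rate": [0.01, 0.1], "hidden_units": [8, 16, 32]}
candidates = [dict(zip(grid, values)) for values in itertools.product(*grid.values())]

folds = list(k_fold_indices(20, k=5))
print(len(candidates), len(folds))
```

In a full pipeline each candidate would be fitted on `train_idx` and scored (for example by AUROC) on `val_idx`, with the held-out test dataset touched only once at the end.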


Heart ◽  
2021 ◽  
pp. heartjnl-2020-318726 ◽  
Author(s):  
Takahiro Nakashima ◽  
Soshiro Ogata ◽  
Teruo Noguchi ◽  
Yoshio Tahara ◽  
Daisuke Onozuka ◽  
...  

Objectives To evaluate a predictive model for robust estimation of daily out-of-hospital cardiac arrest (OHCA) incidence using a suite of machine learning (ML) approaches and high-resolution meteorological and chronological data. Methods In this population-based study, we combined an OHCA nationwide registry and high-resolution meteorological and chronological datasets from Japan. We developed a model to predict daily OHCA incidence with a training dataset for 2005–2013 using the eXtreme Gradient Boosting algorithm. A dataset for 2014–2015 was used to test the predictive model. The main outcome was the accuracy of the predictive model for the number of daily OHCA events, based on mean absolute error (MAE) and mean absolute percentage error (MAPE). In general, a model with MAPE less than 10% is considered highly accurate. Results Among the 1 299 784 OHCA cases, 661 052 OHCA cases of cardiac origin (525 374 cases in the training dataset on which fourfold cross-validation was performed and 135 678 cases in the testing dataset) were included in the analysis. Compared with the ML models using meteorological or chronological variables alone, the ML model with combined meteorological and chronological variables had the highest predictive accuracy in the training (MAE 1.314 and MAPE 7.007%) and testing datasets (MAE 1.547 and MAPE 7.788%). Sunday, Monday, holiday, winter, low ambient temperature and large interday or intraday temperature difference were more strongly associated with OHCA incidence than the other meteorological and chronological variables. Conclusions An ML predictive model using comprehensive daily meteorological and chronological data allows for highly precise estimates of OHCA incidence.


2021 ◽  
Vol 9 (5) ◽  
pp. 409-409
Author(s):  
Xiaohan Tang ◽  
Rui Tang ◽  
Xingzhi Sun ◽  
Xiang Yan ◽  
Gan Huang ◽  
...  

2021 ◽  
Vol 12 ◽  
Author(s):  
Jun-ichi Takeda ◽  
Sae Fukami ◽  
Akira Tamura ◽  
Akihide Shibata ◽  
Kinji Ohno

Prediction of the effect of a single-nucleotide variant (SNV) in an intronic region on aberrant pre-mRNA splicing is challenging, except for an SNV affecting the canonical GU/AG splice sites (ss). To predict the pathogenicity of SNVs at intronic positions −50 (Int-50) to −3 (Int-3) close to the 3’ ss, we developed light gradient boosting machine (LightGBM)-based IntSplice2 models using pathogenic SNVs in the Human Gene Mutation Database (HGMD) and ClinVar and common SNVs in dbSNP with 0.01 ≤ minor allele frequency (MAF) < 0.50. The LightGBM models were generated using features representing splicing cis-elements. The average recall/sensitivity and specificity of IntSplice2 by fivefold cross-validation (CV) of the training dataset were 0.764 and 0.884, respectively. The recall/sensitivity of IntSplice2 was lower than the average recall/sensitivity of 0.800 of IntSplice, which we previously built with support vector machine (SVM) modeling for the same intronic positions. In contrast, the specificity of IntSplice2 was higher than the average specificity of 0.849 of IntSplice. For benchmarking (BM) of IntSplice2 against IntSplice, we made a test dataset that was not used to train IntSplice. After excluding the test dataset from the training dataset, we generated IntSplice2-BM and compared it with IntSplice using the test dataset. IntSplice2-BM was superior to IntSplice in all seven statistical measures: accuracy, precision, recall/sensitivity, specificity, F1 score, negative predictive value (NPV), and Matthews correlation coefficient (MCC). We made the IntSplice2 web service available at https://www.med.nagoya-u.ac.jp/neurogenetics/IntSplice2.
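
All seven benchmarking measures listed above derive from the four cells of a binary confusion matrix. The sketch below uses the standard formulas; the counts are invented for illustration and are not IntSplice2 results, and the function name is hypothetical.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Compute seven standard measures from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,  # also called sensitivity
        "specificity": tn / (tn + fp),
        "f1": 2 * precision * recall / (precision + recall),
        "npv": tn / (tn + fn),
        "mcc": (tp * tn - fp * fn) / mcc_denominator,
    }


# Hypothetical counts: 45 pathogenic and 55 benign variants.
metrics = binary_metrics(tp=40, fp=10, tn=45, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

MCC uses all four cells, so it stays informative when the pathogenic and benign classes are imbalanced, a case in which accuracy alone can mislead.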

