Impact of feature selection methods and subgroup factors on prognostic analysis with CT-based radiomics in non-small cell lung cancer patients

2021 ◽  
Vol 16 (1) ◽  
Author(s):  
Yuto Sugai ◽  
Noriyuki Kadoya ◽  
Shohei Tanaka ◽  
Shunpei Tanabe ◽  
Mariko Umeda ◽  
...  

Abstract Background Radiomics is a new technology to noninvasively predict survival prognosis with quantitative features extracted from medical images. Most radiomics-based prognostic studies of non-small-cell lung cancer (NSCLC) patients have used mixed datasets of different subgroups. Therefore, we investigated the radiomics-based survival prediction of NSCLC patients by focusing on subgroups with identical characteristics. Methods A total of 304 NSCLC (Stages I–IV) patients treated with radiotherapy in our hospital were included. We extracted 107 radiomic features (i.e., 14 shape features, 18 first-order statistical features, and 75 texture features) from the gross tumor volume drawn on the free-breathing planning computed tomography image. Three feature selection methods [i.e., test–retest and multiple segmentation (FS1), Pearson's correlation analysis (FS2), and a method that combined FS1 and FS2 (FS3)] were used to clarify how they affect survival prediction performance. Subgroup analysis for each histological subtype and each T stage used the selection method that performed best in the analysis of All data. We used a least absolute shrinkage and selection operator Cox regression model for all analyses and evaluated prognostic performance using the concordance index (C-index) and the Kaplan–Meier method. For subgroup analysis, fivefold cross-validation was applied to ensure model reliability. Results In the analysis of All data, the C-index for the test dataset was 0.62 (FS1), 0.63 (FS2), and 0.62 (FS3). The subgroup analysis indicated that the prediction model based on specific histological subtypes and T stages had a higher C-index for the test dataset than that based on All data (All data, 0.64 vs. SCCall, 0.60; ADCall, 0.69; T1, 0.68; T2, 0.65; T3, 0.66; T4, 0.70).
In addition, the prediction models unified for each T stage in histological subtype showed a different trend in the C-index for the test dataset between ADC-related and SCC-related models (ADCT1–ADCT4, 0.72–0.83; SCCT1–SCCT4, 0.58–0.71). Conclusions Our results showed that feature selection methods moderately affected the survival prediction performance. In addition, prediction models based on specific subgroups may improve the prediction performance. These results may prove useful for determining the optimal radiomics-based prediction model.
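The C-index reported above can be illustrated with a minimal sketch of Harrell's concordance index for right-censored data. This is not the authors' code: the function name `concordance_index` and the toy inputs are assumptions for illustration, and pairs with tied follow-up times are simply skipped, which is a simplification.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Harrell's C-index for right-censored survival data.

    times: observed follow-up times
    events: 1 if the event was observed, 0 if the patient was censored
    risk_scores: higher score means higher predicted risk (shorter survival)
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` is the patient with the shorter follow-up.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # Simplification: skip tied times. A pair is comparable only when
        # the shorter follow-up ended in an observed event.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0  # the earlier event had the higher predicted risk
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data, not taken from the study.
toy_times = [5, 10, 15, 20]
toy_events = [1, 1, 0, 1]
toy_risk = [0.9, 0.2, 0.4, 0.6]
toy_c_index = concordance_index(toy_times, toy_events, toy_risk)
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which is why test-set values of 0.62 to 0.70 indicate modest discrimination.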

Cancers ◽  
2021 ◽  
Vol 13 (16) ◽  
pp. 4030
Author(s):  
Chien-Yi Liao ◽  
Cheng-Chia Lee ◽  
Huai-Che Yang ◽  
Ching-Jen Chen ◽  
Wen-Yuh Chung ◽  
...  

Brain metastasis (BM) is commonly observed in non-small cell lung cancer (NSCLC) and is associated with poor outcomes. Accordingly, developing an approach for early prediction of BM response to Gamma Knife radiosurgery (GKRS) may benefit patient treatment and monitoring. A total of 237 NSCLC patients with BMs (for survival prediction) and 256 patients with 976 BMs (for prediction of local tumor control) treated with GKRS were retrospectively analyzed. All the survival data were recorded without censoring, and the status of local tumor control was determined by comparing the last follow-up MRI in each patient's life with the pre-GKRS MRI. Overall, 1763 radiomic features were extracted from pre-radiosurgical magnetic resonance images. Three prediction models were constructed, using (1) clinical data, (2) radiomic features, and (3) clinical and radiomic features. Support vector machine models were constructed with a 30% hold-out validation approach. For treatment outcome predictions, the models derived from both the clinical and radiomics data achieved the best results. For local tumor control, the combined model achieved an area under the curve (AUC) of 0.95, an accuracy of 90%, a sensitivity of 91%, and a specificity of 89%. For patient survival, the combined model achieved an AUC of 0.81, an accuracy of 77%, a sensitivity of 78%, and a specificity of 80%. The pre-radiosurgical radiomics data enhanced the performance of local tumor control and survival prediction models in NSCLC patients with BMs treated with GKRS. An outcome prediction model based on radiomics combined with clinical features may guide therapy in these patients.
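For readers less familiar with the accuracy, sensitivity, and specificity figures quoted above, the following minimal sketch shows how they follow from a binary confusion matrix. The function name `binary_metrics` and the toy labels are assumptions for illustration only; the sketch also assumes both classes are present, otherwise a denominator would be zero.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, sensitivity, and specificity for 0/1 labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),  # share of true positives detected
        "specificity": tn / (tn + fp),  # share of true negatives detected
    }


# Hypothetical labels for a held-out set, not data from the study.
toy_truth = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
toy_predicted = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
toy_metrics = binary_metrics(toy_truth, toy_predicted)
```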


2021 ◽  
Vol 9 ◽  
Author(s):  
Guiyuan Xiang ◽  
Lingna Gu ◽  
Xuan Chen ◽  
Fan Wang ◽  
Bohua Chen ◽  
...  

Background: As the first domestic PD-1 antibody approved for lung cancer in China, camrelizumab has exhibited proven effectiveness for non-small-cell lung cancer (NSCLC) patients. However, the cost-effectiveness of this new regimen remains to be investigated. Objective: To evaluate the cost-effectiveness of camrelizumab combination therapy vs. chemotherapy for previously untreated patients with advanced, non-squamous NSCLC without ALK or EGFR genomic aberrations from the perspective of China's healthcare system. Methods: Based on the CameL trial, the study developed a three-health-state Markov model to evaluate the cost-effectiveness of adding camrelizumab to chemotherapy compared to chemotherapy alone in NSCLC patients. The analyses were conducted for patients unselected by PD-L1 tumor expression (the base case) and the patient subgroup with PD-L1-expressing tumors (≥1%). Primary model outcomes included the costs in US dollars and health outcomes in quality-adjusted life-years (QALYs) as well as the incremental cost-effectiveness ratio (ICER) under a willingness-to-pay threshold of $31,500 per QALY. Additionally, a scenario analysis that adjusted for within-trial crossover was employed to evaluate camrelizumab combination therapy compared to chemotherapy without subsequent use of PD-1/PD-L1 antibodies. Results: Camrelizumab combination therapy was more costly and provided an additional 0.11 QALYs over chemotherapy in the base case analysis (0.86 vs. 0.75 QALYs), 0.12 QALYs over chemotherapy in the subgroup analysis (0.99 vs. 0.88 QALYs), and 0.34 QALYs over chemotherapy in the scenario analysis (0.86 vs. 0.52 QALYs). Correspondingly, the ICER was $63,080 per QALY, $46,311 per QALY, and $30,591 per QALY in the base case, the subgroup, and the scenario analysis, respectively. One-way sensitivity analyses revealed that ICERs of the base case and the subgroup analysis were most sensitive to the cost of camrelizumab and the cost of pemetrexed.
In addition, the base case and subgroup analysis were more sensitive to the risk of decreased neutrophil count in the camrelizumab group and the utility of stable disease, respectively. Conclusion: Although camrelizumab combination therapy is not cost-effective as first-line therapy for NSCLC patients in China in the base case, adjusting for within-trial crossover would move the treatment regimen toward cost-effectiveness in the scenario analysis.
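The mechanics of a three-state Markov cohort model and the ICER can be sketched as follows. This is an illustrative sketch only: every transition probability, cost, and utility below is hypothetical and is not taken from the CameL trial or from the study, the state names are an assumed labeling of the three health states, and discounting and half-cycle correction are omitted for brevity.

```python
def run_markov(transition, cycle_cost, utility, n_cycles, cycle_years):
    """Simulate a cohort through a 3-state Markov model.

    States are ordered [progression-free, progressed, dead].
    Returns (expected cost, expected QALYs) per patient, undiscounted.
    """
    state = [1.0, 0.0, 0.0]  # the whole cohort starts progression-free
    total_cost = 0.0
    total_qalys = 0.0
    for _ in range(n_cycles):
        # Accrue cost and utility for the share of the cohort in each state.
        total_cost += sum(s * c for s, c in zip(state, cycle_cost))
        total_qalys += sum(s * u for s, u in zip(state, utility)) * cycle_years
        # Move the cohort forward by one cycle.
        state = [sum(state[i] * transition[i][j] for i in range(3)) for j in range(3)]
    return total_cost, total_qalys


def icer(cost_new, qalys_new, cost_old, qalys_old):
    """Incremental cost-effectiveness ratio: extra cost per extra QALY."""
    return (cost_new - cost_old) / (qalys_new - qalys_old)


# Hypothetical per-cycle inputs, NOT taken from the CameL trial or the paper.
CHEMO = [[0.85, 0.10, 0.05], [0.00, 0.85, 0.15], [0.00, 0.00, 1.00]]
COMBO = [[0.90, 0.07, 0.03], [0.00, 0.85, 0.15], [0.00, 0.00, 1.00]]
UTILITY = [0.80, 0.60, 0.00]

cost_chemo, qalys_chemo = run_markov(CHEMO, [1000.0, 500.0, 0.0], UTILITY, 40, 0.25)
cost_combo, qalys_combo = run_markov(COMBO, [3000.0, 500.0, 0.0], UTILITY, 40, 0.25)
ratio = icer(cost_combo, qalys_combo, cost_chemo, qalys_chemo)
# Compare against the willingness-to-pay threshold quoted in the abstract.
is_cost_effective = ratio <= 31500
```

Applied to the reported results, the same comparison explains the conclusion: the base case ($63,080 per QALY) and subgroup ($46,311 per QALY) ICERs exceed the $31,500 threshold, whereas the scenario analysis ($30,591 per QALY) falls below it.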


10.2196/15601 ◽  
2019 ◽  
Vol 7 (4) ◽  
pp. e15601 ◽  
Author(s):  
Quazi Abidur Rahman ◽  
Tahir Janmohamed ◽  
Hance Clarke ◽  
Paul Ritvo ◽  
Jane Heffernan ◽  
...  

Background Pain volatility is an important factor in chronic pain experience and adaptation. Previously, we employed machine-learning methods to define and predict pain volatility levels from users of the Manage My Pain app. Reducing the number of features is important to help increase interpretability of such prediction models. Prediction results also need to be consolidated from multiple random subsamples to address the class imbalance issue. Objective This study aimed to: (1) increase the interpretability of previously developed pain volatility models by identifying the most important features that distinguish high from low volatility users; and (2) consolidate prediction results from models derived from multiple random subsamples while addressing the class imbalance issue. Methods A total of 132 features were extracted from the first month of app use to develop machine learning–based models for predicting pain volatility at the sixth month of app use. Three feature selection methods were applied to identify features that were significantly better predictors than other members of the large features set used for developing the prediction models: (1) Gini impurity criterion; (2) information gain criterion; and (3) Boruta. We then combined the three groups of important features determined by these algorithms to produce the final list of important features. Three machine learning methods were then employed to conduct prediction experiments using the selected important features: (1) logistic regression with ridge estimators; (2) logistic regression with least absolute shrinkage and selection operator; and (3) random forests. Multiple random under-sampling of the majority class was conducted to address class imbalance in the dataset. Subsequently, a majority voting approach was employed to consolidate prediction results from these multiple subsamples. The total number of users included in this study was 879, with a total number of 391,255 pain records. 
Results A threshold of 1.6 was established using clustering methods to differentiate between 2 classes: low volatility (n=694) and high volatility (n=185). The overall prediction accuracy is approximately 70% for both random forests and logistic regression models when using 132 features. Overall, 9 important features were identified using 3 feature selection methods. Of these 9 features, 2 are from the app use category and the other 7 are related to pain statistics. After consolidating models that were developed using random subsamples by majority voting, logistic regression models performed equally well using 132 or 9 features. Random forests performed better than logistic regression methods in predicting the high volatility class. The consolidated accuracy of random forests does not drop significantly (601/879; 68.4% vs 618/879; 70.3%) when only 9 important features are included in the prediction model. Conclusions We employed feature selection methods to identify important features in predicting future pain volatility. To address class imbalance, we consolidated models that were developed using multiple random subsamples by majority voting. Reducing the number of features did not result in a significant decrease in the consolidated prediction accuracy.
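The combination of repeated random under-sampling and majority voting described above can be sketched as follows. The helper names `balanced_subsample` and `majority_vote` are hypothetical, and the classifier that would be trained on each subsample is left out; the sketch only shows how balanced subsamples are drawn and how per-model predictions are consolidated. Using an odd number of models avoids tied votes in the two-class case.

```python
import random
from collections import Counter


def balanced_subsample(labels, rng):
    """Return sorted indices of a subsample in which every class is randomly
    cut down to the size of the smallest class."""
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    n_min = min(len(indices) for indices in by_class.values())
    chosen = []
    for indices in by_class.values():
        chosen.extend(rng.sample(indices, n_min))
    return sorted(chosen)


def majority_vote(predictions_per_model):
    """Consolidate predictions: one list of predictions per model in,
    one list with the most frequent vote per user out."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions_per_model)]


# Class sizes from the abstract: 694 low-volatility and 185 high-volatility users.
labels = [0] * 694 + [1] * 185
subsample = balanced_subsample(labels, random.Random(0))

# Hypothetical predictions from three models for three users.
consolidated = majority_vote([[1, 0, 1], [1, 1, 0], [0, 0, 1]])
```

Each balanced subsample here contains 185 users per class, so a model trained on it is not dominated by the low-volatility majority.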


2021 ◽  
Vol 348 ◽  
pp. 01002
Author(s):  
Assia Najm ◽  
Abdelali Zakrani ◽  
Abdelaziz Marzak

Software cost prediction is a crucial element of a project's success because it helps project managers efficiently estimate the effort a project needs. The literature offers many machine learning methods for this task, such as decision trees, artificial neural networks (ANN), and support vector regressors (SVR). However, many studies confirm that accurate estimation depends greatly on hyperparameter optimization and on proper input feature selection, both of which strongly affect the accuracy of software cost prediction models (SCPM). In this paper, we propose an enhanced model using SVR and the Optainet algorithm. Optainet is used simultaneously for (1) selecting the best set of features and (2) tuning the parameters of the SVR model. The experimental evaluation was conducted using a 30% holdout over seven datasets. The performance of the suggested model is then compared to the SVR model tuned using Optainet without feature selection. The results were also compared to the Boruta and random forest feature selection methods. The experiments show that, across all datasets, the Optainet-based method significantly improves the accuracy of the SVR model and outperforms the random forest and Boruta feature selection methods.


2021 ◽  
Vol 12 ◽  
Author(s):  
Nasim Vahabi ◽  
Caitrin W. McDonough ◽  
Ankit A. Desai ◽  
Larisa H. Cavallari ◽  
Julio D. Duarte ◽  
...  

Background The development of high-throughput techniques has enabled profiling a large number of biomolecules across a number of molecular compartments. The challenge then becomes to integrate such multimodal Omics data to gain insights into biological processes and disease onset and progression mechanisms. Further, given the high dimensionality of such data, incorporating prior biological information on interactions between molecular compartments when developing statistical models for data integration is beneficial, especially in settings involving a small number of samples. Results We develop a supervised model for time to event data (e.g., death, biochemical recurrence) that simultaneously accounts for redundant information within Omics profiles and leverages prior biological associations between them through a multi-block PLS framework. The interactions between data from different molecular compartments (e.g., epigenome, transcriptome, methylome, etc.) were captured by using cis-regulatory quantitative effects in the proposed model. The model, coined Cox-sMBPLS, exhibits superior prediction performance and improved feature selection based on both simulation studies and analysis of data from heart failure patients. Conclusion The proposed supervised Cox-sMBPLS model can effectively incorporate prior biological information in the survival prediction system, leading to improved prediction performance and feature selection. It also enables the identification of multi-Omics modules of biomolecules that impact the patients’ survival probability and also provides insights into potential relevant risk factors that merit further investigation.


Author(s):  
Thị Minh Phương Hà ◽  
Thi My Hanh Le ◽  
Thanh Binh Nguyen

The rapid growth of data has become a huge challenge for software systems. The quality of a fault prediction model depends on the quality of the software dataset. High dimensionality is a major problem that affects the performance of fault prediction models. To deal with the dimensionality problem, various researchers have proposed feature selection. Feature selection provides an effective solution by eliminating irrelevant and redundant features, reducing computation time, and improving the accuracy of the machine learning model. In this study, we review and synthesize filter-based feature selection with several search methods and algorithms. In addition, five filter-based feature selection methods are analyzed using five different classifiers over datasets obtained from the National Aeronautics and Space Administration (NASA) repository. The experimental results show that the Chi-Square and Information Gain methods had the best influence on the results of the predictive models compared with the other filter ranking methods.
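As an illustration of one of the filter ranking criteria named above, the following minimal sketch computes information gain for a discrete feature. The function names and toy data are assumptions, not the authors' implementation; continuous software metrics would need to be binned before this calculation applies.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((count / n) * math.log2(count / n) for count in Counter(labels).values())


def information_gain(feature_values, labels):
    """Reduction in label entropy obtained by splitting on a discrete feature."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    # Entropy of the labels that remains after the split, weighted by group size.
    conditional = sum(len(group) / n * entropy(group) for group in groups.values())
    return entropy(labels) - conditional


# Hypothetical toy data: 1 marks a faulty module.
faulty = [0, 0, 1, 1]
informative_feature = [0, 0, 1, 1]    # separates the classes perfectly
uninformative_feature = [0, 1, 0, 1]  # carries no information about the class
gain_high = information_gain(informative_feature, faulty)
gain_low = information_gain(uninformative_feature, faulty)
```

A filter method ranks all features by such a score and keeps the top-ranked ones, independently of the classifier that is trained afterwards.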


2021 ◽  
Vol 11 ◽  
Author(s):  
Zongzhen He ◽  
Junying Zhang ◽  
Xiguo Yuan ◽  
Yuanyuan Zhang

Breast cancer is the most common malignancy in women, and because it has a high mortality rate, it is urgent to develop computational methods that increase the accuracy of breast cancer survival prediction models. Although multi-omics data such as gene expression have been extensively used in recent studies, the accurate prognosis of breast cancer remains a challenge. Somatic mutations are another important and promising data source for studying cancer development, and their effect on the prognosis of breast cancer remains to be further explored. Meanwhile, these omics datasets are high-dimensional and redundant. Therefore, we adopted multiple kernel learning (MKL) to efficiently integrate somatic mutation data with current molecular data, including gene expression, copy number variation (CNV), methylation, and protein expression data, for the prediction of breast cancer survival. Before integration, the maximum relevance minimum redundancy (mRMR) feature selection method was used to select, for each type of data, features that present high relevance to survival and low redundancy among themselves. The experimental results demonstrated that the proposed method achieved the best performance and that prediction performance improved remarkably when somatic mutations were included, indicating that somatic mutations are critical for improving breast cancer survival predictions. Moreover, mRMR was superior to other feature selection methods used in previous studies. Furthermore, MKL outperformed the other traditional classifiers in multi-omics data integration. Our analysis indicated that by employing promising omics data such as somatic mutations and harnessing proper feature selection methods and effective integration frameworks, the accuracy of breast cancer survival prediction can be further increased, thereby supporting better clinical diagnosis and more effective treatment for breast cancer patients.
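The greedy logic behind maximum relevance minimum redundancy selection can be sketched as follows. Note the swapped-in measure: mRMR is usually defined with mutual information, whereas this sketch uses absolute Pearson correlation as a simple stand-in for both relevance and redundancy. The function names, the numeric target, and the toy features are all hypothetical and do not reproduce the authors' pipeline.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y) if sd_x and sd_y else 0.0


def mrmr(features, target, k):
    """Greedily pick k feature names, trading relevance to the target
    against average redundancy with the features already selected."""
    relevance = {name: abs(pearson(col, target)) for name, col in features.items()}
    selected = []
    remaining = list(features)

    def score(name):
        if not selected:
            return relevance[name]
        redundancy = sum(abs(pearson(features[name], features[s])) for s in selected)
        return relevance[name] - redundancy / len(selected)

    while remaining and len(selected) < k:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical toy data: "b" duplicates "a" (scaled), "c" is weaker but independent.
toy_features = {
    "a": [1, -1, 1, -1],
    "b": [2, -2, 2, -2],
    "c": [1, 1, -1, -1],
}
toy_target = [3, -1, 1, -3]
toy_selection = mrmr(toy_features, toy_target, 2)
```

In this toy case a plain relevance ranking would keep both `a` and `b`, while the redundancy penalty makes `c` the second pick because it adds information that the first pick lacks.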


Mathematics ◽  
2021 ◽  
Vol 9 (11) ◽  
pp. 1244
Author(s):  
Lin Hao ◽  
Juncheol Kim ◽  
Sookhee Kwon ◽  
Il Do Ha

With the development of high-throughput technologies, more and more high-dimensional or ultra-high-dimensional genomic data are being generated. Therefore, effectively analyzing such data has become a significant challenge. Machine learning (ML) algorithms have been widely applied for modeling nonlinear and complicated interactions in a variety of practical fields, including high-dimensional survival data. Recently, multilayer deep neural network (DNN) models have made remarkable achievements. Thus, a Cox-based DNN prediction survival model (DNNSurv model), which was built with Keras and TensorFlow, was developed. However, its results were only evaluated on survival datasets with high dimensionality or large sample sizes. In this paper, we evaluated the prediction performance of the DNNSurv model using ultra-high-dimensional and high-dimensional survival datasets and compared it with three popular ML survival prediction models (i.e., random survival forest and the Cox-based LASSO and Ridge models). For this purpose, we also present the optimal setting of several hyperparameters, including the selection of a tuning parameter. The data analysis demonstrated that the DNNSurv model performed well overall compared with the ML models in terms of the three main evaluation measures for survival prediction performance (i.e., concordance index, time-dependent Brier score, and time-dependent AUC).
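The Cox-based LASSO and Ridge models mentioned above minimize a penalized negative log partial likelihood. The sketch below shows that objective under simplifying assumptions: no tied event times, a plain `lam * penalty` scaling (some libraries scale the ridge term by one half), and hypothetical function names. It evaluates the loss only and does not perform the optimization or the tuning-parameter selection.

```python
import math


def neg_log_partial_likelihood(times, events, linear_predictors):
    """Negative Cox log partial likelihood, assuming no tied event times."""
    total = 0.0
    for i, (t_i, observed) in enumerate(zip(times, events)):
        if not observed:
            continue  # censored subjects contribute only through risk sets
        # Risk set: everyone still under observation at time t_i.
        risk_sum = sum(
            math.exp(eta) for t, eta in zip(times, linear_predictors) if t >= t_i
        )
        total -= linear_predictors[i] - math.log(risk_sum)
    return total


def cox_penalized_loss(times, events, rows, beta, lam, penalty="lasso"):
    """Cox loss plus an L1 (LASSO) or squared L2 (ridge) penalty on beta."""
    eta = [sum(b * x for b, x in zip(beta, row)) for row in rows]
    loss = neg_log_partial_likelihood(times, events, eta)
    if penalty == "lasso":
        return loss + lam * sum(abs(b) for b in beta)
    if penalty == "ridge":
        return loss + lam * sum(b * b for b in beta)
    raise ValueError("penalty must be 'lasso' or 'ridge'")


# Hypothetical toy data: three subjects, all with observed events, one covariate.
toy_loss = cox_penalized_loss([1, 2, 3], [1, 1, 1], [[0.0], [0.0], [0.0]], [1.0], 0.5)
```

The L1 penalty drives some coefficients exactly to zero, which is what makes the LASSO variant act as a feature selector in high-dimensional survival data.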


2016 ◽  
Vol 71 ◽  
pp. 76-85 ◽  
Author(s):  
Farideh Bagherzadeh-Khiabani ◽  
Azra Ramezankhani ◽  
Fereidoun Azizi ◽  
Farzad Hadaegh ◽  
Ewout W. Steyerberg ◽  
...  
