Feature selection based on empirical-risk function to detect lesions in vascular computed tomography

IRBM ◽  
2014 ◽  
Vol 35 (5) ◽  
pp. 244-254 ◽  
Author(s):  
M.A. Zuluaga ◽  
M. Hernández Hoyos ◽  
M. Orkisz
2020 ◽  
Vol 10 (5) ◽  
pp. 1033-1039
Author(s):  
Huihong Duan ◽  
Xu Wang ◽  
Xingyi He ◽  
Yonggang He ◽  
Litao Song ◽  
...  

Background: In computer-aided diagnosis (CAD) systems for pulmonary nodules, feature selection plays an important role in reducing the false-positive rate and improving system accuracy. Existing feature selection techniques can damage the diversity of features when distinguishing malignant from benign pulmonary nodules. To address this problem, this study developed a novel feature selection algorithm to improve the accuracy of traditional computer-aided differential diagnosis of benign and malignant pulmonary nodules. Method: First, we divided the extracted nodule features into several groups using a Gaussian mixture model (GMM). Second, we applied the Relief and sequential forward selection (SFS) algorithms to find a locally optimal feature subset for each group. We then used the optimum-path forest (OPF) classifier with each selected feature subset to obtain classification results. Finally, the locally optimal feature subsets with the highest area under the curve (AUC) across all groups were added to the final selected set. Results: On pulmonary nodules collected from computed tomography (CT) scans and tested with two sets of samples, we achieved an average accuracy of 89.5%, sensitivity of 87.1%, and specificity of 90.9% on the first set, and 90.1%, 88.7%, and 92.1%, respectively, on the second set. The areas under the receiver operating characteristic (ROC) curves for the two sample sets were 95.2% and 96.3%, respectively. Conclusions: The proposed method is promising for improving the performance of CAD systems in classifying benign and malignant pulmonary nodules.
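The sequential forward selection (SFS) step named in the Method can be illustrated with a minimal sketch. This is not the authors' implementation: the function name, the `score` callback, and the toy scoring rule are all hypothetical, and in the study the score would presumably come from a classifier evaluation such as AUC.

```python
from typing import Callable, List, Sequence


def sequential_forward_selection(
    n_features: int,
    score: Callable[[Sequence[int]], float],
    max_features: int,
) -> List[int]:
    """Greedy SFS: repeatedly add the single feature that most improves `score`."""
    selected: List[int] = []
    best_score = float("-inf")
    while len(selected) < max_features:
        candidates = [f for f in range(n_features) if f not in selected]
        if not candidates:
            break
        # Score every candidate when it is added to the current subset.
        trial_scores = {f: score(selected + [f]) for f in candidates}
        best_f = max(trial_scores, key=trial_scores.get)
        if trial_scores[best_f] <= best_score:
            break  # no candidate improves the score: a local optimum
        selected.append(best_f)
        best_score = trial_scores[best_f]
    return selected


# Hypothetical toy score: features 1 and 3 are useful, every feature has a cost.
def toy_score(subset: Sequence[int]) -> float:
    useful = {1: 0.4, 3: 0.3}
    return sum(useful.get(f, 0.0) for f in subset) - 0.05 * len(subset)


print(sequential_forward_selection(5, toy_score, max_features=4))
```

Because the search is greedy, it stops at the first subset that no single added feature can improve, which matches the abstract's description of a local optimum per group.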


2021 ◽  
Vol 14 (3) ◽  
Author(s):  
Marjan Firouznia ◽  
Albert K. Feeny ◽  
Michael A. LaBarbera ◽  
Meghan McHale ◽  
Catherine Cantlay ◽  
...  

Background: We hypothesized that computerized morphological analysis of the left atrium (LA) and pulmonary veins (PVs) via fractal measurements of shape and texture features of the LA myocardial wall could predict atrial fibrillation (AF) recurrence after ablation. Methods: Preablation contrast computed tomography scans were collected for 203 patients who underwent AF ablation. The LA body, PVs, and myocardial wall were segmented using a semi-automated region growing method. Twenty-eight fractal-based shape and texture-based features were extracted from the resulting segments. The top features most associated with postablation recurrence were identified using feature selection and subsequently evaluated with a Random Forest classifier. Feature selection and classifier construction were performed on a discovery cohort (D1) of 137 patients; classifiers were subsequently validated on an independent set (D2) of 66 patients. Dedicated classifiers to capture the fractal and morphological properties of the LA body (C_LA), PVs (C_PV), and LA myocardial (C_LAM) tissue were constructed, as well as a model (C_All) capturing properties of all segmented compartments. Fractal-based models were also compared against a model employing machine estimation of LA volume. To assess the effect of clinical parameters, such as AF type and catheter technique, a clinical model (C_clin) was also compared against C_All. Results: Statistically significant differences were observed for fractal features of C_LA, C_LAM, and C_All in distinguishing AF recurrence (P<0.001) on D1. Using the 5 top features, C_All had the best prediction performance (area under the receiver operating characteristic curve [AUROC], 0.81 [95% CI, 0.78–0.85]), followed by C_PV (AUROC, 0.78 [95% CI, 0.74–0.80]) and C_LA (AUROC, 0.70 [95% CI, 0.63–0.78]) on D2. The clinical parameter model C_clin yielded an AUROC of 0.70 (95% CI, 0.65–0.77), while the atrial volume model yielded an AUROC of 0.59. Combining C_All and C_clin on D2 improved the AUROC to 0.87 (95% CI, 0.82–0.93). Conclusions: Fractal measurements of the LA, PVs, and atrial myocardium on computed tomography scans were associated with likelihood of postablation AF recurrence.
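The AUROC values reported above can be read as the probability that a randomly chosen recurrence case receives a higher classifier score than a randomly chosen non-recurrence case. The sketch below computes that quantity directly from pairwise comparisons. The function name and the toy data are illustrative assumptions, not taken from the study.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as P(score of a positive > score of a negative); ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = recurrence, 0 = no recurrence.
print(auroc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))
```

The pairwise form is quadratic in the sample size, which is acceptable for cohorts of a few hundred patients. A rank-based formulation gives the same value more efficiently.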


2019 ◽  
Vol 109 (5) ◽  
pp. 1729-1737
Author(s):  
Yuanyuan Fang ◽  
Ying Zhou ◽  
Zhenxing Yao

In geophysical applications, solutions to ill-posed inverse problems Ax=b are often obtained by analyzing the trade-off between data residue ‖Ax−b‖2 and model norm ‖x‖2. In this study, we show that the traditional L-curve analysis does not lead to solutions closest to the true models because the maximum curvature (or the corner of the L-curve) depends on the relative scaling between data residue and model norm. A Bayes approach based on empirical risk function minimization using training datasets may be designed to find a statistically optimal solution, but its success depends on the true realization of the model. To overcome this limitation, we construct training models using eigenvectors of the matrix A^T A as well as spectral coefficients calculated from the correlation between observations and eigenvector-projected data. This approach accounts for the data noise level but does not require it as a priori knowledge. Using global tomography as an example, we show that the solutions are closest to true models.
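For orientation, the trade-off described above is commonly written as a damped least-squares problem. The block below is the standard textbook form, not notation taken from the paper: the damping parameter $\lambda$, the eigenpairs $(\sigma_i, v_i)$ of $A^{T}A$, and the training pairs $(x^{(k)}, b^{(k)})$ are assumed symbols.

```latex
% Damped least squares: balance data residue against model norm.
x_\lambda = \arg\min_x \; \|Ax - b\|_2^2 + \lambda^2 \|x\|_2^2
          = (A^{T}A + \lambda^2 I)^{-1} A^{T} b

% With the eigendecomposition A^T A v_i = \sigma_i v_i (orthonormal v_i):
x_\lambda = \sum_i \frac{v_i^{T} A^{T} b}{\sigma_i + \lambda^2} \, v_i

% Empirical risk over K training models x^{(k)} with data b^{(k)}:
\hat{R}(\lambda) = \frac{1}{K} \sum_{k=1}^{K}
    \bigl\| x_\lambda\bigl(b^{(k)}\bigr) - x^{(k)} \bigr\|_2^2
```

Under these assumptions, choosing $\lambda$ to minimize $\hat{R}(\lambda)$ replaces the L-curve corner with a criterion that measures distance to known models directly.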


2021 ◽  
Vol 20 (1) ◽  
Author(s):  
Wei Li ◽  
Yangyong Cao ◽  
Kun Yu ◽  
Yibo Cai ◽  
Feng Huang ◽  
...  

Background: COVID-19 is putting unprecedented pressure on the global healthcare system. Computed tomography (CT) examination, as an auxiliary confirmatory diagnostic method, can help clinicians quickly locate COVID-19 lesions after screening by PCR test. Furthermore, classification of lesion subtypes plays a critical role in the subsequent treatment decision. Identifying lesion subtypes accurately can help doctors discover changes in lesions in time and better assess the severity of COVID-19. Method: This paper discusses the four most typical lesion subtypes of COVID-19: GGO (ground-glass opacity), cord, solid, and subsolid. A computer-aided diagnosis approach for lesion subtypes is proposed. The radiomics data of lesions are segmented from CT images of COVID-19 patients, with diagnoses and lesion annotations provided by radiologists. Three-dimensional texture descriptors are then applied to the volume data of the lesions, together with shape and first-order features. The resulting large feature set is reduced by the HAFS (hybrid adaptive feature selection) algorithm while a classification model is trained at the same time. The classifier is used to predict lesion subtypes as side decision information for radiologists. Results: A total of 3734 lesions were extracted from a dataset of 319 patients, and 189 radiomics features were obtained. Because the numbers of lesions of the different subtypes are imbalanced in the initial dataset, the random forest classifier is trained with data augmentation. The experimental results show that, for the four lesion subtypes, the accuracy is 93.06%, 96.84%, 99.58%, and 94.30%, the recall is 95.52%, 91.58%, 95.80%, and 80.75%, and the F-score is 93.84%, 92.37%, 95.47%, and 84.42%. Conclusion: The three-dimensional radiomics features used in this paper can better express the high-level information of COVID-19 lesions in CT slices. The HAFS method aggregates the results of multiple feature selection algorithms and intersects them with traditional methods to filter out redundant features more accurately. After selection, the subtype of a COVID-19 lesion can be judged by inputting the features into the RF (random forest) model, which can help clinicians identify the subtypes of COVID-19 lesions more accurately and support further research.
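Per-subtype accuracy, recall, and F-score of the kind reported above can be derived from a multi-class confusion matrix. The sketch below assumes the per-subtype accuracy is computed one-vs-rest, which the abstract does not state. The function name and the toy matrix are hypothetical.

```python
from typing import Dict, List


def per_class_metrics(confusion: List[List[int]]) -> List[Dict[str, float]]:
    """confusion[i][j] = number of samples with true class i predicted as class j."""
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    results = []
    for k in range(n):
        tp = confusion[k][k]
        fn = sum(confusion[k]) - tp                       # class k missed
        fp = sum(confusion[i][k] for i in range(n)) - tp  # others labelled k
        tn = total - tp - fn - fp
        recall = tp / (tp + fn) if tp + fn else 0.0
        precision = tp / (tp + fp) if tp + fp else 0.0
        f_score = (
            2 * precision * recall / (precision + recall)
            if precision + recall
            else 0.0
        )
        results.append(
            {
                "accuracy": (tp + tn) / total,  # one-vs-rest accuracy (assumed)
                "recall": recall,
                "f_score": f_score,
            }
        )
    return results


# Hypothetical two-class toy matrix, not data from the study.
for row in per_class_metrics([[8, 2], [1, 9]]):
    print(row)
```

With imbalanced classes, one-vs-rest accuracy can stay high while recall drops, which is consistent with the gap between the accuracy and recall figures quoted for the fourth subtype.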


2010 ◽  
Vol 26 (5) ◽  
pp. 1437-1452 ◽  
Author(s):  
Wenxin Jiang ◽  
Martin A. Tanner

This paper considers the problem of predicting binary choices by selecting from a possibly large set of candidate explanatory variables, which can include both exogenous variables and lagged dependent variables. We consider risk minimization with the risk function being the predictive classification error. We study the convergence rates of empirical risk minimization in both the frequentist and Bayesian approaches. The Bayesian treatment uses a Gibbs posterior constructed directly from the empirical risk instead of using the usual likelihood-based posterior. Therefore these approaches do not require a correctly specified probability model. We show that the proposed methods have near optimal performance relative to a class of linear classification rules with selected variables. Such results in classification are obtained in a framework of dependent data with strong mixing.
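The two objects named above can be written out as follows. This is a sketch in assumed notation: the linear rule with coefficients $\beta$, the prior $\pi$, and the scaling constant $\psi > 0$ are illustrative symbols rather than the paper's own definitions.

```latex
% Empirical risk: average predictive classification error of a linear rule
% on binary choices y_t \in \{0, 1\} with explanatory variables x_t.
R_n(\beta) = \frac{1}{n} \sum_{t=1}^{n}
    \mathbf{1}\bigl\{ y_t \neq \mathbf{1}[\, x_t^{T}\beta > 0 \,] \bigr\}

% Gibbs posterior: built from the empirical risk, not from a likelihood.
\pi_n(\beta \mid \text{data}) \;\propto\;
    \exp\bigl\{ -\psi \, n \, R_n(\beta) \bigr\} \, \pi(\beta)
```

Because $R_n$ replaces the log-likelihood, no probability model for $y_t$ given $x_t$ has to be specified, which is the point the abstract makes about correct specification.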


2020 ◽  
Vol 25 (3) ◽  
Author(s):  
Andrius Čiginas

Small area estimation techniques are used in sample surveys, where direct estimates for small domains are not reliable due to small sample sizes in the domains. We estimate the domain means by generalized linear compositions of the weighted sample means and the synthetic estimators that are obtained from the regression-synthetic model of fixed effects, based on the domain level auxiliary information. In the proposed method, the number of parameters of optimal compositions is reduced to a single unknown parameter, which is further evaluated by minimizing an empirical risk function. We apply various composite and related estimators to estimate proportions of the unemployed in a simulation study, based on the Lithuanian Labor Force Survey data. Conclusions on advantages and disadvantages of the proposed compositions are obtained from this empirical comparison. 
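The idea of reducing a composition to a single weight chosen by risk minimization can be sketched with a classical common-weight composite estimator. This is a textbook simplification, not the estimator proposed in the paper: it assumes the direct estimates are unbiased with known variance estimates, ignores the variance of the synthetic estimates, and uses hypothetical function names and toy numbers.

```python
from typing import List, Sequence


def common_weight(
    direct: Sequence[float],
    synthetic: Sequence[float],
    direct_var: Sequence[float],
) -> float:
    """Weight on the direct estimator minimising an estimated total risk.

    Estimated risk: lam^2 * V + (1 - lam)^2 * B, where V is the summed
    variance of the direct estimates and B the estimated summed squared
    bias of the synthetic estimates.
    """
    v = sum(direct_var)
    s = sum((d - m) ** 2 for d, m in zip(direct, synthetic))
    b = max(s - v, 0.0)  # E[(direct - synthetic)^2] is roughly variance + bias^2
    if v + b == 0.0:
        return 1.0  # nothing to trade off: keep the direct estimates
    return b / (v + b)


def compose(
    direct: Sequence[float], synthetic: Sequence[float], lam: float
) -> List[float]:
    """Linear composition of direct and synthetic domain estimates."""
    return [lam * d + (1.0 - lam) * m for d, m in zip(direct, synthetic)]


# Hypothetical domain-level unemployment proportions.
direct = [0.10, 0.20, 0.30]
synthetic = [0.15, 0.15, 0.25]
lam = common_weight(direct, synthetic, [0.001, 0.001, 0.0005])
print(lam, compose(direct, synthetic, lam))
```

Setting the derivative of the estimated risk to zero gives the closed form `b / (v + b)`. A noisier direct estimator pushes the weight toward the synthetic estimates.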

