Feature Selection on Supervised Classification Using Wilks Lambda Statistic

Author(s):  
A. el Ouardighi ◽  
A. el Akadi ◽  
D. Aboutajdine
Author(s):  
Jian-Wu Xu ◽  
Kenji Suzuki

One of the major challenges in current Computer-Aided Detection (CADe) of polyps in CT Colonography (CTC) is to improve the specificity without sacrificing the sensitivity. If a large number of False Positive (FP) detections of polyps are produced by the scheme, radiologists might lose their confidence in the use of CADe. In this chapter, the authors used a nonlinear regression model operating on image voxels and a nonlinear classification model with extracted image features based on Support Vector Machines (SVMs). They investigated the feasibility of a Support Vector Regression (SVR) in the massive-training framework, and the authors developed a Massive-Training SVR (MTSVR) in order to reduce the long training time associated with the Massive-Training Artificial Neural Network (MTANN) for reduction of FPs in CADe of polyps in CTC. In addition, the authors proposed a feature selection method directly coupled with an SVM classifier to maximize the CADe system performance. They compared the proposed feature selection method with the conventional stepwise feature selection based on Wilks’ lambda with a linear discriminant analysis classifier. The FP reduction system based on the proposed feature selection method was able to achieve a 96.0% by-polyp sensitivity with an FP rate of 4.1 per patient. The performance is better than that of the stepwise feature selection based on Wilks’ lambda (which yielded the same sensitivity with 18.0 FPs/patient). To test the performance of the proposed MTSVR, the authors compared it with the original MTANN in the distinction between actual polyps and various types of FPs in terms of the training time reduction and FP reduction performance. The authors’ CTC database consisted of 240 CTC datasets obtained from 120 patients in the supine and prone positions. With MTSVR, they reduced the training time by a factor of 190, while achieving a performance (by-polyp sensitivity of 94.7% with 2.5 FPs/patient) comparable to that of the original MTANN (which has the same sensitivity with 2.6 FPs/patient).
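
The abstract above uses stepwise feature selection based on Wilks' lambda as its baseline but does not spell out the procedure. As a rough illustration only, the sketch below computes Wilks' lambda as det(W) / det(T), where W is the within-class scatter matrix and T is the total scatter matrix, and greedily adds the feature that lowers it most. The function names (`det`, `scatter`, `wilks_lambda`, `forward_select`) are hypothetical, and the forward-only loop is a simplifying assumption: conventional stepwise procedures typically also apply F-to-enter and F-to-remove thresholds, which are omitted here.

```python
def det(M):
    """Determinant via Gaussian elimination with partial pivoting."""
    A = [row[:] for row in M]
    n, d = len(A), 1.0
    for i in range(n):
        p = max(range(i, n), key=lambda r: abs(A[r][i]))
        if abs(A[p][i]) < 1e-12:
            return 0.0
        if p != i:
            A[i], A[p] = A[p], A[i]
            d = -d
        d *= A[i][i]
        for r in range(i + 1, n):
            f = A[r][i] / A[i][i]
            for c in range(i, n):
                A[r][c] -= f * A[i][c]
    return d


def scatter(rows):
    """Scatter matrix: sum of outer products of mean-centered rows."""
    p = len(rows[0])
    mean = [sum(r[j] for r in rows) / len(rows) for j in range(p)]
    S = [[0.0] * p for _ in range(p)]
    for r in rows:
        dev = [r[j] - mean[j] for j in range(p)]
        for a in range(p):
            for b in range(p):
                S[a][b] += dev[a] * dev[b]
    return S


def wilks_lambda(X, y, cols):
    """Wilks' lambda det(W) / det(T) restricted to the feature columns `cols`."""
    rows = [[x[j] for j in cols] for x in X]
    p = len(cols)
    T = scatter(rows)                       # total scatter
    W = [[0.0] * p for _ in range(p)]       # within-class scatter
    for label in set(y):
        Sc = scatter([r for r, l in zip(rows, y) if l == label])
        for a in range(p):
            for b in range(p):
                W[a][b] += Sc[a][b]
    return det(W) / det(T)


def forward_select(X, y, k):
    """Greedy forward selection (assumed variant): add the feature that most lowers lambda."""
    selected, remaining = [], list(range(len(X[0])))
    for _ in range(k):
        best = min(remaining, key=lambda j: wilks_lambda(X, y, selected + [j]))
        selected.append(best)
        remaining.remove(best)
    return selected
```

Values of lambda near 0 indicate that the chosen features separate the classes well, while values near 1 indicate almost no separation. The sketch assumes the total scatter matrix is non-singular for every candidate subset.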


Author(s):  
ALEXSEY LIAS-RODRÍGUEZ ◽  
GUILLERMO SANCHEZ-DIAZ

Typical testors are useful tools for feature selection and for determining feature relevance in supervised classification problems. However, computing all typical testors of a training matrix is very expensive: all reported algorithms have exponential complexity in the number of columns of the matrix. In this paper, we introduce a faster version of the BR (Boolean Recursive) algorithm, called fast-BR, which is based on the elimination of gaps and the reduction of columns. The fast-BR algorithm is designed to generate all typical testors of a training matrix with a reduced number of operations. We present experimental results for this fast implementation and compare it with other state-of-the-art algorithms that generate typical testors.
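
To make the notion concrete, here is a minimal brute-force sketch. It is not the BR or fast-BR algorithm described in the abstract. It assumes the usual definition on a 0/1 basic (comparison) matrix: a column subset is a testor if no row is all zeros on those columns, and a testor is typical if no proper subset of it is also a testor. The function names are hypothetical.

```python
from itertools import combinations


def is_testor(matrix, cols):
    """True if every row has at least one 1 among the chosen columns."""
    return all(any(row[j] for j in cols) for row in matrix)


def typical_testors(matrix):
    """Enumerate all typical (minimal) testors by increasing subset size."""
    n = len(matrix[0])
    found = []
    for size in range(1, n + 1):
        for cols in combinations(range(n), size):
            candidate = set(cols)
            # A superset of an already-found testor cannot be minimal.
            if any(t <= candidate for t in found):
                continue
            if is_testor(matrix, cols):
                found.append(candidate)
    return [sorted(t) for t in found]


basic_matrix = [
    [1, 0, 0],
    [0, 1, 1],
]
print(typical_testors(basic_matrix))  # [[0, 1], [0, 2]]
```

The enumeration visits up to 2^n column subsets, which illustrates the exponential cost in the number of columns that the abstract refers to.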


2008 ◽  
Vol 41 (12) ◽  
pp. 3706-3719 ◽  
Author(s):  
Wing W.Y. Ng ◽  
Daniel S. Yeung ◽  
Michael Firth ◽  
Eric C.C. Tsang ◽  
Xi-Zhao Wang

2007 ◽  
Vol 24 (1) ◽  
pp. 110-117 ◽  
Author(s):  
M. Draminski ◽  
A. Rada-Iglesias ◽  
S. Enroth ◽  
C. Wadelius ◽  
J. Koronacki ◽  
...  

2021 ◽  
Vol 71 ◽  
pp. 11-20
Author(s):  
Michel Barlaud ◽  
Marc Antonini

This paper deals with supervised classification and feature selection for high-dimensional features. A classical approach leads to an optimization problem that minimizes the within-cluster sum of squares (ℓ2 norm) with an ℓ1 penalty in order to promote sparsity. It has been known for decades that the ℓ1 norm is more robust to outliers than the ℓ2 norm. In this paper, we deal with this issue using a new proximal splitting method for the minimization of a criterion using the ℓ1 norm both for the constraint and the loss function. Since the ℓ1 criterion is only convex and not gradient Lipschitz, we advocate the use of a Douglas-Rachford minimization solution. We take advantage of the particular form of the cost and, using a change of variable, we provide a new, efficient, tailored primal Douglas-Rachford splitting algorithm that is very effective on high-dimensional datasets. We also provide an efficient classifier in the projected space based on medoid modeling. Experiments on two biological datasets and a computer vision dataset show that our method significantly improves the results compared to those obtained using a quadratic loss function.
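
Two facts behind this abstract can be illustrated in a few lines. This is only a sketch of standard building blocks, not the authors' Douglas-Rachford algorithm. First, the minimizer of a sum of squared deviations is the mean, while the minimizer of a sum of absolute deviations is the median, which is far less sensitive to an outlier. Second, the proximal operator of the ℓ1 norm is elementwise soft-thresholding, which sets small coefficients exactly to zero and thereby promotes sparsity. The function name `soft_threshold` and the sample data are assumptions for illustration.

```python
from statistics import mean, median


def soft_threshold(v, lam):
    """Proximal operator of lam * (l1 norm), applied elementwise."""
    out = []
    for x in v:
        if x > lam:
            out.append(x - lam)
        elif x < -lam:
            out.append(x + lam)
        else:
            out.append(0.0)  # small entries are zeroed, giving sparsity
    return out


# Hypothetical sample with a single outlier (100.0).
data = [1.0, 2.0, 3.0, 4.0, 100.0]
print(mean(data))    # 22.0  (squared-loss minimizer, pulled toward the outlier)
print(median(data))  # 3.0   (absolute-loss minimizer, unaffected)

print(soft_threshold([3.0, -0.5, -2.0], 1.0))  # [2.0, 0.0, -1.0]
```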


Author(s):  
Jianwu Xu ◽  
Amin Zarshenas ◽  
Yisong Chen ◽  
Kenji Suzuki

A major challenge in current computer-aided detection (CADe) of polyps in CT colonography (CTC) is to reduce the false positive (FP) rate while maintaining detection sensitivity. Radiologists prefer that a CADe system produce a small number of false positive detections; otherwise, they might not consider that the CADe system improves their workflow. To this end, in this study, we applied a nonlinear regression model operating directly on CTC image voxels and a nonlinear classification model with extracted image features based on support vector machines (SVMs) in order to improve the specificity of CADe of polyps. We investigated the feasibility of support vector regression (SVR) in the massive-training framework, and we developed a massive-training SVR (MTSVR) in order to reduce the long training time associated with the massive-training artificial neural network (MTANN) for reduction of FPs in CADe of polyps in CTC. In addition, we proposed a feature selection method directly coupled with an SVM classifier to maximize the CADe system performance. We compared the proposed feature selection method with the conventional stepwise feature selection based on Wilks' lambda with a linear discriminant analysis classifier. The FP reduction system based on the proposed feature selection method was able to achieve a 96.0% by-polyp sensitivity with an FP rate of 4.1 per patient. The performance is better than that of the stepwise feature selection based on Wilks' lambda (which yielded the same sensitivity with 18.0 FPs/patient). To test the performance of the proposed MTSVR, we compared it with the original MTANN in the distinction between actual polyps and various types of FPs in terms of the training time reduction and FP reduction performance. The CTC database used in this study consisted of 240 CTC datasets obtained from 120 patients in the supine and prone positions. With MTSVR, we reduced the training time by a factor of 190, while achieving a performance (by-polyp sensitivity of 94.7% with 2.5 FPs/patient) comparable to that of the original MTANN (which has the same sensitivity with 2.6 FPs/patient).


2018 ◽  
Vol 10 (8) ◽  
pp. 1222 ◽  
Author(s):  
Yanjun Wang ◽  
Qi Chen ◽  
Lin Liu ◽  
Xiong Li ◽  
Arun Kumar Sangaiah ◽  
...  

Power line classification is important for electric power management and for the extraction of geographical objects from LiDAR (light detection and ranging) point cloud data. Many supervised classification approaches have been introduced for the extraction of features such as ground, trees, and buildings, and several studies have been conducted to evaluate the framework and performance of such supervised classification methods in power line applications. However, these studies did not systematically investigate all of the relevant factors affecting the classification results, including the segmentation scale, feature selection, classifier variety, and scene complexity. In this study, we examined these factors systematically using airborne laser scanning and mobile laser scanning point cloud data. Our results indicated that random forest and neural network classifiers were highly suitable for power line classification in forest, suburban, and urban areas in terms of the precision, recall, and quality rates of the classification results. In contrast to some previous studies, random forest yielded the best results, while Naïve Bayes was the worst classifier in most cases. Random forest was the more robust classifier with or without feature selection for various LiDAR point cloud data. Furthermore, the classification accuracies were directly related to the selection of the local neighborhood, classifier, and feature set. Finally, it was suggested that random forest should be considered in most cases for power line classification.
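
The abstract reports precision, recall, and quality rates without defining them. The sketch below uses the definitions that are common in the extraction literature, which is an assumption about what the authors computed: precision = TP / (TP + FP), recall = TP / (TP + FN), and quality = TP / (TP + FP + FN). The function name and the counts are hypothetical.

```python
def classification_rates(tp, fp, fn):
    """Return (precision, recall, quality) from true/false positive and false negative counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    quality = tp / (tp + fp + fn)  # penalizes both kinds of error at once
    return precision, recall, quality


# Hypothetical counts for points labeled as power line.
precision, recall, quality = classification_rates(tp=80, fp=10, fn=10)
print(round(precision, 3), round(recall, 3), quality)  # 0.889 0.889 0.8
```

Quality is never higher than either precision or recall, so it is a stricter single-number summary than either rate alone.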

