A Critical Study on Stability Measures of Feature Selection with a Novel Extension of Lustgarten Index

2021 ◽  
Vol 3 (4) ◽  
pp. 771-787
Author(s):  
Rikta Sen ◽  
Ashis Kumar Mandal ◽  
Basabi Chakraborty

The stability of a feature selection algorithm refers to its robustness to perturbations of the training set, parameter settings, or initialization. A stable feature selection algorithm is crucial for identifying a relevant subset of meaningful and interpretable features, which is extremely important in the task of knowledge discovery. Though many stability measures have been reported in the literature for evaluating the stability of feature selection, none of them satisfies all the requisite properties of a stability measure. Among them, the Kuncheva index and its modifications are widely used in practical problems. In this work, the merits and limitations of the Kuncheva index and its existing modifications (Lustgarten, Wald, nPOG/nPOGR, Nogueira) are studied and analysed with respect to the requisite properties of a stability measure. A further limitation of the most recent modification, Nogueira's measure, is also pointed out. Finally, corrections to Lustgarten's measure are proposed to define a new modified stability measure that satisfies the desired properties and overcomes the limitations of existing popular similarity-based stability measures. The effectiveness of the newly modified Lustgarten's measure has been evaluated with simple toy experiments.


Author(s):  
Hui Wang ◽  
Li Li Guo ◽  
Yun Lin

Automatic modulation recognition is very important for receiver design in broadband multimedia communication systems, and a well-designed signal feature extraction and selection algorithm is a key technology for digital multimedia signal recognition. In this paper, information entropy is used to extract four single features: power spectrum entropy, wavelet energy spectrum entropy, singular spectrum entropy, and Renyi entropy. Then, feature selection algorithms based on distance measurement and Sequential Feature Selection (SFS) are presented to select the optimal feature subset. Finally, a BP neural network is used to classify the signal modulation. The simulation results show that the four information-entropy features can be used to classify different signal modulations, and that the feature selection algorithm successfully chooses the optimal feature subset and achieves the best performance.
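One of the four entropy features, power spectrum entropy, can be sketched as the Shannon entropy of the normalised power spectrum. This is a generic sketch, since the authors' exact windowing and normalisation are not specified:

```python
import numpy as np

def power_spectrum_entropy(x):
    """Shannon entropy (bits) of the normalised power spectrum of x."""
    psd = np.abs(np.fft.rfft(x)) ** 2   # periodogram-style power spectrum
    p = psd / psd.sum()                 # normalise into a distribution
    p = p[p > 0]                        # drop empty bins to avoid log(0)
    return float(-(p * np.log2(p)).sum())
```

A pure tone concentrates its power in one spectral bin (entropy near zero), while noise spreads power across all bins (entropy near the logarithm of the bin count), which is what makes such entropy features discriminative across modulation types.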



2015 ◽  
Vol 25 (09n10) ◽  
pp. 1467-1490 ◽  
Author(s):  
Huanjing Wang ◽  
Taghi M. Khoshgoftaar ◽  
Naeem Seliya

Software quality modeling is the process of using software metrics from previous iterations of development to locate potentially faulty modules in the current under-development code. It has become an important part of the software development process, allowing practitioners to focus development efforts where they are most needed. One difficulty encountered in software quality modeling is high dimensionality, where the number of available software metrics is too large for a classifier to work well. In this case, many of the metrics may be redundant or irrelevant to defect prediction, so selecting the subset of software metrics that are the best predictors becomes important. This process is called feature (metric) selection. There are three major forms of feature selection: filter-based feature rankers, which use statistical measures to assign a score to each feature and present the user with a ranked list; filter-based feature subset evaluation, which uses statistical measures on feature subsets to find the best subset; and wrapper-based subset selection, which builds classification models using different subsets to find the one that maximizes performance. Software practitioners are interested in which feature selection methods are best at providing the most stable feature subset in the face of changes to the data (here, the addition or removal of instances). In this study we select feature subsets using fifteen feature selection methods and then use our newly proposed Average Pairwise Tanimoto Index (APTI) to evaluate the stability of these methods. We evaluate stability on pairs of subsamples generated by our fixed-overlap partitions algorithm, considering four different levels of overlap. Thirteen software metric datasets from two real-world software projects are used in this study.
Results demonstrate that ReliefF (RF) is the most stable feature selection method, while wrapper-based feature subset selection shows the least stability. In addition, as the overlap of partitions increases, the stability of the feature selection strategies increases.
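Assuming APTI averages the classic Tanimoto (Jaccard) set similarity over all pairs of selected subsets, the flavour of the measure can be sketched as follows; the paper's exact definition over fixed-overlap partitions may differ:

```python
from itertools import combinations

def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two feature subsets."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 1.0

def average_pairwise_tanimoto(subsets):
    """Mean Tanimoto similarity over all pairs of selected subsets."""
    pairs = list(combinations(subsets, 2))
    return sum(tanimoto(a, b) for a, b in pairs) / len(pairs)
```

A perfectly stable selector returns identical subsets across subsamples and scores 1.0; the more the subsets drift, the closer the average falls toward 0.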



2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Nor Hamizah Miswan ◽  
Chee Seng Chan ◽  
Chong Guan Ng

Purpose: This paper develops a robust hospital readmission prediction framework by combining a feature selection algorithm with machine learning (ML) classifiers. The improved feature selection is proposed by considering the uncertainty in the patient attributes that lead to the output variable.

Design/methodology/approach: First, data preprocessing is conducted, which includes how the raw data are managed. Second, the impactful features are selected through the feature selection process. It starts by calculating the relational grade of each patient towards readmission using grey relational analysis (GRA), and this grade is used as the target value for feature selection. Then, the influential features are selected using the Least Absolute Shrinkage and Selection Operator (LASSO) method. The proposed method is termed Grey-LASSO feature selection. The final task is readmission prediction using ML classifiers.

Findings: The proposed method offered good performance with a minimal feature subset, discarding 54–65% of the features. The Multi-Layer Perceptron with Grey-LASSO gave the best performance.

Research limitations/implications: The performance of Grey-LASSO is justified on two readmission datasets. Further research is required to examine its generalisability to other datasets.

Originality/value: In designing the feature selection algorithm, the selection of influential input variables was based on the integration of GRA and LASSO. Specifically, GRA is part of grey system theory, which was employed to analyse the relations between systems under uncertain conditions. The LASSO approach was adopted for its ability to produce sparse data representations.
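The grey relational grade that supplies the target values for Grey-LASSO can be sketched with the standard GRA coefficient, where rho = 0.5 is the conventional distinguishing coefficient. This is a generic single-pair sketch; the paper's normalisation and pooling across patients may differ:

```python
import numpy as np

def grey_relational_grade(ref, seq, rho=0.5):
    """Grey relational grade of sequence seq against reference ref:
    the mean of the per-element grey relational coefficients."""
    delta = np.abs(np.asarray(ref, float) - np.asarray(seq, float))
    dmax = delta.max()
    if dmax == 0:                       # identical sequences
        return 1.0
    coeff = (delta.min() + rho * dmax) / (delta + rho * dmax)
    return float(coeff.mean())
```

The grade is 1.0 for an identical sequence and decays toward 0 as the sequence departs from the reference, which is what lets it stand in for a continuous readmission target.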



2020 ◽  
Vol 59 (04/05) ◽  
pp. 151-161
Author(s):  
Yuchen Fei ◽  
Fengyu Zhang ◽  
Chen Zu ◽  
Mei Hong ◽  
Xingchen Peng ◽  
...  

Background: An accurate and reproducible method to delineate tumor margins is of great importance in clinical diagnosis and treatment. In nasopharyngeal carcinoma (NPC), limitations such as high variability, low contrast, and discontinuous boundaries in soft tissues can make the tumor margin extremely difficult to identify in magnetic resonance imaging (MRI), increasing the challenge of the NPC segmentation task.

Objectives: The purpose of this work is to develop a semiautomatic algorithm for NPC image segmentation that requires minimal human intervention while delineating tumor margins with high accuracy and reproducibility.

Methods: In this paper, we propose a novel feature selection algorithm for identifying the margin of the NPC image, named modified random forest recursive feature selection (MRF-RFS). Specifically, to obtain a more discriminative feature subset for segmentation, a modified recursive feature selection method is applied to the original handcrafted feature set. Moreover, we combine the proposed feature selection method with the classical random forest (RF) in the training stage to take full advantage of its intrinsic feature importance measure.

Results: To evaluate segmentation performance, we verify our method on T1-weighted MRI images of 18 NPC patients. The experimental results demonstrate that the proposed MRF-RFS method outperforms both the baseline methods and deep learning methods on the task of segmenting NPC images.

Conclusion: The proposed method could be effective in NPC diagnosis and useful for guiding radiation therapy.
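The recursive part of MRF-RFS builds on the familiar importance-driven elimination loop. A generic sketch follows, with a hypothetical importance_fn standing in for the random forest's intrinsic feature importance measure; the paper's modifications to this loop are not reproduced here:

```python
def recursive_feature_elimination(features, importance_fn, n_keep):
    """Importance-driven recursive elimination: repeatedly drop the
    least important feature until n_keep remain. importance_fn is a
    hypothetical stand-in returning one score per surviving feature,
    e.g. a random forest's feature importances."""
    features = list(features)
    while len(features) > n_keep:
        scores = importance_fn(features)
        worst = min(range(len(features)), key=lambda i: scores[i])
        del features[worst]
    return features
```

Re-scoring after every removal is what distinguishes recursive elimination from one-shot ranking: a feature's importance can change once its redundant partners are gone.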



2020 ◽  
Vol 30 (11) ◽  
pp. 2050017 ◽  
Author(s):  
Jian Lian ◽  
Yunfeng Shi ◽  
Yan Zhang ◽  
Weikuan Jia ◽  
Xiaojun Fan ◽  
...  

Feature selection plays a vital role in the detection and discrimination of epileptic seizures in electroencephalogram (EEG) signals. State-of-the-art EEG classification techniques commonly entail extracting multiple features that are fed into classifiers, and some use feature selection strategies to reduce the dimensionality of the feature space. However, most of these approaches focus on classifier performance while neglecting the association between the features and the EEG activity itself. To strengthen the inner relationship between the feature subset and the epileptic EEG task while maintaining a promising classification accuracy, we propose a machine learning-based pipeline using a novel feature selection algorithm built upon a knockoff filter. First, a number of temporal, spectral, and spatial features are extracted from the raw EEG signals. Second, the proposed feature selection algorithm is exploited to obtain the optimal subgroup of features. Afterwards, three classifiers, k-nearest neighbor (KNN), random forest (RF), and support vector machine (SVM), are used. The experimental results on the Bonn dataset demonstrate that the proposed approach outperforms state-of-the-art techniques, with accuracy as high as 99.93% for normal versus interictal EEG discrimination and 98.95% for interictal versus ictal EEG classification. Meanwhile, it achieved satisfactory sensitivity (95.67% on average), specificity (98.83% on average), and accuracy (98.89% on average) on the Freiburg dataset.



2015 ◽  
Vol 26 (7) ◽  
pp. 1388-1402 ◽  
Author(s):  
Yun Li ◽  
Jennie Si ◽  
Guojing Zhou ◽  
Shasha Huang ◽  
Songcan Chen


Information ◽  
2021 ◽  
Vol 12 (6) ◽  
pp. 228
Author(s):  
Hongbin Wang ◽  
Pengming Wang ◽  
Shengchun Deng ◽  
Haoran Li

As a classic feature selection algorithm, the Relief algorithm has the advantages of simple computation and high efficiency, but it is limited to binary classification problems, and the feature subset composed of the top K features it selects is often redundant, so the algorithm cannot select the ideal feature subset. Moreover, when the correlation and redundancy between features are computed via mutual information, computation is slow because of the high computational complexity and the need to estimate the probability density functions of the corresponding features. To address these problems, we first improve the weight of the Relief algorithm so that it can evaluate a set of candidate features. We then use an improved joint mutual information evaluation function in place of the basic mutual information computation, addressing both the computation speed and the correlation and redundancy between features. Finally, a compound correlation feature selection algorithm based on Relief and joint mutual information is proposed using this evaluation function and a heuristic sequential forward search strategy. The algorithm can effectively select feature subsets with low redundancy and strong classification characteristics, and it computes faster.
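For reference, the classic binary Relief that this work improves upon can be sketched in plain NumPy. This is the original per-feature weight update, not the improved subset-level weight proposed here:

```python
import numpy as np

def relief(X, y, n_iter=None, rng=None):
    """Classic binary Relief: a feature's weight rises when it differs
    on the nearest miss (opposite class) and falls when it differs on
    the nearest hit (same class)."""
    X = np.asarray(X, float)
    y = np.asarray(y)
    n, d = X.shape
    span = X.max(axis=0) - X.min(axis=0)
    span[span == 0] = 1.0               # guard constant features
    Xs = (X - X.min(axis=0)) / span     # scale diffs into [0, 1]
    if rng is None:
        rng = np.random.default_rng(0)
    m = n_iter or n
    w = np.zeros(d)
    for _ in range(m):
        i = rng.integers(n)
        dist = np.abs(Xs - Xs[i]).sum(axis=1)
        dist[i] = np.inf                # exclude the instance itself
        same = y == y[i]
        hit = np.where(same, dist, np.inf).argmin()
        miss = np.where(same, np.inf, dist).argmin()
        w += (np.abs(Xs[i] - Xs[miss]) - np.abs(Xs[i] - Xs[hit])) / m
    return w
```

Because each feature is scored independently, two perfectly correlated informative features both receive high weights, which is exactly the top-K redundancy problem the joint-mutual-information criterion above is meant to fix.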



2021 ◽  
Author(s):  
Yi Mei ◽  
Su Nguyen ◽  
Bing Xue ◽  
Mengjie Zhang

Automated design of job shop scheduling rules using genetic programming as a hyper-heuristic is an emerging topic that has become increasingly popular in recent years. For evolving dispatching rules, feature selection is an important issue in deciding the terminal set of genetic programming. There can be a large number of features, whose importance/relevance varies from one to another. It has been shown that using a promising feature subset can lead to a significant improvement over using all the features. However, the existing feature selection algorithm for job shop scheduling is too slow to be applicable in practice. In this paper, we propose the first "practical" feature selection algorithm for job shop scheduling. Our contributions are twofold. First, we develop a niching-based search framework for extracting a diverse set of good rules. Second, we reduce the complexity of fitness evaluation by using a surrogate model. As a result, the proposed feature selection algorithm is very efficient. The experimental studies show that it takes less than 10% of the training time of the standard genetic programming training process and can obtain much better feature subsets than the entire feature set. Furthermore, it can find better feature subsets than the best-so-far feature subset. © 2017 IEEE.



2013 ◽  
Vol 347-350 ◽  
pp. 2712-2716
Author(s):  
Lin Tao Lü ◽  
Peng Li ◽  
Yu Xiang Yang ◽  
Fang Tan

Based on the characteristics of palm bio-impedance spectroscopy (BIS) data, this paper proposes an effective elliptical feature model of palm BIS data. The model combines the immune clone algorithm with the least squares method to establish a palm BIS feature selection algorithm, uses the algorithm to obtain an optimal feature subset that fully represents the palm BIS data, and then applies several classification algorithms for classification and comparison. The experimental results show that the accuracy of the feature subset obtained through the algorithm reaches 93.2% in an SVM classification test, verifying that the algorithm is a valid and reliable palm BIS feature selection algorithm.


