A Hybrid PSO-SFS-SBS Algorithm in Feature Selection for Liver Cancer Data

Author(s):  
S. Gunasundari ◽  
S. Janakiraman
Information ◽  
2019 ◽  
Vol 10 (6) ◽  
pp. 187
Author(s):  
Rattanawadee Panthong ◽  
Anongnart Srivihok

Liver cancer data typically come as large, multidimensional datasets. In a dataset with a very large number of features and multiple classes, many features may be irrelevant to pattern classification in machine learning. Feature selection can therefore improve the performance of the classification model and help it reach maximum classification accuracy. The aims of the present study were to find the best feature subset and to evaluate the classification performance of the predictive model. This paper proposes a hybrid feature selection approach that combines information gain and sequential forward selection based on the class-dependent technique (IGSFS-CD) for the liver cancer classification model. Two different classifiers (decision tree and naïve Bayes) were used to evaluate feature subsets. The liver cancer datasets were obtained from the Cancer Hospital Thailand database. Three ensemble methods (ensemble classifiers, bagging, and AdaBoost) were applied to improve classification performance. The IGSFS-CD method provided good accuracy of 78.36% (sensitivity 0.7841 and specificity 0.9159) on LC_dataset-1. In addition, LC_dataset II delivered the best performance, with an accuracy of 84.82% (sensitivity 0.8481 and specificity 0.9437). The IGSFS-CD method achieved better classification performance than the class-independent method. Furthermore, selecting the best feature subset could help reduce the complexity of the predictive model.
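
As a rough illustration of the two stages named in the abstract above, the following Python sketch ranks discrete features by information gain, drops those with no gain, and then runs a greedy sequential forward selection over the survivors. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the zero-gain threshold, the toy data, and the majority-vote scorer (standing in for the decision tree and naïve Bayes evaluators) are all hypothetical, and the class-dependent step is not modeled.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """Entropy of the labels minus their entropy conditioned on a discrete feature."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def lookup_accuracy(columns, features, labels):
    """Toy scorer (assumption): predict the majority class for each distinct
    combination of the chosen feature values, measured on the training data."""
    rows = zip(*(columns[f] for f in features))
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row, Counter())[label] += 1
    correct = sum(max(counts.values()) for counts in groups.values())
    return correct / len(labels)


def sequential_forward_selection(candidates, score):
    """Greedily add the feature that most improves the score; stop when none does."""
    selected, best = [], float("-inf")
    remaining = list(candidates)
    while remaining:
        feature = max(remaining, key=lambda f: score(selected + [f]))
        new_score = score(selected + [feature])
        if new_score <= best:
            break
        selected.append(feature)
        remaining.remove(feature)
        best = new_score
    return selected, best


def ig_then_sfs(columns, labels, threshold=0.0):
    """Filter by information gain (threshold is an assumption), then run SFS."""
    gains = {i: information_gain(col, labels) for i, col in enumerate(columns)}
    ranked = sorted(gains, key=gains.get, reverse=True)
    candidates = [i for i in ranked if gains[i] > threshold]
    return sequential_forward_selection(
        candidates, lambda feats: lookup_accuracy(columns, feats, labels)
    )


# Hypothetical toy data: three classes, two informative features, one constant.
labels = [0, 0, 1, 1, 2, 2]
columns = [
    [0, 0, 1, 1, 1, 1],  # separates class 0 from the rest
    [0, 0, 0, 0, 1, 1],  # separates class 2 from the rest
    [5, 5, 5, 5, 5, 5],  # constant, so its information gain is zero
]
selected, score = ig_then_sfs(columns, labels)
print(selected, score)
```

In this toy case the constant feature is removed by the filter stage, and the wrapper stage keeps both informative features because neither separates all three classes alone. A real evaluation would replace the resubstitution scorer with a held-out or cross-validated classifier.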


2020 ◽  
Vol 23 (65) ◽  
pp. 100-114
Author(s):  
Supoj Hengpraprohm ◽  
Suwimol Jungjit

For breast cancer data classification, we propose an ensemble filter feature selection approach named ‘EnSNR’. Entropy and SNR evaluation functions are used to find the features (genes) for the EnSNR subset. A Genetic Algorithm (GA) generates the classification ‘model’. The efficiency of the ‘model’ is validated using 10-fold cross-validation resampling. The microarray dataset used in our experiments contains 50,739 genes for each of 32 patients. Using the proposed ‘EnSNR’ feature subset improves prediction accuracy, reduces the number of irrelevant features (genes), and yields a small saving in computer processing time.
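
To make the filter idea in the abstract above more concrete, the following Python sketch ranks genes by an SNR-style score for a two-class problem. Several things here are assumptions rather than details from the abstract: that SNR means the signal-to-noise ratio commonly used for microarray data (difference of class means divided by the sum of class standard deviations), the use of population standard deviation, the function names, and the toy expression values. How EnSNR combines the entropy and SNR rankings is not stated in the abstract, so only the SNR part is shown.

```python
from statistics import mean, pstdev


def snr_score(values, labels):
    """|mean(class 1) - mean(class 0)| / (std(class 1) + std(class 0))."""
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    gap = abs(mean(pos) - mean(neg))
    spread = pstdev(pos) + pstdev(neg)
    if spread == 0:
        # No within-class variation: perfectly separable if the means differ.
        return float("inf") if gap > 0 else 0.0
    return gap / spread


def top_k_genes(expression, labels, k):
    """Return the k gene names with the highest SNR score, best first."""
    scores = {gene: snr_score(vals, labels) for gene, vals in expression.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical expression levels for four samples (two per class).
labels = [0, 0, 1, 1]
expression = {
    "g1": [1.0, 1.2, 5.0, 5.2],  # classes well separated, low spread
    "g2": [3.0, 3.1, 3.0, 3.1],  # identical class means
    "g3": [1.0, 4.0, 2.0, 5.0],  # small gap, large spread
}
print(top_k_genes(expression, labels, k=2))
```

A filter of this kind scores each gene independently of any classifier, which is why it stays cheap even with tens of thousands of genes.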


2020 ◽  
Vol 108 ◽  
pp. 101928 ◽  
Author(s):  
Susanna Pozzoli ◽  
Amira Soliman ◽  
Leila Bahri ◽  
Rui Mamede Branca ◽  
Sarunas Girdzijauskas ◽  
...  

2019 ◽  
pp. 389
Author(s):  
زينب عبدالأمير ◽  
علياء كريم عبدالحسن
