IG-C4.5: An Improved Feature Selection Method Based on Information Gain

Gene selection is vital to the molecular classification of cancer from high-dimensional gene expression data. Because cancer gene expression profiles have distinct characteristics, flexible and robust feature selection methods are crucial. We investigated the properties of a feature selection approach proposed in our previous work, which generalizes the feature selection method based on the depended degree of attribute in rough sets. We compared this method with established methods (the depended degree, chi-square, information gain, Relief-F and symmetric uncertainty) and analyzed its properties through a series of classification experiments. The results showed that our method is more robust and more widely applicable than the canonical method based on the depended degree of attribute, and that it is comparable to the other four commonly used methods. More importantly, the method can reveal the inherent classification difficulty of different gene expression datasets, which reflects the underlying biology of specific cancers.
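For readers unfamiliar with the rough-set baseline named in this abstract, the classical dependency degree of a decision attribute on a set of condition attributes is the fraction of objects that fall in the positive region, that is, objects whose condition values determine the class without conflict. The sketch below illustrates only that textbook quantity, not the generalized method the authors propose. The function name `dependency_degree` and the toy discretized expression table are assumptions made for illustration.

```python
from collections import defaultdict

def dependency_degree(rows, cond_idx, dec_idx):
    """Fraction of objects whose condition values determine the decision uniquely."""
    # Group objects into equivalence classes by their values on the condition attributes.
    classes = defaultdict(list)
    for row in rows:
        key = tuple(row[i] for i in cond_idx)
        classes[key].append(row[dec_idx])
    # A class lies in the positive region when all of its members share one decision.
    positive = sum(len(decs) for decs in classes.values() if len(set(decs)) == 1)
    return positive / len(rows)

# Hypothetical discretized table: (gene_a, gene_b, class label). Values are invented.
toy = [
    ("high", "low", "tumor"),
    ("high", "low", "tumor"),
    ("low", "low", "normal"),
    ("low", "high", "normal"),
    ("high", "high", "tumor"),
    ("low", "low", "tumor"),
]
gamma_a = dependency_degree(toy, [0], 2)      # gene_a alone
gamma_b = dependency_degree(toy, [1], 2)      # gene_b alone
gamma_ab = dependency_degree(toy, [0, 1], 2)  # both genes together
print(gamma_a, gamma_b, gamma_ab)
```

On this toy table gene_a alone resolves half of the samples, gene_b alone resolves none, and the pair resolves four of six. This all-or-nothing behaviour on inconsistent classes is what makes the canonical measure sensitive to noise.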


A hybrid feature selection method based on genetic algorithm and information gain

2016 5th International Conference on Computer Science and Network Technology (ICCSNT) ◽

10.1109/iccsnt.2016.8070172 ◽

2016 ◽

Cited By ~ 1

Author(s):

Fei He ◽

Huamin Yang ◽

Yu Miao ◽

Rainbow Louis

Keyword(s):

Genetic Algorithm ◽

Feature Selection ◽

Information Gain ◽

Feature Selection Method ◽

Selection Method


Optimization of a Computer-Aided Detection Scheme Using a Logistic Regression Model and Information Gain Feature Selection Method

Global Journal of Breast Cancer Research ◽

10.14205/2309-4419.2013.01.01.1 ◽

2013 ◽

Author(s):

Zheng

Keyword(s):

Logistic Regression ◽

Feature Selection ◽

Regression Model ◽

Logistic Regression Model ◽

Information Gain ◽

Feature Selection Method ◽

Selection Method ◽

Computer Aided Detection ◽

Detection Scheme ◽

Computer Aided


A NEW FEATURE SELECTION METHOD FOR TEXT CLASSIFICATION

International Journal of Pattern Recognition and Artificial Intelligence ◽

10.1142/s0218001407005466 ◽

2007 ◽

Vol 21 (02) ◽

pp. 423-438 ◽

Cited By ~ 9

Author(s):

GULDEN UCHYIGIT ◽

KEITH CLARK

Keyword(s):

Feature Selection ◽

Text Classification ◽

Information Gain ◽

Feature Selection Method ◽

Feature Space ◽

Selection Method ◽

Computational Time ◽

Small Subset ◽

Selection Methods ◽

New Feature

Text classification is the problem of assigning a set of documents to a pre-defined set of classes. A major difficulty in text classification is the high dimensionality of the feature space. Only a small subset of the words are feature words that help determine a document's class; the rest add noise, can make the results unreliable, and significantly increase computational time. A common approach to this problem is feature selection, in which the number of words in the feature space is significantly reduced. In this paper we present a comparative study of feature selection methods used for text classification. Ten feature selection methods were evaluated, including a new feature selection method called the GU metric. The other methods evaluated are: Chi-Squared (χ2) statistic, NGL coefficient, GSS coefficient, Mutual Information, Information Gain, Odds Ratio, Term Frequency, Fisher Criterion, and BSS/WSS coefficient. The experimental evaluations show that the GU metric obtained the best F1 and F2 scores. The experiments were performed on the 20 Newsgroups data sets with the Naive Bayesian Probabilistic Classifier.
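As an illustration of one of the baseline metrics listed in this abstract, the following minimal sketch scores terms by Information Gain, computed as the class entropy minus the class entropy conditioned on whether a term is present in a document. It does not implement the GU metric, whose definition is not given here. The four toy documents, their labels, and the function names are assumptions chosen for illustration.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(docs, labels, term):
    """Class entropy minus the entropy remaining after splitting on term presence."""
    present = [y for d, y in zip(docs, labels) if term in d]
    absent = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    conditional = sum(len(part) / n * entropy(part) for part in (present, absent) if part)
    return entropy(labels) - conditional

# Hypothetical corpus: each document is a set of words; labels are invented.
docs = [
    {"goal", "match", "the"},
    {"goal", "team", "the"},
    {"cpu", "disk", "the"},
    {"cpu", "match", "the"},
]
labels = ["sport", "sport", "tech", "tech"]

vocab = sorted(set().union(*docs))
scores = {t: information_gain(docs, labels, t) for t in vocab}
ranked = sorted(vocab, key=lambda t: scores[t], reverse=True)
for term in ranked:
    print(f"{term:6s} {scores[term]:.4f}")
```

In this toy corpus "goal" and "cpu" separate the classes perfectly and score 1 bit, while "the" and "match" occur equally in both classes and score 0. Feature selection would keep the top-ranked terms and drop the rest.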


A developed feature selection method for classification based on united information gain

2017 IEEE SmartWorld, Ubiquitous Intelligence & Computing, Advanced & Trusted Computed, Scalable Computing & Communications, Cloud & Big Data Computing, Internet of People and Smart City Innovation (SmartWorld/SCALCOM/UIC/ATC/CBDCom/IOP/SCI) ◽

10.1109/uic-atc.2017.8397477 ◽

2017 ◽

Author(s):

Kun Niu ◽

Haizhen Jiao ◽

Zhipeng Gao ◽

Guannan Jia ◽

Guangyu Yang ◽

...

Keyword(s):

Feature Selection ◽

Information Gain ◽

Feature Selection Method ◽

Selection Method


A new feature selection method for text categorization based on information gain and particle swarm optimization

2014 IEEE 3rd International Conference on Cloud Computing and Intelligence Systems ◽

10.1109/ccis.2014.7175792 ◽

2014 ◽

Cited By ~ 5

Author(s):

Ferruh Yigit ◽

Omer Kaan Baykan

Keyword(s):

Feature Selection ◽

Particle Swarm Optimization ◽

Text Categorization ◽

Information Gain ◽

Particle Swarm ◽

Feature Selection Method ◽

Selection Method ◽

Swarm Optimization ◽

New Feature


Input Feature Selection Method Based on Feature Set Equivalence and Mutual Information Gain Maximization

IEEE Access ◽

10.1109/access.2019.2948095 ◽

2019 ◽

Vol 7 ◽

pp. 151525-151538 ◽

Cited By ~ 5

Author(s):

Xinzheng Wang ◽

Bing Guo ◽

Yan Shen ◽

Chimin Zhou ◽

Xuliang Duan

Keyword(s):

Feature Selection ◽

Mutual Information ◽

Information Gain ◽

Feature Selection Method ◽

Selection Method ◽

Input Feature


Comparing PCA to information gain as a feature selection method for Influenza-A classification

2015 International Conference on Intelligent Informatics and Biomedical Sciences (ICIIBMS) ◽

10.1109/iciibms.2015.7439550 ◽

2015 ◽

Cited By ~ 4

Author(s):

Nemin Shaltout ◽

Mohamed Moustafa ◽

Ahmed Rafea ◽

Ahmed Moustafa ◽

Mohamed ElHefnawi

Keyword(s):

Feature Selection ◽

Influenza A ◽

Information Gain ◽

Feature Selection Method ◽

Selection Method


Enhanced Classification Method for Phishing Emails Detection

Journal of Information Security and Cybercrimes Research ◽

10.26735/ygmy6142 ◽

2020 ◽

Vol 3 (1) ◽

pp. 58-63

Author(s):

Y. Mansour Mansour ◽

Majed A. Alenizi

Keyword(s):

Feature Selection ◽

Information Gain ◽

Hybrid Approach ◽

Feature Selection Method ◽

Search Space ◽

Selection Method ◽

Classification Model ◽

Selection Methods ◽

Accuracy Rate ◽

Communication Method

Email is currently the main communication method worldwide because of its proven efficiency. Phishing emails, on the other hand, are one of the major threats, resulting in significant losses estimated at billions of dollars. Phishing is a dynamic problem, a struggle between phishers and defenders in which the phishers have more flexibility in manipulating email features and evading anti-phishing techniques. Many solutions have been proposed to mitigate the impact of phishing emails on the targeted sectors, but none has achieved 100% detection and accuracy. As phishing techniques evolve, the solutions need to evolve and generalize in order to mitigate as many attacks as possible. This article presents a new classification model based on a hybrid feature selection method that combines two common feature selection methods, Information Gain and Genetic Algorithm, and keeps only significant, high-quality features in the final classifier. The proposed hybrid approach achieved a 98.9% accuracy rate on a phishing email dataset comprising 8266 instances, an improvement of almost 4%. Furthermore, the technique reduced the search space by reducing the number of selected features.
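The abstract does not specify how the two stages are combined, so the following is only one plausible sketch of a filter-then-wrapper hybrid: Information Gain first discards low-scoring features, and a small genetic algorithm then searches bit masks over the survivors. Everything specific here is an assumption made for illustration, including the leave-one-out 1-nearest-neighbour fitness, the size penalty, the GA settings, and the eight-row toy dataset. None of it is taken from the paper.

```python
import math
import random
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def info_gain(X, y, j):
    # Gain of feature j: H(y) minus the weighted entropy of each value group.
    groups = {}
    for row, label in zip(X, y):
        groups.setdefault(row[j], []).append(label)
    return entropy(y) - sum(len(g) / len(y) * entropy(g) for g in groups.values())

def fitness(mask, X, y, cols, penalty=0.01):
    # Assumed fitness: leave-one-out 1-NN accuracy (Hamming distance) minus a size penalty.
    chosen = [c for c, bit in zip(cols, mask) if bit]
    if not chosen:
        return 0.0
    hits = 0
    for i, row in enumerate(X):
        nearest = min((k for k in range(len(X)) if k != i),
                      key=lambda k: sum(row[c] != X[k][c] for c in chosen))
        hits += y[nearest] == y[i]
    return hits / len(X) - penalty * len(chosen)

def hybrid_select(X, y, top_k=4, pop_size=8, generations=15, seed=0):
    rng = random.Random(seed)
    # Stage 1 (filter): keep the top_k features ranked by information gain.
    ranked = sorted(range(len(X[0])), key=lambda j: info_gain(X, y, j), reverse=True)
    cols = ranked[:top_k]
    # Stage 2 (wrapper): evolve bit masks over the surviving features.
    pop = [[1] * top_k] + [[rng.randint(0, 1) for _ in cols] for _ in range(pop_size - 1)]

    def score(mask):
        return fitness(mask, X, y, cols)

    for _ in range(generations):
        pop.sort(key=score, reverse=True)
        parents = pop[: pop_size // 2]       # truncation selection; the best mask survives
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, top_k)    # one-point crossover
            child = a[:cut] + b[cut:]
            child[rng.randrange(top_k)] ^= 1  # single bit-flip mutation
            children.append(child)
        pop = parents + children
    best = max(pop, key=score)
    return [c for c, bit in zip(cols, best) if bit], score(best), cols

# Hypothetical binary email features; rows 0-3 are phishing, rows 4-7 are legitimate.
X = [
    [1, 1, 0, 1, 0, 1],
    [1, 1, 1, 0, 0, 0],
    [1, 0, 0, 1, 1, 1],
    [1, 1, 1, 0, 1, 0],
    [0, 0, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 1, 0, 1, 1, 0],
    [0, 0, 1, 0, 0, 1],
]
y = ["phish"] * 4 + ["ham"] * 4

selected, best_score, cols = hybrid_select(X, y)
print("kept by IG filter:", cols)
print("kept by GA:", selected, "fitness:", round(best_score, 3))
```

Because the filter stage shrinks the candidate set before the genetic search starts, the GA explores 2^top_k masks instead of 2^(number of features), which is one way to read the abstract's claim about a reduced search space.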
