Attribute Selection Based on Information Gain for Automatic Grouping Student System

Analysis of EEG data is one of the most important parts of Brain Computer Interface (BCI) systems, because EEG data contain a substantial amount of information that can be used to study and improve such systems. One problem with EEG analysis is the large amount of data produced, some of which may not be useful. Identifying the relevant data within this large volume is therefore important for better analysis. The objective of this study is to evaluate the performance of a Random Forest classifier on motor movement EEG data and to reduce the number of electrodes considered in EEG recording and analysis, so that less data is produced and only relevant electrodes are analyzed. The dataset used in the study is the Physionet motor movement/imagery data, which consists of EEG recordings obtained using 64 electrodes. These 64 electrodes were ranked by their information gain with respect to the class using the Info Gain attribute selection algorithm. The electrodes were then divided into 4 lists. List 1 consists of the top 18 ranked electrodes, and the number of electrodes was increased by 15 (in ranked order) in each subsequent list. Lists 2, 3 and 4 consist of the top 33, 48 and 64 electrodes respectively. The accuracy of the Random Forest classifier for each list was compared with its accuracy for List 4, which contains all 64 electrodes. The additional electrodes in List 4 were rejected because the accuracy of the classifier was almost the same for List 4 and List 3. With this method we were able to reduce the number of electrodes from 64 to 48 with an average decrease of only 0.9% in classifier accuracy. This reduction can substantially reduce the time and effort required for the analysis of EEG data.
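
The ranking step described in this abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: it assumes each attribute has already been discretized, and the channel names, values and list sizes below are hypothetical toy data.

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(values, labels):
    """H(class) - H(class | attribute) for one discrete attribute."""
    groups = defaultdict(list)
    for v, y in zip(values, labels):
        groups[v].append(y)
    n = len(labels)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional

def rank_attributes(columns, labels):
    """Return attribute names sorted by decreasing information gain."""
    gains = {name: information_gain(col, labels) for name, col in columns.items()}
    return sorted(gains, key=gains.get, reverse=True)

def nested_lists(ranking, sizes):
    """Top-k prefixes of the ranking, e.g. sizes (18, 33, 48, 64)."""
    return [ranking[:k] for k in sizes]

# Toy data (hypothetical): three discretized channels and a binary class.
labels = [0, 0, 1, 1]
columns = {
    "C3": ["lo", "lo", "hi", "hi"],  # perfectly separates the classes
    "Cz": ["lo", "hi", "hi", "hi"],  # partially informative
    "Oz": ["lo", "hi", "lo", "hi"],  # uninformative
}
ranking = rank_attributes(columns, labels)
print(ranking)                           # ['C3', 'Cz', 'Oz']
print(nested_lists(ranking, (1, 2, 3)))  # nested top-k lists
```

Each nested list would then be passed to a classifier of choice, and the smallest list whose accuracy stays close to that of the full set would be kept.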

Decision tree classification: Ranking journals using IGIDI

Journal of Information Science ◽

10.1177/0165551519837176 ◽

2019 ◽

Vol 46 (3) ◽

pp. 325-339

Author(s):

Muhammad Shaheen ◽

Tanveer Zafar ◽

Sajid Ali Khan

Keyword(s):

Decision Tree ◽

Gini Index ◽

Information Gain ◽

Diversity Index ◽

Selection Method ◽

Attribute Selection ◽

Data Sets ◽

Average Value ◽

Decision Tree Classification ◽

Selection Measures

Selecting an attribute for placement at an appropriate position in a decision tree (e.g. the root of the tree) is an important decision. Many attribute selection measures, such as Information Gain, Gini Index and Entropy, have been developed for this purpose. The suitability of an attribute generally depends on the diversity of its values, its relevance and its dependency. Different attribute selection measures use different criteria for measuring the suitability of an attribute. Diversity Index is a classical statistical measure for determining the diversity of values and, to our knowledge, it has never been used as an attribute selection method. In this article, we propose a novel attribute selection method for decision tree classification. In the proposed scheme, the average of Information Gain, Gini Index and Diversity Index is used to assign a weight to each attribute. The attribute with the highest average value is selected for the classification. We have empirically tested our proposed algorithm on the classification of different data sets of scientific journals and conferences. We have developed a web-based application named JC-Rank that makes use of our proposed algorithm. We have also compared the results of our proposed technique with some existing decision tree classification algorithms.
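
The idea of averaging three measures can be illustrated with a rough sketch. The abstract does not give the exact formulas, so everything below is an assumption: the Gini term is taken as the reduction in Gini impurity, the diversity term as the Gini-Simpson index of the attribute's own values, and no normalization is applied. The column names and data are hypothetical.

```python
import math
from collections import Counter, defaultdict

def _split(values, labels):
    """Group the class labels by attribute value."""
    groups = defaultdict(list)
    for v, y in zip(values, labels):
        groups[v].append(y)
    return list(groups.values())

def _entropy(items):
    n = len(items)
    return -sum(c / n * math.log2(c / n) for c in Counter(items).values())

def _gini(items):
    n = len(items)
    return 1.0 - sum((c / n) ** 2 for c in Counter(items).values())

def _reduction(impurity, values, labels):
    """Impurity of the labels minus the weighted impurity after splitting."""
    n = len(labels)
    after = sum(len(g) / n * impurity(g) for g in _split(values, labels))
    return impurity(labels) - after

def simpson_diversity(values):
    """Gini-Simpson index of the attribute's values (assumed definition)."""
    return _gini(values)

def averaged_score(values, labels):
    """Mean of the three measures; the equal weighting is an assumption."""
    parts = (
        _reduction(_entropy, values, labels),  # information gain
        _reduction(_gini, values, labels),     # Gini impurity reduction
        simpson_diversity(values),             # diversity of attribute values
    )
    return sum(parts) / len(parts)

def select_attribute(columns, labels):
    """Pick the attribute with the highest averaged score."""
    return max(columns, key=lambda name: averaged_score(columns[name], labels))

# Hypothetical toy data: two binary attributes and a two-level rank label.
labels = ["A", "A", "B", "B"]
columns = {
    "indexed": ["y", "y", "n", "n"],
    "open_access": ["y", "n", "y", "n"],
}
print(select_attribute(columns, labels))  # indexed
```

In this toy case both attributes have the same diversity, so the choice is driven by the two impurity-based terms.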

Attribute selection based on information gain ratio in fuzzy rough set theory with application to tumor classification

Applied Soft Computing ◽

10.1016/j.asoc.2012.07.029 ◽

2013 ◽

Vol 13 (1) ◽

pp. 211-221 ◽

Cited By ~ 171

Author(s):

Jianhua Dai ◽

Qing Xu

Keyword(s):

Set Theory ◽

Rough Set ◽

Rough Set Theory ◽

Information Gain ◽

Tumor Classification ◽

Attribute Selection ◽

Fuzzy Rough Set ◽

Gain Ratio ◽

Information Gain Ratio

Attribute Selection Using Information Gain and Naïve Bayes for Traffic Classification

Journal of Physics Conference Series ◽

10.1088/1742-6596/1196/1/012021 ◽

2019 ◽

Vol 1196 ◽

pp. 012021

Author(s):

Ahmad Fali Oklilas ◽

Tasmi ◽

Sri Desy Siswanti ◽

Mira Afrina ◽

Herri Setiawan

Keyword(s):

Naive Bayes ◽

Information Gain ◽

Naïve Bayes ◽

Attribute Selection ◽

Traffic Classification

Support Vector Machine with Information Gain Based Classification for Credit Card Fraud Detection System

The International Arab Journal of Information Technology ◽

10.34028/iajit/18/2/8 ◽

2021 ◽

Vol 18 (2) ◽

Keyword(s):

Support Vector Machine ◽

Credit Card ◽

Information Gain ◽

Detection System ◽

Fraud Detection ◽

True Positive Rate ◽

Attribute Selection ◽

Support Vector ◽

Large Sample Size ◽

High True Positive Rate

In the credit card industry, fraud is one of the major issues to handle, as genuine credit card customers may sometimes be misclassified as fraudulent and vice versa. Several detection systems have been developed, but the complexity of these systems, along with their accuracy and precision, limits their usefulness in fraud detection applications. In this paper, a new methodology, Support Vector Machine with Information Gain (SVMIG), is proposed to improve the accuracy of identifying fraudulent credit card transactions with a high true positive rate. In SVMIG, min-max normalization is used to normalize the attributes, and the feature set is reduced using information gain based attribute selection. Further, the Apriori algorithm is used to select the frequent attribute set and to reduce the candidate itemset size while detecting fraud. The experimental results suggest that the proposed algorithm achieves 94.102% higher accuracy on the standard dataset compared to the existing Bayesian and random forest based approaches for a large sample size in dealing with legal and fraudulent transactions.
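
The min-max normalization step mentioned in this abstract rescales each attribute to the range [0, 1] using x' = (x - min) / (max - min). The sketch below shows that step only. The equal-width binning that follows it is an added assumption (information gain needs discrete values, and the abstract does not say how they are obtained), and the sample amounts are hypothetical.

```python
def min_max_normalize(column):
    """Rescale numeric values to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0] * len(column)
    return [(x - lo) / (hi - lo) for x in column]

def equal_width_bins(normalized, bins=4):
    """Map values in [0, 1] to integer bin indices 0 .. bins-1 (assumed step)."""
    return [min(int(x * bins), bins - 1) for x in normalized]

# Hypothetical transaction amounts, used only for illustration.
amounts = [10.0, 20.0, 30.0, 50.0]
scaled = min_max_normalize(amounts)
print(scaled)                    # [0.0, 0.25, 0.5, 1.0]
print(equal_width_bins(scaled))  # [0, 1, 2, 3]
```

The guard for a constant column avoids a division by zero; such an attribute carries no information about the class in any case.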

Individual Attribute Selection Using Information Gain based Distance for Group Classification of Elderly People with Hypertension

IEEE Access ◽

10.1109/access.2021.3084623 ◽

2021 ◽

pp. 1-1

Author(s):

Supansa Chaising ◽

Punnarumol Temdee ◽

Ramjee Prasad

Keyword(s):

Elderly People ◽

Information Gain ◽

Group Classification ◽

Attribute Selection ◽

Individual Attribute

A Multiattribute Measurement Algorithm for Packet Classification

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.52-54.168 ◽

2011 ◽

Vol 52-54 ◽

pp. 168-173

Author(s):

Mao Ling Pen ◽

Ai Ming Huang

Keyword(s):

Measurement Accuracy ◽

Information Gain ◽

Packet Classification ◽

Attribute Selection ◽

Classification Rule ◽

Application Technology ◽

Matching Efficiency ◽

Measurement Algorithm ◽

Network Application ◽

Information Gain Ratio

Many network application technologies need an algorithm for multi-dimensional packet classification, for example network security, load balancing, router policy and QoS. Because multiattribute packet classification involves many levels and the rule table must be traversed many times to find a matching classification rule, efficiency is low. This paper puts forward a packet classification algorithm based on a decision tree. Compared with some traditional packet classification matching algorithms, both accuracy and matching efficiency are clearly improved, because three measures (information gain, information gain ratio and Gini) are adopted for attribute selection.
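
As a brief illustration of why information gain ratio is used alongside information gain, the sketch below computes both for two attributes. It is not the algorithm from the paper, and the packet fields and rule actions are hypothetical. Gain ratio divides the gain by the entropy of the attribute's own values (the split information), which penalizes attributes with many distinct values.

```python
import math
from collections import Counter, defaultdict

def entropy(items):
    """Shannon entropy (in bits) of a list of discrete items."""
    n = len(items)
    return -sum(c / n * math.log2(c / n) for c in Counter(items).values())

def information_gain(values, labels):
    """Reduction in label entropy after splitting on the attribute."""
    groups = defaultdict(list)
    for v, y in zip(values, labels):
        groups[v].append(y)
    n = len(labels)
    return entropy(labels) - sum(len(g) / n * entropy(g) for g in groups.values())

def gain_ratio(values, labels):
    """Information gain divided by split information (0 if no split occurs)."""
    split_info = entropy(values)
    if split_info == 0:
        return 0.0
    return information_gain(values, labels) / split_info

# Hypothetical packets: the rule action is the class label.
actions = ["allow", "allow", "deny", "deny"]
protocol = ["tcp", "tcp", "udp", "udp"]  # two values, separates the classes
packet_id = [1, 2, 3, 4]                 # unique per packet

print(information_gain(protocol, actions), gain_ratio(protocol, actions))    # 1.0 1.0
print(information_gain(packet_id, actions), gain_ratio(packet_id, actions))  # 1.0 0.5
```

Both attributes reach the maximum information gain, but the unique identifier is halved by the gain ratio, so the two-valued field is preferred.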

Performance Evaluation of Naive Bayes Classifier with and without Filter Based Feature Selection

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.j9376.0881019 ◽

2019 ◽

Vol 8 (10) ◽

pp. 2154-2158

Keyword(s):

Feature Selection ◽

Business Strategy ◽

Naive Bayes ◽

Information Gain ◽

Pearson Correlation ◽

Poor Performance ◽

Naïve Bayes ◽

Customer Relationship ◽

Attribute Selection ◽

Redundant Data

Customer Relationship Management (CRM) analyzes datasets to find insights about data, which in turn help to frame business strategy for the improvement of enterprises. Analyzing data in CRM requires highly intensive models. Machine Learning (ML) algorithms help in analyzing such high-dimensional datasets. In most real-time datasets, the strong independence assumption of Naive Bayes (NB) between the attributes is violated, and other drawbacks in datasets, such as irrelevant data, partially irrelevant data and redundant data, lead to poor prediction performance. Feature selection is a preprocessing method applied to enhance the prediction of the NB model. Empirical experiments are conducted with NB with feature selection and NB without feature selection. This paper presents an empirical study of attribute selection using five different filter-based feature selection methods: Relief-F, Pearson correlation (PCC), Symmetrical Uncertainty (SU), Gain Ratio (GR) and Information Gain (IG).
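
Two of the filter measures named in this abstract are simple enough to sketch directly. The code below is a minimal illustration, not the paper's experimental setup. It computes the Pearson correlation coefficient for numeric attributes and Symmetrical Uncertainty, SU = 2 * I(X; Y) / (H(X) + H(Y)), for discrete ones. The sample values are hypothetical.

```python
import math
from collections import Counter

def entropy(items):
    """Shannon entropy (in bits) of a list of discrete items."""
    n = len(items)
    return -sum(c / n * math.log2(c / n) for c in Counter(items).values())

def symmetrical_uncertainty(x, y):
    """2 * mutual information / (H(X) + H(Y)); ranges from 0 to 1."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0
    mutual_info = hx + hy - entropy(list(zip(x, y)))
    return 2.0 * mutual_info / (hx + hy)

def pearson(x, y):
    """Pearson correlation coefficient; 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0

# Hypothetical discrete attribute values against a binary class.
label = [0, 0, 1, 1]
print(symmetrical_uncertainty(["a", "a", "b", "b"], label))  # 1.0
print(symmetrical_uncertainty(["a", "b", "a", "b"], label))  # 0.0
print(pearson([1, 2, 3], [2, 4, 6]))                         # close to 1.0
```

A filter method scores every attribute this way, independently of the classifier, and keeps only the highest-scoring attributes before the NB model is trained.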

Movie Success Prediction

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.b2484.098319 ◽

2019 ◽

Vol 8 (3) ◽

pp. 5659-5663

Keyword(s):

Feature Selection ◽

Information Gain ◽

Feature Selection Method ◽

Selection Method ◽

Attribute Selection ◽

Success Prediction ◽

Release Date ◽

Predicting Success ◽

Selection For ◽

Movie Success

The film business is a billion-dollar business, and an extensive amount of data related to motion pictures is available on the web. In this system we analyze a dataset to predict the success of movies. To do this, the historical information for every component that affects the success or failure of a motion picture, such as actor, actress, director and music, is given a weight, and then, based on different parameters, we predict whether the movie will be a flop, average or a superhit. In this model we focus on attribute selection for predicting the success of movies. A comparative analysis is to be performed to find the most accurate results among the algorithms used. Some parameters that are important for predicting the success of a movie are gross, genres, release date, star power of actors, actresses and directors, and budget. The dataset contains 28 parameters. The task is to find the most relevant parameters. This will be achieved by the feature selection method as shown in figure 1. Feature selection methods are available in the “sklearn” library of Python. The feature selection methods include decision trees, information gain and gain ratio. A heatmap is generated to visualize the success of movies in different regions. Various graphs of time vs. algorithms and accuracy vs. algorithms are generated for analysis.