Analysis of Feature Weighting Methods Based on Feature Ranking Methods for Classification

Author(s):  
Norbert Jankowski ◽  
Krzysztof Usowicz
2019 ◽  
Vol 21 (2) ◽  
pp. 421-428 ◽  
Author(s):  
Alex A Freitas

Abstract: An important problem in bioinformatics consists of identifying the most important features (or predictors) among a large number of features in a given classification dataset. This problem is often addressed by using a machine learning–based feature ranking method to identify a small set of top-ranked predictors (i.e. the most relevant features for classification). The large number of studies in this area has, however, an important limitation: they ignore the possibility that the top-ranked predictors occur in an instance of Simpson’s paradox, where the positive or negative association between a predictor and a class variable reverses sign upon conditioning on each of the values of a third (confounder) variable. In this work, we review and investigate the role of Simpson’s paradox in the analysis of top-ranked predictors in high-dimensional bioinformatics datasets, in order to avoid the potential danger of misinterpreting an association between a predictor and the class variable. We perform computational experiments using four well-known feature ranking methods from the machine learning field and five high-dimensional datasets of ageing-related genes, where the predictors are Gene Ontology terms. The results show that occurrences of Simpson’s paradox involving top-ranked predictors are much more common for one of the feature ranking methods.
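The sign reversal described in the abstract above can be illustrated with a minimal sketch. It assumes a binary predictor, a binary class variable, and a categorical confounder. The function names and the toy counts are hypothetical and are not taken from the paper; the check shown is one simple way to test for a reversal, not the authors' procedure.

```python
from typing import Dict, Tuple

# counts[(confounder_value, predictor_value)] = (positive_class_count, total_count)
# All totals are assumed to be greater than zero.
Counts = Dict[Tuple[str, int], Tuple[int, int]]


def association_sign(with_pred: Tuple[int, int], without_pred: Tuple[int, int]) -> int:
    """Sign of P(class=1 | predictor=1) - P(class=1 | predictor=0)."""
    diff = with_pred[0] / with_pred[1] - without_pred[0] / without_pred[1]
    return (diff > 0) - (diff < 0)


def has_simpsons_reversal(counts: Counts) -> bool:
    """True if every confounder stratum shares one non-zero association sign
    and the pooled (unconditioned) association has the opposite sign."""
    strata = sorted({z for z, _ in counts})
    stratum_signs = {
        association_sign(counts[(z, 1)], counts[(z, 0)]) for z in strata
    }
    # Pool the counts over all confounder values for each predictor value.
    pooled = {
        x: (
            sum(counts[(z, x)][0] for z in strata),
            sum(counts[(z, x)][1] for z in strata),
        )
        for x in (0, 1)
    }
    overall = association_sign(pooled[1], pooled[0])
    return overall != 0 and stratum_signs == {-overall}


# Illustrative toy counts (hypothetical): the predictor is positively
# associated with the class inside each stratum, but negatively when pooled.
toy_counts: Counts = {
    ("a", 1): (81, 87),
    ("a", 0): (234, 270),
    ("b", 1): (192, 263),
    ("b", 0): (55, 80),
}

print(has_simpsons_reversal(toy_counts))
```

In the toy data the predictor looks harmful when the strata are pooled and beneficial within each stratum, which is the kind of misreading the abstract warns about for top-ranked predictors.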


2013 ◽  
Vol 22 (03) ◽  
pp. 1350010 ◽  
Author(s):  
SABEREH SADEGHI ◽  
HAMID BEIGY

Dimensionality reduction is a necessary task in data mining when working with high-dimensional data. One type of dimensionality reduction is feature selection. Feature selection based on feature ranking has received much attention from researchers, mainly because of its scalability, ease of use, and fast computation. Feature ranking methods can be divided into different categories and may use different measures for ranking features. Recently, ensemble methods have entered the field of feature ranking and achieved higher accuracy than other approaches. Accordingly, this paper proposes a heterogeneous ensemble-based algorithm for feature ranking. The base ranking methods in this ensemble are chosen from different categories, such as information-theoretic, distance-based, and statistical methods. The results of the base ranking methods are then fused into a final feature subset by means of a genetic algorithm. The diversity of the base methods improves the quality of the initial population of the genetic algorithm and thus reduces its convergence time. In most ranking methods, it is the user's task to determine the threshold for choosing an appropriate subset of features. This is a problem, because the user may have to try many different values to find a good one. The proposed algorithm reduces the difficulty of determining a proper threshold. The performance of the algorithm is evaluated on four different text datasets, and the experimental results show that the proposed method outperforms all five other feature ranking methods used for comparison. One advantage of the proposed method is that it is independent of the classification method used.
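To make the idea of fusing several base rankings concrete, the sketch below combines rankings with a Borda count. This is a deliberately simpler stand-in and not the genetic-algorithm fusion the abstract describes; the function name and the feature names are hypothetical.

```python
from typing import Dict, List


def borda_fuse(rankings: List[List[str]]) -> List[str]:
    """Fuse base rankings (each listing features best first) into one ranking.

    A feature at position p in a ranking of n features earns n - p points;
    ties in the total score are broken alphabetically.
    """
    scores: Dict[str, int] = {}
    for ranking in rankings:
        n = len(ranking)
        for position, feature in enumerate(ranking):
            scores[feature] = scores.get(feature, 0) + (n - position)
    return sorted(scores, key=lambda f: (-scores[f], f))


# Hypothetical outputs of three base rankers from different categories.
base_rankings = [
    ["f1", "f2", "f3"],  # e.g. an information-theoretic measure
    ["f2", "f1", "f3"],  # e.g. a distance-based measure
    ["f2", "f3", "f1"],  # e.g. a statistical measure
]

print(borda_fuse(base_rankings))  # ['f2', 'f1', 'f3']
```

Unlike the approach in the abstract, a Borda count returns a full ranking and still leaves the cut-off threshold to the user.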


Author(s):  
Wilker Altidor ◽  
Taghi M. Khoshgoftaar ◽  
Jason Van Hulse ◽  
Amri Napolitano

2014 ◽  
Vol 2014 ◽  
pp. 1-9 ◽  
Author(s):  
Azra Shamim ◽  
Vimala Balakrishnan ◽  
Muhammad Tahir ◽  
Muhammad Shiraz

The increasing use and ubiquity of the Internet facilitate the dissemination of word-of-mouth through blogs, online forums, newsgroups, and consumer reviews. Online consumer reviews present tremendous opportunities and challenges for consumers and marketers. One of the challenges is to develop interactive marketing practices that connect with target consumers and capitalize on consumer-to-consumer communications to generate product adoption. Opinion mining is employed in marketing to help consumers and enterprises analyze online consumer reviews by highlighting the strengths and weaknesses of products. This paper describes an opinion mining system based on novel review and feature ranking methods that helps consumers and enterprises identify critical product features from an enormous number of consumer reviews. The main target groups for the proposed system are consumers and business analysts who want to explore consumer feedback to inform purchase decisions and enterprise strategies. We evaluate the proposed system on a real dataset. The results show that integrating review and feature ranking methods significantly improves decision-making processes.

