The Efficiency of Aggregation Methods in Ensemble Filter Feature Selection Models

2021 ◽  
Vol 9 (4) ◽  
pp. 39-51
Author(s):  
Noureldien Noureldien ◽  
Saffa Mohmoud

Ensemble feature selection is recommended because it produces a more stable subset of features and better classification accuracy than individual feature selection methods. In this approach, the outputs of feature selection methods, called base selectors, are combined using an aggregation method. For filter feature selection methods, a list aggregation method is needed to merge the output ranked lists into a single list, and since many list aggregation methods have been proposed, deciding which method to use to build the optimum ensemble model remains an open question.

In this paper, we investigate the efficiency of four aggregation methods, namely Min, Median, Arithmetic Mean, and Geometric Mean. The performance of the aggregation methods is evaluated on five datasets from different scientific fields with varying numbers of instances and features. The classifiers used in the evaluation are drawn from three different classes: Trees, Rules, and Bayes.

The experimental results show that 11 of the 15 best performance results correspond to ensemble models. Among these 11 best-performing ensemble models, the most efficient aggregation method is Median (5/11), followed by Arithmetic Mean (3/11) and Min (3/11). The results also show that as the number of features increases, the most efficient aggregation method shifts from Min to Median to Arithmetic Mean, which may suggest that for a very high number of features the most efficient aggregation method is the Arithmetic Mean. In general, no single aggregation method is best for all cases.
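The four list aggregation methods compared in the abstract can be sketched as follows. Each base selector assigns every feature a rank (1 = best), and the ensemble combines the per-feature ranks; the feature names and ranks below are hypothetical, not the paper's data.

```python
from statistics import median, mean, geometric_mean

def aggregate_ranks(rank_lists, method):
    """Combine per-feature ranks from several base selectors into one list.

    rank_lists: dict mapping feature -> list of ranks (1 = best), one rank
    per base selector.  method: 'min', 'median', 'mean', or 'geomean'.
    Returns features ordered from best (lowest aggregate rank) to worst.
    """
    agg = {
        'min': min,
        'median': median,
        'mean': mean,
        'geomean': geometric_mean,
    }[method]
    scores = {f: agg(ranks) for f, ranks in rank_lists.items()}
    return sorted(scores, key=scores.get)

# Two hypothetical base selectors ranking four features:
ranks = {'f1': [1, 3], 'f2': [2, 1], 'f3': [3, 4], 'f4': [4, 2]}
print(aggregate_ranks(ranks, 'median'))  # -> ['f2', 'f1', 'f4', 'f3']
```

Note how Min rewards a feature that any single selector ranked highly, while Median and the means require agreement across selectors, which is one way to read the paper's observation that the best aggregator shifts as the feature count grows.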

Author(s):  
William H Black ◽  
Lari B Masten

There is ongoing controversy in the business valuation literature regarding the preferability of the arithmetic mean or the harmonic mean when estimating ratios for use in business valuation. This research conducts a simulation using data reported from actual market transactions. Successive random samples were taken from data on valuation multiples and alternative measures of central tendency were calculated, accumulating more than 3.7 million data points. The measures (arithmetic mean, median, harmonic mean, geometric mean) were compared using hold-out sampling to identify which measure provided the closest approximation to actual results, evaluated in terms of least squares differences. Results indicated the harmonic mean delivered superior predictions to the other measures of central tendency, with less overstatement. Further, differences in sample size from 5 to 50 observations were evaluated to assess their impact on predictive performance. Results showed substantial improvements up to sample sizes of 20 or 25, with diminished improvements thereafter.
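The competing measures of central tendency can be illustrated with a small sketch; the valuation multiples below are hypothetical, not the study's transaction data.

```python
from statistics import mean, median, geometric_mean, harmonic_mean

# Hypothetical valuation multiples from comparable market transactions:
multiples = [8.0, 10.0, 12.0, 25.0, 40.0]

estimates = {
    'arithmetic mean': mean(multiples),
    'median': median(multiples),
    'geometric mean': geometric_mean(multiples),
    'harmonic mean': harmonic_mean(multiples),
}
# By the AM-GM-HM inequality the harmonic mean is always the smallest of
# the three means, which is why it overstates the least when the sample
# of multiples is right-skewed.
for name, value in estimates.items():
    print(f'{name}: {value:.2f}')
```

On this skewed sample the arithmetic mean (19.00) sits well above the harmonic mean (about 13.4), matching the abstract's finding that the harmonic mean exhibits less overstatement.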


2013 ◽  
Vol 753-755 ◽  
pp. 2806-2815
Author(s):  
Jun Ling Zhang ◽  
Jian Wu

Preference relations are the most common technique for expressing decision makers' preference information over alternatives or criteria. This paper focuses on effective operators for multiple-attribute group decision making with intuitionistic fuzzy preference relations. First, we extend the arithmetic mean method operator and the geometric mean method operator to accommodate intuitionistic fuzzy information, yielding the intuitionistic arithmetic mean method (IAMM) operator and the intuitionistic geometric mean method (IGMM) operator. We then analyze the compatibility properties of the intuitionistic preference relations obtained by IAMM and IGMM, finding that aggregation of individual judgments and aggregation of individual priorities yield the same priorities of alternatives, and that if all individual decision makers have an acceptable consensus degree, the resulting collective preference relations also have an acceptable consensus degree. Finally, the results are verified by an illustrative example set in the context of parts supplier selection.
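The abstract does not reproduce the IAMM/IGMM definitions, but equal-weight arithmetic- and geometric-mean aggregation of intuitionistic fuzzy values has standard forms (the IFWA and IFWG operators), which the following sketch assumes; the expert judgments are hypothetical.

```python
from math import prod

def if_arithmetic_mean(values):
    """Equal-weight arithmetic-mean aggregation of intuitionistic fuzzy
    values (mu, nu), in the standard IFWA form:
    mu* = 1 - prod(1 - mu_i)^(1/n),  nu* = prod(nu_i)^(1/n)."""
    n = len(values)
    mu = 1 - prod((1 - m) ** (1 / n) for m, _ in values)
    nu = prod(v ** (1 / n) for _, v in values)
    return mu, nu

def if_geometric_mean(values):
    """Equal-weight geometric-mean aggregation, in the standard IFWG form:
    mu* = prod(mu_i)^(1/n),  nu* = 1 - prod(1 - nu_i)^(1/n)."""
    n = len(values)
    mu = prod(m ** (1 / n) for m, _ in values)
    nu = 1 - prod((1 - v) ** (1 / n) for _, v in values)
    return mu, nu

# Three hypothetical expert judgments as (membership, non-membership) pairs:
judgments = [(0.6, 0.3), (0.5, 0.4), (0.7, 0.2)]
print(if_arithmetic_mean(judgments))
print(if_geometric_mean(judgments))
```

Both aggregates remain valid intuitionistic fuzzy values (membership plus non-membership at most 1), and the arithmetic form always yields a membership degree at least as large as the geometric form.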


Mathematics ◽  
2020 ◽  
Vol 8 (1) ◽  
pp. 110 ◽  
Author(s):  
Abhijeet R Patil ◽  
Sangjin Kim

In high-dimensional data, the performance of various classifiers depends largely on the selection of important features. Most individual classifiers with existing feature selection (FS) methods do not perform well on highly correlated data. Obtaining important features using an FS method and selecting the best-performing classifier is a challenging task in high-throughput data. In this article, we propose a combination of resampling-based least absolute shrinkage and selection operator (LASSO) feature selection (RLFS) and an ensemble of regularized regression models (ERRM) capable of dealing with data that have high correlation structures. The ERRM boosts the prediction accuracy using the top-ranked features obtained from RLFS. The RLFS uses the lasso penalty with the sure independence screening (SIS) condition to select the top k ranked features. The ERRM includes five individual penalty-based classifiers: LASSO, adaptive LASSO (ALASSO), elastic net (ENET), smoothly clipped absolute deviation (SCAD), and minimax concave penalty (MCP). It is built on the ideas of bagging and rank aggregation. Through simulation studies and application to smokers' cancer gene expression data, we demonstrate that the proposed combination of ERRM with RLFS achieves superior performance in accuracy and geometric mean.
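The resampling-and-ranking idea behind RLFS can be sketched as follows. To keep the sketch self-contained, a simple correlation filter stands in for the lasso fit on each resample; the data are toy values, not the gene expression data.

```python
import random
from statistics import mean

def corr(a, b):
    # Pearson correlation; returns 0 for degenerate (zero-variance) inputs.
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / ((va * vb) ** 0.5) if va and vb else 0.0

def resample_rank(X, y, n_resamples=50, seed=0):
    """Rank features by averaging their per-resample ranks over bootstrap
    resamples.  A correlation score stands in for the lasso coefficient
    magnitude that RLFS would use on each resample."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    rank_sums = [0.0] * p
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]          # bootstrap sample
        ys = [y[i] for i in idx]
        scores = [abs(corr([X[i][j] for i in idx], ys)) for j in range(p)]
        order = sorted(range(p), key=lambda j: -scores[j])  # best first
        for r, j in enumerate(order):
            rank_sums[j] += r
    return sorted(range(p), key=lambda j: rank_sums[j])     # aggregate ranking

# Toy data: feature 0 is the response itself, feature 1 is unrelated.
y = [float(i) for i in range(20)]
X = [[y[i], float((i * 7) % 5)] for i in range(20)]
print(resample_rank(X, y, n_resamples=20))
```

Averaging ranks over resamples is what makes the selection stable: a feature must score well across many bootstrap samples, not just once, to reach the top of the list.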


Entropy ◽  
2020 ◽  
Vol 22 (6) ◽  
pp. 613
Author(s):  
Yu Zhou ◽  
Junhao Kang ◽  
Xiao Zhang

Recent discretization-based feature selection methods show great advantages by introducing the entropy-based cut-points for features to integrate discretization and feature selection into one stage for high-dimensional data. However, current methods usually consider the individual features independently, ignoring the interaction between features with cut-points and those without cut-points, which results in information loss. In this paper, we propose a cooperative coevolutionary algorithm based on the genetic algorithm (GA) and particle swarm optimization (PSO), which searches for the feature subsets with and without entropy-based cut-points simultaneously. For the features with cut-points, a ranking mechanism is used to control the probability of mutation and crossover in GA. In addition, a binary-coded PSO is applied to update the indices of the selected features without cut-points. Experimental results on 10 real datasets verify the effectiveness of our algorithm in classification accuracy compared with several state-of-the-art competitors.
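Of the two coevolving populations described above, the binary-coded PSO half can be sketched as follows. This is a generic sigmoid-transfer binary PSO, not the paper's exact update rules; the fitness function and parameters are stand-ins, and the GA half with its rank-controlled mutation is omitted.

```python
import math
import random

def binary_pso(score, n_features, n_particles=10, n_iters=30, seed=1):
    """Minimal binary-coded PSO for feature-subset search.
    `score` maps a 0/1 mask over features to a fitness to maximize."""
    rng = random.Random(seed)
    pos = [[rng.randint(0, 1) for _ in range(n_features)]
           for _ in range(n_particles)]
    vel = [[0.0] * n_features for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                 # per-particle best positions
    pbest_s = [score(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_s[i])
    gbest, gbest_s = pbest[g][:], pbest_s[g]    # global best
    for _ in range(n_iters):
        for i in range(n_particles):
            for j in range(n_features):
                vel[i][j] = (0.7 * vel[i][j]
                             + 1.5 * rng.random() * (pbest[i][j] - pos[i][j])
                             + 1.5 * rng.random() * (gbest[j] - pos[i][j]))
                # Sigmoid transfer: the velocity sets the probability that
                # this bit (feature index) is selected.
                pos[i][j] = 1 if rng.random() < 1.0 / (1.0 + math.exp(-vel[i][j])) else 0
            s = score(pos[i])
            if s > pbest_s[i]:
                pbest[i], pbest_s[i] = pos[i][:], s
                if s > gbest_s:
                    gbest, gbest_s = pos[i][:], s
    return gbest, gbest_s

# Toy fitness: number of bits matching a hidden target mask.
target = [1, 0, 1, 0, 1, 0, 1, 0]
best, best_score = binary_pso(lambda m: sum(a == b for a, b in zip(m, target)), 8)
print(best, best_score)
```

In the paper's setting the fitness would be the classification accuracy of the subset combining these indices with the GA-selected cut-point features.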


Entropy ◽  
2021 ◽  
Vol 23 (2) ◽  
pp. 200
Author(s):  
Reem Salman ◽  
Ayman Alzaatreh ◽  
Hana Sulieman ◽  
Shaimaa Faisal

In the past decade, big data has become increasingly prevalent in a large number of applications. As a result, datasets suffering from noise and redundancy issues have necessitated the use of feature selection across multiple domains. However, a common concern in feature selection is that different approaches can give very different results when applied to similar datasets. Aggregating the results of different selection methods helps to resolve this concern and control the diversity of selected feature subsets. In this work, we implemented a general framework for the ensemble of multiple feature selection methods. Based on diversified datasets generated from the original set of observations, we aggregated the importance scores generated by multiple feature selection techniques using two methods: the Within Aggregation Method (WAM), which refers to aggregating importance scores within a single feature selection method; and the Between Aggregation Method (BAM), which refers to aggregating importance scores between multiple feature selection methods. We applied the proposed framework on 13 real datasets with diverse characteristics. The experimental evaluation showed that WAM provides an effective tool for determining the best feature selection method for a given dataset. WAM has also shown greater stability than BAM in terms of identifying important features. The computational demands of the two methods appeared to be comparable. The results of this work suggest that by applying both WAM and BAM, practitioners can gain a deeper understanding of the feature selection process.
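The two aggregation axes can be sketched as follows. The abstract does not fix the aggregation function, so an arithmetic mean of importance scores is assumed here, and the method names and scores are hypothetical.

```python
from statistics import mean

def aggregate_scores(score_dicts):
    """Arithmetic-mean aggregation of per-feature importance scores.
    The same arithmetic serves WAM (across diversified datasets, one FS
    method) and BAM (across different FS methods)."""
    return {f: mean(d[f] for d in score_dicts) for f in score_dicts[0]}

# WAM: one hypothetical FS method's scores on three diversified datasets.
method_a = [{'f1': 0.9, 'f2': 0.2, 'f3': 0.5},
            {'f1': 0.8, 'f2': 0.3, 'f3': 0.4},
            {'f1': 0.7, 'f2': 0.1, 'f3': 0.6}]
# A second hypothetical FS method's scores on the same three datasets.
method_b = [{'f1': 0.6, 'f2': 0.4, 'f3': 0.5},
            {'f1': 0.7, 'f2': 0.2, 'f3': 0.3},
            {'f1': 0.8, 'f2': 0.3, 'f3': 0.4}]

wam_a = aggregate_scores(method_a)        # within method A
wam_b = aggregate_scores(method_b)        # within method B
bam = aggregate_scores([wam_a, wam_b])    # between the two methods
print(wam_a, bam)
```

The distinction is purely in what is being averaged: WAM smooths one method's scores over data perturbations, while BAM reconciles disagreements between methods.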


Author(s):  
Mehmet Pinar

Composite well-being and sustainability indices are usually obtained as arithmetic or geometric means of sub-dimensions. However, the arithmetic mean does not consider potential interactions across the dimensions of an index, and the geometric mean does not penalize unbalanced achievements across dimensions strongly enough. This paper uses a flexible non-additive aggregation model, the Choquet integral, to account for potential synergies and redundancies among the dimensions used to obtain indices, and takes the Human Development Index (HDI) as an example to illustrate the flexibility of the aggregation procedure. The paper relies on multiple theoretical and empirical studies that indicate mutually strengthening relationships (positive interactions) among the three HDI dimensions. To showcase how positive interactions among the three HDI dimensions can be taken into account, the paper uses five hypothetical weight sets and simulates 500 weight sets that allow varying positive interactions among the three dimensions. The analyses with the HDI data suggest that geometric and arithmetic mean HDI scores are roughly the same for most countries, even when variations across the three dimensions are relatively large. In contrast, countries with balanced (unbalanced) achievements across dimensions rank in higher (lower) positions under Choquet integral aggregation. These illustrations showcase how the Choquet integral is a flexible aggregation method that accommodates varying positive interactions across the HDI dimensions and is able to detect unbalanced achievements.
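The discrete Choquet integral has a standard form, sketched below for three HDI-like dimensions. The capacity (coalition weights) and the two country profiles are hypothetical, chosen only to show how balanced achievements are rewarded.

```python
def choquet(values, capacity):
    """Discrete Choquet integral of `values` (dict: dimension -> score)
    with respect to `capacity` (dict: frozenset of dimensions -> weight,
    monotone, with the full set mapped to 1)."""
    dims = sorted(values, key=values.get)   # dimensions by ascending score
    total, prev = 0.0, 0.0
    for i, d in enumerate(dims):
        coalition = frozenset(dims[i:])     # dimensions scoring >= values[d]
        total += (values[d] - prev) * capacity[coalition]
        prev = values[d]
    return total

# Hypothetical superadditive capacity: pairs and the full set are worth
# more than the sum of their singletons, modelling positive interactions.
cap = {frozenset(s): w for s, w in [
    (('health',), 0.2), (('education',), 0.2), (('income',), 0.2),
    (('health', 'education'), 0.5), (('health', 'income'), 0.5),
    (('education', 'income'), 0.5),
    (('health', 'education', 'income'), 1.0),
]}
balanced = {'health': 0.6, 'education': 0.6, 'income': 0.6}
unbalanced = {'health': 0.9, 'education': 0.6, 'income': 0.3}
# Both profiles have arithmetic mean 0.6, but the Choquet integral
# rewards the balanced one (0.6 vs approximately 0.51).
print(choquet(balanced, cap), choquet(unbalanced, cap))
```

This mirrors the paper's finding: under a capacity encoding positive interactions, a country with unbalanced achievements drops in the ranking even when its arithmetic mean is unchanged.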


2012 ◽  
Vol E95-B (2) ◽  
pp. 647-650
Author(s):  
Ning WANG ◽  
Julian CHENG ◽  
Chintha TELLAMBURA

Author(s):  
Fatemeh Alighardashi ◽  
Mohammad Ali Zare Chahooki

Improving software product quality through periodic tests before release is one of the most expensive activities in software projects. Because resources for testing modules are limited, it is important to identify fault-prone modules and direct testing resources toward fault prediction in them. Software fault predictors based on machine learning algorithms are effective tools for identifying fault-prone modules, and extensive studies in this field seek the connection between features of software modules and their fault-proneness. Some features used in predictive algorithms are ineffective and reduce the accuracy of the prediction process, so feature selection methods are widely used to increase the performance of prediction models for fault-prone modules. In this study, we propose a feature selection method that selects effective features by combining several filter feature selection methods; the combination is presented as a fused weighted filter method. The proposed method improves both the convergence rate of feature selection and the prediction accuracy. Results obtained on ten datasets from NASA and PROMISE indicate the effectiveness of the proposed method in improving the accuracy and convergence of software fault prediction.
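One plausible reading of a fused weighted filter is a weighted sum of normalized scores from each filter method, sketched below. The min-max normalization, the two filter methods, the metric names, and the weights are all assumptions for illustration, not the paper's specification.

```python
def fused_filter(score_dicts, weights):
    """Rank features by a weighted sum of min-max normalized scores from
    several filter feature selection methods (best first)."""
    def norm(scores):
        lo, hi = min(scores.values()), max(scores.values())
        return {f: (s - lo) / (hi - lo) if hi > lo else 0.0
                for f, s in scores.items()}
    normed = [norm(s) for s in score_dicts]
    fused = {f: sum(w * n[f] for w, n in zip(weights, normed))
             for f in score_dicts[0]}
    return sorted(fused, key=fused.get, reverse=True)

# Hypothetical scores for three module metrics from two filter methods
# (e.g. a chi-square-style and an information-gain-style filter):
chi2 = {'loc': 10.0, 'cc': 30.0, 'churn': 20.0}
gain = {'loc': 0.1, 'cc': 0.5, 'churn': 0.7}
print(fused_filter([chi2, gain], weights=(0.5, 0.5)))
```

Normalizing before fusing matters because the individual filters score on incompatible scales; without it, the filter with the largest raw values would dominate the fused ranking.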

