Accelerating Causal Inference and Feature Selection Methods through G-Test Computation Reuse

This article presents an efficient method of computing the statistical G-test by exploiting its connection with basic information-theoretic quantities: when the G statistic is written as a sum of joint entropy terms, its computation decomposes into easily reusable partial results, with no change in the resulting value. This greatly improves the efficiency of applications that perform a series of G-tests on permutations of the same features, such as feature selection and causal inference, because the decomposition allows intensive reuse of these partial results. The efficiency of the method is demonstrated in an experiment involving IPC–MB, an efficient Markov blanket discovery algorithm applicable both as a feature selection algorithm and as a causal inference method. The results show substantial efficiency gains for IPC–MB when the G-test is computed with the proposed method, compared both to the unoptimized G-test and to IPC–MB++, a variant of IPC–MB enhanced with an AD–tree, in both its static and dynamic forms. Although the method is presented here in the context of IPC–MB, it is bound neither to IPC–MB in particular nor to feature selection or causal inference in general, because it targets the information-theoretic concept that underlies the G-test, namely conditional mutual information. This gives it wide applicability in data science.
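
The decomposition described in the abstract can be illustrated with a minimal sketch. It relies on the standard identities G = 2·N·I(X;Y|Z) (with entropies in nats) and I(X;Y|Z) = H(X,Z) + H(Y,Z) − H(X,Y,Z) − H(Z). The class name `EntropyCache`, its methods, and the dictionary-based cache are assumptions made for illustration; they are not taken from the article, whose actual implementation may differ.

```python
from collections import Counter
from math import log


class EntropyCache:
    """Hypothetical helper: caches joint entropies (in nats) per set of columns."""

    def __init__(self, rows):
        self.rows = rows          # list of tuples of discrete values
        self.n = len(rows)
        self._cache = {}          # frozenset of column indices -> joint entropy
        self.misses = 0           # number of entropies actually computed

    def joint_entropy(self, cols):
        key = frozenset(cols)
        if key not in self._cache:
            self.misses += 1
            idx = sorted(key)
            # Count each observed value combination of the selected columns.
            counts = Counter(tuple(row[i] for i in idx) for row in self.rows)
            self._cache[key] = -sum(
                c / self.n * log(c / self.n) for c in counts.values()
            )
        return self._cache[key]

    def g_statistic(self, x, y, z=()):
        """G for testing X independent of Y given Z; arguments are column indices."""
        h = self.joint_entropy
        cmi = h([x, *z]) + h([y, *z]) - h([x, y, *z]) - h(z)
        return 2.0 * self.n * cmi
```

Because each joint entropy is keyed only by its set of columns, a later test that shares a term such as H(X,Z) with an earlier test reuses the cached value instead of recounting the data.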

Predicting the Severity of Bug Reports Based on Feature Selection

International Journal of Software Engineering and Knowledge Engineering ◽

10.1142/s0218194018500158 ◽

2018 ◽

Vol 28 (04) ◽

pp. 537-558 ◽

Cited By ~ 4

Author(s):

Wenjie Liu ◽

Shanshan Wang ◽

Xin Chen ◽

He Jiang

Keyword(s):

Feature Selection ◽

Software Maintenance ◽

Feature Selection Method ◽

Selection Methods ◽

Selection Algorithm ◽

Feature Selection Algorithm ◽

Bug Reports ◽

Single Feature ◽

Bug Report ◽

Severity Prediction

Predicting the severity of bug reports is an important activity in the software maintenance process. However, manually identifying the severity of bug reports is tedious and time-consuming, so automatic methods for predicting severity are in urgent demand. A bug report generally contains a large amount of descriptive natural-language text, which yields a high-dimensional feature set that poses serious challenges to traditional automatic methods. We therefore use automatic feature selection methods to improve the performance of severity prediction for bug reports. In this paper, we introduce a ranking-based strategy to improve existing feature selection algorithms and propose an ensemble feature selection algorithm that combines existing ones. To verify the performance of our method, we run experiments on the bug reports of Eclipse and Mozilla and compare against eight commonly used feature selection methods. The experimental results show that the ranking-based strategy improves the performance of severity prediction by up to 54.76% on average in terms of F-measure, and that it also significantly reduces the dimension of the feature set. Meanwhile, the ensemble feature selection method achieves better results than a single feature selection algorithm.

An improved feature selection algorithm with conditional mutual information for classification problems

2013 International Conference on Human Computer Interactions (ICHCI) ◽

10.1109/ichci-ieee.2013.6887802 ◽

2013 ◽

Cited By ~ 1

Author(s):

Jaganathan Palanichamy ◽

Kuppuchamy Ramasamy

Keyword(s):

Feature Selection ◽

Mutual Information ◽

Conditional Mutual Information ◽

Selection Algorithm ◽

Feature Selection Algorithm ◽

Classification Problems

An Improved Feature Selection Algorithm Based on Parzen Window and Conditional Mutual Information

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.347-350.2614 ◽

2013 ◽

Vol 347-350 ◽

pp. 2614-2619

Author(s):

Deng Chao He ◽

Wen Ning Hao ◽

Gang Chen ◽

Da Wei Jin

Keyword(s):

Feature Selection ◽

Mutual Information ◽

Probability Density Functions ◽

Density Functions ◽

Continuous Variables ◽

Conditional Mutual Information ◽

Selection Algorithm ◽

Feature Selection Algorithm ◽

Parzen Window ◽

Selection For

This paper proposes an improved feature selection algorithm based on conditional mutual information and the Parzen window. The algorithm adopts conditional mutual information as the evaluation criterion for feature selection in order to overcome feature redundancy, and uses the Parzen window to estimate the probability density functions and calculate the conditional mutual information of continuous variables, thereby achieving feature selection for continuous data.
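
As a rough illustration of the density-estimation step mentioned in the abstract, the sketch below computes a one-dimensional Parzen-window estimate. The Gaussian kernel, the window width `h`, and the function name `parzen_density` are assumptions chosen for the example; the abstract does not state which kernel or width the authors used.

```python
from math import exp, pi, sqrt


def parzen_density(x, samples, h):
    """Gaussian Parzen-window estimate of p(x) from 1-D samples.

    h is the window width (assumed here to be chosen by the caller).
    """
    if h <= 0 or not samples:
        raise ValueError("need a positive window width and at least one sample")
    # Each sample contributes a Gaussian bump of width h centred on itself.
    norm = 1.0 / (len(samples) * h * sqrt(2.0 * pi))
    return norm * sum(exp(-0.5 * ((x - s) / h) ** 2) for s in samples)
```

Estimates of this kind, extended to joint densities, can be plugged into the entropy or mutual information integrals for continuous variables.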

A Feature Selection Algorithm Based on Equal Interval Division and Conditional Mutual Information

Neural Processing Letters ◽

10.1007/s11063-021-10720-6 ◽

2022 ◽

Author(s):

Xiangyuan Gu ◽

Jichang Guo ◽

Tao Ming ◽

Lijun Xiao ◽

Chongyi Li

Keyword(s):

Feature Selection ◽

Mutual Information ◽

Conditional Mutual Information ◽

Selection Algorithm ◽

Feature Selection Algorithm ◽

Equal Interval

An Improved Feature Selection Algorithm Based on Parzen Window and Conditional Mutual Information

Proceedings of the 2nd International Symposium on Computer, Communication, Control and Automation ◽

10.2991/isccca.2013.82 ◽

2013 ◽

Author(s):

Deng-chao He ◽

Wen-ning Hao ◽

Gang Chen ◽

Da-wei Jin

Keyword(s):

Feature Selection ◽

Mutual Information ◽

Conditional Mutual Information ◽

Selection Algorithm ◽

Feature Selection Algorithm ◽

Parzen Window

Conditional mutual information-based feature selection algorithm for maximal relevance minimal redundancy

Applied Intelligence ◽

10.1007/s10489-021-02412-4 ◽

2021 ◽

Author(s):

Xiangyuan Gu ◽

Jichang Guo ◽

Lijun Xiao ◽

Chongyi Li

Keyword(s):

Feature Selection ◽

Mutual Information ◽

Conditional Mutual Information ◽

Selection Algorithm ◽

Feature Selection Algorithm ◽

Minimal Redundancy ◽

Maximal Relevance

Research and implementation of Chinese text feature selection algorithm based on χ² statistics

Computational Intelligence and Industrial Engineering ◽

10.2495/ciie140191 ◽

2014 ◽

Author(s):

Weijiang Wu ◽

Shengkai Wen ◽

Dongmei Xia ◽

Guohe Li

Keyword(s):

Feature Selection ◽

Chinese Text ◽

Selection Algorithm ◽

Feature Selection Algorithm ◽

Text Feature

BagMeLiF: stable boosting-based hybrid-ensemble feature selection algorithm for high-dimensional data

2020 International Conference on Control, Robotics and Intelligent System ◽

10.1145/3437802.3437835 ◽

2020 ◽

Author(s):

Nikita Pilnenskiy ◽

Ivan Smetannikov

Keyword(s):

Feature Selection ◽

High Dimensional Data ◽

High Dimensional ◽

Selection Algorithm ◽

Feature Selection Algorithm

Hybrid Feature Selection Algorithm Based on Discrete Artificial Bee Colony for Parkinson Diagnosis

ACM Transactions on Internet Technology ◽

10.1145/3397161 ◽

2020 ◽

Cited By ~ 1

Author(s):

Haolun Li ◽

Chi-Man Pun ◽

Feng Xu ◽

Longsheng Pan ◽

Rui Zong ◽

...

Keyword(s):

Feature Selection ◽

Artificial Bee Colony ◽

Selection Algorithm ◽

Feature Selection Algorithm ◽

Bee Colony

High-Accuracy Power Quality Disturbance Classification Using the Adaptive ABC-PSO as Optimal Feature Selection Algorithm

Energies ◽

10.3390/en14051238 ◽

2021 ◽

Vol 14 (5) ◽

pp. 1238

Author(s):

Supanat Chamchuen ◽

Apirat Siritaratiwat ◽

Pradit Fuangfoo ◽

Puripong Suthisopapan ◽

Pirat Khunkitti

Keyword(s):

Feature Selection ◽

Power Quality ◽

Distribution System ◽

Classification Accuracy ◽

Selection Algorithm ◽

Feature Selection Algorithm ◽

Electrical Distribution ◽

Power Quality Disturbance ◽

Optimal Feature Selection ◽

Optimal Feature

Power quality disturbance (PQD) is an important issue in electrical distribution systems: disturbances must be detected promptly and identified to prevent the degradation of system reliability. This work proposes a PQD classification system whose feature selection algorithm, called "adaptive ABC-PSO", is a novel combination of the artificial bee colony (ABC) and particle swarm optimization (PSO) algorithms to which the proposed adaptive technique is applied. A discrete wavelet transform is used as the feature extraction method, and a probabilistic neural network is used as the classifier. We found that the highest classification accuracy (99.31%) could be achieved with nine optimally selected features out of all 72 extracted features. Moreover, the proposed PQD classification system demonstrated high performance in a noisy environment, as well as in a real distribution system. Compared with previous studies, the classification accuracy obtained with adaptive ABC-PSO as the optimal feature selection algorithm is in the high range; the adaptive ABC-PSO algorithm can therefore be used to classify PQD in a practical electrical distribution system.