Improved Feature Selection Based on Mutual Information for Regression Tasks

Mutual information (MI) is an information-theoretic measure that has recently become a common criterion for feature selection, because it captures both linear and non-linear dependencies between two variables. In theory, mutual information is defined in terms of the probability density functions (pdfs) or entropies of the two variables. In most machine learning applications, mutual information estimation is formulated for classification problems (that is, data with labeled output). This study investigates mutual information estimation as a feature selection criterion for regression tasks and introduces an enhancement for selecting the optimal feature subset. Specifically, while focusing on regression tasks, it builds on previous work that proposed a scientifically sound stopping criterion for greedy feature selection algorithms. Four real-world regression datasets were used in this study: three are public datasets obtained from the UCI Machine Learning Repository and the remaining one is a private well log dataset. Two machine learning models, namely multiple regression and artificial neural networks (ANN), were used to test the performance of IFSMIR. The results demonstrate the effectiveness of the proposed method.
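To make the idea of ranking regression features by mutual information concrete, here is a minimal sketch. It uses a plain equal-width histogram estimator, which is an assumption made for illustration and not the estimator or stopping criterion of the paper; the function names `bin_values` and `mutual_information`, the bin count, and the toy data are all hypothetical.

```python
import math
from collections import Counter


def bin_values(values, n_bins=4):
    """Map continuous values to equal-width bin indices 0..n_bins-1."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # avoid zero width for constant input
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def mutual_information(x, y, n_bins=4):
    """Histogram estimate of I(X;Y) in nats for two equal-length sequences."""
    bx, by = bin_values(x, n_bins), bin_values(y, n_bins)
    n = len(bx)
    joint, px, py = Counter(zip(bx, by)), Counter(bx), Counter(by)
    # Sum over occupied cells of p(a,b) * log(p(a,b) / (p(a) * p(b))).
    return sum((c / n) * math.log(c * n / (px[a] * py[b]))
               for (a, b), c in joint.items())


# Toy regression data: y depends on x non-linearly, `noise` is unrelated.
xs = [i / 10 for i in range(-20, 21)]
ys = [x * x for x in xs]
noise = [(7 * i) % 11 for i in range(41)]

candidates = {"x": xs, "noise": noise}
ranking = sorted(candidates,
                 key=lambda name: mutual_information(candidates[name], ys),
                 reverse=True)
print(ranking)
```

On this symmetric range the linear correlation between x and x² is zero, yet the histogram estimate still ranks `x` above the unrelated column, which is the property that motivates MI as a selection criterion. Histogram estimates are biased for small samples, so the bin count should be treated as a tuning choice.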

A Novel Feature Selection Method Based on Maximum Likelihood Logistic Regression for Imbalanced Learning in Software Defect Prediction

The International Arab Journal of Information Technology ◽

10.34028/iajit/17/5/5 ◽

2020 ◽

Vol 17 (5) ◽

pp. 721-730

Author(s):

Kamal Bashir ◽

Tianrui Li ◽

Mahama Yahaya

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Feature Selection ◽

Maximum Likelihood ◽

Defect Prediction ◽

Feature Subset ◽

Software Defect Prediction ◽

Software Defect ◽

Optimal Feature Subset ◽

Optimal Feature

The most frequently used machine learning feature ranking approaches fail to produce an optimal feature subset for accurate prediction of defective software modules in out-of-sample data. Machine learning Feature Selection (FS) algorithms such as Chi-Square (CS), Information Gain (IG), Gain Ratio (GR), RelieF (RF) and Symmetric Uncertainty (SU) perform relatively poorly at prediction, even after balancing the class distribution in the training data. In this study, we propose a novel FS method based on Maximum Likelihood Logistic Regression (MLLR). We apply this method to six software defect datasets in their sampled and unsampled forms to select useful features for classification in the context of Software Defect Prediction (SDP). The Support Vector Machine (SVM) and Random Forest (RaF) classifiers are applied to the FS subsets derived from the sampled and unsampled datasets. The performance of the models, measured by the Area Under the Receiver Operating Characteristic Curve (AUC), is compared for all FS methods considered. The Analysis Of Variance (ANOVA) F-test results validate the superiority of the proposed method over all the other FS techniques, on both sampled and unsampled data. The results confirm that MLLR can be useful in selecting an optimal feature subset for more accurate prediction of defective modules in the software development process.

PREDAIP: Computational Prediction and Analysis for Anti-inflammatory Peptide via a Hybrid Feature Selection Technique

Current Bioinformatics ◽

10.2174/1574893616666210601111157 ◽

2021 ◽

Vol 16 ◽

Author(s):

Dan Lin ◽

Jialin Yu ◽

Ju Zhang ◽

Huan He ◽

Xinyun Guo ◽

...

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Machine Learning Algorithms ◽

Selection Strategy ◽

Feature Subset ◽

Feature Selection Technique ◽

Selection Technique ◽

Anti Inflammatory ◽

Optimal Feature Subset ◽

Optimal Feature

Background: Anti-inflammatory peptides (AIPs) are potent therapeutic agents for inflammatory and autoimmune disorders due to their high specificity and minimal toxicity under normal conditions. It is therefore valuable to identify AIPs as a step toward discovering novel and efficient AIP-based therapeutics. Recently, three computational approaches that can effectively identify potential AIPs have been developed based on machine learning algorithms. However, these three existing predictors face several challenges. Objective: A novel machine learning algorithm is needed to improve AIP prediction accuracy. Methods: This study attempts to improve the recognition of AIPs by employing multiple primary sequence-based feature descriptors and an efficient feature selection strategy. By ranking features with four enhanced minimal redundancy maximal relevance (emRMR) methods and then applying wrapper methods, built on seven different classifiers and the sequential forward selection (SFS) algorithm, we propose a hybrid feature selection technique, emRMR-SFS, to optimize the feature vectors. Furthermore, by evaluating seven classifiers trained with the optimal feature subset, we developed an extremely randomized tree (ERT) based predictor named PREDAIP for identifying AIPs. Results: We systematically compared the performance of PREDAIP with the existing tools on an independent test dataset. The comparison demonstrates the effectiveness of PREDAIP. The correlation criteria used in emRMR affect which optimal feature subset is selected at the SFS-wrapper stage, which justifies considering different correlation criteria in emRMR. Conclusion: We expect that PREDAIP will be useful for the high-throughput prediction of AIPs and the development of AIP therapeutics.

Mr2DNM: A Novel Mutual Information-Based Dendritic Neuron Model

Computational Intelligence and Neuroscience ◽

10.1155/2019/7362931 ◽

2019 ◽

Vol 2019 ◽

pp. 1-13 ◽

Cited By ~ 2

Author(s):

Xiaoxiao Qian ◽

Yirui Wang ◽

Shuyang Cao ◽

Yuki Todo ◽

Shangce Gao

Keyword(s):

Mutual Information ◽

Neuron Model ◽

Learning Rule ◽

Feature Subset ◽

Classification Problems ◽

Feature Selection Technique ◽

Plasticity Mechanism ◽

Optimal Feature Subset ◽

The Given ◽

Optimal Feature

By employing a neuron plasticity mechanism, the original dendritic neuron model (DNM) has succeeded in classification tasks with not only encouraging accuracy but also a simple learning rule. However, data collected in the real world contain a lot of redundancy, which makes analyzing them with DNM complicated and time-consuming. This paper proposes a reliable hybrid model which combines a maximum relevance minimum redundancy (Mr2) feature selection technique with DNM (namely, Mr2DNM) for practical classification problems. The mutual information-based Mr2 is applied to evaluate and rank the most informative and discriminative features for the given dataset. The resulting optimal feature subset is used to train and test the DNM on five different classification problems arising from medical, physical, and social scenarios. Experimental results suggest that the proposed Mr2DNM outperforms DNM and six other classification algorithms in terms of accuracy and computational efficiency.
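The maximum relevance minimum redundancy idea can be sketched as a greedy loop over mutual information scores. The abstract does not give the exact Mr2 criterion, so the difference form used here (relevance minus mean redundancy) is an assumption, and `discrete_mi`, `mrmr_rank`, and the toy columns are hypothetical names and data for illustration only.

```python
import math
from collections import Counter


def discrete_mi(a, b):
    """Mutual information (nats) between two discrete sequences."""
    n = len(a)
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum((c / n) * math.log(c * n / (pa[u] * pb[v]))
               for (u, v), c in pab.items())


def mrmr_rank(features, labels, k):
    """Greedily pick k feature names by relevance minus mean redundancy."""
    relevance = {name: discrete_mi(col, labels)
                 for name, col in features.items()}
    selected = []

    def score(name):
        if not selected:
            return relevance[name]
        redundancy = sum(discrete_mi(features[name], features[s])
                         for s in selected) / len(selected)
        return relevance[name] - redundancy

    while len(selected) < min(k, len(features)):
        remaining = [name for name in features if name not in selected]
        selected.append(max(remaining, key=score))
    return selected


labels = [0, 0, 0, 0, 1, 1, 1, 1]
features = {
    "a":      [0, 0, 0, 1, 1, 1, 1, 1],  # strongly related to the label
    "a_copy": [0, 0, 0, 1, 1, 1, 1, 1],  # exact duplicate of "a"
    "b":      [0, 0, 1, 0, 1, 0, 1, 1],  # weaker but complementary
}
print(mrmr_rank(features, labels, 2))
```

In this toy setup the duplicate column is as relevant as the original, but its redundancy with the already selected `a` pushes its score below that of the weaker column `b`, so a pure relevance ranking and the mRMR-style ranking disagree on the second pick.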

Modulation Recognition of Digital Multimedia Signal Based on Data Feature Selection

International Journal of Mobile Computing and Multimedia Communications ◽

10.4018/ijmcmc.2017070107 ◽

2017 ◽

Vol 8 (3) ◽

pp. 90-111 ◽

Cited By ~ 2

Author(s):

Hui Wang ◽

Li Li Guo ◽

Yun Lin

Keyword(s):

Feature Selection ◽

Information Entropy ◽

Feature Subset ◽

Selection Algorithm ◽

Feature Selection Algorithm ◽

Modulation Recognition ◽

Signal Modulation ◽

Digital Multimedia ◽

Optimal Feature Subset ◽

Optimal Feature

Automatic modulation recognition is very important for receiver design in broadband multimedia communication systems, and a reasonable signal feature extraction and selection algorithm is the key technology of digital multimedia signal recognition. In this paper, information entropy is used to extract the individual features, namely power spectrum entropy, wavelet energy spectrum entropy, singular spectrum entropy and Renyi entropy. Then, a distance-measurement feature selection algorithm and Sequential Feature Selection (SFS) are presented to select the optimal feature subset. Finally, a BP neural network is used to classify the signal modulation. The simulation results show that the four information entropy features can be used to distinguish different signal modulations, and that the feature selection algorithm successfully chooses the optimal feature subset and achieves the best performance.
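As a rough illustration of entropy-based signal features, the sketch below computes a power spectrum entropy and a Renyi entropy from a normalized spectrum. The naive DFT, the Renyi order `alpha=2`, and all function names are assumptions for this example; the paper's exact feature definitions are not given in the abstract.

```python
import cmath
import math


def power_spectrum(signal):
    """One-sided power spectrum via a naive O(n^2) DFT."""
    n = len(signal)
    return [abs(sum(s * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t, s in enumerate(signal))) ** 2
            for k in range(n // 2 + 1)]


def normalize(weights):
    """Scale non-negative weights so they sum to 1."""
    total = sum(weights)
    return [w / total for w in weights]


def shannon_entropy(p):
    """Shannon entropy in nats of a probability vector."""
    return -sum(q * math.log(q) for q in p if q > 0)


def renyi_entropy(p, alpha=2.0):
    """Renyi entropy of order alpha (alpha != 1) in nats."""
    return math.log(sum(q ** alpha for q in p if q > 0)) / (1.0 - alpha)


def spectral_entropy(signal):
    """Shannon entropy of the normalized power spectrum."""
    return shannon_entropy(normalize(power_spectrum(signal)))


# A pure tone concentrates power in one bin; an impulse spreads it evenly.
tone = [math.sin(2 * math.pi * 4 * t / 32) for t in range(32)]
impulse = [1.0] + [0.0] * 31
print(spectral_entropy(tone), spectral_entropy(impulse))
```

A concentrated spectrum gives an entropy near zero, while a flat spectrum gives the maximum value, the logarithm of the number of bins. That contrast is what lets such entropies separate signals with different spectral shapes.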

Intrusion Detection System using SMIFS and Multi class Multi layer Perceptron

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.i8982.078919 ◽

2019 ◽

Vol 8 (9) ◽

pp. 2622-2628

Keyword(s):

Feature Extraction ◽

Feature Selection ◽

Mutual Information ◽

New Technologies ◽

Detection System ◽

Feature Selection Method ◽

Machine Learning Algorithms ◽

Feature Subset ◽

Classification Problems ◽

Data Set

As new technologies emerge, data is being generated in larger volumes and higher dimensions. The high dimensionality of data poses a great challenge for classification. The presence of redundant features and noisy data degrades the performance of the model. So, it is necessary to extract the relevant features from a given data set. Feature extraction is an important step in many machine learning algorithms, and many researchers have attempted to extract such features. Among the different feature extraction methods, mutual information is a widely used feature selection method because of its ability to quantify dependency among the features in classification problems. To cope with this issue, in this paper we propose a simplified mutual information based feature selection method with less computational overhead. The selected feature subset is evaluated with a multilayer perceptron on the KDD CUP 99 data set with 2-class classification, 5-class classification and 4-class classification. The accuracy of these models is almost the same with fewer features.

Optimal Feature Subset Selection for Imbalanced Class Data using SMOTE and Binary ALO Algorithm

International Journal of Engineering and Advanced Technology - Regular Issue ◽

10.35940/ijeat.c4734.029320 ◽

2020 ◽

Vol 9 (3) ◽

pp. 344-349

Keyword(s):

Feature Selection ◽

Class Imbalance ◽

Classification Performance ◽

Selection Model ◽

Feature Subset Selection ◽

Feature Subset ◽

Spatial Features ◽

Imbalanced Classes ◽

Optimal Feature Subset ◽

Optimal Feature

Feature selection in multispectral high dimensional data is a hard machine learning problem because of the imbalanced classes present in the data. Most of the existing feature selection schemes in the literature ignore the problem of class imbalance by choosing features from the classes with more instances and missing significant features of the classes with fewer instances. In this paper, the SMOTE concept is exploited to produce the required samples from minority classes. A feature selection model is formulated with the objective of reducing the number of features while improving classification performance. This model is based on dimensionality reduction by opting for a subset of relevant spectral, textural and spatial features while eliminating the redundant features. Binary ALO is used to solve the feature selection model for optimal selection of features. The proposed ALO-SVM with the wrapper concept is applied to each potential solution obtained during the optimization step. The methodology is tested on a LANDSAT multispectral image.
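The SMOTE step can be sketched as interpolation between a minority sample and one of its nearest minority neighbours. This is a minimal illustration of the general SMOTE idea, not the paper's pipeline; the function name `smote`, the neighbour count `k`, the seed, and the toy points are assumptions.

```python
import random


def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by squared Euclidean distance.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(p, base)),
        )[:k]
        mate = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> mate
        synthetic.append([a + gap * (b - a) for a, b in zip(base, mate)])
    return synthetic


minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
new_points = smote(minority, n_new=6)
print(len(new_points))
```

Because every synthetic point lies on a segment between two existing minority samples, the new samples stay inside the region the minority class already occupies, which is why oversampling of this kind is usually applied to the training split only.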

Accelerated Simulated Annealing and Mutation Operator Feature Selection method for Big Data

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.b1712.078219 ◽

2019 ◽

Vol 8 (2) ◽

pp. 910-916

Keyword(s):

Feature Selection ◽

Simulated Annealing ◽

Feature Selection Method ◽

Classification Problem ◽

Feature Subset Selection ◽

Feature Subset ◽

Mutation Operator ◽

Knn Classifier ◽

Optimal Feature Subset ◽

Optimal Feature

Optimal feature subset selection over very high dimensional data is a vital issue. Even once the optimal features are selected, classification based on those features remains a complicated task. To handle these problems, a novel Accelerated Simulated Annealing and Mutation Operator (ASAMO) feature selection algorithm is suggested in this work. For solving the classification problem, the Fuzzy Minimal Consistent Class Subset Coverage (FMCCSC) problem is introduced. In FMCCSC, the consistent subset is combined with the K-Nearest Neighbour (KNN) classifier, giving the FMCCSC-KNN classifier. Two data sets from the UCI Machine Learning Repository, Dorothea and Madelon, are used in experiments on optimal feature selection and classification. The experimental results substantiate the efficiency of the proposed ASAMO with the FMCCSC-KNN classifier compared to Particle Swarm Optimization (PSO) and Accelerated PSO feature selection algorithms.
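For orientation, the sketch below shows a generic simulated annealing search over feature masks with a single-bit flip as the mutation operator. It is not the ASAMO algorithm, whose acceleration scheme is not described in the abstract; `anneal_features`, `toy_score`, the cooling schedule, and the scoring function are hypothetical stand-ins (a real wrapper would score a mask by classifier accuracy).

```python
import math
import random


def anneal_features(n_features, score, steps=200, t0=1.0, cooling=0.97, seed=0):
    """Search boolean feature masks; return the best mask and its score."""
    rng = random.Random(seed)
    current = [rng.random() < 0.5 for _ in range(n_features)]
    current_score = score(current)
    best, best_score = current[:], current_score
    temp = t0
    for _ in range(steps):
        # Mutation operator: flip one randomly chosen feature bit.
        candidate = current[:]
        i = rng.randrange(n_features)
        candidate[i] = not candidate[i]
        delta = score(candidate) - current_score
        # Always accept improvements; accept worse moves with
        # probability exp(delta / temp), which shrinks as temp cools.
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            current, current_score = candidate, current_score + delta
            if current_score > best_score:
                best, best_score = current[:], current_score
        temp *= cooling
    return best, best_score


def toy_score(mask):
    """Hypothetical objective: features 0 and 2 help, every feature costs."""
    return mask[0] + mask[2] - 0.25 * sum(mask)


best_mask, best_value = anneal_features(6, toy_score)
print(best_mask, best_value)
```

The per-feature cost in the toy objective plays the role of a subset-size penalty: without it, the search has no reason to drop uninformative features.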

Binary Genetic Swarm Optimization: A Combination of GA and PSO for Feature Selection

Journal of Intelligent Systems ◽

10.1515/jisys-2019-0062 ◽

2019 ◽

Vol 29 (1) ◽

pp. 1598-1610 ◽

Cited By ~ 5

Author(s):

Manosij Ghosh ◽

Ritam Guha ◽

Imran Alam ◽

Priyank Lohariwal ◽

Devesh Jalan ◽

...

Keyword(s):

Feature Selection ◽

Combination Method ◽

Sufficient Information ◽

Feature Subset ◽

Final Solution ◽

Swarm Optimization ◽

Intermediate Solution ◽

Optimal Feature Subset ◽

Optimal Feature ◽

Weighted Combination

Feature selection (FS) is a technique which helps to find the optimal feature subset for developing an efficient pattern recognition model. Genetic algorithms (GA) and particle swarm optimization (PSO) are widely used in the field of FS. In this paper, we propose a way to perform FS by combining information from the candidate solutions produced by GA and PSO. Our aim is to combine the exploitation ability of GA with the exploration capacity of PSO. We name this new model binary genetic swarm optimization (BGSO). The proposed method initially lets GA and PSO run independently. To extract sufficient information from the feature subsets they produce, BGSO combines their results with an algorithm called the average weighted combination method to produce an intermediate solution. Thereafter, a local search called sequential one-point flipping is applied to refine the intermediate solution and generate the final solution. BGSO is applied to 20 popular UCI datasets. The results were obtained with two classifiers, namely, k nearest neighbors (KNN) and multi-layer perceptron (MLP). The overall results and comparisons show that the proposed method outperforms the constituent algorithms on 16 and 14 datasets using KNN and MLP, respectively, whereas among the constituent algorithms, GA achieves the best classification accuracy on 2 and 7 datasets and PSO achieves the best accuracy on 2 and 4 datasets, respectively, for the same classifiers. This supports the applicability and usefulness of the method in the domain of FS.

Modular Predictor for Day-Ahead Load Forecasting and Feature Selection for Different Hours

Energies ◽

10.3390/en11071899 ◽

2018 ◽

Vol 11 (7) ◽

pp. 1899 ◽

Cited By ~ 3

Author(s):

Lin Lin ◽

Lin Xue ◽

Zhiqiang Hu ◽

Nantian Huang

Keyword(s):

Feature Selection ◽

New England ◽

Load Forecasting ◽

Feature Subset ◽

Single Model ◽

Modular Model ◽

Selection Step ◽

Selection For ◽

Optimal Feature Subset ◽

Optimal Feature

To improve the accuracy of the day-ahead load forecasting predictions of a single model, a novel modular parallel forecasting model with feature selection was proposed. First, load features were extracted from a historic load with a horizon from the previous 24 h to the previous 168 h considering the calendar feature. Second, a feature selection combined with a predictor process was carried out to select the optimal feature for building a reliable predictor with respect to each hour. The final modular model consisted of 24 predictors with a respective optimal feature subset for day-ahead load forecasting. New England and Singapore load data were used to evaluate the effectiveness of the proposed method. The results indicated that the accuracy of the proposed modular model was higher than that of the traditional method. Furthermore, conducting a feature selection step when building a predictor improved the accuracy of load forecasting.
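The lag-feature extraction and per-hour split described in this abstract can be sketched as follows. This is an illustration under stated assumptions: the series is hourly, `load[0]` falls on hour 0 of a day, calendar features are omitted, and the function name `hourly_lag_datasets` is hypothetical.

```python
def hourly_lag_datasets(load, min_lag=24, max_lag=168):
    """Build one (rows, targets) dataset per hour of day.

    Each row holds the load values from min_lag to max_lag hours before
    the target hour, so a 24..168 h horizon gives 145 candidate features.
    Assumes load[0] corresponds to hour 0 of a day.
    """
    datasets = {hour: ([], []) for hour in range(24)}
    for t in range(max_lag, len(load)):
        rows, targets = datasets[t % 24]
        rows.append([load[t - lag] for lag in range(min_lag, max_lag + 1)])
        targets.append(load[t])
    return datasets


# Nine days of synthetic hourly load, used only to show the shapes.
load = [float(i) for i in range(24 * 9)]
datasets = hourly_lag_datasets(load)
rows_h0, targets_h0 = datasets[0]
print(len(rows_h0), len(rows_h0[0]))
```

Each of the 24 datasets can then be passed through its own feature selection step and predictor, which mirrors the modular structure of one model per hour.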

An Optimal Feature Subset Selection Method Based on Distance Discriminant and Distribution Overlapping

International Journal of Pattern Recognition and Artificial Intelligence ◽

10.1142/s0218001409007715 ◽

2009 ◽

Vol 23 (08) ◽

pp. 1577-1597 ◽

Cited By ~ 5

Author(s):

Jianning Liang ◽

Su Yang ◽

Yuanyuan Wang

Keyword(s):

Feature Selection ◽

Computational Cost ◽

Difficult Problem ◽

Exhaustive Search ◽

Feature Subset Selection ◽

Search Problem ◽

Feature Subset ◽

Ranking Problem ◽

Optimal Feature Subset ◽

Optimal Feature

The goal of feature selection is to search for the optimal feature subset with respect to an evaluation function. Exhaustively searching all possible feature subsets incurs a high computational cost. The alternative suboptimal methods are more efficient and practical, but they cannot guarantee globally optimal results. We propose a new feature selection algorithm based on distance discriminant and distribution overlapping (HFSDD) for continuous features, which overcomes the drawbacks of both the exhaustive search approaches and the suboptimal methods. The proposed method is able to find the optimal feature subset without exhaustive search or the Branch and Bound algorithm. The most difficult problem for optimal feature selection, the search problem, is converted into a feature ranking problem following a rigorous theoretical proof, so that the computational complexity is greatly reduced. Since the distribution of overlapping degrees between every two classes can provide useful information for feature selection, HFSDD also takes it into account by using a new approach to estimate the overlapping degrees. In this sense, HFSDD is a distance discriminant and distribution overlapping based solution. HFSDD was compared with ReliefF and mrmrMID on ten data sets. The experimental results show that HFSDD outperforms the other methods.
