Feature Selection for High-Dimensional and Imbalanced Biomedical Data Based on Robust Correlation Based Redundancy and Binary Grasshopper Optimization Algorithm

Genes ◽  
2020 ◽  
Vol 11 (7) ◽  
pp. 717
Author(s):  
Garba Abdulrauf Sharifai ◽  
Zurinahni Zainol

Training a machine learning algorithm on an imbalanced data set is an inherently challenging task. It becomes even more demanding when there are few samples but a massive number of features (high dimensionality). High-dimensional and imbalanced data sets pose severe challenges in many real-world applications, such as biomedical data analysis. Numerous researchers have investigated either class imbalance or high dimensionality and proposed various methods. Nonetheless, few approaches in the literature address the intersection of the two problems, owing to their complicated interactions. Lately, feature selection has become a well-known technique for overcoming this problem by selecting discriminative features that represent both the minority and the majority classes. This paper proposes a new method called Robust Correlation Based Redundancy and Binary Grasshopper Optimization Algorithm (rCBR-BGOA). rCBR-BGOA employs an ensemble of multiple filters coupled with the Correlation-Based Redundancy method to select optimal feature subsets. A binary Grasshopper optimisation algorithm (BGOA) is used to formulate feature selection as an optimisation problem and to select the best (near-optimal) combination of features from the majority and minority classes. The results, supported by appropriate statistical analysis, indicate that rCBR-BGOA can improve classification performance on high-dimensional and imbalanced datasets in terms of the G-mean and the Area Under the Curve (AUC).
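The binary grasshopper search described above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' rCBR-BGOA implementation. It follows the standard GOA position update: a social force s(r) = f * exp(-r/l) - exp(-r) with f = 0.5 and l = 1.5, a comfort-zone coefficient c that shrinks linearly over the iterations, and movement relative to the best solution found so far (the target). Positions are binarized with a sigmoid transfer function, which is only one of several binarization schemes used in the literature. The function name `bgoa_select`, the bounds `lb`/`ub`, and all default parameter values are hypothetical, and the `fitness` callable stands in for whatever subset-evaluation criterion (for example, a G-mean-based score) is actually used.

```python
import numpy as np

def s_func(r, f=0.5, l=1.5):
    """GOA social force: attraction (f*exp(-r/l)) minus repulsion (exp(-r))."""
    return f * np.exp(-r / l) - np.exp(-r)

def bgoa_select(fitness, n_features, n_agents=20, max_iter=30,
                lb=-4.0, ub=4.0, c_max=1.0, c_min=1e-4, seed=0):
    """Hypothetical sketch: minimise fitness(mask) over binary feature masks.

    `fitness` takes a boolean array of length n_features and returns a number
    (lower is better). Positions stay continuous; a sigmoid transfer function
    (an assumed binarization scheme) turns each coordinate into the
    probability that the corresponding feature is selected.
    """
    rng = np.random.default_rng(seed)
    pos = rng.uniform(lb, ub, size=(n_agents, n_features))

    def binarize(x):
        # S-shaped transfer: a larger position makes selection more likely.
        mask = rng.random(x.shape) < 1.0 / (1.0 + np.exp(-x))
        if not mask.any():  # never evaluate an empty feature subset
            mask[rng.integers(x.size)] = True
        return mask

    masks = np.array([binarize(p) for p in pos])
    scores = np.array([fitness(m) for m in masks])
    best = int(np.argmin(scores))
    target_pos = pos[best].copy()
    target_mask, target_score = masks[best].copy(), scores[best]

    for t in range(max_iter):
        # Comfort-zone coefficient decreases linearly from c_max toward c_min.
        c = c_max - t * (c_max - c_min) / max_iter
        new_pos = np.empty_like(pos)
        for i in range(n_agents):
            social = np.zeros(n_features)
            for j in range(n_agents):
                if i == j:
                    continue
                diff = pos[j] - pos[i]
                dist = np.linalg.norm(diff)
                unit = diff / (dist + 1e-12)
                social += c * (ub - lb) / 2.0 * s_func(dist) * unit
            # Move relative to the target and keep the position within bounds.
            new_pos[i] = np.clip(c * social + target_pos, lb, ub)
        pos = new_pos
        for i in range(n_agents):
            m = binarize(pos[i])
            score = fitness(m)
            if score < target_score:  # keep the best subset seen so far
                target_pos, target_mask, target_score = pos[i].copy(), m, score
    return target_mask, target_score
```

With a toy fitness that counts mismatches against a known mask, the function returns a boolean mask of the requested length together with its fitness value. In a real pipeline the fitness would instead train and score a classifier on the selected columns.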

2021 ◽  
Vol 11 (2) ◽  
pp. 817-835
Author(s):  
Dr.M. Praveena ◽  
Dr.V. Jaiganesh

Background: High-dimensional datasets suffer from the curse of dimensionality, which makes data mining more difficult. Feature selection within the knowledge discovery process mitigates this problem, reducing the time complexity of classification and improving its accuracy. Objectives: This paper aims to identify the bio-inspired algorithm best suited to feature selection and to use the resulting optimized feature selection to design machine learning classifiers that work across multiple datasets, including high-dimensional ones. It also analyses performance in terms of classification accuracy and classification processing time. Methods: This study employs an improved grasshopper optimization algorithm to perform feature selection and an Evolutionary Outlay Aware Deep Belief Network to perform classification. Findings: In this research, 20 UCI benchmark data sets with 60 features and 30,000 instances are used: Mammography, Monks-1, Bupa, Credit, Parkinson's, Monk-2, Sonar, Ecoli, Prognostic, Ionosphere, Monk-3, Yeast, Car, Blood, Pima, Spect, Vert, Prognostic, Contraceptive, and Tic-Tac-Toe endgame. Table 1 gives the details of each dataset, including its number of instances and features. All experiments are run in MATLAB 6.0 on Microsoft Windows 8, on a machine with a Core i3 processor, a 1 TB hard disk and 8 GB of RAM. Performance is evaluated in terms of classification accuracy and classification processing time. Novelty: The Improved Grasshopper Optimization Algorithm uses the error rate and classification accuracy of the Evolutionary Outlay Aware Deep Belief Network Classifier as its fitness function values. The combined feature selection and classification approach is abbreviated as IGOA-EOA-DBNC.
Twenty datasets are used to test performance in terms of elapsed time and accuracy, and the combined approach gives better results.
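The Novelty paragraph describes a wrapper design in which the optimizer scores each candidate feature subset by the error rate of the downstream classifier. A minimal sketch of such a fitness function follows, under stated assumptions: a 1-nearest-neighbour classifier on a fixed train/test split stands in for the EOA-DBN classifier, which is not reproduced here, and the small subset-size penalty weighted by `alpha` is a common convention in wrapper feature selection rather than something the abstract specifies. The names `knn1_error` and `wrapper_fitness` are hypothetical.

```python
import numpy as np

def knn1_error(X_train, y_train, X_test, y_test):
    """Error rate (1 - accuracy) of a 1-nearest-neighbour classifier."""
    # Pairwise squared Euclidean distances: one row per test sample.
    d2 = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    pred = y_train[np.argmin(d2, axis=1)]
    return float(np.mean(pred != y_test))

def wrapper_fitness(mask, X_train, y_train, X_test, y_test, alpha=0.99):
    """Hypothetical wrapper fitness (lower is better).

    Weighted classification error on the selected columns plus a small
    penalty proportional to the fraction of features kept.
    """
    mask = np.asarray(mask, dtype=bool)
    if not mask.any():
        return 1.0  # an empty subset cannot classify anything
    err = knn1_error(X_train[:, mask], y_train, X_test[:, mask], y_test)
    return alpha * err + (1 - alpha) * mask.sum() / mask.size
```

Because lower values are better, this function can be passed directly to a minimising search such as a grasshopper-style optimizer. Setting `alpha=1.0` reduces it to the plain error rate, that is, 1 minus the classification accuracy.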


2018 ◽  
Vol 10 (3) ◽  
pp. 478-495 ◽  
Author(s):  
Ibrahim Aljarah ◽  
Ala’ M. Al-Zoubi ◽  
Hossam Faris ◽  
Mohammad A. Hassonah ◽  
Seyedali Mirjalili ◽  
...  

2020 ◽  
Vol 127 ◽  
pp. 33-53 ◽  
Author(s):  
Dong Wang ◽  
Hongmei Chen ◽  
Tianrui Li ◽  
Jihong Wan ◽  
Yanyong Huang

Author(s):  
Reyhaneh Yaghobzadeh ◽  
Seyyed Reza Kamel ◽  
Mojtaba Asgari ◽  
Hassan Saadatmand ◽  
