An efficient binary chimp optimization algorithm for feature selection in biomedical data classification

Author(s):  
Elnaz Pashaei ◽  
Elham Pashaei
Genes ◽  
2020 ◽  
Vol 11 (7) ◽  
pp. 717
Author(s):  
Garba Abdulrauf Sharifai ◽  
Zurinahni Zainol

Training a machine learning algorithm on an imbalanced data set is an inherently challenging task. It becomes more demanding when there are few samples but a massive number of features (high dimensionality). High dimensional and imbalanced data sets pose severe challenges in many real-world applications, such as biomedical data sets. Numerous researchers have investigated either class imbalance or high dimensionality and proposed various methods. Nonetheless, few approaches reported in the literature address the intersection of the two problems, owing to their complicated interactions. Lately, feature selection has become a well-known technique for overcoming this problem by selecting discriminative features that represent both the minority and majority classes. This paper proposes a new method called Robust Correlation Based Redundancy and Binary Grasshopper Optimization Algorithm (rCBR-BGOA). rCBR-BGOA employs an ensemble of multi-filters coupled with the Correlation-Based Redundancy method to select optimal feature subsets. A binary Grasshopper Optimization Algorithm (BGOA) is used to formulate feature selection as an optimisation problem and to select the best (near-optimal) combination of features from the majority and minority classes. The obtained results, supported by statistical analysis, indicate that rCBR-BGOA can improve classification performance on high dimensional and imbalanced datasets in terms of the G-mean and Area Under the Curve (AUC) performance metrics.
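The G-mean named above is the geometric mean of sensitivity and specificity, which is why it suits imbalanced data: a classifier that ignores the minority class scores zero even when its plain accuracy is high. The following is a minimal sketch in Python; the function name `g_mean` and the toy labels are illustrative assumptions, not taken from the paper.

```python
import math

def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of sensitivity and specificity for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    # Guard against classes that are absent from y_true.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)

# Hypothetical toy labels: 2 minority (1) and 8 majority (0) samples.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
always_majority = [0] * 10                   # 80% accurate, finds no minority sample
one_hit = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]     # finds one of the two minority samples

print(g_mean(y_true, always_majority))  # 0.0
print(g_mean(y_true, one_hit))          # sqrt(0.5 * 1.0), about 0.707
```

The always-majority predictor reaches 80% accuracy on these labels yet has a G-mean of zero, which illustrates why the metric is preferred over accuracy in this setting.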


Author(s):  
Seyed Jalaleddin Mousavirad ◽  
Hossein Ebrahimpour-Komleh

Classification of biomedical data plays a significant role in the prediction and diagnosis of disease. The existence of redundant and irrelevant features is one of the major problems in biomedical data classification. Excluding these features can improve the performance of the classification algorithm. Feature selection is the problem of selecting a subset of features without reducing the accuracy obtained with the original set of features. Feature selection algorithms are divided into three categories: wrapper, filter, and embedded methods. Wrapper methods use the learning algorithm itself to select features, while filter methods use statistical characteristics of the data. In embedded methods, the feature selection process is combined with the learning process. Population-based metaheuristics can be applied to wrapper feature selection. In these algorithms, a population of candidate solutions is created and then iteratively improved with respect to the objective function by means of search operators. This chapter presents the application of population-based feature selection to deal with the issue of high dimensionality in biomedical data classification. The results show that population-based feature selection achieves acceptable performance in biomedical data classification.
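The population-based wrapper idea described above can be sketched as follows. This is a minimal illustration, not the chapter's method. The nearest-centroid classifier, the bit-flip mutation operator, the per-feature penalty `alpha`, and the toy data are all assumptions chosen to keep the example short. Each candidate solution is a binary mask over the features, and its objective value comes from a learning algorithm, which is what makes the approach a wrapper.

```python
import random

def centroid_accuracy(X, y, mask):
    """Training accuracy of a nearest-centroid classifier on the masked features."""
    cols = [j for j, keep in enumerate(mask) if keep]
    if not cols:
        return 0.0
    centroids = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(r[j] for r in rows) / len(rows) for j in cols]

    def sq_dist(x, label):
        return sum((x[j] - m) ** 2 for j, m in zip(cols, centroids[label]))

    correct = sum(1 for x, t in zip(X, y)
                  if min(centroids, key=lambda c: sq_dist(x, c)) == t)
    return correct / len(y)

def fitness(X, y, mask, alpha=0.01):
    """Wrapper objective: accuracy minus a small penalty per selected feature."""
    return centroid_accuracy(X, y, mask) - alpha * sum(mask)

def population_search(X, y, pop_size=8, generations=20, seed=0):
    rng = random.Random(seed)
    n = len(X[0])
    # Start from the full feature set plus random binary masks.
    population = [[1] * n] + [[rng.randint(0, 1) for _ in range(n)]
                              for _ in range(pop_size - 1)]
    best = max(population, key=lambda m: fitness(X, y, m))
    for _ in range(generations):
        new_population = [best[:]]  # elitism: the best mask always survives
        while len(new_population) < pop_size:
            parent = rng.choice(population)
            # Bit-flip mutation with rate 1/n per feature.
            child = [1 - bit if rng.random() < 1.0 / n else bit for bit in parent]
            new_population.append(child)
        population = new_population
        best = max(population, key=lambda m: fitness(X, y, m))
    return best

# Hypothetical toy data: feature 0 separates the classes, features 1 and 2 are noise.
X = [[0.1, 5.0, 1.0], [0.2, 1.0, 9.0], [0.0, 7.0, 3.0],
     [0.9, 6.0, 2.0], [1.0, 2.0, 8.0], [0.8, 4.0, 4.0]]
y = [0, 0, 0, 1, 1, 1]

best = population_search(X, y)
print(best, fitness(X, y, best))
```

Because the full feature set is in the initial population and the best mask is carried over each generation, the returned mask never scores worse than using all features under this objective.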


2016 ◽  
Vol 72 (8) ◽  
pp. 3210-3221 ◽  
Author(s):  
Kuan-Cheng Lin ◽  
Kai-Yuan Zhang ◽  
Yi-Hung Huang ◽  
Jason C. Hung ◽  
Neil Yen

Mathematics ◽  
2020 ◽  
Vol 8 (11) ◽  
pp. 2008
Author(s):  
Mustufa Haider Abidi ◽  
Usama Umer ◽  
Muneer Khan Mohammed ◽  
Mohamed K. Aboudaif ◽  
Hisham Alkhalefah

Data classification has been considered extensively in different fields, such as machine learning, artificial intelligence, pattern recognition, and data mining, and the expansion of classification has yielded immense achievements. The automatic classification of maintenance data has been investigated over the past few decades owing to its usefulness in construction and facility management. To utilize automated data classification in the maintenance field, a data classification model is implemented in this study based on the analysis of different mechanical maintenance data. The developed model involves four main steps: (a) data acquisition, (b) feature extraction, (c) feature selection, and (d) classification. During data acquisition, four types of dataset are collected from the benchmark Google datasets. The attributes of each dataset are further processed for classification. Principal component analysis and first-order and second-order statistical features are computed during the feature extraction process. To reduce the dimensionality of the features and thereby reduce classification error, feature selection is performed. The hybridization of two algorithms, the Whale Optimization Algorithm (WOA) and Spotted Hyena Optimization (SHO), yields a new algorithm, the Spotted Hyena-based Whale Optimization Algorithm (SH-WOA), which is adopted for feature selection. The selected features are passed to a deep learning algorithm, a Recurrent Neural Network (RNN). To enhance the efficiency of the conventional RNN, the number of hidden neurons in the RNN is optimized using the developed SH-WOA. Finally, the efficacy of the proposed model is verified on the entire dataset. Experimental results show that the developed model can effectively handle uncertain data classification while minimizing execution time and enhancing efficiency.
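The abstract does not list which first-order statistical features are computed. As an illustration only, the sketch below assumes the four commonly grouped under that name (mean, variance, skewness, and kurtosis), computed from population moments; the function name `first_order_features` and the sample values are hypothetical.

```python
import math

def first_order_features(values):
    """Mean, variance, skewness and kurtosis from population (1/n) moments."""
    n = len(values)
    mean = sum(values) / n

    def central_moment(k):
        return sum((v - mean) ** k for v in values) / n

    variance = central_moment(2)
    std = math.sqrt(variance)
    # Constant inputs have zero spread; report 0.0 instead of dividing by zero.
    skewness = central_moment(3) / std ** 3 if std else 0.0
    kurtosis = central_moment(4) / variance ** 2 if variance else 0.0
    return {"mean": mean, "variance": variance,
            "skewness": skewness, "kurtosis": kurtosis}

# Hypothetical attribute values from a single record.
features = first_order_features([1.0, 2.0, 3.0, 4.0, 5.0])
print(features)
```

For this symmetric input the skewness is zero, and the kurtosis (not the excess kurtosis) is 6.8 / 4 = 1.7.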


Author(s):  
Noria Bidi ◽  
Zakaria Elberrichi

Feature selection is essential to improving classification effectiveness. This paper presents a new adaptive algorithm called FS-PeSOA (feature selection penguins search optimization algorithm), a meta-heuristic feature selection method based on the “Penguins Search Optimization Algorithm” (PeSOA). It is combined with different classifiers to find the best feature subset, that is, the one that achieves the highest classification accuracy. To explore the candidate feature subsets, the bio-inspired PeSOA approach generates a trial feature subset during the search and estimates its fitness value using three classifiers in each case: Naive Bayes (NB), k-Nearest Neighbors (KNN) and Support Vector Machines (SVMs). The proposed approach was evaluated on six well-known benchmark datasets (Wisconsin Breast Cancer, Pima Diabetes, Mammographic Mass, Dermatology, Colon Tumor and Prostate Cancer). Experimental results show that FS-PeSOA achieves the highest classification accuracy and performs strongly across the different datasets.
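Estimating the fitness of a trial feature subset with a classifier can be sketched as below. This is an assumption-laden illustration rather than the paper's procedure: it uses only a k-nearest-neighbours classifier (the paper also uses NB and SVMs), leave-one-out evaluation is an assumed protocol, and the function name `knn_loo_accuracy` and the toy data are hypothetical.

```python
from collections import Counter

def knn_loo_accuracy(X, y, mask, k=3):
    """Leave-one-out accuracy of k-NN restricted to the features selected by mask."""
    cols = [j for j, keep in enumerate(mask) if keep]
    if not cols:
        return 0.0  # an empty subset cannot classify anything
    correct = 0
    for i, (xi, yi) in enumerate(zip(X, y)):
        # Squared Euclidean distance to every other sample on the selected columns.
        neighbours = sorted(
            (sum((xi[j] - xo[j]) ** 2 for j in cols), yo)
            for o, (xo, yo) in enumerate(zip(X, y)) if o != i
        )
        votes = Counter(label for _, label in neighbours[:k])
        if votes.most_common(1)[0][0] == yi:
            correct += 1
    return correct / len(y)

# Hypothetical toy data: feature 0 is informative, features 1 and 2 are noise.
X = [[0.1, 5.0, 1.0], [0.2, 1.0, 9.0], [0.0, 7.0, 3.0],
     [0.9, 6.0, 2.0], [1.0, 2.0, 8.0], [0.8, 4.0, 4.0]]
y = [0, 0, 0, 1, 1, 1]

informative = knn_loo_accuracy(X, y, [1, 0, 0])
noisy = knn_loo_accuracy(X, y, [0, 1, 0])
print(informative, noisy)
```

A search procedure would call such a function once per trial subset and keep the subsets that score highest. On this toy data the informative feature alone scores higher than the noise feature alone.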

