Using Penguins Search Optimization Algorithm for Best Features Selection for Biomedical Data Classification

Author(s):  
Noria Bidi ◽  
Zakaria Elberrichi

Feature selection is essential for improving classification effectiveness. This paper presents a new adaptive algorithm called FS-PeSOA (Feature Selection Penguins Search Optimization Algorithm), a meta-heuristic feature selection method based on the Penguins Search Optimization Algorithm (PeSOA), combined with different classifiers to find the best feature subset, i.e., the one that achieves the highest classification accuracy. To explore the candidate feature subsets, the bio-inspired PeSOA generates trial feature subsets during the search and estimates their fitness values using three classifiers: Naive Bayes (NB), K-Nearest Neighbors (KNN), and Support Vector Machines (SVMs). The proposed approach has been evaluated on six well-known benchmark datasets (Wisconsin Breast Cancer, Pima Diabetes, Mammographic Mass, Dermatology, Colon Tumor, and Prostate Cancer). Experimental results show that FS-PeSOA achieves the highest classification accuracy across the different datasets.
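The core of any wrapper-based metaheuristic like the one described is a fitness function that scores a candidate binary feature mask by the cross-validated accuracy of a classifier trained on the selected columns. The following is a minimal sketch of that idea, not the authors' FS-PeSOA; the dataset and the KNN classifier are illustrative choices.

```python
# Sketch of a wrapper-style fitness function for metaheuristic feature
# selection: score a candidate 0/1 mask by the cross-validated accuracy
# of a classifier trained only on the selected columns.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

def subset_fitness(mask, X, y, clf=None):
    """Return mean 5-fold accuracy of `clf` on the features where mask == 1."""
    if clf is None:
        clf = KNeighborsClassifier(n_neighbors=5)
    cols = np.flatnonzero(mask)
    if cols.size == 0:          # empty subsets are invalid candidates
        return 0.0
    return cross_val_score(clf, X[:, cols], y, cv=5).mean()

X, y = load_breast_cancer(return_X_y=True)
rng = np.random.default_rng(0)
mask = rng.integers(0, 2, size=X.shape[1])   # one random trial subset
print(round(subset_fitness(mask, X, y), 3))
```

A metaheuristic such as PeSOA would call this fitness for each trial subset and steer the population toward masks with higher scores.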

2020 ◽  
pp. 407-421
Author(s):  
Noria Bidi ◽  
Zakaria Elberrichi

This article presents a new adaptive algorithm called FS-SLOA (Feature Selection-Seven Spot Ladybird Optimization Algorithm), a meta-heuristic feature selection method based on the foraging behavior of the seven-spot ladybird. The technique is applied to find the best feature subset, i.e., the one that achieves the highest classification accuracy, using three classifiers: Naive Bayes (NB), K-Nearest Neighbors (KNN), and Support Vector Machine (SVM). The authors' approach has been evaluated on four well-known benchmark datasets (Wisconsin Breast Cancer, Pima Diabetes, Mammographic Mass, and Dermatology) taken from the UCI machine learning repository. Experimental results show that FS-SLOA achieves the best classification accuracy across the different datasets.


2018 ◽  
Vol 9 (3) ◽  
pp. 75-87 ◽  
Author(s):  
Noria Bidi ◽  
Zakaria Elberrichi

This article presents a new adaptive algorithm called FS-SLOA (Feature Selection-Seven Spot Ladybird Optimization Algorithm), a meta-heuristic feature selection method based on the foraging behavior of the seven-spot ladybird. The technique is applied to find the best feature subset, i.e., the one that achieves the highest classification accuracy, using three classifiers: Naive Bayes (NB), K-Nearest Neighbors (KNN), and Support Vector Machine (SVM). The authors' approach has been evaluated on four well-known benchmark datasets (Wisconsin Breast Cancer, Pima Diabetes, Mammographic Mass, and Dermatology) taken from the UCI machine learning repository. Experimental results show that FS-SLOA achieves the best classification accuracy across the different datasets.


2021 ◽  
Vol 40 (1) ◽  
pp. 535-550
Author(s):  
Ashis Kumar Mandal ◽  
Rikta Sen ◽  
Basabi Chakraborty

The fundamental aim of feature selection is to reduce the dimensionality of data by removing irrelevant and redundant features. Because finding the best subset among all possible subsets is computationally expensive, especially for high-dimensional datasets, meta-heuristic algorithms are often a promising way to address the task. In this paper, a variant of the recent meta-heuristic Owl Search Optimization Algorithm (OSA) is proposed for solving the feature selection problem within a wrapper-based framework. Several strategies are incorporated to strengthen BOSA (the binary version of OSA) in its search for the global best solution. The meta-parameter of BOSA is initialized dynamically and then adjusted with a self-adaptive mechanism during the search. In addition, elitism and mutation operations are combined with BOSA to better balance exploitation and exploration. The improved BOSA is named the Modified Binary Owl Search Algorithm (MBOSA). A Decision Tree (DT) classifier provides the wrapper-based fitness function, and the final classification performance of the selected feature subset is evaluated with a Support Vector Machine (SVM) classifier. Simulation experiments are conducted on twenty well-known UCI benchmark datasets, with results reported in terms of classification accuracy, number of selected features, and execution time. BOSA and three common meta-heuristics, the Binary Bat Algorithm (BBA), Binary Particle Swarm Optimization (BPSO), and the Binary Genetic Algorithm (BGA), are used for comparison. Simulation results show that the proposed approach outperforms similar methods, reducing the number of features significantly while maintaining a comparable level of classification accuracy.
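The three ingredients the abstract names for turning a continuous metaheuristic into a binary one can be sketched generically: a transfer function mapping real-valued positions to bit probabilities, bit-wise mutation for exploration, and elitism to protect the best-so-far solution. This is an illustrative sketch of those standard components, not the authors' MBOSA.

```python
# Generic building blocks for binarizing a continuous metaheuristic:
# sigmoid transfer (position -> bit probabilities), bit-flip mutation,
# and elitist replacement of the worst population member.
import numpy as np

rng = np.random.default_rng(42)

def sigmoid_transfer(position):
    """Map continuous positions to a 0/1 feature mask probabilistically."""
    prob = 1.0 / (1.0 + np.exp(-position))
    return (rng.random(position.shape) < prob).astype(int)

def mutate(mask, rate=0.05):
    """Flip each bit independently with probability `rate` (exploration)."""
    flips = rng.random(mask.shape) < rate
    return np.where(flips, 1 - mask, mask)

def elitist_replace(population, fitness, elite, elite_fit):
    """Elitism: overwrite the worst member with the best-so-far solution."""
    worst = int(np.argmin(fitness))
    if elite_fit > fitness[worst]:
        population[worst] = elite
        fitness[worst] = elite_fit
    return population, fitness
```

In a full algorithm these steps run inside the main search loop, after each continuous position update.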


PLoS ONE ◽  
2021 ◽  
Vol 16 (8) ◽  
pp. e0255307
Author(s):  
Fujun Wang ◽  
Xing Wang

Feature selection is an important task in big data analysis and information retrieval. It reduces the number of features by removing noisy and extraneous data. In this paper, a feature subset selection algorithm based on damping oscillation theory and a support vector machine classifier is proposed, called the Maximum Kendall coefficient Maximum Euclidean Distance Improved Grey Wolf Optimization algorithm (MKMDIGWO). In MKMDIGWO, first, a filter model based on the Kendall coefficient and Euclidean distance is proposed to measure the relevance and redundancy of candidate feature subsets. Second, the wrapper model is an improved grey wolf optimization algorithm whose position-update formula has been modified to achieve better results. Third, the filter and wrapper models are dynamically balanced by damping oscillation theory to find an optimal feature subset. MKMDIGWO thus combines the efficiency of the filter model with the high precision of the wrapper model. Experimental results on five UCI public datasets and two microarray datasets demonstrate higher classification accuracy for MKMDIGWO than for four other state-of-the-art algorithms; its maximum ACC value is at least 0.5% higher than the other algorithms on 10 datasets.
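A Kendall-coefficient filter criterion of the kind the abstract describes rewards features that correlate with the class label while penalizing features that correlate with each other. The sketch below is a hedged illustration of that relevance-minus-redundancy idea; the exact weighting and the Euclidean-distance term of MKMDIGWO are not reproduced.

```python
# Illustrative Kendall-tau filter score for a candidate feature subset:
# mean |tau(feature, label)| (relevance) minus mean pairwise
# |tau(feature, feature)| (redundancy).
import numpy as np
from itertools import combinations
from scipy.stats import kendalltau

def filter_score(X, y, subset):
    """Relevance (feature vs. label) minus redundancy (feature vs. feature)."""
    subset = list(subset)
    relevance = np.mean([abs(kendalltau(X[:, j], y)[0]) for j in subset])
    if len(subset) < 2:
        return relevance
    redundancy = np.mean([abs(kendalltau(X[:, a], X[:, b])[0])
                          for a, b in combinations(subset, 2)])
    return relevance - redundancy
```

A wrapper stage (here, the improved grey wolf optimizer) would then refine the subsets that score well under this cheap filter.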


2021 ◽  
Vol 12 (2) ◽  
pp. 1-15
Author(s):  
Khadoudja Ghanem ◽  
Abdesslem Layeb

The backtracking search optimization algorithm is a recent stochastic global search algorithm for solving real-valued numerical optimization problems. In this paper, a binary version of the backtracking algorithm is proposed to deal with 0-1 optimization problems such as feature selection and knapsack problems. Feature selection is the process of selecting a subset of relevant features for use in model construction; irrelevant features can negatively impact model performance. The knapsack problem, in turn, is a well-known optimization problem used to assess discrete algorithms. The objective of this research is to evaluate the discrete version of the backtracking algorithm on the two problems and compare the results with those of other binary optimization algorithms, using four common classifiers: logistic regression, decision tree, random forest, and support vector machine. An empirical study on biological microarray data and experiments on 0-1 knapsack problems show the effectiveness of the binary algorithm and its ability to reach good-quality solutions for both problems.
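When a binary metaheuristic is benchmarked on the 0-1 knapsack problem, each candidate bit-string must be mapped to a feasible solution; a common choice is a repair step that drops the least efficient items until the capacity constraint holds. This is a generic sketch of that evaluation, with a made-up three-item instance, not the paper's benchmark suite.

```python
# Knapsack fitness for binary metaheuristics: repair infeasible
# bit-strings by dropping items with the worst value/weight ratio,
# then return the total value of the repaired selection.
import numpy as np

def knapsack_fitness(bits, values, weights, capacity):
    """Total value of the repaired selection; never exceeds `capacity`."""
    bits = bits.copy()
    ratio = values / weights
    while bits @ weights > capacity:
        chosen = np.flatnonzero(bits)
        worst = chosen[np.argmin(ratio[chosen])]   # drop least efficient item
        bits[worst] = 0
    return bits @ values

values  = np.array([60, 100, 120])
weights = np.array([10, 20, 30])
# Repair drops item 2 (lowest ratio), leaving weight 30 <= 50, value 160.
print(knapsack_fitness(np.array([1, 1, 1]), values, weights, capacity=50))
```

Note that greedy repair can miss the true optimum (items 1 and 2 are worth 220 here); the metaheuristic's search, not the repair, is responsible for finding such solutions.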


2020 ◽  
Vol 10 (2) ◽  
pp. 370-379 ◽  
Author(s):  
Jie Cai ◽  
Lingjing Hu ◽  
Zhou Liu ◽  
Ke Zhou ◽  
Huailing Zhang

Background: Mild cognitive impairment (MCI) patients are a high-risk group for Alzheimer's disease (AD). Each year, 10–15% of diagnosed MCI patients convert to AD (MCI converters, MCI_C), while other MCI patients remain relatively stable and do not convert (MCI stable, MCI_S). MCI patients are considered the most suitable population for early intervention against dementia, and magnetic resonance imaging (MRI) is clinically the most recommended imaging examination. Therefore, using MRI image features to reliably predict the conversion from MCI to AD can help physicians plan effective treatment for patients in advance, so as to prevent or slow the development of dementia. Methods: We propose an embedded feature selection method based on the least-squares loss function and within-class scatter to select the optimal feature subset. The optimal feature subsets were used for binary classification (AD, MCI_C, MCI_S, and normal control (NC) in pairs) based on a support vector machine (SVM), and the optimal 3-class features were used for 3-class classification (AD, MCI_C, MCI_S, NC in triples) based on one-versus-one SVMs (OVOSVMs). To ensure the insensitivity of the results to the random train/test division, 10-fold cross-validation was repeated for each classification. Results: Using our method for feature selection, only 7 of the original 90 features were selected. Using this optimal subset in the SVM, we classified MCI_C from MCI_S with an accuracy, sensitivity, and specificity of 71.17%, 68.33%, and 73.97%, respectively. In comparison, in the 3-class classification (AD vs. MCI_C vs. MCI_S) with OVOSVMs, our method selected 24 features, and the classification accuracy was 81.9%. The feature selection results were consistent with the conclusions of clinical diagnosis. Our feature selection method achieved the best performance compared with existing methods using lasso and fused lasso for feature selection. Conclusion: These results demonstrate the potential of the proposed approach for predicting the conversion from MCI to AD by identifying the affected brain regions undergoing this conversion.
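The within-class scatter term that the method builds its selection criterion on has a standard definition: the sum, over classes, of the scatter of samples around their class mean. A minimal sketch of that matrix (not the authors' full least-squares objective):

```python
# Within-class scatter matrix S_w: for each class c, accumulate
# (x - mean_c)^T (x - mean_c) over the samples of that class.
# Small diagonal entries indicate features that are compact within classes.
import numpy as np

def within_class_scatter(X, y):
    """Return the d x d within-class scatter matrix S_w."""
    d = X.shape[1]
    Sw = np.zeros((d, d))
    for c in np.unique(y):
        Xc = X[y == c]
        diff = Xc - Xc.mean(axis=0)
        Sw += diff.T @ diff
    return Sw
```

An embedded selector can add a term like trace(W^T S_w W) to its loss so that selected features separate classes while staying compact within each class.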


Author(s):  
Muhang Zhang ◽  
Xiaohong Shen ◽  
Lei He ◽  
Haiyan Wang

Feature selection is an essential process in identification tasks because irrelevant and redundant features in the unselected feature set reduce both the performance and the efficiency of recognition. However, when identifying underwater targets from their radiated noise, the diversity of targets and the complexity of underwater acoustic channels introduce complex relationships among the extracted acoustic features. To address this problem, this paper employs the normalized maximum information coefficient (NMIC) to measure the correlation between features and categories and the redundancy among different features, and further proposes an NMIC-based feature selection method (NMIC-FS). On a real-world dataset, the average classification accuracy estimated by models such as random forest and support vector machine is used to evaluate the performance of NMIC-FS. The results show that the feature subset obtained by NMIC-FS achieves higher classification accuracy in less time than the full feature set. Compared with correlation-based feature selection, the Laplacian score, and lasso methods, NMIC-FS improves classification accuracy faster during feature selection and requires the fewest acoustic features to reach accuracy comparable to that of the full feature set.
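Computing the maximal information coefficient itself requires a dedicated implementation (e.g. the minepy package), so the sketch below swaps in normalized mutual information as a stand-in dependence measure and keeps the same greedy relevance-minus-redundancy selection loop that an NMIC-style method would run. Everything here is an assumption for illustration, not the paper's NMIC-FS.

```python
# Greedy relevance-minus-redundancy selection with a pluggable dependence
# measure (normalized mutual information stands in for NMIC here).
import numpy as np
from sklearn.metrics import normalized_mutual_info_score

def greedy_select(X, y, k, bins=8):
    """Greedily pick k features maximizing relevance minus mean redundancy."""
    def disc(col):                       # discretize for the MI estimate
        return np.digitize(col, np.histogram_bin_edges(col, bins)[1:-1])
    cols = [disc(X[:, j]) for j in range(X.shape[1])]
    selected = []
    while len(selected) < k:
        best, best_score = None, -np.inf
        for j in range(X.shape[1]):
            if j in selected:
                continue
            rel = normalized_mutual_info_score(y, cols[j])
            red = (np.mean([normalized_mutual_info_score(cols[j], cols[s])
                            for s in selected]) if selected else 0.0)
            if rel - red > best_score:   # strict '>' keeps earlier ties
                best, best_score = j, rel - red
        selected.append(best)
    return selected
```

Substituting a true MIC estimator for `normalized_mutual_info_score` would recover the NMIC-style criterion without changing the loop.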


Author(s):  
Nina Zhou ◽  
Lipo Wang

This chapter introduces an approach to class-dependent feature selection and a novel support vector machine (SVM). The relevant background and theory are presented to describe the proposed method, and real applications of the method on several biomedical datasets are demonstrated at the end. The authors hope this chapter provides readers with a different view of feature selection methods and classifiers, so as to promote more promising methods and applications.


2020 ◽  
Vol 2020 ◽  
pp. 1-9
Author(s):  
Min Li ◽  
XiaoYong Pan ◽  
Tao Zeng ◽  
Yu-Hang Zhang ◽  
Kaiyan Feng ◽  
...  

Among the various risk factors for the initiation and progression of cancer, alternative polyadenylation (APA) is a remarkable endogenous contributor that directly triggers the malignant phenotype of cancer cells. APA affects biological processes at the transcriptional level in various ways; as such, it can be involved in tumorigenesis through gene expression, protein subcellular localization, or transcription splicing patterns. The APA sites and status of different cancer types may have diverse modification patterns and regulatory mechanisms on transcripts. Potential APA sites were screened by applying several machine learning algorithms to a TCGA-APA dataset. First, a powerful feature selection method, minimum redundancy maximum relevance, was applied to the dataset, producing a ranked feature list. The list was then fed into incremental feature selection, which used the support vector machine as the classification algorithm, to extract key APA features and build a classifier. The classifier classifies cancer patients into cancer types with perfect performance. The key APA-modified genes have potential prognostic power, given their significance in the survival analysis of TCGA pan-cancer data.
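The incremental feature selection (IFS) step described above can be sketched generically: given a ranked feature list, evaluate growing prefixes with a cross-validated SVM and keep the prefix length with the best accuracy. The sketch below uses mutual information for the ranking and the iris dataset for illustration; the paper's mRMR ranking and TCGA-APA data are not reproduced.

```python
# Incremental feature selection: rank features, then test prefixes of the
# ranking with a cross-validated SVM and keep the best-performing length.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.feature_selection import mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
ranking = np.argsort(mutual_info_classif(X, y, random_state=0))[::-1]

best_k, best_acc = 0, 0.0
for k in range(1, X.shape[1] + 1):            # prefix of the ranked list
    acc = cross_val_score(SVC(), X[:, ranking[:k]], y, cv=5).mean()
    if acc > best_acc:
        best_k, best_acc = k, acc
print(best_k, round(best_acc, 3))
```

The prefix that maximizes accuracy defines both the selected feature set and the final classifier's input dimension.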

