An Adaptive Genetic Algorithm with Recursive Feature Elimination Approach for Predicting Malaria Vector Gene Expression Data Classification using Support Vector Machine Kernels

2021 ◽  
Vol 18 (17) ◽  
Author(s):  
Micheal Olaolu AROWOLO ◽  
Marion Olubunmi ADEBIYI ◽  
Chiebuka Timothy NNODIM ◽  
Sulaiman Olaniyi ABDULSALAM ◽  
Ayodele Ariyo ADEBIYI

As mosquito parasites breed across many parts of sub-Saharan Africa, infected cells follow an unpredictable and erratic life cycle. Millions of individual parasites have gene expression profiles. Ribonucleic acid sequencing (RNA-seq) is a popular transcriptional technique that has improved the detection of major genetic probes. RNA-seq analysis generally relies on machine learning techniques, and on improvements to them, to interpret gene expression. For this study, an adaptive genetic algorithm (A-GA) combined with recursive feature elimination (RFE), referred to as A-GA-RFE, was used as the feature selection algorithm to detect important information in a high-dimensional gene expression malaria vector RNA-seq dataset. Support Vector Machine (SVM) kernels were used as the classification algorithms to evaluate predictive performance. The feasibility of the approach was confirmed using an RNA-seq dataset from the mosquito Anopheles gambiae. The reported accuracy rates were 98.3 and 96.7 %. HIGHLIGHTS: Dimensionality reduction method based on feature selection; classification using support vector machine; classification of malaria vector dataset using an adaptive GA-RFE-SVM.
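The abstract does not spell out the A-GA-RFE procedure, so the following is only a minimal sketch of genetic-algorithm feature selection in general. Chromosomes are binary feature masks, the fittest mask is carried over unchanged, and the mutation rate rises when the best fitness stalls, which is one possible reading of "adaptive" and is an assumption. The nearest-centroid fitness, the toy data, and every name here are hypothetical stand-ins, not the SVM-based evaluation used in the paper.

```python
import random

def centroid_accuracy(X, y, mask):
    """Training accuracy of a nearest-centroid classifier on the selected columns."""
    cols = [j for j, keep in enumerate(mask) if keep]
    if not cols:
        return 0.0
    centroids = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(r[j] for r in rows) / len(rows) for j in cols]

    def dist(x, label):
        return sum((x[j] - m) ** 2 for j, m in zip(cols, centroids[label]))

    correct = sum(min(centroids, key=lambda c: dist(x, c)) == t for x, t in zip(X, y))
    return correct / len(y)

def fitness(X, y, mask, penalty=0.01):
    # Reward accuracy and lightly penalise large feature subsets.
    return centroid_accuracy(X, y, mask) - penalty * sum(mask)

def ga_select(X, y, pop_size=20, generations=30, seed=0):
    rng = random.Random(seed)
    n = len(X[0])
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    rate, best, best_fit = 0.05, None, float("-inf")
    for _ in range(generations):
        scored = sorted(pop, key=lambda m: fitness(X, y, m), reverse=True)
        top_fit = fitness(X, y, scored[0])
        if top_fit > best_fit:
            best, best_fit, rate = scored[0][:], top_fit, 0.05
        else:
            rate = min(0.5, rate * 1.5)  # assumed adaptation: mutate more on stagnation
        parents = scored[: pop_size // 2]
        children = [best[:]]  # elitism: keep the best mask found so far
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)  # one-point crossover
            child = a[:cut] + b[cut:]
            children.append([1 - g if rng.random() < rate else g for g in child])
        pop = children
    return best, best_fit

def make_toy(n_samples=40, n_features=8, seed=1):
    # Hypothetical data: only feature 0 carries class signal, the rest is noise.
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = i % 2
        row = [rng.gauss(0, 1) for _ in range(n_features)]
        row[0] += 4.0 * label
        X.append(row)
        y.append(label)
    return X, y

X, y = make_toy()
mask, score = ga_select(X, y)
print(mask, round(score, 3))
```

In a setting closer to the paper, the fitness would presumably be a cross-validated SVM score, and the surviving features would then be passed to RFE for further pruning.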

2018 ◽  
Vol 2018 ◽  
pp. 1-11 ◽  
Author(s):  
Ying Zhang ◽  
Qingchun Deng ◽  
Wenbin Liang ◽  
Xianchun Zou

The application of gene expression data to the diagnosis and classification of cancer has become an active topic in the field of cancer classification. Gene expression data usually contains a large amount of tumor-free data and is high-dimensional. In order to select determinant genes related to breast cancer from the initial gene expression data, we propose a new feature selection method, namely, support vector machine based on recursive feature elimination and parameter optimization (SVM-RFE-PO). The grid search (GS) algorithm, the particle swarm optimization (PSO) algorithm, and the genetic algorithm (GA) are applied to search for the optimal parameters in the feature selection process. The new feature selection method therefore comprises three algorithms: support vector machine based on recursive feature elimination and grid search (SVM-RFE-GS), support vector machine based on recursive feature elimination and particle swarm optimization (SVM-RFE-PSO), and support vector machine based on recursive feature elimination and genetic algorithm (SVM-RFE-GA). The selected optimal feature subsets are then used to train the SVM classifier for cancer classification. We also use random forest feature selection (RFFS), random forest feature selection and grid search (RFFS-GS), and the minimal redundancy maximal relevance (mRMR) algorithm as feature selection methods against which to compare the SVM-RFE-PO algorithm. The results showed that the feature subset selected by the SVM-RFE-PSO algorithm gives better prediction performance, measured by Area Under Curve (AUC), on the testing data set. This algorithm not only saves time but is also capable of extracting more representative and useful genes.
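The grid-search variant (SVM-RFE-GS) can be sketched with scikit-learn as below. This is an illustration under stated assumptions rather than the authors' implementation: the synthetic data, the parameter grid, the choice of five retained features, and the RBF kernel for the final classifier are all hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# Synthetic stand-in for a gene expression matrix (samples x genes).
X, y = make_classification(n_samples=120, n_features=40, n_informative=5,
                           n_redundant=0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

pipe = Pipeline([
    # SVM-RFE: a linear SVM ranks features by their weights and the
    # lowest-ranked ones are dropped, five at a time, until five remain.
    ("rfe", RFE(estimator=SVC(kernel="linear"), n_features_to_select=5, step=5)),
    # Final classifier trained on the selected subset.
    ("svm", SVC(kernel="rbf")),
])

# Hypothetical grid covering both the selector and the classifier.
param_grid = {
    "rfe__estimator__C": [0.1, 1.0],
    "svm__C": [1.0, 10.0],
    "svm__gamma": ["scale", 0.01],
}
search = GridSearchCV(pipe, param_grid, cv=3, scoring="roc_auc")
search.fit(X_train, y_train)

selected = search.best_estimator_.named_steps["rfe"].support_
test_auc = roc_auc_score(y_test, search.decision_function(X_test))
print(search.best_params_, int(selected.sum()), round(test_auc, 3))
```

Putting the selector inside the pipeline means features are re-selected within each cross-validation fold, so the held-out fold does not influence which genes are kept.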


Molecules ◽  
2020 ◽  
Vol 25 (6) ◽  
pp. 1442 ◽  
Author(s):  
Tao Shen ◽  
Hong Yu ◽  
Yuan-Zhong Wang

Gentiana is one of the largest genera of Gentianoideae; most of its species have potential pharmaceutical value and are used in local traditional medicine. Because phytochemical composition and bioactive compounds differ among species, it is crucial to accurately identify authentic Gentiana species. In this paper, the feasibility of using infrared spectroscopy combined with chemometric analysis to identify Gentiana and its related species was studied. A total of 180 batches of raw spectral fingerprints were obtained from 18 species of Gentiana and Tripterospermum by near-infrared (NIR: 10,000–4000 cm−1) and Fourier transform mid-infrared (MIR: 4000–600 cm−1) spectroscopy. Firstly, principal component analysis (PCA) was used to explore the natural grouping of the 180 samples. Secondly, random forests (RF), support vector machine (SVM), and K-nearest neighbors (KNN) models were built using the full spectra (1487 NIR variables and 1214 FT-MIR variables, respectively). Based on the results for the calibration and prediction sets, the MIR-SVM model had a higher classification accuracy than the other models. Five feature selection strategies, VIP (variable importance in the projection), Boruta, GARF (genetic algorithm combined with random forest), GASVM (genetic algorithm combined with support vector machine), and Venn diagram calculation, were used to reduce the number of variables for modeling. In total, 101 NIR and 73 FT-MIR bands were selected as the feature variables, respectively. Thirdly, stacking models were built based on the optimal spectral dataset. Most of the stacking models performed better than the full spectra-based models. RF and SVM (as base learners), combined with the SVM meta-classifier, formed the optimal stacked generalization strategy. For the SG-Ven-MIR-SVM model, the accuracy (ACC) was 100% on both the calibration set and the validation set. Sensitivity (SE), specificity (SP), efficiency (EFF), Matthews correlation coefficient (MCC), and Cohen’s kappa coefficient (K) were all 1, which showed that the model had the optimal authenticity identification performance. These results indicate that stacked generalization combined with feature selection is probably an important technique for improving the predictive accuracy of classification models and avoiding overfitting. The results can provide a valuable reference for the safety and effectiveness of the clinical application of medicinal Gentiana.
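To make the reported metrics concrete, here is a minimal sketch that computes ACC, SE, SP, MCC, and Cohen's kappa from the four cells of a binary confusion matrix, using their standard definitions. The counts are hypothetical (one species against the rest, assuming 10 of the 180 batches belong to it); the study itself is multi-class, and EFF is omitted because the abstract does not define it.

```python
import math

def binary_metrics(tp, fp, tn, fn):
    """Standard binary metrics; assumes both classes occur at least once."""
    n = tp + fp + tn + fn
    acc = (tp + tn) / n
    se = tp / (tp + fn)  # sensitivity: true positive rate
    sp = tn / (tn + fp)  # specificity: true negative rate
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    # Cohen's kappa: observed agreement corrected for chance agreement.
    p_chance = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    kappa = (acc - p_chance) / (1 - p_chance) if p_chance != 1 else 0.0
    return {"ACC": acc, "SE": se, "SP": sp, "MCC": mcc, "K": kappa}

# Hypothetical perfect one-vs-rest result: every metric equals 1.
perfect = binary_metrics(tp=10, fp=0, tn=170, fn=0)
# Hypothetical result with two false positives and two false negatives.
imperfect = binary_metrics(tp=8, fp=2, tn=168, fn=2)
print(perfect)
print(imperfect)
```

With no misclassifications every metric reaches its maximum of 1, which matches the pattern reported for the SG-Ven-MIR-SVM model.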


2016 ◽  
Vol 78 (5-10) ◽  
Author(s):  
Farzana Kabir Ahmad ◽  
Abdullah Yousef Awwad Al-Qammaz ◽  
Yuhanis Yusof

Human-computer intelligent interaction (HCII) is a rising field of science that aims to refine and enhance the interaction between computers and humans. Since emotion plays a vital role in daily human life, the ability of a computer to interpret and respond to human emotion is a crucial element of future intelligent systems. Accordingly, several studies have sought to recognise human emotion using different techniques such as facial expression, speech, galvanic skin response (GSR), or heart rate (HR). However, such techniques have problems mainly in terms of credibility and reliability, as people can fake their feelings and responses. Electroencephalogram (EEG), on the other hand, has been shown to be a very effective way of recognising human emotion, as it records human brain activity, which can hardly be deceived by voluntary control. Despite the popularity of EEG for recognising human emotion, this field of study is challenging because the EEG signal is nonlinear, involves myriad factors, and is chaotic in nature. These issues lead to high-dimensional problems and poor classification results. To address them, this study proposes a novel computational model consisting of three main stages: a) feature extraction; b) feature selection; and c) classification. Discrete wavelet packet transform (DWPT) was used to extract features from the EEG signals, ultimately yielding 204,800 features from 32 subjects in a subject-independent setting. Genetic Algorithm (GA) and least squares support vector machine (LS-SVM) were used as the feature selection technique and the classifier, respectively. The computational model was tested on the common DEAP pre-processed EEG dataset in order to classify three levels of valence and arousal. The empirical results show that the proposed GA-LSSVM improved the classification results to 49.22% and 54.83% for valence and arousal, respectively, whereas 46.33% for valence and 48.30% for arousal were achieved when no feature selection technique was applied to the identical classifier.


2020 ◽  
Vol 14 (3) ◽  
pp. 269-279
Author(s):  
Hayet Djellali ◽  
Nacira Ghoualmi-Zine ◽  
Souad Guessoum

This paper investigates feature selection methods based on a hybrid architecture, using a feature selection algorithm called Adapted Fast Correlation Based Feature selection and Support Vector Machine Recursive Feature Elimination (AFCBF-SVMRFE). AFCBF-SVMRFE has three stages and combines the SVMRFE embedded method with Correlation based Feature Selection. The first stage is relevance analysis, the second is redundancy analysis, and the third is performance evaluation and feature restoration. Experiments show that the proposed method, tested with two classifiers, Support Vector Machine (SVM) and K nearest neighbors (KNN), provides the best accuracy on various datasets. The SVM classifier outperforms the KNN classifier on these data. AFCBF-SVMRFE outperforms the FCBF multivariate filter, SVMRFE, Particle Swarm Optimization (PSO), and Artificial Bee Colony (ABC).
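The relevance and redundancy stages can be illustrated with the standard Fast Correlation Based Filter idea, which scores discrete features by symmetric uncertainty. The sketch below shows only that baseline under stated assumptions: the adaptations in AFCBF and the third (evaluation and restoration) stage are not described in the abstract and are not reproduced, and the feature names and toy values are hypothetical.

```python
import math
from collections import Counter

def entropy(values):
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def symmetric_uncertainty(a, b):
    """SU(a, b) = 2 * I(a; b) / (H(a) + H(b)), a value in [0, 1]."""
    h_a, h_b = entropy(a), entropy(b)
    if h_a + h_b == 0:
        return 0.0
    mutual_info = h_a + h_b - entropy(list(zip(a, b)))
    return 2.0 * mutual_info / (h_a + h_b)

def fcbf(features, labels, threshold=0.0):
    """features: dict name -> list of discrete values. Returns kept names by rank."""
    # Stage 1, relevance: keep features whose SU with the class exceeds the threshold.
    relevance = {name: symmetric_uncertainty(col, labels)
                 for name, col in features.items()}
    ranked = sorted((n for n in relevance if relevance[n] > threshold),
                    key=relevance.get, reverse=True)
    # Stage 2, redundancy: drop a feature if an already kept feature is at
    # least as correlated with it as the feature is with the class.
    kept = []
    for name in ranked:
        redundant = any(
            symmetric_uncertainty(features[k], features[name]) >= relevance[name]
            for k in kept)
        if not redundant:
            kept.append(name)
    return kept

labels = [0, 0, 0, 0, 1, 1, 1, 1]
features = {
    "g1": [0, 0, 0, 1, 1, 1, 1, 1],  # relevant
    "g2": [0, 0, 0, 1, 1, 1, 1, 1],  # exact duplicate of g1, so redundant
    "g3": [0, 1, 0, 1, 0, 1, 0, 1],  # independent of the class, so irrelevant
    "g4": [0, 1, 0, 0, 1, 1, 0, 1],  # weakly relevant, not explained by g1
}
print(fcbf(features, labels))
```

A hybrid in the spirit of the paper would presumably hand the surviving features to SVM-RFE for further ranking.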


Author(s):  
JUANA CANUL-REICH ◽  
LAWRENCE O. HALL ◽  
DMITRY B. GOLDGOF ◽  
JOHN N. KORECKI ◽  
STEVEN ESCHRICH

Gene-expression microarray datasets often consist of a limited number of samples with a large number of gene-expression measurements, usually on the order of thousands. Therefore, dimensionality reduction is critical prior to any classification task. In this work, the iterative feature perturbation method (IFP), an embedded gene selector, is introduced and applied to four microarray cancer datasets: colon cancer, leukemia, Moffitt colon cancer, and lung cancer. We compare results obtained by IFP to those of support vector machine-recursive feature elimination (SVM-RFE) and the t-test as a feature filter, using a linear support vector machine as the base classifier. We also analyzed the intersection of the gene sets selected by the three methods across the four datasets. Additional experiments included an initial pre-selection of the top 200 genes based on their p values. IFP and SVM-RFE were then applied to the reduced feature sets. These results showed up to 3.32% average performance improvement for IFP across the four datasets. A statistical analysis (using the Friedman/Holm test) for both scenarios showed that the highest accuracies came from the t-test as a filter in experiments without gene pre-selection. IFP and SVM-RFE had greater classification accuracy after gene pre-selection. Analysis showed the t-test is a good gene selector for microarray data. IFP and SVM-RFE showed performance improvement on a dataset reduced by the t-test. The IFP approach resulted in comparable or superior average class accuracy when compared to SVM-RFE on three of the four datasets. The same or similar accuracies can be obtained with different sets of genes.
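The t-test pre-selection step can be sketched as follows. The abstract does not say which t-test variant was used, so this example assumes Student's two-sample test with pooled variance; under that assumption every gene has the same degrees of freedom, so ranking by the absolute t statistic gives the same order as ranking by two-sided p value. Gene names and expression values are hypothetical.

```python
import math
from statistics import mean, variance

def t_statistic(a, b):
    """Student's two-sample t statistic with pooled variance."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    se = math.sqrt(pooled * (1 / na + 1 / nb))
    return (mean(a) - mean(b)) / se if se else 0.0

def top_k_genes(expr, labels, k):
    """expr: dict gene -> expression values per sample; labels: 0/1 per sample."""
    scores = {}
    for gene, values in expr.items():
        a = [v for v, t in zip(values, labels) if t == 0]
        b = [v for v, t in zip(values, labels) if t == 1]
        scores[gene] = abs(t_statistic(a, b))
    # Largest |t| first, which corresponds to the smallest p value.
    return sorted(scores, key=scores.get, reverse=True)[:k]

labels = [0, 0, 0, 1, 1, 1]
expr = {
    "geneA": [1.0, 1.2, 0.9, 3.0, 3.1, 2.8],  # strongly differential
    "geneB": [2.0, 2.1, 1.9, 2.0, 2.2, 1.8],  # no difference in means
    "geneC": [1.0, 2.0, 3.0, 2.0, 3.0, 4.0],  # moderate difference
}
print(top_k_genes(expr, labels, k=2))
```

In the experiments described above, the analogous step keeps the top 200 genes before IFP or SVM-RFE is applied.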

