Efficient Feature Selection Model for Gene Expression Data

Finding a subset of informative genes is crucial for biological analysis because the number of genes grows sharply and most of them are unrelated to the others. In general, a feature selection technique consists of two steps: 1) all genes are ranked by a filter approach, and 2) the ranked list is passed to a wrapper approach. Nevertheless, the resulting accuracy in recognizing informative genes is not sufficient. Therefore, this paper proposes an efficient feature selection model for gene expression data. First, two filter approaches, Correlation-based Feature Selection (Cfs) and Gain Ratio (GR), are used to define many attribute subsets. Second, a wrapper approach based on Support Vector Machine (SVM) and Random Forest (RF) is used to evaluate attribute subsets of each length. The experimental results show that CfsSVM, CfsRF, GRSVM, and GRRF based on the proposed model achieve higher accuracy rates of 87.10%, 90.32%, 87.10%, and 88.71%, respectively.
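The two-step scheme described in this abstract (filter ranking, then wrapper evaluation of subsets of each length) can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes features already discretized into categories, uses gain ratio as the filter, and swaps in a leave-one-out 1-nearest-neighbour classifier as a lightweight stand-in for the SVM and RF wrappers. All function names are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of discrete values."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(values, labels):
    """Gain ratio of one discretized feature with respect to the class labels."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - conditional
    split_info = entropy(values)
    return info_gain / split_info if split_info > 0 else 0.0


def rank_features(X, y):
    """Filter step: feature indices sorted by decreasing gain ratio."""
    n_features = len(X[0])
    scores = [gain_ratio([row[j] for row in X], y) for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)


def loo_accuracy(X, y, features):
    """Wrapper score: leave-one-out accuracy of a 1-NN classifier
    (Hamming distance) restricted to the given features.
    Assumption: 1-NN stands in for the SVM/RF used in the paper."""
    correct = 0
    for i in range(len(X)):
        best_dist, best_label = None, None
        for k in range(len(X)):
            if k == i:
                continue
            dist = sum(X[i][j] != X[k][j] for j in features)
            if best_dist is None or dist < best_dist:
                best_dist, best_label = dist, y[k]
        correct += best_label == y[i]
    return correct / len(X)


def select_best_prefix(X, y):
    """Evaluate the top-k ranked features for every k and return the
    shortest prefix that reaches the highest wrapper score."""
    ranked = rank_features(X, y)
    best_k = max(range(1, len(ranked) + 1),
                 key=lambda k: loo_accuracy(X, y, ranked[:k]))
    return ranked[:best_k]
```

On real expression data the wrapper score would normally come from cross-validating the chosen classifier, and continuous expression values would need to be discretized before computing gain ratio.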

Methods for Gene Selection and Classification of Microarray Dataset

Handbook of Research on Biomimicry in Information Retrieval and Knowledge Management - Advances in Web Technologies and Engineering ◽ 10.4018/978-1-5225-3004-6.ch004 ◽ 2018 ◽ pp. 66-77

Author(s): Mekour Norreddine

Keyword(s): Gene Expression ◽ Feature Selection ◽ Gene Expression Data ◽ Gene Selection ◽ Information Gain ◽ Microarray Dataset ◽ Data Sets ◽ Expression Data ◽ Wrapper Approach ◽ Filter Approach

Feature selection is one of the problems that arises when analyzing gene expression data. It is an important process for choosing which features matter for prediction, and there are two general approaches: the filter approach and the wrapper approach. In this chapter, the authors combine a filter approach that ranks features by information gain with a wrapper approach that searches with a genetic algorithm. The authors evaluate their approach on two gene expression data sets: Leukemia and Central Nervous System. The C4.5 decision tree classifier is used to improve classification performance.
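As a rough illustration of a genetic-algorithm wrapper search over feature subsets, the sketch below evolves 0/1 masks with tournament selection, one-point crossover, bit-flip mutation and elitism. It is an assumption-laden example, not the chapter's algorithm: the operators, the parameter defaults and the name `genetic_feature_search` are all hypothetical, and the `fitness` callable is expected to wrap whatever classifier score is being optimized (the chapter uses C4.5).

```python
import random


def genetic_feature_search(n_features, fitness, pop_size=20, generations=30,
                           mutation_rate=0.05, seed=0):
    """Search for a 0/1 feature mask that maximizes `fitness(mask)`.

    Requires n_features >= 2 (needed for one-point crossover).
    """
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_features)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)

    def tournament():
        # Pick two distinct individuals and keep the fitter one.
        a, b = rng.sample(population, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        children = [best[:]]  # elitism: the best mask always survives
        while len(children) < pop_size:
            p1, p2 = tournament(), tournament()
            cut = rng.randint(1, n_features - 1)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Flip each bit independently with probability mutation_rate.
            child = [1 - g if rng.random() < mutation_rate else g
                     for g in child]
            children.append(child)
        population = children
        best = max(population, key=fitness)
    return best
```

In a filter-plus-wrapper setup, the mask would typically range only over the top genes kept by the information-gain ranking, which keeps the search space small enough for the genetic algorithm to explore.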

Classification of Gene Expression Data using Efficient Feature Selection Technique and Resampling Method

International Journal of Engineering and Advanced Technology - Regular Issue ◽ 10.35940/ijeat.e7816.088619 ◽ 2019 ◽ Vol 8 (6) ◽ pp. 406-414

Keyword(s): Gene Expression ◽ Feature Selection ◽ Gene Expression Data ◽ Classification Model ◽ Support Vector ◽ Expression Data ◽ Feature Selection Technique ◽ Selection Technique ◽ Resampling Method

Microarray technology has developed into a powerful tool that has attracted many researchers to analyze gene expression levels for a given organism. Gene expression data have a very large number of features (in the thousands) and a small number of samples (in the hundreds). This characteristic makes gene expression data difficult to analyze. Hence, an efficient feature selection technique must be applied before any kind of analysis. Feature selection plays a vital role in the classification of gene expression data. Several feature selection techniques have been introduced in this field, but Support Vector Machine with Recursive Feature Elimination (SVM-RFE) has proven to be a promising method among them. SVM-RFE ranks the genes (features) by training an SVM classification model, and key genes are selected in combination with the RFE method. The main issue with SVM-RFE is its long running time. We introduced an efficient implementation of linear SVM to overcome this problem and improved RFE with a variable step size. The combined method was then used to select informative genes. An effective resampling method is proposed to preprocess the datasets; it balances the distribution of samples, which gives more reliable classification results. In this paper, we have also studied the applicability of common classifiers. Detailed experiments are conducted on four commonly used microarray gene expression datasets. The results show that the proposed method achieves comparable classification performance
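The idea of recursive feature elimination with a variable step size can be sketched as follows. The abstract does not give the paper's step schedule, so the one used here (drop up to half of the remaining features per round, then one at a time once few remain) is an assumption, as are the function names and the `fine_threshold` parameter. The default weight function is a simple class-mean difference standing in for the weight magnitudes of a trained linear SVM.

```python
def mean_difference_weights(X, y, active):
    """Stand-in for |w| of a linear SVM: absolute difference between the
    class means of each active feature (label 1 versus everything else)."""
    pos = [row for row, label in zip(X, y) if label == 1]
    neg = [row for row, label in zip(X, y) if label != 1]
    return [abs(sum(r[j] for r in pos) / len(pos)
                - sum(r[j] for r in neg) / len(neg))
            for j in active]


def rfe_variable_step(X, y, n_keep, weights_fn=mean_difference_weights,
                      fine_threshold=8):
    """Recursive feature elimination with a coarse-to-fine step size.

    Returns (kept_features, eliminated_features_in_removal_order).
    """
    active = list(range(len(X[0])))
    eliminated = []
    while len(active) > n_keep:
        weights = weights_fn(X, y, active)  # re-score on remaining features
        surplus = len(active) - n_keep
        if len(active) <= fine_threshold:
            step = 1  # fine phase: remove one feature per round
        else:
            step = max(1, min(surplus, len(active) // 2))  # coarse phase
        # Positions of the weakest features among the active ones.
        order = sorted(range(len(active)), key=lambda i: weights[i])
        drop = set(order[:step])
        eliminated.extend(active[i] for i in order[:step])
        active = [f for i, f in enumerate(active) if i not in drop]
    return active, eliminated
```

Large steps early on save most of the retraining cost, while single-feature steps near the end preserve the fine-grained ranking among the strongest candidates.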

A new distributed feature selection technique for classifying gene expression data

International Journal of Biomathematics ◽ 10.1142/s1793524519500396 ◽ 2019 ◽ Vol 12 (04) ◽ pp. 1950039 ◽ Cited By ~ 2

Author(s): Sarah M. Ayyad ◽ Ahmed I. Saleh ◽ Labib M. Labib

Keyword(s): Gene Expression ◽ Feature Selection ◽ Gene Expression Data ◽ Fuzzy Inference ◽ Research Area ◽ Selection Method ◽ Expression Data ◽ Feature Selection Technique ◽ Inference System ◽ Selection Technique

Classification of gene expression data is a pivotal research area that plays a substantial role in the diagnosis and prediction of diseases. Feature selection is one of the most widely used techniques in data mining, especially in classification. Gene expression data usually consist of dozens of samples characterized by thousands of genes. This high dimensionality is coupled with the existence of irrelevant and redundant features. Accordingly, selecting informative genes (features) becomes difficult, which badly affects classification accuracy. In this paper, we consider feature selection for classifying gene expression microarray datasets. The goal is to detect the genes most likely to be cancer-related in a distributed manner, which helps in effectively classifying the samples. Initially, the large set of available features is subdivided and distributed among several processors. Then, a new filter selection method based on a fuzzy inference system is applied to each subset of the dataset. Finally, all the resulting features are ranked, and a wrapper-based selection method is applied. Experimental results showed that our proposed feature selection technique performs better than other techniques, since it produces lower latency and improves classification performance.
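The partition, filter and merge flow described in this abstract can be sketched in a few lines. This is only an illustration under stated assumptions: threads stand in for separate processors, a plain class-mean gap replaces the paper's fuzzy-inference filter, and every name (`class_mean_gap`, `score_chunk`, `distributed_filter`, `keep_per_chunk`) is hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def class_mean_gap(column, y):
    """Stand-in filter score (not the fuzzy inference system of the paper):
    absolute gap between the class means of one feature."""
    pos = [v for v, label in zip(column, y) if label == 1]
    neg = [v for v, label in zip(column, y) if label != 1]
    return abs(sum(pos) / len(pos) - sum(neg) / len(neg))


def score_chunk(X, y, feature_ids, keep_per_chunk):
    """Work done by one worker: score its features, keep the best few."""
    scored = [(class_mean_gap([row[j] for row in X], y), j)
              for j in feature_ids]
    scored.sort(key=lambda t: (-t[0], t[1]))  # high score first, then index
    return scored[:keep_per_chunk]


def distributed_filter(X, y, n_workers=4, keep_per_chunk=2):
    """Split the features among workers, filter each chunk independently,
    then merge the survivors into one globally ranked list of indices."""
    n_features = len(X[0])
    chunks = [list(range(w, n_features, n_workers)) for w in range(n_workers)]
    chunks = [c for c in chunks if c]  # drop empty chunks
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partial = list(pool.map(
            lambda c: score_chunk(X, y, c, keep_per_chunk), chunks))
    merged = sorted((item for part in partial for item in part),
                    key=lambda t: (-t[0], t[1]))
    return [j for _, j in merged]
```

The merged ranking is what a wrapper-based selection step would then consume. For CPU-bound scoring on real data, a process pool or a cluster framework would be the more likely choice than threads.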

A hybrid filter/wrapper approach of feature selection for gene expression data

2008 IEEE International Conference on Systems, Man and Cybernetics ◽ 10.1109/icsmc.2008.4811698 ◽ 2008 ◽ Cited By ~ 1

Author(s): Chao-Hsuan Ke ◽ Cheng-Hong Yang ◽ Li-Yeh Chuang ◽ Cheng-San Yang

Keyword(s): Gene Expression ◽ Feature Selection ◽ Gene Expression Data ◽ Expression Data ◽ Hybrid Filter ◽ Wrapper Approach ◽ Selection For

A COMPARATIVE STUDY ON GENE SELECTION METHODS FOR TISSUES CLASSIFICATION ON LARGE SCALE GENE EXPRESSION DATA

Jurnal Teknologi ◽ 10.11113/jt.v78.8843 ◽ 2016 ◽ Vol 78 (5-10)

Author(s): Farzana Kabir Ahmad

Keyword(s): Gene Expression ◽ Feature Selection ◽ Gene Expression Data ◽ Large Scale ◽ Gene Selection ◽ Support Vector ◽ Breast Cancer Dataset ◽ Expression Data ◽ Selection Methods ◽ Normal Tissues

Deoxyribonucleic acid (DNA) microarray technology is a recent invention that provides enormous opportunities to measure gene expression on a large scale simultaneously. However, interpreting large-scale gene expression data remains challenging because of its innate “high dimensional, low sample size” nature. Microarray data mainly involve thousands of genes, n, in a very small sample, p, which complicates data analysis. For this reason, feature selection methods, also known as gene selection methods, have become necessary to select significant genes that offer the maximum discriminative power between cancerous and normal tissues. Feature selection methods fall into three basic categories: a) filter methods; b) wrapper methods; and c) embedded methods. Among these, filter gene selection methods provide an easy way to identify informative genes and can simply reduce large-scale microarray datasets. Although filter-based gene selection techniques are commonly used to analyze microarray datasets, they have been tested separately in different studies. Therefore, this study investigates and compares the effectiveness of four popular filter gene selection methods, namely Signal-to-Noise Ratio (SNR), Fisher Criterion (FC), Information Gain (IG) and t-Test, in selecting informative genes that can distinguish cancerous from normal tissues. In this experiment, a common classifier, the Support Vector Machine (SVM), is trained on the selected genes. The gene selection methods are tested on three large-scale gene expression datasets: a breast cancer dataset, a colon dataset, and a lung dataset. The study found that IG and SNR are better suited for use with SVM. Furthermore, SVM performance remained largely unaffected unless a very small number of genes was selected.
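Three of the four filter scores named in this abstract have short closed forms for a two-class problem, sketched below for a single gene. The formulas are the commonly used textbook definitions and are an assumption about what the study computed; in particular, the unequal-variance (Welch) form of the t statistic is chosen here because the abstract does not say which variant was used. Information Gain is omitted because it requires a discretization step.

```python
import math
from statistics import mean, stdev


def split_by_class(values, labels):
    """Expression values of one gene, split into class 1 and the rest."""
    a = [v for v, y in zip(values, labels) if y == 1]
    b = [v for v, y in zip(values, labels) if y != 1]
    return a, b


def snr(values, labels):
    """Signal-to-noise ratio: (mu_a - mu_b) / (sigma_a + sigma_b)."""
    a, b = split_by_class(values, labels)
    return (mean(a) - mean(b)) / (stdev(a) + stdev(b))


def fisher_criterion(values, labels):
    """Fisher criterion: (mu_a - mu_b)^2 / (sigma_a^2 + sigma_b^2)."""
    a, b = split_by_class(values, labels)
    return (mean(a) - mean(b)) ** 2 / (stdev(a) ** 2 + stdev(b) ** 2)


def welch_t(values, labels):
    """Two-sample t statistic with unequal variances (Welch form)."""
    a, b = split_by_class(values, labels)
    standard_error = math.sqrt(stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b))
    return (mean(a) - mean(b)) / standard_error


# Toy example: one gene measured in three tumour (1) and three normal (0) samples.
gene = [5.0, 7.0, 9.0, 1.0, 2.0, 3.0]
labels = [1, 1, 1, 0, 0, 0]
scores = {
    "SNR": snr(gene, labels),
    "FC": fisher_criterion(gene, labels),
    "t": welch_t(gene, labels),
}
```

Genes are then ranked by the absolute value of the chosen score. Each class needs at least two samples and non-zero variance, otherwise the denominators above are undefined.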

sigFeature: Novel Significant Feature Selection Method for Classification of Gene Expression Data Using Support Vector Machine and t Statistic

Frontiers in Genetics ◽ 10.3389/fgene.2020.00247 ◽ 2020 ◽ Vol 11 ◽ Cited By ~ 2

Author(s): Pijush Das ◽ Anirban Roychowdhury ◽ Subhadeep Das ◽ Susanta Roychoudhury ◽ Sucheta Tripathy

Keyword(s): Gene Expression ◽ Support Vector Machine ◽ Feature Selection ◽ Gene Expression Data ◽ Feature Selection Method ◽ Selection Method ◽ Significant Feature ◽ Support Vector ◽ Expression Data

Gene encoder: a feature selection technique through unsupervised deep learning-based clustering for large gene expression data

Neural Computing and Applications ◽ 10.1007/s00521-020-05101-4 ◽ 2020 ◽ Cited By ~ 2

Author(s): Uzma ◽ Feras Al-Obeidat ◽ Abdallah Tubaishat ◽ Babar Shah ◽ Zahid Halim

Keyword(s): Gene Expression ◽ Feature Selection ◽ Deep Learning ◽ Gene Expression Data ◽ Expression Data ◽ Feature Selection Technique ◽ Selection Technique ◽ Large Gene ◽ Unsupervised Deep Learning

A multiple kernel support vector machine scheme for feature selection and rule extraction from gene expression data of cancer tissue

Artificial Intelligence in Medicine ◽ 10.1016/j.artmed.2007.07.008 ◽ 2007 ◽ Vol 41 (2) ◽ pp. 161-175 ◽ Cited By ~ 71

Author(s): Zhenyu Chen ◽ Jianping Li ◽ Liwei Wei

Keyword(s): Gene Expression ◽ Support Vector Machine ◽ Feature Selection ◽ Gene Expression Data ◽ Rule Extraction ◽ Support Vector ◽ Cancer Tissue ◽ Expression Data ◽ Multiple Kernel ◽ Kernel Support Vector Machine

Cancer data classification by quantum-inspired immune clone optimization-based optimal feature selection using gene expression data: deep learning approach

Data Technologies and Applications ◽ 10.1108/dta-05-2020-0109 ◽ 2021 ◽ Vol ahead-of-print (ahead-of-print)

Author(s): Nageswara Rao Eluri ◽ Gangadhara Rao Kancharla ◽ Suresh Dara ◽ Venkatesulu Dondeti

Keyword(s): Gene Expression ◽ Feature Selection ◽ Gene Expression Data ◽ Selection Model ◽ Expression Data ◽ Content Type ◽ Cancer Data ◽ Optimal Feature Selection ◽ Optimal Feature

Purpose: Gene selection is considered a fundamental process in the bioinformatics field. Existing methodologies for cancer classification are mostly clinically based, and their diagnostic capability is limited. Nowadays, significant problems in cancer diagnosis are addressed using gene expression data, and researchers have introduced many ways to diagnose cancer appropriately and effectively. This paper aims to develop cancer data classification using gene expression data.

Design/methodology/approach: The proposed classification model involves three main phases: (1) feature extraction, (2) optimal feature selection and (3) classification. Initially, five benchmark gene expression datasets are collected, and feature extraction is performed on them. To reduce the length of the feature vectors, optimal feature selection is performed using a new meta-heuristic algorithm termed the quantum-inspired immune clone optimization algorithm (QICO). Once the relevant features are selected, classification is performed by a deep learning model, a recurrent neural network (RNN). Finally, the experimental analysis reveals that the proposed QICO-based feature selection model outperforms other heuristic-based feature selection methods, and the optimized RNN outperforms other machine learning methods.

Findings: The proposed QICO-RNN achieves the best outcomes at every learning percentage. At a learning percentage of 85, the accuracy of the proposed QICO-RNN was 3.2% higher than RNN, 4.3% higher than RF, 3.8% higher than NB and 2.1% higher than KNN for Dataset 1. For Dataset 2, at a learning percentage of 35, the accuracy of the proposed QICO-RNN was 13.3% higher than RNN, 8.9% higher than RF and 14.8% higher than NB and KNN. Hence, the developed QICO algorithm performs well in accurately classifying cancer data using gene expression data.

Originality/value: This paper introduces a new optimal feature selection model using QICO and a QICO-based RNN for effective classification of cancer data using gene expression data. This is the first work that uses an optimal feature selection model based on QICO and QICO-RNN for this purpose.

An efficient feature selection technique for gene expression data

2018 IEEE Conference on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB) ◽ 10.1109/cibcb.2018.8404977 ◽ 2018 ◽ Cited By ~ 1

Author(s): B Chandra

Keyword(s): Gene Expression ◽ Feature Selection ◽ Gene Expression Data ◽ Expression Data ◽ Feature Selection Technique ◽ Selection Technique
