Classification of Gene Expression Data using Efficient Feature Selection Technique and Resampling Method

doi:10.35940/ijeat.e7816.088619

International Journal of Engineering and Advanced Technology - Regular Issue ◽

2019 ◽

Vol 8 (6) ◽

pp. 406-414

Keyword(s):

Gene Expression ◽

Feature Selection ◽

Gene Expression Data ◽

Classification Model ◽

Support Vector ◽

Expression Data ◽

Feature Selection Technique ◽

Selection Technique ◽

Resampling Method

Microarray technology has become a powerful tool that has attracted many researchers to analyze gene expression levels for a given organism. Gene expression data typically have a very large number of features (thousands) and a small number of samples (hundreds). This characteristic makes gene expression data difficult to analyze, so an efficient feature selection technique must be applied before any kind of analysis. Feature selection plays a vital role in the classification of gene expression data. Several feature selection techniques have been introduced in this field, but Support Vector Machine with Recursive Feature Elimination (SVM-RFE) has proven to be one of the most promising. SVM-RFE ranks the genes (features) by training an SVM classification model, and key genes are selected in combination with the RFE method. High time consumption is the main issue of SVM-RFE. We introduce an efficient implementation of linear SVM to overcome this problem and improve RFE with a variable step size. The combined method is then used to select informative genes. An effective resampling method is proposed to preprocess the datasets; it balances the distribution of samples, which gives more reliable classification results. In this paper, we also study the applicability of common classifiers. Detailed experiments are conducted on four commonly used microarray gene expression datasets. The results show that the proposed method achieves comparable classification performance.
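The SVM-RFE loop with a variable step size can be illustrated with a minimal sketch. The abstract does not state the solver or the step schedule, so everything here is an assumption: a Pegasos-style subgradient solver stands in for the authors' efficient linear SVM, the schedule drops half of the remaining genes per round while many remain and one gene per round afterwards, and the names `train_linear_svm`, `svm_rfe`, `coarse_fraction`, and `fine_threshold` are hypothetical. The data are synthetic.

```python
import random

def train_linear_svm(X, y, epochs=50, lam=0.01, seed=0):
    """Pegasos-style stochastic subgradient descent for a linear SVM (no bias term).
    Labels must be +1 / -1. This is a stand-in solver, not the paper's implementation."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    t = 0
    for _ in range(epochs):
        order = list(range(len(X)))
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            # Shrink the weights (regularisation step).
            w = [(1.0 - eta * lam) * wj for wj in w]
            # Add the hinge-loss subgradient only if the margin is violated.
            if margin < 1.0:
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w

def svm_rfe(X, y, n_keep, coarse_fraction=0.5, fine_threshold=8):
    """Recursive feature elimination with a variable step size (assumed schedule)."""
    active = list(range(len(X[0])))   # feature indices still in play
    eliminated = []                   # removal order, least important first
    while len(active) > n_keep:
        X_sub = [[row[j] for j in active] for row in X]
        w = train_linear_svm(X_sub, y)
        # Rank active features by squared weight, smallest first.
        ranked = sorted(range(len(active)), key=lambda k: w[k] ** 2)
        # Large steps while many features remain, single steps near the end.
        if len(active) > fine_threshold:
            step = max(1, int(len(active) * coarse_fraction))
        else:
            step = 1
        step = min(step, len(active) - n_keep)
        drop = set(ranked[:step])
        eliminated.extend(active[k] for k in ranked[:step])
        active = [f for k, f in enumerate(active) if k not in drop]
    return active, eliminated

# Synthetic data: 20 samples, 32 "genes"; only genes 3 and 17 carry the class signal.
rng = random.Random(1)
informative = {3, 17}
y = [1 if i % 2 == 0 else -1 for i in range(20)]
X = [[(label if j in informative else 0.0) + rng.uniform(-0.1, 0.1)
      for j in range(32)] for label in y]

selected, eliminated = svm_rfe(X, y, n_keep=2)
print("selected genes:", selected)
```

With these defaults the loop retrains the SVM 8 times (32, 16, 8, then one gene at a time) instead of 30 times with a fixed step of 1, which is the kind of saving a variable step size is meant to provide.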

A new distributed feature selection technique for classifying gene expression data

International Journal of Biomathematics ◽

10.1142/s1793524519500396 ◽

2019 ◽

Vol 12 (04) ◽

pp. 1950039 ◽

Cited By ~ 2

Author(s):

Sarah M. Ayyad ◽

Ahmed I. Saleh ◽

Labib M. Labib

Keyword(s):

Gene Expression ◽

Feature Selection ◽

Gene Expression Data ◽

Fuzzy Inference ◽

Research Area ◽

Selection Method ◽

Expression Data ◽

Feature Selection Technique ◽

Inference System ◽

Selection Technique

Classification of gene expression data is a pivotal research area that plays a substantial role in the diagnosis and prediction of diseases. Feature selection is one of the most widely used techniques in data mining, especially in classification. Gene expression data usually consist of dozens of samples characterized by thousands of genes. This high dimensionality is coupled with the existence of irrelevant and redundant features. Accordingly, selecting informative genes (features) becomes difficult, which degrades gene classification accuracy. In this paper, we consider feature selection for classifying gene expression microarray datasets. The goal is to detect the genes most likely to be cancer-related in a distributed manner, which helps in classifying the samples effectively. Initially, the large set of candidate features is subdivided and distributed among several processors. Then, a new filter selection method based on a fuzzy inference system is applied to each subset of the dataset. Finally, all the resulting features are ranked, and a wrapper-based selection method is applied. Experimental results show that the proposed feature selection technique performs better than other techniques, since it produces lower latency and improves classification performance.
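The split, filter, and merge stages described in this abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' method: a simple signal-to-noise score replaces the fuzzy-inference filter, a thread pool stands in for separate processors, the wrapper stage is omitted, and the names `snr_score`, `score_chunk`, `distributed_filter`, `n_workers`, and `top_k` are hypothetical. The expression matrix is made up for the example.

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import mean, pstdev

def snr_score(values, labels):
    """Signal-to-noise filter score for one gene across two classes (0/1).
    Stand-in for the paper's fuzzy-inference filter."""
    a = [v for v, l in zip(values, labels) if l == 0]
    b = [v for v, l in zip(values, labels) if l == 1]
    spread = pstdev(a) + pstdev(b)
    return abs(mean(a) - mean(b)) / (spread + 1e-12)

def score_chunk(chunk, X, labels, top_k):
    """Filter one feature subset; return its top_k (score, gene_index) pairs."""
    scored = [(snr_score([row[j] for row in X], labels), j) for j in chunk]
    return sorted(scored, reverse=True)[:top_k]

def distributed_filter(X, labels, n_workers=2, top_k=2):
    n_features = len(X[0])
    # Round-robin split of gene indices across the workers.
    chunks = [list(range(w, n_features, n_workers)) for w in range(n_workers)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partial = pool.map(lambda c: score_chunk(c, X, labels, top_k), chunks)
        merged = [pair for part in partial for pair in part]
    # Global ranking of the genes that survived each worker's local filter.
    return [j for _, j in sorted(merged, reverse=True)]

# Toy matrix: 6 samples (rows) x 6 genes (columns); genes 4 and 1 separate the classes.
labels = [0, 0, 0, 1, 1, 1]
X = [
    [1.0, 2.0, 5.0, 0.5, 1.0, 3.0],
    [1.2, 2.2, 4.0, 0.7, 1.1, 3.5],
    [0.9, 1.8, 6.0, 0.6, 0.9, 2.5],
    [1.1, 3.0, 5.5, 0.6, 4.0, 3.2],
    [1.0, 3.2, 4.5, 0.5, 4.1, 3.6],
    [0.8, 2.8, 5.0, 0.7, 3.9, 2.9],
]

ranking = distributed_filter(X, labels)
print("merged ranking:", ranking)
```

One design point worth noting: because each worker keeps only its local top `top_k`, two strong genes that land in the same subset can crowd each other out if `top_k` is too small, so the per-worker quota and the way features are partitioned both affect the merged ranking.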

sigFeature: Novel Significant Feature Selection Method for Classification of Gene Expression Data Using Support Vector Machine and t Statistic

Frontiers in Genetics ◽

10.3389/fgene.2020.00247 ◽

2020 ◽

Vol 11 ◽

Cited By ~ 2

Author(s):

Pijush Das ◽

Anirban Roychowdhury ◽

Subhadeep Das ◽

Susanta Roychoudhury ◽

Sucheta Tripathy

Keyword(s):

Gene Expression ◽

Support Vector Machine ◽

Feature Selection ◽

Gene Expression Data ◽

Feature Selection Method ◽

Selection Method ◽

Significant Feature ◽

Support Vector ◽

Expression Data

Gene encoder: a feature selection technique through unsupervised deep learning-based clustering for large gene expression data

Neural Computing and Applications ◽

10.1007/s00521-020-05101-4 ◽

2020 ◽

Cited By ~ 2

Author(s):

Uzma ◽

Feras Al-Obeidat ◽

Abdallah Tubaishat ◽

Babar Shah ◽

Zahid Halim

Keyword(s):

Gene Expression ◽

Feature Selection ◽

Deep Learning ◽

Gene Expression Data ◽

Expression Data ◽

Feature Selection Technique ◽

Selection Technique ◽

Large Gene ◽

Unsupervised Deep Learning

Efficient Feature Selection Model for Gene Expression Data

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.110-116.1948 ◽

2011 ◽

Vol 110-116 ◽

pp. 1948-1952

Author(s):

Patharawut Saengsiri ◽

Sageemas Na Wichian ◽

Phayung Meesad

Keyword(s):

Gene Expression ◽

Feature Selection ◽

Gene Expression Data ◽

Selection Model ◽

Support Vector ◽

Expression Data ◽

Accuracy Rate ◽

Feature Selection Technique ◽

Wrapper Approach ◽

Filter Approach

Finding a subset of informative genes is crucial for biological analysis because the number of genes increases sharply and most of them are unrelated to the others. In general, a feature selection technique consists of two steps: 1) all genes are ranked by a filter approach, and 2) the ranked list is sent to a wrapper approach. Nevertheless, the resulting accuracy rate for gene recognition is not sufficient. Therefore, this paper proposes an efficient feature selection model for gene expression data. First, two filter approaches, Correlation-based Feature Selection (Cfs) and Gain Ratio (GR), are used to define many subsets of attributes. Second, a wrapper approach based on Support Vector Machine (SVM) and Random Forest (RF) is used to evaluate each subset length. The experimental results show that CfsSVM, CfsRF, GRSVM, and GRRF based on the proposed model produce higher accuracy rates of 87.10%, 90.32%, 87.10%, and 88.71%, respectively.
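The two-step pattern in this abstract (rank genes with a filter, then let a wrapper evaluate each list length) can be sketched as below. The components are deliberately swapped for simpler ones and are assumptions, not the paper's setup: absolute Pearson correlation with the class label replaces Cfs and Gain Ratio, and a leave-one-out nearest-centroid classifier replaces SVM and Random Forest. The function names and the toy data are hypothetical.

```python
from statistics import mean

def abs_correlation(values, labels):
    """Filter score: |Pearson correlation| between one gene and the 0/1 class label."""
    mv, ml = mean(values), mean(labels)
    cov = sum((v - mv) * (l - ml) for v, l in zip(values, labels))
    var_v = sum((v - mv) ** 2 for v in values)
    var_l = sum((l - ml) ** 2 for l in labels)
    return abs(cov) / ((var_v * var_l) ** 0.5 + 1e-12)

def loo_accuracy(X, labels, features):
    """Wrapper score: leave-one-out accuracy of a nearest-centroid classifier."""
    correct = 0
    for i in range(len(X)):
        centroids = {}
        for c in set(labels):
            rows = [X[r] for r in range(len(X)) if r != i and labels[r] == c]
            centroids[c] = [mean([row[j] for row in rows]) for j in features]
        point = [X[i][j] for j in features]
        # Predict the class whose centroid is closest in squared Euclidean distance.
        pred = min(centroids, key=lambda c: sum(
            (p - q) ** 2 for p, q in zip(point, centroids[c])))
        correct += pred == labels[i]
    return correct / len(X)

def filter_then_wrapper(X, labels):
    n_features = len(X[0])
    # Step 1 (filter): rank every gene by its score, best first.
    ranked = sorted(range(n_features), reverse=True,
                    key=lambda j: abs_correlation([row[j] for row in X], labels))
    # Step 2 (wrapper): evaluate each prefix length; keep the shortest best one.
    best_k, best_acc = 1, -1.0
    for k in range(1, n_features + 1):
        acc = loo_accuracy(X, labels, ranked[:k])
        if acc > best_acc:
            best_k, best_acc = k, acc
    return ranked[:best_k], best_acc

# Toy matrix: 6 samples (rows) x 6 genes (columns).
labels = [0, 0, 0, 1, 1, 1]
X = [
    [1.0, 2.0, 5.0, 0.5, 1.0, 3.0],
    [1.2, 2.2, 4.0, 0.7, 1.1, 3.5],
    [0.9, 1.8, 6.0, 0.6, 0.9, 2.5],
    [1.1, 3.0, 5.5, 0.6, 4.0, 3.2],
    [1.0, 3.2, 4.5, 0.5, 4.1, 3.6],
    [0.8, 2.8, 5.0, 0.7, 3.9, 2.9],
]

selected, accuracy = filter_then_wrapper(X, labels)
print("selected genes:", selected, "LOO accuracy:", accuracy)
```

Using a strict `>` when comparing accuracies means ties go to the shorter gene list, which favours compact subsets when several lengths perform equally well.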

An efficient feature selection technique for gene expression data

2018 IEEE Conference on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB) ◽

10.1109/cibcb.2018.8404977 ◽

2018 ◽

Cited By ~ 1

Author(s):

B Chandra

Keyword(s):

Gene Expression ◽

Feature Selection ◽

Gene Expression Data ◽

Expression Data ◽

Feature Selection Technique ◽

Selection Technique

Improving the Performance of Principal Components for Classification of Gene Expression Data Through Feature Selection

Studies in Classification, Data Analysis, and Knowledge Organization - Data Science and Classification ◽

10.1007/3-540-34416-0_35 ◽

2006 ◽

pp. 325-332

Author(s):

Edgar Acuña ◽

Jaime Porras

Keyword(s):

Gene Expression ◽

Feature Selection ◽

Gene Expression Data ◽

Principal Components ◽

Expression Data

Classification of multiple cancer types by multicategory support vector machines using gene expression data

Bioinformatics ◽

10.1093/bioinformatics/btg102 ◽

2003 ◽

Vol 19 (9) ◽

pp. 1132-1139 ◽

Cited By ~ 217

Author(s):

Y. Lee ◽

C.-K. Lee

Keyword(s):

Gene Expression ◽

Support Vector Machines ◽

Gene Expression Data ◽

Support Vector ◽

Expression Data ◽

Multiple Cancer ◽

Vector Machines ◽

Cancer Types

Hybrid feature selection methods for the Classification of Cancer in Micro-array Gene expression data: a Survey

International Journal of Advanced Trends in Computer Science and Engineering ◽

10.30534/ijatcse/2020/275952020 ◽

2020 ◽

Vol 9 (5) ◽

pp. 8819-8827

Keyword(s):

Gene Expression ◽

Feature Selection ◽

Gene Expression Data ◽

Expression Data ◽

Selection Methods ◽

Micro Array

A Comparative Performance Evaluation of Random Forest Feature Selection on Classification of Hepatocellular Carcinoma Gene Expression Data

2019 3rd International Conference on Informatics and Computational Sciences (ICICoS) ◽

10.1109/icicos48119.2019.8982435 ◽

2019 ◽

Cited By ~ 1

Author(s):

Moh Abdul Latief ◽

Titin Siswantining ◽

Alhadi Bustamam ◽

Devvi Sarwinda

Keyword(s):

Gene Expression ◽

Hepatocellular Carcinoma ◽

Feature Selection ◽

Performance Evaluation ◽

Random Forest ◽

Gene Expression Data ◽

Expression Data ◽

Comparative Performance

Classification of Dengue Fever Patients Based on Gene Expression Data Using Support Vector Machines

PLoS ONE ◽

10.1371/journal.pone.0011267 ◽

2010 ◽

Vol 5 (6) ◽

pp. e11267 ◽

Cited By ~ 25

Author(s):

Ana Lisa V. Gomes ◽

Lawrence J. K. Wee ◽

Asif M. Khan ◽

Laura H. V. G. Gil ◽

Ernesto T. A. Marques ◽

...

Keyword(s):

Gene Expression ◽

Support Vector Machines ◽

Dengue Fever ◽

Gene Expression Data ◽

Support Vector ◽

Expression Data ◽

Vector Machines