Gene Selection Using Parallel Lion Optimization Method in Microarray Data for Cancer Classification

Bioinformatics research has generated a large volume of genetic data, driven in part by the availability of higher-throughput devices at lower cost. Handling this volume of data makes selecting the relevant disease-causing genes extremely challenging. Microarray technology improves the prospects for cancer diagnosis by allowing the expression levels of many genes to be measured at the same time. Selecting relevant genes with classifiers when investigating gene expression data is a complicated process, and correctly identifying genes from gene expression datasets plays a vital role in improving classification accuracy. This article discusses in detail the identification of highly relevant genes from gene expression data for cancer treatment. A modified meta-heuristic approach, known as 'parallel lion optimization' (PLOA), is used to select genes from microarray data that can classify various cancer sub-types more accurately. The experimental results show that PLOA outperforms LOA and other well-known approaches on five benchmark cancer gene expression datasets. It returns 99% classification accuracy on the Prostate, Lung, Leukemia and Central Nervous System (CNS) datasets for the top 200 genes. For the Prostate and Lymphoma datasets, the reported PLOA values are 99.19% and 99.93%, respectively. Compared with the other algorithms evaluated, the proposed algorithm achieves higher accuracy in gene selection.


Relevant Gene Selection and Classification of Leukemia Gene Expression Data

Emerging Research in Computing, Information, Communication and Applications ◽

10.1007/978-981-10-0287-8_47 ◽

2016 ◽

pp. 503-510

Author(s):

S. Jacophine Susmi ◽

H. Khanna Nehemiah ◽

A. Kannan ◽

Jabez Christopher

Keyword(s):

Gene Expression ◽

Gene Expression Data ◽

Gene Selection ◽

Expression Data ◽

Relevant Gene


Filter-Based Gene Selection Method for Tissues Classification on Large Scale Gene Expression Data

International Journal of Engineering & Technology ◽

10.14419/ijet.v7i2.15.11216 ◽

2018 ◽

Vol 7 (2.15) ◽

pp. 68

Author(s):

Farzana Kabir Ahmad ◽

Yuhanis Yusof ◽

Nooraini Yusoff

Keyword(s):

Gene Expression ◽

DNA Microarray ◽

Gene Expression Data ◽

Microarray Data ◽

Large Scale ◽

Gene Selection ◽

Support Vector ◽

Expression Data ◽

Selection Methods ◽

Normal Tissues

DNA microarray technology is an innovative tool that offers a new perspective on cellular systems by measuring the expression of a large number of genes at once. Despite the novelty of DNA microarrays, most of their results rely on computational intelligence to interpret the large amount of data. At present, interpreting large-scale gene expression data remains a challenging problem because of its inherent “high dimensional, low sample size” nature. Microarray data typically involve thousands of genes, n, in a very small sample, p. In addition, analysis of such data is prone to overfitting and is complicated by the complexity of the data. Because of the nature of microarray data, it is also common for a large number of genes to be uninformative for classification. For this reason, many studies have used feature selection methods to select significant genes that offer the maximum discriminative power between cancerous and normal tissues. In this study, we investigate and compare the effectiveness of four popular filter gene selection methods, namely Signal-to-Noise Ratio (SNR), Fisher Criterion (FC), Information Gain (IG) and t-Test, in selecting informative genes that can distinguish cancerous from normal tissues. Two common classifiers, Support Vector Machine (SVM) and Decision Tree (C4.5), are trained on the selected genes. The gene selection methods are tested on three large-scale gene expression datasets: a breast cancer dataset, a colon dataset, and a lung dataset. The study finds that IG and SNR are better suited to SVM, while IG is suited to C4.5. On the colon dataset, SVM achieved a specificity of 86% with SNR and 80% with IG. In contrast, C4.5 obtained a specificity of 78% with IG on the same dataset. These results indicate that SVM performed slightly better than C4.5 on IG pre-processed data from the same dataset.
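To make the idea of a filter method concrete, the following is a minimal sketch of two of the scores named in the abstract, using their commonly cited textbook forms: SNR as the difference of class means over the sum of class standard deviations, and the Fisher criterion as the squared mean difference over the sum of class variances. The function names, the toy expression values, and the choice to rank by absolute score are assumptions made for illustration; the abstract does not give the exact formulas or data used in the study.

```python
from statistics import mean, stdev

def snr_score(pos, neg):
    """Signal-to-noise ratio: (mean_pos - mean_neg) / (sd_pos + sd_neg)."""
    return (mean(pos) - mean(neg)) / (stdev(pos) + stdev(neg))

def fisher_score(pos, neg):
    """Fisher criterion: squared mean difference over the summed class variances."""
    return (mean(pos) - mean(neg)) ** 2 / (stdev(pos) ** 2 + stdev(neg) ** 2)

def rank_genes(expr, labels, score=snr_score, top_k=2):
    """Return the top_k gene names ranked by absolute score.

    expr maps a gene name to its expression values (one per sample);
    labels holds 1 for a cancerous sample and 0 for a normal one.
    """
    ranked = []
    for gene, values in expr.items():
        pos = [v for v, y in zip(values, labels) if y == 1]
        neg = [v for v, y in zip(values, labels) if y == 0]
        ranked.append((abs(score(pos, neg)), gene))
    ranked.sort(reverse=True)
    return [gene for _, gene in ranked[:top_k]]

# Hypothetical toy data: three genes measured on six samples.
labels = [1, 1, 1, 0, 0, 0]
expr = {
    "gene_a": [5.0, 5.2, 4.8, 1.0, 1.2, 0.8],  # classes well separated
    "gene_b": [2.0, 3.0, 1.0, 2.1, 2.9, 1.1],  # classes overlap almost fully
    "gene_c": [3.0, 3.5, 2.5, 2.0, 2.5, 1.5],  # classes moderately separated
}
print(rank_genes(expr, labels, snr_score, top_k=2))
```

Each gene is scored independently of the classifier, which is what distinguishes a filter method from wrapper approaches. The selected genes would then be passed to a classifier such as SVM or C4.5. Note that this sketch does not guard against a gene whose values are constant within both classes, which would make the denominator zero.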


The Analysis of Gene Expression Data, Statistical Analysis of Gene Expression Microarray Data

Technometrics ◽

10.1198/tech.2003.s188 ◽

2003 ◽

Vol 45 (4) ◽

pp. 375-375

Keyword(s):

Gene Expression ◽

Statistical Analysis ◽

Gene Expression Data ◽

Microarray Data ◽

Expression Data ◽

Gene Expression Microarray ◽

Expression Microarray ◽

Gene Expression Microarray Data


Incorporating Pathway Information into Feature Selection towards Better Performed Gene Signatures

BioMed Research International ◽

10.1155/2019/2497509 ◽

2019 ◽

Vol 2019 ◽

pp. 1-12 ◽

Cited By ~ 1

Author(s):

Suyan Tian ◽

Chi Wang ◽

Bing Wang

Keyword(s):

Gene Expression ◽

Feature Selection ◽

Gene Expression Data ◽

Gene Selection ◽

Selection Process ◽

Biological Knowledge ◽

Expression Data ◽

Selection Methods ◽

Its Gene ◽

Active Research

To analyze gene expression data with sophisticated grouping structures and to extract hidden patterns from such data, feature selection is of critical importance. It is well known that genes do not function in isolation but rather work together within various metabolic, regulatory, and signaling pathways. If the biological knowledge contained within these pathways is taken into account, the resulting method is a pathway-based algorithm. Studies have demonstrated that a pathway-based method usually outperforms its gene-based counterpart, in which no biological knowledge is considered. In this article, pathway-based feature selection is first divided into three major categories, namely, pathway-level selection, bilevel selection, and pathway-guided gene selection. Treating bilevel selection methods as a special case of the pathway-guided gene selection process, we discuss pathway-guided gene selection methods in detail, along with the importance of penalization in such methods. Last, we point out the potential uses of pathway-guided gene selection in one active research avenue, namely, the analysis of longitudinal gene expression data. We believe this article provides valuable insights for computational biologists and biostatisticians so that they can make biology more computable.


CLUSTERING GENE EXPRESSION DATA WITH KERNEL PRINCIPAL COMPONENTS

Journal of Bioinformatics and Computational Biology ◽

10.1142/s0219720005001168 ◽

2005 ◽

Vol 03 (02) ◽

pp. 303-316 ◽

Cited By ~ 6

Author(s):

ZHENQIU LIU ◽

DECHANG CHEN ◽

HALIMA BENSMAIL ◽

YING XU

Keyword(s):

Gene Expression ◽

Principal Component Analysis ◽

Gene Expression Data ◽

Microarray Data ◽

Principal Components ◽

Data Clustering ◽

Principal Component ◽

Kernel Principal Component Analysis ◽

Expression Data ◽

Fuzzy C Means

Kernel principal component analysis (KPCA) has been applied to data clustering and graphic cut in the last couple of years. This paper discusses the application of KPCA to microarray data clustering. A new algorithm based on KPCA and fuzzy C-means is proposed. Experiments with microarray data show that the proposed algorithm is in general superior to traditional algorithms.
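The abstract does not describe how KPCA and fuzzy C-means are combined, so the sketch below covers only the clustering half: standard fuzzy C-means with the usual membership and center updates. It assumes the input points are coordinates already obtained from some projection (for example, the leading kernel principal components). The function names, the fuzzifier m = 2, the fixed iteration count, and the toy points are illustrative assumptions, not the paper's algorithm.

```python
import math

def memberships(points, centers, m):
    """Fuzzy memberships: row k holds the degrees to which point k belongs to each cluster."""
    exponent = 2.0 / (m - 1.0)
    rows = []
    for p in points:
        # Clamp distances away from zero so a point sitting on a center does not divide by zero.
        d = [max(math.dist(p, c), 1e-12) for c in centers]
        rows.append([
            1.0 / sum((d[i] / d[j]) ** exponent for j in range(len(centers)))
            for i in range(len(centers))
        ])
    return rows

def fuzzy_c_means(points, centers, m=2.0, iters=50):
    """Alternate membership and center updates for a fixed number of iterations."""
    dims = len(points[0])
    for _ in range(iters):
        u = memberships(points, centers, m)
        new_centers = []
        for i in range(len(centers)):
            # Each center is the mean of all points weighted by membership ** m.
            w = [row[i] ** m for row in u]
            total = sum(w)
            new_centers.append(tuple(
                sum(wk * p[dim] for wk, p in zip(w, points)) / total
                for dim in range(dims)
            ))
        centers = new_centers
    return centers, memberships(points, centers, m)

# Hypothetical 2-D coordinates, standing in for projected expression profiles.
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.0), (5.0, 5.0), (5.1, 4.8), (4.9, 5.2)]
centers, u = fuzzy_c_means(points, centers=[(0.0, 0.0), (5.0, 5.0)])
for p, row in zip(points, u):
    print(p, [round(x, 3) for x in row])
```

Unlike hard clustering, each point receives a degree of membership in every cluster, and the degrees for a point sum to one. This can be useful when a gene or sample plausibly belongs to more than one group.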


Bootstrapping Consistency Method for Optimal Gene Selection from Microarray Gene Expression Data for Classification Problems

Machine Learning in Bioinformatics ◽

10.1002/9780470397428.ch4 ◽

2009 ◽

pp. 89-110 ◽

Cited By ~ 1

Author(s):

Shaoning Pang ◽

Ilkka Havukkala ◽

Yingjie Hu ◽

Nikola Kasabov

Keyword(s):

Gene Expression ◽

Gene Expression Data ◽

Gene Selection ◽

Microarray Gene Expression Data ◽

Expression Data ◽

Classification Problems ◽

Microarray Gene Expression ◽

Microarray Gene ◽

Consistency Method


CLASSIFYING TEMPORAL MICROARRAY DATA BY SELECTING INFORMATIVE GENES

Journal of Bioinformatics and Computational Biology ◽

10.1142/s0219720013410060 ◽

2013 ◽

Vol 11 (03) ◽

pp. 1341006

Author(s):

QIANG LOU ◽

ZORAN OBRADOVIC

Keyword(s):

Gene Expression ◽

Feature Selection ◽

Gene Expression Data ◽

Microarray Data ◽

Data Sets ◽

Temporal Data ◽

Expression Data ◽

Selection Methods ◽

Temporal Gene Expression ◽

Single Matrix

In order to more accurately predict an individual's health status, in clinical applications it is often important to perform analysis of high-dimensional gene expression data that varies with time. A major challenge in predicting from such temporal microarray data is that the number of biomarkers used as features is typically much larger than the number of labeled subjects. One way to address this challenge is to perform feature selection as a preprocessing step and then apply a classification method on selected features. However, traditional feature selection methods cannot handle multivariate temporal data without applying techniques that flatten temporal data into a single matrix in advance. In this study, a feature selection filter that can directly select informative features from temporal gene expression data is proposed. In our approach, we measure the distance between multivariate temporal data from two subjects. Based on this distance, we define the objective function of temporal margin based feature selection to maximize each subject's temporal margin in its own relevant subspace. The experimental results on synthetic and two real flu data sets provide evidence that our method outperforms the alternatives, which flatten the temporal data in advance.
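As an illustration of the baseline the abstract argues against, the sketch below shows what flattening temporal data into a single matrix can look like. The nested-list layout (subject, then time point, then gene) and the function name are assumptions for illustration. The temporal distance and margin-based objective proposed in the paper are not specified in the abstract and are not reproduced here.

```python
def flatten_temporal(data):
    """Flatten data[s][t][g] into one row per subject.

    Each row concatenates the gene vectors of all time points,
    so a subject with T time points and G genes yields T * G columns.
    """
    return [[value for timepoint in subject for value in timepoint]
            for subject in data]

# Hypothetical toy data: 2 subjects, 3 time points, 2 genes.
data = [
    [[1.0, 2.0], [1.5, 2.5], [2.0, 3.0]],  # subject 0
    [[0.5, 1.0], [0.5, 1.1], [0.6, 1.2]],  # subject 1
]
flat = flatten_temporal(data)
print(flat[0])  # [1.0, 2.0, 1.5, 2.5, 2.0, 3.0]
```

After flattening, every (time point, gene) pair becomes a separate column, so a conventional filter treats measurements of the same gene at different times as unrelated features. A method that operates directly on the temporal structure avoids this loss of ordering.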
