Gene Selection Using Parallel Lion Optimization Method in Microarray Data for Cancer Classification

2019 ◽  
Vol 9 (6) ◽  
pp. 1294-1300 ◽  
Author(s):  
A. Sampathkumar ◽  
P. Vivekanandan

Bioinformatics research has generated a large volume of genetic data, driven by the availability of higher-throughput devices at lower cost. Handling this volume of data makes selecting the relevant disease-causing genes extremely challenging. Microarray technology improves the prospects for cancer diagnosis by measuring the expression levels of many genes at the same time. Selecting relevant genes with classifiers when investigating gene expression data is a complicated process, and properly identifying genes from gene expression datasets plays a vital role in improving classification accuracy. This article discusses in detail the identification of highly relevant genes from gene expression data for cancer treatment. A modified meta-heuristic approach, known as 'parallel lion optimization' (PLOA), is used to select genes from microarray data that can classify various cancer subtypes more accurately. The experimental results show that PLOA outperforms LOA and other well-known approaches on five benchmark cancer gene expression datasets. It returns 99% classification accuracy for the Prostate, Lung, Leukemia and Central Nervous System (CNS) datasets with the top 200 genes. For the Prostate and Lymphoma datasets, the PLOA results are 99.19% and 99.93% respectively. Compared with other algorithms, the proposed algorithm achieves a higher level of accuracy in gene selection.

2018 ◽  
Vol 7 (2.15) ◽  
pp. 68
Author(s):  
Farzana Kabir Ahmad ◽  
Yuhanis Yusof ◽  
Nooraini Yusoff

DNA microarray technology is an innovative tool that offers a new perspective on cellular systems by measuring the expression of a large number of genes at once. Despite the novelty of DNA microarrays, most of their results rely on computational intelligence to interpret the large amount of data. At present, interpreting large-scale gene expression data remains a challenging issue because of its inherent “high dimensional, low sample size” nature. Microarray data typically involve thousands of genes, n, in a very small number of samples, p. In addition, analysis of these data is often overwhelming, prone to overfitting, and confounded by the complexity of the data. Because of the nature of microarray data, it is also common for a large number of genes to be uninformative for classification purposes. For this reason, many studies have used feature selection methods to select significant genes that offer the maximum discriminative power between cancerous and normal tissues. In this study, we investigate and compare the effectiveness of four popular filter gene selection methods, namely Signal-to-Noise Ratio (SNR), Fisher Criterion (FC), Information Gain (IG) and t-Test, in selecting informative genes that can distinguish cancerous from normal tissues. Two common classifiers, Support Vector Machine (SVM) and Decision Tree (C4.5), are trained on the selected genes. The gene selection methods are tested on three large-scale gene expression datasets, namely a breast cancer dataset, a colon dataset, and a lung dataset. The study found that IG and SNR are more suitable for use with SVM, while IG fits C4.5. On the colon dataset, SVM achieved a specificity of 86% with SNR and 80% with IG. In contrast, C4.5 obtained a specificity of 78% with IG on the same dataset. These results indicate that SVM performed slightly better than C4.5 on IG-preprocessed data from the same dataset.
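As a rough illustration of how filter methods such as SNR and FC rank genes, the sketch below scores each gene independently from its two class distributions and keeps the top-ranked ones. The formulas are the commonly used definitions, SNR = (mean_a - mean_b) / (sd_a + sd_b) and FC = (mean_a - mean_b)^2 / (sd_a^2 + sd_b^2); the abstract does not state which variants the authors used, so these are assumptions. The gene names, the toy expression values, and the helper `rank_genes` are hypothetical, and IG and the t-test are left out for brevity.

```python
from statistics import mean, stdev

def snr_score(class_a, class_b):
    """Signal-to-noise ratio (assumed definition): difference of class means
    divided by the sum of class standard deviations."""
    return (mean(class_a) - mean(class_b)) / (stdev(class_a) + stdev(class_b))

def fisher_score(class_a, class_b):
    """Fisher criterion (assumed definition): squared difference of class
    means divided by the sum of class variances."""
    return (mean(class_a) - mean(class_b)) ** 2 / (
        stdev(class_a) ** 2 + stdev(class_b) ** 2
    )

def rank_genes(expression, labels, score_fn, top_k):
    """Hypothetical helper: score every gene on its own and return the
    names of the top_k genes by absolute score.

    expression maps gene name -> list of expression values, one per sample.
    labels holds 1 (e.g. tumour) or 0 (e.g. normal) for each sample.
    Genes with zero spread in both classes would divide by zero and are
    not handled here.
    """
    scores = {}
    for gene, values in expression.items():
        class_a = [v for v, y in zip(values, labels) if y == 1]
        class_b = [v for v, y in zip(values, labels) if y == 0]
        scores[gene] = abs(score_fn(class_a, class_b))
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

# Toy data (made up): three genes measured on six samples.
labels = [1, 1, 1, 0, 0, 0]
expression = {
    "gene_A": [5.0, 5.2, 4.8, 1.0, 1.2, 0.8],  # classes well separated
    "gene_B": [2.0, 3.0, 1.0, 2.1, 2.9, 1.1],  # classes overlap almost fully
    "gene_C": [3.0, 3.5, 2.5, 2.0, 2.4, 1.6],  # modest separation
}

top_snr = rank_genes(expression, labels, snr_score, top_k=2)
top_fc = rank_genes(expression, labels, fisher_score, top_k=2)
print(top_snr, top_fc)
```

In a full pipeline, the selected genes would then be passed to a classifier such as SVM or C4.5. Because each gene is scored in isolation, filter methods of this kind stay cheap even with thousands of genes.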


2019 ◽  
Vol 2019 ◽  
pp. 1-12 ◽  
Author(s):  
Suyan Tian ◽  
Chi Wang ◽  
Bing Wang

To analyze gene expression data with sophisticated grouping structures and to extract hidden patterns from such data, feature selection is of critical importance. It is well known that genes do not function in isolation but rather work together within various metabolic, regulatory, and signaling pathways. If the biological knowledge contained within these pathways is taken into account, the resulting method is a pathway-based algorithm. Studies have demonstrated that a pathway-based method usually outperforms its gene-based counterpart in which no biological knowledge is considered. In this article, pathway-based feature selection methods are first divided into three major categories, namely, pathway-level selection, bilevel selection, and pathway-guided gene selection. Regarding bilevel selection methods as a special case of the pathway-guided gene selection process, we discuss pathway-guided gene selection methods in detail and the importance of penalization in such methods. Last, we point out the potential use of pathway-guided gene selection in one active research avenue, namely, the analysis of longitudinal gene expression data. We believe this article provides valuable insights for computational biologists and biostatisticians so that they can make biology more computable.


2005 ◽  
Vol 03 (02) ◽  
pp. 303-316 ◽  
Author(s):  
ZHENQIU LIU ◽  
DECHANG CHEN ◽  
HALIMA BENSMAIL ◽  
YING XU

Kernel principal component analysis (KPCA) has been applied to data clustering and graphic cut in the last couple of years. This paper discusses the application of KPCA to microarray data clustering. A new algorithm based on KPCA and fuzzy C-means is proposed. Experiments with microarray data show that the proposed algorithm is in general superior to traditional algorithms.
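The abstract does not give the details of the proposed algorithm, so the following is only a minimal sketch of the general idea of combining the two techniques: project samples onto the leading kernel principal component, then cluster the projections with fuzzy C-means. The RBF kernel, the value of `gamma`, the use of a single component found by power iteration, the toy points, and all function names are assumptions made for illustration, not the authors' method.

```python
import math

def rbf_kernel_matrix(points, gamma):
    """Pairwise RBF kernel values exp(-gamma * squared distance)."""
    n = len(points)
    return [[math.exp(-gamma * sum((a - b) ** 2
                                   for a, b in zip(points[i], points[j])))
             for j in range(n)] for i in range(n)]

def center_kernel(K):
    """Centre the (symmetric) kernel matrix in feature space."""
    n = len(K)
    row_mean = [sum(row) / n for row in K]
    total_mean = sum(row_mean) / n
    return [[K[i][j] - row_mean[i] - row_mean[j] + total_mean
             for j in range(n)] for i in range(n)]

def first_kpca_projection(Kc, iters=200):
    """Leading eigenvector of the centred kernel by power iteration.
    The projection of training sample i is sqrt(eigenvalue) * v[i]."""
    n = len(Kc)
    v = [1.0 / (i + 1) for i in range(n)]  # fixed, non-constant start vector
    for _ in range(iters):
        w = [sum(Kc[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigval = sum(v[i] * sum(Kc[i][j] * v[j] for j in range(n))
                 for i in range(n))
    return [math.sqrt(max(eigval, 0.0)) * x for x in v]

def fcm_memberships(values, centers, m):
    """Fuzzy C-means membership update for 1-D data."""
    U = []
    for x in values:
        d = [abs(x - c) + 1e-12 for c in centers]  # avoid division by zero
        U.append([1.0 / sum((d[k] / d[j]) ** (2.0 / (m - 1.0))
                            for j in range(len(centers)))
                  for k in range(len(centers))])
    return U

def fuzzy_c_means(values, n_clusters=2, m=2.0, iters=100):
    """Plain fuzzy C-means on 1-D values; needs n_clusters >= 2.
    Centres start evenly spaced between the minimum and maximum value."""
    lo, hi = min(values), max(values)
    centers = [lo + (hi - lo) * k / (n_clusters - 1)
               for k in range(n_clusters)]
    for _ in range(iters):
        U = fcm_memberships(values, centers, m)
        centers = [sum(U[i][k] ** m * values[i] for i in range(len(values)))
                   / sum(U[i][k] ** m for i in range(len(values)))
                   for k in range(n_clusters)]
    return fcm_memberships(values, centers, m), centers

# Toy "samples" (made up): two tight groups in a 2-D expression space.
points = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2),
          (3.0, 3.1), (3.2, 2.9), (2.9, 3.0)]
Kc = center_kernel(rbf_kernel_matrix(points, gamma=0.5))
projection = first_kpca_projection(Kc)
U, centers = fuzzy_c_means(projection, n_clusters=2)
labels = [row.index(max(row)) for row in U]  # hard label = largest membership
print(labels)
```

Each row of `U` holds one sample's degrees of membership in the clusters and sums to 1, which is what distinguishes fuzzy C-means from hard k-means assignment. A practical version would likely keep several components and rely on a numerical library for the eigendecomposition.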


2013 ◽  
Vol 11 (03) ◽  
pp. 1341006
Author(s):  
QIANG LOU ◽  
ZORAN OBRADOVIC

In order to more accurately predict an individual's health status, in clinical applications it is often important to perform analysis of high-dimensional gene expression data that varies with time. A major challenge in predicting from such temporal microarray data is that the number of biomarkers used as features is typically much larger than the number of labeled subjects. One way to address this challenge is to perform feature selection as a preprocessing step and then apply a classification method on selected features. However, traditional feature selection methods cannot handle multivariate temporal data without applying techniques that flatten temporal data into a single matrix in advance. In this study, a feature selection filter that can directly select informative features from temporal gene expression data is proposed. In our approach, we measure the distance between multivariate temporal data from two subjects. Based on this distance, we define the objective function of temporal margin based feature selection to maximize each subject's temporal margin in its own relevant subspace. The experimental results on synthetic and two real flu data sets provide evidence that our method outperforms the alternatives, which flatten the temporal data in advance.


2007 ◽  
Vol 11 (2) ◽  
pp. 219-222 ◽  
Author(s):  
Mohd Saberi Mohamad ◽  
Sigeru Omatu ◽  
Safaai Deris ◽  
Siti Zaiton Mohd Hashim
