Comparative Analysis of Discretization Methods for Gene Selection of Breast Cancer Gene Expression Data

Abstract

Motivation: Reconstruction of cancer gene networks from gene expression data is important for understanding the mechanisms underlying human cancer. Due to heterogeneity, the tumor tissue samples for a single cancer type can be divided into multiple distinct subtypes (inter-tumor heterogeneity) and are composed of non-cancerous and cancerous cells (intra-tumor heterogeneity). If tumor heterogeneity is ignored when inferring gene networks, the edges specific to individual cancer subtypes and cell types cannot be characterized. However, most existing network reconstruction methods do not simultaneously take inter-tumor and intra-tumor heterogeneity into account.

Results: In this article, we propose a new Gaussian graphical model-based method for jointly estimating multiple cancer gene networks by simultaneously capturing inter-tumor and intra-tumor heterogeneity. Given gene expression data of heterogeneous samples for different cancer subtypes, a non-cancerous network shared across different cancer subtypes and multiple subtype-specific cancerous networks are estimated jointly. Tumor heterogeneity can be revealed by the difference in the estimated networks. The performance of our method is first evaluated using simulated data, and the results indicate that our method outperforms other state-of-the-art methods. We also apply our method to The Cancer Genome Atlas breast cancer data to reconstruct non-cancerous and subtype-specific cancerous gene networks. Hub nodes in the networks estimated by our method perform important biological functions associated with breast cancer development and subtype classification.

Availability and implementation: The source code is available at https://github.com/Zhangxf-ccnu/NETI2.

Supplementary information: Supplementary data are available at Bioinformatics online.
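As a rough illustration of how networks are read off a Gaussian graphical model and then compared, the sketch below turns a precision (inverse covariance) matrix into partial-correlation edges and lists the edges present in only one of two networks. It is not the NETI2 estimator described above: the function names, the tolerance, and the toy matrices are assumptions made for illustration, and the precision matrices are taken as already estimated.

```python
import math

def edges_from_precision(theta, tol=1e-8):
    """Map each nonzero off-diagonal entry of a precision matrix to a
    partial correlation: rho_ij = -theta_ij / sqrt(theta_ii * theta_jj)."""
    p = len(theta)
    edges = {}
    for i in range(p):
        for j in range(i + 1, p):
            if abs(theta[i][j]) > tol:  # a zero entry means no edge
                edges[(i, j)] = -theta[i][j] / math.sqrt(theta[i][i] * theta[j][j])
    return edges

def differential_edges(theta_a, theta_b, tol=1e-8):
    """Edges present in exactly one of the two networks."""
    a = set(edges_from_precision(theta_a, tol))
    b = set(edges_from_precision(theta_b, tol))
    return a ^ b

# Hypothetical 3-gene precision matrices for two subtypes (toy values).
theta_subtype_a = [[2.0, -1.0, 0.0], [-1.0, 2.0, 0.0], [0.0, 0.0, 1.0]]
theta_subtype_b = [[2.0, 0.0, 0.0], [0.0, 2.0, -0.5], [0.0, -0.5, 1.0]]

print(edges_from_precision(theta_subtype_a))
print(sorted(differential_edges(theta_subtype_a, theta_subtype_b)))
```

In this toy case each subtype has one edge the other lacks, which is the kind of difference the abstract refers to when it says heterogeneity can be revealed by comparing the estimated networks.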


A multi-objective strategy in genetic algorithms for gene selection of gene expression data

Artificial Life and Robotics ◽ 10.1007/s10015-008-0533-5 ◽ 2009 ◽ Vol 13 (2) ◽ pp. 410-413 ◽ Cited By ~ 22

Author(s): Mohd Saberi Mohamad ◽ Sigeru Omatu ◽ Safaai Deris ◽ Muhammad Faiz Misman ◽ Michifumi Yoshioka

Keyword(s): Gene Expression ◽ Genetic Algorithms ◽ Gene Expression Data ◽ Gene Selection ◽ Expression Data ◽ Multi Objective ◽ Selection Of


A Hybrid Gene Selection Strategy Based on Fisher and Ant Colony Optimization Algorithm for Breast Cancer Classification

International Journal of Online and Biomedical Engineering (iJOE) ◽ 10.3991/ijoe.v17i02.19889 ◽ 2021 ◽ Vol 17 (02) ◽ pp. 148

Author(s): Mohammed Hamim ◽ Ismail El Moudden ◽ Mohan D Pant ◽ Hicham Moutachaouik ◽ Mustapha Hain

Keyword(s): Breast Cancer ◽ Gene Expression ◽ Ant Colony Optimization ◽ Gene Expression Data ◽ Optimization Algorithm ◽ Gene Selection ◽ Prediction Performance ◽ Ant Colony ◽ Expression Data ◽ Ant Colony Optimization Algorithm

Breast cancer is a serious threat to human life, particularly for women. Despite recent progress in data mining technology, predicting and diagnosing such fatal diseases from gene expression data still yields limited prediction performance, which is not surprising given that most genes in expression data are believed to be irrelevant or redundant. Dimensionality reduction is therefore a crucial step in analyzing gene expression data: it reduces the high dimensionality of breast cancer datasets, which may lead to better prediction performance. The paper proposes a new hybrid gene selection approach that combines a filter method with the Ant Colony Optimization algorithm to find the smallest subset of informative genes (gene markers) among 24,481 genes. The proposed approach uses four machine learning algorithms (C5.0 Decision Tree, Support Vector Machines, K-Nearest Neighbors, and Random Forest) to classify each sample (patient) into one of two classes: cancer or no cancer. Compared with existing methods in the literature, the experimental results indicate that the proposed gene selection approach achieves higher overall classification accuracy with a relatively smaller number of genes.
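To make the filter stage concrete, the sketch below ranks genes with a common two-class form of the Fisher score, (mean_a - mean_b)^2 / (var_a + var_b), and keeps the top k. The abstract does not give the exact scoring formula, so this variant, the function names, and the toy data are assumptions; the Ant Colony Optimization search and the four classifiers are not shown.

```python
from statistics import mean, pvariance

def fisher_scores(X, y):
    """Return one Fisher score per gene (column of X) for two classes.

    X is a list of samples, each a list of expression values;
    y holds the class label of each sample.
    """
    classes = sorted(set(y))
    if len(classes) != 2:
        raise ValueError("this sketch handles exactly two classes")
    scores = []
    for j in range(len(X[0])):
        a = [row[j] for row, label in zip(X, y) if label == classes[0]]
        b = [row[j] for row, label in zip(X, y) if label == classes[1]]
        gap = (mean(a) - mean(b)) ** 2       # between-class separation
        spread = pvariance(a) + pvariance(b)  # within-class scatter
        if spread == 0:
            scores.append(float("inf") if gap > 0 else 0.0)
        else:
            scores.append(gap / spread)
    return scores

def top_k_genes(X, y, k):
    """Indices of the k highest-scoring genes, best first."""
    scores = fisher_scores(X, y)
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]

# Toy data (hypothetical): 4 samples, 2 genes; gene 0 separates the classes.
X = [[1.0, 5.0], [2.0, 5.5], [9.0, 5.2], [10.0, 5.1]]
y = [0, 0, 1, 1]
print(top_k_genes(X, y, 1))
```

In a hybrid pipeline of the kind the abstract describes, a ranking like this would typically shrink the 24,481 candidate genes to a shortlist before a wrapper search explores subsets of it.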
