DISCOVERY OF HIGHLY DIFFERENTIATIVE GENE GROUPS FROM MICROARRAY GENE EXPRESSION DATA USING THE GENE CLUB APPROACH

2005, Vol 03 (06), pp. 1263-1280
Author(s): SHIHONG MAO, GUOZHU DONG

Motivation: It is commonly believed that suitable analysis of microarray gene expression profile data can lead to a better understanding of diseases and to better ways of diagnosing and treating them. To achieve those goals, it is of interest to discover from such data the gene interaction networks, and perhaps even the pathways, underlying given diseases. In this paper, we consider methods for efficiently discovering highly differentiative gene groups (HDGGs), which may provide insights into gene interaction networks. HDGGs are groups of genes that completely or nearly completely characterize the diseased or the normal tissues. Discovering HDGGs is challenging because of the high dimensionality of the data.

Results: Our methods are based on the novel concept of gene clubs. A gene club consists of a set of genes that have high potential to interact with each other. The methods can (i) efficiently discover signature HDGGs, which completely characterize the diseased and the normal tissues respectively, (ii) find the strongest or near-strongest HDGGs containing any given gene, and (iii) find much stronger HDGGs than previous methods. As part of the experimental evaluation, the methods are applied to data on several cancers, including colon, prostate, ovarian, and breast cancer and leukemia. Some of the genes in the extracted signature HDGGs have known biological functions, and some have attracted little attention in biology and medicine. We hope that appropriate study of them can lead to medical breakthroughs. Some HDGGs for colon and prostate cancers are listed here. The website listed below contains HDGGs for the other cancers.

Availability: HDGG is implemented in C++ and runs on Unix and Windows platforms. The code is available at: .
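To make the notion of a gene group that "completely characterizes" one tissue class concrete, the following is a minimal sketch, not the gene club algorithm itself. It assumes expression values have already been discretized into levels such as "high" and "low" (the abstract does not say how), and the names `support`, `is_signature`, and the toy genes `g1` to `g3` are hypothetical. A group counts as a signature here if its pattern matches every sample of one class and no sample of the other.

```python
from typing import Dict, List

# Assumption: each sample maps a gene name to a discretized expression level.
Sample = Dict[str, str]
Pattern = Dict[str, str]


def support(pattern: Pattern, samples: List[Sample]) -> float:
    """Fraction of samples in which every gene in the pattern has the required level."""
    if not samples:
        return 0.0
    hits = sum(
        all(sample.get(gene) == level for gene, level in pattern.items())
        for sample in samples
    )
    return hits / len(samples)


def is_signature(pattern: Pattern, home: List[Sample], other: List[Sample]) -> bool:
    """True if the pattern matches all 'home' samples and none of the 'other' samples."""
    return support(pattern, home) == 1.0 and support(pattern, other) == 0.0


# Toy data (hypothetical genes and levels, for illustration only).
tumor = [
    {"g1": "high", "g2": "low", "g3": "high"},
    {"g1": "high", "g2": "low", "g3": "low"},
]
normal = [
    {"g1": "high", "g2": "high", "g3": "high"},
    {"g1": "low", "g2": "low", "g3": "low"},
]

group = {"g1": "high", "g2": "low"}
print(support(group, tumor), support(group, normal))
print(is_signature(group, tumor, normal))
```

In this toy data neither gene alone separates the classes (each single-gene pattern also matches half of the normal samples), while the two-gene group does, which is the sense in which a group can be more differentiative than its members.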

Author(s): Qiang Zhao, Jianguo Sun

Statistical analysis of microarray gene expression data has recently attracted a great deal of attention. One problem of interest is to relate genes to the survival outcomes of patients, with the purpose of building regression models that predict future patients' survival from their gene expression data. For this, several authors have discussed the use of the proportional hazards (Cox) model after reducing the dimension of the gene expression data. This paper presents a new approach to Cox survival analysis of microarray gene expression data, with a focus on the models' predictive ability. The method modifies correlation principal component regression (Sun, 1995) to handle the censoring problem of survival data. Results based on simulated data and on a publicly available data set on diffuse large B-cell lymphoma show that the proposed method performs well in terms of robustness and predictive ability in comparison with some existing partial least squares approaches. The new approach is also simpler and easier to implement.
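For reference, the standard Cox model that such approaches fit after dimension reduction can be written as below. This is the textbook form assuming no tied event times, not the authors' modified procedure. The notation is an assumption introduced here: $z_i$ denotes the reduced covariates (for example, component scores) of patient $i$, $t_i$ the observed time, and $\delta_i$ the event indicator (1 if the event was observed, 0 if censored).

```latex
% Proportional hazards model on the reduced covariates z
h(t \mid z) = h_0(t)\,\exp\!\left(\beta^{\top} z\right)

% Partial log-likelihood (no ties); censored patients (\delta_i = 0)
% contribute only through the risk sets \{ j : t_j \ge t_i \}
\ell(\beta) = \sum_{i=1}^{n} \delta_i
  \left[ \beta^{\top} z_i
    - \log \sum_{j \,:\, t_j \ge t_i} \exp\!\left(\beta^{\top} z_j\right) \right]
```

Censoring enters the fit only through $\delta_i$ and the risk sets, which is why a dimension-reduction step that uses the outcome needs to be adapted before it can be applied to survival data.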

