Joint reconstruction of multiple gene networks by simultaneously capturing inter-tumor and intra-tumor heterogeneity

2020 ◽  
Vol 36 (9) ◽  
pp. 2755-2762
Author(s):  
Jia-Juan Tu ◽  
Le Ou-Yang ◽  
Hong Yan ◽  
Xiao-Fei Zhang ◽  
Hong Qin

Abstract. Motivation: Reconstruction of cancer gene networks from gene expression data is important for understanding the mechanisms underlying human cancer. Due to heterogeneity, the tumor tissue samples for a single cancer type can be divided into multiple distinct subtypes (inter-tumor heterogeneity) and are composed of non-cancerous and cancerous cells (intra-tumor heterogeneity). If tumor heterogeneity is ignored when inferring gene networks, the edges specific to individual cancer subtypes and cell types cannot be characterized. However, most existing network reconstruction methods do not simultaneously take inter-tumor and intra-tumor heterogeneity into account. Results: In this article, we propose a new Gaussian graphical model-based method for jointly estimating multiple cancer gene networks by simultaneously capturing inter-tumor and intra-tumor heterogeneity. Given gene expression data of heterogeneous samples for different cancer subtypes, a non-cancerous network shared across different cancer subtypes and multiple subtype-specific cancerous networks are estimated jointly. Tumor heterogeneity can be revealed by the difference in the estimated networks. The performance of our method is first evaluated using simulated data, and the results indicate that our method outperforms other state-of-the-art methods. We also apply our method to The Cancer Genome Atlas breast cancer data to reconstruct non-cancerous and subtype-specific cancerous gene networks. Hub nodes in the networks estimated by our method perform important biological functions associated with breast cancer development and subtype classification. Availability and implementation: The source code is available at https://github.com/Zhangxf-ccnu/NETI2. Supplementary information: Supplementary data are available at Bioinformatics online.

2019 ◽  
Author(s):  
Thin Nguyen ◽  
Samuel C. Lee ◽  
Thomas P. Quinn ◽  
Buu Truong ◽  
Xiaomei Li ◽  
...  

Abstract: The classification of clinical samples based on gene expression data is an important part of precision medicine. However, it has proved difficult to accurately predict survival outcomes and treatment responses for cancer patients. In this manuscript, we show how transforming gene expression data into a set of personalized (sample-specific) networks can allow us to harness existing graph-based methods to improve classifier performance. Existing approaches to personalized gene networks all have the limitation that they depend on other samples in the data and must be re-computed whenever a new sample is introduced. Here, we propose a novel method, called Personalized Annotation-based Networks (PAN), that avoids this limitation by using curated annotation databases to transform gene expression data into a graph. These databases organize genes into overlapping gene sets, called annotations, that we use to build a network where nodes represent functional terms and edges represent the similarity between them. Unlike competing methods, PANs are calculated for each sample independently of the population, making them a more efficient way to obtain single-sample networks. Using three breast cancer datasets as a case study (METABRIC and a super-set of GEO studies), we show that PAN classifiers not only predict cancer relapse better than gene features alone, but also outperform PPI and population-level graph-based classifiers. This work demonstrates the practical advantages of graph-based classification for high-dimensional genomic data, while offering a new approach to making sample-specific networks. Supplementary information: The code and data are available at https://github.com/thinng/[email protected]
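The idea of a network built from one sample alone can be illustrated with a minimal sketch. The abstract does not say how node values or edge similarities are computed, so everything below is an assumption made for illustration: node values are taken as the mean expression of the genes in each annotation, edges use Jaccard overlap between gene sets, and the names `build_sample_network`, `jaccard`, and `min_similarity` are hypothetical rather than taken from the PAN code.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity between two gene sets (assumed edge measure)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_sample_network(expression, annotations, min_similarity=0.0):
    """Build a term-level network from a single sample.

    expression:  dict mapping gene name -> expression value for ONE sample
    annotations: dict mapping functional term -> set of gene names
    Only this sample and the annotation table are used; no other samples
    are needed, which is the property the abstract emphasizes.
    """
    # Node value: mean expression of the term's measured genes (assumption).
    nodes = {}
    for term, genes in annotations.items():
        measured = [expression[g] for g in genes if g in expression]
        if measured:
            nodes[term] = sum(measured) / len(measured)

    # Edge weight: overlap between the two terms' gene sets (assumption).
    edges = {}
    for t1, t2 in combinations(sorted(nodes), 2):
        sim = jaccard(annotations[t1], annotations[t2])
        if sim > min_similarity:
            edges[(t1, t2)] = sim
    return nodes, edges


# Toy data, invented for illustration only.
sample = {"A": 2.0, "B": 4.0, "C": 6.0}
terms = {"t1": {"A", "B"}, "t2": {"B", "C"}, "t3": {"D"}}
nodes, edges = build_sample_network(sample, terms)
```

In this toy case `t3` is dropped because none of its genes are measured, and `t1` and `t2` are connected because they share gene `B`. Adding a new sample only requires calling the function again on that sample.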


2003 ◽  
Vol 19 (9) ◽  
pp. 1079-1089 ◽  
Author(s):  
G. Getz ◽  
H. Gal ◽  
I. Kela ◽  
D. A. Notterman ◽  
E. Domany

2013 ◽  
Vol 31 (15_suppl) ◽  
pp. 1013-1013 ◽  
Author(s):  
Rene Natowicz ◽  
Tingting Jiang ◽  
Weiwei Shi ◽  
Yuan Qi ◽  
Yann Delpech ◽  
...  

1013 Background: The goal of this study was to develop a method to quantify intratumor heterogeneity of cancers using gene expression data. We compared gene expression heterogeneity between different molecular subtypes of breast cancer and between basal-like cancers with or without pathologic complete response (pCR) to neoadjuvant chemotherapy. Methods: Affymetrix U133A gene expression data of 335 stage I-III breast cancers were analyzed. Molecular class was assigned using the PAM50 predictor. All patients received neoadjuvant chemotherapy. We measured tumor heterogeneity by the Gini index (GI), calculated individually for each case over the expression of all probe sets and over random subsets. The GI was used as a metric of inequality of gene expression distributions between cases. The higher the GI, the greater the inequality of the expression distribution. Results: Basal-like cancers (n=138) had greater heterogeneity than luminal cancers (n=197) (mean GI values 24.51 vs 23.05, p<0.001), and luminal B cancers (n=71) had greater heterogeneity than luminal A cancers (n=126) (24.49 vs 22.25, p<0.001). Among the basal-like cancers, those with pCR (n=44) had significantly higher heterogeneity than cancers with residual disease (RD, n=94) (26.10 vs 23.77, p<0.001). Significant differences in GI between cancer subtypes remained for as few as 2500 randomly selected probe sets. Conclusions: Breast cancer subtypes differ in intratumor gene expression heterogeneity. A greater degree of heterogeneity correlates with greater chemotherapy sensitivity. Importantly, among basal-like cancers, only the heterogeneity metric differed significantly between cases with pCR and cases with RD; individual gene expression values and gene signatures did not.
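The per-case Gini index described in the Methods can be sketched as follows. The abstract does not give the exact formula, preprocessing, or scaling behind its reported values, so this uses the standard Gini index for non-negative values (0 for perfectly equal values, approaching 1 for maximal inequality). The function names and the `seed` parameter are hypothetical.

```python
import random


def gini_index(values):
    """Standard Gini index of non-negative values (0 = perfectly equal)."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        raise ValueError("need at least one value and a non-zero sum")
    if xs[0] < 0:
        raise ValueError("values must be non-negative")
    # Rank-weighted sum over the sorted values, with ranks starting at 1.
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1.0) / n


def gini_on_random_subset(values, k, seed=0):
    """Gini index over k randomly chosen probe sets from one case."""
    subset = random.Random(seed).sample(list(values), k)
    return gini_index(subset)


# Toy expression vectors, invented for illustration only.
equal_case = [1, 1, 1, 1]
skewed_case = [0, 0, 0, 10]
g_equal = gini_index(equal_case)    # all values equal
g_skewed = gini_index(skewed_case)  # one value carries everything
```

Each case (tumor sample) gets one index computed over its own expression vector, so cases can then be compared between subtypes or response groups with an ordinary two-group test.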


2014 ◽  
Vol 678 ◽  
pp. 12-18
Author(s):  
Duo Wang

Clustering analysis of cancer gene expression data can provide a basis for the early diagnosis of cancer and the accurate classification of cancer subtypes. Based on the characteristics of cancer gene expression data, an algorithm named FNMF-ITWC (Fast Nonnegative Matrix Factorization Interrelated Two_Way Clustering) is proposed. The FNMF-ITWC algorithm first selects genes from the original gene expression data, applies non-negative matrix factorization along the rows (gene dimension), and then performs clustering along the columns (sample dimension). Matlab experimental results show that the FNMF-ITWC algorithm improves computing speed and reduces data storage space. At the same time, it is able to reveal correlations among genes under certain experimental conditions and correlations among experiments for some genes.


2013 ◽  
pp. 1626-1641
Author(s):  
Anasua Sarkar ◽  
Ujjwal Maulik

Identification of cancer subtypes is the central goal in cancer gene expression data analysis. Modified symmetry-based clustering is an unsupervised learning technique for detecting symmetrical convex or non-convex shaped clusters. To enable fast automatic clustering of cancer tissues (samples), in this chapter, the authors propose a rough-set-based hybrid approach for the modified symmetry-based clustering algorithm. A natural basis for analyzing gene expression data using the symmetry-based algorithm is to group together genes with similar symmetrical patterns of microarray expression. Rough-set theory helps in faster convergence and initial automatic optimal classification, thereby solving the problem of the unknown number of clusters in gene expression measurement data. For rough-set-theoretic decision rule generation, each cluster is classified using heuristically searched optimal reducts to overcome the overlapping cluster problem. The rough modified symmetry-based clustering algorithm is compared with another newly implemented rough-improved symmetry-based clustering algorithm and the existing K-Means algorithm over five benchmark cancer gene expression data sets to demonstrate its superiority in terms of validity. Statistical analyses are also performed to establish the significance of this rough modified symmetry-based clustering approach.



