Biased Data, Biased AI: Deep Networks Predict the Acquisition Site of TCGA Images

Abstract Deep learning models for healthcare applications, including digital pathology, have grown in scope and importance in recent years. Many of these models have been trained on The Cancer Genome Atlas (TCGA) collection of digital images, or use it as a validation source. This study shows that TCGA images carry tissue source site (TSS)-specific patterns that can be used to identify the contributing institution without any explicit training. Furthermore, a model trained for cancer subtype classification was observed to have discovered such TSS-specific patterns within digital slides and to use them when classifying cancer types. Digital scanner configuration and noise, tissue stain variation and artifacts, and source site patient demographics are among the factors that likely account for the observed bias. Researchers should therefore be cautious of such bias when using histopathology datasets to develop and train deep networks.
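
One common precaution against the site-specific bias described above is to hold out whole tissue source sites when splitting data, so that no institution contributes slides to both training and evaluation. The Python sketch below illustrates that idea only; it is not the study's method. It relies on the standard TCGA barcode layout, in which the second hyphen-separated field is the TSS code. The function names `tss_code` and `split_by_site` and the example barcodes are hypothetical, chosen for illustration.

```python
def tss_code(barcode: str) -> str:
    """Return the tissue source site (TSS) field of a TCGA barcode.

    TCGA barcodes look like TCGA-<TSS>-<participant>-..., so the TSS code
    is the second hyphen-separated field.
    """
    parts = barcode.split("-")
    if len(parts) < 3 or parts[0] != "TCGA":
        raise ValueError(f"not a TCGA barcode: {barcode!r}")
    return parts[1]


def split_by_site(barcodes, held_out_sites):
    """Split barcodes so that held-out sites never appear in the training set."""
    held_out = set(held_out_sites)
    train, test = [], []
    for barcode in barcodes:
        # Every slide from a held-out site goes to the test set.
        if tss_code(barcode) in held_out:
            test.append(barcode)
        else:
            train.append(barcode)
    return train, test


# Illustrative barcodes (hypothetical examples, not taken from the study).
example = ["TCGA-02-0001-01C", "TCGA-A8-A06N-01A", "TCGA-02-0003-01A"]
train_set, test_set = split_by_site(example, held_out_sites=["A8"])
```

A model that scores well on a random split but noticeably worse on a site-held-out split of this kind may be leaning on site-specific cues rather than on tumor morphology.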

Integrated Dissection of lncRNA-Perturbated Triplets Reveals Novel Prognostic Signatures Across Cancer Types

International Journal of Molecular Sciences, doi:10.3390/ijms21176087, 2020, Vol 21 (17), pp. 6087

Author(s): Yunzhen Wei, Limeng Zhou, Yingzhang Huang, Dianjing Guo

Keyword(s): Cancer Biology, Noncoding RNA, The Cancer Genome Atlas, Dynamic Changes, Computational Framework, Potential Biomarker, Cancer Genome Atlas, Cancer Types, TCGA Dataset, Pan Cancer

Long noncoding RNA (lncRNA)/microRNA (miRNA)/mRNA triplets contribute to cancer biology. However, identifying significant triplets remains a major challenge for cancer research, and the dynamic changes among the factors of the triplets are poorly understood. Here, by integrating target information and expression datasets, we proposed a novel computational framework to identify triplets termed “lncRNA-perturbated triplets”. We applied the framework to five cancer datasets in The Cancer Genome Atlas (TCGA) project and identified 109 triplets. We showed that the paired miRNAs and mRNAs were widely perturbated by lncRNAs in different cancer types. LncRNA perturbators and lncRNA-perturbated mRNAs showed significantly higher evolutionary conservation than other lncRNAs and mRNAs. Importantly, the lncRNA-perturbated triplets exhibited high cancer specificity. The pan-cancer perturbator OIP5-AS1 had a higher expression level than the cancer-specific perturbators. These lncRNA perturbators were significantly enriched in known cancer-related pathways. Furthermore, among the 25 lncRNAs in the 109 triplets, lncRNA SNHG7 was identified as a stable potential biomarker in lung adenocarcinoma (LUAD) by combining the TCGA dataset and two independent GEO datasets. Results from cell transfection also indicated that overexpression of lncRNA SNHG7 and TUG1 enhanced the expression of the corresponding mRNA PNMA2 and CDC7 in LUAD. Our study provides a systematic dissection of lncRNA-perturbated triplets and facilitates our understanding of the molecular roles of lncRNAs in cancers.

Epstein-Barr Virus-Positive Cancers Show Altered B-Cell Clonality

mSystems, doi:10.1128/msystems.00081-18, 2018, Vol 3 (5), Cited By ~ 11

Author(s): Sara R. Selitsky, David Marron, Lisle E. Mose, Joel S. Parker, Dirk P. Dittmer

Keyword(s): B Cells, B Cell, Epstein Barr Virus, The Cancer Genome Atlas, Barr Virus, Cancer Genome Atlas, Cancer Types, Epstein Barr, Tumor Types, Genome Atlas

ABSTRACT Epstein-Barr virus (EBV) is convincingly associated with gastric cancer, nasopharyngeal carcinoma, and certain lymphomas, but its role in other cancer types remains controversial. To test the hypothesis that there are additional cancer types with high prevalence of EBV, we determined EBV viral expression in all the Cancer Genome Atlas Project (TCGA) mRNA sequencing (mRNA-seq) samples (n = 10,396) from 32 different tumor types. We found that EBV was present in gastric adenocarcinoma and lymphoma, as expected, and was also present in >5% of samples in 10 additional tumor types. For most samples, EBV transcript levels were low, which suggests that EBV was likely present due to infected infiltrating B cells. In order to determine if there was a difference in the B-cell populations, we assembled B-cell receptors for each sample and found B-cell receptor abundance (P ≤ 1.4 × 10^-20) and diversity (P ≤ 8.3 × 10^-27) were significantly higher in EBV-positive samples. Moreover, diversity was independent of B-cell abundance, suggesting that the presence of EBV was associated with an increased and altered B-cell population.

IMPORTANCE Around 20% of human cancers are associated with viruses. Epstein-Barr virus (EBV) contributes to gastric cancer, nasopharyngeal carcinoma, and certain lymphomas, but its role in other cancer types remains controversial. We assessed the prevalence of EBV in RNA-seq from 32 tumor types in the Cancer Genome Atlas Project (TCGA) and found EBV to be present in >5% of samples in 12 tumor types. EBV infects epithelial cells and B cells and in B cells causes proliferation. We hypothesized that the low expression of EBV in most of the tumor types was due to infiltration of B cells into the tumor. The increase in B-cell abundance and diversity in subjects where EBV was detected in the tumors strengthens this hypothesis. Overall, we found that EBV was associated with an increased and altered immune response. This result is not evidence of causality, but a potential novel biomarker for tumor immune status.

Cancer Type-Dependent Correlations between TP53 Mutations and Antitumor Immunity

doi:10.1101/692715, 2019

Author(s): Lin Li, Mengyuan Li, Xiaosheng Wang

Keyword(s): Antitumor Immunity, Colon Adenocarcinoma, Joint Effect, The Cancer Genome Atlas, Cancer Type, Mutation Burden, Cancer Genome Atlas, Negative Role, Stomach Adenocarcinoma, Cancer Types

Abstract Many studies have shown that TP53 mutations play a negative role in antitumor immunity. However, a few studies reported that TP53 mutations could promote antitumor immunity. To explain these contradictory findings, we analyzed five cancer cohorts from The Cancer Genome Atlas (TCGA) project. We found that TP53-mutated cancers had significantly higher levels of antitumor immune signatures than TP53-wildtype cancers in breast invasive carcinoma (BRCA) and lung adenocarcinoma (LUAD). In contrast, TP53-mutated cancers had significantly lower antitumor immune signature levels than TP53-wildtype cancers in stomach adenocarcinoma (STAD), colon adenocarcinoma (COAD), and head and neck squamous cell carcinoma (HNSC). Moreover, TP53-mutated cancers likely had higher tumor mutation burden (TMB) and tumor aneuploidy level (TAL) than TP53-wildtype cancers. However, the TMB differences were more marked between TP53-mutated and TP53-wildtype cancers than the TAL differences in BRCA and LUAD, and the TAL differences were more significant in STAD and COAD. Furthermore, we showed that TMB and TAL had a positive and a negative correlation with antitumor immunity, respectively, and that TMB affected antitumor immunity more strongly than TAL did in BRCA and LUAD, while TAL affected antitumor immunity more strongly than TMB in STAD and HNSC. These findings indicate that the distinct correlations between TP53 mutations and antitumor immunity in different cancer types are a consequence of the joint effect of the altered TMB and TAL caused by TP53 mutations on tumor immunity. Our data suggest that the TP53 mutation status could be a useful biomarker for cancer immunotherapy response depending on cancer types.

Pan-cancer analysis reveals complex tumor-specific alternative polyadenylation

doi:10.1101/160960, 2017

Author(s): Zhuyi Xue, René L Warren, Ewan A Gibb, Daniel MacMillan, Johnathan Wong, ...

Keyword(s): Alternative Polyadenylation, Length Change, The Cancer Genome Atlas, Cancer Type, RNA Seq, Multiple Cancer, Tissue Samples, Cancer Genome Atlas, Specific Alternative, Cancer Types

Abstract Alternative polyadenylation (APA) of 3’ untranslated regions (3’ UTRs) has been implicated in cancer development. Earlier reports on APA in cancer primarily focused on 3’ UTR length modifications, and the conventional wisdom is that tumor cells preferentially express transcripts with shorter 3’ UTRs. Here, we analyzed the APA patterns of 114 genes, a select list of oncogenes and tumor suppressors, in 9,939 tumor and 729 normal tissue samples across 33 cancer types using RNA-Seq data from The Cancer Genome Atlas, and we found that the APA regulation machinery is much more complicated than what was previously thought. We report 77 cases (gene-cancer type pairs) of differential 3’ UTR cleavage patterns between normal and tumor tissues, involving 33 genes in 13 cancer types. For 15 genes, the tumor-specific cleavage patterns are recurrent across multiple cancer types. While the cleavage patterns in certain genes indicate apparent trends of 3’ UTR shortening in tumor samples, over half of the 77 cases imply 3’ UTR length change trends in cancer that are more complex than simple shortening or lengthening. This work extends the current understanding of APA regulation in cancer, and demonstrates how large volumes of RNA-seq data generated for characterizing cancer cohorts can be mined to investigate this process.

Novel Insights into Epigenetic Regulation of IL6 Pathway: In Silico Perspective on Inflammation and Cancer Relationship

International Journal of Molecular Sciences, doi:10.3390/ijms221810172, 2021, Vol 22 (18), pp. 10172

Author(s): Saverio Candido, Barbara Maria Rita Tomasello, Alessandro Lavoro, Luca Falzone, Giuseppe Gattuso, ...

Keyword(s): DNA Methylation, Epigenetic Regulation, Computational Analysis, The Cancer Genome Atlas, Post Translational Modifications, Pathway Gene, Cancer Genome Atlas, Cancer Types, Tumor Types

The IL-6 pathway is abnormally hyperactivated in several cancers, triggering tumor cell growth and immune system inhibition. Along with genomic mutation, the IL6 pathway gene expression can be affected by DNA methylation, microRNAs, and post-translational modifications. Computational analysis was performed on the Cancer Genome Atlas (TCGA) datasets to explore the role of IL6, IL6R, IL6ST, and IL6R transmembrane isoform expression and their epigenetic regulation in different cancer types. IL6 was significantly modulated in 70% of tumor types, revealing either up- or down-regulation in an approximately equal number of tumors. Furthermore, IL6R and IL6ST were downregulated in more than 10 tumors. Interestingly, the correlation analysis demonstrated that only the IL6R expression was negatively affected by the DNA methylation within the promoter region in most tumors. Meanwhile, only the IL6ST expression was extensively modulated by miRNAs including miR-182-5p, which also directly targeted all three genes. In addition, IL6 upregulated miR-181a-3p, miR-214-3p, miR-18a-5p, and miR-938, which in turn inhibited the expression of IL6 receptors. Finally, the patients’ survival rate was significantly affected by the analyzed targets in some tumors. Our results suggest the relevance of epigenetic regulation of IL6 signaling and pave the way for further studies to validate these findings and to assess the prognostic and therapeutic predictive value of these epigenetic markers on the clinical outcome and survival of cancer patients.

A pan-cancer analysis of prognostic genes

PeerJ, doi:10.7717/peerj.1499, 2016, Vol 3, pp. e1499, Cited By ~ 13

Author(s): Jordan Anaya, Brian Reon, Wei-Min Chen, Stefan Bekiranov, Anindya Dutta

Keyword(s): Microarray Data, The Cancer Genome Atlas, RNA Seq, Cancer Pathogenesis, Gene Sets, Protective Genes, Cancer Genome Atlas, Cancer Types, Measure Of Association, Pan Cancer

Numerous studies have identified prognostic genes in individual cancers, but a thorough pan-cancer analysis has not been performed. In addition, previous studies have mostly used microarray data instead of RNA-SEQ, and have not published comprehensive lists of associations with survival. Using recently available RNA-SEQ and clinical data from The Cancer Genome Atlas for 6,495 patients, we have investigated every annotated and expressed gene’s association with survival across 16 cancer types. The most statistically significant harmful and protective genes were not shared across cancers, but were enriched in distinct gene sets which were shared across certain groups of cancers. These groups of cancers were independently recapitulated by unsupervised clustering of Cox coefficients (a measure of association with survival), both for individual genes and for gene programs. This analysis has revealed unappreciated commonalities among cancers which may provide insights into cancer pathogenesis and rationales for co-opting treatments between cancers.

Pan-cancer driver copy number alterations identified by joint expression/CNA data analysis

Scientific Reports, doi:10.1038/s41598-020-74276-6, 2020, Vol 10 (1)

Author(s): Gaojianyong Wang, Dimitris Anastassiou

Keyword(s): Gene Expression, Copy Number, The Cancer Genome Atlas, Copy Number Alterations, Multiple Cancer, Cancer Driver, Large Gene, Cancer Genome Atlas, Cancer Types, Pan Cancer

Abstract Analysis of large gene expression datasets from biopsies of cancer patients can identify co-expression signatures representing particular biomolecular events in cancer. Some of these signatures involve genomically co-localized genes resulting from the presence of copy number alterations (CNAs), for which analysis of the expression of the underlying genes provides valuable information about their combined role as oncogenes or tumor suppressor genes. Here we focus on the discovery and interpretation of such signatures that are present in multiple cancer types due to driver amplifications and deletions in particular regions of the genome after doing a comprehensive analysis combining both gene expression and CNA data from The Cancer Genome Atlas.

Identifying Interaction Clusters for MiRNA and MRNA Pairs in TCGA Network

Genes, doi:10.3390/genes10090702, 2019, Vol 10 (9), pp. 702, Cited By ~ 6

Author(s): Dai, Ding, Liu, Xu, Jiang, ...

Keyword(s): Messenger RNA, The Cancer Genome Atlas, Biological Functions, Significant Cluster, Scoring Method, Nonalcoholic Fatty Liver, Cancer Genome Atlas, Cancer Types, Liver Hepatocellular Carcinoma, Tumor Types

Existing methods often fail to recognize changes in the biological roles of gene and microRNA (miRNA) pairs between tumor and normal samples. We have developed a novel cluster scoring method to identify messenger RNA (mRNA) and miRNA interaction pairs and clusters while considering tumor and normal samples jointly. Our method has identified 54 significant clusters for 15 cancer types selected from The Cancer Genome Atlas project. We also determined the shared clusters across tumor types and/or subtypes. In addition, we compared gene and miRNA overlap between lists identified in our liver hepatocellular carcinoma (LIHC) study and regulatory relationships reported from human and rat nonalcoholic fatty liver disease (NAFLD) studies. Finally, we analyzed biological functions for the single significant cluster in LIHC and uncovered a significantly enriched pathway (phospholipase D signaling pathway) with six genes represented in the cluster: DGKQ, LPAR2, PDGFRB, PIK3R3, PTGFR and RAPGEF3.

Integrative Analysis of Cancer Omics Data for Prognosis Modeling

Genes, doi:10.3390/genes10080604, 2019, Vol 10 (8), pp. 604, Cited By ~ 2

Author(s): Wang, Wu

Keyword(s): Integrative Analysis, The Cancer Genome Atlas, Cancer Prognosis, Sufficient Information, Cancer Type, Multiple Cancer, Cancer Genome Atlas, Cancer Types, Information Borrowing, Penalization Technique

Prognosis modeling plays an important role in cancer studies. With the development of omics profiling, extensive research has been conducted to search for prognostic markers for various cancer types. However, many of the existing studies share a common limitation: they focus on a single cancer type and therefore lack sufficient information. With potential molecular similarity across cancer types, one cancer type may contain information useful for the analysis of other types. The integration of multiple cancer types may facilitate information borrowing so as to describe prognosis more comprehensively and more accurately. In this study, we conduct marginal and joint integrative analysis of multiple cancer types, effectively introducing integration in the discovery process. To accommodate high dimensionality and identify relevant markers, we adopt an advanced penalization technique with solid statistical grounding. Gene expression data on nine cancer types from The Cancer Genome Atlas (TCGA) are analyzed, leading to biologically sensible findings that are different from the alternatives. Overall, this study provides a novel avenue for cancer prognosis modeling by integrating multiple cancer types.

Clinical and Prognostic Significance of TZAP Expression in Cervical Cancer

Medicina, doi:10.3390/medicina56050207, 2020, Vol 56 (5), pp. 207

Author(s): Won-Jin Park, Jae-Hee Park, Ho-Yong Shin, Jae-Ho Lee

Keyword(s): Cervical Cancer, Survival Analysis, Prognostic Value, Prognostic Significance, Telomeric Repeat, The Cancer Genome Atlas, Genetic Changes, Cancer Pathogenesis, Cancer Genome Atlas, Cancer Types

Background and Objectives: Telomeric zinc finger-associated protein (TZAP) is a telomere-associated factor that was previously called ZBTB48. This protein binds preferentially to long telomeres, competing with telomeric repeat factors 1 and 2. Genetic changes in TZAP may be associated with cancer pathogenesis; however, this relationship has not yet been elucidated for any type of cancer. In this study, we aimed to examine the clinicopathologic and prognostic value of TZAP expression in cervical cancer (CC). Materials and Methods: The data were extracted from The Cancer Genome Atlas cohorts by OncoLnc (21 cancer types, 7700 cancers). The prognostic value of TZAP for different stages of 264 CCs was examined using survival analysis. Results: The TZAP expression did not differ significantly between CC and normal matched tissues. Age, cancer stage, and viral infection were not associated with TZAP expression. Survival analysis revealed a shorter overall survival in CC patients with a lower TZAP expression (χ² = 3.62, p = 0.057). The prognostic value of TZAP expression was greater in patients with N1 stage CC (χ² = 5.64, p = 0.018). Conclusion: TZAP expression is a possible prognostic marker for CC, especially stage N1 CC.
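
The chi-square statistics and p-values quoted in this abstract can be related with a short calculation. The Python sketch below assumes each statistic comes from a two-group comparison with one degree of freedom; the abstract does not state the degrees of freedom, so that is an assumption. The function name `chi2_sf_1df` is hypothetical.

```python
import math


def chi2_sf_1df(statistic: float) -> float:
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    if statistic < 0:
        raise ValueError("chi-square statistic must be non-negative")
    # With 1 df, X = Z**2 for a standard normal Z, so
    # P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(statistic / 2.0))


# Statistics as quoted in the abstract (1 df is assumed, not stated).
for label, stat in [("all CC", 3.62), ("N1 stage CC", 5.64)]:
    print(f"{label}: chi2 = {stat:.2f}, p = {chi2_sf_1df(stat):.3f}")
# prints:
# all CC: chi2 = 3.62, p = 0.057
# N1 stage CC: chi2 = 5.64, p = 0.018
```

Under that assumption both values agree with the reported p-values to three decimals. Only the N1 stage result falls below the conventional 0.05 threshold.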
