CancerMine: A literature-mined resource for drivers, oncogenes and tumor suppressors in cancer

2018 ◽  
Author(s):  
Jake Lever ◽  
Eric Y. Zhao ◽  
Jasleen Grewal ◽  
Martin R. Jones ◽  
Steven J. M. Jones

Abstract: Understanding a mutation in cancer requires knowledge of the different roles that genes play in cancer as drivers, oncogenes and tumor suppressors. We present CancerMine, a high-quality text-mined knowledgebase that catalogues over 856 genes as drivers, 2,421 as oncogenes and 2,037 as tumor suppressors in 426 cancer types. We compile 3,485 genes that are not in the IntOGen resource of drivers and complement the Cancer Gene Census with 3,136 new genes identified as oncogenes and tumor suppressors. CancerMine provides a method for gene-centric clustering of cancer types that illustrates genetic similarities between cancer types of different organs, and it was validated against data from The Cancer Genome Atlas (TCGA) project. Finally, with 178 novel cancer gene mentions appearing in publications each month, the resource will be updated monthly, pre-empting the need to manually curate the ever-increasing number of novel cancer-associated genes. CancerMine is viewable through a web portal (http://bionlp.bcgsc.ca/cancermine/) and available for download (https://github.com/jakelever/cancermine).
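The abstract does not describe how CancerMine clusters cancer types. Below is a minimal sketch of gene-centric clustering under stated assumptions; it is not CancerMine's documented method. Each cancer type is represented by a set of hypothetical `GENE:role` labels, and the types are merged bottom-up by single-linkage Jaccard similarity. The encoding, the toy profiles and the linkage rule are all assumptions.

```python
from itertools import combinations

def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity of two gene sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def cluster_cancer_types(profiles: dict[str, set[str]], k: int) -> list[set[str]]:
    """Merge cancer types bottom-up (single linkage on Jaccard similarity)
    until only k clusters remain."""
    clusters = [{name} for name in profiles]

    def linkage(c1: set[str], c2: set[str]) -> float:
        # Single linkage: similarity of the most similar pair across the two clusters.
        return max(jaccard(profiles[x], profiles[y]) for x in c1 for y in c2)

    while len(clusters) > k:
        # combinations() yields i < j, so deleting index j after the merge is safe.
        i, j = max(combinations(range(len(clusters)), 2),
                   key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] |= clusters[j]
        del clusters[j]
    return clusters

# Hypothetical toy profiles: "GENE:role" labels mined for each cancer type.
profiles = {
    "breast": {"ERBB2:oncogene", "TP53:tumor_suppressor", "PIK3CA:oncogene"},
    "stomach": {"ERBB2:oncogene", "TP53:tumor_suppressor", "CDH1:tumor_suppressor"},
    "glioma": {"IDH1:oncogene", "EGFR:oncogene", "PTEN:tumor_suppressor"},
    "lung": {"EGFR:oncogene", "KRAS:oncogene", "PTEN:tumor_suppressor"},
}
clusters = cluster_cancer_types(profiles, k=2)
print(clusters)
```

Single linkage is used here only for brevity. Average linkage, or weighting genes by how often they are cited, would be equally plausible choices.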

mSystems ◽  
2018 ◽  
Vol 3 (5) ◽  
Author(s):  
Sara R. Selitsky ◽  
David Marron ◽  
Lisle E. Mose ◽  
Joel S. Parker ◽  
Dirk P. Dittmer

Abstract: Epstein-Barr virus (EBV) is convincingly associated with gastric cancer, nasopharyngeal carcinoma, and certain lymphomas, but its role in other cancer types remains controversial. To test the hypothesis that additional cancer types have a high prevalence of EBV, we determined EBV viral expression in all mRNA sequencing (mRNA-seq) samples (n = 10,396) from 32 tumor types in The Cancer Genome Atlas Project (TCGA). As expected, EBV was present in gastric adenocarcinoma and lymphoma; it was also present in >5% of samples in 10 additional tumor types. In most samples EBV transcript levels were low, which suggests that EBV was likely present because of infected infiltrating B cells. To determine whether the B-cell populations differed, we assembled B-cell receptors for each sample and found that B-cell receptor abundance (P ≤ 1.4 × 10⁻²⁰) and diversity (P ≤ 8.3 × 10⁻²⁷) were significantly higher in EBV-positive samples. Moreover, diversity was independent of B-cell abundance, suggesting that the presence of EBV was associated with an increased and altered B-cell population.

Importance: Around 20% of human cancers are associated with viruses. Epstein-Barr virus (EBV) contributes to gastric cancer, nasopharyngeal carcinoma, and certain lymphomas, but its role in other cancer types remains controversial. We assessed the prevalence of EBV in RNA-seq data from 32 tumor types in The Cancer Genome Atlas Project (TCGA) and found EBV to be present in >5% of samples in 12 tumor types. EBV infects epithelial cells and B cells, and in B cells it causes proliferation. We hypothesized that the low expression of EBV in most tumor types was due to infiltration of B cells into the tumor. The increase in B-cell abundance and diversity in subjects whose tumors had detectable EBV strengthens this hypothesis. Overall, EBV was associated with an increased and altered immune response. This result is not evidence of causality, but it points to a potential novel biomarker of tumor immune status.
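The abstract reports B-cell receptor (BCR) "abundance" and "diversity" per sample but does not name the diversity metric. Below is a minimal sketch that assumes abundance is the number of assembled BCR reads or contigs and diversity is the Shannon index over clonotype frequencies. Both definitions and the clonotype strings are hypothetical.

```python
import math
from collections import Counter

def bcr_abundance_and_diversity(clonotypes: list[str]) -> tuple[int, float]:
    """Return (abundance, Shannon diversity) for one sample.

    `clonotypes` holds one entry per assembled BCR read or contig, labeled by
    its clonotype (for example a CDR3 sequence)."""
    counts = Counter(clonotypes)
    total = sum(counts.values())
    if total == 0:
        return 0, 0.0
    # Shannon index H = sum(p_i * ln(1 / p_i)) over clonotype frequencies p_i.
    entropy = sum((n / total) * math.log(total / n) for n in counts.values())
    return total, entropy

# Hypothetical samples: a monoclonal repertoire vs. a two-clonotype one.
print(bcr_abundance_and_diversity(["CARDY", "CARDY", "CARDY", "CARDY"]))
print(bcr_abundance_and_diversity(["CARDY", "CARDY", "CASSL", "CASSL"]))
```

The Shannon index depends only on relative frequencies, so doubling every count leaves it unchanged. That property keeps the metric from trivially encoding abundance. Whether diversity is independent of abundance in real data remains the empirical question the study addressed.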


F1000Research ◽  
2018 ◽  
Vol 7 ◽  
pp. 1235 ◽  
Author(s):  
Nicholas Borcherding ◽  
Nicholas L. Bormann ◽  
Andrew P. Voigt ◽  
Weizhou Zhang

Reverse-phase protein arrays (RPPAs) are a high-throughput approach to protein quantification that uses an antibody-based, micro- to nano-scale dot blot. Within The Cancer Genome Atlas (TCGA), RPPAs were used to quantify over 200 proteins in 8,167 tumor and metastatic samples. Protein-level data have particular advantages for assessing putative prognostic or therapeutic targets in tumors. However, many available pipelines do not allow clinical and RPPA information to be partitioned in ways that support meaningful conclusions. We developed a cloud-based application, TRGAted, to enable researchers to better examine patient survival based on single or multiple proteins across 31 cancer types in the TCGA. TRGAted contains up-to-date overall survival, disease-specific survival, disease-free interval and progression-free interval information. Furthermore, survival information for primary tumor samples can be stratified by gender, age, tumor stage, histological type, and subtype, allowing a highly adaptive and intuitive user experience. The code and processed data are open source and available on GitHub, and a tutorial is built into the application to assist users.
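The abstract does not describe TRGAted's internals. As a rough illustration of examining survival "based on single proteins", the sketch below splits samples at the median RPPA level of one protein and computes a Kaplan-Meier curve for each group. The median split, the record layout and the toy values are assumptions.

```python
from statistics import median

def kaplan_meier(times: list[float], events: list[int]) -> list[tuple[float, float]]:
    """Kaplan-Meier estimate as (time, survival) pairs at each distinct event time.

    events[i] is 1 if the event (e.g. death) was observed at times[i], 0 if censored."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Collect every subject whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Both events and censorings leave the risk set after time t.
        at_risk -= removed
    return curve

def split_at_median(records: list[tuple[float, float, int]]):
    """Split (protein_level, time, event) records into high/low groups at the median level."""
    cut = median(r[0] for r in records)
    high = [r for r in records if r[0] > cut]
    low = [r for r in records if r[0] <= cut]
    return high, low

# Hypothetical records: (RPPA protein level, follow-up in months, event observed).
records = [(0.2, 30, 0), (0.5, 12, 1), (1.1, 8, 1), (1.4, 5, 1), (0.1, 40, 1), (1.9, 3, 1)]
high, low = split_at_median(records)
for label, group in (("high", high), ("low", low)):
    print(label, kaplan_meier([r[1] for r in group], [r[2] for r in group]))
```

A real analysis would also compare the two curves, for example with a log-rank test. That step is omitted here to keep the sketch short.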


2014 ◽  
Vol 13s2 ◽  
pp. CIN.S13776
Author(s):  
Yanxun Xu ◽  
Yitan Zhu ◽  
Peter Müller ◽  
Riten Mitra ◽  
Yuan Ji

The Cancer Genome Atlas (TCGA) generates comprehensive genomic data for thousands of patients across more than 20 cancer types. TCGA data are typically whole-genome measurements of multiple genomic features, such as DNA copy number, DNA methylation, and gene expression, providing unique opportunities to investigate cancer mechanisms across multiple molecular and regulatory layers. We propose a Bayesian graphical model to systematically integrate multi-platform TCGA data and infer interactions between genomic features, either within a gene or between multiple genes. The presence or absence of an edge in the graph indicates the presence or absence of conditional dependence between genomic features. The inference is restricted to genes within a known biological network but can be extended to any set of genes. Applying the model to the same genes using patient samples from two different cancer types, we identify network components that are common to both cancer types as well as components that differ between them. The examples and code are available at https://www.ma.utexas.edu/users/yxu/software.html.
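The link between edges and conditional dependence can be illustrated with the Gaussian special case. There, two features are conditionally independent given all others exactly when the corresponding entry of the precision matrix Ω = Σ⁻¹ is zero. Their partial correlation is ρᵢⱼ = −Ωᵢⱼ / √(Ωᵢᵢ Ωⱼⱼ).

The sketch below is a plain point estimate from a known covariance matrix. It is not the authors' Bayesian model, which places priors on the graph and restricts it to a known network. The toy covariance and the feature labels are hypothetical.

```python
import math

def invert(matrix: list[list[float]]) -> list[list[float]]:
    """Invert a square matrix with Gauss-Jordan elimination (partial pivoting)."""
    n = len(matrix)
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def partial_correlations(cov: list[list[float]]) -> list[list[float]]:
    """rho_ij = -Omega_ij / sqrt(Omega_ii * Omega_jj), with Omega = cov^-1."""
    omega = invert(cov)
    n = len(omega)
    return [[1.0 if i == j else -omega[i][j] / math.sqrt(omega[i][i] * omega[j][j])
             for j in range(n)] for i in range(n)]

def edges(cov: list[list[float]], tol: float = 1e-8) -> list[tuple[int, int]]:
    """Pairs (i, j) whose partial correlation is non-negligible, i.e. graph edges."""
    pc = partial_correlations(cov)
    n = len(pc)
    return [(i, j) for i in range(n) for j in range(i + 1, n) if abs(pc[i][j]) > tol]

# Hypothetical covariance of three features (labels such as copy number,
# methylation and expression are illustrative only). Its precision matrix is
# [[2, -1, 0], [-1, 2, -1], [0, -1, 2]], so features 0 and 2 are
# conditionally independent given feature 1.
cov = [[0.75, 0.5, 0.25],
       [0.5, 1.0, 0.5],
       [0.25, 0.5, 0.75]]
print(edges(cov))
```

Features 0 and 2 are marginally correlated (covariance 0.25) yet share no edge. This is the distinction between marginal and conditional dependence that a graphical model captures.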


2021 ◽  
Vol 14 (1) ◽  
Author(s):  
Amélie Boichard ◽  
Scott M. Lippman ◽  
Razelle Kurzrock

Abstract: Amplifications of oncogenic genes are often considered actionable. However, not all patients respond. Questions have therefore arisen regarding the degree to which amplifications, especially non-focal ones, mediate overexpression. We found that a subset of high-level gene amplifications (≥6 copies) in The Cancer Genome Atlas database was not overexpressed at the RNA level. Unexpectedly, focal amplifications were more frequently silenced than non-focal amplifications. Most non-focal amplifications were not silenced; therefore, non-focal amplifications, if overexpressed, may be therapeutically tractable. Furthermore, specific silencing of high-level focal or non-focal gene amplifications may explain resistance to drugs that target the relevant gene product.
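The ≥6-copy definition of a high-level amplification comes from the abstract. How "not overexpressed at the RNA level" was decided is not stated. Below is a minimal sketch that assumes an expression z-score against a reference cohort with a hypothetical cutoff of 1.0. It flags silenced amplifications and compares silencing rates for focal and non-focal events. The gene names are illustrative and the values are invented.

```python
from dataclasses import dataclass

@dataclass
class GeneCall:
    gene: str
    copies: float        # estimated absolute copy number in the tumor
    expression_z: float  # RNA expression z-score vs. a reference cohort (assumed)
    focal: bool          # True if the amplicon is focal

HIGH_LEVEL_COPIES = 6        # ">= 6 copies" threshold quoted in the abstract
MIN_OVEREXPRESSION_Z = 1.0   # hypothetical cutoff for "over-expressed"

def is_silenced(call: GeneCall) -> bool:
    """High-level amplification without RNA over-expression."""
    return call.copies >= HIGH_LEVEL_COPIES and call.expression_z < MIN_OVEREXPRESSION_Z

def silencing_rate(calls: list[GeneCall], focal: bool) -> float:
    """Fraction of high-level focal (or non-focal) amplifications that are silenced."""
    amplified = [c for c in calls if c.copies >= HIGH_LEVEL_COPIES and c.focal == focal]
    if not amplified:
        return 0.0
    return sum(is_silenced(c) for c in amplified) / len(amplified)

# Hypothetical calls; gene names are illustrative and the values are invented.
calls = [
    GeneCall("ERBB2", 12, 3.5, True),
    GeneCall("MYC", 8, 0.2, True),
    GeneCall("CCND1", 7, 2.1, False),
    GeneCall("EGFR", 3, 0.1, False),   # below the high-level threshold, ignored
]
print([c.gene for c in calls if is_silenced(c)])
print(silencing_rate(calls, focal=True), silencing_rate(calls, focal=False))
```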


2017 ◽  
pp. 1-13 ◽  
Author(s):  
Anshuman Panda ◽  
Anil Betigeri ◽  
Kalyanasundaram Subramanian ◽  
Jeffrey S. Ross ◽  
Dean C. Pavlick ◽  
...  

Purpose: An association between mutational burden and response to immune checkpoint therapy has been documented in several cancer types. The potential for such a mutational burden threshold to predict response to immune checkpoint therapy was evaluated in several clinical datasets, where mutational burden was measured either by whole-exome sequencing or by using commercially available sequencing panels.

Methods: Whole-exome sequencing and RNA sequencing data of 33 solid cancer types from The Cancer Genome Atlas were analyzed to determine whether a robust immune checkpoint–activating mutation (iCAM) burden threshold associated with evidence of immune checkpoint activation exists in these cancers that may serve as a biomarker of response to immune checkpoint blockade therapy.

Results: We found that a robust iCAM threshold, associated with signatures of immune checkpoint activation, exists in eight of 33 solid cancers: melanoma, lung adenocarcinoma, colon adenocarcinoma, endometrial cancer, stomach adenocarcinoma, cervical cancer, estrogen receptor–positive/human epidermal growth factor receptor 2–negative breast cancer, and bladder-urothelial cancer. Tumors with a mutational burden higher than the threshold (iCAM positive) also had clear histologic evidence of lymphocytic infiltration. In published datasets of melanoma, lung adenocarcinoma, and colon cancer, patients with iCAM-positive tumors had significantly better response to immune checkpoint therapy compared with those with iCAM-negative tumors. Receiver operating characteristic analysis using The Cancer Genome Atlas predictions as the gold standard showed that iCAM-positive tumors are accurately identifiable using clinical sequencing assays, such as FoundationOne (Foundation Medicine, Cambridge, MA) or StrandAdvantage (Strand Life Sciences, Bangalore, India). Using the FoundationOne-derived threshold, an analysis of 113 melanoma tumors showed that patients with iCAM-positive disease have significantly better response to immune checkpoint therapy. iCAM-positive and iCAM-negative tumors have distinct mutation patterns and different immune microenvironments.

Conclusion: In eight solid cancers, a mutational burden threshold exists that may predict response to immune checkpoint blockade. This threshold is identifiable using available clinical sequencing assays.
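The receiver operating characteristic (ROC) step can be sketched as follows. TCGA-derived iCAM calls serve as the gold-standard labels and a panel-based mutation count is the score. A cutoff is chosen by maximizing Youden's J (TPR − FPR), and the area under the curve (AUC) summarizes separability. The abstract does not say how the panel threshold was selected, so the Youden criterion and the toy numbers are assumptions.

```python
def roc_auc(scores: list[float], labels: list[int]) -> float:
    """AUC as the probability that a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need both positive and negative labels")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def youden_threshold(scores: list[float], labels: list[int]) -> tuple[float, float]:
    """Cutoff t (call positive when score >= t) that maximizes TPR - FPR."""
    n_pos = sum(1 for y in labels if y)
    n_neg = len(labels) - n_pos
    best_t, best_j = None, -1.0
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y and s >= t)
        fp = sum(1 for s, y in zip(scores, labels) if not y and s >= t)
        j = tp / n_pos - fp / n_neg
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j

# Hypothetical panel-based mutation counts and TCGA-derived iCAM labels (1 = positive).
burdens = [50, 80, 120, 300, 400, 90]
tcga_icam = [0, 0, 1, 1, 1, 0]
print(youden_threshold(burdens, tcga_icam), roc_auc(burdens, tcga_icam))
```

Youden's J weights sensitivity and specificity equally. A clinical assay might instead fix a minimum sensitivity and pick the cutoff from that constraint.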


2021 ◽  
Vol 8 (1) ◽  
Author(s):  
Masakuni Serizawa ◽  
Maki Mizuguchi ◽  
Kenichi Urakami ◽  
Takeshi Nagashima ◽  
Keiichi Ohshima ◽  
...  

Abstract: With the emergence of next-generation sequencing (NGS)-based cancer gene panel tests in routine oncological practice in Japan, an easily interpretable cancer genome database of Japanese patients, in which mutational profiles are unaffected by racial differences, is needed to improve the interpretation of detected gene alterations. To address this need, we constructed the first Japanese cancer genome database covering multiple tumor types, called the Japanese version of the Cancer Genome Atlas (JCGA). The database includes whole-exome sequencing data from 4,907 surgically resected primary tumor samples obtained from 4,753 Japanese patients with cancer and graphically presents genome information on 460 cancer-associated genes, including the 336 genes covered by two NGS-based cancer gene panel tests approved by the Pharmaceuticals and Medical Devices Agency. Moreover, most of the database content is written in Japanese; this not only helps physicians explain the results of NGS-based cancer gene panel tests but also enables patients and their families to obtain further information about the detected gene alterations.


2016 ◽  
Author(s):  
Alexandra R. Buckley ◽  
Kristopher A. Standish ◽  
Kunal Bhutani ◽  
Trey Ideker ◽  
Hannah Carter ◽  
...  

Abstract: The degree to which germline variation drives cancer development and shapes tumor phenotypes remains largely unexplored, possibly because of a lack of large-scale, publicly available germline data for a cancer cohort. Here we called germline variants in 9,618 cases from The Cancer Genome Atlas (TCGA) database, representing 31 cancer types. We identified batch effects affecting loss-of-function (LOF) variant calls that can be traced back to differences in how the sequence data were generated, both within and across cancer types. Overall, LOF indel calls were more sensitive to technical artifacts than LOF single-nucleotide variant (SNV) calls. In particular, whole-genome amplification of DNA prior to sequencing led to an artificially increased burden of LOF indel calls, which confounded association analyses relating germline variants to tumor type despite stringent indel filtering strategies. Because of this inherent noise, we chose to remove all 614 amplified DNA samples, including all acute myeloid leukemia and virtually all ovarian cancer samples, from the final dataset. This study demonstrates how insufficient quality control can lead to false-positive germline-tumor type associations and draws attention to the need to be alert to problems caused by non-uniform data generation in TCGA data.

Author Summary: Cancer research to date has largely focused on genetic aberrations specific to tumor tissue. In contrast, the degree to which germline, or inherited, variation contributes to tumorigenesis remains unclear, possibly because of a lack of accessible germline variant data. In this study we identify germline variants in 9,618 samples using raw germline exome data from The Cancer Genome Atlas (TCGA). There are substantial differences in how exome sequence data were generated, both across and within cancer types in TCGA. We observe that these differences introduced batch effects, that is, variation due to technical factors rather than true biological variation, into our variant data. Most notably, amplification of DNA prior to sequencing resulted in an excess of predicted damaging indel variants. We show how these batch effects can confound germline association analyses if not properly addressed. Our study highlights the difficulties of working with large public genomic datasets like TCGA, where samples are collected over time and across data centers, and in particular cautions against using amplified DNA samples for genetic association analyses.
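Below is a minimal sketch of the quality-control idea described above. It compares the mean LOF indel burden of whole-genome-amplified (WGA) and non-amplified samples within each cancer type, then drops the amplified samples. The field names (`cancer_type`, `wga`, `lof_indels`) and the counts are hypothetical, and the study's actual filtering pipeline is more involved.

```python
from collections import defaultdict
from statistics import mean

def lof_indel_burden(samples: list[dict]) -> dict[tuple[str, bool], float]:
    """Mean LOF indel calls per sample, keyed by (cancer_type, wga).

    Stratifying by cancer type avoids mistaking a tumor-type difference for a
    difference in sample preparation (or vice versa)."""
    groups = defaultdict(list)
    for s in samples:
        groups[(s["cancer_type"], s["wga"])].append(s["lof_indels"])
    return {key: mean(vals) for key, vals in groups.items()}

def drop_amplified(samples: list[dict]) -> list[dict]:
    """Exclude samples whose DNA was whole-genome amplified before sequencing."""
    return [s for s in samples if not s["wga"]]

# Hypothetical per-sample summaries; field names and counts are invented.
samples = [
    {"id": "s1", "cancer_type": "OV", "wga": False, "lof_indels": 2},
    {"id": "s2", "cancer_type": "OV", "wga": False, "lof_indels": 4},
    {"id": "s3", "cancer_type": "OV", "wga": True, "lof_indels": 10},
    {"id": "s4", "cancer_type": "OV", "wga": True, "lof_indels": 14},
]
print(lof_indel_burden(samples))
print([s["id"] for s in drop_amplified(samples)])
```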


2017 ◽  
Author(s):  
Xin Hu ◽  
Qianghu Wang ◽  
Floris Barthel ◽  
Ming Tang ◽  
Samirkumar Amin ◽  
...  

Fusion genes, particularly those involving kinases, have been shown to act as drivers and are frequent therapeutic targets in cancer [1]. Here we describe our results from detecting transcript fusions across 33 cancer types from The Cancer Genome Atlas (TCGA), totaling 9,966 cancer samples and 648 normal samples [2]. Preprocessing, including read alignment to both the genome and the transcriptome, and fusion detection were carried out using a uniform pipeline [3]. To validate the resulting fusions, we also called somatic structural variations for 561 cancers from whole-genome sequencing data. A summary of the data used in this study is provided in Table S1. Our results can be accessed through our portal at http://www.tumorfusions.org.
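The abstract says RNA-detected fusions were validated against somatic structural variations (SVs) called from whole-genome sequencing, but it does not give the matching rule. The sketch below assumes one plausible rule: a fusion counts as supported when some SV has one breakpoint within a padding window of each partner gene, in either orientation. The 10 kb padding and the gene coordinates are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Gene:
    name: str
    chrom: str
    start: int
    end: int

@dataclass(frozen=True)
class SV:
    chrom1: str
    pos1: int
    chrom2: str
    pos2: int

def _near(gene: Gene, chrom: str, pos: int, pad: int) -> bool:
    """True if a breakpoint lies within the gene body extended by `pad` bases."""
    return gene.chrom == chrom and gene.start - pad <= pos <= gene.end + pad

def supported(fusion: tuple[Gene, Gene], svs: list[SV], pad: int = 10_000) -> bool:
    """True if any SV links the two partner genes, in either breakpoint orientation."""
    g5, g3 = fusion
    for sv in svs:
        forward = _near(g5, sv.chrom1, sv.pos1, pad) and _near(g3, sv.chrom2, sv.pos2, pad)
        reverse = _near(g5, sv.chrom2, sv.pos2, pad) and _near(g3, sv.chrom1, sv.pos1, pad)
        if forward or reverse:
            return True
    return False

# Hypothetical genes and SV calls (coordinates invented for illustration).
gene_a = Gene("GENE_A", "chr1", 1_000, 5_000)
gene_b = Gene("GENE_B", "chr2", 20_000, 30_000)
svs = [SV("chr2", 25_000, "chr1", 3_000), SV("chr1", 3_000, "chr3", 100)]
print(supported((gene_a, gene_b), svs), supported((gene_a, gene_b), svs[1:]))
```

Checking both orientations matters because SV callers do not necessarily report breakpoints in the same 5′/3′ order as the fusion transcript.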
