Distinct signatures of codon and codon pair usage in 32 primary tumor types in the novel database CancerCoCoPUTs for cancer-specific codon usage

Background: Gene expression is highly variable across tissues of multi-cellular organisms, influencing the codon usage of the tissue-specific transcriptome. Cancer disrupts the gene expression pattern of healthy tissue, resulting in altered codon usage preferences. How codon usage changes relate to codon demand and tRNA supply in cancer is a topic of growing interest. Methods: We analyzed transcriptome-weighted codon and codon pair usage based on The Cancer Genome Atlas (TCGA) RNA-seq data from 6427 solid tumor samples and 632 normal tissue samples. This dataset represents 32 cancer types affecting 11 distinct tissues. Our analysis focused on tissues that give rise to multiple solid tumor types and cancer types that are present in multiple tissues. Results: We identified distinct patterns of synonymous codon usage changes for different cancer types affecting the same tissue. For example, a substantial increase in GGT-glycine was observed in invasive ductal carcinoma (IDC), invasive lobular carcinoma (ILC), and mixed invasive ductal and lobular carcinoma (IDLC) of the breast. Change in synonymous codon preference favoring GGT correlated with change in synonymous codon preference against GGC in IDC and IDLC, but not in ILC. Furthermore, we examined the codon usage changes between paired healthy/tumor tissue from the same patient. Using clinical data from TCGA, we conducted a survival analysis of patients based on the degree of change between healthy and tumor-specific codon usage, revealing an association between larger changes and increased mortality. We have also created a database that contains cancer-specific codon and codon pair usage data for cancer types derived from TCGA, which represents a comprehensive tool for codon-usage-oriented cancer research. Conclusions: Based on data from TCGA, we have highlighted tumor type-specific signatures of codon and codon pair usage. Paired data revealed variable changes to codon usage patterns, which must be considered when designing personalized cancer treatments. The associated database, CancerCoCoPUTs, represents a comprehensive resource for codon and codon pair usage in cancer and is available at https://dnahive.fda.gov/review/cancercocoputs/. These findings are important to understand the relationship between tRNA supply and codon demand in cancer states and could help guide the development of new cancer therapeutics.
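As a rough illustration of what "transcriptome-weighted" codon and codon pair usage can mean, the sketch below weights every codon and every adjacent codon pair in a coding sequence by that gene's expression value. The abstract does not give the exact weighting scheme, so the function names, the toy sequences, the expression values and the per-thousand normalization are all assumptions made for illustration only.

```python
from collections import Counter


def split_codons(cds):
    """Split a coding sequence into in-frame codons, dropping a trailing partial codon."""
    usable = len(cds) - len(cds) % 3
    return [cds[i:i + 3] for i in range(0, usable, 3)]


def weighted_usage(cds_by_gene, expression_by_gene):
    """Return (codon_counts, codon_pair_counts), each weighted by gene expression."""
    codon_counts = Counter()
    pair_counts = Counter()
    for gene, cds in cds_by_gene.items():
        weight = expression_by_gene.get(gene, 0.0)
        if weight == 0.0:
            continue  # unexpressed genes contribute nothing
        triplets = split_codons(cds.upper())
        for codon in triplets:
            codon_counts[codon] += weight
        # A codon pair is two adjacent in-frame codons (a hexamer).
        for first, second in zip(triplets, triplets[1:]):
            pair_counts[first + second] += weight
    return codon_counts, pair_counts


def per_thousand(counts):
    """Normalize weighted counts to frequency per 1000 (assumed reporting unit)."""
    total = sum(counts.values())
    return {key: 1000.0 * value / total for key, value in counts.items()} if total else {}


# Toy input (hypothetical genes and expression values, not TCGA data).
cds_by_gene = {"geneA": "ATGGGTGGCTAA", "geneB": "ATGGGTTAA"}
expression_by_gene = {"geneA": 1.0, "geneB": 3.0}

codon_counts, pair_counts = weighted_usage(cds_by_gene, expression_by_gene)
codon_freq = per_thousand(codon_counts)
```

In this toy case the highly expressed geneB dominates the totals, which is the point of weighting by expression: a shift in which genes are expressed changes the codon demand even when no sequence changes.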

77 Prevalence of secondary immunotherapeutic targets in the absence of established immune biomarkers in solid tumors

Journal for ImmunoTherapy of Cancer ◽

10.1136/jitc-2021-sitc2021.077 ◽

2021 ◽

Vol 9 (Suppl 3) ◽

pp. A86-A86

Author(s):

Paul DePietro ◽

Mary Nesline ◽

Yong Hee Lee ◽

RJ Seager ◽

Erik Van Roey ◽

...

Keyword(s):

Gene Expression ◽

Lung Cancer ◽

Reference Population ◽

List Type ◽

Tumor Type ◽

Genomic Profiling ◽

Rna Seq ◽

Immune Biomarkers ◽

Cancer Types ◽

Immune Related Genes

Background: Immune checkpoint inhibitor-based therapies have achieved impressive success in the treatment of several cancer types. Predictive immune biomarkers, including PD-L1, MSI and TMB, are well established as surrogate markers for immune evasion and tumor-specific neoantigens across many tumors. Positive detection across cancer types varies, but overall ~50% of patients test negative for these primary immune markers [1]. In this study, we investigated the prevalence of secondary immune biomarkers outside of PD-L1, TMB and MSI. Methods: Comprehensive genomic and immune profiling, including PD-L1 IHC, TMB, MSI and gene expression of 395 immune related genes, was performed on 6078 FFPE tumors representing 34 cancer types, predominantly composed of lung cancer (36.7%), colorectal cancer (11.9%) and breast cancer (8.5%). Expression levels by RNA-seq of 36 genes targeted by immunotherapies in solid tumor clinical trials, identified as secondary immune biomarkers, were ranked against a reference population. Genes with a rank value ≥75th percentile were considered high and values were associated with PD-L1 (positive ≥1%), MSI (MSI-H or MSS) and TMB (high ≥10 Mut/Mb) status. Additionally, secondary immune biomarker status was segmented by tumor type and cancer immune cycle roles. Results: In total, 41.0% of cases were PD-L1+, 6.4% TMB+, and 0.1% MSI-H. 12.6% of cases were positive for >2 of these markers while 39.9% were triple negative (PD-L1-/TMB-/MSS). Of the PD-L1-/TMB-/MSS cases, 89.1% were high for at least one secondary immune biomarker, with 69.3% having ≥3 markers. PD-L1-/TMB-/MSS tumor types with ≥50% prevalence of high secondary immune biomarkers included brain, prostate, kidney, sarcoma, gallbladder, breast, colorectal, and liver cancer. High expression of cancer testis antigen secondary immune biomarkers (e.g., NY-ESO-1, LAGE-1A, MAGE-A4) was most commonly observed in bladder, ovarian, sarcoma, liver, and prostate cancer (≥15%). Tumors demonstrating T-cell priming (e.g., CD40, OX40, CD137), trafficking (e.g., TGFB1, TLR9, TNF) and/or recognition (e.g., CTLA4, LAG3, TIGIT) secondary immune biomarkers were most represented by kidney, gallbladder, and sarcoma (≥40%), with melanoma, esophageal, head & neck, cervical, stomach, and lung cancer least represented (≥15%). Conclusions: Our studies show comprehensive tumor profiling that includes gene expression can detect secondary immune biomarkers targeted by investigational therapies in ~90% of PD-L1-/TMB-/MSS cases. While genomic profiling could also provide therapeutic choices for a percentage of these patients, detection of secondary immune biomarkers by RNA-seq provides additional options for patients without a clear therapeutic path as determined by PD-L1 testing and genomic profiling alone. Reference: [1] Huang R S P, Haberberger J, Severson E, et al. A pan-cancer analysis of PD-L1 immunohistochemistry and gene amplification, tumor mutation burden and microsatellite instability in 48,782 cases. Mod Pathol 2021;34:252–263.
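The "ranked against a reference population" step with a ≥75th percentile cutoff can be sketched as follows. The abstract does not say how the percentile rank is computed, so the definition used here (percent of reference values less than or equal to the sample value) is an assumption, as are the function names and the toy expression values; only the gene symbols and the 75th percentile threshold come from the abstract.

```python
from bisect import bisect_right


def percentile_rank(value, reference):
    """Percent of reference values that are <= value (assumed rank definition)."""
    ordered = sorted(reference)
    return 100.0 * bisect_right(ordered, value) / len(ordered)


def high_markers(sample, reference_by_gene, cutoff=75.0):
    """Return the genes whose expression ranks at or above the cutoff percentile."""
    return sorted(
        gene
        for gene, value in sample.items()
        if percentile_rank(value, reference_by_gene[gene]) >= cutoff
    )


# Toy reference population and tumor sample (hypothetical values).
reference_by_gene = {
    "LAG3": [1, 2, 3, 4, 5, 6, 7, 8],
    "TIGIT": [10, 20, 30, 40],
}
sample = {"LAG3": 6.5, "TIGIT": 15}

high = high_markers(sample, reference_by_gene)
```

Here LAG3 sits at the 75th percentile of its reference values and is called high, while TIGIT sits at the 25th and is not. A tumor would then count as positive for a secondary immune biomarker if this list is non-empty.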

Putative biomarkers for predicting tumor sample purity based on gene expression data

BMC Genomics ◽

10.1186/s12864-019-6412-8 ◽

2019 ◽

Vol 20 (1) ◽

Author(s):

Yuanyuan Li ◽

David M. Umbach ◽

Adrienna Bingham ◽

Qi-Jing Li ◽

Yuan Zhuang ◽

...

Keyword(s):

Gene Expression ◽

Gene Expression Data ◽

Supervised Machine Learning ◽

Tumor Type ◽

Expression Data ◽

Expression Levels ◽

Gene Set ◽

Tumor Purity ◽

Tumor Types ◽

Cancerous Cells

Background: Tumor purity is the percent of cancer cells present in a sample of tumor tissue. The non-cancerous cells (immune cells, fibroblasts, etc.) have an important role in tumor biology. The ability to determine tumor purity is important to understand the roles of cancerous and non-cancerous cells in a tumor. Methods: We applied a supervised machine learning method, XGBoost, to data from 33 TCGA tumor types to predict tumor purity using RNA-seq gene expression data. Results: Across the 33 tumor types, the median correlation between observed and predicted tumor purity ranged from 0.75 to 0.87 with small root mean square errors, suggesting that tumor purity can be accurately predicted using expression data. We further confirmed that expression levels of a ten-gene set (CSF2RB, RHOH, C1S, CCDC69, CCL22, CYTIP, POU2AF1, FGR, CCL21, and IL7R) were predictive of tumor purity regardless of tumor type. We tested whether our set of ten genes could accurately predict tumor purity of a TCGA-independent data set. We showed that expression levels from our set of ten genes were highly correlated (ρ = 0.88) with the actual observed tumor purity. Conclusions: Our analyses suggested that the ten-gene set may serve as a biomarker for tumor purity prediction using gene expression data.
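The two evaluation measures named in the results, correlation between observed and predicted purity and root mean square error, can be sketched in plain Python. This is not the authors' pipeline: the model itself (XGBoost) is omitted, the reading of ρ as a Spearman rank correlation is an assumption, and the purity values below are made up.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between paired observations and predictions."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical observed and predicted tumor purity fractions.
observed = [0.90, 0.75, 0.60, 0.40]
predicted = [0.85, 0.80, 0.55, 0.45]

rho = spearman(observed, predicted)
error = rmse(observed, predicted)
```

The two measures answer different questions: the rank correlation checks whether samples are ordered correctly by purity, while the RMSE checks how far the predicted values are from the observed ones in absolute terms.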

Gene Expression Levels Are Correlated with Synonymous Codon Usage, Amino Acid Composition, and Gene Architecture in the Red Flour Beetle, Tribolium castaneum

Molecular Biology and Evolution ◽

10.1093/molbev/mss184 ◽

2012 ◽

Vol 29 (12) ◽

pp. 3755-3766 ◽

Cited By ~ 28

Author(s):

Anna Williford ◽

Jeffery P. Demuth

Keyword(s):

Gene Expression ◽

Amino Acid ◽

Codon Usage ◽

Amino Acid Composition ◽

Tribolium Castaneum ◽

Synonymous Codon ◽

Synonymous Codon Usage ◽

Red Flour Beetle ◽

Gene Architecture ◽

Gene Expression Levels

Distribution of ETV6-NTRK3 translocations across neoplasms identified from the Mitelman Database.

Journal of Clinical Oncology ◽

10.1200/jco.2017.35.15_suppl.e23141 ◽

2017 ◽

Vol 35 (15_suppl) ◽

pp. e23141-e23141

Author(s):

Juan Carlos Malpartida ◽

Eric Vick ◽

Noah Hunter Richardson ◽

Kruti Patel ◽

Matthew K Stein ◽

...

Keyword(s):

Ductal Carcinoma ◽

Tumor Type ◽

Invasive Adenocarcinoma ◽

Data Set ◽

Oncogenic Potential ◽

Multiple Tumor ◽

Cancer Types ◽

Mitelman Database ◽

Tumor Types ◽

Tumor Profiling

e23141 Background: Discovered as a novel aberration in congenital fibrosarcoma (CF), the ETV6-NTRK3 translocation (EN) confers oncogenic potential and is inhibited by crizotinib. The present study aims to survey the scope of neoplasms that harbor EN across tumor types. Methods: Utilizing the National Cancer Institute’s Mitelman Database (MD) of Chromosome Aberrations and Gene Fusions, patients (pts) were identified with EN and categorized based on tumor type, subtype and incidence. Cancer pts who received tumor profiling with Caris were also surveyed for EN. Results: 47 pts with EN across 12 cancer types were extracted from the MD and had a median age of 0.17 years (7 unreported); 38% male; 51% acquired malignancies, 49% congenital; 62% of cases were pediatric, 23% adult and 15% unknown. 0/204 pts with Caris tumor profiling were found to have an EN. Cancers with the highest number of EN were: 15 (31.9% of the EN data set) congenital mesoblastic nephromas (CMN), 10 (21.3%) CF, 7 (14.9%) breast carcinoma (BC; 6 secretory ductal carcinoma (SD) and 1 invasive adenocarcinoma (IA)) and 3 (6.4%) colorectal carcinoma (CRC). EN were found in 8 other malignancies (Table 1). Cancer types with the highest incidence of EN+ cases in the MD were gastrointestinal stromal tumor (GIST; 100%), CMN (75%) and CF (23.3%). Conclusions: These results further our understanding of the distribution of ETV6-NTRK3 translocations in multiple tumor types across the age spectrum and suggest that pts with CMN, CF, BC and CRC requiring high order therapy should be considered for NTRK3-based treatment. [Table: see text]
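The shares quoted in the results can be checked with a short calculation. The case counts and the total of 47 patients are taken from the abstract; the variable names are illustrative.

```python
# Case counts for the four most frequent cancer types, as reported in the abstract.
total_en_patients = 47
cases = {"CMN": 15, "CF": 10, "BC": 7, "CRC": 3}

# Share of the EN data set, rounded to one decimal place as in the abstract.
shares = {name: round(100 * count / total_en_patients, 1) for name, count in cases.items()}

# Patients left over for the 8 other malignancies mentioned in the abstract.
remaining = total_en_patients - sum(cases.values())
```

The four listed types account for 35 of the 47 patients, which leaves 12 patients spread across the 8 other malignancies.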

Compelling Evidence Suggesting the Codon Usage of SARS-CoV-2 Adapts to Human After the Split From RaTG13

Evolutionary Bioinformatics ◽

10.1177/11769343211052013 ◽

2021 ◽

Vol 17 ◽

pp. 117693432110520

Author(s):

Yanping Zhang ◽

Xiaojie Jin ◽

Haiyan Wang ◽

Yaoyao Miao ◽

Xiaoping Yang ◽

...

Keyword(s):

Gene Expression ◽

Codon Usage ◽

Synonymous Codon ◽

Mrna Translation ◽

Synonymous Codon Usage ◽

Fast Decoding ◽

Rate Limiting Step ◽

Human Lungs ◽

Human Sample ◽

Rate Limiting

SARS-CoV-2 needs to efficiently make use of the resources from hosts in order to survive and propagate. Among the multiple layers of regulatory network, mRNA translation is the rate-limiting step in gene expression. Synonymous codon usage usually conforms with tRNA concentration to allow fast decoding during translation. It is acknowledged that SARS-CoV-2 has adapted to the codon usage of human lungs so that the virus could rapidly proliferate in the lung environment. While this notion seems to nicely explain the adaptation of SARS-CoV-2 to lungs, it is unable to tell why other viruses do not have this advantage. In this study, we retrieve the GTEx RNA-seq data for 30 tissues (belonging to over 17 000 individuals). We calculate the RSCU (relative synonymous codon usage) weighted by gene expression in each human sample, and investigate the correlation of RSCU between the human tissues and SARS-CoV-2 or RaTG13 (the closest coronavirus to SARS-CoV-2). Lung has the highest correlation of RSCU to SARS-CoV-2 among all tissues, suggesting that the lung environment is generally suitable for SARS-CoV-2. Interestingly, for most tissues, SARS-CoV-2 has higher correlations with the human samples compared with the RaTG13-human correlation. This difference is most significant for lungs. In conclusion, the codon usage of SARS-CoV-2 has adapted to human lungs to allow fast decoding and translation. This adaptation probably took place after SARS-CoV-2 split from RaTG13 because RaTG13 is less perfectly correlated with human. This finding depicts the trajectory of adaptive evolution from ancestral sequence to SARS-CoV-2, and also well explains why SARS-CoV-2 rather than other viruses could perfectly adapt to human lung environment.
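A minimal sketch of relative synonymous codon usage (RSCU) weighted by gene expression is given below. RSCU for a codon is its observed count divided by the count expected if all synonymous codons for that amino acid were used equally. How the study applies the expression weights is not stated in the abstract, so weighting each gene's codon counts by its expression value is an assumption, and only two amino acid families are included to keep the example short; the gene names, sequences and expression values are hypothetical.

```python
from collections import Counter

# Two synonymous codon families from the standard genetic code (subset for brevity).
SYNONYMOUS = {
    "Gly": ["GGT", "GGC", "GGA", "GGG"],
    "Lys": ["AAA", "AAG"],
}


def weighted_codon_counts(cds_by_gene, expression_by_gene):
    """Count in-frame codons, weighting each gene's codons by its expression."""
    counts = Counter()
    for gene, cds in cds_by_gene.items():
        weight = expression_by_gene.get(gene, 0.0)
        usable = len(cds) - len(cds) % 3
        for i in range(0, usable, 3):
            counts[cds[i:i + 3]] += weight
    return counts


def rscu(counts, families=SYNONYMOUS):
    """RSCU = observed count / mean count over the codon's synonymous family."""
    values = {}
    for codon_list in families.values():
        total = sum(counts.get(codon, 0.0) for codon in codon_list)
        if total == 0:
            continue  # amino acid absent from the weighted transcriptome
        expected = total / len(codon_list)
        for codon in codon_list:
            values[codon] = counts.get(codon, 0.0) / expected
    return values


# Toy transcriptome (hypothetical sequences and expression values).
cds_by_gene = {"geneA": "GGTGGTAAA", "geneB": "GGCAAG"}
expression_by_gene = {"geneA": 2.0, "geneB": 4.0}

values = rscu(weighted_codon_counts(cds_by_gene, expression_by_gene))
```

An RSCU of 1 means a codon is used exactly as often as expected under equal usage, and the values within a family sum to the family size. Comparing a tissue with a virus then amounts to correlating two such RSCU vectors over the same codons.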

Predicting master transcription factors from pan-cancer expression data

10.1101/839142 ◽

2019 ◽

Cited By ~ 4

Author(s):

Jessica Reddy ◽

Marcos A. S. Fonseca ◽

Rosario I Corona ◽

Robbin Nameki ◽

Felipe Segato Dezem ◽

...

Keyword(s):

Transcription Factor ◽

Transcription Factors ◽

Regulatory Elements ◽

Ca 125 ◽

Tumor Type ◽

Expression Data ◽

Primary Tumors ◽

Cancer Types ◽

Tumor Types ◽

Pan Cancer

The function of critical developmental regulators can be subverted by cancer cells to control expression of oncogenic transcriptional programs. These "master transcription factors" (MTFs) are often essential for cancer cell survival and represent vulnerabilities that can be exploited therapeutically. The current approaches to identify candidate MTFs examine super-enhancer associated transcription factor-encoding genes with high connectivity in network models. This relies on chromatin immunoprecipitation-sequencing (ChIP-seq) data, which is technically challenging to obtain from primary tumors, and is currently unavailable for many cancer types and clinically relevant subtypes. In contrast, gene expression data are more widely available, especially for rare tumors and subtypes where MTFs have yet to be discovered. We have developed a predictive algorithm called CaCTS (Cancer Core Transcription factor Specificity) to identify candidate MTFs using pan-cancer RNA-sequencing data from The Cancer Genome Atlas. The algorithm identified 273 candidate MTFs across 34 tumor types and recovered known tumor MTFs. We also made novel predictions, including for cancer types and subtypes for which MTFs have not yet been characterized. Clustering based on MTF predictions reproduced anatomic groupings of tumors that share 1-2 lineage-specific candidates, but also dictated functional groupings, such as a squamous group that comprised five tumor subtypes sharing 3 common MTFs. PAX8, SOX17, and MECOM were candidate factors in high-grade serous ovarian cancer (HGSOC), an aggressive tumor type where the core regulatory circuit is currently uncharacterized. PAX8, SOX17, and MECOM are required for cell viability and lie proximal to super-enhancers in HGSOC cells. ChIP-seq revealed that these factors co-occupy HGSOC regulatory elements globally and co-bind at critical gene loci including MUC16 (CA-125). 
Addiction to these factors was confirmed in studies using THZ1 to inhibit transcription in HGSOC cells, suggesting early down-regulation of these genes may be responsible for cytotoxic effects of THZ1 on HGSOC models. Identification of MTFs across 34 tumor types and 140 subtypes, especially for those with limited understanding of transcriptional drivers, paves the way to therapeutic targeting of MTFs in a broad spectrum of cancers.

Quantitative Analysis of Differential Expression of HOX Genes in Multiple Cancers

Cancers ◽

10.3390/cancers12061572 ◽

2020 ◽

Vol 12 (6) ◽

pp. 1572

Author(s):

Orit Adato ◽

Yaron Orenstein ◽

Juri Kopolovic ◽

Tamar Juven-Gershon ◽

Ron Unger

Keyword(s):

Gene Expression ◽

Quantitative Analysis ◽

Differential Expression ◽

Survival Data ◽

Hox Genes ◽

Expression Patterns ◽

The Cancer Genome Atlas ◽

Tumor Type ◽

Hox Gene ◽

Cancer Types

Transcription factors encoded by Homeobox (HOX) genes play numerous key functions during early embryonic development and differentiation. Multiple reports have shown that mis-regulation of HOX gene expression plays key roles in the development of cancers. Their expression levels in cancers tend to differ based on tissue and tumor type. Here, we performed a comprehensive analysis comparing HOX gene expression in different cancer types, obtained from The Cancer Genome Atlas (TCGA), with matched healthy tissues, obtained from Genotype-Tissue Expression (GTEx). We identified and quantified differential expression patterns that confirmed previously identified expression changes and highlighted new differential expression signatures. We discovered differential expression patterns that are in line with patient survival data. This comprehensive and quantitative analysis provides a global picture of HOX genes’ differential expression patterns in different cancer types.

Differentially Methylated Super-Enhancers Regulate Target Gene Expression in Human Cancer

Scientific Reports ◽

10.1038/s41598-019-51018-x ◽

2019 ◽

Vol 9 (1) ◽

Cited By ~ 1

Author(s):

Emily L. Flam ◽

Ludmila Danilova ◽

Dylan Z. Kelley ◽

Elena Stavrovskaya ◽

Theresa Guo ◽

...

Keyword(s):

Gene Expression ◽

Dna Methylation ◽

Target Genes ◽

Human Cancer ◽

Genetic Alterations ◽

Tumor Type ◽

Gene Pairs ◽

Aberrant Gene Expression ◽

Tumor Types ◽

Aberrant Gene

Abstract Current literature suggests that epigenetically regulated super-enhancers (SEs) are drivers of aberrant gene expression in cancers. Many tumor types are still missing chromatin data to define cancer-specific SEs and their role in carcinogenesis. In this work, we develop a simple pipeline, which can utilize chromatin data from etiologically similar tumors to discover tissue-specific SEs and their target genes using gene expression and DNA methylation data. As an example, we applied our pipeline to human papillomavirus-related oropharyngeal squamous cell carcinoma (HPV + OPSCC). This tumor type is characterized by abundant gene expression changes, which cannot be explained by genetic alterations alone. Chromatin data are still limited for this disease, so we used 3627 SE elements from public domain data for closely related tissues, including normal and tumor lung, and cervical cancer cell lines. We integrated the available DNA methylation and gene expression data for HPV + OPSCC samples to filter the candidate SEs to identify functional SEs and their affected targets, which are essential for cancer development. Overall, we found 159 differentially methylated SEs, including 87 SEs that actively regulate expression of 150 nearby genes (211 SE-gene pairs) in HPV + OPSCC. Of these, 132 SE-gene pairs were validated in a related TCGA cohort. Pathway analysis revealed that the SE-regulated genes were associated with pathways known to regulate nasopharyngeal, breast, melanoma, and bladder carcinogenesis and are regulated by the epigenetic landscape in those cancers. Thus, we propose that gene expression in HPV + OPSCC may be controlled by epigenetic alterations in SE elements, which are common between related tissues. Our pipeline can utilize a diversity of data inputs and can be further adapted to SE analysis of diseased and non-diseased tissues from different organisms.

A molecular portrait of microsatellite instability across multiple cancers

10.1101/079152 ◽

2016 ◽

Author(s):

Isidro Cortes-Ciriano ◽

Sejoon Lee ◽

Woong-Yang Park ◽

Tae-Min Kim ◽

Peter J. Park

Keyword(s):

Microsatellite Instability ◽

High Sensitivity ◽

Tumor Type ◽

Sequencing Data ◽

Molecular Fingerprints ◽

Dna Mismatch ◽

Oncogenic Pathways ◽

Cancer Types ◽

Tumor Types ◽

Pan Cancer

Microsatellite instability (MSI) refers to the hypermutability of the cancer genome due to impaired DNA mismatch repair. Although MSI has been studied for decades, the large amount of sequencing data now available allows us to examine the molecular fingerprints of MSI in greater detail. Here, we analyze ~8000 exome and ~1000 whole-genome pairs across 23 cancer types. Our pan-cancer analysis reveals that the prevalence of MSI events is highly variable within and across tumor types including some in which MSI is not typically examined. We also identify genes in DNA repair and oncogenic pathways recurrently subject to MSI and uncover non-coding loci that frequently display MSI events. Finally, we propose an exome-based predictive model for the MSI phenotype that achieves high sensitivity and specificity. These results advance our understanding of the genomic drivers and consequences of MSI, and a comprehensive catalog of tumor-type specific MSI loci we have generated enables efficient panel-based MSI testing to identify patients who are likely to benefit from immunotherapy.

Identifying common transcriptome signatures of cancer by interpreting deep learning models

10.1101/2021.11.11.467790 ◽

2021 ◽

Author(s):

Anupama Jha ◽

Mathieu Quesnel-Vallières ◽

Andrei Thomas-Tikhonenko ◽

Kristen W. Lynch ◽

Yoseph Barash

Keyword(s):

Solid Tumor ◽

Cancer Biology ◽

Splice Junction ◽

Cellular Functions ◽

Protein Coding ◽

Cancer Pathways ◽

Molecular Features ◽

Independent Test Dataset ◽

Cancer Types ◽

Tumor Types

Cancer is a set of diseases characterized by unchecked cell proliferation and invasion of surrounding tissues. The many genes that have been genetically associated with cancer or shown to directly contribute to oncogenesis vary widely between tumor types, but common gene signatures that relate to core cancer pathways have also been identified, signifying that cancer cases display common hallmark molecular features. It is not clear however whether there exist additional sets of genes or transcriptomic features that are less well known in cancer biology but that are also commonly deregulated across several cancer types. Here, in order to agnostically identify transcriptomic features that are commonly shared between cancer types, we used RNA-Seq datasets encompassing thousands of samples from 19 healthy tissue types and 18 solid tumor types to train three feed-forward neural networks, based either on protein-coding gene expression, lncRNA expression or splice junction use, to distinguish between healthy and tumor samples. All three models achieve high precision, recall and accuracy on test sets derived from 13 datasets used during training and on an independent test dataset, indicating that our models recognize transcriptome signatures that are consistent across tumors. Analysis of attribution values extracted from our models reveals that genes that are commonly altered in cancer by expression or splicing variations are under strong evolutionary and selective constraints, suggesting that they have important cellular functions. Importantly, we found that genes composing our cancer transcriptome signatures are not frequently affected by mutations or genomic alterations and that their functions differ widely from the genes genetically associated with cancer. Finally, our results also highlighted that deregulation of RNA-processing genes and aberrant splicing are pervasive features across a large array of solid tumor types. 
The transcriptomic features that we highlight here define cancer signatures that may reflect causal variations or consequences of disease state, or a combination of both.
