A Pan-Cancer Transcriptome Analysis Reveals Pervasive Regulation through Tumor-Associated Alternative Promoters

2017 ◽  
Author(s):  
Deniz Demircioğlu ◽  
Martin Kindermans ◽  
Tannistha Nandi ◽  
Engin Cukuroglu ◽  
Claudia Calabrese ◽  
...  

ABSTRACT Most human protein-coding genes are regulated by multiple, distinct promoters, suggesting that the choice of promoter is as important as its level of transcriptional activity. While the role of promoters as driver elements in cancer has been recognized, the contribution of alternative promoters to regulation of the cancer transcriptome remains largely unexplored. Here we infer active promoters using RNA-Seq data from 1,188 cancer samples with matched whole genome sequencing data. We find that alternative promoters are a major contributor to context-specific regulation of isoform expression and that alternative promoters are frequently deregulated in cancer, affecting known cancer genes and novel candidates. Our study suggests that a highly dynamic landscape of active promoters shapes the cancer transcriptome, opening many opportunities to further explore the interplay of regulatory mechanisms and noncoding somatic mutations with transcriptional aberrations in cancer.

2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Zhongbo Chen ◽  
David Zhang ◽  
Regina H. Reynolds ◽  
Emil K. Gustavsson ◽  
...  

Abstract Knowledge of genomic features specific to the human lineage may provide insights into brain-related diseases. We leverage high-depth whole genome sequencing data to generate a combined annotation identifying regions simultaneously depleted for genetic variation (constrained regions) and poorly conserved across primates. We propose that these constrained, non-conserved regions (CNCRs) have been subject to human-specific purifying selection and are enriched for brain-specific elements. We find that CNCRs are depleted from protein-coding genes but enriched within lncRNAs. We demonstrate that per-SNP heritability of a range of brain-relevant phenotypes is enriched within CNCRs. We find that genes implicated in neurological diseases have high CNCR density, including APOE, highlighting an unannotated intron-3 retention event. Using human brain RNA-sequencing data, we show the intron-3-retaining transcript to be more abundant in Alzheimer’s disease with more severe tau and amyloid pathological burden. Thus, we demonstrate potential association of human-lineage-specific sequences in brain development and neurological disease.


2017 ◽  
Author(s):  
Rebecca Elyanow ◽  
Hsin-Ta Wu ◽  
Benjamin J. Raphael

Abstract Structural variation, including large deletions, duplications, inversions, translocations, and other rearrangements, is common in human and cancer genomes. A number of methods have been developed to identify structural variants from Illumina short-read sequencing data. However, reliable identification of structural variants remains challenging because many variants have breakpoints in repetitive regions of the genome and thus are difficult to identify with short reads. The recently developed linked-read sequencing technology from 10X Genomics combines a novel barcoding strategy with Illumina sequencing. This technology labels all reads that originate from a small number (~5-10) of DNA molecules ~50 kbp in length with the same molecular barcode. These barcoded reads contain long-range sequence information that is advantageous for identification of structural variants. We present Novel Adjacency Identification with Barcoded Reads (NAIBR), an algorithm to identify structural variants in linked-read sequencing data. NAIBR predicts novel adjacencies in an individual genome resulting from structural variants using a probabilistic model that combines multiple signals in barcoded reads. We show that NAIBR outperforms several existing methods for structural variant identification – including two recent methods that also analyze linked-reads – on simulated sequencing data and 10X whole-genome sequencing data from the NA12878 human genome and the HCC1954 breast cancer cell line. Several of the novel somatic structural variants identified in HCC1954 overlap known cancer genes.


2020 ◽  
Vol 21 (18) ◽  
pp. 6562 ◽  
Author(s):  
Abigail L. Pfaff ◽  
Vivien J. Bubb ◽  
John P. Quinn ◽  
Sulev Koks

Long interspersed element-1 (LINE-1/L1s) contributes 17% of the human genome with more than 1 million elements present; however, fewer than 100 of these have evidence for being retrotransposition competent (RC). In addition to those RC-L1s present in the reference genome, there are a small number of known non-reference L1 insertions that are also retrotransposition competent. L1 activity, whether through the potentially detrimental effects of their mRNA or protein expression or somatic retrotransposition events, has been linked to several neurological conditions. The polymorphic nature of both reference and non-reference RC-L1s in terms of their presence or absence will result in individuals harboring a different combination of these elements and it is currently unknown if this type of germline variation contributes to the risk of neurological disease. Here, we utilized whole-genome sequencing data from 178 healthy controls and 372 Parkinson’s disease (PD) subjects from the Parkinson’s Progression Markers Initiative (PPMI) to investigate the role of RC-L1s in PD. In the PPMI cohort, we identified 22 reference and 50 non-reference polymorphic RC-L1 loci. Focusing on 16 highly active RC-L1 loci, an increased burden of these elements (≥9) was associated with PD (OR 1.25, 95% CI 1.03–1.51, p = 0.02). In addition, we identified significant associations of progression markers of PD and the burden of highly active RC-L1s. This study has identified a novel type of genetic element associated with PD risk and disease progression.


NAR Cancer ◽  
2021 ◽  
Vol 3 (1) ◽  
Author(s):  
Chie Kikutake ◽  
Minako Yoshihara ◽  
Mikita Suyama

Abstract Cancer-related mutations have been mainly identified in protein-coding regions. Recent studies have demonstrated that mutations in non-coding regions of the genome could also be a risk factor for cancer. However, the non-coding regions comprise 98% of the total length of the human genome and contain a huge number of mutations, making it difficult to interpret their impact on the pathogenesis of cancer. To comprehensively identify cancer-related non-coding mutations, we focused on recurrent mutations in non-coding regions using somatic mutation data from COSMIC and whole-genome sequencing data from The Cancer Genome Atlas (TCGA). We identified 21 574 recurrent mutations in non-coding regions that were shared by at least two different samples from both COSMIC and TCGA databases. Among them, 580 candidate cancer-related non-coding recurrent mutations were identified based on epigenomic and chromatin structure datasets. One such mutation was located in an RREB1 binding site that is thought to interact with the TEAD1 promoter. Our results suggest that mutations may disrupt the binding of RREB1 to the candidate enhancer region and increase TEAD1 expression levels. Our findings demonstrate that non-coding recurrent mutations and coding mutations may contribute to the pathogenesis of cancer.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Fatemeh Salabi ◽  
Hedieh Jafari ◽  
Shahrokh Navidpour ◽  
Ayeh Sadat Sadr

Abstract The potential function of long non-coding RNAs in regulating neighboring protein-coding genes has attracted scientists’ attention. Despite the important role of lncRNAs in biological processes, a limited number of studies focus on non-model animal lncRNAs. In this study, we used a stringent step-by-step filtering pipeline and machine learning-based tools to identify the specific Androctonus crassicauda lncRNAs and analyze the features of predicted scorpion lncRNAs. The pipeline detected 13,401 lncRNAs in the A. crassicauda transcriptome. The BLAST results indicated that the majority of these lncRNA sequences (12,642) have no identifiable orthologs even in closely related species and were therefore considered novel lncRNAs. Comparison with lncRNA prediction tools indicated that our pipeline is a helpful approach to distinguish protein-coding and non-coding transcripts from RNA sequencing data of species without reference genomes. Moreover, analyzing lncRNA characteristics in A. crassicauda uncovered that lower protein-coding potential, lower GC content, shorter transcript length, and fewer isoforms per gene are outstanding features of A. crassicauda lncRNA transcripts.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Ivy K. Kombe ◽  
Charles N. Agoti ◽  
Patrick K. Munywoki ◽  
Marc Baguelin ◽  
D. James Nokes ◽  
...  

Abstract Respiratory syncytial virus (RSV) is responsible for a significant burden of severe acute lower respiratory tract illness in children under 5 years old, particularly infants. Prior to rolling out any vaccination program, identification of the source of infant infections could further guide vaccination strategies. We extended a dynamic model calibrated at the individual host level, initially fit to social-temporal data on shedding patterns, to include whole genome sequencing data available at a lower sampling intensity. The study population was 493 individuals (55 aged < 1 year) distributed across 47 households, observed through one RSV season in coastal Kenya. We found that 58/97 (60%) of RSV-A and 65/125 (52%) of RSV-B cases arose from infection probably occurring within the household. Nineteen (45%) infant infections appeared to be the result of infection by other household members, of which 13 (68%) were a result of transmission from a household co-occupant aged between 2 and 13 years. The applicability of genomic data in studies of transmission dynamics is highly context specific, influenced by the question, data collection protocols and pathogen under investigation. The results further highlight the importance of pre-school and school-aged children in RSV transmission, particularly the role they play in directly infecting the household infant. These age groups are a potential RSV vaccination target group.
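
As a quick arithmetic check, the percentages quoted in this abstract can be reproduced from the counts it reports. The sketch below is only an illustration; the helper name `pct` is hypothetical, and the 45% figure is left out because the abstract does not state its denominator.

```python
def pct(numerator: int, denominator: int) -> int:
    """Return numerator/denominator as a percentage rounded to a whole number."""
    return round(100 * numerator / denominator)

# Counts taken directly from the abstract.
rsv_a_household = pct(58, 97)    # RSV-A cases attributed to within-household infection
rsv_b_household = pct(65, 125)   # RSV-B cases attributed to within-household infection
from_children = pct(13, 19)      # infant infections traced to co-occupants aged 2-13 years

print(rsv_a_household, rsv_b_household, from_children)
```

The three values agree with the 60%, 52% and 68% given in the text.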


Genes ◽  
2021 ◽  
Vol 12 (5) ◽  
pp. 631
Author(s):  
Maria Pina Concas ◽  
Massimiliano Cocca ◽  
Margherita Francescatto ◽  
Thomas Battistuzzi ◽  
Beatrice Spedicati ◽  
...  

To date, little is known about the role of olfactory receptor (OR) genes in smell performance. Thanks to the availability of whole-genome sequencing data of 802 samples, we identified 41 knockout (KO) OR genes (i.e., carriers of Loss of Function variants) and evaluated their effect on odor discrimination in 218 Italian individuals through recursive partitioning analysis. Furthermore, we checked the expression of these genes in human and mouse tissues using publicly available data and the presence of organ-related diseases in human KO (HKO) individuals for OR expressed in non-olfactory tissues (Fisher test). The recursive partitioning analysis showed that age and a high number (burden) of OR-KO genes worsen odor discrimination (p-value < 0.05). Human expression data showed that 33/41 OR genes are expressed in the olfactory system (OS) and 27 in other tissues. Sixty putative mouse homologs of the 41 human ORs have been identified, 58 of which are expressed in the OS and 37 in other tissues. No association between OR-KO individuals and pathologies has been detected. In conclusion, our work highlights the role of the burden of OR-KO genes in worse odor discrimination.


2021 ◽  
Vol 7 (1) ◽  
Author(s):  
Megan C. Bakeberg ◽  
Anastazja M. Gorecki ◽  
Abigail L. Pfaff ◽  
Madison E. Hoes ◽  
Sulev Kõks ◽  
...  

Abstract The translocase of outer mitochondrial membrane 40 (TOMM40) ‘523’ polymorphism has previously been associated with age of Alzheimer’s disease onset and cognitive functioning in non-pathological ageing, but has not been explored as a candidate risk marker for cognitive decline in Parkinson’s disease (PD). Therefore, this longitudinal study investigated the role of the ‘523’ variant in cognitive decline in a patient cohort from the Parkinson’s Progression Markers Initiative. As such, a group of 368 people with PD were assessed annually for cognitive performance using multiple neuropsychological protocols, and were genotyped for the TOMM40 ‘523’ variant using whole-genome sequencing data. Covariate-adjusted generalised linear mixed models were utilised to examine the relationship between TOMM40 ‘523’ allele lengths and cognitive scores, while taking into account the APOE ε genotype. Cognitive scores declined over the 5-year study period and were lower in males than in females. When accounting for APOE ε4, the TOMM40 ‘523’ variant was not robustly associated with overall cognitive performance. However, in APOE ε3/ε3 carriers, who accounted for ~60% of the whole cohort, carriage of shorter ‘523’ alleles was associated with more severe cognitive decline in both sexes, while carriage of the longer alleles in females was associated with better preservation of global cognition and a number of cognitive sub-domains, and with a delay in progression to dementia. The findings indicate that when taken in conjunction with the APOE genotype, TOMM40 ‘523’ allele length is a significant independent determinant and marker for the trajectory of cognitive decline and risk of dementia in PD.


2020 ◽  
Author(s):  
Zhongbo Chen ◽  
David Zhang ◽  
Regina H. Reynolds ◽  
Emil K. Gustavsson ◽  
Sonia García Ruiz ◽  
...  

ABSTRACT Knowledge of genomic features specific to the human lineage may provide insights into brain-related diseases. We leverage high-depth whole genome sequencing data to generate a combined annotation identifying regions simultaneously depleted for genetic variation (constrained regions) and poorly conserved across primates. We propose that these constrained, non-conserved regions (CNCRs) have been subject to human-specific purifying selection and are enriched for brain-specific elements. We find that CNCRs are depleted from protein-coding genes but enriched within lncRNAs. We demonstrate that per-SNP heritability of a range of brain-relevant phenotypes is enriched within CNCRs. We find that genes implicated in neurological diseases have high CNCR density, including APOE, highlighting an unannotated intron-3 retention event. Using human brain RNA-sequencing data, we show the intron-3-retaining transcript/s to be more abundant in Alzheimer’s disease with more severe tau and amyloid pathological burden. Thus, we demonstrate the importance of human-lineage-specific sequences in brain development and neurological disease. We release our annotation through vizER (https://snca.atica.um.es/browser/app/vizER).


2019 ◽  
Vol 2019 ◽  
pp. 1-14 ◽  
Author(s):  
Yingnan Chen ◽  
Nan Hu ◽  
Huaitong Wu

Salix wilsonii is an important ornamental willow tree widely distributed in China. In this study, an integrated circular chloroplast genome was reconstructed for S. wilsonii based on the chloroplast reads screened from the whole-genome sequencing data generated with the PacBio RSII platform. The obtained pseudomolecule was 155,750 bp long and had a typical quadripartite structure, comprising a large single copy region (LSC, 84,638 bp) and a small single copy region (SSC, 16,282 bp) separated by two inverted repeat regions (IR, 27,415 bp). The S. wilsonii chloroplast genome encoded 115 unique genes, including four rRNA genes, 30 tRNA genes, 78 protein-coding genes, and three pseudogenes. Repetitive sequence analysis identified 32 tandem repeats, 22 forward repeats, two reverse repeats, and five palindromic repeats. Additionally, a total of 118 perfect microsatellites were detected, with mononucleotide repeats being the most common (89.83%). By comparing the S. wilsonii chloroplast genome with those of other rosid plant species, significant contractions or expansions were identified at the IR-LSC/SSC borders. Phylogenetic analysis of 17 willow species confirmed that S. wilsonii was most closely related to S. chaenomeloides and revealed the monophyly of the genus Salix. The complete S. wilsonii chloroplast genome provides an additional sequence-based resource for studying the evolution of organelle genomes in woody plants.
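
The region lengths and gene counts reported above are internally consistent, which a short check makes explicit. This is a minimal sketch using only the figures quoted in the abstract; the variable names are illustrative, and the mononucleotide count is derived here from the stated percentage rather than reported by the authors.

```python
# Region lengths in bp, as quoted in the abstract.
LSC = 84_638   # large single copy region
SSC = 16_282   # small single copy region
IR = 27_415    # length of each of the two inverted repeats

# A quadripartite chloroplast genome is LSC + SSC + two IR copies.
total_length = LSC + SSC + 2 * IR

# Gene categories as quoted in the abstract.
gene_counts = {"rRNA": 4, "tRNA": 30, "protein_coding": 78, "pseudogene": 3}
unique_genes = sum(gene_counts.values())

# Derived (not reported): mononucleotide repeats implied by 89.83% of 118 microsatellites.
mono_repeats = round(0.8983 * 118)

print(total_length, unique_genes, mono_repeats)
```

The sum of the regions matches the reported 155,750 bp pseudomolecule, and the four gene categories add up to the reported 115 unique genes.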

