LeafCutter: annotation-free quantification of RNA splicing

2016 ◽  
Author(s):  
Yang I Li ◽  
David A Knowles ◽  
Jack Humphrey ◽  
Alvaro N. Barbeira ◽  
Scott P. Dickinson ◽  
...  

Abstract The excision of introns from pre-mRNA is an essential step in mRNA processing. We developed LeafCutter to study sample and population variation in intron splicing. LeafCutter identifies variable intron splicing events from short-read RNA-seq data and finds alternative splicing events of high complexity. Our approach obviates the need for transcript annotations and circumvents the challenges in estimating relative isoform or exon usage in complex splicing events. LeafCutter can be used both for detecting differential splicing between sample groups, and for mapping splicing quantitative trait loci (sQTLs). Compared to contemporary methods, we find 1.4–2.1 times more sQTLs, many of which help us ascribe molecular effects to disease-associated variants. Strikingly, transcriptome-wide associations between LeafCutter intron quantifications and 40 complex traits increased the number of associated disease genes at 5% FDR by an average of 2.1-fold as compared to using gene expression levels alone. LeafCutter is fast, scalable, easy to use, and available at https://github.com/davidaknowles/leafcutter.

2021 ◽  
Vol 15 (1) ◽  
Author(s):  
Weitong Cui ◽  
Huaru Xue ◽  
Lei Wei ◽  
Jinghua Jin ◽  
Xuewen Tian ◽  
...  

Abstract Background RNA sequencing (RNA-Seq) has been widely applied in oncology for monitoring transcriptome changes. However, the emerging problem that high variation of gene expression levels caused by tumor heterogeneity may affect the reproducibility of differential expression (DE) results has rarely been studied. Here, we investigated the reproducibility of DE results for any given number of biological replicates between 3 and 24 and explored why a great many differentially expressed genes (DEGs) were not reproducible. Results Our findings demonstrate that poor reproducibility of DE results exists not only for small sample sizes, but also for relatively large sample sizes. Quite a few of the DEGs detected are specific to the samples in use, rather than genuinely differentially expressed under different conditions. Poor reproducibility of DE results is mainly caused by high variation of gene expression levels for the same gene in different samples. Even though biological variation may account for much of the high variation of gene expression levels, the effect of outlier count data also needs to be treated seriously, as outlier data severely interfere with DE analysis. Conclusions High heterogeneity exists not only in tumor tissue samples of each cancer type studied, but also in normal samples. High heterogeneity leads to poor reproducibility of DEGs, undermining generalization of differential expression results. Therefore, it is necessary to use large sample sizes (at least 10 if possible) in RNA-Seq experimental designs to reduce the impact of biological variability and DE results should be interpreted cautiously unless soundly validated.


Blood ◽  
2018 ◽  
Vol 132 (12) ◽  
pp. 1225-1240 ◽  
Author(s):  
Andrea Pellagatti ◽  
Richard N. Armstrong ◽  
Violetta Steeples ◽  
Eshita Sharma ◽  
Emmanouela Repapi ◽  
...  

Key Points RNA-seq analysis of CD34+ cells identifies novel aberrantly spliced genes and dysregulated pathways in splicing factor mutant MDS. Aberrantly spliced isoforms predict MDS survival and implicate dysregulation of focal adhesion and exosomes as drivers of poor survival.


2019 ◽  
Vol 317 (1) ◽  
pp. H168-H180 ◽  
Author(s):  
Ali M. Tabish ◽  
Mohammed Arif ◽  
Taejeong Song ◽  
Zaher Elbeck ◽  
Richard C. Becker ◽  
...  

In this study, we investigated the role of DNA methylation [5-methylcytosine (5mC)] and 5-hydroxymethylcytosine (5hmC), epigenetic modifications that regulate gene activity, in dilated cardiomyopathy (DCM). A MYBPC3 mutant mouse model of DCM was compared with wild type and used to profile genomic 5mC and 5hmC changes by ChIP-seq, and gene expression levels were analyzed by RNA-seq. Both 5mC-altered genes (957) and 5hmC-altered genes (2,022) were identified in DCM hearts. Diverse gene ontology and KEGG pathways were enriched for DCM phenotypes, such as inflammation, tissue fibrosis, cell death, cardiac remodeling, cardiomyocyte growth, and differentiation, as well as sarcomere structure. Hierarchical clustering of mapped genes affected by 5mC and 5hmC clearly differentiated DCM from wild-type phenotype. Based on these data, we propose that genomewide 5mC and 5hmC contents may play a major role in DCM pathogenesis. NEW & NOTEWORTHY Our data demonstrate that development of dilated cardiomyopathy in mice is associated with significant epigenetic changes, specifically in intronic regions, which, when combined with gene expression profiling data, highlight key signaling pathways involved in pathological cardiac remodeling and heart contractile dysfunction.


NAR Cancer ◽  
2020 ◽  
Vol 2 (1) ◽  
Author(s):  
Julianne K David ◽  
Sean K Maden ◽  
Benjamin R Weeder ◽  
Reid F Thompson ◽  
Abhinav Nellore

Abstract This study probes the distribution of putatively cancer-specific junctions across a broad set of publicly available non-cancer human RNA sequencing (RNA-seq) datasets. We compared cancer and non-cancer RNA-seq data from The Cancer Genome Atlas (TCGA), the Genotype-Tissue Expression (GTEx) Project and the Sequence Read Archive. We found that (i) averaging across cancer types, 80.6% of exon–exon junctions thought to be cancer-specific based on comparison with tissue-matched samples (σ = 13.0%) are in fact present in other adult non-cancer tissues throughout the body; (ii) 30.8% of junctions not present in any GTEx or TCGA normal tissues are shared by multiple samples within at least one cancer type cohort, and 87.4% of these distinguish between different cancer types; and (iii) many of these junctions not found in GTEx or TCGA normal tissues (15.4% on average, σ = 2.4%) are also found in embryological and other developmentally associated cells. These findings refine the meaning of RNA splicing event novelty, particularly with respect to the human neoepitope repertoire. Ultimately, cancer-specific exon–exon junctions may have a substantial causal relationship with the biology of disease.


2020 ◽  
Vol 21 (24) ◽  
pp. 9378
Author(s):  
Yuzhe Sun ◽  
Min Xie ◽  
Zhou Xu ◽  
Koon Chuen Chan ◽  
Jia Yi Zhong ◽  
...  

Nitrogen fixation in soybean consumes a tremendous amount of energy, leading to substantial differences in energy metabolism and mitochondrial activities between nodules and uninoculated roots. While C-to-U RNA editing and intron splicing of mitochondrial transcripts are common in plant species, their roles in relation to nodule functions are still elusive. In this study, we performed RNA-seq to compare transcript profiles and RNA editing of mitochondrial genes in soybean nodules and roots. A total of 631 RNA editing sites were identified on mitochondrial transcripts, with 12% or 74 sites differentially edited among the transcripts isolated from nodules, stripped roots, and uninoculated roots. Eight out of these 74 differentially edited sites are located on the matR transcript, of which the degrees of RNA editing were the highest in the nodule sample. The degree of mitochondrial intron splicing was also examined. The splicing efficiencies of several introns in nodules and stripped roots were higher than in uninoculated roots. These include nad1 introns 2/3/4, nad4 intron 3, nad5 introns 2/3, cox2 intron 1, and ccmFc intron 1. A greater splicing efficiency of nad4 intron 1, a higher NAD4 protein abundance, and a reduction in supercomplex I + III2 were also observed in nodules, although the causal relationship between these observations requires further investigation.


2019 ◽  
Vol 40 (Supplement_1) ◽  
Author(s):  
F Mazzarotto ◽  
U Tayal ◽  
R Buchan ◽  
W Midwinter ◽  
A Wilk ◽  
...  

Abstract Background Dilated cardiomyopathy (DCM) is genetically heterogeneous, with >100 purported disease genes tested in clinical laboratories. However, many genes were originally identified based on candidate-gene studies that did not adequately account for background population variation. Here we define the frequency of rare variation in 2538 DCM patients across protein-coding regions of 56 commonly tested genes and compare this to both 912 confirmed healthy controls and a reference population of 60,706 individuals. Purpose To identify clinically interpretable genes robustly associated with dominant monogenic DCM. Methods We used the TruSight Cardio sequencing panel to evaluate the burden of rare variants in 56 putative DCM genes in 1040 DCM patients and 912 healthy volunteers processed with identical sequencing and bioinformatics pipelines. We further aggregated data from 1498 DCM patients sequenced in diagnostic laboratories and the ExAC database for replication and meta-analysis. Results Specific variant classes in TTN, DSP, MYH7 and LMNA were associated with DCM in all comparisons. Variants in BAG3, TNNT2, TPM1, NEXN and VCL were significantly enriched in specific patient subsets, with the last 3 genes likely contributing primarily to early-onset forms of DCM. Overall, rare variants in these 9 genes potentially explained 19–26% of cases. Whilst the absence of a significant excess in other genes cannot preclude a role in disease, such genes have limited diagnostic value since novel variants will be uninterpretable and therefore non-actionable, and their diagnostic yield is minimal. Conclusion In the largest sequenced DCM cohort yet described, we observe robust disease association only with a limited number of genes, highlighting their importance in DCM and translating into high interpretability in diagnostic testing. The other genes evaluated have limited value in diagnostic testing in DCM. These data will contribute to community gene curation efforts, and will reduce erroneous and inconclusive findings in diagnostic testing. Acknowledgement/Funding Wellcome Trust (107469/Z/15/Z), BHF (SP/10/10/28431), MRC (MR/M003191/1), Fondation Leducq (11-CVD01), Italian Ministry of Health (RF-2013-02356787)


2011 ◽  
Vol 7 (6) ◽  
pp. 896-898 ◽  
Author(s):  
Alison G. Scoville ◽  
Young Wha Lee ◽  
John H. Willis ◽  
John K. Kelly

Most natural populations display substantial genetic variation in behaviour, morphology, physiology, life history and the susceptibility to disease. A major challenge is to determine the contributions of individual loci to variation in complex traits. Quantitative trait locus (QTL) mapping has identified genomic regions affecting ecologically significant traits of many species. In nearly all cases, however, the importance of these QTLs to population variation remains unclear. In this paper, we apply a novel experimental method to parse the genetic variance of floral traits of the annual plant Mimulus guttatus into contributions of individual QTLs. We first use QTL-mapping to identify nine loci and then conduct a population-based breeding experiment to estimate V_Q, the genetic variance attributable to each QTL. We find that three QTLs with moderate effects explain up to one-third of the genetic variance in the natural population. Variation at these loci is probably maintained by some form of balancing selection. Notably, the largest effect QTLs were relatively minor in their contribution to heritability.


2015 ◽  
Vol 9S4 ◽  
pp. BBI.S29334 ◽  
Author(s):  
Jessica P. Hekman ◽  
Jennifer L Johnson ◽  
Anna V. Kukekova

Domesticated species occupy a special place in the human world due to their economic and cultural value. In the era of genomic research, domesticated species provide unique advantages for investigation of diseases and complex phenotypes. RNA sequencing, or RNA-seq, has recently emerged as a new approach for studying transcriptional activity of the whole genome, changing the focus from individual genes to gene networks. RNA-seq analysis in domesticated species may complement genome-wide association studies of complex traits with economic importance or direct relevance to biomedical research. However, RNA-seq studies are more challenging in domesticated species than in model organisms. These challenges are at least in part associated with the lack of quality genome assemblies for some domesticated species and the absence of genome assemblies for others. In this review, we discuss strategies for analyzing RNA-seq data, focusing particularly on questions and examples relevant to domesticated species.


2017 ◽  
Author(s):  
A. L. Richards ◽  
D. Watza ◽  
A. Findley ◽  
A. Alazizi ◽  
X. Wen ◽  
...  

Abstract Environmental perturbations have large effects on both organismal and cellular traits, including gene expression, but the extent to which the environment affects RNA processing remains largely uncharacterized. Recent studies have identified a large number of genetic variants associated with variation in RNA processing that also have an important role in complex traits; yet we do not know in which contexts the different underlying isoforms are used. Here, we comprehensively characterized changes in RNA processing events across 89 environments in five human cell types and identified 15,300 event shifts (FDR = 15%) comprised of eight event types in over 4,000 genes. Many of these changes occur consistently in the same direction across conditions, indicative of global regulation by trans factors. Accordingly, we demonstrate that environmental modulation of splicing factor binding predicts shifts in intron retention, and that binding of transcription factors predicts shifts in AFE usage in response to specific treatments. We validated the mechanism hypothesized for AFE in two independent datasets. Using ATAC-seq, we found altered binding of 64 factors in response to selenium at sites of AFE shift, including ELF2 and other factors in the ETS family. We also performed AFE QTL mapping in 373 individuals and found an enrichment for SNPs predicted to disrupt binding of the ELF2 factor. Together, these results demonstrate that RNA processing is dramatically changed in response to environmental perturbations through specific mechanisms regulated by trans factors. Author Summary Changes in a cell’s environment and genetic variation have been shown to impact gene expression. Here, we demonstrate that environmental perturbations also lead to extensive changes in alternative RNA processing across a large number of cellular environments that we investigated. These changes often occur in a non-random manner. For example, many treatments lead to increased intron retention and usage of the downstream first exon. We also show that the changes to first exon usage are likely dependent on changes in transcription factor binding. We provide support for this hypothesis by considering how first exon usage is affected by disruption of binding due to treatment with selenium. We further validate the role of a specific factor by considering the effect of genetic variation in its binding sites on first exon usage. These results help to shed light on the vast number of changes that occur in response to environmental stimuli and will likely aid in understanding the impact of compounds to which we are daily exposed.

