Improved haplotype inference by exploiting long-range linking and allelic imbalance in RNA-seq datasets

2020 ◽  
Vol 11 (1) ◽  
Author(s):  
Emily Berger ◽  
Deniz Yorukoglu ◽  
Lillian Zhang ◽  
Sarah K. Nyquist ◽  
Alex K. Shalek ◽  
...  

Abstract Haplotype reconstruction of distant genetic variants remains an unsolved problem due to the short read length of common sequencing data. Here, we introduce HapTree-X, a probabilistic framework that utilizes latent long-range information to reconstruct unspecified haplotypes in diploid and polyploid organisms. It builds on the observation that differential allele-specific expression can link genetic variants from the same physical chromosome, which makes even reads that cover only individual variants usable for phasing. We demonstrate HapTree-X’s feasibility on in-house sequenced Genome in a Bottle RNA-seq and various whole exome, genome, and 10X Genomics datasets. HapTree-X produces more complete phases (up to 25%), even in clinically important genes, and phases more variants than other methods while maintaining similar or higher accuracy and being up to 10× faster than other tools. The advantage of HapTree-X’s ability to use multiple lines of evidence, as well as to phase polyploid genomes in a single integrative framework, grows substantially as the amount of diverse data increases.
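The idea that allelic imbalance can link two variants without any read spanning both can be illustrated with a small likelihood comparison. The sketch below is not HapTree-X's model; it is a simplified illustration under stated assumptions: two heterozygous SNPs in the same gene, a diploid sample, and a hypothetical, fixed `bias` parameter giving the fraction of transcripts that come from the more highly expressed haplotype. The function name `phase_log_odds` and its inputs are invented for this example.

```python
from math import exp, log


def _log_sum_exp(a: float, b: float) -> float:
    """Numerically stable log(exp(a) + exp(b))."""
    m = max(a, b)
    return m + log(exp(a - m) + exp(b - m))


def _loglik(counts, ref_on_high: bool, bias: float) -> float:
    """Log-likelihood of (ref, alt) read counts at one SNP.

    The binomial coefficient is omitted because it cancels in the odds.
    """
    ref, alt = counts
    p_ref = bias if ref_on_high else 1.0 - bias
    return ref * log(p_ref) + alt * log(1.0 - p_ref)


def phase_log_odds(counts_a, counts_b, bias: float = 0.7) -> float:
    """Log odds that the two reference alleles sit on the same haplotype.

    counts_a, counts_b: (ref_reads, alt_reads) at two heterozygous SNPs.
    bias: assumed expression fraction of the higher-expressed haplotype
          (hypothetical parameter; must lie strictly between 0.5 and 1).
    Positive values favour cis (ref-ref), negative values favour trans.
    """
    a_hi = _loglik(counts_a, True, bias)
    a_lo = _loglik(counts_a, False, bias)
    b_hi = _loglik(counts_b, True, bias)
    b_lo = _loglik(counts_b, False, bias)
    # Marginalise over which haplotype is the highly expressed one
    # (equal prior, which cancels in the ratio).
    cis = _log_sum_exp(a_hi + b_hi, a_lo + b_lo)
    trans = _log_sum_exp(a_hi + b_lo, a_lo + b_hi)
    return cis - trans
```

When both SNPs show imbalance in the same direction, the log odds are positive; when the imbalance is in opposite directions, they are negative; and balanced counts carry no phasing information in this toy model.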

2019 ◽  
Vol 47 (21) ◽  
pp. e136-e136
Author(s):  
Natalia Blay ◽  
Eduard Casas ◽  
Iván Galván-Femenía ◽  
Jan Graffelman ◽  
Rafael de Cid ◽  
...  

Abstract Analysis of RNA sequencing (RNA-seq) data from related individuals is widely used in clinical and molecular genetics studies. Prediction of kinship from RNA-seq data would be useful for confirming the expected relationships in family based studies and for highlighting samples from related individuals in case-control or population based studies. Currently, reconstruction of pedigrees is largely based on SNPs or microsatellites, obtained from genotyping arrays, whole genome sequencing and whole exome sequencing. Potential problems with using RNA-seq data for kinship detection are the low proportion of the genome that it covers, the highly skewed coverage of exons of different genes depending on expression level and allele-specific expression. In this study we assess the use of RNA-seq data to detect kinship between individuals, through pairwise identity by descent (IBD) estimates. First, we obtained high quality SNPs after successive filters to minimize the effects due to allelic imbalance as well as errors in sequencing, mapping and genotyping. Then, we used these SNPs to calculate pairwise IBD estimates. By analysing both real and simulated RNA-seq data we show that it is possible to identify up to second degree relationships using RNA-seq data of even low to moderate sequencing depth.


2020 ◽  
Author(s):  
Elena Vigorito ◽  
Wei-Yu Lin ◽  
Colin Starr ◽  
Paul DW Kirk ◽  
Simon R White ◽  
...  

Abstract Available methods to detect molecular quantitative trait loci (QTL) require study individuals to be genotyped. Here, we describe BaseQTL, a Bayesian method that exploits allele-specific expression to map molecular QTL from sequencing reads even when no genotypes are available. When used with genotypes, BaseQTL has lower error rates and increased power compared with existing QTL mapping methods. Running without genotypes limits how many tests can be performed, but due to the proximity of QTL variants to gene bodies, the 2.8% of variants within a 100 kb window that could be tested contained 26% of QTL variants detectable with genotypes. eQTL effect estimates were invariably consistent between analyses performed with and without genotypes. Often, sequencing data may be generated in the absence of genotypes on patients and controls in differential expression studies, and we identified an apparent psoriasis-specific effect for GSTP1 in one such dataset, providing new insights into disease-dependent gene regulation.


2014 ◽  
Author(s):  
Patrick Deelen ◽  
Daria Zhernakova ◽  
Mark de Haan ◽  
Marijke van der Sijde ◽  
Marc Jan Bonder ◽  
...  

Given increasing numbers of RNA-seq samples in the public domain, we studied to what extent expression quantitative trait loci (eQTLs) and allele-specific expression (ASE) can be identified in public RNA-seq data while also deriving the genotypes from the RNA-seq reads. 4,978 human RNA-seq runs, representing many different tissues and cell-types, passed quality control. Even though these data originated from many different laboratories, samples reflecting the same cell-type clustered together, suggesting that technical biases due to different sequencing protocols were limited. We derived genotypes from the RNA-seq reads and imputed non-coding variants. In a joint analysis on 1,262 samples combined, we identified cis-eQTL effects for 8,034 unique genes. Additionally, we observed strong ASE effects for 34 rare pathogenic variants, corroborating previously observed effects on the corresponding protein levels. Given the exponential growth of the number of publicly available RNA-seq samples, we expect this approach will become relevant for studying tissue-specific effects of rare pathogenic genetic variants.


2019 ◽  
Author(s):  
Natalia Blay ◽  
Eduard Casas ◽  
Iván Galván-Femenía ◽  
Jan Graffelman ◽  
Rafael de Cid ◽  
...  

Abstract Analysis of RNA sequencing (RNA-seq) data from related individuals is widely used in clinical and molecular genetics studies. Sample labelling mistakes are estimated to affect more than 4% of published samples. Therefore, as a method of data quality control, a way to reconstruct pedigrees from RNA-seq data would be useful for confirming the expected relationships. Currently, reconstruction of pedigrees is based mainly on SNPs or microsatellites, obtained from genotyping arrays, whole genome sequencing and whole exome sequencing. Potential problems with using RNA-seq data for kinship detection are the low proportion of the genome that it covers, the highly skewed coverage of exons of different genes depending on expression level and allele-specific expression. In this study we assess the use of RNA-seq data to detect kinship between individuals, through pairwise identity-by-descent (IBD) estimates. First, we obtained high quality SNPs after successive filters to minimize the effects due to allelic imbalance as well as errors in sequencing, mapping and genotyping. Then, we used these SNPs to calculate pairwise IBD estimates. By analysing both real and simulated RNA-seq data we show that it is possible to identify up to second degree relationships using RNA-seq data of even low to moderate sequencing depth.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
M. Joseph Tomlinson ◽  
Shawn W. Polson ◽  
Jing Qiu ◽  
Juniper A. Lake ◽  
William Lee ◽  
...  

Abstract Differential abundance of allelic transcripts in a diploid organism, commonly referred to as allele-specific expression (ASE), is a biologically significant phenomenon and can be examined using single nucleotide polymorphisms (SNPs) from RNA-seq. Quantifying ASE aids in our ability to identify and understand cis-regulatory mechanisms that influence gene expression, and thereby assists in identifying causal mutations. This study examines ASE in breast muscle, abdominal fat, and liver of commercial broiler chickens using variants called from a large subset of the samples (n = 68). ASE analysis was performed using a custom software tool called VCF ASE Detection Tool (VADT), which detects ASE of biallelic SNPs using a binomial test. On average ~174,000 SNPs in each tissue passed our filtering criteria and were considered informative, of which ~24,000 (~14%) showed ASE. Of all ASE SNPs, only 3.7% exhibited ASE in all three tissues, with ~83% showing ASE specific to a single tissue. When ASE genes (genes containing ASE SNPs) were compared between tissues, the overlap among all three tissues increased to 20.1%. Our results indicate that ASE genes show tissue-specific enrichment patterns, but all three tissues showed enrichment for pathways involved in translation.
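A binomial test for ASE at a single biallelic SNP can be sketched as follows. This is a minimal illustration and not VADT's implementation: the function name `binomial_two_sided_p`, the null allelic ratio of 0.5, and the exact two-sided definition used here (summing the probabilities of all outcomes no more likely than the observed one) are assumptions made for the example. It uses only the standard library and is intended for modest read depths.

```python
from math import comb


def binomial_two_sided_p(ref_count: int, alt_count: int, p: float = 0.5) -> float:
    """Exact two-sided binomial p-value for allelic read counts at one SNP.

    ref_count, alt_count: reads supporting the reference / alternate allele.
    p: expected reference-allele fraction under the null (assumed 0.5).
    """
    n = ref_count + alt_count
    if n == 0:
        return 1.0  # no reads, no evidence of imbalance
    # Probability of every possible reference-read count 0..n under the null.
    pmf = [comb(n, k) * p**k * (1.0 - p) ** (n - k) for k in range(n + 1)]
    observed = pmf[ref_count]
    # Sum all outcomes that are at most as likely as the observed one;
    # the small relative tolerance guards against floating-point ties.
    total = sum(q for q in pmf if q <= observed * (1.0 + 1e-9))
    return min(1.0, total)
```

A SNP with 20 reference reads and 0 alternate reads gives a p-value of 2 × 0.5^20, while a perfectly balanced 10/10 split gives 1. In a genome-wide scan, p-values from many SNPs would also need multiple-testing correction, which this sketch does not cover.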


2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Asia Mendelevich ◽  
Svetlana Vinogradova ◽  
Saumya Gupta ◽  
Andrey A. Mironov ◽  
Shamil R. Sunyaev ◽  
...  

Abstract A sensitive approach to quantitative analysis of transcriptional regulation in diploid organisms is analysis of allelic imbalance (AI) in RNA sequencing (RNA-seq) data. A near-universal practice in such studies is to prepare and sequence only one library per RNA sample. We present theoretical and experimental evidence that data from a single RNA-seq library is insufficient for reliable quantification of the contribution of technical noise to the observed AI signal; consequently, reliance on one-replicate experimental design can lead to unaccounted-for variation in error rates in allele-specific analysis. We develop a computational approach, Qllelic, that accurately accounts for technical noise by making use of replicate RNA-seq libraries. Testing on new and existing datasets shows that application of Qllelic greatly decreases false positive rate in allele-specific analysis while conserving appropriate signal, and thus greatly improves reproducibility of AI estimates. We explore sources of technical overdispersion in observed AI signal and conclude by discussing design of RNA-seq studies addressing two biologically important questions: quantification of transcriptome-wide AI in one sample, and differential analysis of allele-specific expression between samples.


Genetics ◽  
2013 ◽  
Vol 195 (3) ◽  
pp. 1157-1166 ◽  
Author(s):  
Sandrine Lagarrigue ◽  
Lisa Martin ◽  
Farhad Hormozdiari ◽  
Pierre-François Roux ◽  
Calvin Pan ◽  
...  

2018 ◽  
Author(s):  
Emad Bahrami-Samani ◽  
Yi Xing

Abstract Gene expression is tightly regulated at the post-transcriptional level through splicing, transport, translation, and decay. RNA-binding proteins (RBPs) play key roles in post-transcriptional gene regulation, and genetic variants that alter RBP-RNA interactions can affect gene products and functions. We developed a computational method, ASPRIN (Allele-Specific Protein-RNA Interaction), that uses a joint analysis of CLIP-seq (cross-linking and immunoprecipitation followed by high-throughput sequencing) and RNA-seq data to identify genetic variants that alter RBP-RNA interactions by directly observing the allelic preference of RBPs in CLIP-seq experiments as compared to RNA-seq. We used ASPRIN to systematically analyze CLIP-seq and RNA-seq data for 166 RBPs in two ENCODE (Encyclopedia of DNA Elements) cell lines. ASPRIN identified genetic variants that alter RBP-RNA interactions by modifying RBP binding motifs within RNA. Moreover, through an integrative ASPRIN analysis with population-scale RNA-seq data, we showed that ASPRIN can help reveal potential causal variants that affect alternative splicing via allele-specific protein-RNA interactions.


2020 ◽  
Author(s):  
Ioan Filip ◽  
Rose Orenbuch ◽  
Junfei Zhao ◽  
Gulam Manji ◽  
Evangelina López de Maturana ◽  
...  

Abstract Efficient presentation of aberrant peptide fragments by the human leukocyte antigen class I (HLA-I) genes is necessary for immune detection and killing of cancer cells. Patient HLA-I genotypes are known to impact the efficacy of cancer immunotherapy, and the somatic loss of HLA-I heterozygosity has been established as a factor in immune evasion. While global deregulated expression of HLA-I has been reported in different tumor types, the role of HLA-I allele-specific expression loss – that is, the preferential RNA expression loss of specific HLA-I alleles – has not been fully characterized in cancer. In the present study, we quantified HLA-I allele-specific expression (ASE) across eleven TCGA tumor types using a novel method from input RNA and whole-exome sequencing data. Allele-specific loss in at least one of the three HLA-I genes (ASE loss) was pervasive and associated with worse overall survival across tumor types, including pancreatic adenocarcinomas, prostate carcinomas and glioblastomas, among others. In particular, our analysis shows that detection of neoantigens with binding affinity to the specific HLA-I genes subject to ASE loss was a top prognostic indicator of overall survival. Additionally, we found that ASE loss hindered immunotherapy in retrospective analyses. Together, these results highlight the prevalence of HLA-I ASE loss – a previously uncharacterized phenomenon in cancer – and provide initial evidence of its clinical significance in cancer prognosis and immunotherapy treatment.


2019 ◽  
Vol 116 (12) ◽  
pp. 5653-5658 ◽  
Author(s):  
Lin Shao ◽  
Feng Xing ◽  
Conghao Xu ◽  
Qinghua Zhang ◽  
Jian Che ◽  
...  

Utilization of heterosis has greatly increased the productivity of many crops worldwide. Although tremendous progress has been made in characterizing the genetic basis of heterosis using genomic technologies, molecular mechanisms underlying the genetic components are much less understood. Allele-specific expression (ASE), or imbalance between the expression levels of two parental alleles in the hybrid, has been suggested as a mechanism of heterosis. Here, we performed a genome-wide analysis of ASE by comparing the read ratios of the parental alleles in RNA-sequencing data of an elite rice hybrid and its parents using three tissues from plants grown under four conditions. The analysis identified a total of 3,270 genes showing ASE (ASEGs) in various ways, which can be classified into two patterns: consistent ASEGs such that the ASE was biased toward one parental allele in all tissues/conditions, and inconsistent ASEGs such that ASE was found in some but not all tissues/conditions, including direction-shifting ASEGs in which the ASE was biased toward one parental allele in some tissues/conditions while toward the other parental allele in other tissues/conditions. The results suggested that these patterns may have distinct implications in the genetic basis of heterosis: The consistent ASEGs may cause partial to full dominance effects on the traits that they regulate, and direction-shifting ASEGs may cause overdominance. We also showed that ASEGs were significantly enriched in genomic regions that were differentially selected during rice breeding. These ASEGs provide an index of the genes for future pursuit of the genetic and molecular mechanism of heterosis.
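The ASEG patterns described in this abstract can be expressed as a small classification rule. The sketch below is an illustration only and assumes that ASE has already been called per tissue/condition; the function name `classify_aseg` and the encoding of each call as `"P1"`, `"P2"` (the favoured parental allele) or `None` (no ASE detected) are hypothetical choices for this example, not the authors' pipeline.

```python
from typing import Optional, Sequence


def classify_aseg(calls: Sequence[Optional[str]]) -> str:
    """Classify a gene from its per-tissue/condition ASE calls.

    calls: one entry per tissue/condition; "P1" or "P2" names the parental
           allele favoured by ASE, None means no ASE was detected there.
    """
    directions = {c for c in calls if c is not None}
    if not directions:
        return "no ASE"
    if len(directions) > 1:
        # Biased toward one parent in some conditions and the other elsewhere;
        # the abstract treats this as a subtype of inconsistent ASEGs.
        return "direction-shifting"
    if all(c is not None for c in calls):
        # Same parental allele favoured in every tissue/condition.
        return "consistent"
    # ASE in some but not all tissues/conditions, always in one direction.
    return "inconsistent"
```

For example, `classify_aseg(["P1", "P1", "P1"])` yields `"consistent"`, while `classify_aseg(["P1", None, "P2"])` yields `"direction-shifting"`.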

