A statistical method for joint estimation of cis-eQTLs and parent-of-origin effects using an orthogonal framework with RNA-seq data

2019 ◽  
Author(s):  
Shirong Deng ◽  
Feifei Xiao

Abstract In the past few years, extensive effort has gone into the analysis of genome function, especially expression quantitative trait loci (eQTL), which offer promise for characterizing functional sequence variation and for understanding the basic processes of gene regulation. However, most eQTL mapping studies have not implemented models that allow for the non-equivalence of parental alleles, the so-called parent-of-origin effects (POEs); thus, the number and effects of imprinted genes remain important open questions. Imprinting is a type of POE in which the expression of certain genes depends on their allelic parent of origin, and imprinted genes are important contributors to phenotypic variation in conditions such as diabetes and many cancer types. In addition, multicollinearity is an important issue that arises when modeling multiple genetic effects. To address these challenges, we proposed a statistical framework that tests the main allelic effects of candidate eQTLs along with the POE, using an orthogonal model for RNA sequencing (RNA-seq) data. Using simulations, we demonstrated the desirable power and Type I error of the orthogonal model, which also achieved accurate estimation of the genetic effects and the over-dispersion of the RNA-seq data. These methods were applied to an existing HapMap project trio dataset to validate reported imprinted genes and to discover novel imprinted genes. Using the orthogonal method, we validated existing imprinted genes and discovered two novel imprinted genes with significant dominance effects. Author Summary In the past decades, an unprecedented wealth of knowledge has been accumulated about variation at the human DNA level. However, this DNA-level knowledge has not been sufficiently translated into an understanding of the mechanisms of human diseases. Gene expression quantitative trait locus (eQTL) mapping, which aims to explore the genetic basis of gene expression, is one of the most promising approaches to fill this gap.
Genomic imprinting is an important epigenetic phenomenon that contributes to phenotypic variation in human complex diseases and may explain some of the “hidden” heritable variability. Many imprinted genes are known to play important roles in human complex diseases such as diabetes, breast cancer and obesity. However, traditional eQTL mapping approaches do not allow for the detection of imprinting, which usually involves an imbalance in gene expression. In this study, we have demonstrated for the first time that the orthogonal statistical model can be applied to eQTL mapping with RNA sequencing (RNA-seq) data. We showed with simulated and real data that the orthogonal model outperformed the usual functional model for detecting main effects in most cases, which addressed the issue of confounding between the dominance and additive effects. Application of the statistical model to the HapMap data resulted in the discovery of potential eQTLs with imprinting and dominance effects on the expression of the RB1 and IGF1R genes. In summary, we developed a comprehensive framework for modeling imprinting effects in eQTL mapping by decomposing the effects into multiple genetic components. This study provides new insights into the statistical modeling of eQTL mapping with RNA-seq data, allowing uncorrelated estimation of the genetic effects, covariates and the over-dispersion parameter.
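The confounding between additive and dominance effects mentioned in this summary can be illustrated with a small sketch. It is not the authors' RNA-seq model: the genotype coding, the allele frequency `p`, and the helper names are assumptions chosen for illustration, and the orthogonalization shown is a plain frequency-weighted Gram-Schmidt step under Hardy-Weinberg proportions.

```python
# Illustrative only: genotype coding, allele frequency, and function names
# are assumptions, not taken from the paper.

def weighted_mean(x, w):
    """Mean of x under genotype frequencies w (w sums to 1)."""
    return sum(xi * wi for xi, wi in zip(x, w))

def weighted_cov(x, y, w):
    """Covariance of two design columns under genotype frequencies w."""
    mx, my = weighted_mean(x, w), weighted_mean(y, w)
    return sum(wi * (xi - mx) * (yi - my) for xi, yi, wi in zip(x, y, w))

def orthogonalize(columns, w):
    """Center each column, then remove its projection on earlier columns."""
    result = []
    for col in columns:
        mean = weighted_mean(col, w)
        new = [c - mean for c in col]
        for prev in result:
            beta = weighted_cov(new, prev, w) / weighted_cov(prev, prev, w)
            new = [n - beta * pv for n, pv in zip(new, prev)]
        result.append(new)
    return result

p = 0.8          # assumed frequency of allele A
q = 1 - p
# Ordered genotypes, maternal allele first: AA, Aa, aA, aa
freqs = [p * p, p * q, q * p, q * q]
additive = [1, 0, 0, -1]     # functional additive coding
dominance = [0, 1, 1, 0]     # heterozygote indicator
imprinting = [0, 1, -1, 0]   # contrasts the two ordered heterozygotes

add_o, dom_o, imp_o = orthogonalize([additive, dominance, imprinting], freqs)
print("functional cov(additive, dominance):", weighted_cov(additive, dominance, freqs))
print("orthogonal cov(additive, dominance):", weighted_cov(add_o, dom_o, freqs))
```

With unequal allele frequencies the functional additive and dominance columns are correlated, while the orthogonalized columns are not, which is the property that allows the effects to be estimated without mutual confounding.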

2018 ◽  
Author(s):  
Sahar V. Mozaffari ◽  
Michelle M. Stein ◽  
Kevin M. Magnaye ◽  
Dan L. Nicolae ◽  
Carole Ober

Abstract Genomic imprinting is the phenomenon that leads to silencing of one copy of a gene inherited from a specific parent. Mutations in imprinted regions have been implicated in diseases showing parent-of-origin effects. Identifying genes with evidence of parent-of-origin expression patterns in family studies allows the detection of more subtle imprinting. Here, we use allele-specific expression in lymphoblastoid cell lines from 306 Hutterites related in a single pedigree to provide formal evidence for parent-of-origin effects. We take advantage of phased genotype data to assign parent of origin to RNA-seq reads in individuals with gene expression data. Our approach identified known imprinted genes, two putative novel imprinted genes, and 14 genes with asymmetrical parent-of-origin gene expression. We used gene expression in peripheral blood leukocytes (PBLs) to validate our findings, and then confirmed imprinting control regions (ICRs) using DNA methylation levels in the PBLs. Author Summary Large-scale gene expression studies have identified known and novel imprinted genes through allele-specific expression without knowing the parental origins of each allele. Here, we take advantage of phased genotype data to assign parent of origin to RNA-seq reads in 306 individuals with gene expression data. We identified known imprinted genes as well as two novel imprinted genes in lymphoblastoid cell line gene expression. We used gene expression in PBLs to validate our findings, and DNA methylation levels in PBLs to confirm previously characterized imprinting control regions that could regulate these imprinted genes.


2020 ◽  
Vol 20 (2) ◽  
Author(s):  
Tyler Doughty ◽  
Eduard Kerkhoven

ABSTRACT Over the past decade, improvements in technology and methods have enabled rapid and relatively inexpensive generation of high-quality RNA-seq datasets. These datasets have been used to characterize gene expression for several yeast species and have provided systems-level insights for basic biology, biotechnology and medicine. Herein, we discuss new techniques that have emerged and existing techniques that enable analysts to extract information from multifactorial yeast RNA-seq datasets. Ultimately, this minireview seeks to inspire readers to query datasets, whether previously published or freshly obtained, with creative and diverse methods to discover and support novel hypotheses.


2017 ◽  
Author(s):  
Douglas R. Wilson ◽  
Wei Sun ◽  
Joseph G. Ibrahim

Abstract The study of gene expression quantitative trait loci (eQTL) is an effective approach to illuminate the functional roles of genetic variants. Computational methods have been developed for eQTL mapping using gene expression data from microarray or RNA-seq technology. Application of these methods for eQTL mapping in tumor tissues is problematic because tumor tissues are composed of both tumor and infiltrating normal cells (e.g. immune cells) and eQTL effects may vary between tumor and infiltrating normal cells. To address this challenge, we have developed a new method for eQTL mapping using RNA-seq data from tumor samples. Our method separately estimates the eQTL effects in tumor and infiltrating normal cells using both total expression and allele-specific expression (ASE). We demonstrate that our method controls type I error rate and has higher power than some alternative approaches. We applied our method to study RNA-seq data from The Cancer Genome Atlas and illustrated the similarities and differences of eQTL effects in tumor and normal cells.


2018 ◽  
Author(s):  
Koen Van Den Berge ◽  
Katharina Hembach ◽  
Charlotte Soneson ◽  
Simone Tiberi ◽  
Lieven Clement ◽  
...  

Gene expression is the fundamental level at which the results of various genetic and regulatory programs are observable. The measurement of transcriptome-wide gene expression has convincingly switched from microarrays to sequencing in a matter of years. RNA sequencing (RNA-seq) provides a quantitative and open system for profiling transcriptional outcomes on a large scale and therefore facilitates a large diversity of applications, including basic science studies as well as agricultural and clinical applications. In the past 10 years or so, much has been learned about the characteristics of RNA-seq datasets as well as the performance of the myriad methods developed. In this review, we give an overall view of the developments in RNA-seq data analysis, including experimental design, with an explicit focus on quantification of gene expression and statistical approaches for differential expression. We also highlight emerging data types, such as single-cell RNA-seq and gene expression profiling using long-read technologies.


2017 ◽  
Vol 7 (7) ◽  
pp. 2227-2234 ◽  
Author(s):  
Yasuaki Takada ◽  
Ryutaro Miyagi ◽  
Aya Takahashi ◽  
Toshinori Endo ◽  
Naoki Osada

Abstract Joint quantification of genetic and epigenetic effects on gene expression is important for understanding the establishment of complex gene regulation systems in living organisms. In particular, genomic imprinting and maternal effects play important roles in the developmental process of mammals and flowering plants. However, the influence of these effects on gene expression is difficult to quantify because they act simultaneously with cis-regulatory mutations. Here we propose a simple method to decompose cis-regulatory (i.e., allelic genotype), genomic imprinting [i.e., parent-of-origin (PO)], and maternal [i.e., maternal genotype (MG)] effects on allele-specific gene expression using RNA-seq data obtained from reciprocal crosses. We evaluated the efficiency of the method using a simulated dataset and applied the method to whole-body Drosophila and mouse trophoblast stem cell (TSC) and liver RNA-seq data. Consistent with previous studies, we found little evidence of PO and MG effects in adult Drosophila samples. In contrast, we identified dozens and hundreds of mouse genes with significant PO and MG effects, respectively. Interestingly, a similar number of genes with a significant PO effect were detected in mouse TSCs and livers, whereas more genes with a significant MG effect were observed in livers. Further application of this method will clarify how these three effects influence gene expression levels in different tissues and developmental stages, and provide novel insight into the evolution of gene expression regulation.
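The idea of separating cis-regulatory from parent-of-origin effects with reciprocal crosses can be sketched as follows. This is a deliberately simplified two-effect illustration, not the authors' method: it ignores the maternal genotype effect, uses made-up read counts, and assumes the allelic log ratio equals `cis + po` when allele A is maternal and `cis - po` when it is paternal. The function names and the pseudocount are hypothetical.

```python
from math import log2

def allelic_log_ratio(reads_a, reads_b, pseudocount=0.5):
    """log2 ratio of allele A to allele B reads; pseudocount avoids log(0)."""
    return log2((reads_a + pseudocount) / (reads_b + pseudocount))

def decompose_reciprocal(cross_ab, cross_ba, pseudocount=0.5):
    """Split allelic imbalance into cis and parent-of-origin components.

    cross_ab: (reads_A, reads_B) in F1 from mother A x father B
    cross_ba: (reads_A, reads_B) in F1 from mother B x father A
    Assumed model: ratio = cis + po when A is maternal, cis - po otherwise.
    """
    r_ab = allelic_log_ratio(*cross_ab, pseudocount=pseudocount)
    r_ba = allelic_log_ratio(*cross_ba, pseudocount=pseudocount)
    cis = (r_ab + r_ba) / 2               # shared by both cross directions
    parent_of_origin = (r_ab - r_ba) / 2  # flips sign with cross direction
    return cis, parent_of_origin

# Made-up counts: the maternal allele dominates in both directions.
cis, po = decompose_reciprocal((400, 100), (100, 400))
print(f"cis = {cis:.3f}, parent-of-origin = {po:.3f}")
```

A positive `parent_of_origin` value here means the maternally inherited allele is favored regardless of which strain contributed it, whereas a nonzero `cis` value means one strain's allele is favored regardless of its parent of origin.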


eLife ◽  
2015 ◽  
Vol 4 ◽  
Author(s):  
Jenny Tung ◽  
Xiang Zhou ◽  
Susan C Alberts ◽  
Matthew Stephens ◽  
Yoav Gilad

Primate evolution has been argued to result, in part, from changes in how genes are regulated. However, we still know little about gene regulation in natural primate populations. We conducted an RNA sequencing (RNA-seq)-based study of baboons from an intensively studied wild population. We performed complementary expression quantitative trait locus (eQTL) mapping and allele-specific expression analyses, discovering substantial evidence for, and surprising power to detect, genetic effects on gene expression levels in the baboons. eQTL were most likely to be identified for lineage-specific, rapidly evolving genes; interestingly, genes with eQTL significantly overlapped between baboons and a comparable human eQTL data set. Our results suggest that genes vary in their tolerance of genetic perturbation, and that this property may be conserved across species. Further, they establish the feasibility of eQTL mapping using RNA-seq data alone, and represent an important step towards understanding the genetic architecture of gene expression in primates.


2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Hao Rong ◽  
Wenjing Yang ◽  
Haotian Zhu ◽  
Bo Jiang ◽  
Jinjin Jiang ◽  
...  

Abstract Background Genomic imprinting results in the expression of parent-of-origin-specific alleles in the offspring. Brassica napus is an oil crop with research value for studying polyploidization. Identification of imprinted genes in B. napus will enrich the knowledge of genomic imprinting in dicotyledonous plants. Results In this study, we performed reciprocal crosses between B. napus L. cultivars Yangyou 6 (Y6) and Zhongshuang 11 (ZS11) to collect endosperm at 20 and 25 days after pollination (DAP) for RNA-seq. In total, we identified 297 imprinted genes, including 283 maternally expressed genes (MEGs) and 14 paternally expressed genes (PEGs), according to the SNPs between Y6 and ZS11. Only 36 genes (35 MEGs and 1 PEG) were continuously imprinted in 20 and 25 DAP endosperm. We found that 15, 2, 5, 3, 10, and 25 imprinted genes in this study were also imprinted in Arabidopsis, rice, castor bean, maize, B. rapa, and other B. napus lines, respectively. Only 26 imprinted genes were specifically expressed in endosperm, while other genes were also expressed in root, stem, leaf and flower bud of B. napus. A total of 109 imprinted genes were clustered on rapeseed chromosomes. We found that LTR/Copia transposable elements (TEs) were the most enriched both upstream and downstream of the imprinted genes, and that more TEs were enriched around imprinted genes than around non-imprinted genes. Moreover, the expression of 5 AGLs and 6 pectin-related genes in hybrid endosperm was significantly changed compared with that in parental endosperm. Conclusion This research provided a comprehensive identification of imprinted genes in B. napus and enriched the knowledge of gene imprinting in dicotyledonous plants, which would be useful in further research on how gene imprinting regulates seed development.


2021 ◽  
Vol 118 (29) ◽  
pp. e2104445118
Author(s):  
Jessica A. Rodrigues ◽  
Ping-Hung Hsieh ◽  
Deling Ruan ◽  
Toshiro Nishimura ◽  
Manoj K. Sharma ◽  
...  

Parent-of-origin–dependent gene expression in mammals and flowering plants results from differing chromatin imprints (genomic imprinting) between maternally and paternally inherited alleles. Imprinted gene expression in the endosperm of seeds is associated with localized hypomethylation of maternally but not paternally inherited DNA, with certain small RNAs also displaying parent-of-origin–specific expression. To understand the evolution of imprinting mechanisms in Oryza sativa (rice), we analyzed imprinting divergence among four cultivars that span both japonica and indica subspecies: Nipponbare, Kitaake, 93-11, and IR64. Most imprinted genes are imprinted across cultivars and enriched for functions in chromatin and transcriptional regulation, development, and signaling. However, 4 to 11% of imprinted genes display divergent imprinting. Analyses of DNA methylation and small RNAs revealed that endosperm-specific 24-nt small RNA–producing loci show weak RNA-directed DNA methylation, frequently overlap genes, and are imprinted four times more often than genes. However, imprinting divergence most often correlated with local DNA methylation epimutations (9 of 17 assessable loci), which were largely stable within subspecies. Small insertion/deletion events and transposable element insertions accompanied 4 of the 9 locally epimutated loci and associated with imprinting divergence at another 4 of the remaining 8 loci. Correlating epigenetic and genetic variation occurred at key regulatory regions—the promoter and transcription start site of maternally biased genes, and the promoter and gene body of paternally biased genes. Our results reinforce models for the role of maternal-specific DNA hypomethylation in imprinting of both maternally and paternally biased genes, and highlight the role of transposition and epimutation in rice imprinting evolution.


2014 ◽  
Author(s):  
Chris Harvey ◽  
Gregory A Moyebrailean ◽  
Omar Davis ◽  
Xiaoquan Wen ◽  
Francesca Luca ◽  
...  

Expression quantitative trait loci (eQTL) studies have discovered thousands of genetic variants that regulate gene expression, enabling a better understanding of the functional role of non-coding sequences. However, eQTL studies are costly, requiring large sample sizes and genome-wide genotyping of each sample. In contrast, analysis of allele specific expression (ASE) is becoming a popular approach to detect the effect of genetic variation on gene expression, even within a single individual. This is typically achieved by counting the number of RNA-seq reads matching each allele at heterozygous sites and testing the null hypothesis of a 1:1 allelic ratio. In principle, when genotype information is not readily available it could be inferred from the RNA-seq reads directly. However, there are currently no existing methods that jointly infer genotypes and conduct ASE inference, while considering uncertainty in the genotype calls. We present QuASAR, Quantitative Allele Specific Analysis of Reads, a novel statistical learning method for jointly detecting heterozygous genotypes and inferring ASE. The proposed ASE inference step takes into consideration the uncertainty in the genotype calls while including parameters that model base-call errors in sequencing and allelic over-dispersion. We validated our method with experimental data for which high quality genotypes are available. Results for an additional dataset with multiple replicates at different sequencing depths demonstrate that QuASAR is a powerful tool for ASE analysis when genotypes are not available.
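The basic ASE test described above, counting reads per allele at a heterozygous site and testing a 1:1 ratio, can be sketched with an exact two-sided binomial test. This is not QuASAR's implementation: it assumes the genotype is known to be heterozygous and ignores the genotype uncertainty, base-call errors, and over-dispersion that the abstract says QuASAR models. The function name and the read counts are hypothetical.

```python
from math import comb

def binomial_two_sided_p(ref_reads, alt_reads, p_null=0.5):
    """Exact two-sided binomial p-value for the reference-allele read count.

    Sums the probabilities of all outcomes that are no more likely than the
    observed one under the null allelic ratio p_null.
    """
    n = ref_reads + alt_reads
    probs = [comb(n, k) * p_null**k * (1 - p_null) ** (n - k) for k in range(n + 1)]
    observed = probs[ref_reads]
    # Small relative tolerance so ties are not lost to floating-point noise.
    p_value = sum(pr for pr in probs if pr <= observed * (1 + 1e-9))
    return min(1.0, p_value)

# Made-up read counts at two heterozygous sites.
print("balanced site   (10 vs 10):", binomial_two_sided_p(10, 10))
print("imbalanced site (20 vs 0): ", binomial_two_sided_p(20, 0))
```

A plain binomial test like this tends to be anti-conservative on real RNA-seq data because allelic counts are over-dispersed, which is one motivation for models with an explicit over-dispersion parameter.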


2018 ◽  
Author(s):  
Bharati Jadhav ◽  
Ramin Monajemi ◽  
Kristina K. Gagalova ◽  
Daniel Ho ◽  
Harmen H.M. Draisma ◽  
...  

Abstract Combining allelic analysis of RNA-Seq data with phased genotypes in family trios provides a powerful method to detect parent-of-origin biases in gene expression. We report findings in 296 family trios from two large studies: 165 lymphoblastoid cell lines from the 1000 Genomes Project, and 131 blood samples from the Genome of the Netherlands participants (GoNL). Based on parental haplotypes we identified >2.8 million transcribed heterozygous SNVs phased for parental origin, and developed a robust statistical framework for measuring allelic expression. We identified a total of 45 imprinted genes and one imprinted unannotated transcript, 17 of which have not previously been reported as showing parental expression bias. Multiple novel imprinted transcripts showing incomplete parental expression bias were located adjacent to known strongly imprinted genes. For example, PXDC1, a gene which lies adjacent to the paternally-expressed gene FAM50B, shows a 2:1 paternal expression bias. Other novel imprinted genes had promoter regions that coincide with sites of parentally-biased DNA methylation identified in blood from uniparental disomy (UPD) samples, thus providing independent validation of our results. Using the stranded nature of the RNA-Seq data in LCLs we identified multiple loci with overlapping sense/antisense transcripts, of which one is expressed paternally and the other maternally. Using a sliding window approach, we searched for imprinted expression across the entire genome, identifying a novel imprinted putative lncRNA in 13q21.2. Our methods and data provide a robust and high resolution map of imprinted gene expression in the human genome.
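Phasing a heterozygous SNV for parental origin in a trio can be sketched with simple Mendelian logic. This is a minimal illustration under assumed inputs (unphased genotypes as allele pairs and per-allele read counts), not the statistical framework the authors developed; the function names and the example counts are hypothetical.

```python
def assign_parent_of_origin(child, mother, father):
    """Return {'maternal': allele, 'paternal': allele} for a heterozygous child.

    Genotypes are pairs of alleles, e.g. ("A", "G"). Returns None when the
    child is homozygous, when both assignments are possible (for example all
    three individuals are heterozygous), or when no assignment is
    Mendelian-consistent.
    """
    a1, a2 = child
    if a1 == a2:
        return None  # homozygous sites carry no allele-specific signal
    candidates = []
    for maternal, paternal in ((a1, a2), (a2, a1)):
        if maternal in mother and paternal in father:
            candidates.append({"maternal": maternal, "paternal": paternal})
    return candidates[0] if len(candidates) == 1 else None

def paternal_fraction(allele_reads, origin):
    """Fraction of reads carrying the paternally inherited allele."""
    maternal = allele_reads.get(origin["maternal"], 0)
    paternal = allele_reads.get(origin["paternal"], 0)
    total = maternal + paternal
    return paternal / total if total else None

# Made-up trio and read counts at one transcribed SNV.
origin = assign_parent_of_origin(("A", "G"), mother=("A", "A"), father=("A", "G"))
print(origin)
print(paternal_fraction({"A": 30, "G": 60}, origin))
```

In this made-up example the paternal allele accounts for two thirds of the reads, the kind of 2:1 paternal bias the abstract describes as incomplete imprinting. Sites where every member of the trio is heterozygous stay unassigned, which is why haplotype-level phasing recovers more informative SNVs than per-site logic alone.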

