Feasibility of predicting allele specific expression from DNA sequencing using machine learning

2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Zhenhua Zhang ◽  
Freerk van Dijk ◽  
Niek de Klein ◽  
Mariëlle E van Gijn ◽  
Lude H Franke ◽  
...  

Abstract. Allele specific expression (ASE) concerns divergent expression quantity of alternative alleles and is measured by RNA sequencing. Multiple studies show that ASE plays a role in hereditary diseases by modulating penetrance or phenotype severity. However, genome diagnostics is based on DNA sequencing and therefore neglects gene expression regulation such as ASE. To take advantage of ASE in the absence of RNA sequencing, it must be predicted using only DNA variation. We have constructed ASE models from BIOS (n = 3432) and GTEx (n = 369) that predict ASE using DNA features. These models are highly reproducible and comprise many different feature types, highlighting the complex regulation that underlies ASE. We applied the BIOS-trained model to population variants in three genes in which ASE plays a clinically relevant role: BRCA2, RET and NF1. This resulted in predicted ASE effects for 27 variants, of which 10 were known pathogenic variants. We demonstrated that ASE can be predicted from DNA features using machine learning. Future efforts may improve sensitivity and translate these models into a new type of genome diagnostic tool that prioritizes candidate pathogenic variants or regulators thereof for follow-up validation by RNA sequencing. All code and machine learning models used are available at GitHub and Zenodo.

2009 ◽  
Vol 25 (24) ◽  
pp. 3207-3212 ◽  
Author(s):  
Jacob F. Degner ◽  
John C. Marioni ◽  
Athma A. Pai ◽  
Joseph K. Pickrell ◽  
Everlyne Nkadori ◽  
...  

2015 ◽  
Author(s):  
David A Knowles ◽  
Joe R Davis ◽  
Anil Raj ◽  
Xiaowei Zhu ◽  
James B Potash ◽  
...  

The impact of environment on human health is dramatic, with major risk factors including substance use, diet and exercise. However, identifying interactions between the environment and an individual's genetic background (GxE) has been hampered by statistical and computational challenges. By combining RNA sequencing of whole blood and extensive environmental annotations collected from 922 individuals, we have evaluated GxE interactions at a cellular level. We have developed EAGLE, a hierarchical Bayesian model for identifying GxE interactions based on association between environment and allele-specific expression (ASE). EAGLE increases power by leveraging the controlled, within-sample comparison of environmental impact on different genetic backgrounds provided by ASE, while also taking into account technical covariates and over-dispersion of sequencing read counts. EAGLE identifies 35 GxE interactions, a substantial increase over standard GxE testing. Among EAGLE hits are variants that modulate response to smoking, exercise and blood pressure medication. Further, application of EAGLE identifies GxE interactions to infection response that replicate results reported in vitro, demonstrating the power of EAGLE to accurately identify GxE candidates from large RNA sequencing studies.


PLoS Genetics ◽  
2020 ◽  
Vol 16 (5) ◽  
pp. e1008786 ◽  
Author(s):  
Jiaxin Fan ◽  
Jian Hu ◽  
Chenyi Xue ◽  
Hanrui Zhang ◽  
Katalin Susztak ◽  
...  

2016 ◽  
Vol 32 (21) ◽  
pp. 3291-3297 ◽  
Author(s):  
Zhi Liu ◽  
Tuantuan Gui ◽  
Zhen Wang ◽  
Hong Li ◽  
Yunhe Fu ◽  
...  

Author(s):  
Asia Mendelevich ◽  
Svetlana Vinogradova ◽  
Saumya Gupta ◽  
Andrey A. Mironov ◽  
Shamil Sunyaev ◽  
...  

RNA sequencing and other experimental methods that produce large amounts of data are increasingly dominant in molecular biology. However, the noise properties of these techniques have not been fully understood. We assessed the reproducibility of allele-specific expression measurements by conducting replicate sequencing experiments from the same RNA sample. Surprisingly, variation in the estimates of allelic imbalance (AI) between technical replicates was up to 7-fold higher than expected from commonly applied noise models. We show that AI overdispersion varies substantially between replicates and between experimental series, appears to arise during the construction of sequencing libraries, and can be measured by comparing technical replicates. We demonstrate that compensation for AI overdispersion greatly reduces technical variation and enables reliable differential analysis of allele-specific expression across samples and across experiments. Conversely, not taking AI overdispersion into account can lead to a substantial number of false positives in analysis of allele-specific gene expression.
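The idea of measuring AI overdispersion from technical replicates can be illustrated with a rough sketch. This is not the authors' procedure: the function name `overdispersion_ratio`, the input layout (per-gene pairs of read counts for two alleles), and the simple variance-ratio estimator are all assumptions made for illustration. Under pure binomial sampling, the squared difference in allelic fraction between two replicates has expectation p(1-p)(1/n1 + 1/n2), so an average ratio well above 1 suggests extra technical noise.

```python
from statistics import mean

def overdispersion_ratio(rep1, rep2):
    """Hypothetical helper: compare replicate-to-replicate variation in
    allelic fraction with the variation expected from binomial sampling.

    rep1, rep2: lists of (allele_a_reads, allele_b_reads), one pair per gene,
    in the same gene order for both technical replicates.
    Returns the mean observed/expected ratio (about 1 under binomial noise).
    """
    ratios = []
    for (a1, b1), (a2, b2) in zip(rep1, rep2):
        n1, n2 = a1 + b1, a2 + b2
        if n1 == 0 or n2 == 0:
            continue  # gene not covered in one replicate
        # Pooled estimate of the allelic fraction across both replicates.
        pooled = (a1 + a2) / (n1 + n2)
        # Binomial expectation for the variance of the fraction difference.
        expected_var = pooled * (1 - pooled) * (1 / n1 + 1 / n2)
        if expected_var == 0:
            continue  # fully monoallelic counts carry no information here
        observed_sq = (a1 / n1 - a2 / n2) ** 2
        ratios.append(observed_sq / expected_var)
    return mean(ratios) if ratios else float("nan")
```

A ratio estimated this way could be used to widen the noise model before testing for differential allelic imbalance; how the compensation is actually done in the study is not specified in the abstract.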


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
M. Joseph Tomlinson ◽  
Shawn W. Polson ◽  
Jing Qiu ◽  
Juniper A. Lake ◽  
William Lee ◽  
...  

Abstract. Differential abundance of allelic transcripts in a diploid organism, commonly referred to as allele specific expression (ASE), is a biologically significant phenomenon and can be examined using single nucleotide polymorphisms (SNPs) from RNA-seq. Quantifying ASE aids in our ability to identify and understand cis-regulatory mechanisms that influence gene expression, and thereby assists in identifying causal mutations. This study examines ASE in breast muscle, abdominal fat, and liver of commercial broiler chickens using variants called from a large subset of the samples (n = 68). ASE analysis was performed using a custom software tool called VCF ASE Detection Tool (VADT), which detects ASE of biallelic SNPs using a binomial test. On average ~174,000 SNPs in each tissue passed our filtering criteria and were considered informative, of which ~24,000 (~14%) showed ASE. Of all ASE SNPs, only 3.7% exhibited ASE in all three tissues, with ~83% showing ASE specific to a single tissue. When ASE genes (genes containing ASE SNPs) were compared between tissues, the overlap among all three tissues increased to 20.1%. Our results indicate that ASE genes show tissue-specific enrichment patterns, but all three tissues showed enrichment for pathways involved in translation.
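A binomial test on the read counts of a biallelic SNP can be sketched as follows. This is a minimal illustration, not VADT's code: the function name `binomial_two_sided_p`, the null allelic ratio of 0.5, and the exact two-sided definition (summing all outcomes no more likely than the observed one) are assumptions. A real pipeline would also need read-depth filters and multiple-testing correction, which are omitted here.

```python
from math import comb

def binomial_two_sided_p(ref_reads: int, alt_reads: int, p_null: float = 0.5) -> float:
    """Hypothetical helper: exact two-sided binomial p-value for allelic
    imbalance at one heterozygous biallelic SNP.

    ref_reads, alt_reads: RNA-seq reads supporting each allele.
    p_null: expected reference-allele fraction without ASE (assumed 0.5).
    """
    n = ref_reads + alt_reads
    if n == 0:
        return 1.0  # no reads, no evidence of imbalance
    # Probability of every possible reference-read count under the null.
    pmf = [comb(n, k) * p_null**k * (1 - p_null) ** (n - k) for k in range(n + 1)]
    observed = pmf[ref_reads]
    # Sum all outcomes at least as unlikely as the observed count;
    # the small tolerance guards against floating-point ties.
    p_value = sum(p for p in pmf if p <= observed * (1 + 1e-9))
    return min(1.0, p_value)

# Example: balanced counts versus fully monoallelic counts at depth 20.
balanced = binomial_two_sided_p(10, 10)
monoallelic = binomial_two_sided_p(20, 0)
```

A SNP would be called as showing ASE when its p-value falls below a chosen threshold after correction across all informative SNPs; the threshold used in the study is not stated in the abstract.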


Genetics ◽  
2013 ◽  
Vol 195 (3) ◽  
pp. 1157-1166 ◽  
Author(s):  
Sandrine Lagarrigue ◽  
Lisa Martin ◽  
Farhad Hormozdiari ◽  
Pierre-François Roux ◽  
Calvin Pan ◽  
...  
