Massively Parallel Reporter Assays: Defining Functional Psychiatric Genetic Variants Across Biological Contexts

2021 ◽  
Vol 89 (1) ◽  
pp. 76-89 ◽  
Author(s):  
Bernard Mulvey ◽  
Tomás Lagunas ◽  
Joseph D. Dougherty

PLoS ONE ◽  
2019 ◽  
Vol 14 (6) ◽  
pp. e0218073 ◽  
Author(s):  
Rajiv Movva ◽  
Peyton Greenside ◽  
Georgi K. Marinov ◽  
Surag Nair ◽  
Avanti Shrikumar ◽  
...  

2018 ◽  
Author(s):  
Rajiv Movva ◽  
Peyton Greenside ◽  
Georgi K. Marinov ◽  
Surag Nair ◽  
Avanti Shrikumar ◽  
...  

Abstract: The relationship between noncoding DNA sequence and gene expression is not well understood. Massively parallel reporter assays (MPRAs), which quantify the regulatory activity of large libraries of DNA sequences in parallel, are a powerful approach to characterize this relationship. We present MPRA-DragoNN, a convolutional neural network (CNN)-based framework to predict and interpret the regulatory activity of DNA sequences as measured by MPRAs. While our method is generally applicable to a variety of MPRA designs, here we trained our model on the Sharpr-MPRA dataset that measures the activity of ~500,000 constructs tiling 15,720 regulatory regions in human K562 and HepG2 cell lines. MPRA-DragoNN predictions were moderately correlated (Spearman ρ = 0.28) with measured activity and were within range of replicate concordance of the assay. State-of-the-art model interpretation methods revealed high-resolution predictive regulatory sequence features that overlapped transcription factor (TF) binding motifs. We used the model to investigate the cell type and chromatin state preferences of predictive TF motifs. We explored the ability of our model to predict the allelic effects of regulatory variants in an independent MPRA experiment and to fine-map putative functional SNPs in loci associated with lipid traits. Our results suggest that interpretable deep learning models trained on MPRA data have the potential to reveal meaningful patterns in regulatory DNA sequences and prioritize regulatory genetic variants, especially as larger, higher-quality datasets are produced.


Author(s):  
Zikun Yang ◽  
Chen Wang ◽  
Stephanie Erjavec ◽  
Lynn Petukhova ◽  
Angela Christiano ◽  
...  

Abstract

Motivation: Predicting regulatory effects of genetic variants is a challenging but important problem in functional genomics. Given the relatively low sensitivity of functional assays, and the pervasiveness of class imbalance in functional genomic data, popular statistical prediction models can sharply underestimate the probability of a regulatory effect. We describe here the presence-only model (PO-EN), a type of semisupervised model, to predict regulatory effects of genetic variants at sequence-level resolution in a context of interest by integrating a large number of epigenetic features and massively parallel reporter assays (MPRAs).

Results: Using experimental data from a variety of MPRAs, we show that the presence-only model produces better-calibrated predicted probabilities and has increased accuracy relative to state-of-the-art prediction models. Furthermore, we show that the predictions based on pretrained PO-EN models are useful for prioritizing functional variants among candidate eQTLs and significant SNPs at GWAS loci. In particular, for the costimulatory locus, associated with multiple autoimmune diseases, we show evidence of a regulatory variant residing in an enhancer 24.4 kb downstream of CTLA4, with evidence from capture Hi-C of interaction with CTLA4. Furthermore, the risk allele of the regulatory variant is on the same risk-increasing haplotype as a functional coding variant in exon 1 of CTLA4, suggesting that the regulatory variant acts jointly with the coding variant, leading to increased disease risk.

Availability and implementation: The presence-only model is implemented in the R package ‘PO.EN’, freely available on CRAN. A vignette giving a detailed demonstration of the proposed PO-EN model can be found on GitHub at https://github.com/Iuliana-Ionita-Laza/PO.EN/

Supplementary information: Supplementary data are available at Bioinformatics online.


2019 ◽  
Vol 40 (9) ◽  
pp. 1299-1313 ◽  
Author(s):  
Anat Kreimer ◽  
Zhongxia Yan ◽  
Nadav Ahituv ◽  
Nir Yosef

2020 ◽  
Vol 44 (7) ◽  
pp. 785-794
Author(s):  
Dandi Qiao ◽  
Corwin M. Zigler ◽  
Michael H. Cho ◽  
Edwin K. Silverman ◽  
Xiaobo Zhou ◽  
...  

2019 ◽  
Vol 15 ◽  
pp. P628-P628
Author(s):  
Karen Nuytemans ◽  
Derek J. van Booven ◽  
Natalia K. Hofmann ◽  
Farid Rajabli ◽  
Anthony J. Griswold ◽  
...  

PLoS ONE ◽  
2016 ◽  
Vol 11 (10) ◽  
pp. e0164169 ◽  
Author(s):  
Geik Yong Ang ◽  
Choo Yee Yu ◽  
Vinothini Subramaniam ◽  
Mohd Ikhmal Hanif Abdul Khalid ◽  
Tuan Azlin Tuan Abdu Aziz ◽  
...  

2019 ◽  
Author(s):  
Daniel Esposito ◽  
Jochen Weile ◽  
Jay Shendure ◽  
Lea M Starita ◽  
Anthony T Papenfuss ◽  
...  

Abstract: Multiplex Assays of Variant Effect (MAVEs), such as deep mutational scans and massively parallel reporter assays, test thousands of sequence variants in a single experiment. Despite the importance of MAVE data for basic and clinical research, there is no standard resource for their discovery and distribution. Here we present MaveDB, a public repository for large-scale measurements of sequence variant impact, designed for interoperability with applications to interpret these datasets. We also describe the first of these applications, MaveVis, which retrieves, visualizes, and contextualizes variant effect maps. Together, the database and applications will empower the community to mine these powerful datasets.

