MPRAnalyze - A statistical framework for Massively Parallel Reporter Assays

2019 ◽  
Author(s):  
Tal Ashuach ◽  
David Sebastian Fischer ◽  
Anat Kreimer ◽  
Nadav Ahituv ◽  
Fabian Theis ◽  
...  

Abstract: Massively parallel reporter assays (MPRAs) are a technique that enables testing thousands of regulatory DNA sequences and their variants in a single, quantitative experiment. Despite their growing popularity, there is a lack of statistical methods that account for the different sources of uncertainty inherent to these assays and thus effectively leverage their promise. Development of such methods could help enhance our ability to identify regulatory sequences in the genome, understand their function under various settings, and ultimately gain a better understanding of how the regulatory code and its alteration lead to phenotypic consequences. Here we present MPRAnalyze: a statistical framework dedicated to analyzing MPRA count data. MPRAnalyze addresses the major questions that are posed in the context of MPRA experiments: estimating the magnitude of the effect of a regulatory sequence in a single condition setting, and comparing differential activity of regulatory sequences across multiple conditions. The framework uses a nested construction of generalized linear models to account for uncertainty in both DNA and RNA observations, controls for various sources of unwanted variation, and incorporates negative controls for robust hypothesis testing, thereby providing clear quantitative answers in complex experimental settings. We demonstrate the robustness, accuracy and applicability of MPRAnalyze on simulated data and published data sets and compare it against the existing analysis methodologies. MPRAnalyze is implemented as an R package and is publicly available through Bioconductor [1].
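The idea of scoring a sequence against negative controls can be illustrated with a deliberately simplified sketch. This is not MPRAnalyze's nested GLM or its R API. It assumes a naive activity estimate (total RNA over total DNA counts) and a robust z-score based on the median and the median absolute deviation (MAD) of the controls. The function names `activity` and `mad_zscore` and all counts are hypothetical.

```python
import statistics


def activity(rna_counts, dna_counts):
    """Naive activity estimate: total RNA count divided by total DNA count."""
    return sum(rna_counts) / sum(dna_counts)


def mad_zscore(value, control_values):
    """Robust z-score of `value` relative to negative-control activities."""
    med = statistics.median(control_values)
    mad = statistics.median([abs(v - med) for v in control_values])
    if mad == 0:
        raise ValueError("controls have zero MAD; z-score is undefined")
    # 1.4826 rescales the MAD so it is comparable to a standard
    # deviation when the controls are roughly normally distributed.
    return (value - med) / (1.4826 * mad)


# Hypothetical counts for one candidate sequence (three barcodes).
candidate = activity(rna_counts=[30, 50, 40], dna_counts=[20, 20, 20])

# Hypothetical activities of five negative-control sequences.
controls = [0.9, 1.0, 1.1, 1.0, 0.8]

z = mad_zscore(candidate, controls)
print(f"activity={candidate:.2f}, robust z-score={z:.2f}")
```

A ratio of summed counts ignores the count-level uncertainty that the abstract says the framework models, so the sketch only illustrates the role of the negative controls.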


2018 ◽  
Author(s):  
Rajiv Movva ◽  
Peyton Greenside ◽  
Georgi K. Marinov ◽  
Surag Nair ◽  
Avanti Shrikumar ◽  
...  

Abstract: The relationship between noncoding DNA sequence and gene expression is not well understood. Massively parallel reporter assays (MPRAs), which quantify the regulatory activity of large libraries of DNA sequences in parallel, are a powerful approach to characterize this relationship. We present MPRA-DragoNN, a convolutional neural network (CNN)-based framework to predict and interpret the regulatory activity of DNA sequences as measured by MPRAs. While our method is generally applicable to a variety of MPRA designs, here we trained our model on the Sharpr-MPRA dataset that measures the activity of ~500,000 constructs tiling 15,720 regulatory regions in human K562 and HepG2 cell lines. MPRA-DragoNN predictions were moderately correlated (Spearman ρ = 0.28) with measured activity and were within range of replicate concordance of the assay. State-of-the-art model interpretation methods revealed high-resolution predictive regulatory sequence features that overlapped transcription factor (TF) binding motifs. We used the model to investigate the cell type and chromatin state preferences of predictive TF motifs. We explored the ability of our model to predict the allelic effects of regulatory variants in an independent MPRA experiment and fine-map putative functional SNPs in loci associated with lipid traits. Our results suggest that interpretable deep learning models trained on MPRA data have the potential to reveal meaningful patterns in regulatory DNA sequences and prioritize regulatory genetic variants, especially as larger, higher-quality datasets are produced.



PLoS ONE ◽  
2019 ◽  
Vol 14 (6) ◽  
pp. e0218073 ◽  
Author(s):  
Rajiv Movva ◽  
Peyton Greenside ◽  
Georgi K. Marinov ◽  
Surag Nair ◽  
Avanti Shrikumar ◽  
...  


Author(s):  
Ingrid Lönnstedt ◽  
Rebecca Rimini ◽  
Peter Nilsson

In the exploding field of gene expression techniques such as DNA microarrays, there are still few general probabilistic methods for analysis of variance. Linear models and ANOVA are heavily used tools in many other disciplines of scientific research. The usual F-statistic is unsatisfactory for microarray data, which explore many thousands of genes in parallel, with few replicates. We present three potential one-way ANOVA statistics in a parametric statistical framework. The aim is to separate genes that are differently regulated across several treatment conditions from those with equal regulation. The statistics have different features and are evaluated using both real and simulated data. Our statistic B1 generally shows the best performance, and is extended for use in an algorithm that groups cell lines by equal expression levels for each gene. An extension is also outlined for more general ANOVA tests including several factors. The methods presented are implemented in the freely available statistical language R. They are available at http://www.math.uu.se/staff/pages/?uname=ingrid.
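For reference, the "usual F-statistic" that the abstract contrasts its proposals with can be computed per gene as in the minimal sketch below. It implements only the classical one-way ANOVA statistic, not the B1 statistic or the other statistics proposed by the authors. The function name `one_way_f` and the expression values are hypothetical.

```python
def one_way_f(groups):
    """Classical one-way ANOVA F-statistic for a list of groups of values."""
    k = len(groups)                         # number of treatment conditions
    n = sum(len(g) for g in groups)         # total number of observations
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]

    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    # Within-group sum of squares: spread of replicates around their group mean.
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, means) for x in g)

    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical log-expression values for one gene:
# three conditions with three replicates each.
gene = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
print(one_way_f(gene))
```

With only a few replicates per condition, the within-group term in the denominator is estimated from very few values, which is one commonly cited reason why the plain F-statistic behaves poorly when applied gene by gene.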



BMC Genomics ◽  
2019 ◽  
Vol 20 (1) ◽  
Author(s):  
Leslie Myint ◽  
Dimitrios G. Avramopoulos ◽  
Loyal A. Goff ◽  
Kasper D. Hansen


eLife ◽  
2018 ◽  
Vol 7 ◽  
Author(s):  
Talitha L Forcier ◽  
Andalus Ayaz ◽  
Manraj S Gill ◽  
Daniel Jones ◽  
Rob Phillips ◽  
...  

Gene expression in all organisms is controlled by cooperative interactions between DNA-bound transcription factors (TFs), but quantitatively measuring TF-DNA and TF-TF interactions remains difficult. Here we introduce a strategy for precisely measuring the Gibbs free energy of such interactions in living cells. This strategy centers on the measurement and modeling of ‘allelic manifolds’, a multidimensional generalization of the classical genetics concept of allelic series. Allelic manifolds are measured using reporter assays performed on strategically designed cis-regulatory sequences. Quantitative biophysical models are then fit to the resulting data. We used this strategy to study regulation by two Escherichia coli TFs, CRP and σ70 RNA polymerase. Doing so, we consistently obtained energetic measurements precise to ∼0.1 kcal/mol. We also obtained multiple results that deviate from the prior literature. Our strategy is compatible with massively parallel reporter assays in both prokaryotes and eukaryotes, and should therefore be highly scalable and broadly applicable. Editorial note: This article has been through an editorial process in which the authors decide how to respond to the issues raised during peer review. The Reviewing Editor's assessment is that minor issues remain unresolved (see decision letter).



2018 ◽  
Author(s):  
Talitha Forcier ◽  
Andalus Ayaz ◽  
Manraj S. Gill ◽  
Daniel Jones ◽  
Rob Phillips ◽  
...  

Abstract: Gene expression in all organisms is controlled by cooperative interactions between DNA-bound transcription factors (TFs), but quantitatively measuring TF-DNA and TF-TF interactions remains difficult. Here we introduce a strategy for precisely measuring the Gibbs free energy of such interactions in living cells. This strategy centers on the measurement and modeling of “allelic manifolds”, a multidimensional generalization of the classical genetics concept of allelic series. Allelic manifolds are measured using reporter assays performed on strategically designed cis-regulatory sequences. Quantitative biophysical models are then fit to the resulting data. We used this strategy to study regulation by two Escherichia coli TFs, CRP and σ70 RNA polymerase. Doing so, we consistently obtained energetic measurements precise to ~0.1 kcal/mol. We also obtained multiple results that deviate from the prior literature. Our strategy is compatible with massively parallel reporter assays in both prokaryotes and eukaryotes, and should therefore be highly scalable and broadly applicable.
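To give a sense of scale for a precision of about 0.1 kcal/mol, the sketch below converts a free energy into the corresponding Boltzmann factor, exp(-ΔG / RT). This is generic thermodynamics, not the authors' fitting procedure. The temperature of 310.15 K (37 °C) is an assumption made for illustration, and `boltzmann_factor` is a hypothetical helper name.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def boltzmann_factor(delta_g_kcal, temp_kelvin=310.15):
    """Boltzmann factor exp(-dG / RT) for a free energy in kcal/mol.

    The default temperature (37 degrees C) is an assumption for illustration.
    """
    return math.exp(-delta_g_kcal / (R_KCAL * temp_kelvin))


# Fold change in statistical weight associated with a 0.1 kcal/mol shift.
fold_change = boltzmann_factor(-0.1)
print(f"{fold_change:.3f}")
```

Under these assumptions, a 0.1 kcal/mol difference corresponds to a change of under 20% in the statistical weight of a bound state.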



2019 ◽  
Vol 20 (1) ◽  
Author(s):  
Tal Ashuach ◽  
David S. Fischer ◽  
Anat Kreimer ◽  
Nadav Ahituv ◽  
Fabian J. Theis ◽  
...  


2017 ◽  
Author(s):  
Leslie Myint ◽  
Dimitrios G. Avramopoulos ◽  
Loyal A. Goff ◽  
Kasper D. Hansen

Abstract: Massively parallel reporter assays (MPRAs) have emerged as a popular means for understanding noncoding variation in a variety of conditions. While a large number of experiments have been described in the literature, analysis typically uses ad-hoc methods. There has been little attention to comparing performance of methods across datasets. We present the mpralm method, which we show is calibrated and powerful by analyzing its performance on multiple MPRA datasets. We show that it outperforms existing statistical methods for analysis of this data type, in the first comprehensive evaluation of statistical methods on several datasets. We investigate theoretical and real-data properties of barcode summarization methods and show an unappreciated impact of summarization method for some datasets. Finally, we use our model to conduct a power analysis for this assay and show substantial improvements in power by performing up to 6 replicates per condition, whereas sequencing depth has a smaller impact; we recommend always using at least 4 replicates. Together, these results inform recommendations for differential analysis, general group comparisons, and power analysis and will help improve design and analysis of MPRA experiments. An R package is available from the Bioconductor project at https://bioconductor.org/packages/mpra.
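The sketch below shows how the choice of barcode summarization can change an activity estimate. It contrasts two simple strategies: summing counts over barcodes before taking a log-ratio, and averaging per-barcode log-ratios. It is not the mpra package's API or the mpralm model. The function names, the pseudocount of 1 and the counts are assumptions chosen for illustration.

```python
import math


def aggregate_log_ratio(rna, dna, pseudocount=1.0):
    """Sum counts over barcodes first, then take one log2 RNA/DNA ratio."""
    return math.log2((sum(rna) + pseudocount) / (sum(dna) + pseudocount))


def average_log_ratio(rna, dna, pseudocount=1.0):
    """Take a log2 RNA/DNA ratio per barcode, then average the ratios."""
    ratios = [math.log2((r + pseudocount) / (d + pseudocount))
              for r, d in zip(rna, dna)]
    return sum(ratios) / len(ratios)


# Hypothetical counts for three barcodes tagging the same element.
rna = [7, 31, 1]
dna = [3, 7, 3]

print(f"aggregate: {aggregate_log_ratio(rna, dna):.3f}")
print(f"average:   {average_log_ratio(rna, dna):.3f}")
```

In this toy example the two summaries differ by nearly one log2 unit because the high-count barcode dominates the summed estimate, while each barcode contributes equally to the averaged one.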



2021 ◽  
Author(s):  
Eeshit Dhaval Vaishnav ◽  
Carl G. de Boer ◽  
Moran Yassour ◽  
Jennifer Molinet ◽  
Lin Fan ◽  
...  

Mutations in non-coding cis-regulatory DNA sequences can alter gene expression, organismal phenotype, and fitness. Fitness landscapes, which map DNA sequence to organismal fitness, are a long-standing goal in biology, but have remained elusive because it is challenging to generalize accurately to the vast space of possible sequences using models built on measurements from a limited number of endogenous regulatory sequences. Here, we construct a sequence-to-expression model for such a landscape and use it to decipher principles of cis-regulatory evolution. Using tens of millions of randomly sampled promoter DNA sequences and their measured expression levels in the yeast Saccharomyces cerevisiae, we construct a deep transformer neural network model that generalizes with exceptional accuracy, and enables sequence design for gene expression engineering. Using our model, we predict and experimentally validate expression divergence under random genetic drift and strong-selection weak-mutation regimes, show that conflicting expression objectives in different environments constrain expression adaptation, and find that stabilizing selection on gene expression leads to the moderation of regulatory complexity. We present an approach for detecting selective constraint on gene expression using our model and natural sequence variation, and validate it using observed cis-regulatory diversity across 1,011 yeast strains, cross-species RNA-seq from three different clades, and measured expression-to-fitness curves. Finally, we develop a characterization of regulatory evolvability, use it to visualize fitness landscapes in two dimensions, discover evolvability archetypes, quantify the mutational robustness of individual sequences and highlight the mutational robustness of extant natural regulatory sequence populations. Our work provides a general framework that addresses key questions in the evolution of cis-regulatory sequences.



2015 ◽  
Author(s):  
Ilias Georgakopoulos-Soares ◽  
Naman Jain ◽  
Jesse Gray ◽  
Martin Hemberg

DNA regulatory elements contain short motifs where transcription factors (TFs) can bind to modulate gene expression. Although the broad principles of TF regulation are well understood, the rules that dictate how combinatorial TF binding translates into transcriptional activity remain largely unknown. With the rapid advances in DNA synthesis and sequencing technologies and the continuing decline in the associated costs, high-throughput experiments can be performed to investigate the regulatory role of thousands of oligonucleotide sequences simultaneously. Nevertheless, designing high-throughput reporter assay experiments such as Massively Parallel Reporter Assays (MPRAs) and similar methods remains challenging. We introduce MPRAnator, a set of tools that facilitate rapid design of MPRA experiments. With MPRA Motif design, a set of variables provides fine control of how motifs are placed into sequences, thereby allowing the user to investigate the rules that govern TF occupancy. MPRA SNP design can be used to investigate the functional effects of single SNPs or combinations of SNPs at regulatory sequences. Finally, the Transmutation tool allows for the design of negative controls by permitting scrambling, reversing, complementing or introducing multiple random mutations in the input sequences or motifs.
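The four kinds of negative-control transformation named for the Transmutation tool can be sketched in a few lines. This is not MPRAnator's code or interface. All function names are hypothetical, and the sketch assumes sequences contain only the upper-case bases A, C, G and T.

```python
import random

# Translation table mapping each base to its complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse(seq):
    """Return the sequence read in the opposite direction."""
    return seq[::-1]


def complement(seq):
    """Return the base-wise complement (without reversing)."""
    return seq.translate(COMPLEMENT)


def scramble(seq, rng):
    """Shuffle the bases, preserving base composition but not order."""
    bases = list(seq)
    rng.shuffle(bases)
    return "".join(bases)


def mutate(seq, n_mutations, rng):
    """Substitute `n_mutations` distinct positions with a different base."""
    bases = list(seq)
    for pos in rng.sample(range(len(bases)), n_mutations):
        bases[pos] = rng.choice([b for b in "ACGT" if b != bases[pos]])
    return "".join(bases)


rng = random.Random(0)  # fixed seed so the controls are reproducible
motif = "TGACGTCA"      # hypothetical input motif
for name, control in [("reverse", reverse(motif)),
                      ("complement", complement(motif)),
                      ("scramble", scramble(motif, rng)),
                      ("mutate", mutate(motif, 3, rng))]:
    print(f"{name:10s} {control}")
```

Passing an explicit `random.Random` instance keeps a designed control set reproducible, which is convenient when a library has to be regenerated later.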


