High throughput characterization of genetic effects on DNA:protein binding and gene transcription

2018 ◽  
Author(s):  
Cynthia A. Kalita ◽  
Christopher D. Brown ◽  
Andrew Freiman ◽  
Jenna Isherwood ◽  
Xiaoquan Wen ◽  
...  

Many variants associated with complex traits lie in non-coding regions and contribute to phenotypes by disrupting regulatory sequences. To characterize these variants, we developed a streamlined protocol for a high-throughput reporter assay, BiT-STARR-seq (Biallelic Targeted STARR-seq), that identifies allele-specific expression (ASE) while accounting for PCR duplicates through unique molecular identifiers. We tested 75,501 oligos (43,500 SNPs) and identified 2,720 SNPs with significant ASE (FDR 10%). To validate disruption of binding as one of the mechanisms underlying ASE, we developed a new high-throughput allele-specific binding assay for NFKB-p50. We identified 2,951 SNPs with allele-specific binding (ASB) (FDR 10%); 173 of these SNPs also had ASE (OR=1.97, p-value=0.0006). Among variants associated with complex traits, 1,531 resulted in ASE and 1,662 showed ASB. For example, we found that the Crohn’s disease risk variant rs3810936 increases NFKB binding and results in altered gene expression.
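At its core, an ASE call of this kind asks, per SNP, whether allele counts in the RNA output deviate from the allelic ratio of the input DNA library, and then controls the FDR across all SNPs. The sketch below is a simplified stand-in, not the authors' statistical model (the abstract does not specify it): it applies an exact two-sided binomial test to hypothetical UMI-deduplicated counts and then Benjamini-Hochberg adjustment at the 10% FDR quoted above. SNP names and counts are invented for illustration.

```python
from math import comb


def binom_two_sided(k: int, n: int, p: float) -> float:
    """Exact two-sided binomial test: total probability of outcomes no likelier than k."""
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = pmf[k]
    # Small relative tolerance so outcomes tied with the observed one are included.
    return min(1.0, sum(q for q in pmf if q <= observed * (1 + 1e-7)))


def benjamini_hochberg(pvals: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity of adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical UMI-deduplicated counts per SNP:
# (RNA molecules with ref allele, RNA molecules with alt allele, ref fraction in DNA input)
snps = {
    "snp_a": (60, 20, 0.5),
    "snp_b": (41, 39, 0.5),
    "snp_c": (30, 50, 0.5),
}
names = list(snps)
pvals = [binom_two_sided(ref, ref + alt, frac) for ref, alt, frac in snps.values()]
qvals = benjamini_hochberg(pvals)
ase_hits = [name for name, q in zip(names, qvals) if q < 0.10]  # FDR 10%
```

Counting molecules (UMIs) rather than reads is what keeps a single heavily amplified molecule from masquerading as strong allelic imbalance; a production analysis would typically also model overdispersion, for example with a beta-binomial.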

2014 ◽  
Author(s):  
Gregory A Moyerbrailean ◽  
Chris T Harvey ◽  
Cynthia A Kalita ◽  
Xiaoquan Wen ◽  
Francesca Luca ◽  
...  

Ongoing large-scale experimental efforts are crucial for identifying all regulatory sequences, yet we do not know which genetic variants in those regions are non-silent. Here, we present a novel analysis that integrates sequence and DNase I footprinting data for 653 samples to predict the impact of a sequence change on transcription factor binding for a panel of 1,372 motifs. Most genetic variants in footprints (5,810,227) show no evidence of allele-specific binding (ASB). In contrast, functional genetic variants predicted by our computational models are highly enriched for ASB (3,217 SNPs at 20% FDR). Compared with silent non-coding genetic variants, functional ones are 1.22-fold enriched for GWAS traits, have lower allele frequencies, and affect footprints that are more distal to promoters or active in fewer tissues. Finally, integrating these annotations into 18 GWAS meta-studies improves the identification of likely causal SNPs and of transcription factors relevant for complex traits.
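The sequence half of such a prediction is commonly a position weight matrix (PWM) log-odds score computed for each allele; the difference between the two scores indicates whether a variant is predicted to strengthen or weaken binding. The sketch below shows only that generic sequence component under a uniform background; it is not the authors' model, which also integrates DNase I footprinting data. The 3-position motif is hypothetical.

```python
import math

# Hypothetical 3-position frequency matrix; real motifs would come from a motif database.
PFM = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
]


def pwm_score(seq: str, pfm: list[dict[str, float]], background: float = 0.25) -> float:
    """Log-odds score: sum over positions of log2(motif probability / background probability)."""
    if len(seq) != len(pfm):
        raise ValueError("sequence and motif must have the same length")
    return sum(math.log2(col[base] / background) for col, base in zip(pfm, seq.upper()))


def allelic_delta(ref_seq: str, alt_seq: str, pfm: list[dict[str, float]]) -> float:
    """Change in motif score caused by the variant (negative = weaker predicted binding)."""
    return pwm_score(alt_seq, pfm) - pwm_score(ref_seq, pfm)


# The variant replaces the G at the motif's second position with an A.
delta = allelic_delta("AGC", "AAC", PFM)
```

Because only the variant position changes, the delta reduces to the log-ratio of the two alleles' probabilities at that position, here log2(0.1 / 0.7).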


2021 ◽  
Author(s):  
S. Sánchez-Ramírez ◽  
A. D. Cutter

Summary: Changes to regulatory sequences account for important phenotypic differences between species and populations. In heterozygous individuals, regulatory polymorphism typically manifests as allele-specific expression (ASE) of transcripts. ASE data from inter-species and inter-population hybrids, together with expression data from the parents, can be used to infer regulatory changes in cis and in trans throughout the genome. Improper data handling, however, can cause mapping bias and excessive loss of information; these problems tend to arise unintentionally from the cumbersome pipelines with multiple dependencies that are common among current methods. Here, we introduce CompMap, a new, self-contained method implemented in Python that generates allele-specific expression counts from alignments to each parental genotype. Rather than assessing individual SNPs, our approach sorts and counts reads within a given homologous region by comparing each read's mapping statistics between the two parental alignments. Reads that align ambiguously to both references are resolved either proportionally to the allele-specific read counts or statistically using a binomial distribution. Using simulations, we show that CompMap has low error rates in assessing regulatory divergence. Availability: The Python code, with examples and installation instructions, is available in the GitHub repository https://github.com/santiagosnchez/…
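The read-sorting idea can be sketched as follows. This is not CompMap's code: the abstract does not say which mapping statistic is compared, so the sketch assumes a per-read mismatch count against each parental reference (for example, the value of a SAM `NM` tag) and uses hypothetical function names. Reads that fit both parents equally well are split either proportionally or by an independent Bernoulli draw per read, which together amount to a binomial draw.

```python
import random


def count_by_parent(mismatches: dict[str, tuple[int, int]]) -> tuple[int, int, int]:
    """Tally reads that match parent A better, parent B better, or both equally (ambiguous).

    `mismatches` maps a read ID to (mismatches vs parent A, mismatches vs parent B).
    """
    a = b = ambiguous = 0
    for nm_a, nm_b in mismatches.values():
        if nm_a < nm_b:
            a += 1
        elif nm_b < nm_a:
            b += 1
        else:
            ambiguous += 1
    return a, b, ambiguous


def resolve_proportional(a: int, b: int, ambiguous: int) -> tuple[float, float]:
    """Split ambiguous reads in proportion to the unambiguous allele-specific counts."""
    frac_a = 0.5 if a + b == 0 else a / (a + b)
    return a + ambiguous * frac_a, b + ambiguous * (1 - frac_a)


def resolve_binomial(a: int, b: int, ambiguous: int, rng: random.Random) -> tuple[int, int]:
    """Assign each ambiguous read to parent A with probability a / (a + b)."""
    p = 0.5 if a + b == 0 else a / (a + b)
    extra_a = sum(rng.random() < p for _ in range(ambiguous))
    return a + extra_a, b + ambiguous - extra_a


# Hypothetical reads within one homologous region.
reads = {
    "r1": (0, 2), "r2": (0, 1), "r3": (1, 0), "r4": (0, 0),
    "r5": (2, 0), "r6": (1, 1), "r7": (0, 3), "r8": (0, 1),
}
a, b, amb = count_by_parent(reads)  # 4 reads favour A, 2 favour B, 2 are ambiguous
prop_a, prop_b = resolve_proportional(a, b, amb)
draw_a, draw_b = resolve_binomial(a, b, amb, random.Random(1))
```

The proportional rule yields fractional counts but is deterministic; the binomial rule keeps integer counts at the cost of run-to-run variation unless the random seed is fixed.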


2019 ◽  
Author(s):  
Jennifer L Asimit ◽  
Daniel B Rainbow ◽  
Mary D Fortune ◽  
Nastasiya F Grinberg ◽  
Linda S Wicker ◽  
...  

Thousands of genetic variants have been associated with human disease risk, but linkage disequilibrium (LD) hinders fine-mapping of the causal variants. We show that stepwise regression and, to a lesser extent, stochastic search fine-mapping can misidentify SNPs that jointly tag distinct causal variants as causal. Frequent sharing of causal variants between immune-mediated diseases (IMD) motivated us to develop a computationally efficient multinomial fine-mapping (MFM) approach that borrows information between diseases in a Bayesian framework. We show that MFM has greater accuracy than single-disease analysis when shared causal variants exist, and negligible loss of precision otherwise. Applying MFM to data from six IMD revealed causal variants undetected in individual disease analyses, including in IL2RA, where we confirm functional effects of multiple causal variants using allele-specific expression in sorted CD4+ T cells from genotype-selected individuals. MFM has the potential to increase fine-mapping resolution in related diseases, enabling the identification of associated cellular and molecular phenotypes.


2016 ◽  
Author(s):  
Elisha D Roberson

Unique Molecular Identifiers (UMIs) have been incorporated into RNA-Seq experiments to overcome problems with abundance estimation in samples that may have undergone many PCR amplification cycles. UMIs could also benefit many other types of sequencing experiments, including amplicon sequencing, ATAC-Seq, and ChIP-Seq. Furthermore, UMIs help overcome artifacts in high-coverage DNA-Seq and would enable more accurate RNA-Seq genotyping and allele-specific expression calculation. The main advantage of UMIs is that reads that are true PCR duplicates of one molecule can be distinguished from reads of distinct molecules that happen to share identical break points.
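The distinction drawn in the last sentence can be made concrete with a minimal deduplication sketch: reads are collapsed only when they share both the alignment break point and the UMI. The `Read` record and its fields are hypothetical, and the sketch uses exact UMI matching; dedicated tools such as UMI-tools additionally merge UMIs that differ only by sequencing errors.

```python
from typing import NamedTuple


class Read(NamedTuple):
    chrom: str
    start: int      # alignment break point (5' position)
    strand: str
    umi: str
    allele: str     # allele observed in the read, e.g. "ref" or "alt"


def deduplicate(reads: list[Read]) -> list[Read]:
    """Keep one read per (chrom, start, strand, UMI).

    Reads sharing all four are treated as PCR copies of one molecule; reads with the
    same break point but a different UMI are kept as distinct molecules.
    """
    seen: set[tuple[str, int, str, str]] = set()
    unique = []
    for read in reads:
        key = (read.chrom, read.start, read.strand, read.umi)
        if key not in seen:
            seen.add(key)
            unique.append(read)
    return unique


reads = [
    Read("chr1", 1000, "+", "ACGTAC", "ref"),
    Read("chr1", 1000, "+", "ACGTAC", "ref"),  # PCR duplicate of the read above
    Read("chr1", 1000, "+", "TTGCAA", "alt"),  # same break point, different molecule
]
molecules = deduplicate(reads)
```

Without UMIs, position-only deduplication would discard the third read and bias the allele counts toward `ref`, which is exactly the allele-specific expression artifact the abstract mentions.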


2019 ◽  
Author(s):  
Romuald Laso-Jadart ◽  
Kevin Sugier ◽  
Emmanuelle Petit ◽  
Karine Labadie ◽  
Pierre Peterlongo ◽  
...  

Allele-specific expression (ASE) is a widely studied molecular mechanism at the cell, tissue and organism levels. Here, we extended the concept of ASE to the population scale (psASE), aggregating ASE detected at smaller scales. We developed a novel approach to detect psASE based on metagenomic and metatranscriptomic data from environmental samples containing communities of organisms. This approach, which measures the deviation between the frequency and the relative expression of biallelic loci, was applied to samples collected during the Tara Oceans expedition (2009-2013), in combination with new transcriptomes of Oithona similis, a widespread marine copepod. Among a total of 25,768 single nucleotide variants (SNVs) of O. similis, 587 (2.3%) were targeted by psASE in at least one population. The distribution of SNVs targeted by psASE across populations is significantly shaped by population genomic differentiation (p-value = 9.3×10−9), supporting partial genetic control of psASE. To investigate the link between evolution and psASE, we compared loci under selection with loci under psASE. A significant proportion of SNVs (0.6%) were targeted by both selection and psASE (p-values < 9.89×10−3), supporting the hypothesis that natural selection and ASE may lead to the same phenotype. Population-scale ASE offers new insights into the control of gene regulation in populations and its link with natural selection.
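The abstract defines psASE as a deviation between an allele's frequency (from metagenomic reads) and its relative expression (from metatranscriptomic reads) but does not name the test used. As one simple way to frame that comparison, the sketch below computes the deviation and a two-sided Fisher exact test on the 2x2 table of DNA and RNA allele counts. The test choice, function names and counts are assumptions for illustration, not the authors' method.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact test for the 2x2 table [[a, b], [c, d]] with fixed margins."""
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x: int) -> float:
        # Hypergeometric probability that the top-left cell equals x.
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    observed = prob(a)
    lo, hi = max(0, row1 - (n - col1)), min(row1, col1)
    return min(1.0, sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= observed * (1 + 1e-7)))


def psase_test(dna_ref: int, dna_alt: int, rna_ref: int, rna_alt: int) -> tuple[float, float]:
    """Return (alt fraction in RNA minus alt fraction in DNA, Fisher p-value) for one SNV."""
    deviation = rna_alt / (rna_ref + rna_alt) - dna_alt / (dna_ref + dna_alt)
    return deviation, fisher_exact_two_sided(dna_ref, dna_alt, rna_ref, rna_alt)


# Hypothetical pooled read counts for one SNV in one population sample.
dev, p = psase_test(dna_ref=50, dna_alt=50, rna_ref=20, rna_alt=80)
```

A positive deviation means the alternative allele is over-represented in transcripts relative to its frequency in the population; the sign and size of the deviation carry the biology, while the p-value only says whether the difference exceeds sampling noise.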


2019 ◽  
Vol 20 (1) ◽  
Author(s):  
Jing Xie ◽  
Tieming Ji ◽  
Marco A. R. Ferreira ◽  
Yahan Li ◽  
Bhaumik N. Patel ◽  
...  

Background: High-throughput sequencing experiments, which can determine allele origins, have been used to assess genome-wide allele-specific expression. Despite the amount of data generated by high-throughput experiments, statistical methods are often too simplistic to capture the complexity of gene expression. Specifically, existing methods do not test, separately and simultaneously, the allele-specific expression (ASE) of a gene as a whole and the variation in ASE across exons within a gene. Results: We propose a generalized linear mixed model to close these gaps, incorporating variation due to genes, single nucleotide polymorphisms (SNPs), and biological replicates. To improve the reliability of statistical inferences, we assign priors to each effect in the model so that information is shared across genes in the entire genome. We use Bayesian model selection to test the hypothesis of ASE for each gene and of variation across SNPs within a gene. We apply our method to four tissue types in a bovine study to detect ASE genes de novo in the bovine genome, and uncover intriguing predictions of regulatory ASE across gene exons and across tissue types. We compared our method with competing approaches through simulation studies that mimicked the real datasets. The R package BLMRM, which implements our proposed algorithm, is publicly available for download at https://github.com/JingXieMIZZOU/BLMRM. Conclusions: We show that the proposed method exhibits improved control of the false discovery rate and improved power over existing methods when SNP variation and biological variation are present. In addition, our method has low computational requirements, allowing whole-genome analysis.


2007 ◽  
Vol 282 (46) ◽  
pp. 33336-33345 ◽  
Author(s):  
Mario Renda ◽  
Ilaria Baglivo ◽  
Bonnie Burgess-Beusse ◽  
Sabrina Esposito ◽  
Roberto Fattorusso ◽  
...  

The DNA-binding protein CTCF (CCCTC binding factor) mediates enhancer-blocking insulation at sites throughout the genome and plays an important role in regulating allele-specific expression at the Igf2/H19 locus and at other imprinted loci. Evidence is also accumulating that CTCF is involved in the large-scale organization of genomic chromatin. Although CTCF has 11 zinc fingers, we show here that only 4 of these are essential for strong binding and that they recognize a core 12-bp DNA sequence common to most CTCF sites. By deleting individual fingers and mutating individual sites, we determined the orientation of binding. Furthermore, we were able to identify the specific finger, and its point of DNA interaction, responsible for the loss of CTCF binding when CpG residues are methylated in the imprinted Igf2/H19 locus. This single interaction appears to be critical for allele-specific binding and insulation by CTCF.


2018 ◽  
Author(s):  
Min Wang ◽  
Timothy P Hancock ◽  
Amanda J. Chamberlain ◽  
Christy J. Vander Jagt ◽  
Jennie E Pryce ◽  
...  

Background: Topological association domains (TADs) are chromosomal domains characterised by frequent internal DNA-DNA interactions. The transcription factor CTCF binds to conserved DNA sequence patterns called CTCF binding motifs to either prohibit or facilitate chromosomal interactions. TADs and CTCF binding motifs control gene expression, but they are not yet well defined in the bovine genome. In this paper, we sought to improve the annotation of bovine TADs and CTCF binding motifs, and to assess whether the new annotation can reduce the search space for cis-regulatory variants. Results: We used genomic synteny to map TADs and CTCF binding motifs from humans, mice, dogs and macaques to the bovine genome. We found that our mapped TADs exhibited the same hallmark properties as those sourced from experimental data, such as housekeeping genes, tRNA genes, CTCF binding motifs, SINEs, H3K4me3 and H3K27ac. We then showed that runs of genes with the same pattern of allele-specific expression (ASE) (favouring either the paternal or the maternal allele) were often located in the same TAD or between the same conserved CTCF binding motifs. Analyses of variance showed that, when averaged across all bovine tissues tested, TADs explained 14% of ASE variation (standard deviation, SD: 0.056), while CTCF explained 27% (SD: 0.078). Furthermore, we showed that quantitative trait loci (QTLs) associated with gene expression variation (eQTLs) or ASE variation (aseQTLs), identified from mRNA transcripts of white blood cells and milk cells from 141 lactating cows, were highly enriched at putative bovine CTCF binding motifs. The most significant aseQTL and eQTL for each genic target were located within the same TAD as the gene more often than expected (chi-squared test P-value ≤ 0.001). Conclusions: Our results suggest that genomic synteny can be used to functionally annotate conserved transcriptional components, and provides a tool to reduce the search space for causative regulatory variants in the bovine genome.
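A statement such as "TADs explained 14% of ASE variation" is typically a proportion of variance explained from an analysis of variance. The sketch below computes one common version, eta squared from a one-way ANOVA (between-group sum of squares over total sum of squares), with genes grouped by the TAD that contains them. The authors' exact ANOVA specification is not given in the abstract, so this definition, the function name and the per-gene ASE values are assumptions.

```python
def variance_explained(values: list[float], groups: list[str]) -> float:
    """Eta squared from a one-way ANOVA: between-group sum of squares / total sum of squares."""
    n = len(values)
    grand_mean = sum(values) / n
    ss_total = sum((v - grand_mean) ** 2 for v in values)
    by_group: dict[str, list[float]] = {}
    for v, g in zip(values, groups):
        by_group.setdefault(g, []).append(v)
    # Each group contributes its size times the squared distance of its mean from the grand mean.
    ss_between = sum(
        len(vs) * (sum(vs) / len(vs) - grand_mean) ** 2 for vs in by_group.values()
    )
    return ss_between / ss_total


# Hypothetical per-gene ASE ratios (maternal fraction), labelled by the TAD containing each gene.
ase = [0.70, 0.65, 0.72, 0.40, 0.35, 0.45, 0.55]
tad = ["tad1", "tad1", "tad1", "tad2", "tad2", "tad2", "tad3"]
share = variance_explained(ase, tad)
```

Swapping the TAD labels for labels that mark the interval between consecutive conserved CTCF binding motifs gives the corresponding CTCF figure, which makes the two groupings directly comparable.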


Genetics ◽  
1995 ◽  
Vol 140 (4) ◽  
pp. 1389-1406 ◽  
Author(s):  
G I Patterson ◽  
K M Kubo ◽  
T Shroyer ◽  
V L Chandler

The b gene encodes a transcriptional regulator of the maize anthocyanin biosynthetic pathway. Certain b alleles participate in paramutation, an allele-specific interaction that heritably alters transcription. The moderately transcribed B' allele heritably reduces the transcription of the highly transcribed B-I allele in a B'/B-I heterozygote, such that the B-I allele becomes B'. To identify the cis-acting sequences required for paramutation, we used B' or B-I alleles to isolate intragenic recombinants with B-Peru, an allele that is insensitive to paramutation and has distinct tissue-specific regulation. Physical mapping of the recombinant alleles showed that most of the crossovers were in a small region near the 5' end of the b-transcribed region. Analysis of the recombinant alleles revealed that the ability to cause and respond to paramutation and the control of tissue-specific expression both localize to the 5' region of the gene. The 3' boundary of these functions lies just upstream of the translation initiation codon. The 5' boundary has been estimated to be no more than 0.1 cM further upstream (1-150 kb). Thus, sequences critical for paramutation lie upstream of the b coding sequences and may include transcriptional regulatory sequences.

