Regulatory DNA in A. thaliana can tolerate high levels of sequence divergence

2017 ◽  
Author(s):  
C.M. Alexandre ◽  
J.R. Urton ◽  
K. Jean-Baptiste ◽  
M.W. Dorrity ◽  
J.C. Cuperus ◽  
...  

ABSTRACT Variation in regulatory DNA is thought to drive evolution. Cross-species comparisons of regulatory DNA have provided evidence for both weak purifying selection and substantial turnover in regulatory regions. However, disruption of transcription factor binding sites can affect the expression of neighboring genes. Thus, the base-pair level functional annotation of regulatory DNA has proven challenging. Here, we explore regulatory DNA variation and its functional consequences in genetically diverse strains of the plant Arabidopsis thaliana, which largely maintain the positional homology of regulatory DNA. Using chromatin accessibility to delineate regulatory DNA genome-wide, we find that 15% of approximately 50,000 regulatory sites varied in accessibility among strains. Some of these accessibility differences are associated with extensive underlying sequence variation, encompassing many deletions and dramatically hypervariable sequence. For the majority of such regulatory sites, nearby gene expression was similar, despite this large genetic variation. However, among all regulatory sites, those with both high levels of sequence variation and differential chromatin accessibility are the most likely to reside near genes with differential expression among strains. Unexpectedly, the vast majority of regulatory sites that differed in chromatin accessibility among strains show little variation in the underlying DNA sequence, implicating variation in upstream regulators.

2017 ◽  
Vol 35 (4) ◽  
pp. 837-854 ◽  
Author(s):  
Cristina M Alexandre ◽  
James R Urton ◽  
Ken Jean-Baptiste ◽  
John Huddleston ◽  
Michael W Dorrity ◽  
...  

Abstract Variation in regulatory DNA is thought to drive phenotypic variation, evolution, and disease. Prior studies of regulatory DNA and transcription factors across animal species highlighted a fundamental conundrum: Transcription factor binding domains and cognate binding sites are conserved, while regulatory DNA sequences are not. It remains unclear how conserved transcription factors and dynamic regulatory sites produce conserved expression patterns across species. Here, we explore regulatory DNA variation and its functional consequences within Arabidopsis thaliana, using chromatin accessibility to delineate regulatory DNA genome-wide. Unlike in previous cross-species comparisons, the positional homology of regulatory DNA is maintained among A. thaliana ecotypes and less nucleotide divergence has occurred. Of the ∼50,000 regulatory sites in A. thaliana, we found that 15% varied in accessibility among ecotypes. Some of these accessibility differences were associated with extensive, previously unannotated sequence variation, encompassing many deletions and ancient hypervariable alleles. Unexpectedly, for the majority of such regulatory sites, nearby gene expression was unaffected. Nevertheless, regulatory sites with high levels of sequence variation and differential chromatin accessibility were the most likely to be associated with differential gene expression. Finally, and most surprising, we found that the vast majority of differentially accessible sites show no underlying sequence variation. We argue that these surprising results highlight the necessity to consider higher-order regulatory context in evaluating regulatory variation and predicting its phenotypic consequences.


Genetics ◽  
2020 ◽  
Vol 215 (3) ◽  
pp. 569-578
Author(s):  
William K. Storck ◽  
Sabrina Z. Abdulla ◽  
Michael R. Rountree ◽  
Vincent T. Bicocca ◽  
Eric U. Selker

In chromatin, nucleosomes are composed of ∼146 bp of DNA wrapped around a histone octamer, and are highly dynamic structures subject to remodeling and exchange. Histone turnover has previously been implicated in various processes including the regulation of chromatin accessibility, segregation of chromatin domains, and dilution of histone marks. Histones in different chromatin environments may turn over at different rates, possibly with functional consequences. Neurospora crassa sports a chromatin environment that is more similar to that of higher eukaryotes than yeasts, which have been utilized in the past to explore histone exchange. We constructed a simple light-inducible system to profile histone exchange in N. crassa on a 3xFLAG-tagged histone H3 under the control of the rapidly inducible vvd promoter. After induction with blue light, incorporation of tagged H3 into chromatin occurred within 20 min. Previous studies of histone turnover involved considerably longer incubation periods and relied on a potentially disruptive change of medium for induction. We used this reporter to explore replication-independent histone turnover at genes and examine changes in histone turnover at heterochromatin domains in different heterochromatin mutant strains. In euchromatin, H3-3xFLAG patterns were almost indistinguishable from those observed in wild type in all mutant backgrounds tested, suggesting that loss of heterochromatin machinery has little effect on histone turnover in euchromatin. However, turnover at heterochromatin domains increased with loss of trimethylation of lysine 9 of histone H3 or HP1, but did not depend on DNA methylation. Our reporter strain provides a simple yet powerful tool to assess histone exchange across multiple chromatin contexts.


2018 ◽  
Author(s):  
Tracy J. Ballinger ◽  
Britta Bouwman ◽  
Reza Mirzazadeh ◽  
Silvano Garnerone ◽  
Nicola Crosetto ◽  
...  

Abstract Background Structural variants (SVs) are known to play important roles in a variety of cancers, but their origins and functional consequences are still poorly understood. Many SVs are thought to emerge via errors in the repair processes following DNA double strand breaks (DSBs) and previous studies have experimentally measured DSB frequencies across the genome in cell lines. Results Using these data we derive the first quantitative genome-wide models of DSB susceptibility, based upon underlying chromatin and sequence features. These models are accurate and provide novel insights into the mutational mechanisms generating DSBs. Models trained in one cell type can be successfully applied to others, but a substantial proportion of DSBs appear to reflect cell type specific processes. Using model predictions as a proxy for susceptibility to DSBs in tumours, many SV enriched regions appear to be poorly explained by selectively neutral mutational bias alone. A substantial number of these regions show unexpectedly high SV breakpoint frequencies given their predicted susceptibility to mutation, and are therefore credible targets of positive selection in tumours. These putatively positively selected SV hotspots are enriched for genes previously shown to be oncogenic. In contrast, several hundred regions across the genome show unexpectedly low levels of SVs, given their relatively high susceptibility to mutation. These novel ‘coldspot’ regions appear to be subject to purifying selection in tumours and are enriched for active promoters and enhancers. Conclusions We conclude that models of DSB susceptibility offer a rigorous approach to the inference of SVs putatively subject to selection in tumours.


2019 ◽  
Author(s):  
Florian Schmidt ◽  
Alexander Marx ◽  
Marie Hebel ◽  
Martin Wegner ◽  
Nina Baumgarten ◽  
...  

Abstract Understanding the complexity of transcriptional regulation is a major goal of computational biology. Because experimental linkage of regulatory sites to genes is challenging, computational methods considering epigenomics data have been proposed to create tissue-specific regulatory maps. However, we showed that these approaches are not well suited to account for the variations of the regulatory landscape between cell-types. To overcome these drawbacks, we developed a new method called STITCHIT that identifies and links putative regulatory sites to genes. Within STITCHIT, we consider the chromatin accessibility signal of all samples jointly to identify regions exhibiting a signal variation related to the expression of a distinct gene. STITCHIT outperforms previous approaches in various validation experiments and was used with a genome-wide CRISPR-Cas9 screen to prioritize novel doxorubicin-resistance genes and their associated non-coding regulatory regions. We believe that our work paves the way for a more refined understanding of transcriptional regulation at the gene-level.


2019 ◽  
Author(s):  
Surag Nair ◽  
Daniel S. Kim ◽  
Jacob Perricone ◽  
Anshul Kundaje

Abstract Motivation Genome-wide profiles of chromatin accessibility and gene expression in diverse cellular contexts are critical to decipher the dynamics of transcriptional regulation. Recently, convolutional neural networks (CNNs) have been used to learn predictive cis-regulatory DNA sequence models of context-specific chromatin accessibility landscapes. However, these context-specific regulatory sequence models cannot generalize predictions across cell types. Results We introduce multi-modal, residual neural network architectures that integrate cis-regulatory sequence and context-specific expression of trans-regulators to predict genome-wide chromatin accessibility profiles across cellular contexts. We show that the average accessibility of a genomic region across training contexts can be a surprisingly powerful predictor. We leverage this feature and employ novel strategies for training models to enhance genome-wide prediction of shared and context-specific chromatin accessible sites across cell types. We interpret the models to reveal insights into cis and trans regulation of chromatin dynamics across 123 diverse cellular contexts. Availability The code is available at https://github.com/kundajelab/ChromDragoNN. Contact [email protected]


2020 ◽  
Author(s):  
William R. Rice

Previous work found that the centromeric repeats of the Western European house mouse (Mus musculus domesticus) are composed predominantly of a 120 bp monomer that is shared by the X and autosomes. Polymorphism in length and sequence was also reported. Here I quantified the length and sequence polymorphism of the centromeric repeats found on the X and autosomes. The levels of local and global sequence variation were also compared. I found three length variants: a 64mer, 112mer and 120mer with relative frequencies of 2.4%, 8.6%, and 89%, respectively. There was substantial sequence variation within all three length variants with a rank-order of: 64mer < 120mer < 112mer. The 64mer was never found alone on long Sanger traces, and was arranged predominantly as a 176 bp higher-order repeat composed of a 64/112mer dimer. Reanalysis of archived ChIP-seq reads found that all three length variants were enriched with the foundational centromere protein CENP-A, but the enrichment was far higher for the 120mer. This pattern indicates that only the 120mer contributes substantially to the functional centromeres, i.e., to the kinetochore-binding, centric cores of the centromeric repeat arrays. Despite only moderate sequence divergence among random pairs of 120mers (averaging 5.9%), other measures of sequence diversity were exceptionally high: i) variant richness (numerical diversity): on average, one new sequence variant was observed every 4th additional monomer randomly sampled (in N = 7.2 × 10^3 monomers), and ii) variant evenness: all of the nearly 2 × 10^3 observed sequence variants were at low frequency, with the most common variant having a frequency of only 5.7%. I next used long Sanger trace data from the Mouse Genome Project to assess the pattern of monomer diversity among neighboring 120mers.
Unexpectedly, side-by-side monomers were rarely identical in sequence, and sequence divergence between these neighbors was nearly as high as that between random pairs taken from the genome-wide pool of all 120mers. I also used long Sanger traces to determine sequence variation among neighborhoods of 5 contiguous 120 bp monomers. Sequence diversity within these small regions typically spanned most of the range found genome-wide. Despite high sequence variation within these neighborhoods, the density of monomers with functional binding motifs for CENP-B (i.e., b-boxes with sequence NTTCGNNNNANNCGGGN) was strongly conserved at about 50%. The overarching pattern of monomer structure at the centromeric repeats of this subspecies is: i) high homogeneity in the density of CENP-B binding sites, and ii) high heterogeneity in monomer sequence at both local and global levels.
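
The b-box motif quoted in the abstract above (NTTCGNNNNANNCGGGN) is written in IUPAC nucleotide codes, where N stands for any base. The following is a minimal Python sketch of how such a motif could be turned into a regular expression and used to estimate the fraction of monomers carrying a b-box. It is not the author's pipeline: the function names are hypothetical, the example sequences are invented, and scanning both strands is an assumption that the abstract does not confirm.

```python
import re

# IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

# Motif string exactly as quoted in the abstract.
B_BOX = "NTTCGNNNNANNCGGGN"


def motif_to_regex(motif):
    """Compile an IUPAC motif into a regular expression."""
    return re.compile("".join(IUPAC[base] for base in motif))


def reverse_complement(seq):
    """Reverse-complement an uppercase A/C/G/T sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


B_BOX_RE = motif_to_regex(B_BOX)


def has_b_box(monomer):
    """True if the motif occurs on either strand (both-strand scan is an assumption)."""
    seq = monomer.upper()
    return bool(B_BOX_RE.search(seq) or B_BOX_RE.search(reverse_complement(seq)))


def b_box_density(monomers):
    """Fraction of monomers that contain at least one b-box."""
    monomers = list(monomers)
    if not monomers:
        return 0.0
    return sum(has_b_box(m) for m in monomers) / len(monomers)


if __name__ == "__main__":
    # Invented toy sequences, not real mouse centromeric repeats.
    toy = ["ATTCGAAAAAAACGGGA", "ATTCGAAAATAACGGGA"]
    print(b_box_density(toy))
```

Applied to real data, `b_box_density` would be called on the list of 120 bp monomers from one neighborhood or from the genome-wide pool, and the two values compared.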


2019 ◽  
Vol 35 (14) ◽  
pp. i108-i116 ◽  
Author(s):  
Surag Nair ◽  
Daniel S Kim ◽  
Jacob Perricone ◽  
Anshul Kundaje

Abstract Motivation Genome-wide profiles of chromatin accessibility and gene expression in diverse cellular contexts are critical to decipher the dynamics of transcriptional regulation. Recently, convolutional neural networks have been used to learn predictive cis-regulatory DNA sequence models of context-specific chromatin accessibility landscapes. However, these context-specific regulatory sequence models cannot generalize predictions across cell types. Results We introduce multi-modal, residual neural network architectures that integrate cis-regulatory sequence and context-specific expression of trans-regulators to predict genome-wide chromatin accessibility profiles across cellular contexts. We show that the average accessibility of a genomic region across training contexts can be a surprisingly powerful predictor. We leverage this feature and employ novel strategies for training models to enhance genome-wide prediction of shared and context-specific chromatin accessible sites across cell types. We interpret the models to reveal insights into cis- and trans-regulation of chromatin dynamics across 123 diverse cellular contexts. Availability and implementation The code is available at https://github.com/kundajelab/ChromDragoNN. Supplementary information Supplementary data are available at Bioinformatics online.


2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Sarah E. Pierce ◽  
Jeffrey M. Granja ◽  
William J. Greenleaf

AbstractChromatin accessibility profiling can identify putative regulatory regions genome wide; however, pooled single-cell methods for assessing the effects of regulatory perturbations on accessibility are limited. Here, we report a modified droplet-based single-cell ATAC-seq protocol for perturbing and evaluating dynamic single-cell epigenetic states. This method (Spear-ATAC) enables simultaneous read-out of chromatin accessibility profiles and integrated sgRNA spacer sequences from thousands of individual cells at once. Spear-ATAC profiling of 104,592 cells representing 414 sgRNA knock-down populations reveals the temporal dynamics of epigenetic responses to regulatory perturbations in cancer cells and the associations between transcription factor binding profiles.


2021 ◽  
Vol 22 (12) ◽  
pp. 6556
Author(s):  
Junjun Huang ◽  
Xiaoyu Li ◽  
Xin Chen ◽  
Yaru Guo ◽  
Weihong Liang ◽  
...  

ATP-binding cassette (ABC) transporter proteins are a gene super-family in plants and play vital roles in growth, development, and response to abiotic and biotic stresses. The ABC transporters have been identified in crop plants such as rice and buckwheat, but little is known about them in soybean. Soybean is an important oil crop and is one of the five major crops in the world. In this study, 255 ABC genes that putatively encode ABC transporters were identified from soybean through bioinformatics and then categorized into eight subfamilies, including 7 ABCAs, 52 ABCBs, 48 ABCCs, 5 ABCDs, 1 ABCE, 10 ABCFs, 111 ABCGs, and 21 ABCIs. Their phylogenetic relationships, gene structure, and gene expression profiles were characterized. Segmental duplication was the main reason for the expansion of the GmABC genes. Ka/Ks analysis suggested that the evolution of GmABC genes was accompanied by intense purifying selection. The genome-wide collinearity of soybean with other species showed that GmABCs were relatively conserved and that collinear ABCs between species may have originated from the same ancestor. Gene expression analysis of GmABCs revealed distinct expression patterns in different tissues and at diverse developmental stages. The candidate genes GmABCB23, GmABCB25, GmABCB48, GmABCB52, GmABCI1, GmABCI5, and GmABCI13 were responsive to Al toxicity. This work on the GmABC gene family provides useful information for future studies on ABC transporters in soybean and potential targets for the cultivation of new germplasm resources of aluminum-tolerant soybean.

