Effective gene expression prediction from sequence by integrating long-range interactions

2021 ◽  
Author(s):  
Žiga Avsec ◽  
Vikram Agarwal ◽  
Daniel Visentin ◽  
Joseph R. Ledsam ◽  
Agnieszka Grabska-Barwinska ◽  
...  

Abstract The next phase of genome biology research requires understanding how DNA sequence encodes phenotypes, from the molecular to organismal levels. How noncoding DNA determines gene expression in different cell types is a major unsolved problem, and critical downstream applications in human genetics depend on improved solutions. Here, we report substantially improved gene expression prediction accuracy from DNA sequence through the use of a new deep learning architecture called Enformer that is able to integrate long-range interactions (up to 100 kb away) in the genome. This improvement yielded more accurate variant effect predictions on gene expression for both natural genetic variants and saturation mutagenesis measured by massively parallel reporter assays. Notably, Enformer outperformed the best team on the critical assessment of genome interpretation (CAGI5) challenge for noncoding variant interpretation with no additional training. Furthermore, Enformer learned to predict promoter-enhancer interactions directly from DNA sequence competitively with methods that take direct experimental data as input. We expect that these advances will enable more effective fine-mapping of growing human disease associations to cell-type-specific gene regulatory mechanisms and provide a framework to interpret cis-regulatory evolution. To foster these downstream applications, we have made the pre-trained Enformer model openly available, and provide pre-computed effect predictions for all common variants in the 1000 Genomes dataset. One-sentence summary: Improved noncoding variant effect prediction and candidate enhancer prioritization from a more accurate sequence-to-expression model driven by extended long-range interaction modelling.

2021 ◽  
Vol 18 (10) ◽  
pp. 1196-1203 ◽  
Author(s):  
Žiga Avsec ◽  
Vikram Agarwal ◽  
Daniel Visentin ◽  
Joseph R. Ledsam ◽  
Agnieszka Grabska-Barwinska ◽  
...  

Abstract How noncoding DNA determines gene expression in different cell types is a major unsolved problem, and critical downstream applications in human genetics depend on improved solutions. Here, we report substantially improved gene expression prediction accuracy from DNA sequences through the use of a deep learning architecture, called Enformer, that is able to integrate information from long-range interactions (up to 100 kb away) in the genome. This improvement yielded more accurate variant effect predictions on gene expression for both natural genetic variants and saturation mutagenesis measured by massively parallel reporter assays. Furthermore, Enformer learned to predict enhancer–promoter interactions directly from the DNA sequence competitively with methods that take direct experimental data as input. We expect that these advances will enable more effective fine-mapping of human disease associations and provide a framework to interpret cis-regulatory evolution.
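As a rough illustration of what "variant effect prediction" means for a sequence-to-expression model, the sketch below scores a single-nucleotide variant as the difference between the model's output on the alternate-allele sequence and on the reference sequence. It is not the Enformer API: `apply_variant`, `variant_effect`, and `toy_predict` are hypothetical names, and `toy_predict` (GC fraction) is only a placeholder for a trained model.

```python
from typing import Callable


def apply_variant(ref_seq: str, pos: int, ref: str, alt: str) -> str:
    """Return ref_seq with a single-nucleotide substitution at 0-based pos."""
    if ref_seq[pos] != ref:
        raise ValueError(
            f"expected {ref!r} at position {pos}, found {ref_seq[pos]!r}"
        )
    return ref_seq[:pos] + alt + ref_seq[pos + 1:]


def variant_effect(
    predict: Callable[[str], float],
    ref_seq: str,
    pos: int,
    ref: str,
    alt: str,
) -> float:
    """Score a variant as predict(alternate sequence) - predict(reference)."""
    alt_seq = apply_variant(ref_seq, pos, ref, alt)
    return predict(alt_seq) - predict(ref_seq)


def toy_predict(seq: str) -> float:
    """Hypothetical stand-in for a trained model: GC fraction of the window."""
    return (seq.count("G") + seq.count("C")) / len(seq)


# Score an A>G substitution at the first base of a short toy window.
window = "ATGCATGCAT"
effect = variant_effect(toy_predict, window, pos=0, ref="A", alt="G")
print(f"predicted effect: {effect:+.3f}")
```

In practice the `predict` argument would be a trained model evaluated on a much longer window around the variant, so that distal regulatory sequence can influence the score.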


2019 ◽  
Vol 2019 ◽  
pp. 1-12
Author(s):  
Livia Eiselleova ◽  
Viktor Lukjanov ◽  
Simon Farkas ◽  
David Svoboda ◽  
Karel Stepka ◽  
...  

The eukaryotic nucleus is a highly complex structure that carries out multiple functions primarily needed for gene expression, and among them, transcription seems to be the most fundamental. Diverse approaches have demonstrated that transcription takes place at discrete sites known as transcription factories, wherein RNA polymerase II (RNAP II) is attached to the factory and immobilized while transcribing DNA. It has been proposed that transcription factories promote chromatin loop formation, creating long-range interactions in which relatively distant genes can be transcribed simultaneously. In this study, we examined long-range interactions between the POU5F1 gene and genes previously identified as being POU5F1 enhancer-interacting, namely, CDYL, TLE2, RARG, and MSX1 (all involved in transcriptional regulation), in human pluripotent stem cells (hPSCs) and their early differentiated counterparts. As a control gene, RUNX1 was used, which is expressed during hematopoietic differentiation and not associated with pluripotency. To reveal how these long-range interactions between POU5F1 and the selected genes change with the onset of differentiation and upon RNAP II inhibition, we performed three-dimensional fluorescence in situ hybridization (3D-FISH) followed by computational simulation analysis. Our analysis showed that the numbers of long-range interactions between specific genes decrease during differentiation, suggesting that the transcription of monitored genes is associated with pluripotency. In addition, we showed that upon inhibition of RNAP II, long-range associations do not disintegrate and remain constant. We also analyzed the distance distributions of these genes in the context of their positions in the nucleus and revealed that they tend to have similar patterns resembling a normal distribution. Furthermore, we compared data created in vitro and in silico to assess the biological relevance of our results.


1938 ◽  
Vol 34 (2) ◽  
pp. 238-252 ◽  
Author(s):  
J. S. Wang

The statistical theory of long-range interactions between adsorbed particles on a plane lattice is worked out approximately, by treating in detail the distribution of adsorbed particles among a few sites inside and on the boundary of a circular region, and regarding the distribution outside the circle as uniform and continuous with a density Kθ per unit area, where K is the number of lattice points per unit area and θ is the fraction of surface covered by adsorbed particles. The continuous distribution begins at a distance ρ from the centre of the circle, ρ being determined by the condition that the probability of occupation of a first shell site is equal to the probability θ of occupation of the central site. Using this method, general formulae for the adsorption isotherm and the heat of adsorption are obtained. Numerical applications for dipole interactions and for quadratic and hexagonal lattices are worked out in detail and the case in which the dipole moment varies with θ is discussed.


2013 ◽  
Vol 27 (24) ◽  
pp. 1350143 ◽  
Author(s):  
MIRABEAU SAHA ◽  
TIMOLEON C. KOFANÉ

In this paper, the power-law long-range interaction (PLLRI) and the Kac–Baker long-range interaction (KBLRI) in the DNA molecule are compared. This is done by employing an extended version of the spin-like model of the DNA molecule, with long-range interaction between intra-strand nucleotides and helicoidal coupling between inter-strand nucleotides, when an RNA polymerase binds to the DNA at biological temperature. Results show that long-range interactions (LRIs) have an undeniable effect on the DNA dynamics and that one is free to use either PLLRI or KBLRI to study DNA behavior.
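For orientation, the two coupling laws being compared are commonly written as follows in the lattice-model literature. These are the standard textbook forms, given here as an assumption: the abstract does not state the exact expressions or normalization the paper uses. Here $m$ is the separation (in lattice sites) between two nucleotides on the same strand, $J$ sets the overall coupling strength, $s > 0$ is the power-law exponent, and $0 \le r < 1$ controls the decay of the Kac–Baker kernel.

```latex
% Power-law long-range interaction (algebraic decay with separation m)
J^{\mathrm{PL}}_{m} = \frac{J}{|m|^{s}}, \qquad m \neq 0

% Kac--Baker long-range interaction (exponential decay with separation m),
% normalized so that \sum_{m \neq 0} J^{\mathrm{KB}}_{m} = J
J^{\mathrm{KB}}_{m} = J \, \frac{1 - r}{2r} \, r^{|m|}, \qquad m \neq 0
```

In the Kac–Baker form, $r \to 0$ recovers nearest-neighbour coupling, while $r \to 1$ spreads the interaction over many sites.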


Author(s):  
Ting XIE ◽  
Andrea Orbán ◽  
Xiaodong Xing ◽  
Eliane Luc-Koenig ◽  
Romain Vexiau ◽  
...  

Abstract Ultracold temperatures in dilute quantum gases opened the way to an exquisite control of matter at the quantum level. Here we focus on the control of ultracold atomic collisions using a laser to engineer their interactions at large interatomic distances. We show that the entrance channel of two colliding ultracold atoms can be coupled to a repulsive collisional channel by the laser light so that the overall interaction between the two atoms becomes repulsive: this prevents them from coming close together and undergoing inelastic processes, thus protecting the atomic gases from unwanted losses. We illustrate such an optical shielding mechanism with ³⁹K and ¹³³Cs atoms colliding at ultracold temperature (<1 microkelvin). The process is described in the framework of the dressed-state picture and we then solve the resulting stationary coupled Schrödinger equations. The role of spontaneous emission and photoinduced inelastic scattering is also investigated as possible limitations of the shielding efficiency. We predict an almost complete suppression of inelastic collisions over a broad range of Rabi frequencies and detunings from the ³⁹K D2 line of the optical shielding laser, both within the [0, 200 MHz] interval. We found that the polarization of the shielding laser has a minor influence on this efficiency. This proposal could easily be formulated for other bialkali-metal pairs, as their long-range interactions are all very similar to each other.


2020 ◽  
Author(s):  
Jeremy Bigness ◽  
Xavi Loinaz ◽  
Shalin Patel ◽  
Erica Larschan ◽  
Ritambhara Singh

Long-range spatial interactions among genomic regions are critical for regulating gene expression and their disruption has been associated with a host of diseases. However, when modeling the effects of regulatory factors on gene expression, most deep learning models either neglect long-range interactions or fail to capture the inherent 3D structure of the underlying biological system. This prevents the field from obtaining a more comprehensive understanding of gene regulation and from fully leveraging the structural information present in the data sets. Here, we propose a graph convolutional neural network (GCNN) framework to integrate measurements probing spatial genomic organization and measurements of local regulatory factors, specifically histone modifications, to predict gene expression. This formulation enables the model to incorporate crucial information about long-range interactions via a natural encoding of spatial interaction relationships into a graph representation. Furthermore, we show that our model is interpretable in terms of the observed biological regulatory factors, highlighting both the histone modifications and the interacting genomic regions that contribute to a gene's predicted expression. We apply our GCNN model to datasets for GM12878 (lymphoblastoid) and K562 (myelogenous leukemia) cell lines and demonstrate its state-of-the-art prediction performance. We also obtain importance scores corresponding to the histone mark features and interacting regions for some exemplar genes and validate them with evidence from the literature. Our model presents a novel setup for predicting gene expression by integrating multimodal datasets.
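The graph formulation described above can be sketched in a few lines: genomic regions are nodes, long-range contacts are edges, and each node carries a vector of local signals such as histone-modification levels. The sketch below uses a simple mean-over-neighbours aggregation followed by a linear map and ReLU. This is an illustrative simplification rather than the authors' layer, and `graph_conv` together with the toy adjacency, features, and weights are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def graph_conv(adj: List[List[int]], feats: Matrix, weights: Matrix) -> Matrix:
    """One graph-convolution step: mean-aggregate each node's features with
    those of its neighbours (self-loop included), apply a linear map, ReLU."""
    n_nodes = len(adj)
    n_in = len(feats[0])
    n_out = len(weights[0])
    out: Matrix = []
    for i in range(n_nodes):
        # Neighbourhood of node i, always including i itself.
        hood = [j for j in range(n_nodes) if j == i or adj[i][j]]
        # Average each input feature over the neighbourhood.
        agg = [sum(feats[j][k] for j in hood) / len(hood) for k in range(n_in)]
        # Linear transform followed by ReLU.
        out.append([
            max(0.0, sum(agg[k] * weights[k][m] for k in range(n_in)))
            for m in range(n_out)
        ])
    return out


# Three genomic regions: 0 and 1 are in contact, 2 has no long-range contacts.
adj = [[0, 1, 0],
       [1, 0, 0],
       [0, 0, 0]]
# Two toy histone-mark signals per region.
feats = [[1.0, 0.0],
         [0.0, 1.0],
         [2.0, 2.0]]
# Hypothetical weights mapping two marks to one hidden unit.
weights = [[1.0],
           [0.5]]

hidden = graph_conv(adj, feats, weights)
print(hidden)
```

Regions 0 and 1 end up with the same hidden value because each averages over the same pair of nodes, which shows how a contact lets one region's marks inform another region's representation.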


2007 ◽  
Vol 35 (6) ◽  
pp. 1551-1557 ◽  
Author(s):  
L. Ye ◽  
Z. Wu ◽  
M. Eleftheriou ◽  
R. Zhou

Recent NMR experiments have revealed that a single-residue mutation, W62G, in hen egg-white lysozyme can cause a dramatic loss of long-range interactions and protein stability; however, the molecular mechanism for this surprising phenomenon is not completely clear. In this mini-review, we have summarized some of our recent work on the molecular mechanism with large-scale molecular modelling, and also utilized a new wavelet method to analyse the local structural clusters present in both the wild-type and mutant folding trajectories. These extensive MD (Molecular Dynamics) simulations (10+ μs) were performed in 8 M urea, mimicking the experimental condition. Detailed analyses revealed that the Trp62 residue is the key to a co-operative long-range interaction within the wild-type protein: it acts as a bridge between neighbouring basic residues, mainly arginine residues, through π-type hydrogen bonds or π-cation interactions to form an Arg-Trp-Arg ‘sandwich-like’ local structure. The local cluster near Trp62 further extends its interaction to other clusters, such as the one near Trp111, through Arg112, which is involved in such an Arg-Trp-Arg bridging structure, thus achieving the long-range interactions for the wild-type. On the other hand, the mutant does not have this bridging effect and forms far fewer local clusters or contacts, and therefore results in a much less stable structure. Overall, these findings not only support the general conclusions of the experiment, but also provide a detailed but somewhat different molecular picture of the disruption of the long-range interactions.


2000 ◽  
Vol 66 (4-5) ◽  
pp. 189-196 ◽  
Author(s):  
A. Ceccarelli ◽  
N. Zhukovskaya ◽  
T. Kawata ◽  
S. Bozzaro ◽  
J. Williams

2008 ◽  
Vol 82 (18) ◽  
pp. 9008-9022 ◽  
Author(s):  
Sinéad Diviney ◽  
Andrew Tuplin ◽  
Madeleine Struthers ◽  
Victoria Armstrong ◽  
Richard M. Elliott ◽  
...  

ABSTRACT The genome of hepatitis C virus (HCV) contains cis-acting replication elements (CREs) comprised of RNA stem-loop structures located in both the 5′ and 3′ noncoding regions (5′ and 3′ NCRs) and in the NS5B coding sequence. Through the application of several algorithmically independent bioinformatic methods to detect phylogenetically conserved, thermodynamically favored RNA secondary structures, we demonstrate a long-range interaction between sequences in the previously described CRE (5BSL3.2, now SL9266) with a previously predicted unpaired sequence located 3′ to SL9033, approximately 200 nucleotides upstream. Extensive reverse genetic analysis both supports this prediction and demonstrates a functional requirement in genome replication. By mutagenesis of the Con-1 replicon, we show that disruption of this alternative pairing inhibited replication, a phenotype that could be restored to wild-type levels through the introduction of compensating mutations in the upstream region. Substitution of the CRE with the analogous region of different genotypes of HCV produced replicons with phenotypes consistent with the hypothesis that both local and long-range interactions are critical for a fundamental aspect of genome replication. This report further extends the known interactions of the SL9266 CRE, which has also been shown to form a “kissing loop” interaction with the 3′ NCR (P. Friebe, J. Boudet, J. P. Simorre, and R. Bartenschlager, J. Virol. 79:380-392, 2005), and suggests that cooperative long-range binding with both 5′ and 3′ sequences stabilizes the CRE at the core of a complex pseudoknot. Alternatively, if the long-range interactions were mutually exclusive, the SL9266 CRE may function as a molecular switch controlling a critical aspect of HCV genome replication.

