HiC-GNN: A Generalizable Model for 3D Chromosome Reconstruction Using Graph Convolutional Neural Networks

2021 ◽  
Author(s):  
Van Hovenga ◽  
Oluwatosin Oluwadare ◽  
Jugal Kalita

Chromosome conformation capture (3C) is a method of measuring chromosome topology in terms of interactions between loci. The Hi-C method is a derivative of 3C that allows genome-wide quantification of chromosome interactions. From such interaction data, it is possible to infer the three-dimensional (3D) structure of the underlying chromosome. In this paper, we use a node embedding algorithm and a graph neural network to predict the 3D coordinates of each genomic locus from the corresponding Hi-C contact data. Unlike other chromosome structure prediction methods, our method can generalize a single model across Hi-C resolutions, multiple restriction enzymes, and multiple cell populations while maintaining reconstruction accuracy. We derive these results using three separate Hi-C data sets from the GM12878, GM06990, and K562 cell lines. We also compare the reconstruction accuracy of our method with that of four existing methods and show that it outperforms these state-of-the-art methods in prediction accuracy, introducing a novel approach to 3D structure prediction from Hi-C data.
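The abstract names a node embedding step followed by a graph neural network. As a rough illustration of how one graph convolution mixes information across Hi-C contacts, the sketch below implements a single generic graph convolutional (GCN) propagation step, H' = D^-1/2 (A + I) D^-1/2 H W, in plain Python, where A is the contact matrix, H holds per-locus embeddings, and W maps them to three output coordinates. The contact matrix, embeddings, and weights are hypothetical toy values, and this is not HiC-GNN's actual architecture, embedding algorithm, or training procedure.

```python
import math


def normalized_adjacency(contacts):
    """Return D^-1/2 (A + I) D^-1/2 for a symmetric, non-negative contact matrix A."""
    n = len(contacts)
    # Self-loops (the +I term) let each locus keep part of its own features.
    a_hat = [[contacts[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    degree = [sum(row) for row in a_hat]  # always >= 1 because of the self-loop
    return [[a_hat[i][j] / math.sqrt(degree[i] * degree[j]) for j in range(n)] for i in range(n)]


def matmul(x, y):
    """Plain-Python matrix product of an (n x k) and a (k x m) list-of-lists."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


def gcn_layer(contacts, features, weights):
    """One linear GCN step, H' = A_norm @ H @ W (no activation, as for coordinate regression)."""
    return matmul(matmul(normalized_adjacency(contacts), features), weights)


# Hypothetical 4-locus contact matrix, 2-D node embeddings, and a 2x3 weight matrix.
contacts = [[0, 5, 1, 0], [5, 0, 4, 1], [1, 4, 0, 6], [0, 1, 6, 0]]
embeddings = [[0.1, 0.2], [0.3, 0.1], [0.5, 0.4], [0.2, 0.9]]
weights = [[1.0, 0.0, 0.5], [0.0, 1.0, -0.5]]

coords = gcn_layer(contacts, embeddings, weights)  # one (x, y, z) row per locus
print(coords)
```

In a trained model, W (and typically several stacked layers with nonlinearities) would be learned so that the output coordinates reproduce distances implied by the contacts; here W is fixed only to show the data flow.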

2019 ◽  
Author(s):  
Oluwatosin Oluwadare

Sixteen years after the sequencing of the human genome by the Human Genome Project (HGP), and 17 years after the introduction of Chromosome Conformation Capture (3C) technologies, three-dimensional (3D) inference and big data remain problematic in genomics and, specifically, in 3C data analysis. Three-dimensional inference involves reconstructing a genome's 3D structure or, in some cases, an ensemble of structures from contact interaction frequencies extracted with a variant of 3C technology called Hi-C. Further questions remain about chromosome topology and structure; enhancer-promoter interactions; the location of genes, gene clusters, and transcription factors; the relationship between gene expression and epigenetics; and chromosome visualization at a higher scale, among others. This dissertation describes four major contributions: first, 3DMax, a tool for predicting chromosome and genome 3D structures from Hi-C data using an optimization algorithm; second, GSDB, a comprehensive common repository that contains 3D structures for Hi-C datasets generated by novel 3D structure reconstruction tools developed over the years; third, ClusterTAD, a method for extracting topologically associated domains (TADs) from Hi-C data using an unsupervised learning algorithm; and finally, GenomeFlow, a comprehensive graphical tool that facilitates the entire process of modeling and analyzing 3D genome organization. Notably, GenomeFlow and GSDB are the first of their kind in the 3D chromosome and genome research field. All the methods are freely available to the scientific community as software tools.
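Reconstruction methods of the kind described here (including optimization-based ones such as 3DMax) turn contact frequencies into 3D coordinates. A common modeling assumption in this literature is that a contact frequency f_ij maps to a target distance d_ij = 1 / f_ij^alpha for some conversion factor alpha. The sketch below uses that assumption and fits coordinates by plain gradient descent on a squared-error objective. It is a generic illustration only: 3DMax's actual objective and optimizer are not given in this abstract, and the contact matrix, alpha, step count, and learning rate are hypothetical choices.

```python
import math
import random


def contacts_to_distances(contacts, alpha=1.0):
    """Map contact frequency f_ij to a target distance d_ij = 1 / f_ij**alpha.

    Pairs with zero contacts (and the diagonal) get None, i.e. no constraint.
    """
    n = len(contacts)
    return [[1.0 / contacts[i][j] ** alpha if i != j and contacts[i][j] > 0 else None
             for j in range(n)] for i in range(n)]


def squared_error(coords, targets):
    """Sum of (model distance - target distance)^2 over constrained pairs."""
    total = 0.0
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            if targets[i][j] is not None:
                total += (math.dist(coords[i], coords[j]) - targets[i][j]) ** 2
    return total


def reconstruct(contacts, alpha=1.0, steps=2000, lr=0.01, seed=0):
    """Fit 3D coordinates by plain gradient descent on squared_error."""
    targets = contacts_to_distances(contacts, alpha)
    rng = random.Random(seed)
    n = len(contacts)
    coords = [[rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(n)]
    for _ in range(steps):
        grads = [[0.0, 0.0, 0.0] for _ in range(n)]
        for i in range(n):
            for j in range(i + 1, n):
                t = targets[i][j]
                d = math.dist(coords[i], coords[j])
                if t is None or d == 0.0:
                    continue
                # Gradient of (d - t)^2 w.r.t. x_i is 2 (d - t) (x_i - x_j) / d; x_j gets the negative.
                scale = 2.0 * (d - t) / d
                for k in range(3):
                    g = scale * (coords[i][k] - coords[j][k])
                    grads[i][k] += g
                    grads[j][k] -= g
        for i in range(n):
            for k in range(3):
                coords[i][k] -= lr * grads[i][k]
    return coords, targets


# Hypothetical symmetric contact matrix for five loci (not real Hi-C data).
contacts = [
    [0, 8, 4, 2, 1],
    [8, 0, 8, 4, 2],
    [4, 8, 0, 8, 4],
    [2, 4, 8, 0, 8],
    [1, 2, 4, 8, 0],
]
start, targets = reconstruct(contacts, steps=0)  # random initial structure
fitted, _ = reconstruct(contacts, steps=2000)    # same seed, after optimization
print(squared_error(start, targets), squared_error(fitted, targets))
```

Because the converted distances need not satisfy the triangle inequality, the error usually cannot reach zero; the optimizer only finds a structure that agrees with them as well as it can.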


2019 ◽  
Author(s):  
Oluwatosin Oluwadare ◽  
Max Highsmith ◽  
Jianlin Cheng

Advances in chromosome conformation capture (3C) technologies, such as the Hi-C technique, which captures chromosomal interactions at a genome-wide scale, have led to the development of methods for reconstructing three-dimensional (3D) chromosome and genome structures from Hi-C data. The 3D genome structure is important because it plays a role in a variety of biological activities such as DNA replication, gene regulation, genome interaction, and gene expression. In recent years, numerous Hi-C datasets have been generated, and likewise, a number of genome structure reconstruction algorithms have been developed. However, until now, there has been no freely available repository for 3D chromosome structures. In this work, we outline the construction of a novel Genome Structure Database (GSDB), a comprehensive repository that contains 3D structures for Hi-C datasets constructed by a variety of 3D structure reconstruction tools. GSDB contains over 50,000 structures constructed by 12 state-of-the-art chromosome and genome structure prediction methods for publicly used Hi-C datasets with varying resolution. The database is useful to the community for studying genome function from a 3D perspective. GSDB is accessible at http://sysbio.rnet.missouri.edu/3dgenome/GSDB


2017 ◽  
Vol 20 (4) ◽  
pp. 1205-1214
Author(s):  
Jincheol Park ◽  
Shili Lin

How chromosomes fold and how distal genomic elements interact with one another at a genomic scale have been actively pursued in the past decade, following the seminal work describing the Chromosome Conformation Capture (3C) assay. Essentially, 3C-based technologies produce two-dimensional (2D) contact maps that capture interactions between genomic fragments. Accordingly, a plethora of analytical methods have been proposed that take a 2D contact map as input to recapitulate the underlying whole-genome three-dimensional (3D) structure of the chromatin. However, their performance in terms of several factors, including data resolution and the ability to handle contact map features, has not been sufficiently evaluated. This article takes up that task: we consider several recent and/or well-regarded methods, both optimization-based and model-based, for their aptness in producing 3D structures from contact maps generated from a population of cells. These methods are evaluated and compared under several criteria, using both simulated and real data. For simulated data sets, the focus is on accurate recapitulation of the entire structure, given the existence of a gold standard. For real data sets, agreement with distances measured by fluorescence in situ hybridization (FISH) and consistency with several genomic features of known biological function are examined.
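One simple way to score how well a reconstructed structure agrees with a reference (a simulated gold standard, or FISH-measured distances) is a rank correlation between corresponding pairwise distances, which ignores the arbitrary overall scale of the reconstruction. The abstract does not say which statistics the authors used, so the sketch below is an assumed, generic Spearman correlation in plain Python, with ties given averaged ranks; the coordinates are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho = Pearson correlation of the (tie-averaged) ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def pairwise_distances(coords):
    """Flattened upper-triangle distances, in a fixed (i, j) order."""
    n = len(coords)
    return [math.dist(coords[i], coords[j]) for i in range(n) for j in range(i + 1, n)]


# Hypothetical reference structure and a reconstruction that is a scaled copy of it.
reference = [[0, 0, 0], [1, 0, 0], [2, 0, 0], [2, 1, 0]]
reconstructed = [[2 * c for c in p] for p in reference]
rho = spearman(pairwise_distances(reference), pairwise_distances(reconstructed))
print(rho)
```

A rank-based score is a natural choice here because many reconstruction methods recover shape only up to scale, so a uniformly scaled structure should count as a perfect match.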


Sequencing ◽  
2013 ◽  
Vol 2013 ◽  
pp. 1-10 ◽  
Author(s):  
Amitava Moulick ◽  
Debashis Mukhopadhyay ◽  
Shonima Talapatra ◽  
Nirmalya Ghoshal ◽  
Sarmistha Sen Raychaudhuri

Plantago ovata Forsk is a medicinally important plant. Metallothioneins (MTs) are cysteine-rich proteins involved in the detoxification of heavy metals. Molecular cloning and modeling of an MT from P. ovata have not yet been reported. The present investigation describes the isolation, structure prediction, characterization, and expression under copper stress of type 2 metallothionein (MT2) from this species. The gene comprises three exons and two introns. The deduced protein sequence contains 81 amino acids, with a calculated molecular weight of about 8.1 kDa and a theoretical pI of 4.77. Transcript levels of this protein increased in response to copper stress. Homology modeling was used to construct a three-dimensional (3D) structure of P. ovata MT2. This 3D structure model will provide significant clues for further structural and functional study of the protein.
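A "calculated molecular weight" of this kind is typically the sum of average amino acid residue masses plus one water molecule for the free termini. The sketch below shows that arithmetic using standard average residue masses (the values tabulated by tools such as ExPASy); the peptide used is hypothetical, not the P. ovata MT2 sequence, which the abstract does not give. As a sanity check on the abstract's figures, 8.1 kDa over 81 residues is about 100 Da per residue.

```python
# Average (not monoisotopic) residue masses in daltons.
AVERAGE_RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886, "C": 103.1388,
    "E": 129.1155, "Q": 128.1307, "G": 57.0519, "H": 137.1411, "I": 113.1594,
    "L": 113.1594, "K": 128.1741, "M": 131.1926, "F": 147.1766, "P": 97.1167,
    "S": 87.0782, "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER = 18.01524  # one H2O for the N- and C-termini of the chain


def molecular_weight(sequence):
    """Average molecular weight (Da) of an unmodified linear peptide."""
    seq = sequence.upper()
    unknown = set(seq) - set(AVERAGE_RESIDUE_MASS)
    if unknown:
        raise ValueError(f"unsupported residue codes: {sorted(unknown)}")
    return sum(AVERAGE_RESIDUE_MASS[aa] for aa in seq) + WATER


# Hypothetical cysteine-rich peptide, for illustration only.
print(round(molecular_weight("MSCCGGNCGC"), 2))
```

This ignores post-translational modifications and bound metal ions, which would change the mass of the mature metallothionein.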


2016 ◽  
Author(s):  
François Serra ◽  
Davide Baù ◽  
Guillaume Filion ◽  
Marc A. Marti-Renom

The sequence of a genome is insufficient to understand all genomic processes carried out in the cell nucleus. To achieve this, knowledge of its three-dimensional architecture is necessary. Advances in genomic technologies and the development of new analytical methods, such as Chromosome Conformation Capture (3C) and its derivatives, now make it possible to investigate the spatial organization of genomes. However, inferring structures from raw contact data is tedious because of a shortage of available tools. Here we present TADbit, a computational framework to analyze and model the chromatin fiber in three dimensions. To illustrate the use of TADbit, we automatically modeled 50 genomic domains from the fly genome, revealing differential structural features of the previously defined chromatin colors and establishing a link between the conformation of the genome and the local chromatin composition. More generally, TADbit makes it possible to obtain three-dimensional models ready for visualization from 3C-based experiments and to characterize their relation to gene expression and epigenetic states. TADbit is open-source and available for download from http://www.3DGenomes.org.


2021 ◽  
Author(s):  
Marina A Pak ◽  
Karina A Markhieva ◽  
Mariia S Novikova ◽  
Dmitry S Petrov ◽  
Ilya S Vorobyev ◽  
...  

AlphaFold changed the field of structural biology by achieving three-dimensional (3D) structure prediction from protein sequence at experimental quality. Its astounding success even led to claims that the protein folding problem is "solved". However, the protein folding problem is more than structure prediction from sequence. At present, it is unknown whether the AlphaFold-triggered revolution could help solve other problems related to protein folding. Here we assay the ability of AlphaFold to predict the impact of single mutations on protein stability (ΔΔG) and function. To study this question, we extracted metrics from AlphaFold predictions before and after a single mutation in a protein and correlated the predicted change with experimentally known ΔΔG values. Additionally, we correlated AlphaFold's predictions of the structural impact of single mutations with a large-scale dataset of single mutations in GFP with experimentally assayed fluorescence levels. We found very weak or no correlation between AlphaFold output metrics and changes in protein stability or fluorescence. Our results imply that AlphaFold cannot be immediately applied to other problems or applications in protein folding.
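The analysis described above reduces to two steps: compute a change in some prediction metric between the wild-type and mutant models, then correlate those changes with measured ΔΔG. The abstract does not say which metrics were used, so the sketch below assumes a hypothetical one (the mean of per-residue confidence scores, pLDDT-like) and plain Pearson correlation; all numbers are invented for illustration and are not the study's data.

```python
import math


def mean(values):
    return sum(values) / len(values)


def metric_change(wild_type_scores, mutant_scores):
    """Hypothetical metric: change in mean per-residue confidence (mutant minus wild type)."""
    return mean(mutant_scores) - mean(wild_type_scores)


def pearson(x, y):
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Invented per-residue scores for one wild type and three single mutants.
wild_type = [90.0, 85.0, 80.0]
mutant_scores = [[89.0, 85.0, 80.0], [86.0, 83.0, 78.0], [80.0, 75.0, 72.0]]
ddg = [0.3, 1.1, 2.6]  # invented "experimental" ΔΔG values (sign convention is an assumption)

changes = [metric_change(wild_type, m) for m in mutant_scores]
r = pearson(changes, ddg)
print(changes, r)
```

With real data, a correlation near zero across many mutations (as the authors report) would indicate that the chosen metric carries little information about stability changes.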


2019 ◽  
Vol 16 (3) ◽  
pp. 172988141985171 ◽  
Author(s):  
Naeem Iqbal Ratyal ◽  
Imtiaz Ahmad Taj ◽  
Muhammad Sajid ◽  
Nouman Ali ◽  
Anzar Mahmood ◽  
...  

Face recognition underpins numerous applications; however, the task is still challenging, mainly because of the variability of facial pose. Existing methods show competitive performance, but they still fall short of what is needed. This article presents an effective three-dimensional pose-invariant face recognition approach based on subject-specific descriptors that delivers state-of-the-art accuracy. In our method, face images are registered by transforming their acquisition pose into a frontal view using the three-dimensional variance of the facial data. The face recognition algorithm is initialized by detecting iso-depth curves in a coordinate plane perpendicular to the subject's gaze direction. In this plane, discriminating keypoints are detected on the iso-depth curves of the facial manifold to define subject-specific descriptors using subject-specific regions. The proposed descriptors employ Kernel Fisher Analysis-based features, which feed the face recognition process. The proposed approach classifies unseen faces by pooling performance figures obtained from underlying classification algorithms. On the challenging FRGC v2.0 and GavabDB data sets, our method obtains face recognition accuracies of 99.8% and 100%, respectively, outperforming existing methods.
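The abstract says unseen faces are classified by "pooling performance figures" from several underlying classifiers but does not give the pooling rule. A common generic approach is score-level fusion: normalize each classifier's scores to a shared range, then combine them. The sketch below assumes min-max normalization and the sum rule; the subject IDs and scores are hypothetical, and this is not necessarily the authors' fusion scheme.

```python
def min_max_normalize(scores):
    """Rescale one classifier's similarity scores to [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {subject: 0.0 for subject in scores}
    return {subject: (s - lo) / (hi - lo) for subject, s in scores.items()}


def fuse_scores(per_classifier_scores):
    """Sum rule: add normalized similarity scores across classifiers, per gallery subject."""
    fused = {}
    for scores in per_classifier_scores:
        for subject, s in min_max_normalize(scores).items():
            fused[subject] = fused.get(subject, 0.0) + s
    return fused


def identify(per_classifier_scores):
    """Return the gallery subject with the highest fused similarity."""
    fused = fuse_scores(per_classifier_scores)
    return max(fused, key=fused.get)


# Hypothetical similarity scores from two classifiers on different scales.
classifier_a = {"s1": 0.9, "s2": 0.4, "s3": 0.1}
classifier_b = {"s1": 12.0, "s2": 15.0, "s3": 3.0}
print(identify([classifier_a, classifier_b]))
```

Normalizing before summing matters because raw scores from different classifiers can live on very different scales; without it, classifier_b would dominate the decision here.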


Author(s):  
Badri Adhikari

Protein structure prediction continues to stand as an unsolved problem in bioinformatics and biomedicine. Deep learning algorithms and the availability of metagenomic sequences have led to new approaches for predicting inter-residue distances, the key intermediate step. Unlike recently successful methods that frame the problem as multi-class classification, this article introduces REALDIST, a real-valued distance prediction method. Using a representative set of 43 thousand protein chains, a variant of deep ResNet is trained to predict real-valued distance maps. On the most difficult CASP13 free-modeling protein datasets, contacts derived from the real-valued distance maps predicted by this method achieve a long-range top-L precision of 52%, which is 17% higher than the top CASP13 predictor Raptor-X and slightly higher than the more recent trRosetta method. Similar improvements are observed on the CAMEO ‘hard’ and ‘very hard’ datasets. Three-dimensional (3D) structure prediction guided by real-valued distances reveals that, for short proteins, the mean accuracy of the 3D models is slightly higher than that of the top human predictor AlphaFold and the server predictor Quark in the CASP13 competition.
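To make the "long-range top-L precision" figure concrete, the sketch below scores contacts derived from a real-valued distance map. It assumes the usual CASP conventions: a true contact is a pair closer than 8 Å (Cβ-Cβ), long-range means a sequence separation of at least 24 residues, and the L most confident long-range pairs are evaluated, where L is the sequence length and a smaller predicted distance means higher confidence. Whether REALDIST's evaluation divides by L or by the number of available pairs is not stated here; this sketch divides by the number of pairs actually ranked. The distance maps are synthetic.

```python
def top_l_long_range_precision(pred_dist, true_dist, min_sep=24, threshold=8.0, top=None):
    """Fraction of the top-ranked long-range pairs that are true contacts.

    pred_dist, true_dist: symmetric L x L distance maps in angstroms.
    Pairs are ranked by predicted distance (smaller = more confident contact).
    """
    length = len(pred_dist)
    top = length if top is None else top
    pairs = [(i, j) for i in range(length) for j in range(i + min_sep, length)]
    ranked = sorted(pairs, key=lambda p: pred_dist[p[0]][p[1]])[:top]
    if not ranked:
        return 0.0
    hits = sum(1 for i, j in ranked if true_dist[i][j] < threshold)
    return hits / len(ranked)


# Synthetic 30-residue example: two long-range contacts, everything else far apart.
L = 30
true = [[20.0] * L for _ in range(L)]
true[0][29] = true[29][0] = 5.0
true[1][28] = true[28][1] = 5.0

print(top_l_long_range_precision(true, true, top=2))  # a perfect predictor on the top-2 pairs
```

Deriving contacts from real-valued distances this way needs no separate classifier: any thresholding or ranking of the predicted map yields a contact list.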


2021 ◽  
Author(s):  
Michael Heinzinger ◽  
Maria Littmann ◽  
Ian Sillitoe ◽  
Nicola Bordin ◽  
Christine Orengo ◽  
...  

Thanks to recent advances in protein three-dimensional (3D) structure prediction, in particular through AlphaFold 2 and RoseTTAFold, the abundance of protein 3D information will explode over the next year(s). Expert resources based on 3D structures, such as SCOP and CATH, have been organizing the complex sequence-structure-function relations into a hierarchical classification schema. Experimental structures are leveraged through multiple sequence alignments or, more generally, through homology-based inference (HBI), which transfers annotations from a protein with experimentally known annotation to a query without annotation. Here, we present a novel approach that expands the concept of HBI from a low-dimensional sequence-distance lookup to the level of a high-dimensional embedding-based annotation transfer (EAT). Second, we introduce a novel solution that uses single protein sequence representations from protein Language Models (pLMs), so-called embeddings (Prose, ESM-1b, ProtBERT, and ProtT5), as input to contrastive learning, creating a new set of embeddings optimized for the constraints captured by hierarchical classifications of protein 3D structures. These new embeddings (dubbed ProtTucker) clearly improved on what was historically referred to as threading or fold recognition. In doing so, they enabled the intrusion into the midnight zone of protein comparisons, i.e., the region in which the level of pairwise sequence similarity is akin to that of random relations and is therefore hard to navigate with HBI methods. Careful benchmarking showed that ProtTucker reached much further than advanced sequence comparisons without the need to compute alignments, making it orders of magnitude faster. Code is available at https://github.com/Rostlab/EAT .
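At its core, embedding-based annotation transfer replaces a sequence-alignment lookup with a nearest-neighbour search in embedding space: the query inherits the annotation of the closest annotated protein. The sketch below assumes Euclidean distance and tiny, hypothetical 3-dimensional vectors with made-up labels; real pLM embeddings have hundreds to thousands of dimensions, and this is not the code in the Rostlab/EAT repository.

```python
import math


def eat_transfer(query, lookup):
    """Return (label, distance) of the lookup embedding closest to the query.

    lookup: list of (annotation_label, embedding_vector) pairs.
    """
    best_label, best_dist = None, math.inf
    for label, embedding in lookup:
        d = math.dist(query, embedding)  # Euclidean distance in embedding space
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label, best_dist


# Hypothetical annotated lookup set and an unannotated query embedding.
lookup = [
    ("fold_A", [0.1, 0.9, 0.0]),
    ("fold_B", [0.9, 0.1, 0.5]),
]
query = [0.2, 0.8, 0.1]
print(eat_transfer(query, lookup))
```

Returning the distance alongside the label allows a caller to reject transfers whose nearest neighbour is too far away, since a large distance suggests the annotation is unreliable.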

