Distance-based protein folding powered by deep learning

Direct coupling analysis (DCA) for protein folding has made very good progress, but it is not effective for proteins that lack many sequence homologs, even when coupled with time-consuming conformation sampling with fragments. We show that we can accurately predict the interresidue distance distribution of a protein by deep learning, even for proteins with ∼60 sequence homologs. Using only the geometric constraints given by the resulting distance matrix, we may construct 3D models without involving extensive conformation sampling. Our method successfully folded 21 of the 37 CASP12 hard targets, with a median family size of 58 effective sequence homologs, within 4 h on a Linux computer with 20 central processing units. In contrast, DCA-predicted contacts cannot be used to fold any of these hard targets in the absence of extensive conformation sampling, and the best CASP12 group folded only 11 of them by integrating DCA-predicted contacts into fragment-based conformation sampling. Rigorous experimental validation in CASP13 shows that our distance-based folding server successfully folded 17 of 32 hard targets (with a median family size of 36 sequence homologs) and obtained 70% precision on the top L/5 long-range predicted contacts. The latest experimental validation in CAMEO shows that our server predicted correct folds for 2 membrane proteins while all of the other servers failed. These results demonstrate that it is now feasible to predict the correct fold for many more proteins that lack similar structures in the Protein Data Bank, even on a personal computer.
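
The "precision on the top L/5 long-range predicted contacts" quoted in the abstract can be illustrated with a small sketch. The abstract does not spell out the definitions, so this assumes the conventions commonly used in CASP-style evaluation: a pair is a contact when its native distance is under 8 Å, and it is long-range when the residues are at least 24 positions apart. The function name, parameter names, and toy data are hypothetical.

```python
def top_l5_long_range_precision(pred_prob, true_dist, min_sep=24, cutoff=8.0):
    """Precision of the top L/5 long-range predicted contacts.

    pred_prob[i][j]: predicted contact probability for residues i and j.
    true_dist[i][j]: native distance in angstroms.
    min_sep and cutoff are assumed conventions, not taken from the abstract.
    """
    length = len(pred_prob)
    # Collect residue pairs that are far apart in sequence (long-range).
    pairs = [
        (pred_prob[i][j], i, j)
        for i in range(length)
        for j in range(i + min_sep, length)
    ]
    if not pairs:
        raise ValueError("sequence too short for any long-range pair")
    # Keep the L/5 most confident predictions (at least one).
    pairs.sort(reverse=True)
    top = pairs[: max(1, length // 5)]
    # A prediction is correct when the native distance is under the cutoff.
    hits = sum(1 for _, i, j in top if true_dist[i][j] < cutoff)
    return hits / len(top)


# Toy protein of length 30, so L/5 = 6 predictions are scored.
L = 30
prob = [[0.0] * L for _ in range(L)]
dist = [[20.0] * L for _ in range(L)]
confident = [(0, 24), (0, 25), (1, 26), (2, 27), (3, 28), (4, 29)]
for rank, (i, j) in enumerate(confident):
    prob[i][j] = 0.9 - 0.01 * rank
for i, j in confident[:3]:
    dist[i][j] = 6.5  # only three of the six are true contacts

precision = top_l5_long_range_precision(prob, dist)
print(precision)
```

In this toy case three of the six scored pairs are true contacts, so the function returns 0.5.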


Distance-based Protein Folding Powered by Deep Learning

10.1101/465955 ◽

2018 ◽

Cited By ~ 9

Author(s):

Jinbo Xu

Keyword(s):

Protein Folding ◽

Deep Learning ◽

Protein Structure ◽

Membrane Proteins ◽

Family Size ◽

Personal Computer ◽

Experimental Validation ◽

Distance Matrix ◽

Geometric Constraints ◽

Folding Simulation

Abstract

Direct coupling analysis (DCA) for protein folding has made very good progress, but it is not effective for proteins that lack many sequence homologs, even when coupled with time-consuming folding simulation. We show that we can accurately predict the distance matrix of a protein by deep learning, even for proteins with ∼60 sequence homologs. Using only the geometric constraints given by the resulting distance matrix, we may construct 3D models without involving any folding simulation. Our method successfully folded 21 of the 37 CASP12 hard targets, with a median family size of 58 effective sequence homologs, within 4 hours on a Linux computer with 20 CPUs. In contrast, DCA cannot fold any of these hard targets in the absence of folding simulation, and the best CASP12 group folded only 11 of them by integrating DCA-predicted contacts into complex, fragment-based folding simulation. Rigorous experimental validation in CASP13 shows that our distance-based folding server successfully folded 17 of 32 hard targets (with a median family size of 36 sequence homologs) and obtained 70% precision on the top L/5 long-range predicted contacts. The latest experimental validation in CAMEO shows that our server predicted the correct fold for two membrane proteins with new folds while all the other servers failed. These results imply that it is now feasible to predict the correct fold for proteins that lack similar structures in the PDB on a personal computer without folding simulation.

Significance

Accurate description of protein structure and function is a fundamental step towards understanding biological life and is highly relevant to the development of therapeutics. Although greatly improved, experimental protein structure determination is still low-throughput and costly, especially for membrane proteins. As such, computational structure prediction is often resorted to. Predicting the structure of a protein with a new fold (i.e., without similar structures in the PDB) is very challenging and usually needs a large amount of computing power. This paper shows that by using a powerful deep learning technique, even with only a personal computer we can predict new folds much more accurately than ever before. This method also works well on membrane protein folding.


PolyFold: An interactive visual simulator for distance-based protein folding

PLoS ONE ◽

10.1371/journal.pone.0243331 ◽

2020 ◽

Vol 15 (12) ◽

pp. e0243331

Author(s):

Andrew J. McGehee ◽

Sutanu Bhattacharya ◽

Rahmatullah Roche ◽

Debswapna Bhattacharya

Keyword(s):

Protein Folding ◽

Structure Prediction ◽

Distance Matrix ◽

Geometric Constraints ◽

Visualization System ◽

Folding Process ◽

Graphical Interfaces ◽

Dynamic Perspective ◽

Static View ◽

Stochastic Optimization Algorithms

Recent advances in distance-based protein folding have led to a paradigm shift in protein structure prediction. Through sufficiently precise estimation of the inter-residue distance matrix for a protein sequence, it is now feasible to predict the correct folds for new proteins much more accurately than ever before. Despite the exciting progress, a dedicated visualization system that can dynamically capture the distance-based folding process is still lacking. Most molecular visualizers typically provide only a static view of a folded protein conformation, but do not capture the folding process. Even among the selected few graphical interfaces that do adopt a dynamic perspective, none of them are distance-based. Here we present PolyFold, an interactive visual simulator for dynamically capturing the distance-based protein folding process through real-time rendering of a distance matrix and its compatible spatial conformation as it folds in an intuitive and easy-to-use interface. PolyFold integrates highly convergent stochastic optimization algorithms with on-demand customizations and interactive manipulations to maximally satisfy the geometric constraints imposed by a distance matrix. PolyFold is capable of simulating the complex process of protein folding even on modest personal computers, thus making it accessible to the general public for fostering citizen science. Open source code of PolyFold is freely available for download at https://github.com/Bhattacharya-Lab/PolyFold. It is implemented in cross-platform Java and binary executables are available for macOS, Linux, and Windows.
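
The idea of searching for a conformation that satisfies a distance matrix can be sketched with a very simple stochastic optimizer. This is not PolyFold's algorithm (the abstract does not describe it in detail). It is a greedy random search, with hypothetical function names and parameters, that moves one point at a time and keeps a move only when the mismatch with the target distances decreases.

```python
import math
import random


def stress(coords, target):
    """Sum of squared differences between current and target pair distances."""
    total = 0.0
    n = len(coords)
    for i in range(n):
        for j in range(i + 1, n):
            total += (math.dist(coords[i], coords[j]) - target[i][j]) ** 2
    return total


def fold(target, steps=5000, step_size=0.5, seed=0):
    """Greedy stochastic search for 3D points matching a target distance matrix."""
    rng = random.Random(seed)
    n = len(target)
    # Start from random coordinates (an arbitrary choice for this sketch).
    coords = [[rng.uniform(-5, 5) for _ in range(3)] for _ in range(n)]
    initial = current = stress(coords, target)
    for _ in range(steps):
        # Propose a random displacement of one randomly chosen point.
        k = rng.randrange(n)
        old = coords[k]
        coords[k] = [c + rng.gauss(0, step_size) for c in old]
        proposed = stress(coords, target)
        if proposed < current:
            current = proposed  # keep the improvement
        else:
            coords[k] = old  # revert the move
    return coords, initial, current


# Toy target: distances measured on a made-up five-point chain.
native = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0),
          (3.8, 3.8, 3.8), (0.0, 3.8, 3.8)]
target = [[math.dist(p, q) for q in native] for p in native]
coords, initial_stress, final_stress = fold(target)
print(initial_stress, final_stress)
```

Because only improving moves are accepted, the final stress can never exceed the initial one. A distance matrix determines a structure only up to rotation, translation, and mirror image, so the recovered coordinates need not coincide with `native`.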


Deep learning takes on protein folding

Science ◽

10.1126/science.373.6557.866-b ◽

2021 ◽

Vol 373 (6557) ◽

pp. 866.2-866

Author(s):

Valda Vinson

Keyword(s):

Protein Folding ◽

Deep Learning


Exploring Techniques for Photo-realistic Image Generation from 3D Models - A Deep Learning Approach

10.1109/mysurucon52639.2021.9641645 ◽

2021 ◽

Author(s):

Pranoy Ghosh ◽

Krithika M Pai ◽

Manohara Pai M M ◽

Ujjwal Verma ◽

Frederic Rivet ◽

...

Keyword(s):

Deep Learning ◽

3D Models ◽

Learning Approach ◽

Image Generation ◽

Realistic Image


Using geometric constraints to improve performance of image classifiers for automatic segmentation of traffic signs

GEOMATICA ◽

10.1139/geomat-2020-0010 ◽

2021 ◽

pp. 1-23

Author(s):

Roholah Yazdan ◽

Masood Varshosaz ◽

Saied Pirasteh ◽

Fabio Remondino

Keyword(s):

Deep Learning ◽

Learning Algorithm ◽

Automatic Segmentation ◽

Geometric Constraints ◽

Support Vector ◽

Svm Classifier ◽

Data Set ◽

Traffic Signs ◽

Deep Learning Algorithm ◽

Comparison Results

Automatic detection and recognition of traffic signs from images is an important topic in many applications. First, we segmented the images using a classification algorithm to delineate the areas where the signs are more likely to be found. In this regard, shadows, objects having similar colours, and extreme illumination changes can significantly affect the segmentation results. We propose a new shape-based algorithm to improve the accuracy of the segmentation. The algorithm works by incorporating the sign geometry to filter out the wrong pixels from the classification results. We performed several tests to compare the performance of our algorithm against those obtained by popular techniques such as Support Vector Machine (SVM), K-Means, and K-Nearest Neighbours. In these tests, to overcome the unwanted illumination effects, the images are transformed into the colour spaces Hue, Saturation, and Intensity (HSI), YUV, normalized red green blue, and Gaussian. Among the traditional techniques used in this study, the best results were obtained with SVM applied to the images transformed into the Gaussian colour space. The comparison results also suggested that by adding the geometric constraints proposed in this study, the quality of sign image segmentation is improved by 10%–25%. We also compared the SVM classifier enhanced by incorporating the geometry of signs with a U-shaped deep learning algorithm. Results suggested the performance of both techniques is very close. Perhaps the deep learning results could be improved if a more comprehensive data set is provided.
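
One way to picture "incorporating the sign geometry to filter out the wrong pixels" is a shape check on each candidate region produced by the classifier. The abstract does not give the actual rules, so the test below is purely an assumed illustration: it keeps a region only when its bounding box is roughly square and reasonably filled. The function name and both thresholds are hypothetical.

```python
def passes_shape_check(pixels, max_aspect=1.3, min_fill=0.45):
    """Return True if a candidate region looks roughly sign-shaped.

    pixels: set of (row, col) coordinates classified as sign colour.
    max_aspect and min_fill are illustrative thresholds, not from the paper.
    """
    if not pixels:
        return False
    rows = [r for r, _ in pixels]
    cols = [c for _, c in pixels]
    height = max(rows) - min(rows) + 1
    width = max(cols) - min(cols) + 1
    # Compact sign shapes have bounding boxes close to square.
    aspect = max(height, width) / min(height, width)
    # Fraction of the bounding box that the region actually covers.
    fill = len(pixels) / (height * width)
    return aspect <= max_aspect and fill >= min_fill


square_blob = {(r, c) for r in range(10) for c in range(10)}
thin_stripe = {(r, c) for r in range(2) for c in range(20)}
print(passes_shape_check(square_blob), passes_shape_check(thin_stripe))
```

The square blob passes and the elongated stripe is rejected, which is the kind of false positive (for example a similarly coloured pole or banner) that a geometric constraint can remove.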


GPDOF — A FAST ALGORITHM TO DECOMPOSE UNDER-CONSTRAINED GEOMETRIC CONSTRAINT SYSTEMS: APPLICATION TO 3D MODELING

International Journal of Computational Geometry & Applications ◽

10.1142/s0218195906002154 ◽

2006 ◽

Vol 16 (05n06) ◽

pp. 479-511 ◽

Cited By ~ 3

Author(s):

GILLES TROMBETTONI ◽

MARTA WILCZKOWIAK

Keyword(s):

Equation System ◽

3D Models ◽

General Purpose ◽

Constrained Systems ◽

Geometric Constraints ◽

Geometric Constraint ◽

Maximum Matching ◽

Constraint System ◽

Constraint Systems ◽

Geometrical Constraints

Our approach exploits a general-purpose decomposition algorithm, called GPDOF, and a dictionary of very efficient solving procedures, called r-methods, based on theorems of geometry. GPDOF decomposes an equation system into a sequence of small subsystems solved by r-methods, and produces a set of input parameters. Recursive assembly methods (decomposition-recombination), maximum matching based algorithms, and other famous propagation schema are not well-suited or cannot be easily extended to tackle geometric constraint systems that are under-constrained. In this paper, we show experimentally that, provided that redundant constraints have been removed from the system, GPDOF can quickly decompose large under-constrained systems of geometrical constraints. We have validated our approach by reconstructing, from images, 3D models of buildings using interactively introduced geometrical constraints. Models satisfying the set of linear, bilinear and quadratic geometric constraints are optimized to fit the image information. Our models contain several hundreds of equations. The constraint system is decomposed in a few seconds, and can then be solved in hundredths of seconds.


Deep Learning-Based Methods for Prostate Segmentation in Magnetic Resonance Imaging

Applied Sciences ◽

10.3390/app11020782 ◽

2021 ◽

Vol 11 (2) ◽

pp. 782 ◽

Cited By ~ 1

Author(s):

Albert Comelli ◽

Navdeep Dahiya ◽

Alessandro Stefano ◽

Federica Vernuccio ◽

Marzia Portoghese ◽

...

Keyword(s):

Magnetic Resonance Imaging ◽

Deep Learning ◽

Magnetic Resonance ◽

Prostate Gland ◽

Dice Similarity Coefficient ◽

Processing Unit ◽

Imaging Features ◽

Resonance Imaging ◽

Central Processing ◽

Prostate Segmentation

Magnetic Resonance Imaging-based prostate segmentation is an essential task for adaptive radiotherapy and for radiomics studies whose purpose is to identify associations between imaging features and patient outcomes. Because manual delineation is a time-consuming task, we present three deep-learning (DL) approaches, namely UNet, efficient neural network (ENet), and efficient residual factorized convNet (ERFNet), whose aim is to tackle the fully-automated, real-time, and 3D delineation process of the prostate gland on T2-weighted MRI. While UNet is used in many biomedical image delineation applications, ENet and ERFNet are mainly applied in self-driving cars to compensate for limited hardware availability while still achieving accurate segmentation. We apply these models to a limited set of 85 manual prostate segmentations using the k-fold validation strategy and the Tversky loss function and we compare their results. We find that ENet and UNet are more accurate than ERFNet, with ENet much faster than UNet. Specifically, ENet obtains a Dice similarity coefficient of 90.89% and a segmentation time of about 6 s using central processing unit (CPU) hardware to simulate real clinical conditions where a graphics processing unit (GPU) is not always available. In conclusion, ENet could be efficiently applied for prostate delineation even in small image training datasets with potential benefit for patient management personalization.
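
The two measures named in the abstract, the Dice similarity coefficient and the Tversky index that underlies the Tversky loss, can be written out on binary masks as follows. This is a minimal sketch on flat Python lists. The weights `alpha` and `beta` are illustrative because the abstract does not state the values used, and the function names are hypothetical.

```python
def confusion_counts(pred, truth):
    """Count true positives, false positives, and false negatives."""
    tp = sum(1 for p, t in zip(pred, truth) if p and t)
    fp = sum(1 for p, t in zip(pred, truth) if p and not t)
    fn = sum(1 for p, t in zip(pred, truth) if not p and t)
    return tp, fp, fn


def dice(pred, truth):
    """Dice similarity coefficient: 2*TP / (2*TP + FP + FN)."""
    tp, fp, fn = confusion_counts(pred, truth)
    denom = 2 * tp + fp + fn
    return 1.0 if denom == 0 else 2 * tp / denom


def tversky(pred, truth, alpha=0.5, beta=0.5):
    """Tversky index: TP / (TP + alpha*FP + beta*FN)."""
    tp, fp, fn = confusion_counts(pred, truth)
    denom = tp + alpha * fp + beta * fn
    return 1.0 if denom == 0 else tp / denom


def tversky_loss(pred, truth, alpha=0.5, beta=0.5):
    """Loss to minimise: one minus the Tversky index."""
    return 1.0 - tversky(pred, truth, alpha, beta)


pred = [1, 1, 1, 0]   # predicted mask (flattened)
truth = [1, 0, 0, 0]  # manual segmentation (flattened)
print(dice(pred, truth), tversky(pred, truth), tversky(pred, truth, 0.3, 0.7))
```

With `alpha = beta = 0.5` the Tversky index reduces to the Dice coefficient. Unequal weights let training penalise false positives and false negatives differently.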


The PRIDE server for protein three-dimensional similarity

Journal of Applied Crystallography ◽

10.1107/s002188980201292x ◽

2002 ◽

Vol 35 (5) ◽

pp. 648-649 ◽

Cited By ~ 4

Author(s):

Kristian Vlahovicek ◽

Oliviero Carugo ◽

Sándor Pongor

Keyword(s):

Similarity Search ◽

Three Dimensional ◽

Pairwise Comparison ◽

Distance Matrix ◽

Data Bank ◽

Protein Domain ◽

Distance Distributions ◽

Protein Domain Structure ◽

Structural Clustering ◽

Structure Query

The PRIDE server is an implementation of the PRIDE algorithm that compares protein three-dimensional structures in terms of their Cα distance distributions. In response to queries presented as single or concatenated Protein Data Bank (PDB) files, the server can carry out (i) a pairwise comparison of two protein three-dimensional structures; (ii) a structural clustering of protein three-dimensional structures, providing a distance matrix and a dendrogram as an output; and (iii) a similarity search with a protein domain structure query against the CATH database.
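
To make "Cα distance distributions" concrete, the sketch below builds a normalized histogram of distances between Cα atoms a fixed number of residues apart and compares two such histograms. The comparison used here, a plain histogram overlap, is a stand-in chosen for brevity and is not the PRIDE score. The function names, the `offset` parameter, and the binning are all assumptions.

```python
import math


def ca_distance_histogram(ca_coords, offset, bin_width=1.0, n_bins=40):
    """Normalized histogram of distances between C-alpha atoms i and i + offset."""
    counts = [0] * n_bins
    for i in range(len(ca_coords) - offset):
        d = math.dist(ca_coords[i], ca_coords[i + offset])
        # Distances beyond the last bin edge are clamped into the last bin.
        counts[min(int(d / bin_width), n_bins - 1)] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else counts


def histogram_overlap(h1, h2):
    """Overlap of two normalized histograms: 1.0 identical, 0.0 disjoint."""
    return sum(min(a, b) for a, b in zip(h1, h2))


# Two made-up straight chains with different residue spacing.
extended = [(3.8 * i, 0.0, 0.0) for i in range(10)]
compact = [(2.0 * i, 0.0, 0.0) for i in range(10)]

h_ext = ca_distance_histogram(extended, offset=3)
h_cmp = ca_distance_histogram(compact, offset=3)
print(histogram_overlap(h_ext, h_ext), histogram_overlap(h_ext, h_cmp))
```

Because the histograms depend only on internal distances, this kind of comparison needs no structural superposition.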


Interactive design of 3D models with geometric constraints

The Visual Computer ◽

10.1007/bf01905695 ◽

1991 ◽

Vol 7 (5-6) ◽

pp. 309-325 ◽

Cited By ~ 9

Author(s):

Maarten J. G. M. van Emmerik

Keyword(s):

3D Models ◽

Geometric Constraints ◽

Interactive Design


INLINING 3D RECONSTRUCTION, MULTI-SOURCE TEXTURE MAPPING AND SEMANTIC ANALYSIS USING OBLIQUE AERIAL IMAGERY

ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences ◽

10.5194/isprs-archives-xli-b3-605-2016 ◽

2016 ◽

Vol XLI-B3 ◽

pp. 605-612

Author(s):

D. Frommholz ◽

M. Linkiewicz ◽

A. M. Poznanska

Keyword(s):

Semantic Analysis ◽

Texture Mapping ◽

Point Clouds ◽

3D Models ◽

Geometric Constraints ◽

Aerial Images ◽

Surface Model ◽

Ray Casting ◽

Real World Data ◽

Visible Image

This paper proposes an in-line method for the simplified reconstruction of city buildings from nadir and oblique aerial images that at the same time are being used for multi-source texture mapping with minimal resampling. Further, the resulting unrectified texture atlases are analyzed for façade elements like windows to be reintegrated into the original 3D models. Tests on real-world data of Heligoland/Germany comprising more than 800 buildings exposed a median positional deviation of 0.31 m at the façades compared to the cadastral map, a correctness of 67% for the detected windows and good visual quality when being rendered with GPU-based perspective correction. As part of the process building reconstruction takes the oriented input images and transforms them into dense point clouds by semi-global matching (SGM). The point sets undergo local RANSAC-based regression and topology analysis to detect adjacent planar surfaces and determine their semantics. Based on this information the roof, wall and ground surfaces found get intersected and limited in their extension to form a closed 3D building hull. For texture mapping the hull polygons are projected into each possible input bitmap to find suitable color sources regarding the coverage and resolution. Occlusions are detected by ray-casting a full-scale digital surface model (DSM) of the scene and stored in pixel-precise visibility maps. These maps are used to derive overlap statistics and radiometric adjustment coefficients to be applied when the visible image parts for each building polygon are being copied into a compact texture atlas without resampling whenever possible. The atlas bitmap is passed to a commercial object-based image analysis (OBIA) tool running a custom rule set to identify windows on the contained façade patches. Following multi-resolution segmentation and classification based on brightness and contrast differences potential window objects are evaluated against geometric constraints and conditionally grown, fused and filtered morphologically. The output polygons are vectorized and reintegrated into the previously reconstructed buildings by sparsely ray-tracing their vertices. Finally the enhanced 3D models get stored as textured geometry for visualization and semantically annotated “LOD-2.5” CityGML objects for GIS applications.
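
The "RANSAC-based regression" step used to detect planar surfaces in a point cloud can be illustrated with a generic RANSAC plane fit. This is a textbook-style sketch, not the authors' pipeline. The function name, the inlier threshold, the iteration count, and the toy point cloud are all assumptions.

```python
import random


def fit_plane_ransac(points, threshold=0.05, iterations=200, seed=0):
    """Return the largest set of points lying within threshold of one plane."""
    rng = random.Random(seed)
    best_inliers = []
    for _ in range(iterations):
        # Three random points define a candidate plane.
        p, q, r = rng.sample(points, 3)
        u = [q[k] - p[k] for k in range(3)]
        v = [r[k] - p[k] for k in range(3)]
        # Plane normal is the cross product of the two edge vectors.
        normal = [
            u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0],
        ]
        norm = sum(c * c for c in normal) ** 0.5
        if norm < 1e-12:
            continue  # collinear sample, no unique plane
        normal = [c / norm for c in normal]
        # Inliers are points whose distance to the plane is within threshold.
        inliers = [
            pt for pt in points
            if abs(sum(normal[k] * (pt[k] - p[k]) for k in range(3))) <= threshold
        ]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    return best_inliers


# Toy cloud: 20 points on the plane z = 0 plus three outliers.
roof = [(float(x), float(y), 0.0) for x in range(5) for y in range(4)]
outliers = [(1.0, 1.0, 5.0), (2.5, 0.5, -3.0), (0.3, 2.2, 7.0)]
plane_points = fit_plane_ransac(roof + outliers)
print(len(plane_points))
```

In a real pipeline the fit would typically run locally on subsets of the cloud and be repeated after removing each detected plane, so that adjacent roof, wall, and ground surfaces are found one by one.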
