Distance-based protein folding powered by deep learning

2019 ◽  
Vol 116 (34) ◽  
pp. 16856-16865 ◽  
Author(s):  
Jinbo Xu

Direct coupling analysis (DCA) for protein folding has made very good progress, but it is not effective for proteins that lack many sequence homologs, even when coupled with time-consuming conformation sampling with fragments. We show that we can accurately predict the interresidue distance distribution of a protein by deep learning, even for proteins with ∼60 sequence homologs. Using only the geometric constraints given by the resulting distance matrix, we can construct 3D models without extensive conformation sampling. Our method successfully folded 21 of the 37 CASP12 hard targets, with a median family size of 58 effective sequence homologs, within 4 h on a Linux computer with 20 central processing units. In contrast, DCA-predicted contacts cannot be used to fold any of these hard targets in the absence of extensive conformation sampling, and the best CASP12 group folded only 11 of them by integrating DCA-predicted contacts into fragment-based conformation sampling. Rigorous experimental validation in CASP13 shows that our distance-based folding server successfully folded 17 of 32 hard targets (with a median family size of 36 sequence homologs) and obtained 70% precision on the top L/5 long-range predicted contacts. The latest experimental validation in CAMEO shows that our server predicted correct folds for 2 membrane proteins while all of the other servers failed. These results demonstrate that it is now feasible to predict the correct fold for many more proteins that lack similar structures in the Protein Data Bank, even on a personal computer.
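The top L/5 long-range precision quoted above can be illustrated with a short sketch. It assumes the usual CASP-style conventions, which the abstract itself does not spell out: L is the sequence length, a long-range pair has a sequence separation of at least 24 residues, and two residues count as in contact when their distance is below 8 Å. The function name and the matrix layout are hypothetical, and this is not the author's evaluation code.

```python
def top_l5_long_range_precision(pred_prob, true_dist, min_sep=24, cutoff=8.0):
    """Precision of the top L/5 long-range predicted contacts.

    pred_prob[i][j]: predicted contact probability for residues i and j.
    true_dist[i][j]: native distance (in angstroms) between residues i and j.
    min_sep and cutoff are assumed conventions, not values from the abstract.
    """
    length = len(pred_prob)
    # Collect residue pairs whose sequence separation counts as long-range.
    pairs = [(pred_prob[i][j], i, j)
             for i in range(length)
             for j in range(i + min_sep, length)]
    if not pairs:
        return 0.0  # sequence too short to contain any long-range pair
    # Rank by predicted probability, highest first, and keep the top L/5.
    pairs.sort(key=lambda item: item[0], reverse=True)
    top = pairs[:max(1, length // 5)]
    # A prediction is correct when the native distance is below the cutoff.
    hits = sum(1 for _, i, j in top if true_dist[i][j] < cutoff)
    return hits / len(top)
```

Under these assumptions, a reported precision of 70% would mean that 7 out of every 10 top-ranked long-range pairs are true contacts in the native structure.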

2018 ◽  
Author(s):  
Jinbo Xu

Abstract

Direct coupling analysis (DCA) for protein folding has made very good progress, but it is not effective for proteins that lack many sequence homologs, even when coupled with time-consuming folding simulation. We show that we can accurately predict the distance matrix of a protein by deep learning, even for proteins with ∼60 sequence homologs. Using only the geometric constraints given by the resulting distance matrix, we can construct 3D models without any folding simulation. Our method successfully folded 21 of the 37 CASP12 hard targets, with a median family size of 58 effective sequence homologs, within 4 hours on a Linux computer with 20 CPUs. In contrast, DCA cannot fold any of these hard targets in the absence of folding simulation, and the best CASP12 group folded only 11 of them by integrating DCA-predicted contacts into complex, fragment-based folding simulation. Rigorous experimental validation in CASP13 shows that our distance-based folding server successfully folded 17 of 32 hard targets (with a median family size of 36 sequence homologs) and obtained 70% precision on the top L/5 long-range predicted contacts. The latest experimental validation in CAMEO shows that our server predicted the correct fold for two membrane proteins with new folds while all the other servers failed. These results imply that it is now feasible to predict the correct fold for proteins that lack similar structures in PDB on a personal computer without folding simulation.

Significance

Accurate description of protein structure and function is a fundamental step towards understanding biological life and is highly relevant to the development of therapeutics. Although greatly improved, experimental protein structure determination is still low-throughput and costly, especially for membrane proteins. As such, computational structure prediction is often resorted to. Predicting the structure of a protein with a new fold (i.e., without similar structures in PDB) is very challenging and usually needs a large amount of computing power. This paper shows that by using a powerful deep learning technique, even with only a personal computer we can predict new folds much more accurately than ever before. This method also works well on membrane protein folding.


PLoS ONE ◽  
2020 ◽  
Vol 15 (12) ◽  
pp. e0243331
Author(s):  
Andrew J. McGehee ◽  
Sutanu Bhattacharya ◽  
Rahmatullah Roche ◽  
Debswapna Bhattacharya

Recent advances in distance-based protein folding have led to a paradigm shift in protein structure prediction. Through sufficiently precise estimation of the inter-residue distance matrix for a protein sequence, it is now feasible to predict the correct folds for new proteins much more accurately than ever before. Despite the exciting progress, a dedicated visualization system that can dynamically capture the distance-based folding process is still lacking. Most molecular visualizers typically provide only a static view of a folded protein conformation, but do not capture the folding process. Even among the selected few graphical interfaces that do adopt a dynamic perspective, none of them are distance-based. Here we present PolyFold, an interactive visual simulator for dynamically capturing the distance-based protein folding process through real-time rendering of a distance matrix and its compatible spatial conformation as it folds in an intuitive and easy-to-use interface. PolyFold integrates highly convergent stochastic optimization algorithms with on-demand customizations and interactive manipulations to maximally satisfy the geometric constraints imposed by a distance matrix. PolyFold is capable of simulating the complex process of protein folding even on modest personal computers, thus making it accessible to the general public for fostering citizen science. Open source code of PolyFold is freely available for download at https://github.com/Bhattacharya-Lab/PolyFold. It is implemented in cross-platform Java and binary executables are available for macOS, Linux, and Windows.
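As a rough illustration of what satisfying the geometric constraints imposed by a distance matrix means, the sketch below fits 3D coordinates to a target distance matrix. It uses plain gradient descent on the squared distance error rather than the stochastic optimization algorithms that PolyFold itself integrates, and every name and setting here (`stress`, `fit_coords`, the step size, the iteration count) is an assumption made for this example.

```python
import math
import random


def stress(coords, target):
    """Sum of squared differences between model and target distances."""
    n = len(coords)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            total += (math.dist(coords[i], coords[j]) - target[i][j]) ** 2
    return total


def fit_coords(target, steps=2000, lr=0.01, seed=0):
    """Gradient descent on the stress, starting from seeded random 3D points."""
    rng = random.Random(seed)
    n = len(target)
    coords = [[rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(n)]
    for _ in range(steps):
        grads = [[0.0, 0.0, 0.0] for _ in range(n)]
        for i in range(n):
            for j in range(i + 1, n):
                d = math.dist(coords[i], coords[j])
                if d < 1e-9:
                    continue  # coincident points: direction is undefined, skip
                # Derivative of (d - target)^2 with respect to each coordinate.
                scale = 2.0 * (d - target[i][j]) / d
                for k in range(3):
                    g = scale * (coords[i][k] - coords[j][k])
                    grads[i][k] += g
                    grads[j][k] -= g
        for i in range(n):
            for k in range(3):
                coords[i][k] -= lr * grads[i][k]
    return coords
```

Because distances are unchanged by rotation, translation, and reflection, a fit of this kind can at best recover the conformation up to those symmetries, and a simple descent like this one may stop in a local minimum.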


Science ◽  
2021 ◽  
Vol 373 (6557) ◽  
pp. 866.2-866
Author(s):  
Valda Vinson

Author(s):  
Pranoy Ghosh ◽  
Krithika M Pai ◽  
Manohara Pai M M ◽  
Ujjwal Verma ◽  
Frederic Rivet ◽  
...  

GEOMATICA ◽  
2021 ◽  
pp. 1-23
Author(s):  
Roholah Yazdan ◽  
Masood Varshosaz ◽  
Saied Pirasteh ◽  
Fabio Remondino

Automatic detection and recognition of traffic signs from images is an important topic in many applications. First, we segmented the images using a classification algorithm to delineate the areas where the signs are more likely to be found. In this regard, shadows, objects having similar colours, and extreme illumination changes can significantly affect the segmentation results. We propose a new shape-based algorithm to improve the accuracy of the segmentation. The algorithm works by incorporating the sign geometry to filter out the wrong pixels from the classification results. We performed several tests to compare the performance of our algorithm against that of popular techniques such as Support Vector Machine (SVM), K-Means, and K-Nearest Neighbours. In these tests, to overcome the unwanted illumination effects, the images are transformed into the Hue, Saturation, and Intensity; YUV; normalized red green blue; and Gaussian colour spaces. Among the traditional techniques used in this study, the best results were obtained with SVM applied to the images transformed into the Gaussian colour space. The comparison results also suggested that by adding the geometric constraints proposed in this study, the quality of sign image segmentation is improved by 10%–25%. We also compared the SVM classifier enhanced by incorporating the geometry of signs with a U-Shaped deep learning algorithm. Results suggested that the performance of both techniques is very close. Perhaps the deep learning results could be improved if a more comprehensive data set were provided.
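To show why a colour-space transform can reduce illumination effects, here is a minimal sketch of one of the spaces named above, normalized red green blue. It uses the common definition in which each channel is divided by the sum of the three channels; the function name and the handling of black pixels are assumptions for this example, not details taken from the paper.

```python
def normalized_rgb(r, g, b):
    """Map an RGB pixel to chromaticity coordinates that sum to 1."""
    total = r + g + b
    if total == 0:
        # Black pixel: chromaticity is undefined, so equal shares are assumed.
        return (1 / 3, 1 / 3, 1 / 3)
    return (r / total, g / total, b / total)


# The same reddish surface under dim and under twice-as-bright light.
dim = normalized_rgb(100, 50, 50)
bright = normalized_rgb(200, 100, 100)
print(dim == bright)  # True: uniform brightness scaling cancels out
```

A classifier trained on these values sees the same input for both pixels, which is the property the transform is meant to provide. It does not correct for coloured light or shadows that change the channel ratios.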


2006 ◽  
Vol 16 (05n06) ◽  
pp. 479-511 ◽  
Author(s):  
GILLES TROMBETTONI ◽  
MARTA WILCZKOWIAK

Our approach exploits a general-purpose decomposition algorithm, called GPDOF, and a dictionary of very efficient solving procedures, called r-methods, based on theorems of geometry. GPDOF decomposes an equation system into a sequence of small subsystems solved by r-methods, and produces a set of input parameters. Recursive assembly methods (decomposition-recombination), maximum matching based algorithms, and other famous propagation schema are not well-suited or cannot be easily extended to tackle geometric constraint systems that are under-constrained. In this paper, we show experimentally that, provided that redundant constraints have been removed from the system, GPDOF can quickly decompose large under-constrained systems of geometrical constraints. We have validated our approach by reconstructing, from images, 3D models of buildings using interactively introduced geometrical constraints. Models satisfying the set of linear, bilinear and quadratic geometric constraints are optimized to fit the image information. Our models contain several hundreds of equations. The constraint system is decomposed in a few seconds, and can then be solved in hundredths of seconds.


2021 ◽  
Vol 11 (2) ◽  
pp. 782 ◽  
Author(s):  
Albert Comelli ◽  
Navdeep Dahiya ◽  
Alessandro Stefano ◽  
Federica Vernuccio ◽  
Marzia Portoghese ◽  
...  

Magnetic Resonance Imaging-based prostate segmentation is an essential task for adaptive radiotherapy and for radiomics studies whose purpose is to identify associations between imaging features and patient outcomes. Because manual delineation is a time-consuming task, we present three deep-learning (DL) approaches, namely UNet, efficient neural network (ENet), and efficient residual factorized convNet (ERFNet), whose aim is to tackle the fully-automated, real-time, and 3D delineation process of the prostate gland on T2-weighted MRI. While UNet is used in many biomedical image delineation applications, ENet and ERFNet are mainly applied in self-driving cars to compensate for limited hardware availability while still achieving accurate segmentation. We apply these models to a limited set of 85 manual prostate segmentations using the k-fold validation strategy and the Tversky loss function, and we compare their results. We find that ENet and UNet are more accurate than ERFNet, with ENet much faster than UNet. Specifically, ENet obtains a dice similarity coefficient of 90.89% and a segmentation time of about 6 s using central processing unit (CPU) hardware to simulate real clinical conditions where a graphics processing unit (GPU) is not always available. In conclusion, ENet could be efficiently applied for prostate delineation even with small image training datasets, with potential benefit for patient management personalization.
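The two quantities named in this abstract, the Dice similarity coefficient and the Tversky loss, are closely related, and a small sketch may make that concrete. It uses the standard Tversky index, TP / (TP + alpha * FP + beta * FN), on hard binary masks given as flat lists. Training code would normally apply the same formula to soft network outputs, and the weights the authors actually used are not stated here, so the defaults below are assumptions.

```python
def tversky_index(pred, truth, alpha=0.5, beta=0.5):
    """Tversky index between two binary masks given as flat 0/1 sequences."""
    tp = sum(1 for p, t in zip(pred, truth) if p and t)      # true positives
    fp = sum(1 for p, t in zip(pred, truth) if p and not t)  # false positives
    fn = sum(1 for p, t in zip(pred, truth) if not p and t)  # false negatives
    denom = tp + alpha * fp + beta * fn
    # Two empty masks agree perfectly; treating that as 1.0 is a convention.
    return tp / denom if denom else 1.0


def dice(pred, truth):
    """Dice coefficient: the Tversky index with alpha = beta = 0.5."""
    return tversky_index(pred, truth, alpha=0.5, beta=0.5)


def tversky_loss(pred, truth, alpha=0.5, beta=0.5):
    """Loss to minimise: 0 for a perfect overlap, approaching 1 for none."""
    return 1.0 - tversky_index(pred, truth, alpha, beta)
```

Choosing beta larger than alpha penalises missed foreground pixels more than spurious ones, which is the usual reason to prefer a Tversky loss over a plain Dice loss when the target structure is small.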


2002 ◽  
Vol 35 (5) ◽  
pp. 648-649 ◽  
Author(s):  
Kristian Vlahovicek ◽  
Oliviero Carugo ◽  
Sándor Pongor

The PRIDE server is an implementation of the PRIDE algorithm that compares protein three-dimensional structures in terms of their Cα distance distributions. In response to queries presented as single or concatenated Protein Data Bank (PDB) files, the server can carry out (i) a pairwise comparison of two protein three-dimensional structures; (ii) a structural clustering of protein three-dimensional structures, providing a distance matrix and a dendrogram as output; and (iii) a similarity search with a protein domain structure query against the CATH database.


1991 ◽  
Vol 7 (5-6) ◽  
pp. 309-325 ◽  
Author(s):  
Maarten J. G. M. van Emmerik

Author(s):  
D. Frommholz ◽  
M. Linkiewicz ◽  
A. M. Poznanska

This paper proposes an in-line method for the simplified reconstruction of city buildings from nadir and oblique aerial images that at the same time are being used for multi-source texture mapping with minimal resampling. Further, the resulting unrectified texture atlases are analyzed for façade elements like windows to be reintegrated into the original 3D models. Tests on real-world data of Heligoland, Germany, comprising more than 800 buildings exposed a median positional deviation of 0.31 m at the façades compared to the cadastral map, a correctness of 67% for the detected windows and good visual quality when being rendered with GPU-based perspective correction. As part of the process, building reconstruction takes the oriented input images and transforms them into dense point clouds by semi-global matching (SGM). The point sets undergo local RANSAC-based regression and topology analysis to detect adjacent planar surfaces and determine their semantics. Based on this information, the roof, wall and ground surfaces found get intersected and limited in their extension to form a closed 3D building hull. For texture mapping, the hull polygons are projected into each possible input bitmap to find suitable color sources regarding the coverage and resolution. Occlusions are detected by ray-casting a full-scale digital surface model (DSM) of the scene and stored in pixel-precise visibility maps. These maps are used to derive overlap statistics and radiometric adjustment coefficients to be applied when the visible image parts for each building polygon are being copied into a compact texture atlas without resampling whenever possible. The atlas bitmap is passed to a commercial object-based image analysis (OBIA) tool running a custom rule set to identify windows on the contained façade patches.
Following multi-resolution segmentation and classification based on brightness and contrast differences, potential window objects are evaluated against geometric constraints and conditionally grown, fused and filtered morphologically. The output polygons are vectorized and reintegrated into the previously reconstructed buildings by sparsely ray-tracing their vertices. Finally, the enhanced 3D models get stored as textured geometry for visualization and semantically annotated "LOD-2.5" CityGML objects for GIS applications.

