Accurate De Novo Prediction of Protein Contact Map by Ultra-Deep Learning Model

2016
Author(s): Sheng Wang, Siqi Sun, Zhen Li, Renyu Zhang, Jinbo Xu

Abstract

Motivation: Protein contacts contain key information for understanding protein structure and function, so predicting contacts from sequence is an important problem. Although exciting progress has recently been made on this problem, the contacts predicted for proteins without many sequence homologs are still of low quality and not very useful for de novo structure prediction.

Method: This paper presents a new deep learning method that predicts contacts by integrating both evolutionary coupling (EC) and sequence conservation information through an ultra-deep neural network formed by two deep residual neural networks. The first residual network conducts a series of 1-dimensional convolutional transformations of sequential features; the second conducts a series of 2-dimensional convolutional transformations of pairwise information, including the output of the first residual network, EC information and pairwise potential. By using very deep residual networks, we can model contact occurrence patterns and very complex sequence-structure relationships, and thus obtain high-quality contact prediction regardless of how many sequence homologs are available for the protein in question.

Results: Our method greatly outperforms existing methods and leads to much more accurate contact-assisted folding. Tested on 105 CASP11 targets, 76 past CAMEO hard targets and 398 membrane proteins, the average top L long-range prediction accuracy obtained by our method, one representative EC method (CCMpred) and the CASP11 winner (MetaPSICOV) is 0.47, 0.21 and 0.30, respectively; the average top L/10 long-range accuracy of our method, CCMpred and MetaPSICOV is 0.77, 0.47 and 0.59, respectively. Ab initio folding using our predicted contacts as restraints, but without any force field, yields correct folds (i.e., TMscore>0.6) for 203 of the 579 test proteins, while folding with MetaPSICOV- and CCMpred-predicted contacts does so for only 79 and 62 of them, respectively. Our contact-assisted models also have much better quality than template-based models, especially for membrane proteins: the 3D models built from our contact prediction have TMscore>0.5 for 208 of the 398 membrane proteins, while those from homology modeling have TMscore>0.5 for only 10 of them. Further, even when trained on only non-membrane proteins, our deep learning method works very well on membrane protein contact prediction. In the recent blind CAMEO benchmark, our fully automated web server implementing this method successfully folded 5 targets with a new fold and only 0.3L-2.3L effective sequence homologs: one β protein of 182 residues, one α+β protein of 125 residues, two α proteins of 140 and 217 residues, and one α/β protein of 260 residues.

Availability: http://raptorx.uchicago.edu/ContactMap/

Author Summary

Protein contact prediction and contact-assisted folding have made good progress thanks to direct evolutionary coupling analysis (DCA). However, DCA is effective only on proteins with a very large number of sequence homologs. To further improve contact prediction, we borrow ideas from deep learning, which has recently revolutionized object recognition, speech recognition and the game of Go. Our deep learning method can model complex sequence-structure relationships and high-order correlation (i.e., contact occurrence patterns) and thus greatly improves contact prediction accuracy. Our test results show that our method greatly outperforms state-of-the-art methods regardless of how many sequence homologs are available for the protein in question. Ab initio folding guided by our predicted contacts can fold many more test proteins than folding guided by the other contact predictors. Our contact-assisted 3D models also have much better quality than homology models built from the training proteins, especially for membrane proteins. One interesting finding is that, even when trained with only soluble proteins, our method performs very well on membrane proteins. A recent blind test in CAMEO confirms that our method can fold large proteins with a new fold and only a small number of sequence homologs.
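The "top L" and "top L/10" long-range accuracies above come from ranking predicted residue pairs by confidence and checking the best-ranked ones against the native structure. The sketch below illustrates that evaluation under assumptions the abstract does not state: "long range" is taken to mean a sequence separation of at least 24 residues, and a native contact is an inter-residue distance below 8 Å (the conventional CASP definitions). The function name and the toy data are hypothetical.

```python
import numpy as np

def top_k_long_range_precision(prob, native_dist, frac=1.0, min_sep=24, cutoff=8.0):
    """Fraction of the top (frac * L) long-range predictions that are native contacts.

    prob        -- (L, L) predicted contact probabilities (upper triangle is used)
    native_dist -- (L, L) native residue-residue distances in angstroms
    frac        -- 1.0 for top L, 0.5 for top L/2, 0.1 for top L/10, ...
    min_sep     -- minimum sequence separation |i - j| counted as long range (assumed 24)
    cutoff      -- distance below which a pair is a native contact (assumed 8 angstroms)
    """
    L = prob.shape[0]
    i, j = np.triu_indices(L, k=min_sep)              # every pair once, with j - i >= min_sep
    order = np.argsort(-prob[i, j], kind="stable")    # most confident predictions first
    top = order[: max(1, int(frac * L))]
    return float(np.mean(native_dist[i[top], j[top]] < cutoff))

# Toy example: L = 30, three native long-range contacts, three confident predictions.
L = 30
native = np.full((L, L), 20.0)
for a, b in [(0, 29), (1, 28), (2, 27)]:
    native[a, b] = native[b, a] = 5.0
prob = np.zeros((L, L))
prob[0, 29], prob[1, 28], prob[5, 29] = 0.9, 0.8, 0.7   # (5, 29) is a false positive
print(top_k_long_range_precision(prob, native, frac=0.1))   # top L/10 = 3 pairs, 2 correct
```

Because only the ranking matters, the metric rewards a predictor for placing its few most confident long-range pairs correctly, which is why the top L/10 numbers are consistently higher than the top L numbers.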


2017
Author(s): Sheng Wang, Siqi Sun, Jinbo Xu

Abstract
Here we present the results of protein contact prediction achieved in CASP12 by our RaptorX-Contact server, an early implementation of our deep learning method for contact prediction. On a set of 38 free-modeling target domains with a median family size of around 58 effective sequences, our server obtained an average top L/5 long- and medium-range contact accuracy of 47% and 44%, respectively (L = sequence length). A more advanced implementation has an average accuracy of 59% and 57%, respectively. Our deep learning method formulates contact prediction as an image pixel-level labeling problem and simultaneously predicts all residue pairs of a protein using a combination of two deep residual neural networks, taking as input residue conservation information, predicted secondary structure and solvent accessibility, contact potential, and co-evolution information. Our approach differs from existing methods mainly in (1) formulating contact prediction as a pixel-level image labeling problem instead of an image-level classification problem; (2) simultaneously predicting all contacts of an individual protein to make effective use of contact occurrence patterns; and (3) integrating both 1D and 2D deep convolutional neural networks to effectively learn complex sequence-structure relationships, including high-order residue correlation. This paper discusses the RaptorX-Contact pipeline, both contact prediction and contact-based folding results, and finally the strengths and weaknesses of our method.
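The pixel-level labeling formulation can be illustrated with a minimal NumPy sketch. It assumes that per-residue (1D) features are turned into pairwise (2D) features by outer concatenation, and that a single per-pixel logistic layer (equivalent to a 1x1 convolution) stands in for the deep 2D residual network. Both are simplifications for illustration, not the RaptorX-Contact implementation; the feature sizes, random weights and function names are hypothetical. The point is that one forward pass assigns a contact probability to every residue pair at once.

```python
import numpy as np

def outer_concat(seq_feat):
    """Turn (L, d) per-residue features into (L, L, 2d) pairwise features.

    Pixel (i, j) holds [feat_i, feat_j] (an assumed, commonly used 1D-to-2D step).
    """
    L, d = seq_feat.shape
    left = np.broadcast_to(seq_feat[:, None, :], (L, L, d))
    right = np.broadcast_to(seq_feat[None, :, :], (L, L, d))
    return np.concatenate([left, right], axis=-1)

def predict_contact_map(seq_feat, pair_feat, w, b):
    """Label every residue pair at once: one contact probability per 'pixel'.

    A per-pixel logistic layer stands in for the deep 2D residual network.
    """
    x = np.concatenate([outer_concat(seq_feat), pair_feat], axis=-1)  # (L, L, C)
    logits = x @ w + b                                                  # (L, L)
    logits = 0.5 * (logits + logits.T)          # contact maps are symmetric
    return 1.0 / (1.0 + np.exp(-logits))

rng = np.random.default_rng(0)
L, d, p = 50, 8, 3                        # length, 1D and 2D feature sizes (hypothetical)
seq_feat = rng.normal(size=(L, d))        # e.g. conservation, predicted SS / solvent accessibility
pair_feat = rng.normal(size=(L, L, p))    # e.g. co-evolution scores, contact potential
w = rng.normal(size=2 * d + p) * 0.1      # random weights; a real model would be trained
probs = predict_contact_map(seq_feat, pair_feat, w, 0.0)
print(probs.shape)  # (50, 50)
```

Replacing the logistic layer with stacked 2D convolutions lets each pixel's label depend on its neighbourhood in the map, which is how contact occurrence patterns can be exploited.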



RSC Advances, 2019, Vol 9 (18), pp. 10326-10339
Author(s): Abbas Khan, Aman Chandra Kaushik, Syed Shujait Ali, Nisar Ahmad, Dong-Qing Wei

Herein, a two-step de novo approach was developed that uses a deep learning method first to predict the targets of piperine and then to predict piperine-like compounds from a small-molecule library.



Author(s): Oleksii Prykhodko, Simon Viet Johansson, Panagiotis-Christos Kotsias, Esben Jannik Bjerrum, Ola Engkvist, ...

Deep learning methods have recently been used to generate novel structures. In the current study, we propose a new deep learning method, LatentGAN, which combines an autoencoder and a generative adversarial neural network for de novo molecule design. We applied the method to structure generation in two scenarios: generating random drug-like compounds and generating target-biased compounds. Our results show that the method works well in both cases: compounds sampled from the trained model largely occupy the same chemical space as the training set, while a substantial fraction of the generated compounds are novel. The distribution of drug-likeness scores for compounds sampled from LatentGAN is also similar to that of the training set.
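Claims such as "a substantial fraction of the generated compounds are novel" are typically quantified with simple set statistics over sampled molecules. The sketch below is not LatentGAN's evaluation code; it assumes that SMILES strings have already been canonicalized (for example with a cheminformatics toolkit) so that string equality means the same molecule, and that `None` marks a sample that failed to decode. The function name and example strings are hypothetical.

```python
def generation_stats(generated, training):
    """Validity, uniqueness and novelty of a batch of sampled molecules.

    generated -- list of canonical SMILES strings (None = failed to decode)
    training  -- iterable of canonical SMILES strings from the training set
    """
    valid = [s for s in generated if s]
    unique = set(valid)
    novel = unique - set(training)          # unique valid molecules absent from training
    n = len(generated)
    return {
        "validity": len(valid) / n if n else 0.0,
        "uniqueness": len(unique) / len(valid) if valid else 0.0,
        "novelty": len(novel) / len(unique) if unique else 0.0,
    }

training = ["CCO", "c1ccccc1", "CC(=O)O"]
generated = ["CCO", "CCN", "CCN", None, "c1ccccc1O"]
print(generation_stats(generated, training))
```

Here 4 of 5 samples are valid, 3 of those 4 are unique, and 2 of the 3 unique molecules (`CCN`, `c1ccccc1O`) do not appear in the training set.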



2019
Author(s): Hao He, Can Liu, Haiguang Liu

Abstract
We present an algorithm based on a deep learning method for model reconstruction from small-angle X-ray scattering (SAXS) data. An auto-encoder for protein 3D models was trained to compress 3D shape information into vectors in a 200-dimensional latent space, and these vectors are optimized using genetic algorithms to build 3D models that are consistent with the scattering data. The algorithm was implemented in Python with the TensorFlow framework and tested with experimental data, demonstrating accurate and robust model reconstruction even without prior information on model size.

Synopsis: A deep learning method based on the auto-encoder framework for model reconstruction from small-angle scattering data.
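The search step (optimizing 200-dimensional latent vectors with a genetic algorithm so that the decoded shape matches the scattering data) can be sketched in NumPy. Several pieces are assumptions for illustration only: a fixed random linear map stands in for the trained decoder, the scattering profile of the decoded bead model is computed with the Debye formula using unit form factors, and the GA operators (truncation selection, uniform crossover, Gaussian mutation, elitism) and the q grid are hypothetical choices, not those of the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
LATENT_DIM, N_BEADS = 200, 30
Q = np.linspace(0.01, 0.5, 40)   # hypothetical q grid (1/angstrom)

# Hypothetical stand-in for the trained decoder: a fixed random linear map
# from a 200-dimensional latent vector to N_BEADS bead coordinates.
DECODER = rng.normal(scale=2.0, size=(LATENT_DIM, N_BEADS * 3))

def decode(z):
    return (z @ DECODER).reshape(N_BEADS, 3)

def debye_profile(coords):
    """Debye formula with unit form factors: I(q) = sum_ij sin(q r_ij) / (q r_ij)."""
    r = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    qr = Q[:, None, None] * r[None, :, :]
    return np.sinc(qr / np.pi).sum(axis=(1, 2))   # np.sinc(x) = sin(pi x) / (pi x)

def misfit(z, target):
    """Mean squared relative deviation between model and target profiles."""
    return float(np.mean(((debye_profile(decode(z)) - target) / target) ** 2))

def genetic_search(target, pop_size=40, n_gen=30, n_elite=4, mut_scale=0.1):
    pop = rng.normal(size=(pop_size, LATENT_DIM))
    history = []
    for _ in range(n_gen):
        scores = np.array([misfit(z, target) for z in pop])
        order = np.argsort(scores)
        pop = pop[order]
        history.append(float(scores[order[0]]))
        parents = pop[: pop_size // 2]                       # truncation selection
        n_child = pop_size - n_elite
        a = parents[rng.integers(len(parents), size=n_child)]
        b = parents[rng.integers(len(parents), size=n_child)]
        mask = rng.random(a.shape) < 0.5                     # uniform crossover
        children = np.where(mask, a, b) + rng.normal(scale=mut_scale, size=a.shape)  # mutation
        pop = np.vstack([pop[:n_elite], children])           # elitism: best survive unchanged
    scores = np.array([misfit(z, target) for z in pop])
    history.append(float(scores.min()))
    return pop[int(np.argmin(scores))], history

target = debye_profile(decode(rng.normal(size=LATENT_DIM)))   # synthetic "measured" profile
z_best, history = genetic_search(target)
print(f"misfit: first generation {history[0]:.3g}, final {history[-1]:.3g}")
```

Searching in the latent space rather than over raw bead coordinates keeps every candidate on the manifold of shapes the auto-encoder has learned; elitism guarantees the best misfit never gets worse between generations.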



2021
Author(s): Ziwei Xie, Jinbo Xu

Motivation: Inter-protein (interfacial) contact prediction is very useful for in silico structural characterization of protein-protein interactions. Although deep learning has been applied to this problem, its accuracy is not as good as that of intra-protein contact prediction.

Results: We propose a new deep learning method, GLINTER (Graph Learning of INTER-protein contacts), for interfacial contact prediction of dimers, leveraging a rotation-invariant representation of protein tertiary structures and a pretrained language model of multiple sequence alignments (MSAs). Tested on the 13th and 14th CASP-CAPRI datasets, GLINTER achieves an average top L/10 precision of 54.35% on the homodimers and 51.56% on all the dimers, much higher than the 30.43% obtained by the latest deep learning method, DeepHomo, on the homodimers and the 14.69% obtained by BIPSPI on all the dimers. Our experiments show that GLINTER-predicted contacts help improve the selection of docking decoys.
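The abstract does not specify GLINTER's rotation-invariant representation, so the sketch below only illustrates the general idea behind graph inputs of this kind: if a residue graph is built from pairwise distances alone (here a k-nearest-neighbour graph with distance edge features), rotating or translating the structure leaves the graph unchanged. The choice of k = 8, the use of one coordinate per residue, and the function names are hypothetical.

```python
import numpy as np

def knn_graph(coords, k=8):
    """k-nearest-neighbour residue graph with distance edge features.

    coords -- (N, 3) residue coordinates (e.g. one atom per residue).
    Returns (edges, feats): edges is an (N*k, 2) array of (i, j) pairs and
    feats the corresponding distances. Only distances are used, so the result
    is invariant to rigid rotations and translations of the structure.
    """
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)                         # exclude self-edges
    nbrs = np.argsort(d, axis=1, kind="stable")[:, :k]
    src = np.repeat(np.arange(len(coords)), k)
    dst = nbrs.ravel()
    return np.stack([src, dst], axis=1), d[src, dst]

def random_rotation(rng):
    """Uniformly random 3x3 rotation matrix via QR decomposition."""
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q *= np.sign(np.diag(r))          # make the factorization unique
    if np.linalg.det(q) < 0:          # flip one axis to get det = +1
        q[:, 0] = -q[:, 0]
    return q

rng = np.random.default_rng(1)
coords = rng.normal(scale=10.0, size=(40, 3))
R = random_rotation(rng)
moved = coords @ R.T + np.array([5.0, -3.0, 2.0])       # rotate, then translate
e1, f1 = knn_graph(coords)
e2, f2 = knn_graph(moved)
print(np.array_equal(e1, e2), np.allclose(f1, f2))
```

A network that only ever sees such invariant features cannot be misled by the arbitrary orientation in which a monomer structure happens to be deposited or predicted.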



2019, Vol 9 (9), pp. 1823
Author(s): Zilong Zhuang, Huichun Lv, Jie Xu, Zizhao Huang, Wei Qin

Real-time monitoring and fault diagnosis of bearings are of great significance for improving production safety, preventing major accidents, and reducing production costs. However, current research faces three primary concerns: real-time performance, effectiveness, and generalization performance. In this paper, a deep learning method based on a stacked residual dilated convolutional neural network (SRDCNN) is proposed for real-time bearing fault diagnosis; it combines dilated convolution, the input gate structure of a long short-term memory (LSTM) network, and a residual network. In the SRDCNN model, dilated convolution is used to exponentially increase the receptive field of the convolution kernel and to extract features from samples with more points, alleviating the influence of randomness. The LSTM input gate structure effectively removes noise and controls the entry of information contained in the input sample. Meanwhile, the residual network is introduced to overcome the vanishing gradients caused by the deeper network structure, thereby improving the overall classification accuracy. The experimental results indicate that, compared with three well-performing existing models, the proposed SRDCNN model has higher denoising ability and better workload adaptability.
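The claim that dilated convolution "exponentially" increases the receptive field follows from a simple count: stacking 1D convolutions of kernel size k with dilations d_1, d_2, ... gives a receptive field of 1 + Σ (k - 1)·d_i, so doubling the dilation at each layer makes the field grow exponentially with depth. The NumPy sketch below checks this and pairs a dilated convolution with a sigmoid gate and an identity shortcut. The gating is only loosely modelled on an LSTM input gate and the block is a hypothetical illustration, not the SRDCNN architecture; kernel sizes, dilations and the random "vibration" signal are assumptions.

```python
import numpy as np

def receptive_field(kernel_size, dilations):
    """Receptive field of stacked 1D convolutions: 1 + sum((k - 1) * d)."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)

def dilated_conv1d(x, w, dilation):
    """Length-preserving 1D dilated convolution (cross-correlation) with zero padding."""
    span = (len(w) - 1) * dilation
    xp = np.pad(x, (span // 2, span - span // 2))
    return sum(w[t] * xp[t * dilation : t * dilation + len(x)] for t in range(len(w)))

def gated_residual_block(x, w_feat, w_gate, dilation):
    """Dilated-conv features scaled by a sigmoid gate (loosely analogous to an
    LSTM input gate), added to an identity shortcut (residual connection)."""
    feat = np.tanh(dilated_conv1d(x, w_feat, dilation))
    gate = 1.0 / (1.0 + np.exp(-dilated_conv1d(x, w_gate, dilation)))
    return x + gate * feat

print(receptive_field(3, [1, 2, 4, 8]), receptive_field(3, [1, 1, 1, 1]))  # 31 9

# Impulse check: with all-ones kernels, one impulse spreads over exactly 31 samples.
y = np.zeros(101)
y[50] = 1.0
for d in [1, 2, 4, 8]:
    y = dilated_conv1d(y, np.ones(3), d)
print(np.count_nonzero(y))  # 31

# Hypothetical usage on a random "vibration" segment of 1024 samples.
rng = np.random.default_rng(0)
signal = rng.normal(size=1024)
for d in [1, 2, 4, 8]:
    signal = gated_residual_block(signal, rng.normal(size=3), rng.normal(size=3), d)
print(signal.shape)  # (1024,)
```

With the same four layers and the same number of weights, dilation covers 31 input samples instead of 9, and the shortcut `x + ...` gives gradients a direct path through the stack.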



2019
Author(s): Jinbo Xu, Sheng Wang

Abstract
This paper reports the CASP13 results of distance-based contact prediction, threading and folding methods implemented in three RaptorX servers, which are built upon the deep convolutional residual neural network (ResNet) method we introduced for contact prediction in CASP12. On the 32 CASP13 FM (free-modeling) targets with a median MSA (multiple sequence alignment) depth of 36, RaptorX yielded the best contact prediction among 46 groups and nearly the best 3D structure modeling among all server groups, without time-consuming conformation sampling. In particular, RaptorX achieved top L/5, L/2 and L long-range contact precision of 70%, 58% and 45%, respectively, and predicted correct folds (TMscore>0.5) for 18 of the 32 targets. Although it underperformed AlphaFold in 3D modeling on average, RaptorX predicted correct folds for all FM targets with >300 residues (T0950-D1, T0969-D1 and T1000-D2) and generated the best 3D models for T0950-D1 and T0969-D1 among all groups. This CASP13 test confirms our previous findings: (1) predicted distance is more useful than contacts for both template-based and free modeling; and (2) structure modeling may be improved by integrating alignment and co-evolutionary information via deep learning. This paper discusses the progress we have made since CASP12, the strengths and weaknesses of our methods, and why deep learning performed much better in CASP13.
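In distance-based contact prediction, the network outputs a probability distribution over discretized distance bins for each residue pair; a contact probability can then be read off by summing the probability mass below the contact cutoff. The sketch below shows that reduction, assuming a hypothetical bin layout ([0, 4), [4, 5), ..., [19, 20), [20, ∞) Å) and an 8 Å cutoff; it is not RaptorX's actual binning.

```python
import numpy as np

# Hypothetical distance bins (angstroms): [0, 4), [4, 5), ..., [19, 20), [20, inf).
BIN_EDGES = np.concatenate([[0.0], np.arange(4.0, 21.0, 1.0), [np.inf]])

def contact_prob_from_distogram(dist_probs, cutoff=8.0):
    """Collapse predicted distance distributions into contact probabilities.

    dist_probs -- (L, L, n_bins) array; dist_probs[i, j] sums to 1 over the bins.
    Returns the (L, L) array P(d_ij < cutoff), summing bins whose upper edge <= cutoff.
    """
    mask = BIN_EDGES[1:] <= cutoff
    return dist_probs[..., mask].sum(axis=-1)

# Hand-made example: pair 0 has all its mass in [7, 8), pair 1 in [8, 9).
n_bins = len(BIN_EDGES) - 1           # 18 bins
hand = np.zeros((1, 2, n_bins))
hand[0, 0, 4] = 1.0                   # bin 4 = [7, 8)
hand[0, 1, 5] = 1.0                   # bin 5 = [8, 9)
print(contact_prob_from_distogram(hand))

# Random distributions for a 4-residue toy protein.
rng = np.random.default_rng(0)
dist_probs = rng.dirichlet(np.ones(n_bins), size=(4, 4))
print(contact_prob_from_distogram(dist_probs).round(2))
```

Keeping the full distribution retains information that a single contact probability discards (for example, whether a non-contact pair sits at 9 Å or 20 Å), which is one way to read the finding that predicted distance is more useful than contacts for modeling.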



2021, Vol 2021, pp. 1-11
Author(s): Minghui Guo, Kangjian Wang, Shunlan Liu, Yongzhao Du, Peizhong Liu, ...

Ultrasound is one of the critical methods for diagnosis and treatment in thyroid examination. In clinical practice, factors such as heavy outpatient traffic, the time-consuming training of sonographers, and the uneven professional level of physicians often cause irregularities during ultrasound examination, leading to misdiagnosis or missed diagnosis. To standardize the thyroid ultrasound examination process, this paper proposes a deep learning method based on a residual network to recognize Thyroid Ultrasound Standard Planes (TUSPs). First, with reference to multiple relevant guidelines and the advice of clinical ultrasound experts, eight TUSPs were defined. A total of 5,500 TUSP images in 8 categories were collected with the approval and review of the Ethics Committee and with the patients' informed consent. Then, after the images were de-identified and padded, an 18-layer residual network model (ResNet-18) was trained for TUSP image recognition, and five-fold cross-validation was performed. Finally, using indicators such as accuracy, we compared its recognition performance with that of other mainstream deep convolutional neural network models. Experimental results showed that ResNet-18 has the best recognition performance on TUSP images, with an average accuracy of 91.07%. The average macro precision, average macro recall, and average macro F1-score are 91.39%, 91.34%, and 91.30%, respectively. These results show that the deep learning method based on a residual network can effectively recognize TUSP images, which is expected to help standardize clinical thyroid ultrasound examination and reduce misdiagnosis and missed diagnosis.
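The macro precision, recall and F1-score reported above are per-class scores averaged with equal weight across the eight plane categories. The abstract does not spell out the computation, so the sketch below assumes the usual definition: each metric is computed from the confusion matrix for every class and then averaged (macro F1 here is the mean of per-class F1 values). The 3-class matrix is a made-up example, and the function assumes every class is both present and predicted at least once.

```python
import numpy as np

def macro_metrics(conf):
    """Macro-averaged precision, recall and F1 from a confusion matrix.

    conf[i, j] = number of images of true class i predicted as class j.
    Assumes every class appears in both rows and columns (no zero sums).
    """
    conf = np.asarray(conf, dtype=float)
    tp = np.diag(conf)
    precision = tp / conf.sum(axis=0)   # column sums = predicted counts per class
    recall = tp / conf.sum(axis=1)      # row sums = true counts per class
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = tp.sum() / conf.sum()
    return accuracy, precision.mean(), recall.mean(), f1.mean()

conf = [[8, 2, 0],
        [1, 9, 0],
        [0, 0, 10]]
acc, p, r, f1 = macro_metrics(conf)
print(f"accuracy={acc:.4f} macro_P={p:.4f} macro_R={r:.4f} macro_F1={f1:.4f}")
```

Under five-fold cross-validation, these values would be computed on each held-out fold and then averaged, which matches the "average macro" wording in the abstract.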


