CATHER: a novel threading algorithm with predicted contacts

Zongyang Du; Shuo Pan; Qi Wu; Zhenling Peng; Jianyi Yang

doi:10.1093/bioinformatics/btz876

Bioinformatics ◽

10.1093/bioinformatics/btz876 ◽

2019 ◽

Vol 36 (7) ◽

pp. 2119-2125 ◽

Cited By ~ 1

Author(s):

Zongyang Du ◽

Shuo Pan ◽

Qi Wu ◽

Zhenling Peng ◽

Jianyi Yang

Keyword(s):

Deep Learning ◽

Protein Structure ◽

Structure Prediction ◽

Supplementary Information ◽

Supplementary Data ◽

Contact Map ◽

Test Set ◽

Benchmark Tests ◽

Independent Test ◽

Push Forward

Abstract
Motivation: Threading is one of the most effective methods for protein structure prediction. In recent years, the increasing accuracy of protein contact map prediction has opened a new avenue for improving threading algorithms. Several preliminary studies suggest that predicted contacts can greatly improve the performance of threading algorithms, but there is still much room to explore in making better use of them.
Results: We have developed a new contact-assisted threading algorithm, named CATHER, which uses both conventional sequential profiles and contact maps predicted by a deep learning-based algorithm. Benchmark tests on an independent test set and on the CASP12 targets demonstrated that CATHER significantly improved over methods that use only sequential profiles or only predicted contact maps. In the blind tests of the CASP13 experiment, our method ranked in the top 10 among all 39 participating server groups on the 32 free-modeling targets. These data suggest that using predicted contacts is a promising way to push forward threading algorithms.
Availability and implementation: http://yanglab.nankai.edu.cn/CATHER/.
Supplementary information: Supplementary data are available at Bioinformatics online.
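
The abstract does not give CATHER's scoring function. As a rough illustration of how a predicted contact map can enter a threading score, the sketch below rewards a query-template alignment when predicted query contacts land on template residue pairs that are in contact, and adds that term to a profile-alignment score. The function names, the `weight` parameter and the sequence-separation cutoff `min_sep` are hypothetical choices, not CATHER's actual method.

```python
import numpy as np


def contact_agreement(query_prob, template_map, alignment, min_sep=6):
    """Sum predicted query contact probabilities supported by template contacts.

    query_prob   -- (Lq, Lq) array of predicted contact probabilities for the query
    template_map -- (Lt, Lt) 0/1 array of contacts observed in the template structure
    alignment    -- list of aligned (query_index, template_index) pairs
    min_sep      -- hypothetical cutoff: skip query pairs closer than this in sequence
    """
    q = np.array([a for a, _ in alignment])
    t = np.array([b for _, b in alignment])
    # Restrict both maps to the aligned positions, in alignment order.
    p = query_prob[np.ix_(q, q)]
    c = template_map[np.ix_(t, t)]
    # Count each query pair once (upper triangle) and only if far enough apart in sequence.
    sep = np.abs(q[:, None] - q[None, :])
    mask = np.triu(sep >= min_sep, k=1)
    return float((p * c)[mask].sum())


def threading_score(profile_score, query_prob, template_map, alignment, weight=1.0):
    """Hypothetical combined score: profile term plus weighted contact-agreement term."""
    return profile_score + weight * contact_agreement(query_prob, template_map, alignment)


# Toy example: query contact (0, 8) is reproduced by template contact (1, 9)
# under an alignment that shifts every query residue by one template position.
query_prob = np.zeros((10, 10))
query_prob[0, 8] = query_prob[8, 0] = 0.9
template_map = np.zeros((10, 10), dtype=int)
template_map[1, 9] = template_map[9, 1] = 1
alignment = [(i, i + 1) for i in range(9)]
print(threading_score(2.0, query_prob, template_map, alignment, weight=0.5))
```

Under the identity alignment the same toy template contributes nothing, because the query contact then maps onto the non-contacting template pair (0, 8); this is the kind of signal a contact-assisted scorer can use to prefer one alignment over another.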

Improving prediction of protein secondary structure, backbone angles, solvent accessibility and contact numbers by using predicted contact maps and an ensemble of recurrent and residual convolutional neural networks

Bioinformatics ◽

10.1093/bioinformatics/bty1006 ◽

2018 ◽

Vol 35 (14) ◽

pp. 2403-2410 ◽

Cited By ~ 35

Author(s):

Jack Hanson ◽

Kuldip Paliwal ◽

Thomas Litfin ◽

Yuedong Yang ◽

Yaoqi Zhou

Keyword(s):

Neural Networks ◽

Protein Structure ◽

Secondary Structure ◽

Structure Prediction ◽

Rapid Expansion ◽

Solvent Accessible Surface Area ◽

Supplementary Information ◽

Test Set ◽

Contact Maps ◽

Unseen Data

Abstract
Motivation: Sequence-based prediction of one-dimensional structural properties of proteins has been a long-standing subproblem of protein structure prediction. Recently, prediction accuracy has improved significantly thanks to the rapid expansion of protein sequence and structure libraries and to advances in deep learning techniques such as residual convolutional networks (ResNets) and Long Short-Term Memory cells in Bidirectional Recurrent Neural Networks (LSTM-BRNNs). Here we leverage an ensemble of LSTM-BRNN and ResNet models, together with predicted residue-residue contact maps, to continue the push towards the attainable limit of prediction for 3- and 8-state secondary structure, backbone angles (θ, τ, ϕ and ψ), half-sphere exposure, contact numbers and solvent accessible surface area (ASA).
Results: The new method, named SPOT-1D, achieves similarly high performance on a large validation set and a large test set (≈1000 proteins each), suggesting robust performance on unseen data. On the large test set, it achieves 87% and 77% accuracy in 3- and 8-state secondary structure prediction, and correlation coefficients of 0.82 and 0.86 between predicted and measured ASA and contact numbers, respectively. Comparison with current state-of-the-art techniques reveals substantial improvement in secondary structure and backbone angle prediction. In particular, 44% of 40-residue fragment structures constructed from predicted backbone Cα-based θ and τ angles are within 6 Å root-mean-squared distance of their native conformations, nearly 20% better than the next-best method. The method is expected to be useful for advancing protein structure and function prediction.
Availability and implementation: SPOT-1D and its data are available at http://sparks-lab.org/.
Supplementary information: Supplementary data are available at Bioinformatics online.
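
The accuracy figures quoted above are standard per-residue metrics. A minimal sketch of how they are usually computed: 3- or 8-state accuracy is the fraction of residues whose predicted label matches the reference, and the ASA and contact-number figures are correlation coefficients, assumed here to be Pearson's. The 8-to-3 state reduction (H, G, I to H; E, B to E; everything else to C) is one common DSSP convention and is also an assumption; the abstract does not say which reduction SPOT-1D uses.

```python
import numpy as np

# Assumed DSSP 8-to-3 reduction; other reductions are also in use.
EIGHT_TO_THREE = {"H": "H", "G": "H", "I": "H", "E": "E", "B": "E"}


def to_three_state(ss8):
    """Map an 8-state DSSP string to 3 states; any other symbol becomes coil (C)."""
    return "".join(EIGHT_TO_THREE.get(s, "C") for s in ss8)


def state_accuracy(pred, ref):
    """Fraction of residues whose predicted state equals the reference state."""
    if len(pred) != len(ref) or not ref:
        raise ValueError("pred and ref must be non-empty and of equal length")
    return sum(p == r for p, r in zip(pred, ref)) / len(ref)


def pearson_r(pred, measured):
    """Pearson correlation coefficient, e.g. between predicted and measured ASA."""
    return float(np.corrcoef(np.asarray(pred, float), np.asarray(measured, float))[0, 1])


print(to_three_state("HGIEBTS-"))        # HHHEECCC
print(state_accuracy("HHEEC", "HHECC"))  # 0.8
print(pearson_r([10, 40, 80], [12, 38, 85]))
```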

Deep‐learning contact‐map guided protein structure prediction in CASP13

Proteins Structure Function and Bioinformatics ◽

10.1002/prot.25792 ◽

2019 ◽

Vol 87 (12) ◽

pp. 1149-1164 ◽

Cited By ~ 50

Author(s):

Wei Zheng ◽

Yang Li ◽

Chengxin Zhang ◽

Robin Pearce ◽

S. M. Mortuza ◽

...

Keyword(s):

Deep Learning ◽

Protein Structure ◽

Protein Structure Prediction ◽

Structure Prediction ◽

Contact Map

Protein contact map refinement for improving structure prediction using generative adversarial networks

Bioinformatics ◽

10.1093/bioinformatics/btab220 ◽

2021 ◽

Author(s):

Sai Raghavendra Maddhuri Venkata Subramaniya ◽

Genki Terashi ◽

Aashish Jain ◽

Yuki Kagaya ◽

Daisuke Kihara

Keyword(s):

Protein Structure ◽

Protein Structure Prediction ◽

Structure Prediction ◽

Tertiary Structure ◽

Substantial Improvement ◽

Supplementary Information ◽

Generative Adversarial Networks ◽

Contact Map ◽

Contact Prediction ◽

Adversarial Networks

Abstract
Motivation: Protein structure prediction remains one of the most important problems in computational biology and biophysics. In the past few years, protein residue–residue contact prediction has improved substantially, making it a critical driving force for successful protein structure prediction. Boosting the accuracy of contact predictions has therefore moved to the forefront of protein structure prediction.
Results: We present a novel contact map refinement method, ContactGAN, which uses Generative Adversarial Networks (GANs). When tested on three datasets, including protein structure modeling targets in CASP13 and CASP14, ContactGAN significantly improved over the predictions made by recent contact prediction methods. The improved precision in contact prediction translated into improved accuracy of protein tertiary structure models. However, the observed improvement over trRosetta was relatively small; the reasons for this are discussed. ContactGAN will be a valuable addition to the structure prediction pipeline for achieving an extra gain in contact prediction accuracy.
Availability and implementation: https://github.com/kiharalab/ContactGAN.
Supplementary information: Supplementary data are available at Bioinformatics online.
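
Contact precision of the kind referred to above is conventionally measured on the top-ranked predictions, for example the top L/5 or top L long-range pairs (sequence separation of at least 24 residues) for a protein of length L. The sketch below computes that quantity from a predicted probability map and a native 0/1 contact map; the separation cutoff and the choice of k are common CASP-style conventions assumed here, not details taken from the ContactGAN abstract.

```python
import numpy as np


def top_k_precision(pred, native, k, min_sep=24):
    """Fraction of the k highest-scoring residue pairs that are true contacts.

    pred    -- (L, L) array of predicted contact probabilities
    native  -- (L, L) 0/1 array of contacts in the native structure
    k       -- number of top predictions to evaluate, e.g. L // 5
    min_sep -- minimum sequence separation |i - j| (24 = long-range by common convention)
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    i, j = np.triu_indices(pred.shape[0], k=min_sep)  # pairs with j - i >= min_sep
    if i.size == 0:
        raise ValueError("protein too short for the requested separation")
    order = np.argsort(-pred[i, j], kind="stable")     # highest probability first
    top = order[:k]
    return float(native[i[top], j[top]].mean())


L = 30
pred = np.zeros((L, L))
native = np.zeros((L, L), dtype=int)
pred[0, 29], pred[1, 28] = 0.9, 0.8
native[0, 29] = 1  # only the highest-ranked prediction is a true contact
print(top_k_precision(pred, native, k=2))  # 0.5
```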

Deep learning techniques have significantly impacted protein structure prediction and protein design

Current Opinion in Structural Biology ◽

10.1016/j.sbi.2021.01.007 ◽

2021 ◽

Vol 68 ◽

pp. 194-207

Author(s):

Robin Pearce ◽

Yang Zhang

Keyword(s):

Deep Learning ◽

Protein Structure ◽

Protein Structure Prediction ◽

Protein Design ◽

Structure Prediction ◽

Learning Techniques

Template-based prediction of protein structure with deep learning

BMC Genomics ◽

10.1186/s12864-020-07249-8 ◽

2020 ◽

Vol 21 (S11) ◽

Author(s):

Haicang Zhang ◽

Yufeng Shen

Keyword(s):

Deep Learning ◽

Protein Structure ◽

Structure Prediction ◽

Tertiary Structure ◽

Query Sequence ◽

Dynamic Programming Algorithm ◽

Tertiary Structure Prediction ◽

Protein Tertiary Structure ◽

Protein Threading ◽

Protein Tertiary Structure Prediction

Abstract
Background: Accurate prediction of protein structure is fundamentally important for understanding the biological functions of proteins. Template-based modeling, including protein threading and homology modeling, is a popular approach to protein tertiary structure prediction. However, accurate template-query alignment and template selection remain very challenging, especially for proteins with only distant homologs available.
Results: We propose a new template-based modeling method called ThreaderAI to improve protein tertiary structure prediction. ThreaderAI formulates the task of aligning a query sequence with a template as the classical pixel classification problem in computer vision and naturally applies a deep residual neural network to it. ThreaderAI first uses deep learning to predict a residue-residue aligning probability matrix by integrating sequence profiles, predicted sequential structural features and predicted residue-residue contacts, and then builds the template-query alignment by applying a dynamic programming algorithm to the probability matrix. We evaluated our method on both template-query alignment accuracy and protein threading. Experimental results show that ThreaderAI outperforms the popular template-based modeling methods HHpred and CNFpred and the latest contact-assisted method CEthreader, especially on proteins that do not have close homologs with known structures. In particular, in terms of alignment accuracy measured with TM-score, ThreaderAI outperforms HHpred, CNFpred and CEthreader by 56, 13 and 11%, respectively, on template-query pairs with fold-level similarity from SCOPe. On the CASP13 TBM-hard targets, ThreaderAI outperforms HHpred, CNFpred and CEthreader by 16, 9 and 8%, respectively, in terms of TM-score.
Conclusions: These results demonstrate that, with the help of deep learning, ThreaderAI can significantly improve the accuracy of template-based structure prediction, especially for distant-homology proteins.
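
The abstract describes the final step only at a high level: a dynamic programming algorithm applied to the predicted aligning probability matrix. One simple way to do this, assumed here for illustration rather than taken from ThreaderAI, is a Needleman-Wunsch-style global alignment that maximizes the summed aligning probability, with an optional linear gap penalty `gap` (a hypothetical parameter).

```python
import numpy as np


def align_from_probabilities(prob, gap=0.0):
    """Global alignment maximizing the summed aligning probability.

    prob -- (Lq, Lt) array; prob[i, j] is the predicted probability that query
            residue i is aligned to template residue j
    gap  -- hypothetical linear penalty charged per gap position
    Returns the aligned (query_index, template_index) pairs in order.
    """
    Lq, Lt = prob.shape
    S = np.zeros((Lq + 1, Lt + 1))
    S[1:, 0] = -gap * np.arange(1, Lq + 1)
    S[0, 1:] = -gap * np.arange(1, Lt + 1)
    for i in range(1, Lq + 1):
        for j in range(1, Lt + 1):
            S[i, j] = max(S[i - 1, j - 1] + prob[i - 1, j - 1],  # align query i-1 with template j-1
                          S[i - 1, j] - gap,                     # query i-1 left unaligned
                          S[i, j - 1] - gap)                     # template j-1 left unaligned
    # Trace back from the bottom-right corner, preferring aligned pairs on ties.
    pairs, i, j = [], Lq, Lt
    while i > 0 and j > 0:
        if np.isclose(S[i, j], S[i - 1, j - 1] + prob[i - 1, j - 1]):
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif np.isclose(S[i, j], S[i - 1, j] - gap):
            i -= 1
        else:
            j -= 1
    return pairs[::-1]


prob = np.array([[0.1, 0.9, 0.0],
                 [0.0, 0.1, 0.8]])
print(align_from_probabilities(prob))  # [(0, 1), (1, 2)]
```

With `gap=0` the traceback's tie-breaking can pair residues whose aligning probability is zero; a positive gap penalty or a post-filter on low-probability pairs would be needed in practice.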

Improved protein structure prediction by deep learning irrespective of co-evolution information

Nature Machine Intelligence ◽

10.1038/s42256-021-00348-5 ◽

2021 ◽

Author(s):

Jinbo Xu ◽

Matthew McPartlon ◽

Jin Li

Keyword(s):

Deep Learning ◽

Protein Structure ◽

Protein Structure Prediction ◽

Structure Prediction

Deep learning to predict subtypes of poorly differentiated lung cancer from biopsy whole slide images.

Journal of Clinical Oncology ◽

10.1200/jco.2021.39.15_suppl.8536 ◽

2021 ◽

Vol 39 (15_suppl) ◽

pp. 8536-8536

Author(s):

Gouji Toyokawa ◽

Fahdi Kanavati ◽

Seiya Momosaki ◽

Kengo Tateishi ◽

Hiroaki Takeoka ◽

...

Keyword(s):

Lung Cancer ◽

Deep Learning ◽

Learning Model ◽

Test Set ◽

Cancer Subtypes ◽

Independent Test ◽

Poorly Differentiated ◽

Test Sets ◽

Deep Learning Model ◽

Whole Slide Images

Background: Lung cancer is the leading cause of cancer-related death in many countries, and its prognosis remains unsatisfactory. Because treatment approaches differ substantially by subtype, such as adenocarcinoma (ADC), squamous cell carcinoma (SCC) and small cell lung cancer (SCLC), an accurate histopathological diagnosis is of great importance. However, if the specimen consists solely of poorly differentiated cancer cells, distinguishing between histological subtypes can be difficult. The present study developed a deep learning model to classify lung cancer subtypes from whole slide images (WSIs) of transbronchial lung biopsy (TBLB) specimens, in particular with the aim of using this model to evaluate a challenging test set of indeterminate cases.
Methods: Our deep learning model consisted of two separately trained components: a convolutional neural network tile classifier and a recurrent neural network tile aggregator for the WSI diagnosis. We used a training set of 638 WSIs of TBLB specimens to train the model to classify lung cancer subtypes (ADC, SCC and SCLC) and non-neoplastic lesions. The training set consisted of 593 WSIs whose diagnosis had been determined by pathologists through visual inspection of hematoxylin-eosin (HE) slides and 45 WSIs of indeterminate cases (64 ADCs and 19 SCCs). We then evaluated the model using five independent test sets. For each test set, we computed the area under the receiver operating characteristic (ROC) curve (AUC).
Results: We applied the model to an indeterminate test set of WSIs obtained from TBLB specimens that pathologists had not been able to diagnose conclusively by examining the HE-stained specimens alone. Overall, the model achieved ROC AUCs of 0.993 (confidence interval [CI] 0.971-1.0) and 0.996 (CI 0.981-1.0) for ADC and SCC, respectively. We further evaluated the model using five independent test sets consisting of both TBLB and surgically resected lung specimens (2490 WSIs in total) and obtained highly promising results, with ROC AUCs ranging from 0.94 to 0.99.
Conclusions: In this study, we demonstrated that a deep learning model could be trained to predict lung cancer subtypes in indeterminate TBLB specimens. These highly promising results indicate that, if deployed in clinical practice, a deep learning model capable of aiding pathologists in diagnosing indeterminate cases would be very beneficial, as it would allow a diagnosis to be obtained sooner and reduce the costs of further investigations.
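
The ROC AUC values above summarize how well a classifier's scores separate positive from negative slides. A minimal sketch using the equivalent rank (Mann-Whitney) formulation: AUC is the probability that a randomly chosen positive case scores higher than a randomly chosen negative one, with ties counted as one half. For a multi-class model, a per-subtype AUC (for example ADC versus all other classes) is obtained by treating that subtype as positive; this one-versus-rest framing is an assumption about how the per-subtype AUCs were computed, and the scores below are made-up illustration values.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney U / (n_pos * n_neg)).

    scores -- predicted scores, higher meaning more likely positive
    labels -- 1 for positive cases, 0 for negative cases
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


# Illustrative slide-level scores for one subtype versus the rest.
print(roc_auc([0.9, 0.4, 0.6, 0.1], [1, 1, 0, 0]))  # 0.75
```

The pairwise loop is quadratic in the number of slides; for large test sets a sort-based rank computation gives the same value more efficiently.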

Review of: "ROPIUS0: A deep learning-based protocol for protein structure prediction and model selection and its performance in CASP14"

Qeios ◽

10.32388/nmtocb ◽

2021 ◽

Author(s):

Jianquan Ouyang

Keyword(s):

Deep Learning ◽

Protein Structure ◽

Model Selection ◽

Protein Structure Prediction ◽

Structure Prediction

SPOT‐Fold: Fragment‐Free Protein Structure Prediction Guided by Predicted Backbone Structure and Contact Map

Journal of Computational Chemistry ◽

10.1002/jcc.26132 ◽

2019 ◽

Vol 41 (8) ◽

pp. 745-750 ◽

Cited By ~ 2

Author(s):

Yufeng Cai ◽

Xiongjun Li ◽

Zhe Sun ◽

Yutong Lu ◽

Huiying Zhao ◽

...

Keyword(s):

Protein Structure ◽

Protein Structure Prediction ◽

Structure Prediction ◽

Contact Map ◽

Free Protein ◽

Backbone Structure

Serpentine: a flexible 2D binning method for differential Hi-C analysis

Bioinformatics ◽

10.1093/bioinformatics/btaa249 ◽

2020 ◽

Vol 36 (12) ◽

pp. 3645-3651

Author(s):

Lyam Baudry ◽

Gaël A Millot ◽

Agnes Thierry ◽

Romain Koszul ◽

Vittore F Scolari

Keyword(s):

Deep Sequencing ◽

Low Noise ◽

Supplementary Information ◽

Supplementary Data ◽

Fractal Nature ◽

Contact Map ◽

Signal To Noise ◽

High Quality ◽

Contact Maps ◽

Contact Frequency

Abstract
Motivation: Hi-C contact maps reflect the relative contact frequencies between pairs of genomic loci, quantified through deep sequencing. Differential analyses of these maps enable downstream biological interpretation. However, the multi-fractal nature of the chromatin polymer inside the cellular envelope results in contact frequency values that span several orders of magnitude: contacts between loci pairs separated by large genomic distances are much sparser than those between closer pairs. The same is true for poorly covered regions, such as repeated sequences. Both distant and poorly covered regions translate into low signal-to-noise ratios, and there is no clear consensus on how to address this limitation.
Results: We present Serpentine, a fast, flexible procedure operating on raw data that considers the contacts in each region of a contact map. Binning is performed only where necessary, on noisy regions, preserving informative ones. This results in high-quality, low-noise contact maps that can be conveniently visualized for rigorous comparative analyses.
Availability and implementation: Serpentine is available on the PyPI repository and at https://github.com/koszullab/serpentine; documentation and tutorials are provided at https://serpentine.readthedocs.io/en/latest/.
Supplementary information: Supplementary data are available at Bioinformatics online.
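
The idea of binning only where the signal is too weak can be illustrated with a much simpler scheme than Serpentine's own: tile the contact map into fixed square blocks and replace a block by its mean only when its total count falls below a threshold, leaving well-covered blocks at full resolution. The function name, the fixed block size and `threshold` are hypothetical, and this fixed-grid averaging is a stand-in for illustration, not the Serpentine algorithm.

```python
import numpy as np


def threshold_bin(counts, threshold, block=2):
    """Average low-coverage tiles of a contact map, keeping high-coverage tiles intact.

    counts    -- 2D array of raw contact counts
    threshold -- tiles whose summed count is below this value are averaged
    block     -- tile edge length in bins (hypothetical fixed grid)
    """
    out = counts.astype(float)  # astype returns a new array, so counts is untouched
    n_rows, n_cols = counts.shape
    for r in range(0, n_rows, block):
        for c in range(0, n_cols, block):
            tile = counts[r:r + block, c:c + block]
            if tile.sum() < threshold:
                # Filling the tile with its mean keeps the tile's total count unchanged.
                out[r:r + block, c:c + block] = tile.mean()
    return out


counts = np.array([[10, 12, 0, 1],
                   [11, 9, 1, 0],
                   [0, 1, 8, 9],
                   [2, 0, 7, 10]])
print(threshold_bin(counts, threshold=5))
```

Here the two sparse off-diagonal tiles are smoothed to 0.5 and 0.75 while the two well-covered diagonal tiles keep their original values.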
