Characterization of Non-Trivial Neighborhood Fold Constraints from Protein Sequences using Generalized Topohydrophobicity.

Bioinformatics and Biology Insights, Vol 2, pp. BBI.S426 (2008)
DOI: 10.4137/bbi.s426
Cited By ~ 2

Author(s): Guillaume Fourty, Isabelle Callebaut, Jean-Paul Mornon

Keyword(s): Secondary Structure, Solvent Accessibility, Protein Structures, Comparative Modeling, Local Geometry, Large Set, Structural Constraints, Sequence Alignments, Multiple Sequence, Multiple Sequence Alignments

Prediction of key features of protein structures, such as secondary structure, solvent accessibility and the number of contacts between residues, provides useful structural constraints for comparative modeling, fold recognition, ab initio fold prediction and the detection of remote relationships. In this study, we aim to characterize the number of non-trivial close neighbors, or long-range contacts, of a residue as a function of two things: its “topohydrophobic” index, deduced from multiple sequence alignments, and the secondary structure in which it is embedded. The “topohydrophobic” index is calculated using a two-class distribution of amino acids based on their mean atom depths. From a large set of structural alignments processed from the FSSP database, we selected 1485 structural sub-families that include at least 8 members, have accurate alignments and show limited redundancy. We show that residues within helices, even when deeply buried, have few non-trivial neighbors (0–2). In contrast, β-strand residues clearly exhibit multimodal behavior dominated by the local geometry of the tetrahedron (3 non-trivial close neighbors associated with one tetrahedron; 6 with two tetrahedra). This behavior makes it possible to distinguish edge from central β-strands within β-sheets using sequence profiles. The approach can therefore derive useful topological constraints on the immediate neighborhood of an amino acid, and on its correlated solvent accessibility, from multiple sequence alignments alone.
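
As an illustration only, the two quantities in this abstract can be sketched in Python under three assumptions. First, the "deep" class of the two-class split is taken to be V, I, L, M, F, Y, W, whereas the paper derives its classes from mean atom depths. Second, the index of an alignment column is simply the fraction of its non-gap residues that fall in that class. Third, a neighbor counts as "non-trivial" when it lies within a distance cutoff and is more than `min_sep` positions away in sequence. `DEEP_CLASS`, `cutoff` and `min_sep` are hypothetical choices, not the authors' parameters.

```python
import math
from typing import Sequence, Tuple

# ASSUMPTION: hypothetical "deep" class for the two-class split; the paper
# defines its classes from mean atom depths, which this set may not match.
DEEP_CLASS = frozenset("VILMFYW")
GAP_CHARS = frozenset("-.")

Point = Tuple[float, float, float]


def topohydrophobic_index(column: str) -> float:
    """Fraction of non-gap residues in one alignment column that belong to DEEP_CLASS."""
    residues = [aa for aa in column.upper() if aa not in GAP_CHARS]
    if not residues:
        return 0.0
    return sum(aa in DEEP_CLASS for aa in residues) / len(residues)


def non_trivial_neighbors(coords: Sequence[Point], i: int,
                          cutoff: float = 7.0, min_sep: int = 3) -> int:
    """Count residues j within `cutoff` (Angstrom) of residue i with |i - j| > min_sep.

    `coords` holds one representative point per residue (e.g. C-beta); the
    cutoff and sequence-separation values are illustrative assumptions.
    """
    return sum(
        1
        for j, xyz in enumerate(coords)
        if abs(i - j) > min_sep and math.dist(coords[i], xyz) < cutoff
    )


if __name__ == "__main__":
    print(topohydrophobic_index("LIVA"))  # 3 of 4 residues in DEEP_CLASS -> 0.75
    coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0),
              (11.4, 0.0, 0.0), (0.0, 5.0, 0.0)]
    print(non_trivial_neighbors(coords, 0))  # only residue 4 qualifies -> 1
```

Grouping such per-residue counts by secondary-structure state and by index value would give the kind of distributions the abstract describes. That grouping step is not shown.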

SPOT-1D-LM: Reaching Alignment-profile-based Accuracy in Predicting Protein Secondary and Tertiary Structural Properties without Alignment.

DOI: 10.1101/2021.10.16.464622 (2021)

Author(s): Jaspreet Singh, Kuldip Paliwal, Jaswinder Singh, Yaoqi Zhou

Keyword(s): Structural Properties, Solvent Accessibility, Protein Structures, Language Models, Sequence Information, Sequence Alignments, Multiple Sequence, Multiple Sequence Alignments, Structural And Functional Properties, Sequence Profiles

Protein language models have emerged as an alternative to multiple sequence alignments for enriching sequence information and improving downstream prediction of biophysical, structural and functional properties. Here we show that combining traditional one-hot encoding with the embeddings from two different language models (ProtTrans and ESM-1b) yields a leap in accuracy over single-sequence-based techniques for predicting 1D secondary and tertiary structural properties of proteins, including backbone torsion angles, solvent accessibility and contact numbers. This large improvement brings accuracy comparable to or better than that of current state-of-the-art techniques, which predict these 1D structural properties from sequence profiles generated from multiple sequence alignments. The high accuracy for both secondary and tertiary structural properties indicates that protein structures can be predicted highly accurately without homologous sequences, which is the remaining obstacle in the post-AlphaFold2 era.
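
The feature combination described above can be pictured as a per-residue concatenation. The sketch assumes that the ProtTrans and ESM-1b embeddings have already been computed elsewhere, with one vector per residue and special tokens removed. It does not call either model, and `combine_features` is a hypothetical helper rather than SPOT-1D-LM's code. With a 1024-dimensional ProtTrans embedding (as assumed here, e.g. ProtT5-XL) and the 1280-dimensional ESM-1b embedding, each residue gets 20 + 1024 + 1280 = 2324 features.

```python
from typing import List, Sequence

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
AA_INDEX = {aa: k for k, aa in enumerate(AMINO_ACIDS)}


def one_hot(seq: str) -> List[List[float]]:
    """20-dim one-hot vector per residue; non-standard residues become all zeros."""
    rows = []
    for aa in seq.upper():
        row = [0.0] * len(AMINO_ACIDS)
        if aa in AA_INDEX:
            row[AA_INDEX[aa]] = 1.0
        rows.append(row)
    return rows


def combine_features(seq: str,
                     prottrans_emb: Sequence[Sequence[float]],
                     esm_emb: Sequence[Sequence[float]]) -> List[List[float]]:
    """Concatenate one-hot, ProtTrans and ESM-1b features residue by residue.

    Both embeddings must already be computed, with exactly one row per residue.
    """
    if not (len(seq) == len(prottrans_emb) == len(esm_emb)):
        raise ValueError("embeddings must have one row per residue")
    return [oh + list(pt) + list(es)
            for oh, pt, es in zip(one_hot(seq), prottrans_emb, esm_emb)]


if __name__ == "__main__":
    seq = "ACD"
    fake_prottrans = [[0.1] * 1024 for _ in seq]  # placeholder values, not real embeddings
    fake_esm = [[0.2] * 1280 for _ in seq]        # placeholder values, not real embeddings
    features = combine_features(seq, fake_prottrans, fake_esm)
    print(len(features), len(features[0]))  # 3 residues, 2324 features each
```

In practice an array library would replace the nested lists, and a downstream network (not shown) would consume the concatenated rows.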

Refined template selection and combination algorithm significantly improves template-based modeling accuracy

Journal of Bioinformatics and Computational Biology, Vol 17 (02), pp. 1950006 (2019)
DOI: 10.1142/s0219720019500069
Cited By ~ 4

Author(s): Ashish Runthala, Shibasish Chowdhury

Keyword(s): Structural Information, Protein Structures, Comparative Modeling, Target Sequence, Sequence Alignments, Multiple Sequence, Multiple Sequence Alignments, Modeling Accuracy, Model Protein, Rank And Select

In contrast to ab initio protein modeling methodologies, comparative modeling is considered the most popular and reliable approach to modeling protein structure. However, selecting the best set of templates is still a major challenge. We develop an effective template-ranking algorithm that selects only the reliable hits for predicting protein structures. The algorithm uses both pairwise and multiple sequence alignments of template hits to rank and select the best possible set of templates. It captures several key sequence and structural features of the template hits and converts them into scores that rank them. The selected set of templates is then used to model a target. The modeling accuracy of the algorithm is tested on CASP8, CASP9 and CASP10 targets containing TBM-HA domains. On average, this template ranking and selection algorithm improves GDT-TS, GDT-HA and TM-score by 3.531, 4.814 and 0.022, respectively. Further, including structurally similar templates with ample conformational diversity is shown to be crucial: it lets the modeling algorithm span the target sequence maximally and reliably and construct a near-native model. Optimal model sampling is also key to predicting the best possible target structure.
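
GDT-TS and GDT-HA, the measures reported above, average the fraction of target residues whose Cα atoms lie within a set of distance cutoffs. GDT-TS uses 1, 2, 4 and 8 Å, and GDT-HA uses 0.5, 1, 2 and 4 Å. The sketch below assumes that per-residue Cα deviations from one fixed model-to-target superposition are already available. The standard LGA-based evaluation instead searches over many superpositions and keeps the best fraction for each cutoff, so this simplified version will generally give a lower value. The function names are illustrative.

```python
from typing import Optional, Sequence

GDT_TS_CUTOFFS = (1.0, 2.0, 4.0, 8.0)  # Angstrom
GDT_HA_CUTOFFS = (0.5, 1.0, 2.0, 4.0)  # Angstrom


def gdt(distances: Sequence[float], cutoffs: Sequence[float],
        n_target: Optional[int] = None) -> float:
    """Mean percentage of residues within each cutoff, for ONE fixed superposition.

    `distances` are per-residue CA deviations (Angstrom) of the modeled residues;
    `n_target` is the full target length, so unmodeled residues count as misses.
    """
    n = n_target if n_target is not None else len(distances)
    if n == 0:
        return 0.0
    fractions = [sum(d <= c for d in distances) / n for c in cutoffs]
    return 100.0 * sum(fractions) / len(cutoffs)


def gdt_ts(distances: Sequence[float], n_target: Optional[int] = None) -> float:
    return gdt(distances, GDT_TS_CUTOFFS, n_target)


def gdt_ha(distances: Sequence[float], n_target: Optional[int] = None) -> float:
    return gdt(distances, GDT_HA_CUTOFFS, n_target)


if __name__ == "__main__":
    devs = [0.4, 0.9, 1.5, 3.0, 9.0]
    print(round(gdt_ts(devs), 2), round(gdt_ha(devs), 2))  # 65.0 50.0
```

Because GDT-HA uses tighter cutoffs, the same model always scores at most as high on GDT-HA as on GDT-TS.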

Direct RNA motif definition and identification from multiple sequence alignments using secondary structure profiles
Edited by J. Doudna

Journal of Molecular Biology, Vol 313 (5), pp. 1003-1011 (2001)
DOI: 10.1006/jmbi.2001.5102
Cited By ~ 169

Author(s): Daniel Gautheret, André Lambert

Keyword(s): Secondary Structure, Sequence Alignments, Multiple Sequence, Multiple Sequence Alignments, RNA Motif

QuanTest2: benchmarking multiple sequence alignments using secondary structure prediction

Bioinformatics (2019)
DOI: 10.1093/bioinformatics/btz552
Cited By ~ 3

Author(s): Fabian Sievers, Desmond G Higgins

Keyword(s): Secondary Structure, Structure Prediction, Secondary Structure Prediction, Reference Sequence, Supplementary Information, Sequence Alignments, Multiple Sequence, Multiple Sequence Alignments, Reference Sequences, Selection Of

Motivation: Secondary structure prediction accuracy (SSPA) in the QuanTest benchmark can be used to measure the accuracy of a multiple sequence alignment. SSPA correlates well with the sum-of-pairs (SP) score when results are averaged over many alignments, but not on an alignment-by-alignment basis. This is due to a sub-optimal selection of reference and non-reference sequences in QuanTest.

Results: We develop an improved strategy for selecting reference and non-reference sequences for a new benchmark, QuanTest2. In QuanTest2, SSPA and SP correlate better on an alignment-by-alignment basis than in QuanTest. Guide trees for QuanTest2 are more balanced with respect to reference sequences than in QuanTest. QuanTest2 scores correlate well with other well-established benchmarks.

Availability and implementation: QuanTest2 is available at http://bioinf.ucd.ie/quantest2.tar and comprises reference and non-reference sequence sets and a scoring script.

Supplementary information: Supplementary data are available at Bioinformatics online.
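
The sum-of-pairs (SP) score mentioned above is commonly defined as the fraction of residue pairs aligned in the reference alignment that are also aligned in the test alignment. The sketch below follows that definition under three assumptions: both alignments contain the same sequences in the same order, gaps are written as `-`, and all rows have equal length. It only illustrates SP and is not part of QuanTest2's scoring script. The helper names are hypothetical.

```python
from itertools import combinations
from typing import Sequence, Set, Tuple

Pair = Tuple[int, int, int, int]  # (seq_a, residue_a, seq_b, residue_b), seq_a < seq_b


def aligned_pairs(alignment: Sequence[str]) -> Set[Pair]:
    """All residue pairs placed in the same column, indexed by ungapped position."""
    next_residue = [0] * len(alignment)
    pairs: Set[Pair] = set()
    for col in range(len(alignment[0])):
        present = []
        for s, row in enumerate(alignment):
            if row[col] != "-":
                present.append((s, next_residue[s]))
                next_residue[s] += 1
        # `present` is ordered by sequence index, so s1 < s2 in every pair.
        for (s1, r1), (s2, r2) in combinations(present, 2):
            pairs.add((s1, r1, s2, r2))
    return pairs


def sum_of_pairs(test: Sequence[str], reference: Sequence[str]) -> float:
    """Fraction of reference-aligned residue pairs reproduced by the test alignment."""
    ref_pairs = aligned_pairs(reference)
    if not ref_pairs:
        return 0.0
    return len(ref_pairs & aligned_pairs(test)) / len(ref_pairs)


if __name__ == "__main__":
    reference = ["AC-D", "A-CD"]
    test = ["AC-D", "-ACD"]
    print(sum_of_pairs(test, reference))  # 0.5: only the D/D pair is recovered
```

Per-alignment SP values like this one are what the abstract compares against SSPA when it discusses alignment-by-alignment correlation.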

MSARI: Multiple sequence alignments for statistical detection of RNA secondary structure

Proceedings of the National Academy of Sciences, Vol 101 (33), pp. 12102-12107 (2004)
DOI: 10.1073/pnas.0404193101
Cited By ~ 51

Author(s): A. Coventry, D. J. Kleitman, B. Berger

Keyword(s): Secondary Structure, RNA Secondary Structure, Sequence Alignments, Multiple Sequence, Multiple Sequence Alignments, Statistical Detection

Fold and flexibility: what can proteins' mechanical properties tell us about their folding nucleus?

Journal of The Royal Society Interface, Vol 12 (112), pp. 20150876 (2015)
DOI: 10.1098/rsif.2015.0876
Cited By ~ 15

Author(s): Sophie Sacquin-Mora

Keyword(s): Mechanical Properties, Brownian Dynamics, Protein Structures, Interaction Network, Sequence Alignments, Multiple Sequence, Multiple Sequence Alignments, Folding Nucleus, Dynamics Simulations, Brownian Dynamics Simulations

The determination of a protein's folding nucleus, i.e. the set of native contacts that play an important role during its folding process, remains an elusive yet essential problem in biochemistry. In this work, we use coarse-grain Brownian dynamics simulations to investigate the mechanical properties of 70 protein structures belonging to 14 protein families with various folds. The resulting rigidity profiles, combined with multiple sequence alignments, show that a limited set of rigid residues, which we call the consensus nucleus, occupy conserved positions along the protein sequence. The side chains of these residues form a tight interaction network within the protein's core, which makes our consensus nuclei potential folding nuclei. A review of the experimental and theoretical literature shows that most (over 80%) of these residues were indeed identified as folding nucleus members in earlier studies.
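
One simple way to picture how rigidity profiles and an alignment could be combined is to map each protein's rigid residues onto alignment columns and keep the columns where enough family members are rigid. This is a hypothetical sketch, not the study's procedure. Two parts are assumptions: how rigid residues are picked from the Brownian dynamics profiles (they are simply given as input here) and the `min_fraction` threshold.

```python
from typing import Dict, List, Sequence, Set


def residue_to_column(aligned_seq: str) -> Dict[int, int]:
    """Map each 0-based residue index of the ungapped sequence to its alignment column."""
    mapping: Dict[int, int] = {}
    residue = 0
    for col, ch in enumerate(aligned_seq):
        if ch != "-":
            mapping[residue] = col
            residue += 1
    return mapping


def consensus_columns(alignment: Sequence[str], rigid: Sequence[Set[int]],
                      min_fraction: float = 0.5) -> List[int]:
    """Columns where at least `min_fraction` of the family members have a rigid residue.

    `rigid[k]` holds the (hypothetically pre-selected) rigid residue indices of
    the k-th aligned protein; `min_fraction` is an illustrative assumption.
    """
    counts: Dict[int, int] = {}
    for aligned_seq, rigid_residues in zip(alignment, rigid):
        to_col = residue_to_column(aligned_seq)
        for r in rigid_residues:
            col = to_col[r]
            counts[col] = counts.get(col, 0) + 1
    threshold = min_fraction * len(alignment)
    return sorted(col for col, n in counts.items() if n >= threshold)


if __name__ == "__main__":
    family = ["AC-DE", "ACWDE", "-CWDE"]
    rigid_sets = [{1, 2}, {1, 3}, {0, 1}]
    print(consensus_columns(family, rigid_sets))                    # [1, 3]
    print(consensus_columns(family, rigid_sets, min_fraction=1.0))  # [1]
```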

Prediction of Protein Secondary Structure by Combining Nearest-neighbor Algorithms and Multiple Sequence Alignments

Journal of Molecular Biology, Vol 247 (1), pp. 11-15 (1995)
DOI: 10.1006/jmbi.1994.0116
Cited By ~ 191

Author(s): Asaf A. Salamov, Victor V. Solovyev

Keyword(s): Secondary Structure, Nearest Neighbor, Protein Secondary Structure, Sequence Alignments, Multiple Sequence, Multiple Sequence Alignments

RNA inter-nucleotide 3D closeness prediction by deep residual neural networks

Bioinformatics (2020)
DOI: 10.1093/bioinformatics/btaa932

Author(s): Saisai Sun, Wenkai Wang, Zhenling Peng, Jianyi Yang

Keyword(s): Neural Networks, Secondary Structure, RNA Structure, Supplementary Information, Sequence Alignments, Multiple Sequence, Guide RNA, Multiple Sequence Alignments, Contact Distance, Distance Restraints

Motivation: In recent years, deep neural networks have been shown to predict inter-residue contacts and distances in proteins accurately, which significantly improves the accuracy of predicted protein structure models. In contrast, fewer studies have addressed the prediction of RNA inter-nucleotide 3D closeness.

Results: We propose a new algorithm, RNAcontact, for the prediction of RNA inter-nucleotide 3D closeness. RNAcontact is built on deep residual neural networks and uses two input features: covariance information from multiple sequence alignments and the predicted secondary structure. Experiments on an independent test set show that RNAcontact achieves precisions of 0.8 and 0.6 for the top L/10 and top L predictions, respectively (where L is the length of the RNA), significantly higher than other evolutionary coupling methods. Analysis shows that about 1/3 of the correctly predicted 3D closenesses are not secondary-structure base pairs, and such contacts are critical to determining RNA structure. In addition, we demonstrate that the predicted 3D closeness can serve as distance restraints to guide RNA structure folding with the 3dRNA package. Models built using the predicted 3D closeness were more accurate than models built without it.

Availability and implementation: The webserver and a standalone package are available at http://yanglab.nankai.edu.cn/RNAcontact/.

Supplementary information: Supplementary data are available at Bioinformatics online.
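
The top-L/10 and top-L precisions quoted above come from three steps: rank all predicted nucleotide pairs by score, keep the best L/k, and report the fraction that are true contacts. The sketch below stores pairs as `(i, j)` with `i < j` and imposes no minimum sequence separation unless `min_sep` is raised. This is an assumption: the separation filter used in RNAcontact's evaluation is not stated in the abstract. The function name is hypothetical.

```python
from typing import Dict, Set, Tuple

Pair = Tuple[int, int]  # (i, j) with i < j, 0-based nucleotide indices


def top_l_precision(scores: Dict[Pair, float], true_contacts: Set[Pair],
                    length: int, divisor: int = 1, min_sep: int = 1) -> float:
    """Precision of the top (length // divisor) scored pairs, e.g. divisor=10 for top L/10.

    Pairs with j - i < min_sep are ignored; if fewer pairs than requested are
    available, precision is computed over the pairs that exist.
    """
    n_top = max(1, length // divisor)
    candidates = sorted(
        (pair for pair in scores if pair[1] - pair[0] >= min_sep),
        key=lambda pair: scores[pair],
        reverse=True,
    )
    top = candidates[:n_top]
    if not top:
        return 0.0
    return sum(pair in true_contacts for pair in top) / len(top)


if __name__ == "__main__":
    scores = {(0, 5): 0.9, (1, 6): 0.8, (0, 9): 0.7, (2, 7): 0.3}
    contacts = {(0, 5), (2, 7)}
    print(top_l_precision(scores, contacts, length=10, divisor=10))  # top 1 -> 1.0
    print(top_l_precision(scores, contacts, length=10, divisor=5))   # top 2 -> 0.5
```

Dividing by the number of pairs actually returned, rather than by L/k, is one of several conventions. It only matters when a predictor outputs fewer pairs than requested.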

Computational Methods for Protein Secondary Structure Prediction Using Multiple Sequence Alignments

Current Protein and Peptide Science, Vol 1 (3), pp. 273-301 (2000)
DOI: 10.2174/1389203003381324
Cited By ~ 21

Author(s): Jaap Heringa

Keyword(s): Secondary Structure, Computational Methods, Structure Prediction, Secondary Structure Prediction, Protein Secondary Structure, Protein Secondary Structure Prediction, Sequence Alignments, Multiple Sequence, Multiple Sequence Alignments

Faculty Opinions recommendation of QuanTest2: benchmarking multiple sequence alignments using secondary structure prediction.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature (2020)
DOI: 10.3410/f.736183723.793577501

Author(s): Janusz Bujnicki, Pritha Ghosh

Keyword(s): Secondary Structure, Structure Prediction, Secondary Structure Prediction, Sequence Alignments, Multiple Sequence, Multiple Sequence Alignments
