Evaluating DCA-based method performances for RNA contact prediction by a well-curated dataset

10.1101/822023 ◽

2019 ◽

Author(s):

F. Pucci ◽

M. Zerihun ◽

E. Peter ◽

A. Schug

Keyword(s):

Rna Structure ◽

Structure Prediction ◽

Sequence Data ◽

Three Dimensional ◽

Spatial Proximity ◽

Dimensional Structure ◽

Rna Sequences ◽

Contact Prediction ◽

Rna Molecules ◽

Quality Of Structure

RNA molecules play many pivotal roles in cellular functioning that are still not fully understood. Any detailed understanding of RNA function requires knowledge of its three-dimensional structure, yet experimental RNA structure resolution remains demanding. Recent advances in sequencing provide unprecedented amounts of sequence data that can be statistically analysed by methods such as Direct Coupling Analysis (DCA) to determine spatial proximity or contacts of specific nucleic acid pairs, which improve the quality of structure prediction. To quantify this structure prediction improvement, we here present a well-curated dataset of about seventy RNA structures with high resolution and compare different nucleotide-nucleotide contact prediction methods available in the literature. We observe only minor differences between the performances of the different methods. Moreover, we discuss how robust these predictions are to different contact definitions and how strongly they depend on the procedures used to curate and align the families of homologous RNA sequences.

RNA 3D modeling with FARFAR2, online

10.1101/2020.11.26.399451 ◽

2020 ◽

Author(s):

Andrew Watkins ◽

Rhiju Das

Keyword(s):

Rna Structure ◽

Structure Prediction ◽

Chemical Shifts ◽

Three Dimensional ◽

Structural Data ◽

Dimensional Structure ◽

Rna Structures ◽

Beet Western Yellows Virus ◽

H Nmr ◽

Sampling Algorithms

Understanding the three-dimensional structure of an RNA molecule is often essential to understanding its function. Sampling algorithms and energy functions for RNA structure prediction are improving, due to the increasing diversity of structural data available for training statistical potentials and testing structural data, along with a steady supply of blind challenges through the RNA Puzzles initiative. The recent FARFAR2 algorithm enables near-native structure predictions on fairly complex RNA structures, including automated selection of final candidate models and estimation of model accuracy. Here, we describe the use of a publicly available webserver for RNA modeling for realistic scenarios using FARFAR2, available at https://rosie.rosettacommons.org/farfar2. We walk through two cases in some detail: a simple model pseudoknot from the frameshifting element of beet western yellows virus modeled using the “basic interface” to the webserver, and a replication of RNA-Puzzle 20, a metagenomic twister sister ribozyme, using the “advanced interface.” We also describe example runs of FARFAR2 modeling including two kinds of experimental data: a c-di-GMP riboswitch modeled with low-resolution restraints from MOHCA-seq experiments and a tandem GA motif modeled with 1H NMR chemical shifts.

Computational modeling of RNA 3D structure based on experimental data

Bioscience Reports ◽

10.1042/bsr20180430 ◽

2019 ◽

Vol 39 (2) ◽

Cited By ~ 13

Author(s):

Almudena Ponce-Salvatierra ◽

Astha ◽

Katarzyna Merdas ◽

Chandran Nithin ◽

Pritha Ghosh ◽

...

Keyword(s):

Experimental Data ◽

Computational Methods ◽

Rna Structure ◽

Structure Prediction ◽

3D Structure ◽

Rna Structures ◽

Data Types ◽

Rna Sequences ◽

Rna Molecules

RNA molecules are master regulators of cells. They are involved in a variety of molecular processes: they transmit genetic information, sense cellular signals and communicate responses, and even catalyze chemical reactions. As in the case of proteins, RNA function is dictated by its structure and by its ability to adopt different conformations, which in turn is encoded in the sequence. Experimental determination of high-resolution RNA structures is both laborious and difficult, and therefore the majority of known RNAs remain structurally uncharacterized. To address this problem, predictive computational methods were developed based on the accumulated knowledge of RNA structures determined so far, the physical basis of RNA folding, and evolutionary considerations, such as conservation of functionally important motifs. However, all theoretical methods suffer from various limitations, and they are generally unable to accurately predict structures for RNA sequences longer than 100 nucleotides unless aided by additional experimental data. In this article, we review experimental methods that can generate data usable by computational methods, as well as computational approaches for RNA structure prediction that can utilize data from experimental analyses. We outline methods and data types that can be potentially useful for RNA 3D structure modeling but are not commonly used by the existing software, suggesting directions for future development.

A COMPARATIVE STUDY OF PROTEIN TERTIARY STRUCTURE PREDICTION METHODS

International Journal of Computer Science and Informatics ◽

10.47893/ijcsi.2014.1168 ◽

2014 ◽

pp. 15-18

Author(s):

CHANDRAYANI N. ROKDE ◽

DR.MANALI KSHIRSAGAR

Keyword(s):

Protein Structure ◽

Structure Prediction ◽

Tertiary Structure ◽

Sequence Data ◽

Protein Structures ◽

Three Dimensional ◽

Data Bank ◽

Dimensional Structure ◽

X Ray Crystallography ◽

Protein Tertiary Structure Prediction

Protein structure prediction (PSP) from amino acid sequence is one of the high-focus problems in bioinformatics today. This is because the biological function of a protein is determined by its three-dimensional structure. Understanding protein structures is vital for determining the function of a protein and its interactions with DNA, RNA and enzymes. Thus, protein structure is a fundamental area of computational biology. Its importance is heightened by the large amounts of sequence data coming from the PDB (Protein Data Bank) and by the fact that experimental methods such as X-ray crystallography or Nuclear Magnetic Resonance (NMR), which are used to determine protein structures, remain very expensive and time consuming. In this paper, different types of protein structures and methods for their prediction are described.

A resource for improved predictions of Trypanosoma and Leishmania protein three-dimensional structure

PLoS ONE ◽

10.1371/journal.pone.0259871 ◽

2021 ◽

Vol 16 (11) ◽

pp. e0259871

Author(s):

Richard John Wheeler

Keyword(s):

Protein Structure ◽

Protein Sequence ◽

Structure Prediction ◽

Sequence Data ◽

Three Dimensional ◽

Model Organisms ◽

Dimensional Structure ◽

Sequence Alignments ◽

High Quality ◽

Multiple Sequence

AlphaFold2 and RoseTTAfold represent a transformative advance for predicting protein structure. They are able to make very high-quality predictions given a high-quality alignment of the protein sequence with related proteins. These predictions are now readily available via the AlphaFold database of predicted structures and AlphaFold or RoseTTAfold Colaboratory notebooks for custom predictions. However, predictions for some species tend to be lower confidence than those for model organisms. Problematic species include Trypanosoma cruzi and Leishmania infantum: important unicellular eukaryotic human parasites in an early-branching eukaryotic lineage. The cause appears to be poor sampling of this branch of life (Discoba) in the protein sequence databases used for the AlphaFold database and ColabFold. Here, by comprehensively gathering openly available protein sequence data for Discoba species, significant improvements to AlphaFold2 protein structure prediction over the AlphaFold database and ColabFold are demonstrated. This is made available as an easy-to-use tool for the parasitology community in the form of Colaboratory notebooks for generating multiple sequence alignments and AlphaFold2 predictions of protein structure for Trypanosoma, Leishmania and related species.

A resource for improved predictions of Trypanosoma and Leishmania protein three-dimensional structure

10.1101/2021.09.02.458674 ◽

2021 ◽

Author(s):

Richard John Wheeler

Keyword(s):

Protein Structure ◽

Protein Sequence ◽

Structure Prediction ◽

Sequence Data ◽

Three Dimensional ◽

Model Organisms ◽

Dimensional Structure ◽

Sequence Alignments ◽

High Quality ◽

Multiple Sequence

AlphaFold2 and RoseTTAfold represent a transformative advance for predicting protein structure. They are able to make very high-quality predictions given a high-quality alignment of the protein sequence with related proteins. These predictions are now readily available via the AlphaFold database of predicted structures and AlphaFold/RoseTTAfold Colaboratory notebooks for custom predictions. However, predictions for some species tend to be lower confidence than those for model organisms. These species include Trypanosoma cruzi and Leishmania infantum: important unicellular eukaryotic human parasites in an early-branching eukaryotic lineage. The cause appears to be poor sampling of this branch of life in the protein sequence databases used for the AlphaFold database and ColabFold. Here, by comprehensively gathering openly available protein sequence data for species from this lineage, significant improvements to AlphaFold2 protein structure prediction over the AlphaFold database and ColabFold are demonstrated. This is made available as an easy-to-use tool for the parasitology community in the form of Colaboratory notebooks for generating multiple sequence alignments and AlphaFold2 predictions of protein structure for Trypanosoma, Leishmania and related species.

An algorithm for template-based prediction of secondary structures of individual RNA sequences

10.1101/171108 ◽

2017 ◽

Author(s):

Josef Pánek ◽

Martin Černý

Keyword(s):

Rna Structure ◽

Structure Prediction ◽

De Novo ◽

Secondary Structures ◽

Rna Structures ◽

Rna Sequences ◽

Biologically Relevant ◽

Rna Molecules ◽

Rna Structure Prediction ◽

Limited Applicability

While understanding the structure of RNA molecules is vital for deciphering their functions, determining RNA structures experimentally is exceptionally hard. At the same time, extant approaches to computational RNA structure prediction have limited applicability and reliability. In this paper we provide a method to solve a simpler yet still biologically relevant problem: prediction of RNA secondary structure using the structure of a different molecule as a template. Our method identifies conserved and unconserved subsequences within an RNA molecule. For conserved subsequences, the template structure is directly transferred into the generated structure and combined with de novo predicted structure for the unconserved subsequences. The method also determines when the generated structure is unreliable. The method is validated using experimentally identified structures. The accuracy of the method exceeds that of classical prediction algorithms and constrained prediction methods. This is demonstrated by comparison using a large number of heterogeneous RNAs. The presented method is fast and robust, and useful for various applications requiring knowledge of secondary structures of individual RNA sequences.

The Annotation of RNA Motifs

Comparative and Functional Genomics ◽

10.1002/cfg.213 ◽

2002 ◽

Vol 3 (6) ◽

pp. 518-524 ◽

Cited By ~ 26

Author(s):

Neocles B. Leontis ◽

Eric Westhof

Keyword(s):

Protein Interactions ◽

Rna Structure ◽

Three Dimensional ◽

Global Structure ◽

Hierarchical Organization ◽

Dimensional Structure ◽

Base Pairs ◽

Rna Motifs ◽

Rna Sequences ◽

Ordered Array

The recent deluge of new RNA structures, including complete atomic-resolution views of both subunits of the ribosome, has on the one hand literally overwhelmed our individual abilities to comprehend the diversity of RNA structure, and on the other hand presented us with new opportunities for comprehensive use of RNA sequences for comparative genetic, evolutionary and phylogenetic studies. Two concepts are key to understanding RNA structure: hierarchical organization of global structure and isostericity of local interactions. Global structure changes extremely slowly, as it relies on conserved long-range tertiary interactions. Tertiary RNA–RNA and quaternary RNA–protein interactions are mediated by RNA motifs, defined as recurrent and ordered arrays of non-Watson–Crick base-pairs. A single RNA motif comprises a family of sequences, all of which can fold into the same three-dimensional structure and can mediate the same interaction(s). The chemistry and geometry of base pairing constrain the evolution of motifs in such a way that random mutations that occur within motifs are accepted or rejected insofar as they can mediate a similar ordered array of interactions. The steps involved in the analysis and annotation of RNA motifs in 3D structures are: (a) decomposition of each motif into non-Watson–Crick base-pairs; (b) geometric classification of each base-pair; (c) identification of isosteric substitutions for each base-pair by comparison to isostericity matrices; (d) alignment of homologous sequences using the isostericity matrices to identify corresponding positions in the crystal structure; (e) acceptance or rejection of the null hypothesis that the motif is conserved.

Faculty Opinions recommendation of RNA-Puzzles: a CASP-like evaluation of RNA three-dimensional structure prediction.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽

10.3410/f.717952764.793458245 ◽

2012 ◽

Author(s):

Douglas Turner

Keyword(s):

Structure Prediction ◽

Three Dimensional ◽

Dimensional Structure ◽

Three Dimensional Structure

RNA-Puzzles: A CASP-like evaluation of RNA three-dimensional structure prediction

RNA ◽

10.1261/rna.031054.111 ◽

2012 ◽

Vol 18 (4) ◽

pp. 610-625 ◽

Cited By ~ 162

Author(s):

J. A. Cruz ◽

M.-F. Blanchet ◽

M. Boniecki ◽

J. M. Bujnicki ◽

S.-J. Chen ◽

...

Keyword(s):

Structure Prediction ◽

Three Dimensional ◽

Dimensional Structure ◽

Three Dimensional Structure

Analysis of structural similarities between brain Thy-1 antigen and immunoglobulin domains. Evidence for an evolutionary relationship and a hypothesis for its functional significance

Biochemical Journal ◽

10.1042/bj1950031 ◽

1981 ◽

Vol 195 (1) ◽

pp. 31-40 ◽

Cited By ~ 73

Author(s):

F E Cohen ◽

J Novotný ◽

M J E Sternberg ◽

D G Campbell ◽

A F Williams

Keyword(s):

Functional Significance ◽

Structure Prediction ◽

Secondary Structure Prediction ◽

Three Dimensional ◽

Evolutionary Relationship ◽

Dimensional Structure ◽

Sequence Homologies ◽

Beta Structure ◽

Beta 2 Microglobulin

The Thy-1 membrane glycoprotein from rat brain is shown to have structural and sequence homologies with immunoglobulin (Ig) domains on the basis of the following evidence. 1. The two disulphide bonds of Thy-1 are both consistent with the Ig-fold. 2. The molecule contains extensive beta-structure as shown by the c.d. spectrum. 3. Secondary structure prediction locates beta-strands along the sequence in a manner consistent with the Ig-fold. 4. On the basis of rules derived from known beta-sheet structures, a three-dimensional structure with the Ig-fold is predicted as favourable for Thy-1. 5. Sequences in the proposed beta-strands of Thy-1 and known beta-strands of Ig domains show significant sequence homology. This homology is statistically more significant than for the comparison of proposed beta-strand sequences of beta 2-microglobulin with Ig domains. An hypothesis is presented for the possible functional significance of an evolutionary relationship between Thy-1 and Ig. It is suggested that both Thy-1 and Ig evolved from primitive molecules, with an Ig fold, which mediated cell–cell interactions. The present-day role of Thy-1 may be similar to that of the primitive domain.