RNA 3D structure prediction using multiple sequence alignment information

RNA 3D structure prediction guided by independent folding of homologous sequences

BMC Bioinformatics ◽

10.1186/s12859-019-3120-y ◽

2019 ◽

Vol 20 (1) ◽

Cited By ~ 5

Author(s):

Marcin Magnus ◽

Kalli Kappel ◽

Rhiju Das ◽

Janusz M. Bujnicki

Keyword(s):

Rna Structure ◽

Structure Prediction ◽

Tertiary Structure ◽

3D Structure ◽

3D Models ◽

Target Sequence ◽

Rna Sequences ◽

Homologous Sequences ◽

3D Structure Prediction ◽

Folding Simulations

Abstract Background The understanding of the importance of RNA has changed dramatically over recent years. As in the case of proteins, the function of an RNA molecule is encoded in its tertiary structure, which in turn is determined by the molecule’s sequence. The prediction of tertiary structures of complex RNAs is still a challenging task. Results Using the observation that RNA sequences from the same RNA family fold into a conserved structure, we test herein whether parallel modeling of RNA homologs can improve ab initio RNA structure prediction. EvoClustRNA is a multi-step modeling process in which homologous sequences for the target sequence are selected using the Rfam database. Subsequently, independent folding simulations using Rosetta FARFAR and SimRNA are carried out. The model of the target sequence is selected based on the most common structural arrangement of the common helical fragments. As a test, on two blind RNA-Puzzles challenges, EvoClustRNA predictions ranked first among all submissions for the L-glutamine riboswitch and second for the ZMP riboswitch. Moreover, through a benchmark of known structures, we discovered several cases in which particular homologs were unusually amenable to structure recovery in folding simulations compared to the single original target sequence. Conclusion This work, for the first time to our knowledge, demonstrates the importance of the selection of the target sequence from an alignment of an RNA family for the success of RNA 3D structure prediction. These observations prompt investigations into a new direction of research for checking 3D structure “foldability” or “predictability” of related RNA sequences to obtain accurate predictions. To support new research in this area, we provide all relevant scripts in a documented and ready-to-use form.
By exploring new ideas and identifying limitations of the current RNA 3D structure prediction methods, this work brings us closer to near-native computational RNA 3D models.
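
The selection step described in the abstract (choosing the model that represents the most common structural arrangement) can be illustrated with a small sketch. This is not the authors' EvoClustRNA code: the function name `pick_consensus_model`, the toy RMSD matrix, and the 5.0 Å cutoff are all assumptions made for illustration, and the sketch presumes that pairwise RMSD values over the shared helical fragments have already been computed for the pooled models.

```python
def pick_consensus_model(rmsd, cutoff):
    """Return (index, neighbour_count) of the model with the most neighbours.

    rmsd   -- square, symmetric matrix of pairwise distances between models
    cutoff -- two models count as neighbours if their distance is <= cutoff
    """
    best_index, best_count = -1, -1
    for i, row in enumerate(rmsd):
        # Count every other model that lies within the cutoff of model i.
        count = sum(1 for j, d in enumerate(row) if j != i and d <= cutoff)
        if count > best_count:
            best_index, best_count = i, count
    return best_index, best_count


# Hypothetical distances (in angstroms) between five pooled models.
rmsd = [
    [0.0, 2.0, 6.0, 9.0, 8.0],
    [2.0, 0.0, 2.5, 9.5, 8.5],
    [6.0, 2.5, 0.0, 10.0, 9.0],
    [9.0, 9.5, 10.0, 0.0, 4.0],
    [8.0, 8.5, 9.0, 4.0, 0.0],
]

index, neighbours = pick_consensus_model(rmsd, cutoff=5.0)
print(index, neighbours)
```

In this toy input, model 1 sits within the cutoff of two other models, more than any other model, so it is returned as the representative of the largest group.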

RNA 3D structure prediction guided by independent folding of homologous sequences

10.21203/rs.2.10793/v4 ◽

2019 ◽

Author(s):

Marcin Magnus ◽

Kalli Kappel ◽

Rhiju Das ◽

Janusz Bujnicki

Keyword(s):

Structure Prediction ◽

Tertiary Structure ◽

3D Structure ◽

Prediction Method ◽

3D Models ◽

Target Sequence ◽

Rna Sequences ◽

Homologous Sequences ◽

3D Structure Prediction ◽

Folding Simulations

Abstract Background The understanding of the importance of RNA has changed dramatically over recent years. As in the case of proteins, the function of an RNA molecule is encoded in its tertiary structure, which in turn is determined by the molecule's sequence. The prediction of tertiary structures of complex RNAs is still a challenging task. Results Using the observation that RNA sequences from the same RNA family fold into a conserved structure, we test herein whether parallel modeling of RNA homologs can improve ab initio RNA structure prediction methods. EvoClustRNA is a multi-step modeling process in which homologous sequences for the target sequence are selected using the Rfam database. Subsequently, independent folding simulations using Rosetta FARFAR and SimRNA are carried out. The model of the target sequence is selected based on the most common structural arrangement of the common helical fragments. As a test, on two blind RNA-Puzzles challenges, EvoClustRNA predictions ranked first among all submissions for the L-glutamine riboswitch and second for the ZMP riboswitch. Moreover, through a benchmark of known structures, we discovered several cases in which particular homologs were unusually amenable to structure recovery in folding simulations compared to the single original target sequence. Conclusion This work, for the first time to our knowledge, demonstrates how important the selection of the target sequence from an alignment of an RNA family is for the success of RNA 3D structure prediction. These observations prompt investigations into a new direction of research for checking 3D structure “foldability” or “predictability” of related RNA sequences to obtain accurate predictions. To support new research in this area, we provide all relevant scripts in a documented and ready-to-use form.
By exploring new ideas and identifying limitations of the current RNA 3D structure prediction methods, this work brings us closer to near-native computational RNA 3D models.

RNA 3D structure prediction guided by independent folding of homologous sequences

10.21203/rs.2.10793/v2 ◽

2019 ◽

Author(s):

Marcin Magnus ◽

Kalli Kappel ◽

Rhiju Das ◽

Janusz Bujnicki

Keyword(s):

Structure Prediction ◽

Tertiary Structure ◽

3D Structure ◽

Prediction Method ◽

3D Models ◽

Target Sequence ◽

Rna Sequences ◽

Homologous Sequences ◽

3D Structure Prediction ◽

Folding Simulations

Abstract Background The understanding of the importance of RNA has changed dramatically over recent years. As in the case of proteins, the function of an RNA molecule is encoded in its tertiary structure, which in turn is determined by the molecule's sequence. The prediction of tertiary structures of complex RNAs is still a challenging task. Results Using the observation that RNA sequences from the same RNA family fold into a conserved structure, we test herein whether parallel modeling of RNA homologs can improve ab initio RNA structure prediction methods. EvoClustRNA is a multi-step modeling process in which homologous sequences for the target sequence are selected using the Rfam database. Subsequently, independent folding simulations using Rosetta FARFAR and SimRNA are carried out. The model of the target sequence is selected based on the most common structural arrangement of the common helical fragments. As a test, on two blind RNA-Puzzles challenges, EvoClustRNA predictions ranked first among all submissions for the L-glutamine riboswitch and second for the ZMP riboswitch. Moreover, through a benchmark of known structures, we discovered several cases in which particular homologs were unusually amenable to structure recovery in folding simulations compared to the single original target sequence. Conclusion This work, for the first time to our knowledge, demonstrates how important the selection of the target sequence from an alignment of an RNA family is for the success of RNA 3D structure prediction. These observations prompt investigations into a new direction of research for checking 3D structure “foldability” or “predictability” of related RNA sequences to obtain accurate predictions. To support new research in this area, we provide all relevant scripts in a documented and ready-to-use form.
By exploring new ideas and identifying limitations of the current RNA 3D structure prediction methods, this work brings us closer to near-native computational RNA 3D models.

RNA 3D structure prediction guided by independent folding of homologous sequences

10.21203/rs.2.10793/v3 ◽

2019 ◽

Author(s):

Marcin Magnus ◽

Kalli Kappel ◽

Rhiju Das ◽

Janusz Bujnicki

Keyword(s):

Structure Prediction ◽

Tertiary Structure ◽

3D Structure ◽

Prediction Method ◽

3D Models ◽

Target Sequence ◽

Rna Sequences ◽

Homologous Sequences ◽

3D Structure Prediction ◽

Folding Simulations

Abstract Background The understanding of the importance of RNA has changed dramatically over recent years. As in the case of proteins, the function of an RNA molecule is encoded in its tertiary structure, which in turn is determined by the molecule's sequence. The prediction of tertiary structures of complex RNAs is still a challenging task. Results Using the observation that RNA sequences from the same RNA family fold into a conserved structure, we test herein whether parallel modeling of RNA homologs can improve ab initio RNA structure prediction methods. EvoClustRNA is a multi-step modeling process in which homologous sequences for the target sequence are selected using the Rfam database. Subsequently, independent folding simulations using Rosetta FARFAR and SimRNA are carried out. The model of the target sequence is selected based on the most common structural arrangement of the common helical fragments. As a test, on two blind RNA-Puzzles challenges, EvoClustRNA predictions ranked first among all submissions for the L-glutamine riboswitch and second for the ZMP riboswitch. Moreover, through a benchmark of known structures, we discovered several cases in which particular homologs were unusually amenable to structure recovery in folding simulations compared to the single original target sequence. Conclusion This work, for the first time to our knowledge, demonstrates how important the selection of the target sequence from an alignment of an RNA family is for the success of RNA 3D structure prediction. These observations prompt investigations into a new direction of research for checking 3D structure “foldability” or “predictability” of related RNA sequences to obtain accurate predictions. To support new research in this area, we provide all relevant scripts in a documented and ready-to-use form.
By exploring new ideas and identifying limitations of the current RNA 3D structure prediction methods, this work brings us closer to near-native computational RNA 3D models.

Bioinspired Algorithms in Solving Three-Dimensional Protein Structure Prediction Problems

Bio-Inspired Computing for Information Retrieval Applications - Advances in Knowledge Acquisition, Transfer, and Management ◽

10.4018/978-1-5225-2375-8.ch012 ◽

2017 ◽

pp. 316-337

Author(s):

Raghunath Satpathy

Keyword(s):

Protein Structure ◽

Protein Structure Prediction ◽

Structure Prediction ◽

Tertiary Structure ◽

3D Structure ◽

Prediction Method ◽

Optimization Methods ◽

Point Of View ◽

Living Organisms ◽

Prediction Problems

Proteins play a vital molecular role in all living organisms. Determining protein structure experimentally is difficult, so theoretical prediction methods offer a useful alternative. The 3D structure prediction of proteins is very important in biology because it can lead to the discovery of useful drugs and enzymes, and it is currently considered an important research domain. Predicting a protein's structure means identifying its tertiary structure. From the computational point of view, different models (protein representations) have been developed along with efficient optimization methods to predict protein structure. Bio-inspired computation is used mostly for the optimization step in solving protein structures. These algorithms have recently received considerable interest and attention in the literature. This chapter discusses the key features of five recently developed types of bio-inspired computational algorithms applied to protein structure prediction problems.

Computational modeling of RNA 3D structure based on experimental data

Bioscience Reports ◽

10.1042/bsr20180430 ◽

2019 ◽

Vol 39 (2) ◽

Cited By ~ 13

Author(s):

Almudena Ponce-Salvatierra ◽

Astha ◽

Katarzyna Merdas ◽

Chandran Nithin ◽

Pritha Ghosh ◽

...

Keyword(s):

Experimental Data ◽

Computational Methods ◽

Rna Structure ◽

Structure Prediction ◽

3D Structure ◽

Rna Structures ◽

Data Types ◽

Rna Sequences ◽

Rna Molecules

Abstract RNA molecules are master regulators of cells. They are involved in a variety of molecular processes: they transmit genetic information, sense cellular signals and communicate responses, and even catalyze chemical reactions. As in the case of proteins, RNA function is dictated by its structure and by its ability to adopt different conformations, which in turn is encoded in the sequence. Experimental determination of high-resolution RNA structures is both laborious and difficult, and therefore the majority of known RNAs remain structurally uncharacterized. To address this problem, predictive computational methods were developed based on the accumulated knowledge of RNA structures determined so far and on the physical basis of RNA folding, taking into account evolutionary considerations such as conservation of functionally important motifs. However, all theoretical methods suffer from various limitations, and they are generally unable to accurately predict structures for RNA sequences longer than 100 nucleotides unless aided by additional experimental data. In this article, we review experimental methods that can generate data usable by computational methods, as well as computational approaches for RNA structure prediction that can utilize data from experimental analyses. We outline methods and data types that can be potentially useful for RNA 3D structure modeling but are not commonly used by the existing software, suggesting directions for future development.

Analyzing effect of quadruple multiple sequence alignments on deep learning based protein inter-residue distance prediction

Scientific Reports ◽

10.1038/s41598-021-87204-z ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Aashish Jain ◽

Genki Terashi ◽

Yuki Kagaya ◽

Sai Raghavendra Maddhuri Venkata Subramaniya ◽

Charles Christoffer ◽

...

Keyword(s):

Deep Learning ◽

Structure Prediction ◽

Tertiary Structure ◽

3D Structure ◽

Evolutionary Information ◽

Learning Approaches ◽

Sequence Alignments ◽

Multiple Sequence ◽

Multiple Sequence Alignments ◽

Novel Approach

Abstract Protein 3D structure prediction has advanced significantly in recent years due to improving contact prediction accuracy. This improvement has been largely due to deep learning approaches that predict inter-residue contacts and, more recently, distances using multiple sequence alignments (MSAs). In this work we present AttentiveDist, a novel approach that uses different MSAs generated with different E-values in a single model to increase the co-evolutionary information provided to the model. To determine the importance of each MSA’s feature at the inter-residue level, we added an attention layer to the deep neural network. We show that combining four MSAs of different E-value cutoffs improved the model prediction performance as compared to single E-value MSA features. A further improvement was observed when an attention layer was used, and even more when additional prediction tasks of bond angle predictions were added. The improvements in distance prediction were successfully transferred to achieve better protein tertiary structure modeling.
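
The idea of weighting features from several MSAs with an attention layer can be sketched for a single residue pair. This is only an illustration of softmax attention, not the AttentiveDist architecture: the function names, the toy feature vectors, and the assumption that each MSA contributes one scalar attention score per residue pair are all hypothetical.

```python
import math


def softmax(scores):
    """Convert raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def combine_pair_features(features, scores):
    """Blend per-MSA feature vectors for one residue pair.

    features -- one feature vector per MSA (all the same length)
    scores   -- one raw attention score per MSA
    """
    weights = softmax(scores)
    width = len(features[0])
    return [
        sum(w * vec[k] for w, vec in zip(weights, features))
        for k in range(width)
    ]


# Hypothetical features for one residue pair from four MSAs.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]

# Equal scores give equal weights, so the result is the plain average.
uniform = combine_pair_features(features, [0.0, 0.0, 0.0, 0.0])

# A dominant score makes the output follow the first MSA almost exactly.
peaked = combine_pair_features(features, [100.0, 0.0, 0.0, 0.0])
print(uniform, peaked)
```

In a trained network the scores would be produced by learned layers rather than supplied by hand; the sketch only shows how such scores turn into per-MSA weights.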

Predicting pseudoknotted structures across two RNA sequences

Bioinformatics ◽

10.1093/bioinformatics/bts575 ◽

2012 ◽

Vol 28 (23) ◽

pp. 3058-3065 ◽

Cited By ~ 4

Author(s):

Jana Sperschneider ◽

Amitava Datta ◽

Michael J. Wise

Keyword(s):

Secondary Structure ◽

Rna Structure ◽

Structure Prediction ◽

Secondary Structure Prediction ◽

Prediction Method ◽

Supplementary Information ◽

Rna Structures ◽

Rna Sequences ◽

Test Set ◽

Comparative Structure

Abstract Motivation Laboratory RNA structure determination is demanding and costly, and thus computational structure prediction is an important task. Single-sequence methods for RNA secondary structure prediction are limited by the accuracy of the underlying folding model; if a structure is supported by a family of evolutionarily related sequences, one can be more confident that the prediction is accurate. RNA pseudoknots are functional elements that have highly conserved structures. However, few comparative structure prediction methods can handle pseudoknots due to the computational complexity. Results A comparative pseudoknot prediction method called DotKnot-PW is introduced based on structural comparison of secondary structure elements and H-type pseudoknot candidates. DotKnot-PW outperforms other methods from the literature on a hand-curated test set of RNA structures with experimental support. Availability DotKnot-PW and the RNA structure test set are available at the web site http://dotknot.csse.uwa.edu.au/pw. Contact [email protected] Supplementary information Supplementary data are available at Bioinformatics online.

RNANet: an automatically built dual-source dataset integrating homologous sequences and RNA structures

Bioinformatics ◽

10.1093/bioinformatics/btaa944 ◽

2020 ◽

Author(s):

Louis Becquey ◽

Eric Angel ◽

Fariza Tahi

Keyword(s):

Machine Learning ◽

Secondary Structure ◽

Rna Structure ◽

Structure Prediction ◽

Fundamental Problem ◽

3D Structure ◽

Data Gathering ◽

Supplementary Information ◽

Rna Sequences ◽

Scoring Matrices

Abstract Motivation Applied research in machine learning progresses faster when a clean dataset is available and ready to use. Several datasets have been proposed and released over the years for specific tasks such as image classification, speech recognition and, more recently, protein structure prediction. However, for the fundamental problem of RNA structure prediction, information is spread between several databases depending on the level of interest: sequence, secondary structure, 3D structure or interactions with other macromolecules. To speed up advances in machine-learning-based approaches for RNA secondary and/or 3D structure prediction, a dataset integrating all this information is required, to avoid spending time on data gathering and cleaning. Results Here, we propose the first attempt at a standardized and automatically generated dataset dedicated to RNA, combining RNA sequences, homology information (in the form of position-specific scoring matrices) and information derived by annotation of available 3D structures (including secondary structure, canonical and non-canonical interactions and backbone torsion angles). The data are retrieved from the public databases PDB, Rfam and SILVA. The paper describes the procedure to build such a dataset and the RNA structure descriptors we provide. Some statistical descriptions of the resulting dataset are also provided. Availability and implementation The dataset is updated every month and available online (in flat-text file format) on the EvryRNA software platform (https://evryrna.ibisc.univ-evry.fr/evryrna/rnanet). An efficient parallel pipeline to build the dataset is also provided for easy reproduction or modification. Supplementary information Supplementary data are available at Bioinformatics online.
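
As a rough illustration of what position-specific homology information looks like, the sketch below computes per-column nucleotide frequencies from a toy alignment. It is not RNANet's pipeline or file format: the function name `column_frequencies`, the four-sequence alignment, and the choice to ignore gaps are assumptions, and real position-specific scoring matrices are often expressed as log-odds scores with pseudocounts rather than raw frequencies.

```python
from collections import Counter

ALPHABET = "ACGU"


def column_frequencies(alignment):
    """Return one {nucleotide: frequency} dict per alignment column.

    Gap characters and any symbol outside ACGU are ignored, so each
    column's frequencies are taken over the non-gap residues only.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    profile = []
    for col in range(length):
        counts = Counter(seq[col] for seq in alignment if seq[col] in ALPHABET)
        total = sum(counts.values())
        profile.append(
            {nt: (counts[nt] / total if total else 0.0) for nt in ALPHABET}
        )
    return profile


# Hypothetical alignment of four homologous RNA fragments ("-" is a gap).
alignment = ["GGAU", "GCAU", "G-AC", "GGAU"]
profile = column_frequencies(alignment)
for position, freqs in enumerate(profile, start=1):
    print(position, freqs)
```

Columns that are fully conserved, such as the first and third here, show a single nucleotide with frequency 1.0, while variable columns spread their frequency over several nucleotides.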