Improved RNA secondary structure and tertiary base-pairing prediction using evolutionary profile, mutational coupling and two-dimensional transfer learning

Author(s):  
Jaswinder Singh ◽  
Kuldip Paliwal ◽  
Tongchuan Zhang ◽  
Jaspreet Singh ◽  
Thomas Litfin ◽  
...  

Abstract Motivation The recent discovery of numerous non-coding RNAs (long non-coding RNAs, in particular) has transformed our perception of the roles of RNAs in living organisms. Our ability to understand them, however, is hampered by our inability to solve their secondary and tertiary structures efficiently at high resolution with existing experimental techniques. Computational prediction of RNA secondary structure, on the other hand, has recently received much-needed improvement through deep learning on a large approximate dataset, followed by transfer learning with gold-standard base-pairing structures from high-resolution 3-D structures. Here, we expand this single-sequence-based learning to the use of evolutionary profiles and mutational coupling. Results The new method yields large improvements not only in canonical base-pairs (RNA secondary structures) but even more so in base-pairing associated with tertiary interactions such as pseudoknots, non-canonical and lone base-pairs. In particular, it is highly accurate for RNAs with more than 1000 homologous sequences, achieving an F1-score (harmonic mean of sensitivity and precision) above 0.8 for 14 of the 16 RNAs tested. The method can also significantly improve base-pairing prediction by incorporating artificial but functional homologous sequences generated from deep mutational scanning without any modification. The fully automatic method (publicly available as server and standalone software) should provide the scientific community with a powerful new tool to capture not only the secondary structure but also tertiary base-pairing information for building three-dimensional models. It also highlights the future of accurately solving the base-pairing structure by using a large number of natural and/or artificial homologous sequences. Availability and implementation A standalone version of SPOT-RNA2 is available at https://github.com/jaswindersingh2/SPOT-RNA2. Direct prediction can also be made at https://sparks-lab.org/server/spot-rna2/. The datasets used in this research can also be downloaded from the GitHub repository and the web server mentioned above. Supplementary information Supplementary data are available at Bioinformatics online.
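
The F1-score quoted in this abstract is the harmonic mean of sensitivity and precision over predicted base pairs. The following is a minimal Python sketch of that calculation, not code from SPOT-RNA2; the function name `base_pair_f1` and the example pair lists are hypothetical and chosen only for illustration.

```python
def base_pair_f1(predicted, reference):
    """F1-score (harmonic mean of sensitivity and precision) over base pairs.

    Each pair is a tuple of two sequence positions; (i, j) and (j, i)
    are treated as the same pair.
    """
    pred = {tuple(sorted(p)) for p in predicted}
    ref = {tuple(sorted(p)) for p in reference}
    true_positives = len(pred & ref)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred)    # fraction of predicted pairs that are correct
    sensitivity = true_positives / len(ref)   # fraction of reference pairs that are recovered
    return 2 * precision * sensitivity / (precision + sensitivity)


# Hypothetical example: three of four predicted pairs match the reference.
reference_pairs = [(1, 20), (2, 19), (3, 18), (5, 12)]
predicted_pairs = [(1, 20), (2, 19), (3, 18), (6, 11)]
score = base_pair_f1(predicted_pairs, reference_pairs)
print(f"F1 = {score:.2f}")
```

Here precision and sensitivity are both 3/4, so the F1-score is 0.75, which would fall below the 0.8 level mentioned in the abstract.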

2020 ◽  
Vol 36 (10) ◽  
pp. 3072-3076 ◽  
Author(s):  
Elena Rivas ◽  
Jody Clements ◽  
Sean R Eddy

Abstract Pairwise sequence covariations are a signal of conserved RNA secondary structure. We describe a method for distinguishing when lack of covariation signal can be taken as evidence against a conserved RNA structure, as opposed to when a sequence alignment merely has insufficient variation to detect covariations. We find that alignments for several long non-coding RNAs previously shown to lack covariation support do have adequate covariation detection power, providing additional evidence against their proposed conserved structures. Availability and implementation The R-scape web server is at eddylab.org/R-scape, with a link to download the source code. Supplementary information Supplementary data are available at Bioinformatics online.
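
To make the idea of a pairwise covariation signal concrete, the sketch below computes the mutual information between two alignment columns. This is a generic textbook measure used here as an assumption for illustration; it is not R-scape's statistic, its significance test, or its power calculation, and the function name and toy alignment are hypothetical.

```python
from collections import Counter
from math import log2


def column_mutual_information(alignment, i, j):
    """Mutual information (in bits) between columns i and j of an alignment.

    Rows with a gap ('-') in either column are skipped.
    """
    pairs = [(row[i], row[j]) for row in alignment
             if row[i] != "-" and row[j] != "-"]
    n = len(pairs)
    if n == 0:
        return 0.0
    joint = Counter(pairs)
    left = Counter(a for a, _ in pairs)
    right = Counter(b for _, b in pairs)
    # Sum p(a,b) * log2(p(a,b) / (p(a) * p(b))) over observed pairs.
    return sum((c / n) * log2(c * n / (left[a] * right[b]))
               for (a, b), c in joint.items())


# Hypothetical toy alignment: columns 0 and 4 covary as Watson-Crick pairs,
# while columns 1 and 2 are invariant.
toy_alignment = ["GAAAC", "CAAAG", "AAAAU", "UAAAA"]
mi_covarying = column_mutual_information(toy_alignment, 0, 4)
mi_invariant = column_mutual_information(toy_alignment, 1, 2)
print(mi_covarying, mi_invariant)
```

The invariant columns score zero even though they could still be base paired, which illustrates the abstract's point: an alignment with too little variation cannot show covariation, so the absence of signal there is not evidence against structure.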


2020 ◽  
Author(s):  
Kengo Sato ◽  
Manato Akiyama ◽  
Yasubumi Sakakibara

RNA secondary structure prediction is one of the key technologies for revealing the essential roles of functional non-coding RNAs. Although machine learning-based, richly parameterized models have achieved extremely high performance in terms of prediction accuracy, the risk of overfitting for such models has been reported. In this work, we propose a new algorithm for predicting RNA secondary structures that uses deep learning with thermodynamic integration, thereby enabling robust predictions. Similar to our previous work, the folding scores, which are computed by a deep neural network, are integrated with traditional thermodynamic parameters to enable robust predictions. We also propose thermodynamic regularization for training our model without overfitting it to the training data. Our algorithm (MXfold2) achieved the most robust and accurate predictions in computational experiments designed for newly discovered non-coding RNAs, with significant 2–10% improvements in F-value over our previous algorithm (MXfold) and standard algorithms for predicting RNA secondary structures.


2020 ◽  
Vol 36 (9) ◽  
pp. 2920-2922
Author(s):  
Matan Drory Retwitzer ◽  
Vladimir Reinharz ◽  
Alexander Churkin ◽  
Yann Ponty ◽  
Jérôme Waldispühl ◽  
...  

Abstract Summary RNA design has conceptually evolved from the inverse RNA folding problem. In the classical inverse RNA problem, the user inputs an RNA secondary structure and receives an output RNA sequence that folds into it. Although modern RNA design methods are based on the same principle, finer control over the resulting sequences is sought. As an important example, a substantial number of non-coding RNA families show high preservation in specific regions while being more flexible in others, and this information should be utilized in the design. By using the additional information, RNA design tools can help solve problems of practical interest in the growing fields of synthetic biology and nanotechnology. incaRNAfbinv 2.0 utilizes a fragment-based approach, enabling control of specific RNA secondary structure motifs. The new version allows significantly more control over the general RNA shape, and also allows specific restrictions to be expressed for each motif separately, in addition to other advanced features. Availability and implementation incaRNAfbinv 2.0 is available through a standalone package and a web-server at https://www.cs.bgu.ac.il/incaRNAfbinv. Source code, command-line and GUI wrappers can be found at https://github.com/matandro/RNAsfbinv. Supplementary information Supplementary data are available at Bioinformatics online.
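
The inputs to a constrained design problem of this kind can be illustrated with a small Python sketch. It checks only a necessary condition: that a candidate sequence respects a per-position sequence constraint and can form every pair of a target dot-bracket structure. It does not fold the sequence and is not the incaRNAfbinv algorithm; the function names, the use of "N" as a wildcard, and the example strings are assumptions made for illustration.

```python
# Canonical pairs allowed here: Watson-Crick plus G-U wobble (an assumption).
CANONICAL = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def parse_dot_bracket(structure):
    """Return the list of (i, j) base pairs encoded by a dot-bracket string."""
    stack, pairs = [], []
    for idx, ch in enumerate(structure):
        if ch == "(":
            stack.append(idx)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced structure: unmatched ')'")
            pairs.append((stack.pop(), idx))
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r}")
    if stack:
        raise ValueError("unbalanced structure: unmatched '('")
    return pairs


def is_compatible(sequence, structure, constraint):
    """Check the sequence constraint ('N' = any base) and pairing compatibility."""
    if not (len(sequence) == len(structure) == len(constraint)):
        return False
    for base, wanted in zip(sequence, constraint):
        if wanted != "N" and base != wanted:
            return False
    return all((sequence[i], sequence[j]) in CANONICAL
               for i, j in parse_dot_bracket(structure))


# Hypothetical target: a 4-bp stem with a conserved GAA loop, flexible elsewhere.
target = "((((...))))"
conserved = "NNNNGAANNNN"
print(is_compatible("GGCAGAAUGCC", target, conserved))  # True
print(is_compatible("GGCAGUAUGCC", target, conserved))  # False: loop constraint violated
print(is_compatible("AGCAGAAUGCC", target, conserved))  # False: A-C cannot pair
```

A real design tool would additionally verify, with a folding algorithm, that the candidate actually adopts the target structure or shape.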


2011 ◽  
Vol 39 (suppl_2) ◽  
pp. W100-W106 ◽  
Author(s):  
Michiaki Hamada ◽  
Koichiro Yamada ◽  
Kengo Sato ◽  
Martin C. Frith ◽  
Kiyoshi Asai

2021 ◽  
Author(s):  
Maxie Dion Schmidt ◽  
Anna Kirkpatrick ◽  
Christine Heitsch

Abstract Summary We present a new graphical tool for RNA secondary structure analysis. The central feature is the ability to visually compare/contrast up to three base pairing configurations for a given sequence in a compact, standardized circular arc diagram layout. This is complemented by a built-in CT-style file viewer and radial layout substructure viewer which are directly linked to the arc diagram window via the zoom selection tool. Additional functionality includes the computation of some numerical information, and the ability to export images and data for later use. This tool should be of use to researchers seeking to better understand similarities and differences between structural alternatives for an RNA sequence. Availability and implementation https://github.com/gtDMMB/RNAStructViz/wiki Author [email protected], [email protected], and [email protected]


2020 ◽  
Author(s):  
Masaki Tagashira

Abstract Motivation The simultaneous consideration of sequence alignment and RNA secondary structure, or structural alignment, is known to help predict more accurate secondary structures of homologs. However, the consideration is heavy and can be done only roughly to decompose structural alignments. Results The PhyloFold method, which predicts secondary structures of homologs considering likely pairwise structural alignments, was developed in this study. The method shows the best prediction accuracy while demanding comparable running time compared to conventional methods. Availability The source code of the programs implemented in this study is available at https://github.com/heartsh/phylofold and https://github.com/heartsh/phyloalifold. Contact [email protected]. Supplementary information Supplementary data are available.


2019 ◽  
Vol 10 (1) ◽  
Author(s):  
Jaswinder Singh ◽  
Jack Hanson ◽  
Kuldip Paliwal ◽  
Yaoqi Zhou

Abstract The majority of our human genome transcribes into noncoding RNAs with unknown structures and functions. Obtaining functional clues for noncoding RNAs requires accurate base-pairing or secondary-structure prediction. However, the performance of such predictions by current folding-based algorithms has stagnated for more than a decade. Here, we propose the use of deep contextual learning for base-pair prediction, including those noncanonical and non-nested (pseudoknot) base pairs stabilized by tertiary interactions. Since only <250 nonredundant, high-resolution RNA structures are available for model training, we utilize transfer learning from a model initially trained with a recent high-quality bpRNA dataset of >10,000 nonredundant RNAs made available through comparative analysis. The resulting method achieves large, statistically significant improvement in predicting all base pairs, noncanonical and non-nested base pairs in particular. The proposed method (SPOT-RNA), with a freely available server and standalone software, should be useful for improving RNA structure modeling, sequence alignment, and functional annotations.


mBio ◽  
2020 ◽  
Vol 11 (6) ◽  
Author(s):  
P. Simmonds

ABSTRACT The ultimate outcome of the coronavirus disease 2019 (COVID-19) pandemic is unknown and is dependent on a complex interplay of its pathogenicity, transmissibility, and population immunity. In the current study, severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) was investigated for the presence of large-scale internal RNA base pairing in its genome. This property, termed genome-scale ordered RNA structure (GORS) has been previously associated with host persistence in other positive-strand RNA viruses, potentially through its shielding effect on viral RNA recognition in the cell. Genomes of SARS-CoV-2 were remarkably structured, with minimum folding energy differences (MFEDs) of 15%, substantially greater than previously examined viruses such as hepatitis C virus (HCV) (MFED of 7 to 9%). High MFED values were shared with all coronavirus genomes analyzed and created by several hundred consecutive energetically favored stem-loops throughout the genome. In contrast to replication-associated RNA structure, GORS was poorly conserved in the positions and identities of base pairing with other sarbecoviruses—even similarly positioned stem-loops in SARS-CoV-2 and SARS-CoV rarely shared homologous pairings, indicative of more rapid evolutionary change in RNA structure than in the underlying coding sequences. Sites predicted to be base paired in SARS-CoV-2 showed less sequence diversity than unpaired sites, suggesting that disruption of RNA structure by mutation imposes a fitness cost on the virus that is potentially restrictive to its longer evolution. Although functionally uncharacterized, GORS in SARS-CoV-2 and other coronaviruses represents important elements in their cellular interactions that may contribute to their persistence and transmissibility. 
IMPORTANCE The detection and characterization of large-scale RNA secondary structure in the genome of SARS-CoV-2 indicate an extraordinary and unsuspected degree of genome structural organization; this could be effectively visualized through a newly developed contour plotting method that displays positions, structural features, and conservation of RNA secondary structure between related viruses. Such RNA structure imposes a substantial evolutionary cost; paired sites showed greater restriction in diversity and represent a substantial additional constraint in reconstructing its molecular epidemiology. Its biological relevance arises from previously documented associations between possession of structured genomes and persistence, as documented for HCV and several other RNA viruses infecting humans and mammals. Shared properties potentially conferred by large-scale structure in SARS-CoV-2 include increasing evidence for prolonged infections and induced immune dysfunction that prevents development of protective immunity. The findings provide an additional element to cellular interactions that potentially influences the natural history of SARS-CoV-2, its pathogenicity, and its transmission.
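
The abstract reports MFED values as percentages without stating the formula. The Python sketch below assumes a commonly used definition: the percentage difference between the minimum folding energy (MFE) of the native sequence and the mean MFE of sequence-order-scrambled controls. The definition, the function name `mfed_percent`, and the energy values are assumptions for illustration, not data or code from the study.

```python
def mfed_percent(native_mfe, control_mfes):
    """Minimum folding energy difference (MFED) as a percentage.

    Assumed definition: (native MFE / mean MFE of scrambled controls - 1) * 100.
    Energies are negative (kcal/mol), so a more stable native fold gives a
    positive MFED.
    """
    if not control_mfes:
        raise ValueError("at least one control MFE is required")
    mean_control = sum(control_mfes) / len(control_mfes)
    return (native_mfe / mean_control - 1.0) * 100.0


# Hypothetical energies for one genome fragment and three scrambled controls.
native = -115.0
controls = [-100.0, -98.0, -102.0]
mfed = mfed_percent(native, controls)
print(f"MFED = {mfed:.1f}%")
```

With these made-up numbers the native fragment folds 15% more stably than its scrambled controls, the same magnitude as the genome-wide value the abstract reports for SARS-CoV-2.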

