RMalign: an RNA structural alignment tool based on a size independent scoring function

2018 ◽  
Author(s):  
Jinfang Zheng ◽  
Juan Xie ◽  
Xu Hong ◽  
Shiyong Liu

ABSTRACT RNA-protein 3D complex structure prediction remains challenging. Recently, our team proposed PRIME, a template-based approach that builds RNA-protein complex 3D structure models with a higher success rate than computational docking software. However, the scoring function of SARA, the RNA alignment algorithm used in PRIME, is size-dependent, which limits its ability to detect templates in some cases. Herein, we developed a novel RNA 3D structural alignment approach, RMalign, which is based on a size-independent scoring function, RMscore. The parameter in RMscore was optimized on randomly selected RNA pairs, and phase transition points (from dissimilar to similar) were determined on another set of randomly selected RNA pairs. In tRNA benchmarking, the precision of RMscore is higher than that of SARAscore (0.8771 and 0.7766, respectively) with phase transition points. In balance-FSCOR benchmarking, RMalign performed as well as ESA-RNA with a non-normalized score measuring RNA structure similarity. In balance-x-FSCOR benchmarking, RMalign performs much better than SARA, a state-of-the-art RNA 3D structural alignment approach, owing to its size-independent scoring function. Taking advantage of RMalign, we updated our RNA-protein modeling approach PRIME to version 2.0. PRIME2.0 improves the success rate by about 10% over PRIME.
Author summary RNA structures are important for RNA functions. With the increasing number of RNA structures in the PDB, RNA 3D structure alignment approaches have been developed. However, the scoring function used to measure RNA structural similarity is still length-dependent. This shortcoming limits the ability to detect RNA structure templates when modeling RNA structures or RNA-protein 3D complex structures. Thus, we developed a length-independent scoring function, RMscore, to enhance the ability to detect RNA structure homologs. The benchmarking data show that RMscore can distinguish similar from dissimilar RNA structures effectively. RMscore should be a useful scoring function in modeling RNA structures for the biological community. Based on RMscore, we developed an RNA 3D structure alignment tool, RMalign. In both RNA structure and function classification benchmarking, RMalign performs as well as or better than the state-of-the-art approaches. With the length-independent scoring function RMscore, RMalign should be useful for modeling RNA structures. Based on the above results, we updated PRIME to PRIME2.0, a more accurate RNA-protein 3D complex structure modeling tool that should be useful for the biological community.
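
To illustrate what "size-independent" can mean for a structural similarity score, here is a minimal sketch. It assumes a TM-score-like functional form in which each aligned pair contributes 1 / (1 + (d / d0)^2) and the sum is divided by a reference length. The abstract does not give the RMscore formula, so the function name, the form, and the parameter `d0` are illustrative assumptions and not the authors' definition.

```python
from math import fsum


def size_normalized_score(distances, ref_length, d0):
    """TM-score-style similarity (assumed form, not the published RMscore).

    distances  -- distances (in angstroms) between aligned nucleotide pairs
    ref_length -- length of the reference structure used for normalization
    d0         -- hypothetical distance scale; the paper optimizes its own parameter
    """
    if ref_length <= 0 or d0 <= 0:
        raise ValueError("ref_length and d0 must be positive")
    if len(distances) > ref_length:
        raise ValueError("cannot align more pairs than the reference length")
    # Each aligned pair contributes a value in (0, 1]; unaligned positions add 0.
    return fsum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / ref_length
```

Because every term is bounded by 1 and the sum is divided by the reference length, the score stays in [0, 1] for structures of any size, which is the property that makes a single similarity threshold usable across RNA lengths.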

2022 ◽  
Vol 1 ◽  
Author(s):  
Zhi-Hao Guo ◽  
Li Yuan ◽  
Ya-Lan Tan ◽  
Ben-Gong Zhang ◽  
Ya-Zhou Shi

The 3D architectures of RNAs are essential for understanding their cellular functions. While an accurate scoring function based on the statistics of known RNA structures is a key component of successful RNA structure prediction or evaluation, there are few tools or web servers that can be used directly to perform comprehensive statistical analysis of RNA 3D structures. In this work, we developed RNAStat, an integrated tool for computing statistics on RNA 3D structures. For given RNA structures, RNAStat automatically calculates RNA structural properties such as size and shape, and shows their distributions. Based on the RNA structure annotation from DSSR, RNAStat provides statistical information on RNA secondary structure motifs, including canonical/non-canonical base pairs, stems, and various loops. In particular, the geometry of base-pairing/stacking can be calculated in RNAStat by constructing a local coordinate system for each base. In addition, RNAStat supplies the distribution of distances between any atoms to help users build distance-based RNA statistical potentials. To test the usability of the tool, we established a non-redundant RNA 3D structure dataset and, based on this dataset, made a comprehensive statistical analysis of RNA structures, which could help guide RNA structure modeling. The Python code of RNAStat, the dataset used in this work, and the corresponding statistical data files are freely available at GitHub (https://github.com/RNA-folding-lab/RNAStat).
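
The distance distributions mentioned above can be pictured as a histogram over all atom pairs. The sketch below is an assumption about the general idea and does not use RNAStat's code or interface: `distance_histogram`, its arguments, and the binning scheme are hypothetical names chosen for illustration.

```python
import math
from collections import Counter
from itertools import combinations


def distance_histogram(coords, bin_width):
    """Count atom pairs per distance bin (illustrative, not RNAStat's API).

    coords    -- list of (x, y, z) atom coordinates
    bin_width -- width of each distance bin, in the same unit as coords
    Returns a dict mapping bin index k to the number of pairs whose
    distance falls in [k * bin_width, (k + 1) * bin_width).
    """
    if bin_width <= 0:
        raise ValueError("bin_width must be positive")
    counts = Counter()
    for a, b in combinations(coords, 2):
        counts[int(math.dist(a, b) // bin_width)] += 1
    return dict(sorted(counts.items()))
```

Normalizing such counts against a reference distribution is one common route to a distance-based statistical potential; that step is not shown here.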


2010 ◽  
Vol 06 (01) ◽  
pp. 77-95
Author(s):  
TUN-WEN PAI ◽  
RUEI-HSIANG CHANG ◽  
CHIEN-MING CHEN ◽  
PO-HAN SU ◽  
LEE-JYI WANG ◽  
...  

Protein structure alignment facilitates the analysis of protein functionality. Through superimposed structures and the comparison of variant components, common or specific features of proteins can be identified. Several known protein families exhibit analogous tertiary structures but divergent primary sequences. These proteins in the same structural class are unable to be aligned by sequence-based methods. The main objective of the present study was to develop an efficient and effective algorithm for multiple structure alignment based on geometrical correlation of secondary structures, which are conserved in evolutionary heritage. The method utilizes mutual correlation analysis of secondary structure elements (SSEs) and selects representative segments as the key anchors for structural alignment. The system exploits a fast vector transformation technique to represent SSEs in vector format, and the mutual geometrical relationship among vectors is projected onto an angle-distance map. Through a scoring function and filtering mechanisms, the best candidates of vectors are selected, and an effective constrained multiple structural alignment module is performed. The correctness of the algorithm was verified by the multiple structure alignment of proteins in the SCOP database. Several protein sets with low sequence identities were aligned, and the results were compared with those obtained by three well-known structural alignment approaches. The results show that the proposed method is able to perform multiple structural alignments effectively and to obtain satisfactory results, especially for proteins possessing low sequence identity.
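
As a rough illustration of the angle-distance idea, the sketch below represents each secondary structure element (SSE) as a vector from its first to its last coordinate and computes one (angle, distance) point for a pair of SSEs. Using the midpoint-to-midpoint distance is an assumption made for this example; the abstract does not say how the distance is measured, and the function names are hypothetical.

```python
import math


def sse_vector(start, end):
    """Direction vector of an SSE given its start and end coordinates."""
    return tuple(e - s for s, e in zip(start, end))


def angle_distance(sse_a, sse_b):
    """Return (angle in degrees, midpoint distance) for two SSEs.

    Each SSE is a (start, end) pair of 3D points. The midpoint distance
    is an illustrative choice, not necessarily the paper's definition.
    """
    va, vb = sse_vector(*sse_a), sse_vector(*sse_b)
    norm = math.hypot(*va) * math.hypot(*vb)
    if norm == 0:
        raise ValueError("SSE vectors must have non-zero length")
    dot = sum(x * y for x, y in zip(va, vb))
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    cos_angle = max(-1.0, min(1.0, dot / norm))
    mid_a = tuple((s + e) / 2 for s, e in zip(*sse_a))
    mid_b = tuple((s + e) / 2 for s, e in zip(*sse_b))
    return math.degrees(math.acos(cos_angle)), math.dist(mid_a, mid_b)
```

Computing this point for every SSE pair within a protein gives a 2D map that does not depend on the protein's orientation in space, which is what makes such a representation convenient for comparing structures before superposition.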


2019 ◽  
Vol 35 (21) ◽  
pp. 4459-4461 ◽  
Author(s):  
Sha Gong ◽  
Chengxin Zhang ◽  
Yang Zhang

Abstract Motivation: Comparison of RNA 3D structures can be used to infer functional relationships between RNA molecules. Most current RNA structure alignment programs are built on size-dependent scales, which complicates the interpretation of structural and functional relations. Meanwhile, their low speed prevents the programs from being applied to large-scale RNA structural database searches. Results: We developed an open-source algorithm, RNA-align, for RNA 3D structure alignment, in which structure similarity is scaled by a size-independent and statistically interpretable scoring metric. Large-scale benchmark tests show that RNA-align significantly outperforms other state-of-the-art programs in both alignment accuracy and running speed. The major advantage of RNA-align lies in the quick convergence of the heuristic alignment iterations and the coarse-grained secondary structure assignment, both of which are crucial to the speed and accuracy of RNA structure alignments. Availability and implementation: https://zhanglab.ccmb.med.umich.edu/RNA-align/. Supplementary information: Supplementary data are available at Bioinformatics online.


2019 ◽  
Vol 20 (S19) ◽  
Author(s):  
Lei Deng ◽  
Guolun Zhong ◽  
Chenzhe Liu ◽  
Judong Luo ◽  
Hui Liu

Abstract Background: Protein comparative analysis and similarity searches play essential roles in structural bioinformatics. Several algorithms for protein structure alignment have been developed in recent years. However, given the rapid growth of protein structure data, improving overall comparison performance and running efficiency with massive sequences is still challenging. Results: Here, we propose MADOKA, an ultra-fast approach for massive structural neighbor searching using a novel two-phase algorithm. First, we apply a fast alignment between pairs of structures. Then, we use a score to select the more similar pairs and carry out a more accurate fragment-based residue-level alignment on them. MADOKA performs about 6–100 times faster than existing methods, including TM-align and SAL, in massive alignments. Moreover, the quality of MADOKA's structural alignments is better than that of the existing algorithms in terms of TM-score and number of aligned residues. We also developed a web server to search for structural neighbors in the PDB database (about 360,000 protein chains in total), with additional features such as 3D structure alignment visualization. The MADOKA web server is freely available at: http://madoka.denglab.org/ Conclusions: MADOKA is an efficient approach to searching for protein structure similarity. In addition, we provide a parallel implementation of MADOKA that exploits the power of multi-core CPUs.
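
The two-phase strategy described above follows a general filter-then-refine pattern, which can be sketched as follows. This is not MADOKA's implementation: `two_phase_search`, the two scoring callables, and the threshold are hypothetical placeholders, and any cheap and expensive comparison functions could be plugged in.

```python
def two_phase_search(query, database, coarse_score, fine_score, threshold):
    """Generic filter-then-refine search (illustrative, not MADOKA's code).

    coarse_score -- cheap similarity function applied to every database entry
    fine_score   -- expensive similarity function applied only to survivors
    threshold    -- minimum coarse score needed to enter the second phase
    Returns (fine score, entry) pairs sorted from most to least similar.
    """
    # Phase 1: a fast pass discards entries that are clearly dissimilar.
    candidates = [t for t in database if coarse_score(query, t) >= threshold]
    # Phase 2: the accurate comparison runs on the reduced candidate set only.
    scored = [(fine_score(query, t), t) for t in candidates]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)
```

The speed-up of such a scheme depends on how many entries the first phase removes, while accuracy depends on the coarse score rarely rejecting true neighbors.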


2021 ◽  
Vol 8 ◽  
Author(s):  
Yuanzhe Zhou ◽  
Jun Li ◽  
Travis Hurst ◽  
Shi-Jie Chen

Selective 2′-hydroxyl acylation analyzed by primer extension (SHAPE) chemical probing serves as a convenient and efficient experimental technique for providing information about RNA local flexibility. The local structural information contained in SHAPE reactivity data can be used as constraints in 2D/3D structure predictions. Here, we present SHAPE predictoR (SHAPER), a web server for fast and accurate SHAPE reactivity prediction. The main purpose of the SHAPER web server is to provide a portal that uses experimental SHAPE data to refine 2D/3D RNA structure selection. Input structures for the SHAPER server can be obtained through experiments or computational modeling. The SHAPER server accepts RNA structures with single or multiple conformations, and the predicted SHAPE profile and its correlation with experimental SHAPE data (if provided) for each conformation can be freely downloaded through the web portal. The SHAPER web server is available at http://rna.physics.missouri.edu/shaper/.
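
Comparing a predicted SHAPE profile with an experimental one comes down to a correlation between two per-nucleotide value lists. The sketch below assumes the Pearson correlation coefficient; the abstract does not state which correlation measure the server reports, and the function name is hypothetical.

```python
import math


def pearson(predicted, experimental):
    """Pearson correlation between two equally long reactivity profiles.

    Assumed measure for illustration; the SHAPER abstract does not
    specify which correlation coefficient is used.
    """
    if len(predicted) != len(experimental) or len(predicted) < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_e = sum(experimental) / n
    cov = sum((p - mean_p) * (e - mean_e) for p, e in zip(predicted, experimental))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_e = sum((e - mean_e) ** 2 for e in experimental)
    if var_p == 0 or var_e == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / math.sqrt(var_p * var_e)
```

When several candidate conformations are available, ranking them by this value against the same experimental profile is one way to express the structure selection the abstract describes.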


2019 ◽  
Vol 39 (2) ◽  
Author(s):  
Almudena Ponce-Salvatierra ◽  
Astha ◽  
Katarzyna Merdas ◽  
Chandran Nithin ◽  
Pritha Ghosh ◽  
...  

Abstract RNA molecules are master regulators of cells. They are involved in a variety of molecular processes: they transmit genetic information, sense cellular signals and communicate responses, and even catalyze chemical reactions. As in the case of proteins, RNA function is dictated by its structure and by its ability to adopt different conformations, which in turn is encoded in the sequence. Experimental determination of high-resolution RNA structures is both laborious and difficult, and therefore the majority of known RNAs remain structurally uncharacterized. To address this problem, predictive computational methods were developed based on the accumulated knowledge of RNA structures determined so far, the physical basis of the RNA folding, and taking into account evolutionary considerations, such as conservation of functionally important motifs. However, all theoretical methods suffer from various limitations, and they are generally unable to accurately predict structures for RNA sequences longer than 100-nt residues unless aided by additional experimental data. In this article, we review experimental methods that can generate data usable by computational methods, as well as computational approaches for RNA structure prediction that can utilize data from experimental analyses. We outline methods and data types that can be potentially useful for RNA 3D structure modeling but are not commonly used by the existing software, suggesting directions for future development.


Genes ◽  
2018 ◽  
Vol 9 (8) ◽  
pp. 392 ◽  
Author(s):  
Bernhard Thiel ◽  
Roman Ochsenreiter ◽  
Veerendra Gadekar ◽  
Andrea Tanzer ◽  
Ivo L. Hofacker

In this work, we present a computational screen for functional RNA structures, resulting in over 100,000 conserved RNA structure elements found in alignments of mouse (mm10) against 59 other vertebrates. We explicitly included masked repeat regions to explore the potential of transposable elements and low-complexity regions to give rise to regulatory RNA elements. In our analysis pipeline, we implemented a four-step procedure: (i) we screened genome-wide alignments for potential structure elements using RNAz-2, (ii) realigned and refined candidate loci with LocARNA-P, (iii) scored candidates again with RNAz-2 in structure alignment mode, and (iv) searched for additional homologous loci in the mouse genome that were not covered by genome alignments. The 3’-untranslated regions (3’-UTRs) of protein-coding genes and small noncoding RNAs are enriched for structures, while coding sequences are depleted. Repeat-associated loci make up about 95% of the homologous loci identified and are, as expected, predominantly found in intronic and intergenic regions. Nevertheless, we report the structure elements enriched in specific genome elements, such as 3’-UTRs and long noncoding RNAs (lncRNAs). We provide full access to our results via a custom UCSC genome browser trackhub freely available on our website (http://rna.tbi.univie.ac.at/trackhubs/#RNAz).


2005 ◽  
Vol 03 (03) ◽  
pp. 609-626 ◽  
Author(s):  
ZHUOZHI WANG ◽  
KAIZHONG ZHANG

Ribonucleic acid (RNA) structures can be viewed as a special kind of string in which characters can bond with each other. The question of aligning two RNA structures has been studied for a while, and there are several successful algorithms based on different models. In this paper, by adopting the model introduced in Wang and Zhang [19], we propose two algorithms to attack the question of aligning multiple RNA structures. Our methods reduce the multiple RNA structure alignment problem to the problem of aligning two RNA structure alignments. We also show that the framework of the sequence center star alignment algorithm can be applied to the problem of multiple RNA structure alignment, and that if the triangle inequality is met in the scoring matrix, the approximation ratio of the algorithm remains [Formula: see text], where n is the total number of structures.
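
The first step of a center star scheme, choosing the center, can be sketched as below. It assumes a precomputed symmetric matrix of pairwise alignment distances. The function names are hypothetical, ordering the remaining structures by distance to the center is an illustrative choice, and the merging of pairwise alignments that the paper addresses is not shown.

```python
def pick_center(dist):
    """Index of the item with the smallest summed distance to all others.

    dist -- square matrix (list of lists) of pairwise alignment distances
    """
    if not dist:
        raise ValueError("distance matrix must not be empty")
    totals = [sum(row) for row in dist]
    return min(range(len(dist)), key=totals.__getitem__)


def center_star_order(dist):
    """Return (center, others), with others sorted by distance to the center.

    The merge order is an assumption made for this sketch; the classic
    scheme only requires that every structure be aligned to the center.
    """
    center = pick_center(dist)
    others = sorted(
        (i for i in range(len(dist)) if i != center),
        key=lambda i: dist[center][i],
    )
    return center, others
```

Each structure in `others` would then be aligned against the center in turn, and the resulting pairwise alignments combined into one multiple alignment.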


2014 ◽  
Vol 51 ◽  
pp. 413-441 ◽  
Author(s):  
S. Cai ◽  
C. Luo ◽  
K. Su

It is widely acknowledged that stochastic local search (SLS) algorithms can efficiently find models for satisfiable instances of the satisfiability (SAT) problem, especially for random k-SAT instances. However, compared to random 3-SAT instances where SLS algorithms have shown great success, random k-SAT instances with long clauses remain very difficult. Recently, the notion of second level score, denoted as "score_2", was proposed for improving SLS algorithms on long-clause SAT instances, and was first used in the powerful CCASat solver as a tie breaker. In this paper, we propose three new scoring functions based on score_2. Despite their simplicity, these functions are very effective for solving random k-SAT with long clauses. The first function combines score and score_2, and the second one additionally integrates the diversification property "age". These two functions are used in developing a new SLS algorithm called CScoreSAT. Experimental results on large random 5-SAT and 7-SAT instances near phase transition show that CScoreSAT significantly outperforms previous SLS solvers. However, CScoreSAT cannot rival its competitors on random k-SAT instances at phase transition. We improve CScoreSAT for such instances by another scoring function which combines score_2 with age. The resulting algorithm HScoreSAT exhibits state-of-the-art performance on random k-SAT (k>3) instances at phase transition. We also study the computation of score_2, including its implementation and computational complexity.
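
The two score levels can be made concrete with a small sketch. It assumes the usual definitions: score is the number of clauses that become satisfied minus the number that become unsatisfied when a variable is flipped, and score_2 is the number of clauses that go from exactly one true literal to two, minus the number that go from two to one. The abstract does not spell these out, so treat the definitions and the function names as assumptions; the specific combinations used in CScoreSAT and HScoreSAT are not reproduced here.

```python
def true_count(clause, assignment):
    """Number of literals in the clause that are true under the assignment."""
    return sum(1 for lit in clause if assignment[abs(lit)] == (lit > 0))


def level_scores(clauses, assignment, var):
    """Return (score, score_2) for flipping var (assumed definitions).

    clauses    -- list of clauses, each a list of non-zero ints (DIMACS style)
    assignment -- dict mapping variable index to bool
    """
    flipped = dict(assignment)
    flipped[var] = not assignment[var]
    score = score_2 = 0
    for clause in clauses:
        if all(abs(lit) != var for lit in clause):
            continue  # flipping var cannot change this clause
        before = true_count(clause, assignment)
        after = true_count(clause, flipped)
        if before == 0 and after >= 1:
            score += 1      # clause becomes satisfied
        elif before >= 1 and after == 0:
            score -= 1      # clause becomes unsatisfied
        if before == 1 and after == 2:
            score_2 += 1    # 1-satisfied clause becomes 2-satisfied
        elif before == 2 and after == 1:
            score_2 -= 1    # 2-satisfied clause becomes 1-satisfied
    return score, score_2
```

With long clauses many candidate flips share the same score, so a second-level quantity of this kind gives a local search procedure extra information for separating them.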


2021 ◽  
Vol 49 (6) ◽  
pp. 3409-3426
Author(s):  
Arancha Catalan-Moreno ◽  
Marta Cela ◽  
Pilar Menendez-Gil ◽  
Naiara Irurzun ◽  
Carlos J Caballero ◽  
...  

Abstract Thermoregulation of virulence genes in bacterial pathogens is essential for environment-to-host transition. However, the mechanisms governing cold adaptation when outside the host remain poorly understood. Here, we found that the production of cold shock proteins CspB and CspC from Staphylococcus aureus is controlled by two paralogous RNA thermoswitches. Through in silico prediction, enzymatic probing and site-directed mutagenesis, we demonstrated that cspB and cspC 5′UTRs adopt alternative RNA structures that shift from one another upon temperature shifts. The open (O) conformation that facilitates mRNA translation is favoured at ambient temperatures (22°C). Conversely, the alternative locked (L) conformation, where the ribosome binding site (RBS) is sequestered in a double-stranded RNA structure, is folded at host-related temperatures (37°C). These structural rearrangements depend on a long RNA hairpin found in the O conformation that sequesters the anti-RBS sequence. Notably, the remaining S. aureus CSP, CspA, may interact with a UUUGUUU motif located in the loop of this long hairpin and favour the folding of the L conformation. This folding represses CspB and CspC production at 37°C. Simultaneous deletion of the cspB/cspC genes or their RNA thermoswitches significantly decreases S. aureus growth rate at ambient temperatures, highlighting the importance of CspB/CspC thermoregulation when S. aureus transitions from the host to the environment.

