ncRNA CONSENSUS SECONDARY STRUCTURE DERIVATION USING GRAMMAR STRINGS

2011 ◽  
Vol 09 (02) ◽  
pp. 317-337 ◽  
Author(s):  
RUJIRA ACHAWANANTAKUN ◽  
YANNI SUN ◽  
SEYEDEH SHOHREH TAKYAR

Many noncoding RNAs (ncRNAs) function through both their sequences and secondary structures. Thus, secondary structure derivation is an important issue in today's RNA research. The state-of-the-art structure annotation tools are based on comparative analysis, which derives the consensus structure of homologous ncRNAs. Despite promising results from existing ncRNA alignment and consensus structure derivation tools, there is a need for more efficient and accurate ncRNA secondary structure modeling and alignment methods. In this work, we introduce a consensus structure derivation approach based on the grammar string, a novel ncRNA secondary structure representation that encodes an ncRNA's sequence and secondary structure in the parameter space of a context-free grammar (CFG) and a full RNA grammar including pseudoknots. Being a string defined on a special alphabet constructed from a grammar, a grammar string converts ncRNA alignment into sequence alignment. We derive consensus secondary structures from hundreds of ncRNA families from BraliBase 2.1 and 25 families containing pseudoknots using grammar string alignment. Our experiments show that grammar string–based structure derivation competes favorably in consensus structure quality with Murlet and RNASampler. Source code and experimental data are available at .

Author(s):  
Thomas K. F. Wong ◽  
S. M. Yiu

Non-coding RNAs (ncRNAs) have been found to be critical for many biological processes. However, identifying these molecules is challenging because they lack strong detectable signals such as open reading frames. Most computational approaches rely on the observation that the secondary structures of ncRNA molecules are conserved within the same family. Aligning a known ncRNA to a target candidate to determine the sequence and structural similarity helps in identifying de novo ncRNA molecules that are in the same family as the known ncRNA. However, the problem becomes more difficult if the secondary structure contains pseudoknots. Until recently, many of the existing approaches could not handle structures with pseudoknots. This chapter reviews the state-of-the-art algorithms for different types of structures that contain pseudoknots, including standard pseudoknot, simple non-standard pseudoknot, recursive standard pseudoknot, and recursive simple non-standard pseudoknot. Although none of the algorithms is designed for general pseudoknots, these algorithms already cover all known ncRNAs in both the Rfam and PseudoBase databases. The evaluation of the algorithms also shows that the approach is useful in identifying ncRNA molecules in other species that are in the same family as a known ncRNA.


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Linyu Wang ◽  
Xiaodan Zhong ◽  
Shuo Wang ◽  
Yuanning Liu

Abstract. Background: Studies have shown that non-coding RNAs (ncRNAs) of the same family have similar functions, so predicting the family of an ncRNA helps in studying its function. The existing computational methods fall mainly into two categories: the first type predicts the ncRNA family by learning features of the sequence or secondary structure, and the second type predicts the ncRNA family by alignment among homologous sequences. In the first type, some methods predict the ncRNA family by learning features of a predicted secondary structure, and the inaccuracy of the predicted secondary structure may lower the accuracy of those methods. In contrast, ncRFP directly learns the features of ncRNA sequences to predict the ncRNA family. Although ncRFP simplifies the prediction process and improves performance, there is room for improvement because the features of its input data are incomplete. In the second type, the homologous sequence alignment method currently achieves the highest performance. However, its use is limited by the need for consensus secondary structure annotation of ncRNA sequences and by its inability to model pseudoknots. Results: In this paper, a novel method, "ncDLRES", which learns sequence features, is proposed to predict the family of ncRNAs based on Dynamic LSTM (Long Short-term Memory) and ResNet (Residual Neural Network). Conclusions: ncDLRES extracts the features of ncRNA sequences with a Dynamic LSTM and then classifies them with a ResNet. Compared with the homologous sequence alignment method, ncDLRES reduces the data requirement and expands the application scope. Compared with the first type of methods, the performance of ncDLRES is greatly improved.


2021 ◽  
Vol 0 (0) ◽  
Author(s):  
Michela Quadrini

Abstract. RNA molecules play crucial roles in various biological processes. Their three-dimensional configurations determine their functions and, in turn, influence their interactions with other molecules. RNAs and their interaction structures, the so-called RNA–RNA interactions, can be abstracted in terms of secondary structures, i.e., a list of the nucleotide bases paired by hydrogen bonding within the nucleotide sequence. Each secondary structure, in turn, can be abstracted into cores and shadows. Both are determined by collapsing nucleotides and arcs appropriately. We formalize all of these abstractions as arc diagrams, whose arcs determine loops. A secondary structure, represented by an arc diagram, is pseudoknot-free if its arc diagram does not present any crossing among arcs; otherwise, it is said to be pseudoknotted. In this study, we address the problem of identifying a given structural pattern within secondary structures, or the associated cores or shadows, of both RNAs and RNA–RNA interactions characterized by arbitrary pseudoknots. These abstractions are mapped into a matrix whose elements represent the relations among loops. We therefore tackle the problem by working with matrices and submatrices. The algorithms, implemented in Python, work in polynomial time. We test our approach on a set of 16S ribosomal RNAs of Thermus thermophilus with inhibitors, and we quantify the structural effect of the inhibitors.
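The pseudoknot-free condition stated in this abstract (no two arcs of the diagram cross) can be illustrated with a minimal Python sketch. It is not the authors' matrix-based algorithm: the function names `crossing_arcs` and `is_pseudoknot_free` are hypothetical, and the sketch assumes a secondary structure is given as a list of base pairs `(i, j)` of 1-based nucleotide positions, each position occurring in at most one pair.

```python
from itertools import combinations


def crossing_arcs(arcs):
    """Return every pair of arcs (i, j), (k, l) that cross, i.e. i < k < j < l.

    `arcs` is a list of base pairs; each pair is normalized so that i < j.
    This is an illustrative O(n^2) check, not the method from the abstract.
    """
    normalized = sorted(tuple(sorted(arc)) for arc in arcs)
    crossings = []
    for (i, j), (k, l) in combinations(normalized, 2):
        # Sorting guarantees i <= k, so only one orientation needs testing.
        if i < k < j < l:
            crossings.append(((i, j), (k, l)))
    return crossings


def is_pseudoknot_free(arcs):
    """A structure is pseudoknot-free when no two of its arcs cross."""
    return not crossing_arcs(arcs)


# Hypothetical toy structures (assumed for illustration only).
nested = [(1, 10), (2, 9), (3, 8)]   # fully nested hairpin stem
knotted = [(1, 5), (3, 8)]           # the two arcs interleave

print(is_pseudoknot_free(nested))    # True
print(is_pseudoknot_free(knotted))   # False
```

The pairwise test keeps the definition visible; a stack-based scan would be the usual choice if only a yes/no answer on long sequences were needed.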


RSC Advances ◽  
2021 ◽  
Vol 11 (14) ◽  
pp. 8277-8281
Author(s):  
Xiaoyan Huang ◽  
Zhuoxi Liang ◽  
Jiqiu Wen ◽  
Yong Liu ◽  
Ayoub Taallah ◽  
...  

The collaboration of nanochannel confinement and dynamic negative growth creates orderly aligned manganese-based nanotube arrays decorated with nanopores as a secondary structure.


2013 ◽  
Vol 39 (1) ◽  
pp. 57-85 ◽  
Author(s):  
Alexander Fraser ◽  
Helmut Schmid ◽  
Richárd Farkas ◽  
Renjing Wang ◽  
Hinrich Schütze

We study constituent parsing of German, a morphologically rich and less-configurational language. We use a probabilistic context-free grammar treebank grammar that has been adapted to the morphologically rich properties of German by markovization and special features added to its productions. We evaluate the impact of adding lexical knowledge. Then we examine both monolingual and bilingual approaches to parse reranking. Our reranking parser is the new state of the art in constituency parsing of the TIGER Treebank. We perform an analysis, concluding with lessons learned, which apply to parsing other morphologically rich and less-configurational languages.


2021 ◽  
Vol 14 (11) ◽  
pp. 2445-2458
Author(s):  
Valerio Cetorelli ◽  
Paolo Atzeni ◽  
Valter Crescenzi ◽  
Franco Milicchio

We introduce landmark grammars, a new family of context-free grammars aimed at describing the HTML source code of pages published by large and templated websites and therefore at effectively tackling Web data extraction problems. Indeed, they address the inherent ambiguity of HTML, one of the main challenges of Web data extraction, which, despite over twenty years of research, has been largely neglected by the approaches presented in the literature. We then formalize the Smallest Extraction Problem (SEP), an optimization problem for finding the grammar of a family that best describes a set of pages and, at the same time, extracts their data. Finally, we present an unsupervised learning algorithm to induce a landmark grammar from a set of pages sharing a common HTML template, and we present an automatic Web data extraction system. The experiments on consolidated benchmarks show that the approach can substantially contribute to improving the state of the art.


2005 ◽  
Vol 85 (4) ◽  
pp. 437-448 ◽  
Author(s):  
P. Yu ◽  
J. J. McKinnon ◽  
H. W. Soita ◽  
C. R. Christensen ◽  
D. A. Christensen

The objectives of the study were to use synchrotron Fourier transform infrared microspectroscopy (S-FTIR) as a novel approach to: (1) reveal ultra-structural chemical features of protein secondary structures of flaxseed tissues affected by variety (golden and brown) and heat processing (raw and roasted), and (2) quantify protein secondary structures using Gaussian and Lorentzian methods of multi-component peak modeling. By using multi-component peak modeling in the protein amide I region of 1700–1620 cm⁻¹, the results showed that the golden flaxseed contained a relatively higher percentage of α-helix (47.1 vs. 36.9%), a lower percentage of β-sheet (37.2 vs. 46.3%) and a higher (P < 0.05) ratio of α-helix to β-sheet than the brown flaxseed (1.3 vs. 0.8). Roasting reduced (P < 0.05) the percentage of α-helix (from 47.1 to 36.1%), increased the percentage of β-sheet (from 37.2 to 49.8%) and reduced the α-helix to β-sheet ratio (1.3 to 0.7) of the golden flaxseed tissues. However, roasting did not affect the percentage and ratio of α-helix and β-sheet in the brown flaxseed tissue. No significant differences were found in quantification of protein secondary structures between the Gaussian and Lorentzian methods. These results demonstrate the potential of highly spatially resolved S-FTIR to localize relatively pure protein in the tissue and reveal protein secondary structures at a cellular level. The results indicated relative differences in protein secondary structures between flaxseed varieties and differences in sensitivities of protein secondary structure to heat processing. Further study is needed to understand the relationship between protein secondary structure and protein digestion and utilization of flaxseed and to investigate whether the changes in the relative amounts of protein secondary structures are primarily responsible for differences in protein availability.
Key words: Synchrotron, FTIR microspectroscopy, flaxseeds, intrinsic structural matrix, protein secondary structures, protein nutritive value
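As a quick arithmetic check, the α-helix to β-sheet ratios quoted in this abstract follow from the reported percentages. The short Python sketch below only recomputes them; the dictionary labels and the function name `helix_to_sheet_ratio` are chosen here for illustration and do not come from the study.

```python
def helix_to_sheet_ratio(alpha_helix_pct, beta_sheet_pct):
    """Ratio of alpha-helix to beta-sheet percentage, rounded to one decimal."""
    return round(alpha_helix_pct / beta_sheet_pct, 1)


# (alpha-helix %, beta-sheet %, ratio as quoted in the abstract)
reported = {
    "golden": (47.1, 37.2, 1.3),
    "brown": (36.9, 46.3, 0.8),
    "golden, roasted": (36.1, 49.8, 0.7),
}

for label, (helix, sheet, quoted_ratio) in reported.items():
    computed = helix_to_sheet_ratio(helix, sheet)
    print(f"{label}: computed {computed}, quoted {quoted_ratio}")
```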


1987 ◽  
Vol 7 (9) ◽  
pp. 3194-3198 ◽  
Author(s):  
D Solnick ◽  
S I Lee

We set up an alternative splicing system in vitro in which the relative amounts of two spliced RNAs, one containing and the other lacking a particular exon, were directly proportional to the length of an inverted repeat inserted into the flanking introns. We then used the system to measure the effect of intramolecular complementarity on alternative splicing in vivo. We found that an alternative splice was induced in vivo only when the introns contained more than approximately 50 nucleotides of perfect complementarity, that is, only when the secondary structure was much more stable than most if not all possible secondary structures in natural mRNA precursors. We showed further that intron insertions containing long complements to splice sites and a branch point inhibited splicing in vitro but not in vivo. These results raise the possibility that in cells most pre-mRNA secondary structures either are not maintained long enough to influence splicing choices, or never form at all.


2020 ◽  
Author(s):  
Kengo Sato ◽  
Manato Akiyama ◽  
Yasubumi Sakakibara

RNA secondary structure prediction is one of the key technologies for revealing the essential roles of functional non-coding RNAs. Although machine learning-based rich-parametrized models have achieved extremely high performance in terms of prediction accuracy, the risk of overfitting for such models has been reported. In this work, we propose a new algorithm for predicting RNA secondary structures that uses deep learning with thermodynamic integration, thereby enabling robust predictions. Similar to our previous work, the folding scores, which are computed by a deep neural network, are integrated with traditional thermodynamic parameters to enable robust predictions. We also propose thermodynamic regularization for training our model without overfitting it to the training data. Our algorithm (MXfold2) achieved the most robust and accurate predictions in computational experiments designed for newly discovered non-coding RNAs, with significant 2–10 % improvements over our previous algorithm (MXfold) and standard algorithms for predicting RNA secondary structures in terms of F-value.
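The abstract reports accuracy as an F-value but does not define it. A common convention in RNA secondary structure benchmarking, assumed here, is the harmonic mean of precision and sensitivity over predicted versus reference base pairs. The following Python sketch illustrates that convention only; the function name `f_value` and the toy base-pair lists are hypothetical and are not taken from MXfold2.

```python
def f_value(predicted_pairs, reference_pairs):
    """Harmonic mean of precision and sensitivity over base pairs.

    Assumes each structure is a collection of (i, j) base pairs; a predicted
    pair counts as correct only if the exact pair occurs in the reference.
    """
    predicted = {tuple(sorted(pair)) for pair in predicted_pairs}
    reference = {tuple(sorted(pair)) for pair in reference_pairs}
    true_positives = len(predicted & reference)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)    # fraction of predictions that are right
    sensitivity = true_positives / len(reference)  # fraction of reference pairs recovered
    return 2 * precision * sensitivity / (precision + sensitivity)


# Hypothetical toy example: three of four predicted pairs match the reference.
reference = [(1, 10), (2, 9), (3, 8), (4, 7)]
predicted = [(1, 10), (2, 9), (3, 8), (5, 6)]
print(f_value(predicted, reference))  # 0.75
```

Some benchmarks also accept a predicted pair shifted by one position as correct; this sketch uses exact matching for simplicity.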


Biomolecules ◽  
2021 ◽  
Vol 11 (12) ◽  
pp. 1773
Author(s):  
Bahareh Behkamal ◽  
Mahmoud Naghibzadeh ◽  
Mohammad Reza Saberi ◽  
Zeinab Amiri Tehranizadeh ◽  
Andrea Pagnani ◽  
...  

Cryo-electron microscopy (cryo-EM) is a structural technique that has played a significant role in protein structure determination in recent years. Compared to the traditional methods of X-ray crystallography and NMR spectroscopy, cryo-EM is capable of producing images of much larger protein complexes. However, cryo-EM reconstructions are limited to medium-resolution (~4–10 Å) for some cases. At this resolution range, a cryo-EM density map can hardly be used to directly determine the structure of proteins at atomic level resolutions, or even at their amino acid residue backbones. At such a resolution, only the position and orientation of secondary structure elements (SSEs) such as α-helices and β-sheets are observable. Consequently, finding the mapping of the secondary structures of the modeled structure (SSEs-A) to the cryo-EM map (SSEs-C) is one of the primary concerns in cryo-EM modeling. To address this issue, this study proposes a novel automatic computational method to identify SSEs correspondence in three-dimensional (3D) space. Initially, through a modeling of the target sequence with the aid of extracting highly reliable features from a generated 3D model and map, the SSEs matching problem is formulated as a 3D vector matching problem. Afterward, the 3D vector matching problem is transformed into a 3D graph matching problem. Finally, a similarity-based voting algorithm combined with the principle of least conflict (PLC) concept is developed to obtain the SSEs correspondence. To evaluate the accuracy of the method, a testing set of 25 experimental and simulated maps with a maximum of 65 SSEs is selected. Comparative studies are also conducted to demonstrate the superiority of the proposed method over some state-of-the-art techniques. The results demonstrate that the method is efficient, robust, and works well in the presence of errors in the predicted secondary structures of the cryo-EM images.

