ncDLRES: a novel method for non-coding RNAs family prediction based on dynamic LSTM and ResNet

2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Linyu Wang ◽  
Xiaodan Zhong ◽  
Shuo Wang ◽  
Yuanning Liu

Abstract Background Studies have shown that non-coding RNAs (ncRNAs) of the same family have similar functions, so predicting the family of an ncRNA helps in studying its function. Existing computational methods fall mainly into two categories: the first predicts the ncRNA family by learning features of the sequence or secondary structure, and the second predicts it by aligning homologous sequences. Within the first category, some methods learn features of the predicted secondary structure, and inaccuracy in the predicted structure may lower their accuracy. In contrast, ncRFP learns the features of ncRNA sequences directly. Although ncRFP simplifies the prediction process and improves performance, the incomplete features of its input data leave room for improvement. In the second category, the homologous sequence alignment method currently achieves the highest performance. However, its use is limited because it requires consensus secondary structure annotation of ncRNA sequences and cannot model pseudoknots. Results In this paper, a novel method, “ncDLRES”, is proposed that learns sequence features to predict the family of ncRNAs based on Dynamic LSTM (Long Short-Term Memory) and ResNet (Residual Neural Network). Conclusions ncDLRES extracts the features of ncRNA sequences with Dynamic LSTM and then classifies them with ResNet. Compared with the homologous sequence alignment method, ncDLRES reduces the data requirement and expands the scope of application. Compared with methods of the first category, the performance of ncDLRES is greatly improved.

2011 ◽  
Vol 09 (02) ◽  
pp. 317-337 ◽  
Author(s):  
RUJIRA ACHAWANANTAKUN ◽  
YANNI SUN ◽  
SEYEDEH SHOHREH TAKYAR

Many noncoding RNAs (ncRNAs) function through both their sequences and secondary structures. Thus, secondary structure derivation is an important issue in today's RNA research. The state-of-the-art structure annotation tools are based on comparative analysis, which derives consensus structure of homologous ncRNAs. Despite promising results from existing ncRNA aligning and consensus structure derivation tools, there is a need for more efficient and accurate ncRNA secondary structure modeling and alignment methods. In this work, we introduce a consensus structure derivation approach based on grammar string, a novel ncRNA secondary structure representation that encodes an ncRNA's sequence and secondary structure in the parameter space of a context-free grammar (CFG) and a full RNA grammar including pseudoknots. Being a string defined on a special alphabet constructed from a grammar, grammar string converts ncRNA alignment into sequence alignment. We derive consensus secondary structures from hundreds of ncRNA families from BraliBase 2.1 and 25 families containing pseudoknots using grammar string alignment. Our experiments have shown that grammar string–based structure derivation competes favorably in consensus structure quality with Murlet and RNASampler. Source code and experimental data are available at .


Cells ◽  
2022 ◽  
Vol 11 (2) ◽  
pp. 230
Author(s):  
Kevin-Phil Wüsthoff ◽  
Gerhard Steger

In 1985, Keese and Symons proposed a hypothesis on the sequence and secondary structure of viroids from the family : their secondary structure can be subdivided into five structural and functional domains and “viroids have evolved by rearrangement of domains between different viroids infecting the same cell and subsequent mutations within each domain”; this article is one of the most cited in the field of viroids. Employing the pairwise alignment method used by Keese and Symons, in addition to more recent methods, we tried to reproduce the original results and extend them to further members of which were unknown in 1985. Indeed, individual members of consist of a patchwork of sequence fragments from the family, but the lengths of the fragments do not point to consistent points of rearrangement, which conflicts with the original hypothesis of fixed domain borders.


2020 ◽  
Vol 36 (10) ◽  
pp. 3072-3076 ◽  
Author(s):  
Elena Rivas ◽  
Jody Clements ◽  
Sean R Eddy

Abstract Pairwise sequence covariations are a signal of conserved RNA secondary structure. We describe a method for distinguishing when lack of covariation signal can be taken as evidence against a conserved RNA structure, as opposed to when a sequence alignment merely has insufficient variation to detect covariations. We find that alignments for several long non-coding RNAs previously shown to lack covariation support do have adequate covariation detection power, providing additional evidence against their proposed conserved structures. Availability and implementation The R-scape web server is at eddylab.org/R-scape, with a link to download the source code. Supplementary information Supplementary data are available at Bioinformatics online.
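The distinction the abstract draws, between a real lack of covariation and an alignment with too little variation to show any, can be illustrated with a toy score. The sketch below computes plain mutual information between two alignment columns. This is an assumption chosen for illustration and is not the statistic used by R-scape; the function name `column_mutual_information` is hypothetical.

```python
from collections import Counter
from math import log2


def column_mutual_information(col_i, col_j):
    """Mutual information (in bits) between two alignment columns.

    col_i and col_j are equal-length strings; position k holds the residue
    that sequence k has in that column. Illustrative score only.
    """
    if len(col_i) != len(col_j):
        raise ValueError("columns must cover the same sequences")
    n = len(col_i)
    pair_counts = Counter(zip(col_i, col_j))
    i_counts = Counter(col_i)
    j_counts = Counter(col_j)
    mi = 0.0
    for (a, b), count in pair_counts.items():
        p_ab = count / n
        p_a = i_counts[a] / n
        p_b = j_counts[b] / n
        mi += p_ab * log2(p_ab / (p_a * p_b))
    return mi


# Compensatory changes (G-C, C-G, A-U, U-A) give a strong signal.
covarying = column_mutual_information("GGCCAAUU", "CCGGUUAA")
# Perfectly conserved columns carry no covariation signal at all,
# even if the pair G-C really is a conserved base pair.
invariant = column_mutual_information("GGGG", "CCCC")
```

The second call shows why a zero score alone is ambiguous: conserved columns score zero whether or not they pair, which is the situation in which an alignment lacks the power to detect covariation.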


2009 ◽  
Vol 3 ◽  
pp. BBI.S2578 ◽  
Author(s):  
Junilda Spirollari ◽  
Jason T.L. Wang ◽  
Kaizhong Zhang ◽  
Vivian Bellofatto ◽  
Yongkyu Park ◽  
...  

Thermodynamic processes with free energy parameters are often used in algorithms that solve the free energy minimization problem to predict secondary structures of single RNA sequences. While results from these algorithms are promising, an observation is that single sequence-based methods have moderate accuracy and more information is needed to improve on RNA secondary structure prediction, such as covariance scores obtained from multiple sequence alignments. We present in this paper a new approach to predicting the consensus secondary structure of a set of aligned RNA sequences via pseudo-energy minimization. Our tool, called RSpredict, takes into account sequence covariation and employs effective heuristics for accuracy improvement. RSpredict accepts, as input data, a multiple sequence alignment in FASTA or ClustalW format and outputs the consensus secondary structure of the input sequences in both the Vienna style Dot Bracket format and the Connectivity Table format. Our method was compared with some widely used tools including KNetFold, Pfold and RNAalifold. A comprehensive test on different datasets including Rfam sequence alignments and a multiple sequence alignment obtained from our study on the Drosophila X chromosome reveals that RSpredict is competitive with the existing tools on the tested datasets. RSpredict is freely available online as a web server and also as a jar file for download at http://datalab.njit.edu/biology/RSpredict .
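For readers unfamiliar with the output formats mentioned above, the following minimal sketch parses a Vienna-style dot-bracket string into base pairs and into a per-position partner list of the kind a connectivity table records. It is not part of RSpredict; the function names are hypothetical, and only the plain `(`, `)` and `.` symbols (no pseudoknot brackets) are assumed.

```python
def dot_bracket_to_pairs(structure):
    """Return sorted 1-based (i, j) base pairs from a dot-bracket string."""
    stack = []
    pairs = []
    for pos, symbol in enumerate(structure, start=1):
        if symbol == "(":
            stack.append(pos)
        elif symbol == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {pos}")
            pairs.append((stack.pop(), pos))
        elif symbol != ".":
            raise ValueError(f"unexpected symbol {symbol!r} at position {pos}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)


def partner_list(structure):
    """Return, for each position, the 1-based pairing partner (0 if unpaired)."""
    partners = [0] * len(structure)
    for i, j in dot_bracket_to_pairs(structure):
        partners[i - 1] = j
        partners[j - 1] = i
    return partners


hairpin_pairs = dot_bracket_to_pairs("((..))")
hairpin_partners = partner_list("((..))")
```

A stack is enough here because nested (pseudoknot-free) structures pair brackets exactly as balanced parentheses do.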


Author(s):  
Yang Lin ◽  
Xiaoyong Pan ◽  
Hong-Bin Shen

Abstract Motivation Long non-coding RNAs (lncRNAs) are generally expressed in a tissue-specific way, and the subcellular localizations of lncRNAs depend on the tissues or cell lines in which they are expressed. Previous computational methods for predicting subcellular localizations of lncRNAs do not take this characteristic into account; they train a unified machine learning model on lncRNAs pooled from all available cell lines. It is therefore important to develop a cell-line-specific computational method to predict lncRNA locations in different cell lines. Results In this study, we present an updated cell-line-specific predictor, lncLocator 2.0, which trains an end-to-end deep model per cell line for predicting lncRNA subcellular localization from sequences. We first construct benchmark datasets of lncRNA subcellular localizations for 15 cell lines. Then we learn word embeddings using natural language models, and these learned embeddings are fed into a convolutional neural network, a long short-term memory network and a multilayer perceptron to classify subcellular localizations. lncLocator 2.0 achieves varying effectiveness for different cell lines and demonstrates the necessity of training cell-line-specific models. Furthermore, we adopt Integrated Gradients to explain the proposed model in lncLocator 2.0 and find some potential patterns that determine the subcellular localizations of lncRNAs, suggesting that the subcellular localization of lncRNAs is linked to some specific nucleotides. Availability The lncLocator 2.0 is available at www.csbio.sjtu.edu.cn/bioinf/lncLocator2 and the source code can be found at https://github.com/Yang-J-LIN/lncLocator2. Supplementary information Supplementary data are available at Bioinformatics online.
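Learning word embeddings first requires turning a sequence into "words". The abstract does not say how lncLocator 2.0 does this; the sketch below assumes one common choice, overlapping k-mers, and maps them to integer ids that an embedding layer could consume. The names `kmer_tokens` and `build_vocabulary` and the default `k=3` are hypothetical.

```python
def kmer_tokens(sequence, k=3):
    """Split a sequence into overlapping k-mers (assumed tokenization)."""
    sequence = sequence.upper()
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def build_vocabulary(token_lists):
    """Assign each distinct token an integer id in order of first appearance."""
    vocabulary = {}
    for tokens in token_lists:
        for token in tokens:
            if token not in vocabulary:
                vocabulary[token] = len(vocabulary)
    return vocabulary


tokens = kmer_tokens("augga")
vocabulary = build_vocabulary([tokens])
token_ids = [vocabulary[token] for token in tokens]
```

Overlapping k-mers keep local context in each token, at the cost of a vocabulary that grows as 4^k for nucleotide sequences.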


2009 ◽  
Vol 07 (02) ◽  
pp. 373-388 ◽  
Author(s):  
HUEI-HUN TSENG ◽  
ZASHA WEINBERG ◽  
JEREMY GORE ◽  
RONALD R. BREAKER ◽  
WALTER L. RUZZO

Non-coding RNAs (ncRNAs) are transcripts that do not code for proteins. Recent findings have shown that RNA-mediated regulatory mechanisms influence a substantial portion of typical microbial genomes. We present an efficient method for finding potential ncRNAs in bacteria by clustering genomic sequences based on homology inferred from both primary sequence and secondary structure. We evaluate our approach using a set of predominantly Firmicutes sequences. Our results showed that, though primary sequence based–homology search was inaccurate for diverged ncRNA sequences, through our clustering method, we were able to infer motifs that recovered nearly all members of most known ncRNA families. Hence, our method shows promise for discovering new families of ncRNA.


2020 ◽  
Author(s):  
Kengo Sato ◽  
Manato Akiyama ◽  
Yasubumi Sakakibara

RNA secondary structure prediction is one of the key technologies for revealing the essential roles of functional non-coding RNAs. Although machine learning-based rich-parametrized models have achieved extremely high performance in terms of prediction accuracy, the risk of overfitting for such models has been reported. In this work, we propose a new algorithm for predicting RNA secondary structures that uses deep learning with thermodynamic integration, thereby enabling robust predictions. Similar to our previous work, the folding scores, which are computed by a deep neural network, are integrated with traditional thermodynamic parameters to enable robust predictions. We also propose thermodynamic regularization for training our model without overfitting it to the training data. Our algorithm (MXfold2) achieved the most robust and accurate predictions in computational experiments designed for newly discovered non-coding RNAs, with significant 2–10 % improvements over our previous algorithm (MXfold) and standard algorithms for predicting RNA secondary structures in terms of F-value.
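The F-value reported above is commonly computed over base pairs as the harmonic mean of positive predictive value and sensitivity. The sketch below assumes that definition and exact matching of pairs; the function name `f_value` is hypothetical and the example pairs are made up for illustration.

```python
def f_value(predicted_pairs, reference_pairs):
    """Harmonic mean of PPV and sensitivity over sets of (i, j) base pairs."""
    predicted = set(predicted_pairs)
    reference = set(reference_pairs)
    true_positives = len(predicted & reference)
    if true_positives == 0:
        return 0.0
    ppv = true_positives / len(predicted)          # fraction of predictions that are correct
    sensitivity = true_positives / len(reference)  # fraction of true pairs recovered
    return 2 * ppv * sensitivity / (ppv + sensitivity)


# Two of three predicted pairs are correct and both reference pairs are found:
# PPV = 2/3, sensitivity = 1, so F = 0.8.
score = f_value([(1, 6), (2, 5), (3, 4)], [(1, 6), (2, 5)])
```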


PLoS ONE ◽  
2021 ◽  
Vol 16 (2) ◽  
pp. e0245982
Author(s):  
Yawu Zhao ◽  
Yihui Liu

Protein secondary structure prediction is extremely important for determining the spatial structure and function of proteins. In this paper, we apply a model called OCLSTM, which combines an optimized convolutional neural network with a long short-term memory neural network, to protein secondary structure prediction. We use the optimized convolutional neural network to extract local features between amino acid residues. We then use a bidirectional long short-term memory neural network to extract the long-range interactions between residues within the protein sequence and predict the protein structure. Experiments are performed on the CASP10, CASP11, CASP12, CB513, and 25PDB datasets, on which the model achieves 84.68%, 82.36%, 82.91%, 84.21% and 85.08%, respectively. Experimental results show that the model can achieve better results.


Author(s):  
Roman Andriushchenko ◽  
Milan Češka ◽  
Sebastian Junges ◽  
Joost-Pieter Katoen

Abstract This paper presents a novel method for the automated synthesis of probabilistic programs. The starting point is a program sketch representing a finite family of finite-state Markov chains with related but distinct topologies, and a reachability specification. The method builds on a novel inductive oracle that greedily generates counter-examples (CEs) for violating programs and uses them to prune the family. These CEs leverage the semantics of the family in the form of bounds on its best- and worst-case behaviour provided by a deductive oracle using an MDP abstraction. The method further monitors the performance of the synthesis and adaptively switches between inductive and deductive reasoning. Our experiments demonstrate that the novel CE construction provides a significantly faster and more effective pruning strategy leading to an accelerated synthesis process on a wide range of benchmarks. For challenging problems, such as the synthesis of decentralized partially-observable controllers, we reduce the run-time from a day to minutes.

