ncDLRES: a novel method for non-coding RNAs family prediction based on dynamic LSTM and ResNet

Abstract. Background: Studies have shown that non-coding RNAs (ncRNAs) of the same family have similar functions, so predicting the family of an ncRNA helps in studying its function. Existing computational methods fall into two main categories: the first predicts the ncRNA family by learning features of the sequence or secondary structure, and the second predicts it by aligning homologous sequences. In the first category, some methods learn features of the predicted secondary structure, and inaccuracies in that predicted structure may lower their accuracy. In contrast, ncRFP learns the features of ncRNA sequences directly. Although ncRFP simplifies the prediction process and improves performance, it leaves room for improvement because the features in its input data are incomplete. In the second category, the homologous sequence alignment method currently achieves the highest performance. However, its use is limited because it requires consensus secondary structure annotation of the ncRNA sequences and cannot model pseudoknots. Results: In this paper, a novel method, “ncDLRES”, which learns sequence features, is proposed to predict the family of ncRNAs based on Dynamic LSTM (Long Short-Term Memory) and ResNet (Residual Neural Network). Conclusions: ncDLRES extracts the features of ncRNA sequences with Dynamic LSTM and then classifies them with ResNet. Compared with the homologous sequence alignment method, ncDLRES reduces the data requirement and expands the scope of application. Compared with methods of the first category, ncDLRES greatly improves performance.

ncRNA CONSENSUS SECONDARY STRUCTURE DERIVATION USING GRAMMAR STRINGS

Journal of Bioinformatics and Computational Biology ◽

10.1142/s0219720011005501 ◽

2011 ◽

Vol 09 (02) ◽

pp. 317-337 ◽

Cited By ~ 4

Author(s):

RUJIRA ACHAWANANTAKUN ◽

YANNI SUN ◽

SEYEDEH SHOHREH TAKYAR

Keyword(s):

Secondary Structure ◽

State Of The Art ◽

Secondary Structures ◽

Consensus Structure ◽

Structure Quality ◽

Consensus Secondary Structure ◽

Structure Representation ◽

Structure Annotation ◽

Context Free ◽

String Alignment

Many noncoding RNAs (ncRNAs) function through both their sequences and secondary structures. Thus, secondary structure derivation is an important issue in today's RNA research. The state-of-the-art structure annotation tools are based on comparative analysis, which derives consensus structure of homologous ncRNAs. Despite promising results from existing ncRNA aligning and consensus structure derivation tools, there is a need for more efficient and accurate ncRNA secondary structure modeling and alignment methods. In this work, we introduce a consensus structure derivation approach based on grammar string, a novel ncRNA secondary structure representation that encodes an ncRNA's sequence and secondary structure in the parameter space of a context-free grammar (CFG) and a full RNA grammar including pseudoknots. Being a string defined on a special alphabet constructed from a grammar, grammar string converts ncRNA alignment into sequence alignment. We derive consensus secondary structures from hundreds of ncRNA families from BraliBase 2.1 and 25 families containing pseudoknots using grammar string alignment. Our experiments have shown that grammar string–based structure derivation competes favorably in consensus structure quality with Murlet and RNASampler. Source code and experimental data are available at .

Conserved Motifs and Domains in Members of Pospiviroidae

Cells ◽

10.3390/cells11020230 ◽

2022 ◽

Vol 11 (2) ◽

pp. 230

Author(s):

Kevin-Phil Wüsthoff ◽

Gerhard Steger

Keyword(s):

Secondary Structure ◽

Pairwise Alignment ◽

Functional Domains ◽

Alignment Method ◽

Conserved Motifs ◽

Fixed Domain ◽

Original Hypothesis ◽

The Family

In 1985, Keese and Symons proposed a hypothesis on the sequence and secondary structure of viroids from the family Pospiviroidae: their secondary structure can be subdivided into five structural and functional domains and “viroids have evolved by rearrangement of domains between different viroids infecting the same cell and subsequent mutations within each domain”; this article is one of the most cited in the field of viroids. Employing the pairwise alignment method used by Keese and Symons, in addition to more recent methods, we tried to reproduce the original results and extend them to further members of Pospiviroidae that were unknown in 1985. Indeed, individual members of Pospiviroidae consist of a patchwork of sequence fragments from the family, but the lengths of the fragments do not point to consistent points of rearrangement, which conflicts with the original hypothesis of fixed domain borders.

Estimating the power of sequence covariation for detecting conserved RNA structure

Bioinformatics ◽

10.1093/bioinformatics/btaa080 ◽

2020 ◽

Vol 36 (10) ◽

pp. 3072-3076 ◽

Cited By ~ 11

Author(s):

Elena Rivas ◽

Jody Clements ◽

Sean R Eddy

Keyword(s):

Secondary Structure ◽

Sequence Alignment ◽

Rna Structure ◽

Rna Secondary Structure ◽

Source Code ◽

Web Server ◽

Supplementary Information ◽

Supplementary Data ◽

Detection Power ◽

Non Coding Rnas

Abstract. Pairwise sequence covariations are a signal of conserved RNA secondary structure. We describe a method for distinguishing when lack of covariation signal can be taken as evidence against a conserved RNA structure, as opposed to when a sequence alignment merely has insufficient variation to detect covariations. We find that alignments for several long non-coding RNAs previously shown to lack covariation support do have adequate covariation detection power, providing additional evidence against their proposed conserved structures. Availability and implementation: The R-scape web server is at eddylab.org/R-scape, with a link to download the source code. Supplementary information: Supplementary data are available at Bioinformatics online.
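
To make the idea of a pairwise covariation signal concrete, here is a minimal sketch that scores two alignment columns with mutual information. This is an illustrative stand-in chosen for simplicity, not the statistic or the power calculation used by R-scape; the function names and the toy alignment are assumptions made for this example.

```python
import math
from collections import Counter


def column(alignment, i):
    """Return column i of an alignment given as equal-length strings."""
    return [seq[i] for seq in alignment]


def mutual_information(alignment, i, j):
    """Mutual information (in bits) between alignment columns i and j.

    Illustrative covariation score only; not R-scape's statistic.
    """
    col_i, col_j = column(alignment, i), column(alignment, j)
    n = len(alignment)
    freq_i, freq_j = Counter(col_i), Counter(col_j)
    freq_ij = Counter(zip(col_i, col_j))
    mi = 0.0
    for (a, b), count in freq_ij.items():
        p_ab = count / n
        mi += p_ab * math.log2(p_ab / ((freq_i[a] / n) * (freq_j[b] / n)))
    return mi


# Hypothetical toy alignment: columns 0 and 3 change together as
# complementary pairs, while columns 1 and 2 never vary.
toy = ["GAAC", "CAAG", "AAAU", "UAAA"]
covarying = mutual_information(toy, 0, 3)
invariant = mutual_information(toy, 1, 2)
```

In this toy case the invariant columns score zero whether or not they are paired, which illustrates why an alignment with too little variation cannot provide evidence for or against a conserved structure.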

Predicting Consensus Structures for RNA Alignments via Pseudo-Energy Minimization

Bioinformatics and Biology Insights ◽

10.4137/bbi.s2578 ◽

2009 ◽

Vol 3 ◽

pp. BBI.S2578 ◽

Cited By ~ 8

Author(s):

Junilda Spirollari ◽

Jason T.L. Wang ◽

Kaizhong Zhang ◽

Vivian Bellofatto ◽

Yongkyu Park ◽

...

Keyword(s):

Free Energy ◽

Secondary Structure ◽

Sequence Alignment ◽

Multiple Sequence Alignment ◽

Energy Minimization ◽

Secondary Structure Prediction ◽

Sequence Alignments ◽

Rna Sequences ◽

Multiple Sequence ◽

Consensus Secondary Structure

Thermodynamic processes with free energy parameters are often used in algorithms that solve the free energy minimization problem to predict secondary structures of single RNA sequences. While results from these algorithms are promising, an observation is that single sequence-based methods have moderate accuracy and more information is needed to improve on RNA secondary structure prediction, such as covariance scores obtained from multiple sequence alignments. We present in this paper a new approach to predicting the consensus secondary structure of a set of aligned RNA sequences via pseudo-energy minimization. Our tool, called RSpredict, takes into account sequence covariation and employs effective heuristics for accuracy improvement. RSpredict accepts, as input data, a multiple sequence alignment in FASTA or ClustalW format and outputs the consensus secondary structure of the input sequences in both the Vienna style Dot Bracket format and the Connectivity Table format. Our method was compared with some widely used tools including KNetFold, Pfold and RNAalifold. A comprehensive test on different datasets including Rfam sequence alignments and a multiple sequence alignment obtained from our study on the Drosophila X chromosome reveals that RSpredict is competitive with the existing tools on the tested datasets. RSpredict is freely available online as a web server and also as a jar file for download at http://datalab.njit.edu/biology/RSpredict .
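
As a rough illustration of how the two output formats mentioned above relate, the sketch below reads a pseudoknot-free dot-bracket string and recovers the pairing partner of every position, which is the core information a connectivity table records. It is not taken from RSpredict; the function name, the toy hairpin, and the simplified three-field rows are assumptions for this example.

```python
def dot_bracket_to_partners(structure):
    """Map a pseudoknot-free dot-bracket string to 1-based pairing partners.

    Entry k of the result is the partner of position k + 1,
    or 0 if that position is unpaired.
    """
    partners = [0] * len(structure)
    stack = []  # positions of '(' still waiting for a matching ')'
    for pos, symbol in enumerate(structure, start=1):
        if symbol == "(":
            stack.append(pos)
        elif symbol == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {pos}")
            opener = stack.pop()
            partners[opener - 1] = pos
            partners[pos - 1] = opener
        elif symbol != ".":
            raise ValueError(f"unexpected symbol {symbol!r} at position {pos}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return partners


# Hypothetical toy hairpin, not data from the paper.
sequence = "GGGAAACCC"
structure = "(((...)))"
partners = dot_bracket_to_partners(structure)

# Simplified rows (index, base, partner); a full connectivity table
# carries additional bookkeeping columns that are omitted here.
rows = [(i, base, partner)
        for i, (base, partner) in enumerate(zip(sequence, partners), start=1)]
```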

A data-centric pipeline using convolutional neural network to select better multiple sequence alignment method

Proceedings of the 11th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics ◽

10.1145/3388440.3414909 ◽

2020 ◽

Author(s):

Mengmeng Kuang ◽

Hing-fung Ting

Keyword(s):

Neural Network ◽

Convolutional Neural Network ◽

Sequence Alignment ◽

Multiple Sequence Alignment ◽

Alignment Method ◽

Multiple Sequence

lncLocator 2.0: a cell-line-specific subcellular localization predictor for long non-coding RNAs with interpretable deep learning

Bioinformatics ◽

10.1093/bioinformatics/btab127 ◽

2021 ◽

Author(s):

Yang Lin ◽

Xiaoyong Pan ◽

Hong-Bin Shen

Keyword(s):

Subcellular Localization ◽

Cell Line ◽

Cell Lines ◽

Short Term Memory ◽

Computational Method ◽

Language Models ◽

Supplementary Information ◽

Deep Model ◽

A Cell ◽

Non Coding Rnas

Abstract. Motivation: Long non-coding RNAs (lncRNAs) are generally expressed in a tissue-specific way, and the subcellular localizations of lncRNAs depend on the tissues or cell lines in which they are expressed. Previous computational methods for predicting subcellular localizations of lncRNAs do not take this characteristic into account; they train a unified machine learning model on lncRNAs pooled from all available cell lines. It is therefore important to develop a cell-line-specific computational method to predict lncRNA locations in different cell lines. Results: In this study, we present an updated cell-line-specific predictor, lncLocator 2.0, which trains an end-to-end deep model per cell line for predicting lncRNA subcellular localization from sequences. We first construct benchmark datasets of lncRNA subcellular localizations for 15 cell lines. Then we learn word embeddings using natural language models, and these learned embeddings are fed into a convolutional neural network, long short-term memory and a multilayer perceptron to classify subcellular localizations. lncLocator 2.0 achieves varying effectiveness for different cell lines and demonstrates the necessity of training cell-line-specific models. Furthermore, we adopt Integrated Gradients to explain the proposed model in lncLocator 2.0 and find some potential patterns that determine the subcellular localizations of lncRNAs, suggesting that the subcellular localization of lncRNAs is linked to some specific nucleotides. Availability: lncLocator 2.0 is available at www.csbio.sjtu.edu.cn/bioinf/lncLocator2 and the source code can be found at https://github.com/Yang-J-LIN/lncLocator2. Supplementary information: Supplementary data are available at Bioinformatics online.

FINDING NON-CODING RNAs THROUGH GENOME-SCALE CLUSTERING

Journal of Bioinformatics and Computational Biology ◽

10.1142/s0219720009004126 ◽

2009 ◽

Vol 07 (02) ◽

pp. 373-388 ◽

Cited By ~ 21

Author(s):

HUEI-HUN TSENG ◽

ZASHA WEINBERG ◽

JEREMY GORE ◽

RONALD R. BREAKER ◽

WALTER L. RUZZO

Keyword(s):

Secondary Structure ◽

Efficient Method ◽

Regulatory Mechanisms ◽

Homology Search ◽

Substantial Portion ◽

Primary Sequence ◽

Clustering Method ◽

Microbial Genomes ◽

Non Coding Rnas ◽

Genome Scale

Non-coding RNAs (ncRNAs) are transcripts that do not code for proteins. Recent findings have shown that RNA-mediated regulatory mechanisms influence a substantial portion of typical microbial genomes. We present an efficient method for finding potential ncRNAs in bacteria by clustering genomic sequences based on homology inferred from both primary sequence and secondary structure. We evaluate our approach using a set of predominantly Firmicutes sequences. Our results show that, although primary sequence–based homology search was inaccurate for diverged ncRNA sequences, our clustering method allowed us to infer motifs that recovered nearly all members of most known ncRNA families. Hence, our method shows promise for discovering new families of ncRNA.

RNA secondary structure prediction using deep learning with thermodynamic integration

10.1101/2020.08.10.244442 ◽

2020 ◽

Author(s):

Kengo Sato ◽

Manato Akiyama ◽

Yasubumi Sakakibara

Keyword(s):

Deep Learning ◽

Secondary Structure ◽

Structure Prediction ◽

Rna Secondary Structure ◽

Secondary Structure Prediction ◽

Secondary Structures ◽

Thermodynamic Integration ◽

Rna Secondary Structure Prediction ◽

Rna Secondary Structures ◽

Non Coding Rnas

RNA secondary structure prediction is one of the key technologies for revealing the essential roles of functional non-coding RNAs. Although richly parameterized machine learning models have achieved extremely high prediction accuracy, a risk of overfitting has been reported for such models. In this work, we propose a new algorithm for predicting RNA secondary structures that uses deep learning with thermodynamic integration, thereby enabling robust predictions. Similar to our previous work, the folding scores, which are computed by a deep neural network, are integrated with traditional thermodynamic parameters. We also propose thermodynamic regularization for training our model without overfitting it to the training data. Our algorithm (MXfold2) achieved the most robust and accurate predictions in computational experiments designed for newly discovered non-coding RNAs, with significant 2–10% improvements in F-value over our previous algorithm (MXfold) and standard algorithms for predicting RNA secondary structures.
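
For readers unfamiliar with the F-value mentioned above, the sketch below computes it for a predicted set of base pairs against a reference set, taking it to be the harmonic mean of precision and recall over base pairs. That definition is an assumption about the evaluation, and the function name and toy pair sets are hypothetical, not taken from MXfold2.

```python
def base_pair_f_value(predicted, reference):
    """F-value of predicted base pairs against a reference structure.

    Both arguments are iterables of (i, j) position pairs. The score is
    the harmonic mean of precision and recall over exact pair matches.
    """
    predicted, reference = set(predicted), set(reference)
    true_positives = len(predicted & reference)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(reference)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy structures: two of three predicted pairs are correct.
reference_pairs = {(1, 9), (2, 8), (3, 7)}
predicted_pairs = {(1, 9), (2, 8), (4, 6)}
score = base_pair_f_value(predicted_pairs, reference_pairs)
```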

OCLSTM: Optimized convolutional and long short-term memory neural network model for protein secondary structure prediction

PLoS ONE ◽

10.1371/journal.pone.0245982 ◽

2021 ◽

Vol 16 (2) ◽

pp. e0245982

Author(s):

Yawu Zhao ◽

Yihui Liu

Keyword(s):

Neural Network ◽

Secondary Structure ◽

Structure Prediction ◽

Short Term Memory ◽

Secondary Structure Prediction ◽

Protein Secondary Structure ◽

Short Term ◽

Protein Secondary Structure Prediction ◽

Term Memory ◽

Long Short Term Memory

Protein secondary structure prediction is extremely important for determining the spatial structure and function of proteins. In this paper, we apply a model that combines an optimized convolutional neural network with a long short-term memory neural network, called OCLSTM, to protein secondary structure prediction. We use the optimized convolutional neural network to extract local features between amino acid residues. We then use a bidirectional long short-term memory neural network to extract the remote interactions between the internal residues of the protein sequence to predict the protein structure. Experiments on the CASP10, CASP11, CASP12, CB513, and 25PDB datasets achieve 84.68%, 82.36%, 82.91%, 84.21% and 85.08%, respectively. Experimental results show that the model can achieve better results.

Inductive Synthesis for Probabilistic Programs Reaches New Horizons

Tools and Algorithms for the Construction and Analysis of Systems - Lecture Notes in Computer Science ◽

10.1007/978-3-030-72016-2_11 ◽

2021 ◽

pp. 191-209

Author(s):

Roman Andriushchenko ◽

Milan Češka ◽

Sebastian Junges ◽

Joost-Pieter Katoen

Keyword(s):

Deductive Reasoning ◽

Synthesis Process ◽

Worst Case ◽

New Horizons ◽

The Family ◽

Pruning Strategy ◽

Finite State ◽

Novel Method ◽

Partially Observable ◽

Probabilistic Programs

Abstract. This paper presents a novel method for the automated synthesis of probabilistic programs. The starting point is a program sketch representing a finite family of finite-state Markov chains with related but distinct topologies, and a reachability specification. The method builds on a novel inductive oracle that greedily generates counter-examples (CEs) for violating programs and uses them to prune the family. These CEs leverage the semantics of the family in the form of bounds on its best- and worst-case behaviour provided by a deductive oracle using an MDP abstraction. The method further monitors the performance of the synthesis and adaptively switches between inductive and deductive reasoning. Our experiments demonstrate that the novel CE construction provides a significantly faster and more effective pruning strategy leading to an accelerated synthesis process on a wide range of benchmarks. For challenging problems, such as the synthesis of decentralized partially-observable controllers, we reduce the run-time from a day to minutes.
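
To illustrate the problem setting only, the sketch below checks a reachability specification on every member of a tiny family of Markov chains, one member at a time. This naive enumeration is the kind of baseline that counter-example-based pruning aims to avoid; it is not the paper's method. The sketch, its single "hole", the option names and the 0.7 threshold are all hypothetical.

```python
def reach_probability(chain, start, target, sweeps=1000):
    """Approximate P(eventually reach target) from start by value iteration.

    chain maps each state to a dict {successor: probability}.
    """
    value = {state: (1.0 if state == target else 0.0) for state in chain}
    for _ in range(sweeps):
        for state in chain:
            if state != target:
                value[state] = sum(p * value[succ]
                                   for succ, p in chain[state].items())
    return value[start]


def build_chain(fail_behaviour):
    """Instantiate the sketch: the 'fail' state is the single open hole."""
    return {
        "init": {"try": 1.0},
        "try": {"goal": 0.5, "fail": 0.5},
        "fail": fail_behaviour,
        "goal": {"goal": 1.0},
        "stop": {"stop": 1.0},
    }


# Hypothetical options for the hole; each yields a different topology.
options = {
    "give_up": {"stop": 1.0},
    "retry_once_more": {"goal": 0.5, "stop": 0.5},
    "retry_forever": {"try": 1.0},
}

# Specification (assumed): reach "goal" from "init" with probability >= 0.7.
threshold = 0.7
probabilities = {name: reach_probability(build_chain(opt), "init", "goal")
                 for name, opt in options.items()}
satisfying = [name for name, prob in probabilities.items() if prob >= threshold]
```

Enumeration like this scales with the size of the family, which is why pruning many members at once from a single counter-example matters.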
