Assessing Global-Local Secondary Structure Fingerprints to Classify RNA Sequences with Deep Learning

Author(s):  
Kevin Sutanto ◽  
Marcel Turcotte
2022 ◽  
Vol 12 (1) ◽  
Author(s):  
Jake M. Peterson ◽  
Collin A. O’Leary ◽  
Walter N. Moss

Abstract Influenza virus is a persistent threat to human health; indeed, the deadliest modern pandemic was in 1918 when an H1N1 virus killed an estimated 50 million people globally. The intent of this work is to better understand influenza from an RNA-centric perspective to provide local, structural motifs with likely significance to the influenza infectious cycle for therapeutic targeting. To accomplish this, we analyzed over four hundred thousand RNA sequences spanning three major clades: influenza A, B and C. We scanned influenza segments for local secondary structure, identified/modeled motifs of likely functionality, and coupled the results to an analysis of evolutionary conservation. We discovered 185 significant regions of predicted ordered stability, yet evidence of sequence covariation was limited to 7 motifs, where 3 (found in influenza C) had higher than expected amounts of sequence covariation.


2021 ◽  
Vol 22 (S3) ◽  
Author(s):  
Jun Meng ◽  
Qiang Kang ◽  
Zheng Chang ◽  
Yushi Luan

Abstract

Background: Long noncoding RNAs (lncRNAs) play an important role in regulating biological activities, and their prediction is significant for exploring biological processes. Long short-term memory (LSTM) and convolutional neural network (CNN) models can automatically extract and learn abstract information from encoded RNA sequences, avoiding complex feature engineering. An ensemble model learns the information from multiple perspectives and shows better performance than a single model. It is feasible and interesting to treat the RNA sequence as a sentence and as an image to train the LSTM and CNN respectively, and then to hybridize the trained models to predict lncRNAs. To date, there are various predictors for lncRNAs, but few of them have been proposed for plants. A reliable and powerful predictor for plant lncRNAs is necessary.

Results: To boost the performance of predicting lncRNAs, this paper proposes a hybrid deep learning model based on two encoding styles (PlncRNA-HDeep), which does not require prior knowledge and uses only RNA sequences to train the models for predicting plant lncRNAs. It not only learns the diversified information from RNA sequences encoded by p-nucleotide and one-hot encodings, but also takes advantage of lncRNA-LSTM, proposed in our previous study, and CNN. The parameters are adjusted and three hybrid strategies are tested to maximize its performance. Experimental results show that PlncRNA-HDeep is more effective than lncRNA-LSTM and CNN and obtains 97.9% sensitivity, 95.1% precision, 96.5% accuracy and 96.5% F1 score on the Zea mays dataset, which are better than those of several shallow machine learning methods (support vector machine, random forest, k-nearest neighbor, decision tree, naive Bayes and logistic regression) and some existing tools (CNCI, PLEK, CPC2, LncADeep and lncRNAnet).

Conclusions: PlncRNA-HDeep is feasible and obtains credible predictive results. It may also provide valuable references for other related research.
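As a rough illustration of the two encoding styles named in this abstract, the sketch below one-hot encodes a sequence and also tokenizes it into overlapping p-mers. It is a minimal example and not the PlncRNA-HDeep implementation: the function names, the four-letter alphabet, the handling of unknown bases, and the reading of "p-nucleotide encoding" as overlapping p-mers with integer ids are all assumptions made here.

```python
from itertools import product

ALPHABET = "ACGU"  # assumed alphabet; DNA-style "T" is mapped to "U" below


def one_hot(seq):
    """Encode each nucleotide as a 4-element 0/1 vector.

    Bases outside ALPHABET (for example "N") become an all-zero row.
    """
    index = {base: i for i, base in enumerate(ALPHABET)}
    rows = []
    for base in seq.upper().replace("T", "U"):
        row = [0] * len(ALPHABET)
        if base in index:
            row[index[base]] = 1
        rows.append(row)
    return rows


def p_nucleotide_tokens(seq, p=3):
    """Split a sequence into overlapping p-mers and map each to an integer id.

    Ids start at 1 in lexicographic order of the p-mers; 0 is reserved for
    p-mers containing characters outside ALPHABET (a hypothetical convention).
    """
    vocab = {"".join(k): i + 1 for i, k in enumerate(product(ALPHABET, repeat=p))}
    seq = seq.upper().replace("T", "U")
    return [vocab.get(seq[i:i + p], 0) for i in range(len(seq) - p + 1)]
```

In a setup like the one described, the one-hot matrix would be the image-like input and the token list the sentence-like input; which encoding feeds which network in the actual tool is not stated here.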


1989 ◽  
Vol 9 (6) ◽  
pp. 2536-2543
Author(s):  
J Y Lee ◽  
D R Engelke

Saccharomyces cerevisiae cellular RNase P is composed of both protein and RNA components that are essential for activity. The isolated holoenzyme contains a highly structured RNA of 369 nucleotides that has extensive sequence similarities to the 286-nucleotide RNA associated with Schizosaccharomyces pombe RNase P but bears little resemblance to the analogous RNA sequences in procaryotes or S. cerevisiae mitochondria. Even so, the predicted secondary structure of S. cerevisiae RNA is strikingly similar to the bacterial phylogenetic consensus rather than to previously predicted structures of other eucaryotic RNase P RNAs.


2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Rahil Taujale ◽  
Zhongliang Zhou ◽  
Wayland Yeung ◽  
Kelley W. Moremen ◽  
Sheng Li ◽  
...  

Abstract Glycosyltransferases (GTs) play fundamental roles in nearly all cellular processes through the biosynthesis of complex carbohydrates and glycosylation of diverse protein and small molecule substrates. The extensive structural and functional diversification of GTs presents a major challenge in mapping the relationships connecting sequence, structure, fold and function using traditional bioinformatics approaches. Here, we present a convolutional neural network with attention (CNN-attention) based deep learning model that leverages simple secondary structure representations generated from primary sequences to provide GT fold prediction with high accuracy. The model learns distinguishing secondary structure features free of primary sequence alignment constraints and is highly interpretable. It delineates sequence and structural features characteristic of individual fold types, while classifying them into distinct clusters that group evolutionarily divergent families based on shared secondary structural features. We further extend our model to classify GT families of unknown folds and variants of known folds. By identifying families that are likely to adopt novel folds such as GT91, GT96 and GT97, our studies expand the GT fold landscape and prioritize targets for future structural studies.


2020 ◽  
Author(s):  
Kengo Sato ◽  
Manato Akiyama ◽  
Yasubumi Sakakibara

RNA secondary structure prediction is one of the key technologies for revealing the essential roles of functional non-coding RNAs. Although machine learning-based rich-parametrized models have achieved extremely high performance in terms of prediction accuracy, the risk of overfitting for such models has been reported. In this work, we propose a new algorithm for predicting RNA secondary structures that uses deep learning with thermodynamic integration, thereby enabling robust predictions. Similar to our previous work, the folding scores, which are computed by a deep neural network, are integrated with traditional thermodynamic parameters to enable robust predictions. We also propose thermodynamic regularization for training our model without overfitting it to the training data. Our algorithm (MXfold2) achieved the most robust and accurate predictions in computational experiments designed for newly discovered non-coding RNAs, with significant 2–10% improvements over our previous algorithm (MXfold) and standard algorithms for predicting RNA secondary structures in terms of F-value.
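The F-value mentioned above is commonly computed as the harmonic mean of positive predictive value and sensitivity over predicted base pairs. The sketch below shows that calculation for two structures in dot-bracket notation. It assumes exact base-pair matching and no pseudoknots; the function names are hypothetical, and the evaluation protocol of the MXfold2 authors may differ.

```python
def base_pairs(dot_bracket):
    """Return the set of (i, j) index pairs encoded by a dot-bracket string."""
    stack, pairs = [], set()
    for j, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(j)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced structure")
            pairs.add((stack.pop(), j))
    if stack:
        raise ValueError("unbalanced structure")
    return pairs


def f_value(predicted, reference):
    """Harmonic mean of PPV and sensitivity over exactly matching base pairs."""
    pred, ref = base_pairs(predicted), base_pairs(reference)
    true_pos = len(pred & ref)
    if true_pos == 0:
        return 0.0  # also covers empty predictions or empty references
    ppv = true_pos / len(pred)         # fraction of predicted pairs that are correct
    sensitivity = true_pos / len(ref)  # fraction of reference pairs recovered
    return 2 * ppv * sensitivity / (ppv + sensitivity)
```

For example, `f_value("(....)", "((..))")` compares one predicted pair against two reference pairs, giving a PPV of 1 and a sensitivity of 0.5.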


2017 ◽  
Vol 13 (4) ◽  
Author(s):  
Nancy Singh ◽  
Sunil Datt Sharma ◽  
Ragothaman M. Yennamalli

Abstract In this article, we describe the applicability of a signal processing method, specifically the modified S-transform (MST) method, to RNA sequences to identify periodicities between 2 and 11. MicroRNAs (miRNA) are associated with gene regulation and gene silencing and thus have wide applications in biological sciences. Also, the functionality of miRNA is highly associated with its secondary structures (stem, bulge and loop). Signal processing methods have been previously applied to genomic data to reveal the periodicities that determine a wide variety of biological functions, ranging from exon detection to microsatellite identification in DNA sequences. However, there has been less focus on RNA-based signal processing. Here, we show that the signal processing method can be successfully applied to miRNA sequences. We observed that these periodicities are highly correlated with the secondary structure of miRNA and such methods could possibly be used as indicators of secondary and tertiary structure formation.


2019 ◽  
Vol 16 (9) ◽  
pp. 911-917 ◽  
Author(s):  
Sai Raghavendra Maddhuri Venkata Subramaniya ◽  
Genki Terashi ◽  
Daisuke Kihara
