reactIDR: Evaluation of the statistical reproducibility of high-throughput structural analyses for a robust RNA reactivity classification

2018 ◽  
Author(s):  
Risa Kawaguchi ◽  
Hisanori Kiryu ◽  
Junichi Iwakiri ◽  
Jun Sese

Abstract Motivation Recently, next-generation sequencing techniques have been applied to the detection of RNA secondary structures, an approach called high-throughput RNA structural (HTS) analysis, and dozens of different protocols have been used to detect comprehensive RNA structures at single-nucleotide resolution. However, existing computational analyses depend heavily on the methodology used to generate the experimental data, which makes it difficult to compare or combine the results of different HTS methods in a statistically sound way. Results Here, we introduce a statistical framework, reactIDR, which is applicable to experimental data obtained using multiple HTS methodologies and classifies nucleotides into three structural categories: stem, loop, and unmapped. reactIDR uses the irreproducible discovery rate (IDR) with a hidden Markov model (HMM) to discriminate accurately between true and spurious signals in replicated HTS experiments. In reactIDR, the IDR and HMM parameters are efficiently optimized using an expectation-maximization algorithm. Furthermore, if known reference structures are given, supervised learning can be applied in a semi-supervised manner. Our analyses of real HTS data showed that reactIDR achieved the highest accuracy in classifying the stem/loop structures of rRNA using both individual and integrated HTS datasets, as well as the best correspondence with the three-dimensional structure. Because reactIDR is the first method to compare HTS datasets obtained from multiple sources in a single unified model, it has great potential to increase the accuracy of RNA secondary structure prediction at the transcriptome-wide level as further experiments are performed. Availability reactIDR is implemented in Python. Source code is publicly available at https://github.com/carushi/reactIDR. Supplementary information Supplementary data are available online.

1981 ◽  
Vol 195 (1) ◽  
pp. 31-40 ◽  
Author(s):  
F E Cohen ◽  
J Novotný ◽  
M J E Sternberg ◽  
D G Campbell ◽  
A F Williams

The Thy-1 membrane glycoprotein from rat brain is shown to have structural and sequence homologies with immunoglobulin (Ig) domains on the basis of the following evidence. 1. The two disulphide bonds of Thy-1 are both consistent with the Ig-fold. 2. The molecule contains extensive beta-structure as shown by the c.d. spectrum. 3. Secondary structure prediction locates beta-strands along the sequence in a manner consistent with the Ig-fold. 4. On the basis of rules derived from known beta-sheet structures, a three-dimensional structure with the Ig-fold is predicted as favourable for Thy-1. 5. Sequences in the proposed beta-strands of Thy-1 and known beta-strands of Ig domains show significant sequence homology. This homology is statistically more significant than for the comparison of proposed beta-strand sequences of beta 2-microglobulin with Ig domains. An hypothesis is presented for the possible functional significance of an evolutionary relationship between Thy-1 and Ig. It is suggested that both Thy-1 and Ig evolved from primitive molecules, with an Ig fold, which mediated cell--cell interactions. The present-day role of Thy-1 may be similar to that of the primitive domain.


2019 ◽  
Author(s):  
Laura Avino Esteban ◽  
Lyubov R Lonishin ◽  
Daniil Bobrovskiy ◽  
Gregory Leleytner ◽  
Natalya S Bogatyreva ◽  
...  

Abstract Motivation Epistasis, the context-dependence of the contribution of an amino acid substitution to fitness, is common in evolution. To detect epistasis, fitness must be measured for at least four genotypes: the reference genotype, two different single mutants and a double mutant with both of the single mutations. For higher-order epistasis of the order n, fitness has to be measured for all 2^n genotypes of an n-dimensional hypercube in genotype space forming a “combinatorially complete dataset”. So far, only a handful of such datasets have been produced by manual curation. Concurrently, random mutagenesis experiments have produced measurements of fitness and other phenotypes in a high-throughput manner, potentially containing a number of combinatorially complete datasets. Results We present an effective recursive algorithm for finding all hypercube structures in random mutagenesis experimental data. To test the algorithm, we applied it to the data from a recent HIS3 protein dataset and found all 199,847,053 unique combinatorially complete genotype combinations of dimensionality ranging from two to twelve. The algorithm may be useful for researchers looking for higher-order epistasis in their high-throughput experimental data. Availability https://github.com/ivankovlab/HypercubeME.git Supplementary information Supplementary data are available at Bioinformatics online.
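
The definitions in this abstract can be made concrete with a small sketch. It is not the recursive HypercubeME algorithm; it only enumerates the 2^n genotypes spanned by n substitutions on a reference sequence, checks whether a fitness table covers all of them, and computes pairwise epistasis as the deviation of the double mutant from additivity. The function names, the (position, residue) encoding of a substitution, and the assumption that substitutions hit distinct positions are all choices made for this illustration.

```python
from itertools import combinations


def hypercube_genotypes(reference, substitutions):
    """Return all 2**n genotypes obtained by applying every subset of the
    n substitutions, each given as (position, new_residue), to the reference.
    Assumes the substitutions affect distinct positions."""
    genotypes = []
    for k in range(len(substitutions) + 1):
        for subset in combinations(substitutions, k):
            seq = list(reference)
            for pos, residue in subset:
                seq[pos] = residue
            genotypes.append("".join(seq))
    return genotypes


def is_combinatorially_complete(fitness, reference, substitutions):
    """True if a fitness value exists for every vertex of the hypercube."""
    return all(g in fitness for g in hypercube_genotypes(reference, substitutions))


def pairwise_epistasis(fitness, reference, sub_a, sub_b):
    """Deviation of the double mutant from the additive expectation,
    using the four genotypes: reference, two singles, and the double."""
    ref, a, b, ab = (
        fitness[g] for g in hypercube_genotypes(reference, [sub_a, sub_b])
    )
    return ab - a - b + ref
```

With a toy table such as `{"AAA": 1.0, "CAA": 1.5, "AGA": 0.75, "CGA": 2.0}`, calling `pairwise_epistasis(table, "AAA", (0, "C"), (1, "G"))` compares the double mutant against the sum of the two single-mutant effects. A non-zero result indicates epistasis under this additive model.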


2020 ◽  
Vol 36 (20) ◽  
pp. 5021-5026 ◽  
Author(s):  
Gang Xu ◽  
Qinghua Wang ◽  
Jianpeng Ma

Abstract Motivation Predictions of protein backbone torsion angles (ϕ and ψ) and secondary structure from sequence are crucial subproblems in protein structure prediction. With the development of deep learning approaches, their accuracies have been significantly improved. To capture the long-range interactions, most studies integrate bidirectional recurrent neural networks into their models. In this study, we introduce and modify a recently proposed architecture named Transformer to capture the interactions between the two residues theoretically with arbitrary distance. Moreover, we take advantage of multitask learning to improve the generalization of neural network by introducing related tasks into the training process. Similar to many previous studies, OPUS-TASS uses an ensemble of models and achieves better results. Results OPUS-TASS uses the same training and validation sets as SPOT-1D. We compare the performance of OPUS-TASS and SPOT-1D on TEST2016 (1213 proteins) and TEST2018 (250 proteins) proposed in the SPOT-1D paper, CASP12 (55 proteins), CASP13 (32 proteins) and CASP-FM (56 proteins) proposed in the SAINT paper, and a recently released PDB structure collection from CAMEO (93 proteins) named as CAMEO93. On these six test sets, OPUS-TASS achieves consistent improvements in both backbone torsion angles prediction and secondary structure prediction. On CAMEO93, SPOT-1D achieves the mean absolute errors of 16.89 and 23.02 for ϕ and ψ predictions, respectively, and the accuracies for 3- and 8-state secondary structure predictions are 87.72 and 77.15%, respectively. In comparison, OPUS-TASS achieves 16.56 and 22.56 for ϕ and ψ predictions, and 89.06 and 78.87% for 3- and 8-state secondary structure predictions, respectively. In particular, after using our torsion angles refinement method OPUS-Refine as the post-processing procedure for OPUS-TASS, the mean absolute errors for final ϕ and ψ predictions are further decreased to 16.28 and 21.98, respectively. 
Availability and implementation The training and the inference codes of OPUS-TASS and its data are available at https://github.com/thuxugang/opus_tass. Supplementary information Supplementary data are available at Bioinformatics online.
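
The mean absolute errors quoted above are errors on angles, which are periodic. A common convention, assumed here since the abstract does not state it, is to take the shorter of the two arcs between prediction and observation, so that 179° against −179° counts as 2° rather than 358°. The sketch below shows that calculation; the function name is illustrative and is not taken from the OPUS-TASS code.

```python
def angular_mae(predicted, observed):
    """Mean absolute error in degrees between two equal-length lists of
    angles, treating angles as periodic with period 360."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for p, o in zip(predicted, observed):
        diff = abs(p - o) % 360.0
        # Use the shorter arc around the circle.
        total += min(diff, 360.0 - diff)
    return total / len(predicted)
```

A plain `abs(p - o)` would overstate the error for residues whose angles lie near ±180°, which is why the wrap-around step matters when comparing methods on ϕ and ψ.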


2020 ◽  
Vol 36 (8) ◽  
pp. 2451-2457
Author(s):  
Louis Becquey ◽  
Eric Angel ◽  
Fariza Tahi

Abstract Motivation RNA loops have been modelled and clustered from solved 3D structures into ordered collections of recurrent non-canonical interactions called ‘RNA modules’, available in databases. This work explores what information from such modules can be used to improve secondary structure prediction. We propose a bi-objective method for predicting RNA secondary structures by minimizing both an energy-based and a knowledge-based potential. The tool, called BiORSEO, outputs secondary structures corresponding to the optimal solutions from the Pareto set. Results We compare several approaches to predict secondary structures using inserted RNA modules information: two module data sources, Rna3Dmotif and the RNA 3D Motif Atlas, and different ways to score the module insertions: module size, module complexity or module probability according to models like JAR3D and BayesPairing. We benchmark them against a large set of known secondary structures, including some state-of-the-art tools, and comment on the usefulness of the half physics-based, half data-based approach. Availability and implementation The software is available for download on the EvryRNA website, as well as the datasets. Supplementary information Supplementary data are available at Bioinformatics online.
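
To illustrate what "optimal solutions from the Pareto set" means when two potentials are minimized at once, the sketch below filters a list of candidate structures down to the non-dominated ones. It is a generic quadratic-time filter written for this example, not BiORSEO's optimization procedure, and the tuple layout (label, objective 1, objective 2) is an assumption.

```python
def pareto_front(candidates):
    """Return the candidates not dominated by any other candidate.

    Each candidate is (label, objective_1, objective_2) and both
    objectives are minimized. A candidate is dominated if another one is
    no worse on both objectives and strictly better on at least one.
    """
    front = []
    for label, f1, f2 in candidates:
        dominated = any(
            g1 <= f1 and g2 <= f2 and (g1 < f1 or g2 < f2)
            for _, g1, g2 in candidates
        )
        if not dominated:
            front.append((label, f1, f2))
    return front
```

Every structure on the front represents a different trade-off between the two objectives. None can be improved on one objective without getting worse on the other, which is why a bi-objective method reports a set of structures rather than a single optimum.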


Author(s):  
Fabian Sievers ◽  
Desmond G Higgins

Abstract Motivation Secondary structure prediction accuracy (SSPA) in the QuanTest benchmark can be used to measure accuracy of a multiple sequence alignment. SSPA correlates well with the sum-of-pairs (SP) score if the results are averaged over many alignments, but not on an alignment-by-alignment basis. This is due to a sub-optimal selection of reference and non-reference sequences in QuanTest. Results We develop an improved strategy for selecting reference and non-reference sequences for a new benchmark, QuanTest2. In QuanTest2, SSPA and SP correlate better on an alignment-by-alignment basis than in QuanTest. Guide-trees for QuanTest2 are more balanced with respect to reference sequences than in QuanTest. QuanTest2 scores correlate well with other well-established benchmarks. Availability and implementation QuanTest2 is available at http://bioinf.ucd.ie/quantest2.tar and comprises reference and non-reference sequence sets and a scoring script. Supplementary information Supplementary data are available at Bioinformatics online.
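
For readers unfamiliar with the sum-of-pairs score mentioned above, the sketch below uses one common definition: the fraction of residue pairs aligned in a reference alignment that are also aligned in the test alignment. This is an assumption for illustration, since the abstract does not describe how QuanTest or QuanTest2 compute the score, and the function names are hypothetical.

```python
def aligned_pairs(alignment):
    """Set of residue pairs placed in the same column.

    alignment: list of equal-length gapped strings, '-' marking a gap.
    Each pair is ((seq_i, residue_i), (seq_j, residue_j)) with seq_i < seq_j,
    where residue indices count ungapped positions within each sequence.
    """
    counters = [0] * len(alignment)
    pairs = set()
    for col in range(len(alignment[0])):
        present = []
        for s, row in enumerate(alignment):
            if row[col] != "-":
                present.append((s, counters[s]))
                counters[s] += 1
        for i in range(len(present)):
            for j in range(i + 1, len(present)):
                pairs.add((present[i], present[j]))
    return pairs


def sum_of_pairs_score(test, reference):
    """Fraction of reference-aligned residue pairs recovered by the test."""
    ref_pairs = aligned_pairs(reference)
    if not ref_pairs:
        raise ValueError("reference alignment contains no aligned pairs")
    return len(ref_pairs & aligned_pairs(test)) / len(ref_pairs)
```

Both alignments must contain the same sequences in the same order. A test alignment identical to the reference scores 1.0, and each misplaced gap lowers the score by the number of reference pairs it breaks.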


2009 ◽  
Vol 42 (2) ◽  
pp. 336-338 ◽  
Author(s):  
Ankit Gupta ◽  
Avnish Deshpande ◽  
Janardhan Kumar Amburi ◽  
Radhakrishnan Sabarinathan ◽  
Ramaswamy Senthilkumar ◽  
...  

Sequence–structure correlation studies are important in deciphering the relationships between various structural aspects, which may shed light on the protein-folding problem. The first step of this process is the prediction of secondary structure for a protein sequence of unknown three-dimensional structure. To this end, a web server has been created to predict the consensus secondary structure using well known algorithms from the literature. Furthermore, the server allows users to see the occurrence of predicted secondary structural elements in other structure and sequence databases and to visualize predicted helices as a helical wheel plot. The web server is accessible at http://bioserver1.physics.iisc.ernet.in/cssp/.
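
One simple way to combine several predictors into a consensus is a per-residue majority vote, sketched below. The abstract does not say how the server forms its consensus, so the voting rule, the H/E/C state alphabet, and the fallback to coil on ties are all assumptions made for this example.

```python
from collections import Counter


def consensus_structure(predictions, tie_state="C"):
    """Per-residue majority vote over equal-length prediction strings
    (for example 'H' helix, 'E' strand, 'C' coil).
    Positions where the top states tie fall back to tie_state."""
    if not predictions or len({len(p) for p in predictions}) != 1:
        raise ValueError("need at least one prediction, all of equal length")
    consensus = []
    for column in zip(*predictions):
        ranked = Counter(column).most_common()
        top_state, top_count = ranked[0]
        tied = len(ranked) > 1 and ranked[1][1] == top_count
        consensus.append(tie_state if tied else top_state)
    return "".join(consensus)
```

Using an odd number of predictors reduces the number of tied positions. A weighted vote would be a natural extension if per-method accuracies were known.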


2020 ◽  
Vol 48 (11) ◽  
pp. 5839-5848 ◽  
Author(s):  
Sandro Bottaro ◽  
Parker J Nichols ◽  
Beat Vögeli ◽  
Michele Parrinello ◽  
Kresten Lindorff-Larsen

Abstract We provide an atomic-level description of the structure and dynamics of the UUCG RNA stem–loop by combining molecular dynamics simulations with experimental data. The integration of simulations with exact nuclear Overhauser enhancements data allowed us to characterize two distinct states of this molecule. The most stable conformation corresponds to the consensus three-dimensional structure. The second state is characterized by the absence of the peculiar non-Watson–Crick interactions in the loop region. By using machine learning techniques we identify a set of experimental measurements that are most sensitive to the presence of non-native states. We find that although our MD ensemble, as well as the consensus UUCG tetraloop structures, are in good agreement with experiments, there are remaining discrepancies. Together, our results show that (i) the MD simulations overstabilize a non-native loop conformation, (ii) eNOE data support its presence with a population of ≈10% and (iii) the structural interpretation of experimental data for dynamic RNAs is highly complex, even for a simple model system such as the UUCG tetraloop.


2019 ◽  
Author(s):  
Sandro Bottaro ◽  
Parker J. Nichols ◽  
Beat Vögeli ◽  
Michele Parrinello ◽  
Kresten Lindorff-Larsen

Abstract We provide an atomic-level description of the structure and dynamics of the UUCG RNA stem-loop by combining molecular dynamics simulations with experimental data. The integration of simulations with exact nuclear Overhauser enhancements data allowed us to characterize two distinct states of this molecule. The most stable conformation corresponds to the consensus three-dimensional structure. The second state is characterized by the absence of the peculiar non-Watson-Crick interactions in the loop region. By using machine learning techniques we identify a set of experimental measurements that are most sensitive to the presence of non-native states. We find that although our MD ensemble, as well as the consensus UUCG tetraloop structures, are in good agreement with experiments, there are remaining discrepancies. Together, our results show that i) the structural interpretation of experimental data for dynamic RNAs is highly complex, even for a simple model system such as the UUCG tetraloop, ii) the MD simulations overstabilize a non-native loop conformation, and iii) eNOE data support its presence with a population of ≈10%.


Author(s):  
Sabrina Lusvarghi ◽  
Joanna Sztuba-Solinska ◽  
Katarzyna J. Purzycka ◽  
Jason W. Rausch ◽  
Stuart F.J. Le Grice

Computerized sequence analysis is an integral part of biotechnological research, yet many biologists have received no formal training in this important technology. Sequence Analysis Primer offers the beginner the necessary background to enter this vital field and helps more seasoned researchers to fine-tune their approach. It covers basic data manipulation such as homology searches, stem-loop identification, and protein secondary structure prediction, and is compatible with most sequence analysis programs. A detailed example giving steps for characterizing a new gene sequence provides users with hands-on experience when combined with their current software. The book will be invaluable to researchers and students in molecular biology, genetics, biochemistry, microbiology, and biotechnology.

