Large-Scale Analyses of Site-Specific Evolutionary Rates across Eukaryote Proteomes Reveal Confounding Interactions between Intrinsic Disorder, Secondary Structure, and Functional Domains

Genes ◽  
2018 ◽  
Vol 9 (11) ◽  
pp. 553 ◽  
Author(s):  
Joseph Ahrens ◽  
Jordon Rahaman ◽  
Jessica Siltberg-Liberles

Various structural and functional constraints govern the evolution of protein sequences. As a result, the relative rates of amino acid replacement among sites within a protein can vary significantly. Previous large-scale work on Metazoan (Animal) protein sequence alignments indicated that amino acid replacement rates are partially driven by a complex interaction among three factors: intrinsic disorder propensity; secondary structure; and functional domain involvement. Here, we use sequence-based predictors to evaluate the effects of these factors on site-specific sequence evolutionary rates within four eukaryotic lineages: Metazoans; Plants; Saccharomycete Fungi; and Alveolate Protists. Our results show broad, consistent trends across all four Eukaryote groups. In all four lineages, there is a significant increase in amino acid replacement rates when comparing: (i) disordered vs. ordered sites; (ii) random coil sites vs. sites in secondary structures; and (iii) inter-domain linker sites vs. sites in functional domains. Additionally, within Metazoans, Plants, and Saccharomycetes, there is a strong confounding interaction between intrinsic disorder and secondary structure—alignment sites exhibiting both high disorder propensity and involvement in secondary structures have very low average rates of sequence evolution. Analysis of gene ontology (GO) terms revealed that in all four lineages, a high fraction of sequences containing these conserved, disordered-structured sites are involved in nucleic acid binding. We also observe notable differences in the statistical trends of Alveolates, where intrinsically disordered sites are more variable than in other Eukaryotes and the statistical interactions between disorder and other factors are less pronounced.

2018 ◽  
Author(s):  
Padideh Danaee ◽  
Mason Rouches ◽  
Michelle Wiley ◽  
Dezhong Deng ◽  
Liang Huang ◽  
...  

ABSTRACT While RNA secondary structure prediction from sequence data has made remarkable progress, there is a need for improved strategies for annotating the features of RNA secondary structures. Here we present bpRNA, a novel annotation tool capable of parsing RNA structures, including complex pseudoknot-containing RNAs, to yield an objective, precise, compact, unambiguous, easily-interpretable description of all loops, stems, and pseudoknots, along with the positions, sequence, and flanking base pairs of each such structural feature. We also introduce several new informative representations of RNA structure types to improve structure visualization and interpretation. We have further used bpRNA to generate a web-accessible meta-database, “bpRNA-1m”, of over 100,000 single-molecule, known secondary structures; this database is both more fully and accurately annotated and over 20 times larger than existing databases. We use a subset of the database with highly similar (≥90% identical) sequences filtered out to report on statistical trends in sequence, flanking base pairs, and length. Both the bpRNA method and the bpRNA-1m database will be valuable resources for specific analysis of individual RNA molecules and for large-scale analyses, such as those useful for updating RNA energy parameters for computational thermodynamic predictions, improving machine learning models for structure prediction, and benchmarking structure-prediction algorithms.
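To make the kind of parsing described above concrete, here is a minimal sketch of reading a dot-bracket structure string into base pairs and grouping stacked pairs into stems. It is not bpRNA's algorithm or output format: the function names, the use of extra bracket types for pseudoknots, and the example string are all assumptions chosen for illustration.

```python
def parse_dot_bracket(structure):
    """Return sorted 0-based (i, j) base pairs from a dot-bracket string.

    Assumption: (), [], {} and <> each denote an independent bracket level,
    so simple pseudoknots can be written with a second bracket type.
    """
    openers = {"(": ")", "[": "]", "{": "}", "<": ">"}
    closers = {close: open_ for open_, close in openers.items()}
    stacks = {open_: [] for open_ in openers}
    pairs = []
    for idx, ch in enumerate(structure):
        if ch in openers:
            stacks[ch].append(idx)
        elif ch in closers:
            stack = stacks[closers[ch]]
            if not stack:
                raise ValueError(f"unmatched {ch!r} at position {idx}")
            pairs.append((stack.pop(), idx))
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r} at position {idx}")
    for open_, stack in stacks.items():
        if stack:
            raise ValueError(f"unmatched {open_!r} at position {stack[-1]}")
    return sorted(pairs)


def group_stems(pairs):
    """Group base pairs into stems, i.e. runs of directly stacked pairs."""
    stems = []
    for i, j in sorted(pairs):
        # A pair (i, j) extends a stem if it stacks on (i - 1, j + 1).
        if stems and stems[-1][-1] == (i - 1, j + 1):
            stems[-1].append((i, j))
        else:
            stems.append([(i, j)])
    return stems


# Hypothetical example: a hairpin stem crossed by a second, pseudoknotted stem.
example = "((.[[.))..]]"
stems = group_stems(parse_dot_bracket(example))
```

Each stem comes back as a list of stacked pairs, which is the starting point for reporting positions and flanking pairs per feature; loop classification would need further logic not shown here.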


Genetics ◽  
1998 ◽  
Vol 149 (1) ◽  
pp. 445-458 ◽  
Author(s):  
Nick Goldman ◽  
Jeffrey L Thorne ◽  
David T Jones

Abstract Empirically derived models of amino acid replacement are employed to study the association between various physical features of proteins and evolution. The strengths of these associations are statistically evaluated by applying the models of protein evolution to 11 diverse sets of protein sequences. Parametric bootstrap tests indicate that the solvent accessibility status of a site has a particularly strong association with the process of amino acid replacement that it experiences. Significant association between secondary structure environment and the amino acid replacement process is also observed. Careful description of the length distribution of secondary structure elements and of the organization of secondary structure and solvent accessibility along a protein did not always significantly improve the fit of the evolutionary models to the data sets that were analyzed. As indicated by the strength of the association of both solvent accessibility and secondary structure with amino acid replacement, the process of protein evolution—both above and below the species level—will not be well understood until the physical constraints that affect protein evolution are identified and characterized.


2018 ◽  
Vol 92 (22) ◽  
Author(s):  
Tomofumi Mochizuki ◽  
Rie Ohara ◽  
Marilyn J. Roossinck

ABSTRACT The effects of large-scale synonymous substitutions in a small icosahedral, single-stranded RNA viral genome on virulence, viral titer, and protein evolution were analyzed. The coat protein (CP) gene of the Fny strain of cucumber mosaic virus (CMV) was modified. We created four CP mutants in which all the codons of nine amino acids in the 5′ or 3′ half of the CP gene were replaced by either the most frequently or the least frequently used synonymous codons in monocot plants. When the dicot host (Nicotiana benthamiana) was inoculated with these four CP mutants, viral RNA titers in uninoculated symptomatic leaves decreased, while all mutants eventually showed mosaic symptoms similar to those for the wild type. The codon adaptation index of these four CP mutants against dicot genes was similar to that of the wild-type CP gene, indicating that the reduction of viral RNA titer was due to deleterious changes of the secondary structure of RNAs 3 and 4. When two 5′ mutants were serially passaged in N. benthamiana, viral RNA titers were rapidly restored but competitive fitness remained decreased. Although no nucleic acid changes were observed in the passaged wild-type CMV, one to three amino acid changes were observed in the synonymously mutated CP of each passaged virus, which were involved in recovery of viral RNA titer of 5′ mutants. Thus, we demonstrated that deleterious effects of the large-scale synonymous substitutions in the RNA viral genome facilitated the rapid amino acid mutation(s) in the CP to restore the viral RNA titer. IMPORTANCE It has recently become known that synonymous substitutions in RNA virus genes affect viral pathogenicity and competitive fitness by alteration of global or local RNA secondary structure of the viral genome. We confirmed that large-scale synonymous substitutions in the CP gene of CMV resulted in decreased viral RNA titer. 
Importantly, when viral evolution was stimulated by serial-passage inoculation, viral RNA titer was rapidly restored, concurrent with a few amino acid changes in the CP. This novel finding indicates that the deleterious effects of large-scale nucleic acid mutations on viral RNA secondary structure are readily tolerated by structural changes in the CP, demonstrating a novel part of the adaptive evolution of an RNA viral genome. In addition, our experimental system for serial inoculation of large-scale synonymous mutants could uncover a role for new amino acid residues in the viral protein that have not been observed in the wild-type virus strains.
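The codon adaptation index mentioned in this abstract is conventionally defined as the geometric mean of per-codon relative adaptiveness values, where each codon's value is its reference count divided by the count of the most used synonymous codon. The sketch below follows that standard definition; the function names, the two toy codon families, and the reference counts are hypothetical and are not taken from the study.

```python
import math


def relative_adaptiveness(ref_counts, synonymous_families):
    """Compute w for each codon: its count divided by the family maximum."""
    w = {}
    for family in synonymous_families:
        top = max(ref_counts.get(codon, 0) for codon in family)
        if top == 0:
            continue  # family never observed in the reference set
        for codon in family:
            w[codon] = ref_counts.get(codon, 0) / top
    return w


def cai(cds, w):
    """Geometric mean of w over the codons of a coding sequence.

    Codons missing from w (or with w == 0) are skipped, which is one common
    convention and an assumption of this sketch.
    """
    seq = cds.upper().replace("T", "U")
    usable = len(seq) - len(seq) % 3
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    logs = [math.log(w[c]) for c in codons if w.get(c, 0) > 0]
    if not logs:
        raise ValueError("no scorable codons in sequence")
    return math.exp(sum(logs) / len(logs))


# Hypothetical reference usage for two toy synonymous families (Ala, Lys).
families = [["GCU", "GCC"], ["AAA", "AAG"]]
counts = {"GCU": 10, "GCC": 5, "AAA": 4, "AAG": 8}
weights = relative_adaptiveness(counts, families)
preferred = cai("GCUAAG", weights)   # uses only the most frequent codons
rare = cai("GCCAAA", weights)        # uses only the less frequent codons
```

A real calculation would use all synonymous families and a reference codon usage table from the host gene set of interest.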


PeerJ ◽  
2017 ◽  
Vol 5 ◽  
pp. e3391 ◽  
Author(s):  
Dariya K. Sydykova ◽  
Claus O. Wilke

Site-specific evolutionary rates can be estimated from codon sequences or from amino-acid sequences. For codon sequences, the most popular methods use some variation of the dN∕dS ratio. For amino-acid sequences, one widely-used method is called Rate4Site, and it assigns a relative conservation score to each site in an alignment. How site-wise dN∕dS values relate to Rate4Site scores is not known. Here we elucidate the relationship between these two rate measurements. We simulate sequences with known dN∕dS, using either dN∕dS models or mutation–selection models for simulation. We then infer Rate4Site scores on the simulated alignments, and we compare those scores to either true or inferred dN∕dS values on the same alignments. We find that Rate4Site scores generally correlate well with true dN∕dS, and the correlation strengths increase in alignments with greater sequence divergence and more taxa. Moreover, Rate4Site scores correlate very well with inferred (as opposed to true) dN∕dS values, even for small alignments with little divergence. Finally, we verify this relationship between Rate4Site and dN∕dS in a variety of empirical datasets. We conclude that codon-level and amino-acid-level analysis frameworks are directly comparable and yield very similar inferences.
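As an illustration of comparing two per-site rate measures on the same alignment, the following sketch computes a Spearman rank correlation between a vector of site-wise dN/dS values and a vector of amino-acid rate scores. The choice of a rank correlation is an assumption made here for illustration (the abstract only states that the scores correlate well), and the numbers are made up, not results from the study.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = avg
        pos = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    if len(x) != len(y):
        raise ValueError("rate vectors must have the same number of sites")
    return pearson(average_ranks(x), average_ranks(y))


# Made-up per-site values for four alignment columns.
site_dnds = [0.1, 0.5, 0.2, 0.9]
site_aa_rate = [0.3, 1.2, 0.4, 2.0]
rho = spearman(site_dnds, site_aa_rate)
```

Because the two measures are on different scales, a rank-based statistic compares only the ordering of sites, which is usually the quantity of interest when asking whether two frameworks identify the same fast and slow sites.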


Author(s):  
Lina Yang ◽  
Yang Liu ◽  
Huiwu Luo ◽  
Xichun Li ◽  
Yuan Yan Tang

The role of pseudoknots in RNA secondary structure cannot be ignored. Existing methods for analyzing RNA secondary structures with pseudoknots exhibit many shortcomings. This paper presents a novel RNA secondary structure visualization method for the joint analysis of RNA primary and secondary structures. The method is based on the page number representation of the RNA secondary structure. It uses five vectors to represent bases, which are sequentially connected to outline the characteristics of the RNA secondary structure. The method covers almost all the constituent elements of the RNA secondary structure and extracts features completely. Experiments are based on the available techniques for large-scale annotation of RNA secondary structures, using a combination of discrete wavelet transform and fractal dimension. Classification performance is compared with that of previous RNA secondary structure representation methods. Experimental results show that the RNA secondary structure visualization method proposed in this paper has good application prospects in RNA secondary structure classification.


2017 ◽  
Author(s):  
Dariya K Sydykova ◽  
Claus O Wilke

Many applications require the calculation of site-specific evolutionary rates from alignments of amino-acid sequences. For example, catalytic residues in enzymes and interface regions in protein complexes can be inferred from observed relative rates. While numerous approaches exist to calculate amino-acid rates, it is not entirely clear what physical quantities the inferred rates represent and how these rates relate to the underlying fitness landscape of the evolving protein. Further, amino-acid rates can be calculated in the context of different amino-acid exchangeability matrices, such as JTT, LG, or WAG, and again it is not known how the choice of the matrix influences the physical interpretation of the inferred rates. Here, we develop a theory of measurement for site-specific evolutionary rates by analytically solving the maximum-likelihood equations for rate inference performed on sequences evolved under a mutation–selection model. We demonstrate that the measurement process can only recover the true expected rates of the mutation–selection model if rates are measured relative to a naïve exchangeability matrix, in which all exchangeabilities are equal to one. Rate measurements using other matrices are quantitatively close but not mathematically correct. Our results demonstrate that insights obtained from phylogenetic-tree inference do not necessarily apply to rate inference, and best practices for the former may be deleterious for the latter.


2018 ◽  
Vol 92 (9) ◽  
pp. e01949-17 ◽  
Author(s):  
Jing Shaw ◽  
Jaume Jorba ◽  
Kun Zhao ◽  
Jane Iber ◽  
Qi Chen ◽  
...  

ABSTRACT We followed the dynamics of capsid amino acid replacement among 403 Nigerian outbreak isolates of type 2 circulating vaccine-derived poliovirus (cVDPV2) from 2005 through 2011. Four different functional domains were analyzed: (i) neutralizing antigenic (NAg) sites, (ii) residues binding the poliovirus receptor (PVR), (iii) VP1 residues 1 to 32, and (iv) the capsid structural core. Amino acid replacements mapped to 37 of 43 positions across all 4 NAg sites; the most variable and polymorphic residues were in NAg sites 2 and 3b. The most divergent of the 120 NAg variants had no more than 5 replacements in all NAg sites and were still neutralized at titers similar to those of Sabin 2. PVR-binding residues were less variable (25 different variants; 0 to 2 replacements per isolate; 30/44 invariant positions), with the most variable residues also forming parts of NAg sites 2 and 3a. Residues 1 to 32 of VP1 were highly variable (133 different variants; 0 to 6 replacements per isolate; 5/32 invariant positions), with residues 1 to 18 predicted to form a well-conserved amphipathic helix. Replacement events were dated by mapping them onto the branches of time-scaled phylogenies. Rates of amino acid replacement varied widely across positions and followed no simple substitution model. Replacements in the structural core were the most conservative and were fixed at an overall rate ∼20-fold lower than the rates for the NAg sites and VP1 1 to 32 and ∼5-fold lower than the rate for the PVR-binding sites. Only VP1 143-Ile, a non-NAg site surface residue and known attenuation site, appeared to be under strong negative selection. IMPORTANCE The high rate of poliovirus evolution is offset by strong selection against amino acid replacement at most positions of the capsid. Consequently, poliovirus vaccines developed from strains isolated decades ago have been used worldwide to bring wild polioviruses almost to extinction. 
The apparent antigenic stability of poliovirus obscures a dynamic of continuous change within the neutralizing antigenic (NAg) sites. During 7 years of a large outbreak in Nigeria, the circulating type 2 vaccine-derived polioviruses generated 120 different NAg site variants via multiple independent pathways. Nonetheless, overall antigenic evolution was constrained, as no isolate had fixed more than 5 amino acid differences from the Sabin 2 NAg sites, and the most divergent isolates were efficiently neutralized by human immune sera. Evolution elsewhere in the capsid was also constrained. Amino acids binding the poliovirus receptor were strongly conserved, and extensive variation in the VP1 amino terminus still conserved a predicted amphipathic helix.


BMC Genomics ◽  
2020 ◽  
Vol 21 (S13) ◽  
Author(s):  
Lei Deng ◽  
Youzhi Liu ◽  
Yechuan Shi ◽  
Wenhao Zhang ◽  
Chun Yang ◽  
...  

Abstract Background: RNA binding proteins (RBPs) play a vital role in post-transcriptional processes in all eukaryotes, such as splicing regulation, mRNA transport, and modulation of mRNA translation and decay. The identification of RBP binding sites is a crucial step in understanding the biological mechanism of post-transcriptional gene regulation. However, the determination of RBP binding sites on a large scale is a challenging task due to the high cost of biochemical assays. A number of studies have exploited machine learning methods to predict binding sites. In particular, deep learning is increasingly used in the bioinformatics field by virtue of its ability to learn generalized representations from DNA and protein sequences. Results: In this paper, we implemented a novel deep neural network model, DeepRKE, which combines primary RNA sequence and secondary structure information to effectively predict RBP binding sites. Specifically, we used a word embedding algorithm to extract features of RNA sequences and secondary structures, i.e., a distributed representation of k-mer sequences rather than traditional one-hot encoding. The distributed representations are taken as input to convolutional neural networks (CNN) and bidirectional long short-term memory networks (BiLSTM) to identify RBP binding sites. Our results show that DeepRKE outperforms existing counterpart methods on two large-scale benchmark datasets. Conclusions: Our extensive experimental results show that DeepRKE is an efficacious tool for predicting RBP binding sites. The distributed representations of RNA sequences and secondary structures can effectively detect the latent relationship and similarity between k-mers, and thus improve the predictive performance. The source code of DeepRKE is available at https://github.com/youzhiliu/DeepRKE/.
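The k-mer step that precedes a word-embedding model can be sketched as follows: a sequence is cut into overlapping k-mers, and each distinct k-mer is mapped to an integer index that an embedding layer could later turn into a dense vector. This is only an illustration of the general idea, not DeepRKE's code; the function names and the values of `k` and `stride` are assumptions.

```python
def kmers(sequence, k=3, stride=1):
    """Split an RNA sequence into overlapping k-mers (the 'words')."""
    seq = sequence.upper().replace("T", "U")
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, stride)]


def build_vocab(token_lists):
    """Assign each distinct k-mer an integer index in order of first use."""
    vocab = {}
    for tokens in token_lists:
        for token in tokens:
            vocab.setdefault(token, len(vocab))
    return vocab


def encode(tokens, vocab):
    """Replace each k-mer with its vocabulary index."""
    return [vocab[token] for token in tokens]


# Hypothetical input sequences.
corpus = [kmers("AUGGCU"), kmers("GGCUAA")]
vocab = build_vocab(corpus)
encoded = [encode(tokens, vocab) for tokens in corpus]
```

Unlike one-hot vectors, which treat every k-mer as equally distant from every other, indices like these are meant to be looked up in a trained embedding table, so that k-mers appearing in similar contexts end up with similar vectors.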


2013 ◽  
Vol 11 (05) ◽  
pp. 1350012 ◽  
Author(s):  
PRADIP GHANTY ◽  
NIKHIL R. PAL ◽  
RAJANI K. MUDI

In this paper, we propose some co-occurrence probability-based features for prediction of protein secondary structure. The features are extracted using occurrence/nonoccurrence of secondary structures in the protein sequences. We explore two types of features: position-specific (based on position of amino acid on fragments of protein sequences) as well as position-independent (independent of amino acid position on fragments of protein sequences). We use a hybrid system, NEUROSVM, consisting of neural networks and support vector machines for classification of secondary structures. We propose two schemes, NSVMps and NSVM, for protein secondary structure prediction. NSVMps uses position-specific probability-based features and the NEUROSVM classifier, whereas NSVM uses the same classifier with position-independent probability-based features. The proposed method falls in the single-sequence category of methods because it does not use any sequence profile information such as position-specific scoring matrices (PSSM) derived from PSI-BLAST. Two widely used datasets, RS126 and CB513, are used in the experiments. The results obtained using the proposed features and the NEUROSVM classifier are better than those of most existing single-sequence prediction methods. Most importantly, the results using NSVMps, which are obtained using lower-dimensional features, are comparable to those of other existing methods. NSVMps and NSVM are finally tested on target proteins of the critical assessment of protein structure prediction experiment-9 (CASP9). A larger dataset is used to compare the performance of the proposed methods with that of two recent single-sequence prediction methods. We also investigate the impact of the presence of different amino acid residues (in protein sequences) that are responsible for the formation of different secondary structures.

