Rapid protein sequence evolution via compensatory frameshift is widespread in RNA virus genomes

2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Dongbin Park ◽  
Yoonsoo Hahn

Abstract. Background: RNA viruses possess remarkable evolutionary versatility driven by the high mutability of their genomes. Frameshifting nucleotide insertions or deletions (indels), which cause the premature termination of proteins, are frequently observed in the coding sequences of various viral genomes. When a secondary indel occurs near the primary indel site, the open reading frame can be restored to produce functional proteins, a phenomenon known as the compensatory frameshift. Results: In this study, we systematically analyzed publicly available viral genome sequences and identified compensatory frameshift events in hundreds of viral protein-coding sequences. Compensatory frameshift events resulted in large-scale amino acid differences between the compensatory frameshift form and the wild type even though their nucleotide sequences were almost identical. Phylogenetic analyses revealed that the evolutionary distances between proteins with and without a compensatory frameshift were significantly overestimated because amino acid mismatches caused by compensatory frameshifts were counted as substitutions. Further, this could cause compensatory frameshift forms to branch in different locations in the protein and nucleotide trees, which may obscure the correct interpretation of phylogenetic relationships between variant viruses. Conclusions: Our results imply that the compensatory frameshift is one of the mechanisms driving the rapid protein evolution of RNA viruses and potentially assisting their host-range expansion and adaptation.
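
The compensatory frameshift described in this abstract can be illustrated with a minimal Python sketch. The toy sequence, the indel positions, and the function name `translate` are hypothetical and chosen only for illustration; this is not the authors' detection pipeline. It assumes the standard genetic code and translation from the first base.

```python
from itertools import product

# Standard genetic code, built in TCAG order (stop codons shown as "*").
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(cds: str) -> str:
    """Translate from the first base, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE[cds[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# Toy sequences (hypothetical, not taken from any viral genome).
wild_type = "ATGGCTGAACGTACCTTGAAAGGCTAA"
primary = wild_type[:6] + wild_type[7:]          # primary indel: delete one base
compensated = primary[:12] + "C" + primary[12:]  # secondary indel: insert one base

for name, seq in [("wild type", wild_type),
                  ("primary indel", primary),
                  ("compensated", compensated)]:
    print(f"{name:>13}: {translate(seq)}")

# Count amino acid mismatches between the wild type and the compensated form.
mismatches = sum(a != b for a, b in zip(translate(wild_type), translate(compensated)))
print("amino acid mismatches:", mismatches)
```

In this toy case the primary deletion alone truncates the protein (MANVP instead of MAERTLKG), while the downstream insertion restores the frame and yields MANVPLKG. The two forms differ by only two single-base indels at the nucleotide level, yet three consecutive residues differ, which a substitution-based protein distance would count as three substitutions.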

2004 ◽  
Vol 64 (3a) ◽  
pp. 383-398
Author(s):  
M. L. Christoffersen ◽  
M. E. Araújo ◽  
M. A. M. Moreira

Total sequence phylogenies have low information content. Common misconceptions are that character quality can be ignored and that relying on computer algorithms is enough. Despite widespread preference for a posteriori methods of character evaluation, a priori methods are necessary to produce transformation series that are independent of tree topologies. We propose a stepwise qualitative method for analyzing protein sequences. Informative codons are selected, alternative amino acid transformation series are analyzed, and the most parsimonious transformations are hypothesized. We conduct four phylogenetic analyses of philodryanine snakes. The tree based on all nucleotides produces the least resolution. Trees based on the exclusion of third positions, on an asymmetric step matrix, and on our protocol produce similar results. Our method eliminates noise by hypothesizing explicit transformation series for each informative protein-coding amino acid. This approaches qualitative methods for morphological data, in which only characters successfully interpreted in a phylogenetic context are used in cladistic analyses. The method allows utilizing character information contained in the original sequence alignment and, therefore, has higher resolution in inferring a phylogenetic tree than some traditional methods (such as distance methods).
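
One of the four analyses mentioned above excludes third codon positions. A minimal sketch of that filtering step follows; the function name `drop_third_positions` is hypothetical, and the sketch assumes an in-frame coding sequence whose first base is codon position 1. It does not reproduce the authors' stepwise protocol.

```python
def drop_third_positions(cds: str) -> str:
    """Keep codon positions 1 and 2; discard every third base."""
    # Assumes the sequence is in frame, so index i has codon position (i % 3) + 1.
    return "".join(base for i, base in enumerate(cds) if i % 3 != 2)


# Hypothetical toy sequence: ATG GCT GAA becomes AT GC GA.
filtered = drop_third_positions("ATGGCTGAA")
print(filtered)
```

Third positions are the ones most often free to vary without changing the encoded amino acid, which is why removing them is a common way to reduce noise before tree building.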


mSphere ◽  
2019 ◽  
Vol 4 (2) ◽  
Author(s):  
Marli Vlok ◽  
Andrew S. Lang ◽  
Curtis A. Suttle

ABSTRACT: RNA viruses, particularly genetically diverse members of the Picornavirales, are widespread and abundant in the ocean. Gene surveys suggest that there are spatial and temporal patterns in the composition of RNA virus assemblages, but data on their diversity and genetic variability in different oceanographic settings are limited. Here, we show that specific RNA virus genomes have widespread geographic distributions and that the dominant genotypes are under purifying selection. Genomes from three previously unknown picorna-like viruses (BC-1, -2, and -3) assembled from a coastal site in British Columbia, Canada, as well as marine RNA viruses JP-A, JP-B, and Heterosigma akashiwo RNA virus exhibited different biogeographical patterns. Thus, biotic factors such as host specificity and viral life cycle, and not just abiotic processes such as dispersal, affect marine RNA virus distribution. Sequence differences relative to reference genomes imply that virus quasispecies are under purifying selection, with synonymous single-nucleotide variations dominating in genomes from geographically distinct regions resulting in conservation of amino acid sequences. Conversely, sequences from coastal South Africa that mapped to marine RNA virus JP-A exhibited more nonsynonymous mutations, probably representing amino acid changes that accumulated over a longer separation. This biogeographical analysis of marine RNA viruses demonstrates that purifying selection is occurring across oceanographic provinces. These data add to the spectrum of known marine RNA virus genomes, show the importance of dispersal and purifying selection for these viruses, and indicate that closely related RNA viruses are pathogens of eukaryotic microbes across oceans.

IMPORTANCE: Very little is known about aquatic RNA virus populations and genome evolution. This is the first study that analyzes marine environmental RNA viral assemblages in an evolutionary and broad geographical context. This study contributes the largest marine RNA virus metagenomic data set to date, substantially increasing the sequencing space for RNA viruses and also providing a baseline for comparisons of marine RNA virus diversity. The new viruses discovered in this study are representative of the most abundant family of marine RNA viruses, the Marnaviridae, and expand our view of the diversity of this important group. Overall, our data and analyses provide a foundation for interpreting marine RNA virus diversity and evolution.
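
The distinction between synonymous and nonsynonymous single-nucleotide variations that underlies the purifying-selection argument can be sketched as follows. The function name `classify_snv` and the example codons are hypothetical; the sketch assumes the standard genetic code and is not the variant-calling workflow used in the study.

```python
from itertools import product

# Standard genetic code, built in TCAG order (stop codons shown as "*").
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def classify_snv(ref_codon: str, alt_codon: str) -> str:
    """Label a single-base codon change as synonymous or nonsynonymous."""
    if len(ref_codon) != 3 or len(alt_codon) != 3:
        raise ValueError("codons must be three bases long")
    if sum(a != b for a, b in zip(ref_codon, alt_codon)) != 1:
        raise ValueError("codons must differ at exactly one position")
    same_residue = CODON_TABLE[ref_codon] == CODON_TABLE[alt_codon]
    return "synonymous" if same_residue else "nonsynonymous"


print(classify_snv("GCT", "GCC"))  # alanine to alanine
print(classify_snv("GAA", "GAT"))  # glutamate to aspartate
```

A population in which most observed variations fall into the first class keeps its amino acid sequences conserved, which is the pattern the abstract interprets as purifying selection.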


Author(s):  
Zhilong Tian ◽  
Yuqin Wang ◽  
Huibin Shi ◽  
Zhibo Wu ◽  
Xiaohui Zhang ◽  
...  

To further understand the structure and function of the TAC1 gene, we cloned the full-length cDNA of the goat TAC1 gene by rapid amplification of cDNA ends-PCR, and qRT-PCR was used to analyze TAC1 mRNA expression patterns in various goat tissues. The full-length cDNA of goat TAC1 was 1176 bp, with a 339 bp open reading frame encoding 112 amino acids. The amino acid sequence analysis revealed that the goat TAC1 gene encoded a water-drain protein and that its relative molecular weight and isoelectric point were 13,012.86 Da and 6.29, respectively. Alignment and phylogenetic analyses revealed that its amino acid sequence was highly similar to those of other vertebrates. TAC1 expression was analyzed in the goat brain, cerebellum, medulla oblongata, heart, liver, spleen, lung, kidney, uterus, and ovaries. These results serve as a foundation for further study on the Capra hircus TAC1 gene.
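
The relation between the reported ORF length and protein length can be checked with a short sketch. The function name `protein_length_from_orf` is hypothetical, and the sketch assumes that the 339 bp figure includes the stop codon, which the abstract does not state explicitly.

```python
def protein_length_from_orf(orf_bp: int, includes_stop: bool = True) -> int:
    """Number of amino acids encoded by an ORF of the given length in bp."""
    if orf_bp % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    codons = orf_bp // 3
    # The stop codon occupies one codon but encodes no residue.
    return codons - 1 if includes_stop else codons


print(protein_length_from_orf(339))
```

Under that assumption, 339 bp corresponds to 113 codons, of which 112 encode residues, matching the 112 amino acids reported.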


2018 ◽  
Author(s):  
Guangyu Wang ◽  
Hongyan Yin ◽  
Boyang Li ◽  
Chunlei Yu ◽  
Fan Wang ◽  
...  

ABSTRACT: The significance of long non-coding RNAs (lncRNAs) in many biological processes and diseases has attracted intense interest over the past several years. However, computational identification of lncRNAs in a wide range of species remains challenging; it requires prior knowledge of well-established sequences and annotations or species-specific training data, but the reality is that only a limited number of species have high-quality sequences and annotations. Here we first characterize lncRNAs by contrast to protein-coding RNAs based on feature relationship and find that the feature relationship between ORF (open reading frame) length and GC content presents universally substantial divergence in lncRNAs and protein-coding RNAs, as observed in a broad variety of species. Based on this feature relationship, we further present LGC, a novel algorithm for identifying lncRNAs that is able to accurately distinguish lncRNAs from protein-coding RNAs in a cross-species manner without any prior knowledge. As validated on large-scale empirical datasets, comparative results show that LGC outperforms existing algorithms by achieving higher accuracy and well-balanced sensitivity and specificity, and is robustly effective (>90% accuracy) in discriminating lncRNAs from protein-coding RNAs across diverse species that range from plants to mammals. To our knowledge, this study, for the first time, differentially characterizes lncRNAs and protein-coding RNAs based on feature relationship, which is further applied in computational identification of lncRNAs. Taken together, our study represents a significant advance in characterization and identification of lncRNAs, and LGC thus bears broad potential utility for computational analysis of lncRNAs in a wide range of species.
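
The two features named in this abstract, ORF length and GC content, can be computed for a transcript with a minimal sketch. This is not the LGC algorithm, whose scoring is not described here; the function names, the ORF definition (ATG to the first in-frame stop on the forward strand), and the toy sequence are assumptions made for illustration.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def gc_content(seq: str) -> float:
    """Fraction of bases that are G or C (0.0 for an empty sequence)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def longest_orf_length(seq: str) -> int:
    """Length in bp (stop codon included) of the longest forward-strand ORF."""
    seq = seq.upper()
    best = 0
    for frame in range(3):
        start = None  # index of the first ATG of the ORF currently open
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                best = max(best, i + 3 - start)
                start = None
    return best


# Hypothetical toy transcript containing ATG GCT GAA TAA in frame 2.
toy = "CCATGGCTGAATAAGG"
print(longest_orf_length(toy), gc_content(toy))
```

A classifier built on these features would compare how ORF length relates to GC content in a given transcript; how that relationship is turned into a coding or non-coding call is specific to LGC and is left out of this sketch.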


2021 ◽  
Vol 12 (1) ◽  
Author(s):  
E. Heilmann ◽  
J. Kimpel ◽  
B. Hofer ◽  
A. Rössler ◽  
I. Blaas ◽  
...  

Abstract: Therapeutic application of RNA viruses as oncolytic agents or gene vectors requires a tight control of virus activity if toxicity is a concern. Here we present a regulator switch for RNA viruses using a conditional protease approach, in which the function of at least one viral protein essential for transcription and replication is linked to autocatalytical, exogenous human immunodeficiency virus (HIV) protease activity. Virus activity can be enabled or disabled by various HIV protease inhibitors. Incorporating the HIV protease dimer in the genome of vesicular stomatitis virus (VSV) into the open reading frame of either the P- or L-protein resulted in an ON switch. Here, virus activity depends on co-application of protease inhibitor in a dose-dependent manner. Conversely, an N-terminal VSV polymerase tag with the HIV protease dimer constitutes an OFF switch, as application of protease inhibitor stops virus activity. This technology may also be applicable to other potentially therapeutic RNA viruses.


2020 ◽  
Vol 17 (1) ◽  
Author(s):  
Anouk Willemsen ◽  
Alexander van den Boom ◽  
Julienne Dietz ◽  
Seval Bilge Dagalp ◽  
Firat Dogan ◽  
...  

Abstract. Background: Papillomaviruses (PVs) infecting artiodactyls are very diverse, and only second in number to PVs infecting primates. PVs associated with lesions in economically important ruminant species have been isolated from cattle and sheep. Methods: Potential PV DNA from teat lesions of a Damascus goat was isolated, cloned and sequenced. The PV genome was analyzed using bioinformatics approaches to detect open reading frames and to predict potential features of encoded proteins as well as putative regulatory elements. Sequence comparison and phylogenetic analyses using the concatenated E1E2L2L1 nucleotide and amino acid alignments were used to reveal the relationship of the new PV to the known PV diversity and its closest relatives. Results: We isolated and characterized the full genome of a novel Capra hircus papillomavirus. We identified the E6, E7, E1, E2, L2, L1 open reading frames with protein coding potential and putative active elements in the ChPV2 proteins and putative regulatory genome elements. Sequence similarities of L1 and phylogenetic analyses using concatenated E1E2L2L1 nucleotide and amino acid alignments suggest the classification as a new PV type designated ChPV2 with a phylogenetic position within the XiPV genus, basal to the XiPV1 species. ChPV2 is not closely related to ChPV1, the other known goat PV isolated from healthy skin, although both of them belong confidently to a clade composed of PVs infecting cervids and bovids. Interestingly, ChPV2 contains an E6 open reading frame whereas all closely related PVs do not. Conclusion: ChPV2 is a novel goat PV closely related to the Xi-PV1 species infecting bovines. Phylogenetic relationships and genome architecture of ChPV2 and closely related PV types suggest at least two independent E6 losses within the XiPV clade.


2016 ◽  
Vol 4 (2) ◽  
Author(s):  
Alexander L. Greninger ◽  
Keith R. Jerome

We report the draft genome sequence of goose dicistrovirus assembled from the filtered feces of a Canadian goose from South Lake Union in Seattle, Washington. The 9.1-kb dicistronic RNA virus falls within the family Dicistroviridae; however, it shares <33% translated amino acid sequence within the nonstructural open reading frame (ORF) from aparavirus or cripavirus.


2017 ◽  
Author(s):  
Harry A. Thorpe ◽  
Sion C. Bayliss ◽  
Samuel K. Sheppard ◽  
Edward J. Feil

Abstract: Despite overwhelming evidence that variation in intergenic regions (IGRs) in bacteria impacts on phenotypes, most current approaches for analysing pan-genomes focus exclusively on protein-coding sequences. To address this we present Piggy, a novel pipeline that emulates Roary except that it is based only on IGRs. We demonstrate the use of Piggy for pan-genome analyses of Staphylococcus aureus and Escherichia coli using large genome datasets. For S. aureus, we show that highly divergent (“switched”) IGRs are associated with differences in gene expression, and we establish a multi-locus reference database of IGR alleles (igMLST; implemented in BIGSdb). Piggy is available at https://github.com/harry-thorpe/piggy.

