VirusTaxo: Taxonomic classification of virus genome using multi-class hierarchical classification by k-mer enrichment

2021 ◽  
Author(s):  
Rajan Saha Raju ◽  
Abdullah Al Nahid ◽  
Preonath Shuvo ◽  
Rashedul Islam

Abstract Taxonomic classification of viruses is a multi-class hierarchical classification problem, as taxonomic ranks (e.g., order, family and genus) of viruses are hierarchically structured and have multiple classes in each rank. Classification of biological sequences that are hierarchically structured with multiple classes is challenging. Here we developed a machine learning architecture, VirusTaxo, using multi-class hierarchical classification by k-mer enrichment. VirusTaxo classifies DNA and RNA viruses to their taxonomic ranks using the genome sequence. To assign taxonomic ranks, VirusTaxo extracts k-mers from the genome sequence and creates a bag of k-mers for each class in a rank. VirusTaxo uses a top-down hierarchical classification approach and accurately assigns the order, family and genus of a virus from the genome sequence. The average accuracies of VirusTaxo for DNA viruses are 99% (order), 98% (family) and 95% (genus) and for RNA viruses 97% (order), 96% (family) and 82% (genus). VirusTaxo can be used to detect the taxonomy of novel viruses using full-length genome or contig sequences. Availability: An online version of VirusTaxo is available at https://omics-lab.com/virustaxo/.
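The bag-of-k-mers, top-down idea described in this abstract can be illustrated with a minimal sketch. This is not the VirusTaxo implementation: the scoring rule (a plain count of shared k-mers rather than an enrichment statistic), the function names, the value of k, and the toy sequences and labels (`OrderA`, `FamA1`, and so on) are all assumptions made for illustration.

```python
from collections import defaultdict


def kmers(seq, k):
    """Return the set of distinct k-mers in a sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_bags(pairs, k):
    """Pool the k-mers of all training sequences that share a label."""
    bags = defaultdict(set)
    for label, seq in pairs:
        bags[label] |= kmers(seq, k)
    return dict(bags)


def best_label(query, bags):
    """Pick the label whose bag shares the most k-mers with the query."""
    scores = {label: len(query & bag) for label, bag in bags.items()}
    label = max(scores, key=scores.get)
    return label if scores[label] > 0 else None


def train(records, k):
    """Build one set of bags per rank from (order, family, genus, sequence) tuples."""
    orders = build_bags(((o, s) for o, f, g, s in records), k)
    families, genera = defaultdict(list), defaultdict(list)
    for o, f, g, s in records:
        families[o].append((f, s))
        genera[(o, f)].append((g, s))
    return {
        "order": orders,
        "family": {o: build_bags(p, k) for o, p in families.items()},
        "genus": {of: build_bags(p, k) for of, p in genera.items()},
    }


def classify(seq, model, k):
    """Top-down: choose the order first, then a family within it, then a genus."""
    query = kmers(seq, k)
    order = best_label(query, model["order"])
    if order is None:
        return (None, None, None)
    family = best_label(query, model["family"][order])
    if family is None:
        return (order, None, None)
    genus = best_label(query, model["genus"][(order, family)])
    return (order, family, genus)


# Hypothetical toy training data; real genomes are far longer and k far larger.
K = 4
RECORDS = [
    ("OrderA", "FamA1", "GenA1", "ACGTACGTACGTAAAC"),
    ("OrderA", "FamA2", "GenA2", "ACGTACGTTTTTGGGG"),
    ("OrderB", "FamB1", "GenB1", "GGCCGGCCGGCCTTAA"),
]
MODEL = train(RECORDS, K)
print(classify("ACGTACGTAAAC", MODEL, K))
```

Restricting each lower-rank decision to the classes under the rank already chosen is what makes the scheme top-down: a wrong order cannot be corrected later, but each step compares the query against far fewer bags.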

mBio ◽  
2020 ◽  
Vol 11 (5) ◽  
Author(s):  
Ignacio de la Higuera ◽  
George W. Kasun ◽  
Ellis L. Torrance ◽  
Alyssa A. Pratt ◽  
Amberlee Maluenda ◽  
...  

ABSTRACT The discovery of cruciviruses revealed the most explicit example of a common protein homologue between DNA and RNA viruses to date. Cruciviruses are a novel group of circular Rep-encoding single-stranded DNA (ssDNA) (CRESS-DNA) viruses that encode capsid proteins that are most closely related to those encoded by RNA viruses in the family Tombusviridae. The apparent chimeric nature of the two core proteins encoded by crucivirus genomes suggests horizontal gene transfer of capsid genes between DNA and RNA viruses. Here, we identified and characterized 451 new crucivirus genomes and 10 capsid-encoding circular genetic elements through de novo assembly and mining of metagenomic data. These genomes are highly diverse, as demonstrated by sequence comparisons and phylogenetic analysis of subsets of the protein sequences they encode. Most of the variation is reflected in the replication-associated protein (Rep) sequences, and much of the sequence diversity appears to be due to recombination. Our results suggest that recombination tends to occur more frequently among groups of cruciviruses with relatively similar capsid proteins and that the exchange of Rep protein domains between cruciviruses is rarer than intergenic recombination. Additionally, we suggest members of the stramenopiles/alveolates/Rhizaria supergroup as possible crucivirus hosts. Altogether, we provide a comprehensive and descriptive characterization of cruciviruses. IMPORTANCE Viruses are the most abundant biological entities on Earth. In addition to their impact on animal and plant health, viruses have important roles in ecosystem dynamics as well as in the evolution of the biosphere. Circular Rep-encoding single-stranded (CRESS) DNA viruses are ubiquitous in nature, many are agriculturally important, and they appear to have multiple origins from prokaryotic plasmids. A subset of CRESS-DNA viruses, the cruciviruses, have homologues of capsid proteins encoded by RNA viruses. The genetic structure of cruciviruses attests to the transfer of capsid genes between disparate groups of viruses. However, the evolutionary history of cruciviruses is still unclear. By collecting and analyzing cruciviral sequence data, we provide a deeper insight into the evolutionary intricacies of cruciviruses. Our results reveal an unexpected diversity of this virus group, with frequent recombination as an important determinant of variability.


2019 ◽  
Author(s):  
Stéphane Aris-Brosou ◽  
Louis Parent ◽  
Neke Ibeh

Abstract Viruses are known to have some of the highest and most diverse mutation rates found in any biological replicator, topped by single-stranded (ss) RNA viruses, while double-stranded (ds) DNA viruses have rates approaching those of bacteria. As mutation rates are tightly and negatively correlated with genome size, selection is a clear driver of viral evolution. However, the role of intragenomic interactions as drivers of viral evolution is less well documented. To understand how these two processes affect viral evolution, we systematically surveyed ssRNA, ssDNA, dsRNA, and dsDNA viruses, to find which virus type and which functions show evidence for episodic diversifying selection and correlated evolution. We show that while evidence for selection is mostly found in single-stranded viruses, and correlated evolution is more prevalent in DNA viruses, the genes that are affected by both processes are involved in key aspects of their life cycle, favoring viral stability over proliferation. We further show that both evolutionary processes are intimately linked at the amino acid level, which suggests that selection alone does not explain the whole evolutionary (and epidemiological) potential of viruses.


2018 ◽  
Author(s):  
Konstans Wells ◽  
Serge Morand ◽  
Maya Wardeh ◽  
Matthew Baylis

Abstract Emerging infectious diseases arising from pathogen spillover from mammals to humans comprise a substantial health threat. Tracing virus origin and predicting the most likely host species for future spillover events are major objectives in One Health disciplines. However, the species that share pathogens most widely with other mammals, and the role of different wildlife groups in sharing viruses with humans, remain poorly identified. To address this challenge, we applied network analysis and Bayesian hierarchical models to a global database of mammal-virus associations. We show that domesticated mammals and some primates hold the most central positions in networks of known mammal-virus associations. We found strong evidence that DNA viruses were phylogenetically more host specific than RNA viruses, while the frequencies of sharing viruses among hosts and the proportion of zoonotic viruses in hosts were larger for RNA than DNA viruses. Among entire host-virus networks, Carnivora and Chiroptera hold central positions for mainly sharing RNA viruses with other host species, while network centrality of Primates scored relatively high for sharing DNA viruses. Ungulates hold central positions for sharing both RNA and DNA viruses. Acknowledging the role of domestic species in addition to host and virus traits in patterns of virus sharing is necessary to improve our understanding of virus spread and spillover in times of global change.


Author(s):  
Sheikh Saba Naz ◽  
Afsheen Aslam ◽  
Taqdees Malik

A successful viral infection depends on the virus effectively evading the immune system. Viral entry is usually detected by cellular receptors, including pattern recognition receptors (PRRs). Recognition of the viral genome leads to the production of interferons through a signaling cascade. This review briefly describes the mechanisms by which DNA and RNA viruses escape the host immune system. These strategies include viral endonuclease activity, cap snatching of host mRNA, the formation of replication organelles, stress granule formation, membrane modifications, the action of proteases, and evasion of interferon-stimulated genes (ISGs). Moreover, we discuss the strategies DNA viruses use to inhibit immune responses, including subversion of mRNA, transcription factors, adaptor proteins, and PRRs; evasion of T lymphocytes; genomic diversity; theft or seizure of host defense proteins; imitation of host factors, such as those affecting host cytokines and chemokines; suppression or inhibition of apoptosis; and proteasomal degradation of host antiviral proteins. This knowledge is pivotal for understanding the different strategies that viruses have evolved to escape the host's antiviral cellular responses, as well as for understanding virus-host interactions and the origin of viral pathogenesis. It is also significant for the design of gene-targeting vectors, antiviral vaccines, and effective treatments directed against DNA and RNA viruses.


2021 ◽  
Vol 18 ◽  
Author(s):  
Carlos Polanco ◽  
Vladimir N. Uversky ◽  
Gilberto Vargas-Alarcón ◽  
Thomas Buhse ◽  
Alberto Huberman ◽  
...  

Background: In the vast variety of viruses known, there is a particular interest in those transmitted to humans and whose ability to disseminate represents a significant public health issue. Objective: The present study’s objective is to bioinformatically characterize the proteins of the two main divisions of viruses, RNA-viruses and DNA-viruses. Methods: In this work, a set of in-house computational programs was used to calculate the polarity/charge profiles and intrinsic disorder predisposition profiles of the proteins of several groups of viruses representing both types extracted from UniProt database. The efficiency of these computational programs was statistically verified. Results: It was found that the polarity/charge profile of the proteins is, in most cases, an efficient discriminant that allows the re-creation of the taxonomy known for both viral groups. Additionally, the entire set of "reviewed" proteins in UniProt database was analyzed to find proteins with the polarity/charge profiles similar to those obtained for each viral group. This search revealed a substantial number of proteins with such polarity-charge profiles. Conclusion: Polarity/charge profile represents a physicochemical metric, which is easy to calculate, and which can be used to effectively identify viral groups from their protein sequences.


Viruses ◽  
2021 ◽  
Vol 13 (6) ◽  
pp. 958
Author(s):  
Kaitlyn Speckhart ◽  
Jeffrey M. Williams ◽  
Billy Tsai

To initiate infection, a virus enters a host cell typically via receptor-dependent endocytosis. It then penetrates a subcellular membrane, reaching a destination that supports transcription, translation, and replication of the viral genome. These steps lead to assembly and morphogenesis of the new viral progeny. The mature virus finally exits the host cell to begin the next infection cycle. Strikingly, viruses hijack host molecular chaperones to accomplish these distinct entry steps. Here we highlight how DNA viruses, including polyomavirus and the human papillomavirus, exploit soluble and membrane-associated chaperones to enter a cell, penetrating and escaping an intracellular membrane en route for infection. We also describe the mechanism by which RNA viruses—including flavivirus and coronavirus—co-opt cytosolic and organelle-selective chaperones to promote viral endocytosis, protein biosynthesis, replication, and assembly. These examples underscore the importance of host chaperones during virus infection, potentially revealing novel antiviral strategies to combat virus-induced diseases.


2021 ◽  
Vol 4 (1) ◽  
Author(s):  
Ruonan Wu ◽  
Michelle R. Davison ◽  
Yuqian Gao ◽  
Carrie D. Nicora ◽  
Jason E. Mcdermott ◽  
...  

Abstract Soil is known to harbor viruses, but the majority are uncharacterized and their responses to environmental changes are unknown. Here, we used a multi-omics approach (metagenomics, metatranscriptomics and metaproteomics) to detect active DNA viruses and RNA viruses in a native prairie soil and to determine their responses to extremes in soil moisture. The majority of transcribed DNA viruses were bacteriophages, but some were assigned to eukaryotic hosts, mainly insects. We also demonstrated that higher soil moisture increased transcription of a subset of DNA viruses. Metaproteome data validated that the specific viral transcripts were translated into proteins, including chaperonins known to be essential for virion replication and assembly. The soil viral chaperonins were phylogenetically distinct from previously described marine viral chaperonins. The soil also had a high abundance of RNA viruses, with highest representation of Reoviridae. Leviviridae were the most diverse RNA viruses in the samples, with higher amounts in wet soil. This study demonstrates that extreme shifts in soil moisture have dramatic impacts on the composition, activity and potential functions of both DNA and RNA soil viruses.


Author(s):  
Muthulakshmi M et al.

Genome sequencing aids in understanding the nature, characteristics, habitat and evolutionary history of living organisms. Beyond sequencing, the more important task is to correctly place the sequenced genome in the taxonomy. Generally, the taxonomic classification of living organisms is done by observing their morphological, behavioral, genetic and biochemical characteristics. Among these, taxonomic classification based on genetic observation is scientifically more accurate, as genome sequence analysis exploits the complete characteristics of the organism. In this paper, we developed a novel Frequency based Feature Extraction Technique (FFET), which extracts 120 features from the genome sequence of an organism and helps to classify it in the taxonomy accordingly. We performed a kingdom-level taxonomic classification using the proposed FFET. The proposed FFET extracts features based on storage, frequency of nucleotide bases, pattern arrangement and amino acid composition of genome sequences. The feature extraction technique is applied to 150 samples of genome sequences of various organisms, which were downloaded from the National Center for Biotechnology Information (NCBI) database. The extracted features are classified using various machine learning and deep learning classifiers. The results show that FFET performs well for classification with a Convolutional Neural Network (CNN) classifier, with an accuracy of 96.73%.
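As a rough illustration of one feature family named in this abstract, the frequency of nucleotide bases, the sketch below computes per-base frequencies and GC content for a sequence. It is not the FFET method: the abstract does not specify the 120 features, so the function names and the choice of these particular features are assumptions.

```python
from collections import Counter


def base_frequencies(seq):
    """Fraction of A, C, G and T among the unambiguous bases of a sequence."""
    counts = Counter(b for b in seq.upper() if b in "ACGT")
    total = sum(counts.values())
    if total == 0:
        return {b: 0.0 for b in "ACGT"}
    return {b: counts[b] / total for b in "ACGT"}


def gc_content(seq):
    """Combined fraction of G and C, a common summary of base composition."""
    freqs = base_frequencies(seq)
    return freqs["G"] + freqs["C"]


print(base_frequencies("AACG"))
print(gc_content("AACG"))
```

A fixed-length vector of such values per genome is the kind of input that a downstream classifier can consume, whatever the classifier is.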


2009 ◽  
Vol 83 (10) ◽  
pp. 5109-5116 ◽  
Author(s):  
Kathie A. Mihindukulasuriya ◽  
Nang L. Nguyen ◽  
Guang Wu ◽  
Henry V. Huang ◽  
Amelia P. A. Travassos da Rosa ◽  
...  

ABSTRACT Here, we report the sequencing and classification of Nyamanini virus (NYMV) and Midway virus (MIDWV), two antigenically related viruses that were first isolated in 1957 and 1966, respectively. Although these viruses have been cultured multiple times from cattle egrets, seabirds, and their ticks, efforts to classify them taxonomically using conventional serological and electron microscopic approaches have failed completely. We used a random shotgun sequencing strategy to define the genomes of NYMV and MIDWV. Contigs of 11,631 and 11,752 nucleotides, representing the complete genome of NYMV and the near-complete genome of MIDWV, respectively, were assembled. Each virus genome was predicted to carry six open reading frames (ORFs). BLAST analysis indicated that only two of the ORF proteins of each virus, the putative nucleocapsid and polymerase, had detectable sequence similarity to known viral proteins. Phylogenetic analysis of these ORF proteins demonstrated that the closest relatives of NYMV and MIDWV are negative-stranded-RNA viruses in the order Mononegavirales. On the basis of their very limited sequence similarity to known viruses, we propose that NYMV and MIDWV define a novel genus, Nyavirus, in this order.


BMC Genomics ◽  
2020 ◽  
Vol 21 (1) ◽  
Author(s):  
Xiaofeng Xu ◽  
Jinlong Bei ◽  
Yibo Xuan ◽  
Jiayuan Chen ◽  
Defu Chen ◽  
...  

Abstract Background In 2014, a novel tick-borne virus of the Flaviviridae family was first reported in the Mogiana region of Brazil and named the Mogiana tick virus (MGTV). Thereafter, the Jingmen tick virus (JMTV), Kindia tick virus (KITV), and Guangxi tick virus (GXTV), all evolutionarily related to MGTV, were reported. Results In the present study, we used small RNA sequencing (sRNA-seq) to detect viruses in ticks and discovered a new MGTV strain in Amblyomma testudinarium ticks collected in China’s Yunnan Province in 2016. We obtained the full-length genome sequence of this MGTV strain Yunnan2016 (GenBank: MT080097, MT080098, MT080099 and MT080100) and recommended its inclusion in the NCBI RefSeq database for future studies on MGTV, JMTV, KITV and GXTV. Phylogenetic analysis showed that MGTV, JMTV, KITV and GXTV are monophyletic and belong to an MGTV group. Furthermore, this MGTV group of viruses may be phylogenetically related to geographical regions that were formerly part of the supercontinents Gondwana and Laurasia. Conclusions To the best of our knowledge, this is the first study in which 5′ and 3′ sRNAs were used to generate full-length genome sequences of, but not limited to, RNA viruses. We also demonstrated the feasibility of using the sRNA-seq based method to detect viruses in pools of two small ticks, and possibly even in a single small tick. MGTV may preserve the characteristics of ancient RNA viruses, which can be used to study the origin and evolution of RNA viruses. In addition, MGTV can be used as a novel species for studies in phylogeography.

