Genomic Variation and Diversification in Begomovirus Genome in Implication to Host and Vector Adaptation

Abstract Taxonomic classification of viruses is a multi-class hierarchical classification problem: taxonomic ranks (e.g., order, family and genus) are hierarchically structured, and each rank contains multiple classes. Classifying biological sequences that are hierarchically structured with multiple classes is challenging. Here we developed a machine learning architecture, VirusTaxo, that performs multi-class hierarchical classification by k-mer enrichment. VirusTaxo assigns DNA and RNA viruses to their taxonomic ranks using the genome sequence. To assign taxonomic ranks, VirusTaxo extracts k-mers from genome sequences and creates a bag-of-k-mers for each class in a rank. It uses a top-down hierarchical classification approach and accurately assigns the order, family and genus of a virus from its genome sequence. The average accuracies of VirusTaxo are 99% (order), 98% (family) and 95% (genus) for DNA viruses, and 97% (order), 96% (family) and 82% (genus) for RNA viruses. VirusTaxo can be used to predict the taxonomy of novel viruses from full-length genome or contig sequences. Availability: An online version of VirusTaxo is available at https://omics-lab.com/virustaxo/.
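The abstract describes per-class bags of k-mers combined with a top-down walk from order to family to genus. The Python sketch below illustrates that idea under stated assumptions: the scoring rule (number of query k-mers shared with a class bag), the helper names `kmer_set`, `build_bags` and `classify`, and the toy sequences are all hypothetical and are not taken from VirusTaxo's own implementation.

```python
from collections import defaultdict


def kmer_set(seq: str, k: int) -> set:
    """All overlapping k-mers of a sequence (case-insensitive)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_bags(records, k):
    """records: iterable of (sequence, (order, family, genus)).

    Builds one bag of k-mers per taxon, keyed by its path prefix,
    e.g. ("O1",), ("O1", "F1"), ("O1", "F1", "G1").
    """
    bags = defaultdict(set)
    for seq, ranks in records:
        kms = kmer_set(seq, k)
        for depth in range(1, len(ranks) + 1):
            bags[tuple(ranks[:depth])] |= kms
    return dict(bags)


def classify(seq, bags, k, n_ranks=3):
    """Top-down: choose the best order, then the best family within it, etc."""
    query = kmer_set(seq, k)
    path = ()
    for depth in range(n_ranks):
        # Only children of the class chosen at the previous rank are candidates.
        children = [t for t in bags if len(t) == depth + 1 and t[:depth] == path]
        if not children:
            break
        # Hypothetical score: number of query k-mers present in the class bag.
        path = max(children, key=lambda t: len(query & bags[t]))
    return path


# Toy training data (invented for illustration).
records = [
    ("ACGTACGTAAGG", ("O1", "F1", "G1")),
    ("ACGTACGTTTCC", ("O1", "F1", "G2")),
    ("GGGCCCAAATTT", ("O2", "F2", "G3")),
]
bags = build_bags(records, k=4)
print(classify("ACGTACGTAAG", bags, k=4))  # ('O1', 'F1', 'G1')
```

Restricting each rank's candidates to the children of the class chosen one level up is what makes the walk top-down; an error at the order level therefore propagates to family and genus, which is one plausible reason accuracy falls at lower ranks.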

Short term but highly efficient Cas9 expression mediated by excisional system using adenovirus vector and Cre

Scientific Reports ◽

10.1038/s41598-021-03803-w ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Sayaka Nagamoto ◽

Miyuki Agawa ◽

Emi Tsuchitani ◽

Kazunori Akimoto ◽

Saki Kondo Matsushima ◽

...

Keyword(s):

Hepatitis B ◽

Genome Editing ◽

Adenovirus Vector ◽

Virus Genome ◽

Dna Viruses ◽

Tissue Specific ◽

B Virus ◽

Tissue Specific Promoter ◽

A Cell

Abstract Genome editing techniques such as CRISPR/Cas9 have become common gene engineering technologies and have been applied to gene therapy. However, two problems remain: increasing the efficiency of genome editing, and reducing off-target effects, which induce double-stranded breaks at unexpected sites in the genome. In this study, we developed a novel Cas9 transduction system, Exci-Cas9, using an adenovirus vector (AdV). Cas9 was expressed from a circular molecule excised by the site-specific recombinase Cre, which shortened the expression period compared with conventional AdV, which expresses the gene of interest for at least 6 months. As a model target, we chose hepatitis B virus: hepatitis B currently has more than 200 million carriers worldwide and frequently progresses to liver cirrhosis or hepatocellular carcinoma. The efficiencies of hepatitis B virus genome disruption by Exci-Cas9 and by direct Cas9 expression from AdV (Avec) were the same, about 80–90%. Furthermore, Exci-Cas9 enabled cell- or tissue-specific genome editing by expressing Cre from a cell- or tissue-specific promoter. We believe that Exci-Cas9 is useful not only for resolving the persistent expression of Cas9, which has been a problem in genome editing, but also for eliminating persistent DNA viruses such as human papillomavirus.

A61 Large RNA genomes: Is RNA polymerase fidelity enough?

Virus Evolution ◽

10.1093/ve/vez002.060 ◽

2019 ◽

Vol 5 (Supplement_1) ◽

Author(s):

F Ferron ◽

B Canard

Keyword(s):

Rna Polymerase ◽

Rna Synthesis ◽

Rna Virus ◽

Virus Genome ◽

Gene Products ◽

Dna Viruses ◽

Viral Rdrp ◽

Polymerase Fidelity ◽

Mechanistic Basis

Abstract Large-genome nidoviruses and nidovirus-like viruses lie at the current upper boundary of RNA genome size. They encode an unusually large number of gene products, comparable to that of small DNA viruses (e.g. DNA bacteriophages). The order of appearance and distribution of enzyme genes (e.g. helicase and ExoN) across virus families may serve as an evolutionary marker in these large RNA genomes at the genome size boundary. A positive correlation exists between (+)RNA virus genome size and the presence of the RNA helicase and ExoN domains. Although the mechanistic basis for the presence of the helicase is still unclear, ExoN activity has been linked to an RNA synthesis proofreading system. In large Nidovirales, ExoN is bound to a processive replicative RNA-dependent RNA polymerase (RdRp) and corrects mismatched bases during viral RNA synthesis. Over the last decade, our view of the overall process has been refined in coronaviruses, notably by work in our lab (Ferron et al., PNAS, 2018). We have identified genetic markers of large RNA genomes that we intend to use to data-mine existing metagenomic datasets. We have also initiated a collaboration to sequence and explore new viromes, which will be searched according to these criteria. In addition, we have a collection of purified viral RdRps that are being used to generate RNA synthesis products for comparison with existing NGS datasets of the cognate viruses. This should indicate how much genetic diversity is potentially achievable by viral RdRps (‘tunable fidelity’) versus the detectable diversity actually produced (i.e. after selection in the infected cell).

Properties and abundance of overlapping genes in viruses

Virus Evolution ◽

10.1093/ve/veaa009 ◽

2020 ◽

Vol 6 (1) ◽

Cited By ~ 6

Author(s):

Timothy E Schlub ◽

Edward C Holmes

Keyword(s):

Genome Structure ◽

Virus Genome ◽

Overlapping Genes ◽

Genome Database ◽

Dna Viruses ◽

Virus Family ◽

Flexible Genome ◽

Gene Overlap ◽

Reference Genomes ◽

Rna And Dna

Abstract Overlapping genes are commonplace in viruses and play an important role in their function and evolution. However, aside from studies on specific groups of viruses, relatively little is known about the extent and nature of gene overlap and its determinants in viruses as a whole. Here, we present an extensive characterisation of gene overlap in viruses through an analysis of reference genomes present in the NCBI virus genome database. We find that over half the instances of gene overlap are very small, covering <10 nt, and 84 per cent are <50 nt in length. Despite this, 53 per cent of all viruses still contained a gene overlap of 50 nt or larger. We also investigated several predictors of gene overlap, such as genome structure (single- and double-stranded RNA and DNA), virus family, genome length, and genome segmentation. This revealed that gene overlap occurs more frequently in DNA viruses than in RNA viruses, and more frequently in single-stranded viruses than in double-stranded viruses. Genome segmentation is also associated with gene overlap, particularly in single-stranded DNA viruses. Notably, we observed a large range of overlap frequencies across families of all genome types, suggesting that gene overlap is a common evolutionary trait that provides flexible genome structures in all virus families.
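Measuring gene overlap reduces to interval intersection on genome coordinates. A minimal Python sketch, assuming 1-based inclusive (start, end) coordinates on a single linear genome and ignoring strand and circular wrap-around; the helper names and the size bins (mirroring the <10 nt and 50 nt thresholds quoted above) are illustrative, not the authors' pipeline.

```python
def overlap_length(a, b):
    """Overlap in nt between two genes given as 1-based inclusive (start, end)."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


def summarize_overlaps(genes):
    """Return all pairwise overlap lengths (> 0 nt) and counts per size bin."""
    lengths = []
    for i in range(len(genes)):
        for j in range(i + 1, len(genes)):
            n = overlap_length(genes[i], genes[j])
            if n > 0:
                lengths.append(n)
    bins = {
        "<10 nt": sum(n < 10 for n in lengths),
        "10-49 nt": sum(10 <= n < 50 for n in lengths),
        ">=50 nt": sum(n >= 50 for n in lengths),
    }
    return lengths, bins


# Hypothetical gene coordinates on a single linear genome.
genes = [(1, 300), (297, 900), (850, 1200)]
lengths, bins = summarize_overlaps(genes)
print(lengths)  # [4, 51]
print(bins)     # {'<10 nt': 1, '10-49 nt': 0, '>=50 nt': 1}
has_large_overlap = any(n >= 50 for n in lengths)  # True for this toy genome
```

Circular genomes and genes annotated as joined segments would need their coordinates split into linear pieces before this pairwise comparison applies.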

A novel virus genome discovered in an extreme environment suggests recombination between unrelated groups of RNA and DNA viruses

Biology Direct ◽

10.1186/1745-6150-7-13 ◽

2012 ◽

Vol 7 (1) ◽

pp. 13 ◽

Cited By ~ 106

Author(s):

Geoffrey S Diemer ◽

Kenneth M Stedman

Keyword(s):

Extreme Environment ◽

Virus Genome ◽

Dna Viruses ◽

Novel Virus ◽

Rna And Dna

A new full-length virus genome sequencing method reveals that antiviral RNAi changes geminivirus populations in field-grown cassava

10.1101/168724 ◽

2017 ◽

Cited By ~ 1

Author(s):

Devang Mehta ◽

Matthias Hirsch-Hoffmann ◽

Mariam Were ◽

Andrea Patrignani ◽

Hassan Were ◽

...

Keyword(s):

Single Molecule ◽

Deep Sequencing ◽

Cost Effective ◽

Virus Genome ◽

Full Length ◽

Dna Viruses ◽

Circular Dna ◽

Sequencing Technologies ◽

Virus Genomes ◽

And Control

ABSTRACT Deep sequencing of virus isolates using short-read sequencing technologies is problematic, since viruses are often present in complexes sharing a high degree of sequence identity. The full-length genomes of such highly similar viruses cannot be assembled accurately from short sequencing reads. We present a new method, CIDER-Seq (Circular DNA Enrichment Sequencing), which generates accurate full-length virus genomes from individual sequencing reads with no sequence assembly required. CIDER-Seq combines a PCR-free circular DNA enrichment protocol with Single Molecule Real Time sequencing and a new sequence deconcatenation algorithm. We apply our technique to produce more than 1,200 full-length, highly accurate geminivirus genomes from RNAi-transgenic and control plants in a field trial in Kenya. Using CIDER-Seq, we demonstrate for the first time that the expression of antiviral double-stranded RNA (dsRNA) in transgenic plants causes a consistent shift in virus populations towards species sharing low homology with the transgene-derived dsRNA. Our results show that CIDER-Seq is a powerful, cost-effective tool for accurately sequencing circular DNA viruses, with future applications in deep sequencing of other forms of circular DNA such as transposons and plasmids.
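The deconcatenation algorithm itself is not described in this abstract. As a hedged illustration of the general problem only, the Python sketch below splits an error-free read made of tandem copies of a circular genome by finding the next exact occurrence of the read's first few bases. Real SMRT reads contain errors, so a practical tool would need approximate matching; the function names and the `seed_len` parameter are hypothetical.

```python
def estimate_unit_length(read, seed_len=12):
    """Distance to the next exact occurrence of the read's first seed_len bases."""
    seed = read[:seed_len]
    nxt = read.find(seed, 1)
    return nxt if nxt != -1 else None


def deconcatenate(read, seed_len=12):
    """Split a tandem-repeat read into full-length units; drop the partial tail."""
    unit = estimate_unit_length(read, seed_len)
    if unit is None:
        return [read]  # no repeat detected: treat the read as a single unit
    return [read[i:i + unit] for i in range(0, len(read) - unit + 1, unit)]


# Toy concatemer: three copies of an invented 24-nt "genome" plus a partial copy.
genome = "ATGCGTACCGTTAGCATGACCTGA"
read = genome * 3 + genome[:5]
units = deconcatenate(read)
print(len(units), units[0] == genome)  # 3 True
```

Because a circular molecule can be read from any start position, each recovered unit is a rotation of the genome; comparing units (or taking a consensus across them) is what lets a single long read yield an accurate full-length sequence without assembly.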

Probabilistic graph models for landscape genetics

10.7287/peerj.preprints.2225 ◽

2017 ◽

Author(s):

Brook G. Milligan

Keyword(s):

Population Genetics ◽

Landscape Genetics ◽

Genetic Factors ◽

Spatial Interaction ◽

Genomic Variation ◽

Graph Models ◽

Conceptual Foundation ◽

Probabilistic Graph ◽

Model Structures ◽

Flexible Models

Landscape genetics combines population genetics, landscape ecology, and spatial analysis to identify landscape and genetic factors that influence genetic and genomic variation. Progress in the field depends on a strong conceptual foundation and on the means to identify mechanistic connections between environmental factors, landscape features, and genetic or genomic variation. Many existing approaches, and much of the software commonly in use, were developed for population genetics or statistics and are not entirely appropriate for landscape genetics. Probabilistic graph models provide a statistically rigorous and flexible means of constructing models directly applicable to landscape genetics. Probabilistic graph models also allow the construction of mechanistic models, which are crucial for testing hypotheses. Sophisticated software exists for the analysis of graph models; however, much of it does not handle the types of data used in landscape genetics, model structures involving autoregressive spatial interaction between variables, or the scale of landscape genetics problems. Thus, an important priority for the field is to develop suitably flexible software tools for graph models that overcome these problems and allow landscape geneticists to explore meaningful mechanistic and flexible models. We are developing such a library and applying it to examples in landscape genetics.

Epidemiological associations with genomic variation in SARS-CoV-2

10.21203/rs.3.rs-537082/v1 ◽

2021 ◽

Author(s):

Ali Rahnavard ◽

Rebecca Clement ◽

Nathaniel Stearrett ◽

Marcos Pérez-Losada ◽

Keith A. Crandall ◽

...

Keyword(s):

Severe Acute Respiratory Syndrome ◽

Nonstructural Protein ◽

Protein S ◽

Disease Status ◽

Virus Genome ◽

Genomic Variation ◽

Spike Protein ◽

Genome Variation ◽

Host Sex ◽

Novel Coronavirus

Abstract The 2019 novel coronavirus (SARS-CoV-2) is the etiological agent of the COVID-19 pandemic and evolves to evade both host immune systems and intervention strategies. To help diminish the short- and long-term impacts of the coronavirus (CoV), we investigated CoV differences at the nucleotide and protein levels, and CoV genomic variation associated with epidemiological variation and geography. For this analysis, we divided the CoV genome into 29 constituent regions. Our results highlight variation among CoV lineages and show that nonstructural protein 3 (nsp3) and the Spike protein (S) have the highest variation and the greatest correlation with whole-genome variation, which makes these two proteins potential targets for treatments. S protein variation is highly correlated with that of nsp3, nsp6, and the 3′-to-5′ exonuclease. Country of origin and time since the start of the pandemic were the most influential metadata for these differences, whereas host sex and age explained the least virus genome variation. We also quantified the variation explained by regions of the CoV genome across different coronaviruses, including SARS-CoV-2, Middle East respiratory syndrome coronavirus (MERS-CoV), other severe acute respiratory syndrome-related coronaviruses (SARS-CoV), and bat-derived SARS-like coronaviruses (Bat-SL-CoV). We found that the Spike protein and nsp3 explain most of the variation among these viruses; they are also among the genomic regions with the highest number of sites under natural selection. Our results provide a direction for prioritizing genes associated with outcome predictors, including health, therapeutic, and vaccine outcomes, and for informing improved DNA tests for predicting disease status.

Empirical estimates of the mutation rate for an alphabaculovirus

10.1101/2021.09.07.459225 ◽

2021 ◽

Author(s):

Dieke Boezen ◽

Ghulam Ali ◽

Manli Wang ◽

Xi Wang ◽

Wopke van der Werf ◽

...

Keyword(s):

Mutation Rate ◽

Virus Genome ◽

Good Evidence ◽

Viral Fitness ◽

Mutation Rates ◽

Replication Machinery ◽

Dna Viruses ◽

Genome Data ◽

Empirical Estimates ◽

Large Genomes

Abstract Mutation rates are of key importance for understanding evolutionary processes and predicting their outcomes. Empirical estimates of mutation rate are available for a number of RNA viruses, but few are available for DNA viruses, which tend to have larger genomes. Whilst some viruses have very high mutation rates, lower mutation rates are expected for viruses with large genomes to ensure genome integrity. Alphabaculoviruses are insect viruses with large genomes that often have high levels of polymorphism, suggesting high mutation rates despite evidence of proofreading activity by the replication machinery. Here, we report an empirical estimate of the mutation rate per base per strand copying (s/n/r) of Autographa californica multiple nucleopolyhedrovirus (AcMNPV). To avoid biases due to selection, we analyzed mutations that occurred in a stable, non-functional genomic insert after five serial passages in Spodoptera exigua larvae. Population bottlenecks, the viral mode of replication and thresholds for mutation detection are likely to affect mutation rate estimates, so we used population genetic models that account for these processes to infer the mutation rate. We estimated a mutation rate of 1 × 10⁻⁷ s/n/r. This estimate was not sensitive to different model assumptions or to the inclusion of whole-genome data. The rates at which different classes of mutations accumulate provide good evidence for the neutrality of mutations occurring within the inserted region. We therefore present a robust approach for estimating mutation rates of viruses with stable genomes, and strong evidence that the alphabaculovirus mutation rate is much lower than supposed from the high levels of polymorphism observed.

Author Summary
Virus populations can evolve rapidly, driven by the large number of mutations that occur during virus replication. It is challenging to measure mutation rates because selection affects which mutations are observed: beneficial mutations are overrepresented in virus populations, while deleterious mutations are selected against and therefore underrepresented. Few mutation rates have been estimated for viruses with large DNA genomes, and there are no estimates for any insect virus. Here, we estimate the mutation rate for an alphabaculovirus, a virus that infects caterpillars and has a large, 134-kilobase-pair (kbp) DNA genome. To ensure that selection did not bias our estimate, we studied mutations that occurred in a large artificial region inserted into the virus genome, where mutations did not affect viral fitness. We deep sequenced evolved virus populations and compared the distribution of observed mutants with predictions from a simulation model to estimate the mutation rate. We found evidence for a relatively low mutation rate of one mutation in every 10 million bases replicated. This estimate is in line with expectations for a virus with self-correcting replication machinery and a large genome.
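As a back-of-the-envelope check, the estimated rate can be combined with the ~134 kbp genome size quoted above to give the expected number of new mutations per genome copy. The Poisson model for the probability of at least one mutation is an illustrative assumption, not part of the study, and the constant names are hypothetical.

```python
import math

MU = 1e-7            # mutations per base per strand copying (s/n/r), as estimated above
GENOME_BP = 134_000  # approximate AcMNPV genome size (134 kbp)

# Mean number of new mutations per genome copy.
expected = MU * GENOME_BP
# Assumption: mutations arise independently, so counts per copy are Poisson.
p_at_least_one = 1 - math.exp(-expected)

print(f"expected mutations per genome copy: {expected:.4f}")  # 0.0134
print(f"P(>=1 mutation per copy): {p_at_least_one:.4f}")      # 0.0133
```

Under these assumptions, only about 1 in 75 genome copies would carry a new mutation, which is consistent with the summary's point that the rate is low for a virus with proofreading replication machinery.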

Probabilistic graph models for landscape genetics

10.7287/peerj.preprints.2225v5 ◽

2017 ◽

Author(s):

Brook G. Milligan

Keyword(s):

Population Genetics ◽

Landscape Genetics ◽

Genetic Factors ◽

Spatial Interaction ◽

Genomic Variation ◽

Graph Models ◽

Conceptual Foundation ◽

Probabilistic Graph ◽

Model Structures ◽

Flexible Models

Landscape genetics combines population genetics, landscape ecology, and spatial analysis to identify landscape and genetic factors that influence genetic and genomic variation. Progress in the field depends on a strong conceptual foundation and on the means to identify mechanistic connections between environmental factors, landscape features, and genetic or genomic variation. Many existing approaches, and much of the software commonly in use, were developed for population genetics or statistics and are not entirely appropriate for landscape genetics. Probabilistic graph models provide a statistically rigorous and flexible means of constructing models directly applicable to landscape genetics. Probabilistic graph models also allow the construction of mechanistic models, which are crucial for testing hypotheses. Sophisticated software exists for the analysis of graph models; however, much of it does not handle the types of data used in landscape genetics, model structures involving autoregressive spatial interaction between variables, or the scale of landscape genetics problems. Thus, an important priority for the field is to develop suitably flexible software tools for graph models that overcome these problems and allow landscape geneticists to explore meaningful mechanistic and flexible models. We are developing such a library and applying it to examples in landscape genetics.
