Comparative analysis of gene prediction tools for viral genome annotation

2021 ◽  
Author(s):  
Enrique González-Tortuero ◽  
Revathy Krishnamurthi ◽  
Heather E. Allison ◽  
Ian B. Goodhead ◽  
Chloe E. James

The number of newly available viral genomes and metagenomes has increased exponentially since the development of high throughput sequencing platforms and genome analysis tools. Bioinformatic annotation pipelines are largely based on open reading frame (ORF) calling software, which identifies genes independently of the taxonomic background of the sequence. Although ORF-calling programs provide a rapid genome annotation, they can misidentify ORFs and start codons, errors that might be perpetuated and propagated over time. This study evaluated the performance of multiple ORF-calling programs for viral genome annotation against the complete RefSeq viral database. Program outputs varied when considering the viral nucleic acid type versus the viral host. According to the number of ORFs, Prodigal and Metaprodigal were the most accurate programs for DNA viruses, while FragGeneScan and Prodigal generated the most accurate outputs for RNA viruses. Similarly, Prodigal outperformed the benchmark for viruses infecting prokaryotes, and GLIMMER and GeneMarkS produced the most accurate annotations for viruses infecting eukaryotes. When the coordinates of the ORFs were considered, Prodigal scored high for all scenarios except for RNA viruses, where GeneMarkS generated the most reliable results. Overall, the quality of the coordinates predicted for RNA viruses was poorer than for DNA viruses, suggesting the need for improved ORF-calling programs to deal with RNA viruses. Moreover, none of the ORF-calling programs reached 90% accuracy for annotation of DNA viruses. Any automatic annotation can still be improved by manual curation, especially when the presence of ORFs is validated with wet-lab experiments. However, our evaluation of the current ORF-calling programs is expected to be useful for the improvement of viral genome annotation pipelines and highlights the need for more expression data to improve the rigor of reference genomes.
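The two evaluation criteria in this abstract (number of ORFs versus ORF coordinates) can be illustrated with a minimal Python sketch. It does not reproduce the method of any of the benchmarked programs: `find_orfs` is a naive forward-strand ATG-to-stop scanner, and `score_calls` is a hypothetical helper that separates exact coordinate matches from calls that share a stop codon with a reference ORF but begin at a different start codon. The function names, the `min_len` threshold, and the toy sequence are all assumptions made for illustration.

```python
START = "ATG"
STOPS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_len=30):
    """Return sorted (start, end) pairs, 0-based half-open, for forward-strand ORFs.

    Naive rule (an assumption, not a published algorithm): in each of the
    three frames, an ORF runs from the first ATG to the next in-frame stop.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == START:
                start = i
            elif start is not None and codon in STOPS:
                end = i + 3  # include the stop codon
                if end - start >= min_len:
                    orfs.append((start, end))
                start = None
    return sorted(orfs)


def score_calls(predicted, reference):
    """Compare predicted ORFs with reference ORFs by coordinates.

    'exact' counts calls whose start and end both match a reference ORF;
    'stop_only' counts calls with a matching end but a different start,
    i.e. a likely start-codon misidentification.
    """
    ref_start_by_end = {end: start for start, end in reference}
    exact = sum(1 for s, e in predicted if ref_start_by_end.get(e) == s)
    stop_only = sum(
        1 for s, e in predicted
        if e in ref_start_by_end and ref_start_by_end[e] != s
    )
    return {
        "exact": exact,
        "stop_only": stop_only,
        "predicted": len(predicted),
        "reference": len(reference),
    }


# Toy sequence: one ORF (ATG AAA TTT GGG TAA) starting at position 2.
toy = "CCATGAAATTTGGGTAACC"
print(find_orfs(toy, min_len=9))  # [(2, 17)]
```

Comparing counts alone (`predicted` versus `reference`) can hide start-codon errors, which is why the sketch reports coordinate agreement separately.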

Viruses ◽  
2021 ◽  
Vol 13 (5) ◽  
pp. 779
Author(s):  
Man Teng ◽  
Yongxiu Yao ◽  
Venugopal Nair ◽  
Jun Luo

In recent years, the CRISPR/Cas9-based gene-editing techniques have been well developed and applied widely in several aspects of research in the biological sciences, in many species, including humans, animals, plants, and even in viruses. Modification of the viral genome is crucial for revealing gene function, virus pathogenesis, gene therapy, genetic engineering, and vaccine development. Herein, we have provided a brief review of the different technologies for the modification of the viral genomes. Particularly, we have focused on the recently developed CRISPR/Cas9-based gene-editing system, detailing its origin, functional principles, and touching on its latest achievements in virology research and applications in vaccine development, especially in large DNA viruses of humans and animals. Future prospects of CRISPR/Cas9-based gene-editing technology in virology research, including the potential shortcomings, are also discussed.


2021 ◽  
Author(s):  
Hanna Retallack ◽  
Katerina D. Popova ◽  
Matthew T. Laurie ◽  
Sara Sunshine ◽  
Joseph L. DeRisi

Narnaviruses are RNA viruses detected in diverse fungi, plants, protists, arthropods and nematodes. Though initially described as simple single-gene non-segmented viruses encoding RNA-dependent RNA polymerase (RdRp), a subset of narnaviruses referred to as “ambigrammatic” harbor a unique genomic configuration consisting of overlapping open reading frames (ORFs) encoded on opposite strands. Phylogenetic analysis supports selection to maintain this unusual genome organization, but functional investigations are lacking. Here, we establish the mosquito-infecting Culex narnavirus 1 (CxNV1) as a model to investigate the functional role of overlapping ORFs in narnavirus replication. In CxNV1, a reverse ORF without homology to known proteins covers nearly the entire 3.2 kb segment encoding the RdRp. Additionally, two opposing and nearly completely overlapping novel ORFs are found on the second putative CxNV1 segment, the 0.8 kb “Robin” RNA. We developed a system to launch CxNV1 in a naïve mosquito cell line, then showed that functional RdRp is required for persistence of both segments, and an intact reverse ORF is required on the RdRp segment for persistence. Mass spectrometry of persistently CxNV1-infected cells provided evidence for translation of this reverse ORF. Finally, ribosome profiling yielded a striking pattern of footprints for all four CxNV1 RNA strands that was distinct from actively-translating ribosomes on host mRNA or co-infecting RNA viruses. Taken together, these data raise the possibility that the process of translation itself is important for persistence of ambigrammatic narnaviruses, potentially by protecting viral RNA with ribosomes, thus suggesting a heretofore undescribed viral tactic for replication and transmission. IMPORTANCE Fundamental to our understanding of RNA viruses is a description of which strand(s) of RNA are transmitted as the viral genome, relative to which encode the viral proteins. Ambigrammatic narnaviruses break the mold. 
These viruses, found broadly in fungi, plants, and insects, have the unique feature of two overlapping genes encoded on opposite strands, comprising nearly the full length of the viral genome. Such extensive overlap is not seen in other RNA viruses, and comes at the cost of reduced evolutionary flexibility in the sequence. The present study is motivated by investigating the benefits which balance that cost. We show for the first time a functional requirement for the ambigrammatic genome configuration in Culex narnavirus 1, which suggests a model for how translation of both strands might benefit this virus. Our work highlights a new blueprint for viral persistence, distinct from strategies defined by canonical definitions of the coding strand.
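As a rough illustration of what "ambigrammatic" means computationally, the sketch below measures the longest stretch without an in-frame stop codon on a sequence and on its reverse complement. This is only a toy check under stated assumptions: the sequence is written in DNA letters (as cDNA), the function names are hypothetical, and the stop-free run ignores start codons. It is not the analysis used in the study.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
STOPS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA-letter sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def longest_stop_free_run(seq):
    """Longest stretch (in nucleotides) with no in-frame stop, over all three frames."""
    seq = seq.upper()
    best = 0
    for frame in range(3):
        run = 0
        for i in range(frame, len(seq) - 2, 3):
            if seq[i:i + 3] in STOPS:
                run = 0  # a stop codon ends the current open stretch
            else:
                run += 3
                best = max(best, run)
    return best


def open_fraction_both_strands(seq):
    """Fraction of the sequence covered by the longest open stretch on each strand."""
    n = len(seq)
    forward = longest_stop_free_run(seq) / n
    reverse = longest_stop_free_run(reverse_complement(seq)) / n
    return forward, reverse


# Toy sequence with no stop codons on either strand in any frame.
print(open_fraction_both_strands("GCC" * 10))  # (1.0, 1.0)
```

A sequence in which both fractions approach 1.0 has near full-length open frames on opposite strands, which is the configuration the abstract describes; most sequences show a much lower value on at least one strand.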


2019 ◽  
Vol 6 (1) ◽  
pp. 275-296 ◽  
Author(s):  
Tami L. Coursey ◽  
Alison A. McBride

Persistent viral infections require a host cell reservoir that maintains functional copies of the viral genome. To this end, several DNA viruses maintain their genomes as extrachromosomal DNA minichromosomes in actively dividing cells. These viruses typically encode a viral protein that binds specifically to viral DNA genomes and tethers them to host mitotic chromosomes, thus enabling the viral genomes to hitchhike or piggyback into daughter cells. Viruses that use this tethering mechanism include papillomaviruses and the gammaherpesviruses Epstein-Barr virus and Kaposi's sarcoma-associated herpesvirus. This review describes the advantages and consequences of persistent extrachromosomal viral genome replication.


2020 ◽  
Author(s):  
Allison L. Didychuk ◽  
Stephanie N. Gates ◽  
Matthew R. Gardner ◽  
Lisa M. Strong ◽  
Andreas Martin ◽  
...  

Genome packaging in large double-stranded DNA viruses requires a powerful molecular motor to force the viral genome into nascent capsids. This process appears mechanistically similar in two evolutionarily distant viruses, the herpesviruses and the tailed bacteriophages, which infect different kingdoms of life. While the motor and mechanism as a whole are thought to be conserved, accessory factors that influence packaging are divergent and poorly understood, despite their essential roles. An accessory factor required for herpesviral packaging is encoded by ORF68 in the oncogenic virus Kaposi’s sarcoma-associated herpesvirus (KSHV), whose homolog in Epstein Barr Virus (EBV) is BFLF1. Here, we present structures of both KSHV ORF68 and EBV BFLF1, revealing that these proteins form a highly similar homopentameric ring. The central channel of this ring is positively charged, and we demonstrate that this region of KSHV ORF68 binds double-stranded DNA. Mutation of individual positively charged residues within but not outside the channel ablates DNA binding, and in the context of KSHV infection these mutants fail to package the viral genome or produce progeny virions. Thus, we propose a model in which ORF68 facilitates the transfer of newly replicated viral genomes to the packaging motor.


2020 ◽  
Author(s):  
Hanna Retallack ◽  
Katerina D. Popova ◽  
Matthew T. Laurie ◽  
Sara Sunshine ◽  
Joseph L. DeRisi

ABSTRACT Narnaviruses are RNA viruses detected in diverse fungi, plants, protists, arthropods and nematodes. Though initially described as simple single-gene non-segmented viruses encoding RNA-dependent RNA polymerase (RdRp), a subset of narnaviruses referred to as “ambigrammatic” harbor a unique genomic configuration consisting of overlapping open reading frames (ORFs) encoded on opposite strands. Phylogenetic analysis supports selection to maintain this unusual genome organization, but functional investigations are lacking. Here, we establish the mosquito-infecting Culex narnavirus 1 (CxNV1) as a model to investigate the functional role of overlapping ORFs in narnavirus replication. In CxNV1, a reverse ORF without homology to known proteins covers nearly the entire 3.2 kb segment encoding the RdRp. Additionally, two opposing and nearly completely overlapping novel ORFs are found on the second putative CxNV1 segment, the 0.8 kb “Robin” RNA. We developed a system to launch CxNV1 in a naïve mosquito cell line, then showed that functional RdRp is required for persistence of both segments, and an intact reverse ORF is required on the RdRp segment for persistence. Mass spectrometry of persistently CxNV1-infected cells provided evidence for translation of this reverse ORF. Finally, ribosome profiling yielded a striking pattern of footprints for all four CxNV1 RNA strands that was distinct from actively-translating ribosomes on host mRNA or co-infecting RNA viruses. Taken together, these data raise the possibility that the process of translation itself is important for persistence of ambigrammatic narnaviruses, potentially by protecting viral RNA with ribosomes, thus suggesting a heretofore undescribed viral tactic for replication and transmission. IMPORTANCE Fundamental to our understanding of RNA viruses is a description of which strand(s) of RNA are transmitted as the viral genome, relative to which encode the viral proteins. Ambigrammatic narnaviruses break the mold. 
These viruses, found broadly in fungi, plants, and insects, have the unique feature of two overlapping genes encoded on opposite strands, comprising nearly the full length of the viral genome. Such extensive overlap is not seen in other RNA viruses, and comes at the cost of reduced evolutionary flexibility in the sequence. The present study is motivated by investigating the benefits which balance that cost. We show for the first time a functional requirement for the ambigrammatic genome configuration in Culex narnavirus 1, which suggests a model for how translation of both strands might benefit this virus. Our work highlights a new blueprint for viral persistence, distinct from strategies defined by canonical definitions of the coding strand.


2019 ◽  
Author(s):  
J. Pace ◽  
K. Youens-Clark ◽  
C. Freeman ◽  
B. Hurwitz ◽  
K. Van Doorslaer

ABSTRACT High-throughput sequencing technologies provide unprecedented power to identify novel viruses from a wide variety of (environmental) samples. The field of ‘viral metagenomics’ has dramatically expanded our understanding of viral diversity. Viral metagenomic approaches imply that many novel viruses will not be described by researchers who are experts on the genomic organization of that virus. There is a need to develop analytical approaches to reconstruct, annotate, and classify viral genomes. We have developed the papillomavirus annotation tool (PuMA) to provide researchers with a convenient and reproducible method to annotate novel papillomaviruses. PuMA provides an accessible method for automated papillomavirus genome annotation. PuMA currently has a 98% accuracy when benchmarked against the 481 reference genomes in the papillomavirus episteme (PaVE). Finally, PuMA was used to annotate 168 newly isolated papillomaviruses, and successfully annotated 1424 viral features. To demonstrate its general applicability, we developed a version of PuMA that can annotate polyomaviruses. PuMA is available on GitHub (https://github.com/KVD-lab/puma) and through the iMicrobe online environment (https://www.imicrobe.us/#/apps/puma).


2012 ◽  
Vol 86 (18) ◽  
pp. 10036-10046 ◽  
Author(s):  
Virginie Sauvage ◽  
Meriadeg Ar Gouilh ◽  
Justine Cheval ◽  
Erika Muth ◽  
Kevin Pariente ◽  
...  

During a study of the fecal microbiomes from two healthy piglets using high-throughput sequencing (HTS), we identified a viral genome containing an open reading frame encoding a predicted polyprotein of 2,133 amino acids. This novel viral genome displayed the typical organization of picornaviruses, containing three structural proteins (VP0, VP3, and VP1), followed by seven nonstructural proteins (2A, 2B, 2C, 3A, 3B, 3Cpro, and 3Dpol). Given its particular relationship with Parechovirus, we propose to name it “Pasivirus” for Parecho sister clade virus, with “Swine pasivirus 1” (SPaV1) as the type species. Fecal samples collected at an industrial farm from healthy sows and piglets from the same herd (25 and 75, respectively) with ages ranging from 4 to 28 weeks were analyzed for the presence of SPaV1 by one-step reverse transcription (RT)-PCR targeting a 3D region of 151 bp. SPaV1 was detected in fecal samples from 51/75 healthy piglets (68% of the animals) and in none of the 25 fecal samples from healthy sows, indicating that SPaV1 circulates through enteric infection of healthy piglets. We propose that SPaV1 represents the first member of a novel Picornaviridae genus related to parechoviruses.


mBio ◽  
2014 ◽  
Vol 5 (3) ◽  
Author(s):  
Jason T. Ladner ◽  
Brett Beitzel ◽  
Patrick S. G. Chain ◽  
Matthew G. Davenport ◽  
Eric Donaldson ◽  
...  

ABSTRACT Thanks to high-throughput sequencing technologies, genome sequencing has become a common component in nearly all aspects of viral research; thus, we are experiencing an explosion in both the number of available genome sequences and the number of institutions producing such data. However, there are currently no common standards used to convey the quality, and therefore utility, of these various genome sequences. Here, we propose five “standard” categories that encompass all stages of viral genome finishing, and we define them using simple criteria that are agnostic to the technology used for sequencing. We also provide genome finishing recommendations for various downstream applications, keeping in mind the cost-benefit trade-offs associated with different levels of finishing. Our goal is to define a common vocabulary that will allow comparison of genome quality across different research groups, sequencing platforms, and assembly techniques.


2021 ◽  
Author(s):  
Adrian A. Pater ◽  
Michael S. Bosmeny ◽  
Mansi Parasrampuria ◽  
Seth B. Eddington ◽  
Katy N. Ovington ◽  
...  

ABSTRACT In late 2019, a novel coronavirus began spreading in Wuhan, China, causing a potentially lethal respiratory viral infection. By early 2020, the novel coronavirus, called SARS-CoV-2, had spread globally, causing the COVID-19 pandemic. The infection and mutation rates of SARS-CoV-2 make it amenable to tracking movement and evolution by viral genome sequencing. Efforts to develop effective public health policies, therapeutics, or vaccines to treat or prevent COVID-19 are also expected to benefit from tracking mutations of the SARS-CoV-2 virus. Here we describe a set of comprehensive working protocols, from viral RNA extraction to analysis using online visualization tools, for high throughput sequencing of SARS-CoV-2 viral genomes using a MinION instrument. This set of protocols should serve as a reliable ‘how-to’ reference for generating quality SARS-CoV-2 genome sequences with ARTIC primer sets and next-generation nanopore sequencing technology. In addition, many of the preparation, quality control, and analysis steps will be generally applicable to other sequencing platforms.


2018 ◽  
Author(s):  
Enrique González-Tortuero ◽  
Thomas David Sean Sutton ◽  
Vimalkumar Velayudhan ◽  
Andrey Nikolaevich Shkoporov ◽  
Lorraine Anne Draper ◽  
...  

Abstract Viral (meta)genomics is a rapidly growing field of study that is hampered by an inability to annotate the majority of viral sequences; therefore, the development of new bioinformatic approaches is very important. Here, we present a new automatic de novo genome annotation pipeline, called VIGA, to annotate prokaryotic and eukaryotic viral sequences from (meta)genomic studies. VIGA was benchmarked on a database of known viral genomes and a viral metagenomics case study. VIGA generated the most accurate outputs according to the number of coding sequences and their coordinates; its outputs also had a lower number of non-informative annotations compared to other programs.

