viral annotation
Recently Published Documents


TOTAL DOCUMENTS

4
(FIVE YEARS 3)

H-INDEX

1
(FIVE YEARS 1)

2020 ◽  
Vol 6 (2) ◽  
Author(s):  
Josh Pace ◽  
Ken Youens-Clark ◽  
Cordell Freeman ◽  
Bonnie Hurwitz ◽  
Koenraad Van Doorslaer

Abstract High-throughput sequencing technologies provide unprecedented power to identify novel viruses from a wide variety of (environmental) samples. The field of ‘viral metagenomics’ has dramatically expanded our understanding of viral diversity. Viral metagenomic approaches imply that many novel viruses will not be described by researchers who are experts on (the genomic organization of) that virus family. We have developed the papillomavirus annotation tool (PuMA) to provide researchers with a convenient and reproducible method to annotate and report novel papillomaviruses. PuMA currently correctly annotates 99% of the papillomavirus genes when benchmarked against the 655 reference genomes in the papillomavirus episteme. Compared to another viral annotation pipeline, PuMA annotates more viral features while being more accurate. To demonstrate its general applicability, we also developed a preliminary version of PuMA that can annotate polyomaviruses. PuMA is available on GitHub (https://github.com/KVD-lab/puma) and through the iMicrobe online environment (https://www.imicrobe.us/#/apps/puma).


2019 ◽  
Author(s):  
Alejandro A Schäffer ◽  
Eneida L Hatcher ◽  
Linda Yankie ◽  
Lara Shonkwiler ◽  
J Rodney Brister ◽  
...  

AbstractBackgroundGenBank contains over 3 million viral sequences. The National Center for Biotechnology Information (NCBI) previously made available a tool for validating and annotating influenza virus sequences that is used to check submissions to GenBank. Before this project, there was no analogous tool in use for non-influenza viral sequence submissions.ResultsWe developed a system called VADR (Viral Annotation DefineR) that validates and annotates viral sequences in GenBank submissions. The annotation system is based on the analysis of the input nucleotide sequence using models built from curated RefSeqs. Hidden Markov models are used to classify sequences by determining the RefSeq they are most similar to, and feature annotation from the RefSeq is mapped based on a nucleotide alignment of the full sequence to a covariance model. Predicted proteins encoded by the sequence are validated with nucleotide-to-protein alignments using BLAST. The system identifies 43 types of “alerts” that (unlike the previous BLAST-based system) provide deterministic and rigorous feedback to researchers who submit sequences with unexpected characteristics. VADR has been integrated into GenBank’s submission processing pipeline allowing for viral submissions passing all tests to be accepted and annotated automatically, without the need for any human (GenBank indexer) intervention. Unlike the previous submission-checking system, VADR is freely available (https://github.com/nawrockie/vadr) for local installation and use. VADR has been used for Norovirus submissions since May 2018 and for Dengue virus submissions since January 2019. Other viruses with high numbers of submissions will be added incrementally.ConclusionVADR improves the speed with which non-flu virus submissions to GenBank can be checked and improves the content and quality of the GenBank annotations. The availability and portability of the software allow researchers to run the GenBank checks prior to submitting their viral sequences, and thereby gain confidence that their submissions will be accepted immediately without the need to correspond with GenBank staff. Reciprocally, the adoption of VADR frees GenBank staff to spend more time on services other than checking routine viral sequence submissions.


2019 ◽  
Vol 20 (1) ◽  
Author(s):  
Ryan C. Shean ◽  
Negar Makhsous ◽  
Graham D. Stoddard ◽  
Michelle J. Lin ◽  
Alexander L. Greninger

2018 ◽  
Author(s):  
Ryan C. Shean ◽  
Negar Makhsous ◽  
Graham D. Stoddard ◽  
Michelle J. Lin ◽  
Alexander L. Greninger

AbstractBackgroundWith sequencing technologies becoming cheaper and easier to use, more groups are able to obtain whole genome sequences of viruses of public health and scientific importance. Submission of genomic data to NCBI GenBank is a requirement prior to publication and plays a critical role in making scientific data publicly available.GenBank currently has automatic prokaryotic and eukaryotic genome annotation pipelines but has no viral annotation pipeline beyond influenza virus. Annotation and submission of viral genome sequence is a non-trivial task, especially for groups that do not routinely interact with GenBank for data submissions.ResultsWe present Viral Annotation Pipeline and iDentification (VAPiD), a portable and lightweight command-line tool for annotation and GenBank deposition of viral genomes. VAPiD supports annotation of nearly all unsegmented viral genomes. The pipeline has been validated on human immunodeficiency virus, human parainfluenza virus 1-4, human metapneumovirus, human coronaviruses (229E/OC43/NL63/HKU1/SARS/MERS), human enteroviruses/rhinoviruses, measles virus, mumps virus, Hepatitis A-E Virus, Chikungunya virus, dengue virus, and West Nile virus, as well the human polyomaviruses BK/JC/MCV, human adenoviruses, and human papillomaviruses. The program can handle individual or batch submissions of different viruses to GenBank and correctly annotates multiple viruses, including those that contain ribosomal slippage or RNA editing without prior knowledge of the virus to be annotated. VAPiD is programmed in Python and is compatible with Windows, Linux, and Mac OS systems.ConclusionsWe have created a portable, lightweight, user-friendly, internet-enabled, open-source, command-line genome annotation and submission package to facilitate virus genome submissions to NCBI GenBank. Instructions for downloading and installing VAPiD can be found athttps://github.com/rcs333/VAPiD.


Sign in / Sign up

Export Citation Format

Share Document