VirSorter2: a multi-classifier, expert-guided approach to detect diverse DNA and RNA viruses

Microbiome ◽  
2021 ◽  
Vol 9 (1) ◽  
Author(s):  
Jiarong Guo ◽  
Ben Bolduc ◽  
Ahmed A. Zayed ◽  
Arvind Varsani ◽  
Guillermo Dominguez-Huerta ◽  
...  

Background: Viruses are a significant player in many biosphere and human ecosystems, but most signals remain “hidden” in metagenomic/metatranscriptomic sequence datasets due to the lack of universal gene markers and database representatives, and to insufficiently advanced identification tools. Results: Here, we introduce VirSorter2, a DNA and RNA virus identification tool that leverages genome-informed database advances across a collection of customized automatic classifiers to improve the accuracy and range of virus sequence detection. When benchmarked against genomes from both isolated and uncultivated viruses, VirSorter2 uniquely performed consistently with high accuracy (F1-score > 0.8) across viral diversity, while all other tools under-detected viruses outside of the group most represented in reference databases (i.e., those in the order Caudovirales). Among the tools evaluated, VirSorter2 was also uniquely able to minimize errors associated with atypical cellular sequences including eukaryotic genomes and plasmids. Finally, as the virosphere exploration unravels novel viral sequences, VirSorter2’s modular design makes it inherently able to expand to new types of viruses via the design of new classifiers to maintain maximal sensitivity and specificity. Conclusion: With its multi-classifier and modular design, VirSorter2 demonstrates higher overall accuracy across major viral groups and will advance our knowledge of virus evolution, diversity, and virus-microbe interaction in various ecosystems. Source code of VirSorter2 is freely available (https://bitbucket.org/MAVERICLab/virsorter2), and VirSorter2 is also available both on bioconda and as an iVirus app on CyVerse (https://de.cyverse.org/de).
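The F1-score quoted in the abstract is the harmonic mean of precision and recall. The sketch below shows how such a score could be computed from benchmark counts; the function name and the counts are hypothetical illustrations and are not taken from the VirSorter2 benchmark.

```python
def f1_score(true_pos: int, false_pos: int, false_neg: int) -> float:
    """Harmonic mean of precision and recall, computed from raw counts."""
    precision = true_pos / (true_pos + false_pos) if (true_pos + false_pos) else 0.0
    recall = true_pos / (true_pos + false_neg) if (true_pos + false_neg) else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


# Hypothetical counts: 80 viral contigs detected, 10 false calls, 30 missed.
example_f1 = f1_score(80, 10, 30)
print(f"F1 = {example_f1:.3f}")
```

A tool that misses many viruses outside a well-represented group has low recall on that group, which pulls its F1-score down even when precision stays high.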

2021 ◽  
Author(s):  
Benbo Gao ◽  
Jing Zhu ◽  
Soumya Negi ◽  
Xinmin Zhang ◽  
Stefka Gyoneva ◽  
...  

Summary: We developed Quickomics, a feature-rich R Shiny-powered tool to enable biologists to fully explore complex omics data and perform advanced analysis in an easy-to-use interactive interface. It covers a broad range of secondary and tertiary analytical tasks after primary analysis of omics data is completed. Each functional module is equipped with customized configurations and generates both interactive and publication-ready high-resolution plots to uncover biological insights from data. The modular design makes the tool extensible with ease. Availability: Researchers can experience the functionalities with their own data or demo RNA-Seq and proteomics data sets by using the app hosted at http://quickomics.bxgenomics.com and following the tutorial, https://bit.ly/3rXIyhL. The source code under the GPLv3 license is provided at https://github.com/interactivereport/[email protected], [email protected]. Supplementary information: Supplementary materials are available at https://bit.ly/37HP17g.


PeerJ ◽  
2019 ◽  
Vol 6 ◽  
pp. e6216 ◽  
Author(s):  
Kishor Dhaygude ◽  
Helena Johansson ◽  
Jonna Kulmuni ◽  
Liselotte Sundström

We present the genome organization and molecular characterization of the three Formica exsecta viruses, along with ORF predictions, and functional annotation of genes. The Formica exsecta virus-4 (FeV4; GenBank ID: MF287670) is a newly discovered negative-sense single-stranded RNA virus representing the first identified member of order Mononegavirales in ants, whereas the Formica exsecta virus-1 (FeV1; GenBank ID: KF500001), and the Formica exsecta virus-2 (FeV2; GenBank ID: KF500002) are positive single-stranded RNA viruses initially identified (but not characterized) in our earlier study. The new virus FeV4 was found by re-analyzing data from a study published earlier. The Formica exsecta virus-4 genome is 9,866 bp in size, with an overall G + C content of 44.92%, and containing five predicted open reading frames (ORFs). Our bioinformatics analysis indicates that gaps are absent and the ORFs are complete, which based on our comparative genomics analysis suggests that the genomes are complete. Following the characterization, we validate virus infection for FeV1, FeV2 and FeV4 for the first time in field-collected worker ants. Some colonies were infected by multiple viruses, and the viruses were observed to infect all castes, and multiple life stages of workers and queens. Finally, highly similar viruses were expressed in adult workers and queens of six other Formica species: F. fusca, F. pressilabris, F. pratensis, F. aquilonia, F. truncorum and F. cinerea. This research indicates that viruses can be shared between ant species, but further studies on viral transmission are needed to understand viral infection pathways.


1977 ◽  
Vol 13 (7) ◽  
pp. 713-720 ◽  
Author(s):  
Dharam V. Ablashi ◽  
Daniel R. Twardzik ◽  
John M. Easton ◽  
Gary R. Armstrong ◽  
Josef Luetzeler ◽  
...  

1984 ◽  
Vol 81 (11) ◽  
pp. 3263-3267 ◽  
Author(s):  
G. Sauer ◽  
E. Amtmann ◽  
K. Melber ◽  
A. Knapp ◽  
K. Muller ◽  
...  

2016 ◽  
Author(s):  
Brent S. Pedersen ◽  
Ryan M. Layer ◽  
Aaron R. Quinlan

Background: The integration of genome annotations and reference databases is critical to the identification of genetic variants that may be of interest in studies of disease or other traits. However, comprehensive variant annotation with diverse file formats is difficult with existing methods. Results: We have developed vcfanno as a flexible toolset that simplifies the annotation of genetic variants in VCF format. Vcfanno can extract and summarize multiple attributes from one or more annotation files and append the resulting annotations to the INFO field of the original VCF file. Vcfanno also integrates the Lua scripting language so that users can easily develop custom annotations and metrics. By leveraging a new parallel “chromosome sweeping” algorithm, it enables rapid annotation of both whole-exome and whole-genome datasets. We demonstrate this performance by annotating over 85.3 million variants in less than 17 minutes (>85,000 variants per second) with 50 attributes from 17 commonly used genome annotation resources. Conclusions: Vcfanno is a flexible software package that provides researchers with the ability to annotate genetic variation with a wide range of datasets and reference databases in diverse genomic formats. Availability: The vcfanno source code is available at https://github.com/brentp/vcfanno under the MIT license, and platform-specific binaries are available at https://github.com/brentp/vcfanno/releases. Detailed documentation is available at http://brentp.github.io/vcfanno/, and the code underlying the analyses presented can be found at https://github.com/brentp/vcfanno/tree/master/scripts/paper.
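The abstract does not describe how the “chromosome sweeping” algorithm works internally. As a rough illustration of the general idea of annotating sorted variants against sorted intervals in a single pass, here is a generic sweep over one chromosome. It is an assumption-laden sketch, not vcfanno's parallel implementation; the function name, the tuple layout, and the half-open coordinates are all hypothetical.

```python
def sweep_annotate(queries, annotations):
    """Collect the annotation values that overlap each query interval.

    queries:     list of (start, end), sorted by start, half-open [start, end)
    annotations: list of (start, end, value), sorted by start, half-open
    Returns one list of overlapping values per query, in query order.
    """
    results = []
    active = []  # annotations already read that may still overlap later queries
    next_ann = 0
    for q_start, q_end in queries:
        # Read every annotation that starts before this query ends.
        while next_ann < len(annotations) and annotations[next_ann][0] < q_end:
            active.append(annotations[next_ann])
            next_ann += 1
        # Query starts never decrease, so annotations ending at or before
        # q_start cannot overlap this query or any later one.
        active = [a for a in active if a[1] > q_start]
        # Re-check the start, because an earlier, longer query may have
        # pulled in annotations that begin after this query ends.
        results.append([a[2] for a in active if a[0] < q_end])
    return results


variants = [(5, 6), (10, 11), (20, 21)]
features = [(0, 6, "a"), (4, 12, "b"), (15, 25, "c")]
print(sweep_annotate(variants, features))
```

Because both inputs are sorted, each annotation is read once and discarded once, which is what makes a sweep attractive for genome-scale files.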


1999 ◽  
Vol 73 (1) ◽  
pp. 297-306 ◽  
Author(s):  
Sean P. J. Whelan ◽  
Gail W. Wertz

ABSTRACT The RNA-dependent RNA polymerase of vesicular stomatitis virus (VSV), a nonsegmented negative-strand RNA virus, directs two discrete RNA synthetic processes, transcription and replication. Available evidence suggests that the two short extragenic regions at the genomic termini, the 3′ leader (Le) and the complement of the 5′ trailer (TrC), contain essential signals for these processes. We examined the roles in transcription and replication of sequences in Le and TrC by monitoring the effects of alterations to the termini of subgenomic replicons, or infectious viruses, on these RNA synthetic processes. Distinct elements in Le were found to be required for transcription that were not required for replication. The promoter for mRNA transcription was shown to include specific sequence elements within Le at positions 19 to 29 and 34 to 46, a separate element at nucleotides 47 to 50, the nontranscribed leader-N gene junction. The sequence requirements for transcription within the Le region could not be supplied by sequences found at the equivalent positions in TrC. In contrast, sequences from either Le or TrC functioned well to signal replication, indicating that within the confines of the VSV termini, the sequence requirements for replication were less stringent. Deletions engineered at the termini showed that the terminal 15 nucleotides of either Le or TrC allowed a minimal level of replication. Within these confines, levels of replication were affected by both the extent of complementarity between the genomic termini and the involvement of the template in transcription. In agreement with our previous observations, increasing the extent of complementarity between the natural termini increased levels of replication, and this effect was most operative at the extreme genome ends. In addition, abolishing the use of Le as a promoter for transcription enhanced replication. 
These analyses (i) identified signals at the termini required for transcription and replication and (ii) showed that Le functions as a less efficient promoter for replication than TrC at least in part because of its essential role in transcription. Consequently, these observations help explain the asymmetry of VSV replication which results in the synthesis of more negative- than positive-sense replication products in infected cells.


2020 ◽  
Author(s):  
Sandeep Chakraborty

The development of a vaccine for Covid19 is being expedited [1]. The underlying technologies for the vaccines are varied: ‘nucleic acid (DNA and RNA), virus-like particle, peptide, viral vector (replicating and non-replicating), recombinant protein, live attenuated virus and inactivated virus’ [2]. Among these, ChAdOx1, a genetically modified, weakened version of a common cold virus (adenovirus), is now in human clinical trials [3]. The ChAd vector (chimpanzee adenovirus) was introduced in 2012 as chimpanzee adenovirus Y25 [4]. A large proportion of human adults possess significant titres of neutralising antibodies to human Adv, hence the requirement for a different adenovirus. The deletion of a single transcriptional unit, E1, ensures these viruses cannot replicate. Other genes, like the E3 region, may also be deleted. Now, in the Covid19 vaccine ChAdOx1, the spike protein gene from MERS-CoV strain Camel/Qatar/2/2014 ‘was inserted into the E1 locus of a genomic clone of ChAdOx1 using site-specific recombination’ [5]. One of the theories about the genesis of SARS-Cov2 is recombination with coronaviruses from pangolins [6]. Whether or not it happened in SARS-Cov2, there is no denying that such recombinations do happen. How do we know that the spike protein won't be inserted into a human adenovirus using recombination? Human adenovirus shares 95% homology with ChAd. The spike protein may be inserted after the E1 protein in a viable human virus. What will happen after that to the virus is anyone’s guess.
Note that there is precedent for such recombinant adenoviruses, using ‘“ping-pong” zoonosis and anthroponosis’, where the genome of a promiscuous pathogen is ‘embedded with evidence of unprecedented multiple, multidirectional, stable, and reciprocal cross-species infections of hosts from three species (human, chimpanzee, and bonobo)’ [7]. Another critique: co-stimulation in host cells. A spike protein from SARS-Cov2, which is supposed to bind to ACE2 and CD147 [8], has been inserted in an adenovirus. The adenovirus has its own host-cell receptor preferences [9]. What will be the consequences of co-stimulation in those cells in which both these receptors are expressed?


2021 ◽  
Vol 8 ◽  
Author(s):  
Hanyu Qin ◽  
Jinmin Peng ◽  
Ling Liu ◽  
Jing Wu ◽  
Lingai Pan ◽  
...  

Objectives: To evaluate the performance of metagenomic next generation sequencing (mNGS) using adequate criteria for the detection of pathogens in lower respiratory tract (LRT) samples with a paired comparison to conventional microbiology tests (CMT). Methods: One hundred sixty-seven patients were reviewed from four different intensive care units (ICUs) in mainland China during 2018 with both mNGS and CMT results of LRT samples available. The reads per million ratio (RPMsample/RPMnon−template−control ratio) and standardized strictly mapped reads number (SDSMRN) were the two criteria chosen for identifying positive pathogens reported from mNGS. A McNemar test was used for a paired comparison analysis between mNGS and CMT. Results: One hundred forty-nine cases were counted into the final analysis. The RPMsample/RPMNTC ratio criterion performed better with a higher accuracy for bacteria, fungi, and virus than SDSMRN criterion [bacteria (RPMsample/RPMNTC ratio vs. SDSMRN), 65.1 vs. 55.7%; fungi, 75.8 vs. 71.1%; DNA virus, 86.3 vs. 74.5%; RNA virus, 90.9 vs. 81.8%]. The mNGS was also superior in bacteria detection only if an SDSMRN ≥3 was used as a positive criterion with a paired comparison to culture [SDSMRN positive, 92/149 (61.7%); culture positive, 54/149 (36.2%); p < 0.001]; however, it was outperformed with significantly more fungi and DNA virus identification when choosing both criteria for positive outliers [fungi (RPMsample/RPMNTC ratio vs. SDSMRN vs. culture), 23.5 vs. 29.5 vs. 8.7%, p < 0.001; DNA virus (RPMsample/RPMNTC ratio vs. SDSMRN vs. PCR), 14.1 vs. 20.8 vs. 11.8%, p < 0.05]. Conclusions: Metagenomic next generation sequencing may contribute to revealing the LRT infection etiology in hospitalized groups of potential fungal infections and in situations with less access to the multiplex PCR of LRT samples from the laboratory by choosing a wise criterion like the RPMsample/RPMNTC ratio.
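The two calculations named in the Methods can be sketched in a few lines. The code below assumes the usual definition of reads per million (mapped reads divided by total reads, times one million) and uses an exact binomial form of the McNemar test on discordant pairs. The function names, the handling of a zero control value, and all read and pair counts are hypothetical; the abstract reports neither the positivity cutoff for the ratio nor the discordant-pair counts.

```python
from math import comb


def rpm(mapped_reads: int, total_reads: int) -> float:
    """Reads per million: mapped reads scaled to a library of 1e6 reads."""
    return mapped_reads / total_reads * 1_000_000


def rpm_ratio(sample_mapped, sample_total, ntc_mapped, ntc_total) -> float:
    """RPM in the sample divided by RPM in the non-template control (NTC)."""
    ntc_rpm = rpm(ntc_mapped, ntc_total)
    if ntc_rpm == 0:
        # Assumption for this sketch only: treat an absent control signal
        # as an unbounded ratio. The abstract does not say how the study
        # handled this case.
        return float("inf")
    return rpm(sample_mapped, sample_total) / ntc_rpm


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value.

    b: pairs positive by method A only; c: pairs positive by method B only.
    """
    n = b + c
    if n == 0:
        return 1.0
    tail = sum(comb(n, i) for i in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)


# Hypothetical read counts and discordant-pair counts.
print(rpm_ratio(500, 10_000_000, 5, 10_000_000))
print(mcnemar_exact(0, 10))
```

Only the discordant pairs enter the McNemar test, which is why the marginal positive rates quoted in the Results are not enough on their own to reproduce the reported p-values.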


2019 ◽  
Vol 20 (1) ◽  
Author(s):  
Alexander Davis ◽  
Ruli Gao ◽  
Nicholas E. Navin

Background: In single cell DNA and RNA sequencing experiments, the number of cells to sequence must be decided before running an experiment, and afterwards, it is necessary to decide whether sufficient cells were sampled. These questions can be addressed by calculating the probability of sampling at least a defined number of cells from each subpopulation (cell type or cancer clone). Results: We developed an interactive web application called SCOPIT (Single-Cell One-sided Probability Interactive Tool), which calculates the required probabilities using a multinomial distribution (www.navinlab.com/SCOPIT). In addition, we created an R package called pmultinom for scripting these calculations. Conclusions: Our tool for fast multinomial calculations provides a simple and intuitive procedure for prospectively planning single-cell experiments or retrospectively evaluating if sufficient numbers of cells have been sequenced. The web application can be accessed at navinlab.com/SCOPIT.
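The probability described above, that every subpopulation of interest contributes at least a given number of cells, can be computed exactly for small experiments. The sketch below multiplies truncated exponential series, one per subpopulation, and reads off the multinomial probability from the coefficient of degree n. It is an independent illustration under stated assumptions, not the algorithm used by SCOPIT or pmultinom; the function name and the example frequencies are hypothetical, and the use of floating-point factorials limits it to roughly n ≤ 170 cells.

```python
from math import factorial


def prob_min_cells(n_cells: int, freqs: list, min_cells: int) -> float:
    """P(each listed subpopulation yields >= min_cells among n_cells sampled).

    freqs holds the population frequency of each subpopulation of interest.
    Any remaining frequency is pooled into an unconstrained "other" group.
    """
    rest = 1.0 - sum(freqs)
    if rest < -1e-12:
        raise ValueError("frequencies sum to more than 1")
    groups = [(p, min_cells) for p in freqs]
    if rest > 1e-12:
        groups.append((rest, 0))  # no minimum required for the remainder

    # poly[m] is the coefficient of x^m in the product over groups of
    # sum_{k >= minimum} (p * x)^k / k!
    poly = [1.0] + [0.0] * n_cells
    for p, minimum in groups:
        term = [p ** k / factorial(k) if k >= minimum else 0.0
                for k in range(n_cells + 1)]
        product = [0.0] * (n_cells + 1)
        for a, coeff in enumerate(poly):
            if coeff == 0.0:
                continue
            for b in range(n_cells + 1 - a):
                product[a + b] += coeff * term[b]
        poly = product
    # n! times the x^n coefficient is the sum of multinomial probabilities
    # over all count vectors that satisfy every minimum.
    return poly[n_cells] * factorial(n_cells)


# Hypothetical planning question: with 100 cells, how likely is it to capture
# at least 5 cells from each of two clones at 10% and 20% frequency?
print(prob_min_cells(100, [0.10, 0.20], 5))
```

Running the function over a range of `n_cells` values gives the prospective use case in the abstract: the smallest n whose probability clears a chosen target is the number of cells to sequence.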

