Ribovirus classification by a polymerase barcode sequence

Author(s):  
Artem Babaian ◽  
Robert C. Edgar

Abstract RNA viruses encoding a polymerase gene (riboviruses) dominate the known eukaryotic virome. Next-generation sequencing is revealing a wealth of new riboviruses with uncharacterised phenotypes, precluding classification by traditional taxonomic methods. These are often classified on the basis of polymerase sequence identity, but standardised methods to support this approach are currently lacking. To address this need, we describe the polymerase palmprint, a well-defined segment of the palm sub-domain delineated by well-conserved catalytic motifs. We present a novel algorithm, Palmscan, which identifies palmprints in nucleotide and amino acid sequences. We describe PALMdb, a reference database of palmprints derived from public sequence databases. Palmscan source code and PALMdb data are deposited at https://github.com/rcedgar/palmscan and https://github.com/rcedgar/palmdb, respectively.

2018 ◽  
Author(s):  
Tim Hsiau ◽  
David Conant ◽  
Nicholas Rossi ◽  
Travis Maures ◽  
Kelsey Waite ◽  
...  

Abstract Efficient precision genome editing requires a quick, quantitative, and inexpensive assay of editing outcomes. Here we present ICE (Inference of CRISPR Edits), which enables robust analysis of CRISPR edits using Sanger data. ICE proposes potential outcomes for editing with guide RNAs (gRNAs) and then determines which are supported by the data via regression. Additionally, we develop a score called ICE-D (Discordance) that can provide information on large or unexpected edits. We empirically confirm through over 1,800 edits that the ICE algorithm is robust, reproducible, and can analyze CRISPR experiments within days after transfection. We also confirm that ICE strongly correlates with next-generation sequencing of amplicons (Amp-Seq). The ICE tool is free to use and offers several improvements over current analysis tools. For instance, ICE can analyze individual experiments as well as multiple experiments simultaneously (batch analysis). ICE can also detect a wider variety of outcomes, including multi-guide edits (multiple gRNAs per target) and edits resulting from homology-directed repair (HDR), such as knock-ins and base edits. ICE is a reliable analysis tool that can significantly expedite CRISPR editing workflows. It is available online at ice.synthego.com, and the source code is at github.com/synthego-open/ice.
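The idea of proposing candidate outcomes and then asking which ones the data support "via regression" can be illustrated with a toy non-negative least-squares fit. This is only a sketch under stated assumptions, not the ICE implementation: the signal vectors, the function name `fit_nonnegative_mix`, and the use of projected gradient descent are all hypothetical choices made for illustration.

```python
def fit_nonnegative_mix(proposals, observed, steps=2000, lr=0.1):
    """Projected gradient descent for non-negative least squares.

    proposals: list of equal-length signal vectors, one per proposed outcome.
    observed:  signal vector measured for the edited sample.
    Returns one non-negative weight per proposal.
    """
    n = len(proposals)
    weights = [1.0 / n] * n
    for _ in range(steps):
        # Residual of the current mixture against the observed signal.
        residual = [
            sum(weights[k] * proposals[k][i] for k in range(n)) - observed[i]
            for i in range(len(observed))
        ]
        # Gradient step for every weight, clipped at zero to stay non-negative.
        weights = [
            max(0.0, weights[k] - lr * sum(r * p for r, p in zip(residual, proposals[k])))
            for k in range(n)
        ]
    return weights


# Hypothetical toy signals (not real trace data), one per proposed outcome.
PROPOSALS = [
    [1.0, 0.0, 0.0, 1.0],  # stand-in for "unedited"
    [0.0, 1.0, 0.0, 1.0],  # stand-in for "deletion"
    [0.0, 0.0, 1.0, 1.0],  # stand-in for "insertion"
]
# Observed signal constructed as 0.7 * first proposal + 0.3 * second proposal.
OBSERVED = [0.7, 0.3, 0.0, 1.0]

WEIGHTS = fit_nonnegative_mix(PROPOSALS, OBSERVED)
```

In this toy setting the fitted weights recover the mixing proportions used to build `OBSERVED`, and the third proposal receives a weight near zero, which is the sense in which a regression can indicate which proposed outcomes are supported by the data.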


2004 ◽  
Vol 70 (6) ◽  
pp. 3700-3705 ◽  
Author(s):  
C. P. D. Brussaard ◽  
S. M. Short ◽  
C. M. Frederickson ◽  
C. A. Suttle

ABSTRACT Viruses infecting the harmful bloom-causing alga Phaeocystis globosa (Prymnesiophyceae) were readily isolated from Dutch coastal waters (southern North Sea) in 2000 and 2001. Our data show a large increase in the abundance of putative P. globosa viruses during blooms of P. globosa, suggesting that viruses are an important source of mortality for this alga. In order to examine genetic relatedness among viruses infecting P. globosa and other phytoplankton, DNA polymerase gene (pol) fragments were amplified and the inferred amino acid sequences were phylogenetically analyzed. The results demonstrated that viruses infecting P. globosa formed a closely related monophyletic group within the family Phycodnaviridae, with at least 96.9% similarity to each other. The sequences grouped most closely with others from viruses that infect the prymnesiophyte algae Chrysochromulina brevifilum and Chrysochromulina strobilus. Whether the P. globosa viruses belong to the genus Prymnesiovirus or form a separate group needs further study. Our data suggest that, like their phytoplankton hosts, the Chrysochromulina and Phaeocystis viruses share a common ancestor and that these prymnesioviruses and their algal host have coevolved.


2021 ◽  
Author(s):  
Sebastiaan Valkiers ◽  
Max Van Houcke ◽  
Kris Laukens ◽  
Pieter Meysman

The T-cell receptor (TCR) determines the specificity of a T-cell towards an epitope. As of yet, the rules for antigen recognition remain largely undetermined. Current methods for grouping TCRs according to their epitope specificity remain limited in performance and scalability. Multiple methodologies have been developed, but all of them fail to efficiently cluster large data sets exceeding 1 million sequences. To account for this limitation, we developed clusTCR, a rapid TCR clustering alternative that efficiently scales up to millions of CDR3 amino acid sequences. Benchmarking comparisons revealed similar accuracy of clusTCR with other TCR clustering methods. clusTCR offers a drastic improvement in clustering speed, which allows clustering of millions of TCR sequences in just a few minutes through efficient similarity searching and sequence hashing. clusTCR was written in Python 3. It is available as an anaconda package (https://anaconda.org/svalkiers/clustcr) and on GitHub (https://github.com/svalkiers/clusTCR).
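To give a feel for why hashing helps similarity search scale, the sketch below finds all pairs of equal-length sequences that differ at exactly one position without comparing every pair directly. It is a generic, well-known masking trick offered as an assumption-laden illustration; it is not taken from the clusTCR code, and the function name and example sequences are hypothetical.

```python
from collections import defaultdict


def hamming1_pairs(sequences):
    """Return sorted pairs of distinct sequences at Hamming distance exactly 1.

    Each sequence is hashed once per position under the key
    (position, sequence with that position removed). Two distinct sequences
    share a key only if they are identical everywhere except that position.
    """
    buckets = defaultdict(set)
    for seq in set(sequences):
        for i in range(len(seq)):
            buckets[(i, seq[:i] + seq[i + 1:])].add(seq)

    pairs = set()
    for members in buckets.values():
        ordered = sorted(members)
        for idx, first in enumerate(ordered):
            for second in ordered[idx + 1:]:
                pairs.add((first, second))
    return sorted(pairs)


# Hypothetical CDR3-like strings used purely for illustration.
EXAMPLE = ["CASSLGTEAFF", "CASSLGQEAFF", "CASSPGTEAFF", "CASRRRRRAFF"]
PAIRS = hamming1_pairs(EXAMPLE)
```

The work grows with the number of sequences times their length rather than with the number of sequence pairs, which is the general reason hashing-based approaches can handle very large inputs.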


2020 ◽  
Author(s):  
Xun Zhu ◽  
Ti-Cheng Chang ◽  
Richard Webby ◽  
Gang Wu

Abstract idCOV is a phylogenetic pipeline for quickly identifying the clades of SARS-CoV-2 virus isolates from raw sequencing data based on a selected clade-defining marker list. Using a public dataset, we show that idCOV makes clade calls equivalent to those annotated by Nextstrain.org on all three common clade systems, working directly from user-uploaded FastQ files. Web and equivalent command-line interfaces are available. It can be deployed on any Linux environment, including personal computers, HPC and the cloud. The source code is available at https://github.com/xz-stjude/idcov. Installation documentation can be found at https://github.com/xz-stjude/idcov/blob/master/README.md.
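Calling a clade from a clade-defining marker list can be pictured as a lookup: a sample is assigned to the most specific clade whose markers it carries in full. The sketch below assumes that interpretation; the clade names, positions, bases, and the function `call_clade` are all hypothetical and are not drawn from idCOV or from any real SARS-CoV-2 clade system.

```python
def call_clade(genotype, clade_markers):
    """Return the matching clade with the most defining markers, or None.

    genotype:      dict mapping genome position -> observed base in the sample.
    clade_markers: dict mapping clade name -> {position: required base}.
    """
    best_clade = None
    best_size = 0
    for clade, markers in clade_markers.items():
        # A clade matches only if every one of its markers is observed.
        matches = all(genotype.get(pos) == base for pos, base in markers.items())
        if matches and len(markers) > best_size:
            best_clade, best_size = clade, len(markers)
    return best_clade


# Hypothetical marker list: clade_Y is nested inside clade_X.
MARKERS = {
    "clade_X": {100: "T"},
    "clade_Y": {100: "T", 250: "G"},
    "clade_Z": {400: "A"},
}

SAMPLE = {100: "T", 250: "G", 400: "C"}
CALL = call_clade(SAMPLE, MARKERS)
```

Here the sample carries both markers of the nested `clade_Y`, so it is assigned there rather than to its parent `clade_X`; a sample matching no marker set yields `None`.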


2020 ◽  
Vol 70 (12) ◽  
pp. 6418-6427
Author(s):  
Ahmet Adiguzel ◽  
Hilal Ay ◽  
Mustafa Ozkan Baltaci ◽  
Sumeyya Akbulut ◽  
Seyda Albayrak ◽  
...  

A novel Gram-stain-positive, rod-shaped, endospore-forming, motile, aerobic bacterium, designated as P2T, was isolated from a hot spring water sample collected from Ilica-Erzurum, Turkey. Phylogenetic analyses based on 16S rRNA gene sequence comparisons affiliated strain P2T with the genus Bacillus, and the strain showed the highest sequence identity to Bacillus azotoformans NBRC 15712T (96.7 %). However, the pairwise sequence comparisons of the 16S rRNA genes revealed that strain P2T shared only 94.7 % sequence identity with Bacillus subtilis subsp. subtilis NCIB 3610T, indicating that strain P2T might not be a member of the genus Bacillus. The digital DNA–DNA hybridization and average nucleotide identity values between strain P2T and B. azotoformans NBRC 15712T were 19.8 and 74.2 %, respectively. The cell-wall peptidoglycan of strain P2T contained meso-diaminopimelic acid. The polar lipid profile consisted of diphosphatidylglycerol, phosphatidylglycerol, phosphatidylethanolamine, an aminophospholipid, five unidentified phospholipids and two unidentified lipids while the predominant isoprenoid quinone was MK-7. The major fatty acids were iso-C15 : 0 and iso-C16 : 0. The draft genome of strain P2T was composed of 82 contigs and found to be 3.5 Mb with 36.1 mol% G+C content. The results of phylogenomic and phenotypic analyses revealed that strain P2T represents a novel genus in the family Bacillaceae, for which the name Calidifontibacillus erzurumensis gen. nov., sp. nov. is proposed. The type strain of Calidifontibacillus erzurumensis is P2T (=CECT 9886T=DSM 107530T=NCCB 100675T). Based on the results of the present study, it is also suggested that Bacillus azotoformans and Bacillus oryziterrae should be transferred to this novel genus as Calidifontibacillus azotoformans comb. nov. and Calidifontibacillus oryziterrae comb. nov., respectively.


2020 ◽  
Author(s):  
N Goonasekera ◽  
A Mahmoud ◽  
J Chilton ◽  
E Afgan

Abstract Summary: The existence of more than 100 public Galaxy servers with service quotas is indicative of the need for an increased availability of compute resources for Galaxy to use. The GalaxyCloudRunner enables a Galaxy server to easily expand its available compute capacity by sending user jobs to cloud resources. User jobs are routed to the acquired resources based on a set of configurable rules and the resources can be dynamically acquired from any of 4 popular cloud providers (AWS, Azure, GCP, or OpenStack) in an automated fashion. Availability and implementation: GalaxyCloudRunner is implemented in Python and leverages Docker containers. The source code is MIT licensed and available at https://github.com/cloudve/galaxycloudrunner. The documentation is available at http://gcr.cloudve.org/. Contact: Enis Afgan. Supplementary information: None.


2020 ◽  
Vol 21 (1) ◽  
Author(s):  
Mikko Rautiainen ◽  
Tobias Marschall

Abstract Genome graphs can represent genetic variation and sequence uncertainty. Aligning sequences to genome graphs is key to many applications, including error correction, genome assembly, and genotyping of variants in a pangenome graph. Yet, so far, this step is often prohibitively slow. We present GraphAligner, a tool for aligning long reads to genome graphs. Compared to the state-of-the-art tools, GraphAligner is 13x faster and uses 3x less memory. When employing GraphAligner for error correction, we find it to be more than twice as accurate and over 12x faster than extant tools. Availability: Package manager: https://anaconda.org/bioconda/graphaligner and source code: https://github.com/maickrau/GraphAligner


2019 ◽  
Vol 15 (1) ◽  
Author(s):  
Martí Cortey ◽  
Ivan Díaz ◽  
Anna Vidal ◽  
Gerard Martín-Valls ◽  
Giovanni Franzo ◽  
...  

Abstract Background Diarrhoea is a major cause of death in neonate pigs and most of the viruses that cause it are RNA viruses. Next Generation Sequencing (NGS) can deeply characterize the genetic diversity among rapidly mutating virus populations at the interspecific as well as the intraspecific level. The diversity of RNA viruses present in faeces of neonatal piglets suffering from diarrhoea in 47 farms, plus 4 samples from non-diarrhoeic piglets, has been evaluated by NGS. Samples were selected among the cases submitted to the Veterinary Diagnostic Laboratories of Infectious Diseases of the Universitat Autònoma de Barcelona (Barcelona, Spain) and Universidad de León (León, Spain). Results The analyses identified the presence of 12 virus species corresponding to 8 genera of RNA viruses. Most samples were co-infected by several viruses. Kobuvirus and Rotavirus were more commonly reported, with Sapovirus, Astrovirus 3, 4 and 5, Enterovirus G, Porcine epidemic diarrhoea virus, Pasivirus and Posavirus being less frequently detected. Most sequences showed a low identity with the sequences deposited in GenBank, allowing us to propose several new VP4 and VP7 genotypes for Rotavirus B and Rotavirus C. Conclusions Among the cases analysed, Rotaviruses were the main aetiological agents of diarrhoea in neonate pigs. In addition, in a small number of cases Kobuvirus and Sapovirus may also have an aetiological role. Even though most animals were co-infected in early life, the association with enteric disease among the other examined viruses was unclear. The NGS method applied successfully characterized the RNA virome present in faeces and detected a high level of unreported intraspecific diversity.


2020 ◽  
Vol 10 (1) ◽  
Author(s):  
Abu Sayed Chowdhury ◽  
Sarah M. Reehl ◽  
Kylene Kehn-Hall ◽  
Barney Bishop ◽  
Bobbie-Jo M. Webb-Robertson

Abstract The emergence of viral epidemics throughout the world is of concern due to the scarcity of available effective antiviral therapeutics. The discovery of new antiviral therapies is imperative to address this challenge, and antiviral peptides (AVPs) represent a valuable resource for the development of novel therapies to combat viral infection. We present a new machine learning model to distinguish AVPs from non-AVPs using the most informative features derived from the physicochemical and structural properties of their amino acid sequences. To focus on those features that are most likely to contribute to antiviral performance, we filter potential features based on their importance for classification. These feature selection analyses suggest that secondary structure is the most important peptide sequence feature for predicting AVPs. Our Feature-Informed Reduced Machine Learning for Antiviral Peptide Prediction (FIRM-AVP) approach achieves a higher accuracy than either the model with all features or current state-of-the-art single classifiers. Understanding the features that are associated with AVP activity is a core need to identify and design new AVPs in novel systems. The FIRM-AVP code and standalone software package are available at https://github.com/pmartR/FIRM-AVP with an accompanying web application at https://msc-viz.emsl.pnnl.gov/AVPR.


mSphere ◽  
2019 ◽  
Vol 4 (2) ◽  
Author(s):  
Marli Vlok ◽  
Andrew S. Lang ◽  
Curtis A. Suttle

ABSTRACT RNA viruses, particularly genetically diverse members of the Picornavirales, are widespread and abundant in the ocean. Gene surveys suggest that there are spatial and temporal patterns in the composition of RNA virus assemblages, but data on their diversity and genetic variability in different oceanographic settings are limited. Here, we show that specific RNA virus genomes have widespread geographic distributions and that the dominant genotypes are under purifying selection. Genomes from three previously unknown picorna-like viruses (BC-1, -2, and -3) assembled from a coastal site in British Columbia, Canada, as well as marine RNA viruses JP-A, JP-B, and Heterosigma akashiwo RNA virus exhibited different biogeographical patterns. Thus, biotic factors such as host specificity and viral life cycle, and not just abiotic processes such as dispersal, affect marine RNA virus distribution. Sequence differences relative to reference genomes imply that virus quasispecies are under purifying selection, with synonymous single-nucleotide variations dominating in genomes from geographically distinct regions resulting in conservation of amino acid sequences. Conversely, sequences from coastal South Africa that mapped to marine RNA virus JP-A exhibited more nonsynonymous mutations, probably representing amino acid changes that accumulated over a longer separation. This biogeographical analysis of marine RNA viruses demonstrates that purifying selection is occurring across oceanographic provinces. These data add to the spectrum of known marine RNA virus genomes, show the importance of dispersal and purifying selection for these viruses, and indicate that closely related RNA viruses are pathogens of eukaryotic microbes across oceans. IMPORTANCE Very little is known about aquatic RNA virus populations and genome evolution. This is the first study that analyzes marine environmental RNA viral assemblages in an evolutionary and broad geographical context. This study contributes the largest marine RNA virus metagenomic data set to date, substantially increasing the sequencing space for RNA viruses and also providing a baseline for comparisons of marine RNA virus diversity. The new viruses discovered in this study are representative of the most abundant family of marine RNA viruses, the Marnaviridae, and expand our view of the diversity of this important group. Overall, our data and analyses provide a foundation for interpreting marine RNA virus diversity and evolution.

