Viral sequence identification SOP with VirSorter2 v3 (protocols.io.bwm5pc86)

protocols.io ◽  
2021 ◽  
Author(s):  
Jiarong Guo ◽  
Dean Vik ◽  
Akbar Adjie ◽  
Simon Roux ◽  
Matthew Sullivan

2021 ◽  
Vol 12 ◽  
Author(s):  
Kai Song

Metagenomes can be considered mixtures of viral, bacterial, and eukaryotic DNA sequences. Mining viral sequences from metagenomes could provide insight into virus–host relationships and expand viral databases. Current alignment-based methods are poorly suited to identifying viral sequences in metagenomic data because most assembled metagenomic contigs are short and contain few or no predicted genes, and most metagenomic viral genes are dissimilar to known viral genes. In this study, I developed a Markov model-based method, VirMC, to identify viral sequences from metagenomic data. VirMC uses Markov chains to model sequence signatures and constructs a scoring model based on a likelihood test to distinguish viral from bacterial sequences. Compared with two other state-of-the-art viral sequence-prediction methods, VirFinder and PPR-Meta, VirMC outperformed VirFinder and performed similarly to PPR-Meta on contigs shorter than 400 bp. VirMC outperformed both VirFinder and PPR-Meta at identifying viral sequences in metagenomic samples contaminated with eukaryotic sequences. VirMC also showed better performance in assembling viral-genome sequences from metagenomic data (based on filtering potential bacterial reads). Applying VirMC to human gut metagenomes from healthy subjects and patients with type-2 diabetes (T2D) revealed that viral contigs could help distinguish healthy from diseased status. This alignment-free method complements gene-based alignment approaches and will significantly improve the precision of viral sequence identification.
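
The scoring idea described in this abstract (Markov chains for each sequence class, combined through a likelihood test) can be illustrated with a minimal sketch. This is not the VirMC implementation: the chain order, the pseudocount, the function names, and the toy training sequences below are all assumptions made for illustration, and the abstract does not specify how VirMC sets any of them.

```python
import math
from collections import defaultdict
from itertools import product

ALPHABET = "ACGT"


def train_markov(seqs, order=2, pseudocount=1.0):
    """Estimate log transition probabilities of an order-k Markov chain.

    The order and pseudocount are illustrative defaults, not values
    taken from the VirMC paper.
    """
    counts = defaultdict(lambda: defaultdict(float))
    for seq in seqs:
        seq = seq.upper()
        for i in range(order, len(seq)):
            context, nxt = seq[i - order:i], seq[i]
            # Skip windows containing ambiguous bases such as N.
            if set(context + nxt) <= set(ALPHABET):
                counts[context][nxt] += 1
    model = {}
    for context in ("".join(p) for p in product(ALPHABET, repeat=order)):
        total = sum(counts[context].values()) + pseudocount * len(ALPHABET)
        model[context] = {
            base: math.log((counts[context][base] + pseudocount) / total)
            for base in ALPHABET
        }
    return model


def log_likelihood(seq, model, order=2):
    """Sum the log transition probabilities along a sequence."""
    seq = seq.upper()
    total = 0.0
    for i in range(order, len(seq)):
        context, nxt = seq[i - order:i], seq[i]
        if context in model and nxt in model[context]:
            total += model[context][nxt]
    return total


def viral_score(seq, viral_model, bacterial_model, order=2):
    """Log-likelihood ratio: positive values favour the viral model."""
    return (log_likelihood(seq, viral_model, order)
            - log_likelihood(seq, bacterial_model, order))


# Toy training data (hypothetical, far too small for real use).
viral_model = train_markov(["ATATATATATATATAT"])
bacterial_model = train_markov(["GCGCGCGCGCGCGCGC"])
at_score = viral_score("ATATATAT", viral_model, bacterial_model)
gc_score = viral_score("GCGCGCGC", viral_model, bacterial_model)
```

In this toy setting `at_score` is positive and `gc_score` is negative, because each query resembles only one of the two training sets. A real classifier would train on large reference collections and choose a decision threshold on held-out data.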


2020 ◽  
Vol 12 (s1) ◽  
Author(s):  
Rami Kantor ◽  
John P. Fulton ◽  
Jon Steingrimsson ◽  
Vladimir Novitsky ◽  
Mark Howison ◽  
...  

Abstract: Great efforts are devoted to ending the HIV epidemic, which continues to have profound public health consequences in the United States and throughout the world, and new interventions and strategies are continuously needed. The use of HIV sequence data to infer transmission networks holds much promise for directing public health interventions to where they are most needed. As these new methods are being implemented, evaluating their benefits is essential. In this paper, we recognize the challenges associated with such evaluation and make the case that overcoming them is key to the use of HIV sequence data in routine public health actions to disrupt HIV transmission networks.


BMC Genomics ◽  
2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Yiren Wang ◽  
Mashari Alangari ◽  
Joshua Hihath ◽  
Arindam K. Das ◽  
M. P. Anantram

Abstract. Background: The all-electronic Single Molecule Break Junction (SMBJ) method is an emerging alternative to traditional polymerase chain reaction (PCR) techniques for genetic sequencing and identification. Existing work indicates that the current spectra recorded from SMBJ experimentations contain unique signatures to identify known sequences from a dataset. However, the spectra are typically extremely noisy due to the stochastic and complex interactions between the substrate, sample, environment, and the measuring system, necessitating hundreds or thousands of experimentations to obtain reliable and accurate results. Results: This article presents a DNA sequence identification system based on the current spectra of ten short strand sequences, including a pair that differs by a single mismatch. By employing a gradient boosted tree classifier model trained on conductance histograms, we demonstrate that extremely high accuracy, ranging from approximately 96% for molecules differing by a single mismatch to 99.5% otherwise, is possible. Further, such accuracy metrics are achievable in near real-time with just twenty or thirty SMBJ measurements instead of hundreds or thousands. We also demonstrate that a tandem classifier architecture, where the first stage is a multiclass classifier and the second stage is a binary classifier, can be employed to boost the single mismatched pair’s identification accuracy to 99.5%. Conclusions: A monolithic classifier, or more generally, a multistage classifier with model specific parameters that depend on experimental current spectra can be used to successfully identify DNA strands.


1989 ◽  
Vol 9 (9) ◽  
pp. 3614-3620 ◽  
Author(s):  
S M Aldritt ◽  
J T Joseph ◽  
D F Wirth

We have identified a gene that encodes the polypeptide cytochrome b in the avian malarial parasite Plasmodium gallinaceum. The gene containing the open reading frame was found to be located on a 6.2-kilobase multimeric extrachromosomal element. The amino acid translation from this gene demonstrated significant similarities to cytochrome b sequences from yeast, mammal, and fungus genomes. We present evidence that the P. gallinaceum cytochrome b transcript is part of a larger primary transcript from the element that is subsequently processed. The message for P. gallinaceum cytochrome b was found to be 1.2 kilobases in size. This is the first report identifying a mitochondrial nucleic acid sequence in malaria-causing organisms and suggests that a functional cytochrome system may exist in these parasites.


2012 ◽  
Vol 102 (10) ◽  
pp. 937-947 ◽  
Author(s):  
S. H. De Boer ◽  
X. Li ◽  
L. J. Ward

Pectobacterium atrosepticum, P. carotovorum subsp. brasiliensis, P. carotovorum subsp. carotovorum, and P. wasabiae were detected in potato stems with blackleg symptoms using species- and subspecies-specific polymerase chain reaction (PCR). The tests included a new assay for P. wasabiae based on the phytase gene sequence. Identification of isolates from diseased stems by biochemical or physiological characterization, PCR, and multi-locus sequence typing (MLST) largely confirmed the PCR detection of Pectobacterium spp. in stem samples. P. atrosepticum was most commonly present but was the sole Pectobacterium sp. detected in only 52% of the diseased stems. P. wasabiae was most frequently present in combination with P. atrosepticum and was the sole Pectobacterium sp. detected in 13% of diseased stems. Pathogenicity of P. wasabiae on potato and its capacity to cause blackleg disease were demonstrated by stem inoculation and its isolation as the sole Pectobacterium sp. from field-grown diseased plants produced from inoculated seed tubers. Incidence of P. carotovorum subsp. brasiliensis was low in diseased stems, and the ability of Canadian strains to cause blackleg in plants grown from inoculated tubers was not confirmed. Canadian isolates of P. carotovorum subsp. brasiliensis differed from Brazilian isolates in diagnostic biochemical tests but conformed to the subspecies in PCR specificity and typing by MLST.


10.5219/892 ◽  
2018 ◽  
Vol 12 (1) ◽  
Author(s):  
Jana Žiarovská ◽  
Lucia Zeleňáková ◽  
Miroslava Kačániová ◽  
Eloy Fernández Cusimamani

Author(s):  
Lina Wang ◽  
Fengzhen Chen ◽  
Xueqin Guo ◽  
Lijin You ◽  
Xiaoxia Yang ◽  
...  

Abstract. Motivation: The Coronavirus Disease 2019 (COVID-19) pandemic poses a huge threat to public health. Viral sequence data play an important role in the scientific prevention and control of epidemics. A comprehensive virus database would be highly useful for virus data retrieval and in-depth analysis. To promote sharing of virus data, several virus databases and related analysis tools have been created. Results: To facilitate virus research and promote the global sharing of virus data, we present VirusDIP, a one-stop service platform for the archiving, integration, access, and analysis of virus data. It accepts submissions of viral sequence data from all over the world and currently integrates data resources from the National GeneBank Database (CNGBdb), the Global Initiative on Sharing All Influenza Data (GISAID), and the National Center for Biotechnology Information (NCBI). Moreover, based on these comprehensive data resources, the BLAST sequence alignment tool and multi-party security computing tools are deployed for multi-sequence alignment, phylogenetic tree building, and global trusted sharing. VirusDIP is gradually establishing cooperation with more databases and paving the way for the analysis of virus origin and evolution. All public data in VirusDIP are freely available to researchers worldwide. Availability: https://db.cngb.org/virus/ [email protected]


Author(s):  
Zilong Zhang ◽  
Danlei Liu ◽  
Zilei Zhang ◽  
Peng Tian ◽  
Shenwei Li ◽  
...  

Abstract: Norovirus is recognized as one of the leading causes of acute gastroenteritis outbreaks. Genotype GII.9 was first detected in Norfolk, VA, USA, in 1997. However, the complete genome sequence of this genotype has not yet been determined. In this study, a complete genome sequence of GII.9[P7] norovirus, SCD1878_GII.9[P7], from a patient was determined using high-throughput sequencing and rapid amplification of cDNA ends (RACE) technology. The complete genome sequence of SCD1878_GII.9[P7] is 7544 nucleotides (nt) in length with a 3’ poly(A) tail and contains three open reading frames. Sequence comparisons indicated that SCD1878_GII.9[P7] shares 92.1%-92.3% nucleotide sequence identity with GII.P7 (AB258331 and AB039777) and 96.7%-97.4% identity with GII.9 (AY038599 and DQ379715). The results suggested that SCD1878_GII.9[P7] is a member of P genotype GII.P7 and G genotype GII.9. This viral sequence fills a gap at the whole-genome level for the GII.9 genotype.
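
The nucleotide sequence identity figures quoted in this abstract are percentages of matching positions between aligned sequences. The sketch below shows one common way to compute such a figure from a pairwise alignment. It is an assumption that gap columns are excluded from the denominator; the abstract does not state which identity definition or alignment tool the authors used, and the function name and toy sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over alignment columns where neither sequence has a gap.

    Assumes both inputs come from the same pairwise alignment and use "-"
    for gaps. Other definitions (for example, counting gap columns in the
    denominator) give different values.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" or y == "-":
            continue  # skip gap columns
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy alignment (hypothetical): 5 ungapped columns, 4 of them matching.
identity = percent_identity("ACGT-A", "ACCTGA")
```

Here `identity` is 80.0, since four of the five ungapped columns match.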

