Genome Detective Coronavirus Typing Tool for rapid identification and characterization of novel coronavirus genomes

Abstract Summary Genome Detective is a web-based, user-friendly software application to quickly and accurately assemble all known virus genomes from next-generation sequencing datasets. This application allows the identification of phylogenetic clusters and genotypes from assembled genomes in FASTA format. Since its release in 2019, we have produced a number of typing tools for emergent viruses that have caused large outbreaks, such as Zika and Yellow Fever Virus in Brazil. Here, we present the Genome Detective Coronavirus Typing Tool that can accurately identify the novel severe acute respiratory syndrome (SARS)-related coronavirus (SARS-CoV-2) sequences isolated in China and around the world. The tool can accept up to 2000 sequences per submission and the analysis of a new whole-genome sequence will take approximately 1 min. The tool has been tested and validated with hundreds of whole genomes from 10 coronavirus species, and correctly classified all of the SARS-related coronavirus (SARSr-CoV) and all of the available public data for SARS-CoV-2. The tool also allows tracking of new viral mutations as the outbreak expands globally, which may help to accelerate the development of novel diagnostics, drugs and vaccines to stop the COVID-19 disease. Availability and implementation https://www.genomedetective.com/app/typingtool/cov Contact [email protected] or [email protected] Supplementary information Supplementary data are available at Bioinformatics online.
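The abstract states that the tool accepts assembled genomes in FASTA format, up to 2000 sequences per submission. As a minimal sketch of a local pre-submission check, assuming one record per `>` header line, the snippet below counts records and compares the count against that limit. The function names are hypothetical and are not part of Genome Detective; the tool's own validation may differ.

```python
from typing import Iterable

# Limit quoted in the abstract; how the server enforces it is not described.
MAX_SEQUENCES = 2000


def count_fasta_records(lines: Iterable[str]) -> int:
    """Count FASTA records by counting header lines that start with '>'."""
    return sum(1 for line in lines if line.startswith(">"))


def fits_submission_limit(lines: Iterable[str], limit: int = MAX_SEQUENCES) -> bool:
    """Return True when there is at least one record and no more than `limit`."""
    n_records = count_fasta_records(lines)
    return 0 < n_records <= limit


if __name__ == "__main__":
    example = [">seq1", "ATGCATGC", ">seq2", "GGCCTTAA"]
    print(count_fasta_records(example), fits_submission_limit(example))
```

For a file on disk, the same functions can be given an open file handle, since iterating a text file yields its lines.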

Genome Detective Coronavirus Typing Tool for rapid identification and characterization of novel coronavirus genomes

10.1101/2020.01.31.928796 ◽

2020 ◽

Cited By ~ 8

Author(s):

Sara Cleemput ◽

Wim Dumon ◽

Vagner Fonseca ◽

Wasim Abdool Karim ◽

Marta Giovanetti ◽

...

Keyword(s):

Yellow Fever Virus ◽

Rapid Identification ◽

Whole Genome Sequence ◽

Supplementary Information ◽

Software Application ◽

Public Data ◽

Novel Coronavirus ◽

User Friendly ◽

Virus Genomes ◽

Typing Tool

ABSTRACT Summary Genome Detective is a web-based, user-friendly software application to quickly and accurately assemble all known virus genomes from next generation sequencing datasets. This application allows the identification of phylogenetic clusters and genotypes from assembled genomes in FASTA format. Since its release in 2019, we have produced a number of typing tools for emergent viruses that have caused large outbreaks, such as Zika and Yellow Fever Virus in Brazil. Here, we present the Genome Detective Coronavirus Typing Tool that can accurately identify novel coronavirus (2019-nCoV) sequences isolated in China and around the world. The tool can accept up to 2,000 sequences per submission and the analysis of a new whole genome sequence will take approximately one minute. The tool has been tested and validated with hundreds of whole genomes from ten coronavirus species, and correctly classified all of the SARS-related coronavirus (SARSr-CoV) and all of the available public data for 2019-nCoV. The tool also allows tracking of new viral mutations as the outbreak expands globally, which may help to accelerate the development of novel diagnostics, drugs and vaccines. Availability Available online: https://www.genomedetective.com/app/typingtool/cov Contact [email protected] and [email protected] Supplementary information Supplementary data is available online.

SARS2020: an integrated platform for identification of novel coronavirus by a consensus sequence-function model

Bioinformatics ◽

10.1093/bioinformatics/btaa767 ◽

2020 ◽

Author(s):

Dachuan Zhang ◽

Tong Zhang ◽

Sheng Liu ◽

Dandan Sun ◽

Shaozhen Ding ◽

...

Keyword(s):

Biological Function ◽

Consensus Sequence ◽

Rapid Identification ◽

Data Driven ◽

Supplementary Information ◽

Respiratory Syndrome Virus ◽

The Novel ◽

Function Model ◽

Catalytic Function ◽

Novel Coronavirus

Abstract Motivation The 2019 novel coronavirus outbreak has significantly affected global health and society. Thus, predicting biological function from pathogen sequence is crucial and urgently needed. However, little work has been conducted to identify viruses by the enzymes that they encode, and which are key to pathogen propagation. Results We built a comprehensive scientific resource, SARS2020, which integrates coronavirus-related research, genomic sequences and results of anti-viral drug trials. In addition, we built a consensus sequence-catalytic function model from which we identified the novel coronavirus as encoding the same proteinase as the severe acute respiratory syndrome virus. This data-driven sequence-based strategy will enable rapid identification of agents responsible for future epidemics. Availability and implementation SARS2020 is available at http://design.rxnfinder.org/sars2020/. Supplementary information Supplementary data are available at Bioinformatics online.

TIGER: inferring DNA replication timing from whole-genome sequence data

Bioinformatics ◽

10.1093/bioinformatics/btab166 ◽

2021 ◽

Cited By ~ 1

Author(s):

Amnon Koren ◽

Dashiell J Massey ◽

Alexa N Bracci

Keyword(s):

Dna Replication ◽

Genome Sequence ◽

Genomic Dna ◽

Sequence Data ◽

Replication Timing ◽

Whole Genome Sequence ◽

Supplementary Information ◽

Whole Genome ◽

Genome Sequence Data ◽

Dna Replication Timing

Abstract Motivation Genomic DNA replicates according to a reproducible spatiotemporal program, with some loci replicating early in S phase while others replicate late. Despite being a central cellular process, DNA replication timing studies have been limited in scale due to technical challenges. Results We present TIGER (Timing Inferred from Genome Replication), a computational approach for extracting DNA replication timing information from whole genome sequence data obtained from proliferating cell samples. The presence of replicating cells in a biological specimen leads to non-uniform representation of genomic DNA that depends on the timing of replication of different genomic loci. Replication dynamics can hence be observed in genome sequence data by analyzing DNA copy number along chromosomes while accounting for other sources of sequence coverage variation. TIGER is applicable to any species with a contiguous genome assembly and rivals the quality of experimental measurements of DNA replication timing. It provides a straightforward approach for measuring replication timing and can readily be applied at scale. Availability and Implementation TIGER is available at https://github.com/TheKorenLab/TIGER. Supplementary information Supplementary data are available at Bioinformatics online
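The abstract describes observing replication dynamics by analyzing DNA copy number along chromosomes. The sketch below illustrates only the simplest part of that idea and is not TIGER's implementation: it counts read start positions in fixed-size windows and standardizes the counts, so that windows with higher relative coverage (consistent with earlier replication) get higher scores. It deliberately omits the correction for other sources of coverage variation that the abstract mentions. The function names, the 0-based coordinates and the window size are assumptions made for illustration.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def window_counts(read_starts: Sequence[int], chrom_length: int, window: int) -> List[int]:
    """Count reads whose 0-based start position falls in each fixed-size window."""
    n_windows = (chrom_length + window - 1) // window  # ceiling division
    counts = [0] * n_windows
    for pos in read_starts:
        if 0 <= pos < chrom_length:
            counts[pos // window] += 1
    return counts


def standardize(counts: Sequence[int]) -> List[float]:
    """Convert window counts to z-scores; returns zeros if coverage is flat."""
    mu = mean(counts)
    sigma = pstdev(counts)
    if sigma == 0:
        return [0.0] * len(counts)
    return [(c - mu) / sigma for c in counts]


if __name__ == "__main__":
    counts = window_counts([0, 5, 10, 15, 25], chrom_length=30, window=10)
    print(counts, standardize(counts))
```

A real analysis would work from aligned reads and filter or correct windows affected by factors such as GC content or mappability before interpreting the profile.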

metaXplor: an interactive viral and microbial metagenomic data manager

GigaScience ◽

10.1093/gigascience/giab001 ◽

2021 ◽

Vol 10 (2) ◽

Author(s):

Guilhem Sempéré ◽

Adrien Pétel ◽

Magsen Abbé ◽

Pierre Lefeuvre ◽

Philippe Roumagnac ◽

...

Keyword(s):

Heterogeneous Data ◽

Metagenomic Data ◽

Online Data ◽

Data Repositories ◽

Ongoing Research ◽

Efficient Management ◽

Public Data ◽

Reference Databases ◽

Interactive Data ◽

User Friendly

Abstract Background Efficiently managing large, heterogeneous data in a structured yet flexible way is a challenge to research laboratories working with genomic data. Specifically regarding both shotgun- and metabarcoding-based metagenomics, while online reference databases and user-friendly tools exist for running various types of analyses (e.g., Qiime, Mothur, Megan, IMG/VR, Anvi'o, Qiita, MetaVir), scientists lack comprehensive software for easily building scalable, searchable, online data repositories on which they can rely during their ongoing research. Results metaXplor is a scalable, distributable, fully web-interfaced application for managing, sharing, and exploring metagenomic data. Being based on a flexible NoSQL data model, it has few constraints regarding dataset contents and thus proves useful for handling outputs from both shotgun and metabarcoding techniques. By supporting incremental data feeding and providing means to combine filters on all imported fields, it allows for exhaustive content browsing, as well as rapid narrowing to find specific records. The application also features various interactive data visualization tools, ways to query contents by BLASTing external sequences, and an integrated pipeline to enrich assignments with phylogenetic placements. The project home page provides the URL of a live instance allowing users to test the system on public data. Conclusion metaXplor allows efficient management and exploration of metagenomic data. Its availability as a set of Docker containers, making it easy to deploy on academic servers, on the cloud, or even on personal computers, will facilitate its adoption.

Two User-Friendly Molecular Markers Developed for the Identification of Hybrid Lethality Genes in Brassica oleracea

Agronomy ◽

10.3390/agronomy11050982 ◽

2021 ◽

Vol 11 (5) ◽

pp. 982

Author(s):

Zhiliang Xiao ◽

Congcong Kong ◽

Fengqing Han ◽

Limei Yang ◽

Mu Zhuang ◽

...

Keyword(s):

Molecular Markers ◽

Brassica Oleracea ◽

Rapid Identification ◽

Inbred Lines ◽

Hybrid Lethality ◽

Specific Pcr ◽

Allele Specific ◽

User Friendly ◽

Allele Specific Pcr ◽

Kasp Marker

Cabbage (Brassica oleracea) is an important vegetable crop that is cultivated worldwide. Previously, we reported the identification of two dominant complementary hybrid lethality (HL) genes in cabbage that could result in the death of hybrids. To avoid such losses in the breeding process, we attempted to develop molecular markers to identify HL lines. Among 54 previous mapping markers closely linked to BoHL1 or BoHL2, only six markers for BoHL2 were available in eight cabbage lines (two BoHL1 lines; three BoHL2 lines; three lines without BoHL); however, they were neither universal nor user-friendly in more inbred lines. To develop more accurate markers, these cabbage lines were resequenced at an ~20× depth to obtain more nucleotide variations in the mapping regions. Then, an InDel in BoHL1 and a single-nucleotide polymorphism (SNP) in BoHL2 were identified, and the corresponding InDel marker MBoHL1 and the competitive allele-specific PCR (KASP) marker KBoHL2 were developed and showed 100% accuracy in eight inbred lines. Moreover, we identified 138 cabbage lines using the two markers, among which one inbred line carried BoHL1 and 11 inbred lines carried BoHL2. All of the lethal line genotypes obtained with the two markers matched the phenotype. Two markers were highly reliable for the rapid identification of HL genes in cabbage.

Ribo-ODDR: Oligo design pipeline for experiment-specific rRNA depletion in ribo-seq

Bioinformatics ◽

10.1093/bioinformatics/btab171 ◽

2021 ◽

Author(s):

Ferhat Alkan ◽

Joana Silva ◽

Eric Pintó Barberà ◽

William J Faller

Keyword(s):

Ribosome Profiling ◽

Supplementary Information ◽

Experimental Conditions ◽

Computational Framework ◽

Rna Translation ◽

Rrna Depletion ◽

Selection For ◽

Nucleotide Resolution ◽

User Friendly ◽

Oligo Design

Abstract Motivation Ribosome Profiling (Ribo-seq) has revolutionized the study of RNA translation by providing information on ribosome positions across all translated RNAs with nucleotide-resolution. Yet several technical limitations restrict the sequencing depth of such experiments, the most common of which is the overabundance of rRNA fragments. Various strategies can be employed to tackle this issue, including the use of commercial rRNA depletion kits. However, as they are designed for more standardized RNAseq experiments, they may perform suboptimally in Ribo-seq. In order to overcome this, it is possible to use custom biotinylated oligos complementary to the most abundant rRNA fragments, however currently no computational framework exists to aid the design of optimal oligos. Results Here, we first show that a major confounding issue is that the rRNA fragments generated via Ribo-seq vary significantly with differing experimental conditions, suggesting that a “one-size-fits-all” approach may be inefficient. Therefore we developed Ribo-ODDR, an oligo design pipeline integrated with a user-friendly interface that assists in oligo selection for efficient experiment-specific rRNA depletion. Ribo-ODDR uses preliminary data to identify the most abundant rRNA fragments, and calculates the rRNA depletion efficiency of potential oligos. We experimentally show that Ribo-ODDR designed oligos outperform commercially available kits and lead to a significant increase in rRNA depletion in Ribo-seq. Availability Ribo-ODDR is freely accessible at https://github.com/fallerlab/Ribo-ODDR Supplementary information Supplementary data are available at Bioinformatics online.
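According to the abstract, Ribo-ODDR uses preliminary data to identify the most abundant rRNA fragments and calculates the depletion efficiency of potential oligos. The following sketch shows that reasoning in a simplified form; it is not the Ribo-ODDR algorithm, whose scoring is not described in the abstract. It assumes each rRNA read has already been assigned a fragment label, and that an oligo removes every read of the fragment it targets. All names are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Sequence, Tuple


def top_fragments(fragment_reads: Sequence[str], k: int) -> List[Tuple[str, int]]:
    """Return the k most abundant fragments as (fragment, read count) pairs."""
    return Counter(fragment_reads).most_common(k)


def depletion_fraction(fragment_reads: Sequence[str], targeted: Iterable[str]) -> float:
    """Fraction of rRNA reads that belong to targeted fragments.

    Assumes perfect depletion of every targeted fragment, which is an
    idealization made for this illustration.
    """
    if not fragment_reads:
        return 0.0
    targeted_set = set(targeted)
    hits = sum(1 for fragment in fragment_reads if fragment in targeted_set)
    return hits / len(fragment_reads)


if __name__ == "__main__":
    reads = ["frag_a"] * 5 + ["frag_b"] * 3 + ["frag_c"] * 2
    best = top_fragments(reads, 2)
    print(best, depletion_fraction(reads, [name for name, _ in best]))
```

Because the abstract reports that fragment abundances vary with experimental conditions, such a ranking would need to be recomputed from pilot data for each condition rather than reused across experiments.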

CPVA: a web-based metabolomic tool for chromatographic peak visualization and annotation

Bioinformatics ◽

10.1093/bioinformatics/btaa200 ◽

2020 ◽

Vol 36 (12) ◽

pp. 3913-3915

Author(s):

Hemi Luan ◽

Xingen Jiang ◽

Fenfen Ji ◽

Zhangzhang Lan ◽

Zongwei Cai ◽

...

Keyword(s):

False Positive ◽

Supplementary Information ◽

Liquid Chromatography Mass Spectrometry ◽

Targeted Metabolomics ◽

Metabolomics Data ◽

Web Based ◽

Tremendous Amount ◽

Chromatographic Peaks ◽

User Friendly

Abstract Motivation Liquid chromatography–mass spectrometry-based non-targeted metabolomics is routinely performed to qualitatively and quantitatively analyze a tremendous amount of metabolite signals in complex biological samples. However, false-positive peaks in the datasets are commonly detected as metabolite signals by using many popular software, resulting in non-reliable measurement. Results To reduce false-positive calling, we developed an interactive web tool, termed CPVA, for visualization and accurate annotation of the detected peaks in non-targeted metabolomics data. We used a chromatogram-centric strategy to unfold the characteristics of chromatographic peaks through visualization of peak morphology metrics, with additional functions to annotate adducts, isotopes and contaminants. CPVA is a free, user-friendly tool to help users to identify peak background noises and contaminants, resulting in decrease of false-positive or redundant peak calling, thereby improving the data quality of non-targeted metabolomics studies. Availability and implementation The CPVA is freely available at http://cpva.eastus.cloudapp.azure.com. Source code and installation instructions are available on GitHub: https://github.com/13479776/cpva. Supplementary information Supplementary data are available at Bioinformatics online.

SHI7 Is a Self-Learning Pipeline for Multipurpose Short-Read DNA Quality Control

mSystems ◽

10.1128/msystems.00202-17 ◽

2018 ◽

Vol 3 (3) ◽

Cited By ~ 15

Author(s):

Gabriel A. Al-Ghalith ◽

Benjamin Hillmann ◽

Kaiwei Ang ◽

Robin Shields-Cutler ◽

Dan Knights

Keyword(s):

Quality Control ◽

Dna Sequences ◽

Sequence Data ◽

Background Knowledge ◽

Sequencing Technology ◽

Data Set ◽

Short Read ◽

Dna Quality ◽

Public Data ◽

User Friendly

ABSTRACT Next-generation sequencing technology is of great importance for many biological disciplines; however, due to technical and biological limitations, the short DNA sequences produced by modern sequencers require numerous quality control (QC) measures to reduce errors, remove technical contaminants, or merge paired-end reads together into longer or higher-quality contigs. Many tools for each step exist, but choosing the appropriate methods and usage parameters can be challenging because the parameterization of each step depends on the particularities of the sequencing technology used, the type of samples being analyzed, and the stochasticity of the instrumentation and sample preparation. Furthermore, end users may not know all of the relevant information about how their data were generated, such as the expected overlap for paired-end sequences or type of adaptors used to make informed choices. This increasing complexity and nuance demand a pipeline that combines existing steps together in a user-friendly way and, when possible, learns reasonable quality parameters from the data automatically. We propose a user-friendly quality control pipeline called SHI7 (canonically pronounced “shizen”), which aims to simplify quality control of short-read data for the end user by predicting presence and/or type of common sequencing adaptors, what quality scores to trim, whether the data set is shotgun or amplicon sequencing, whether reads are paired end or single end, and whether pairs are stitchable, including the expected amount of pair overlap. We hope that SHI7 will make it easier for all researchers, expert and novice alike, to follow reasonable practices for short-read data quality control. IMPORTANCE Quality control of high-throughput DNA sequencing data is an important but sometimes laborious task requiring background knowledge of the sequencing protocol used (such as adaptor type, sequencing technology, insert size/stitchability, paired-endedness, etc.). 
Quality control protocols typically require applying this background knowledge to selecting and executing numerous quality control steps with the appropriate parameters, which is especially difficult when working with public data or data from collaborators who use different protocols. We have created a streamlined quality control pipeline intended to substantially simplify the process of DNA quality control from raw machine output files to actionable sequence data. In contrast to other methods, our proposed pipeline is easy to install and use and attempts to learn the necessary parameters from the data automatically with a single command.

MetaADEDB 2.0: a comprehensive database on adverse drug events

Bioinformatics ◽

10.1093/bioinformatics/btaa973 ◽

2020 ◽

Author(s):

Zhuohang Yu ◽

Zengrui Wu ◽

Weihua Li ◽

Guixia Liu ◽

Yun Tang

Keyword(s):

Safety Assessment ◽

Adverse Drug Events ◽

Adverse Event Reporting System ◽

Adverse Event Reporting ◽

Supplementary Information ◽

Online Database ◽

Web Interface ◽

Drug Discovery And Development ◽

Comprehensive Information ◽

User Friendly

Abstract Summary MetaADEDB is an online database we developed to integrate comprehensive information on adverse drug events (ADEs). The first version of MetaADEDB was released in 2013 and has been widely used by researchers. However, it has not been updated for more than seven years. Here, we reported its second version by collecting more and newer data from the U.S. FDA Adverse Event Reporting System (FAERS) and Canada Vigilance Adverse Reaction Online Database, in addition to the original three sources. The new version consists of 744 709 drug–ADE associations between 8498 drugs and 13 193 ADEs, which has an over 40% increase in drug–ADE associations compared to the previous version. Meanwhile, we developed a new and user-friendly web interface for data search and analysis. We hope that MetaADEDB 2.0 could provide a useful tool for drug safety assessment and related studies in drug discovery and development. Availability and implementation The database is freely available at: http://lmmd.ecust.edu.cn/metaadedb/. Supplementary information Supplementary data are available at Bioinformatics online.

PathScore: a web tool for identifying altered pathways in cancer data

10.1101/067090 ◽

2016 ◽

Cited By ~ 2

Author(s):

Stephen G. Gaffney ◽

Jeffrey P. Townsend

Keyword(s):

Web Application ◽

Somatic Mutations ◽

Supplementary Information ◽

Web Tool ◽

Cancer Data ◽

Link Type ◽

Novel Approach ◽

Supplementary Material ◽

User Friendly ◽

Pathway Effect

ABSTRACT Summary PathScore quantifies the level of enrichment of somatic mutations within curated pathways, applying a novel approach that identifies pathways enriched across patients. The application provides several user-friendly, interactive graphic interfaces for data exploration, including tools for comparing pathway effect sizes, significance, gene-set overlap and enrichment differences between projects. Availability and Implementation Web application available at pathscore.publichealth.yale.edu. Site implemented in Python and MySQL, with all major browsers supported. Source code available at github.com/sggaffney/pathscore with a GPLv3 [email protected] Information Additional documentation can be found at http://pathscore.publichealth.yale.edu/faq.
