Towards routine employment of computational tools for antimicrobial resistance determination via high-throughput sequencing

Antimicrobial resistance (AMR) is a growing threat to public health and farming at large. Without appropriate interventions, it can lead to millions of deaths per year and substantial economic loss worldwide. In clinical and veterinary practice, timely characterization of the antibiotic susceptibility profile of bacterial infections is a crucial step in optimizing treatment. Fast turnaround of AMR testing is also needed in food safety and infection control surveillance (e.g., contamination of healthcare or long-term nursing facilities). High-throughput sequencing is a promising option for clinical point-of-care and ecological surveillance, opening the opportunity to develop genotyping-based AMR determination as a possibly faster alternative to phenotypic testing. In the present work, we compare the performance of state-of-the-art methods for detection of AMR from high-throughput sequencing data in healthcare settings. We consider five complementary computational approaches: alignment (AMRPlusPlus), deep learning (DeepARG), k-mer genomic signatures (KARGA, ResFinder), and hidden Markov models (Meta-MARC). We use an extensive collection of clinical studies never employed for model training. To do so, we assemble data from multiple, independent AMR high-throughput sequencing experiments collected in a variety of hospital settings, comprising 585 isolates with available AMR profiles determined by phenotypic tests across nine antibiotic classes. We show that the prediction landscape of AMR classifiers is highly heterogeneous, with balanced accuracy varying from 0.4 to 0.92. Although some algorithms (ResFinder, KARGA, and AMRPlusPlus) exhibit better overall balanced accuracy than others, the high per-AMR-class variance and related findings suggest that: (1) all algorithms might be subject to sampling bias present both in the data repositories used for training and in experimental/clinical settings; and (2) a portion of clinical samples might contain uncharacterized AMR genes to which the algorithms, mostly trained on known AMR genes, fail to generalize. These results lead us to formulate practical advice for software configuration and application, and to offer suggestions for future study design to further develop AMR prediction tools from proof-of-concept to bedside.
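
The abstract above reports balanced accuracy but does not define it. As a minimal sketch, assuming the usual binary definition (the mean of sensitivity and specificity) and a hypothetical encoding where 1 means phenotypically resistant and 0 means susceptible, the metric could be computed for one antibiotic class as follows. The function name and the toy labels are illustrative only and are not taken from the study.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for binary labels (1 = resistant).

    y_true: phenotypic test results; y_pred: genotype-based predictions.
    Both encodings are assumptions made for this sketch.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


# Toy labels, invented for illustration only.
phenotype = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
prediction = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
score = balanced_accuracy(phenotype, prediction)
```

Unlike plain accuracy, this average is not inflated when one class (for example, susceptible isolates) dominates the sample, which may be one reason it suits per-class comparisons on imbalanced clinical data.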

DisCVR: Rapid viral diagnosis from high-throughput sequencing data

Virus Evolution ◽

10.1093/ve/vez033 ◽

2019 ◽

Vol 5 (2) ◽

Cited By ~ 3

Author(s):

Maha Maabar ◽

Andrew J Davison ◽

Matej Vučak ◽

Fiona Thorburn ◽

Pablo R Murcia ◽

...

Keyword(s):

High Throughput ◽

High Throughput Sequencing ◽

Clinical Sample ◽

High Sensitivity ◽

Clinical Samples ◽

Upper Respiratory Tract ◽

Sequencing Data ◽

High Throughput Sequencing Data ◽

Tract Infections ◽

Human Viruses

Abstract High-throughput sequencing (HTS) enables most pathogens in a clinical sample to be detected from a single analysis, thereby providing novel opportunities for diagnosis, surveillance, and epidemiology. However, this powerful technology is difficult to apply in diagnostic laboratories because of its computational and bioinformatic demands. We have developed DisCVR, which detects known human viruses in clinical samples by matching sample k-mers (twenty-two nucleotide sequences) to k-mers from taxonomically labeled viral genomes. DisCVR was validated using published HTS data for eighty-nine clinical samples from adults with upper respiratory tract infections. These samples had been tested for viruses metagenomically and also by real-time polymerase chain reaction assay, which is the standard diagnostic method. DisCVR detected human viruses with high sensitivity (79%) and specificity (100%), and was able to detect mixed infections. Moreover, it produced results comparable to those in a published metagenomic analysis of 177 blood samples from patients in Nigeria. DisCVR has been designed as a user-friendly tool for detecting human viruses from HTS data using computers with limited RAM and processing power, and includes a graphical user interface to help users interpret and validate the output. It is written in Java and is publicly available from http://bioinformatics.cvr.ac.uk/discvr.php.
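
The k-mer matching idea described in this abstract can be sketched roughly as below. This is not DisCVR's implementation (the tool itself is written in Java): the function names, the exact-match lookup, and the omission of steps such as reverse-complement handling or filtering of uninformative k-mers are all simplifying assumptions. Only the k-mer length of 22 comes from the abstract.

```python
from collections import Counter

K = 22  # k-mer length stated in the abstract


def kmers(seq, k=K):
    """Yield every k-mer of seq that contains only A, C, G, T."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):
            yield kmer


def build_index(references, k=K):
    """Map each reference k-mer to the set of virus labels containing it.

    references: dict of label -> genome sequence (hypothetical input format).
    """
    index = {}
    for label, seq in references.items():
        for kmer in kmers(seq, k):
            index.setdefault(kmer, set()).add(label)
    return index


def count_hits(reads, index, k=K):
    """Count, per virus label, how many read k-mers match the index."""
    hits = Counter()
    for read in reads:
        for kmer in kmers(read, k):
            for label in index.get(kmer, ()):
                hits[label] += 1
    return hits


# Tiny invented sequences; k is shortened to 4 so the example fits on a line.
refs = {"virusA": "ACGTACGGTT", "virusB": "TTTTGGGGCC"}
idx = build_index(refs, k=4)
hits = count_hits(["ACGTAC", "GGGGCC"], idx, k=4)
```

A dictionary lookup per k-mer keeps the per-read cost proportional to read length, which is in the spirit of the abstract's claim that the approach runs on computers with limited RAM and processing power, although the memory footprint of a real index depends on details not given here.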

V-pipe: a computational pipeline for assessing viral genetic diversity from high-throughput sequencing data

10.1101/2020.06.09.142919 ◽

2020 ◽

Cited By ~ 4

Author(s):

Susana Posada-Céspedes ◽

David Seifert ◽

Ivan Topolsky ◽

Karin J. Metzner ◽

Niko Beerenwinkel

Keyword(s):

Genetic Diversity ◽

High Throughput ◽

High Throughput Sequencing ◽

Viral Infections ◽

Markov Models ◽

Large Data ◽

Sequencing Data ◽

Single Nucleotide Variants ◽

Bioinformatics Pipeline ◽

The Impact

Abstract High-throughput sequencing technologies are used increasingly, not only in viral genomics research but also in clinical surveillance and diagnostics. These technologies facilitate the assessment of the genetic diversity in intra-host virus populations, which affects transmission, virulence, and pathogenesis of viral infections. However, there are two major challenges in analysing viral diversity. First, amplification and sequencing errors confound the identification of true biological variants, and second, the large data volumes represent computational limitations. To support viral high-throughput sequencing studies, we developed V-pipe, a bioinformatics pipeline combining various state-of-the-art statistical models and computational tools for automated end-to-end analyses of raw sequencing reads. V-pipe supports quality control, read mapping and alignment, low-frequency mutation calling, and inference of viral haplotypes. For generating high-quality read alignments, we developed a novel method, called ngshmmalign, based on profile hidden Markov models and tailored to small and highly diverse viral genomes. V-pipe also includes benchmarking functionality providing a standardized environment for comparative evaluations of different pipeline configurations. We demonstrate this capability by assessing the impact of three different read aligners (Bowtie 2, BWA MEM, ngshmmalign) and two different variant callers (LoFreq, ShoRAH) on the performance of calling single-nucleotide variants in intra-host virus populations. V-pipe supports various pipeline configurations and is implemented in a modular fashion to facilitate adaptations to the continuously changing technology landscape. V-pipe is freely available at https://github.com/cbg-ethz/V-pipe.

LABRADOR—A Computational Workflow for Virus Detection in High-Throughput Sequencing Data

Viruses ◽

10.3390/v13122541 ◽

2021 ◽

Vol 13 (12) ◽

pp. 2541

Author(s):

Izabela Fabiańska ◽

Stefan Borutzki ◽

Benjamin Richter ◽

Hon Q. Tran ◽

Andreas Neubert ◽

...

Keyword(s):

High Throughput ◽

High Throughput Sequencing ◽

De Novo ◽

Virus Detection ◽

Third Party ◽

Reference Sequence ◽

Clinical Samples ◽

Sequencing Data

High-throughput sequencing (HTS) allows detection of known and unknown viruses in samples of broad origin. This makes HTS well suited to determining whether biological products, such as vaccines, are free from adventitious agents, and it could support or replace extensive testing with various in vitro and in vivo assays. Because of the bioinformatic complexity involved, there is a need for standardized and reliable methods to manage HTS-generated data in this field. We therefore developed LABRADOR, an analysis pipeline for adventitious virus detection. The pipeline consists of several third-party programs and is divided into two major parts: (i) direct read classification, based on comparing characteristic profiles between reads and sequences deposited in the database, supported by alignment to the best-matching reference sequence; and (ii) de novo assembly of contigs and their classification at the nucleotide and amino acid levels. To meet the requirements published in guidelines for the safety of biologicals, we generated a custom nucleotide database of viral sequences. We tested our pipeline on publicly available HTS datasets and showed that LABRADOR can reliably detect viruses in mixtures of model viruses, vaccines and clinical samples.

BacCapSeq: a Platform for Diagnosis and Characterization of Bacterial Infections

mBio ◽

10.1128/mbio.02007-18 ◽

2018 ◽

Vol 9 (5) ◽

Cited By ~ 11

Author(s):

Orchid M. Allicock ◽

Cheng Guo ◽

Anne-Catrin Uhlemann ◽

Susan Whittier ◽

Lokendra V. Chauhan ◽

...

Keyword(s):

Antimicrobial Resistance ◽

High Throughput ◽

Bacterial Infections ◽

High Throughput Sequencing ◽

Limit Of Detection ◽

Bacterial Species ◽

Fold Increase ◽

Virulence Determinants ◽

Antimicrobial Resistance Genes

ABSTRACT We report a platform that increases the sensitivity of high-throughput sequencing for detection and characterization of bacteria, virulence determinants, and antimicrobial resistance (AMR) genes. The system uses a probe set comprised of 4.2 million oligonucleotides based on the Pathosystems Resource Integration Center (PATRIC) database, the Comprehensive Antibiotic Resistance Database (CARD), and the Virulence Factor Database (VFDB), representing 307 bacterial species that include all known human-pathogenic species, known antimicrobial resistance genes, and known virulence factors, respectively. The use of bacterial capture sequencing (BacCapSeq) resulted in an up to 1,000-fold increase in bacterial reads from blood samples and lowered the limit of detection by 1 to 2 orders of magnitude compared to conventional unbiased high-throughput sequencing, down to a level comparable to that of agent-specific real-time PCR with as few as 5 million total reads generated per sample. It detected not only the presence of AMR genes but also biomarkers for AMR that included both constitutive and differentially expressed transcripts. IMPORTANCE BacCapSeq is a method for differential diagnosis of bacterial infections and defining antimicrobial sensitivity profiles that has the potential to reduce morbidity and mortality, health care costs, and the inappropriate use of antibiotics that contributes to the development of antimicrobial resistance.

DisCVR: Rapid viral diagnosis from high-throughput sequencing data

10.1101/527127 ◽

2019 ◽

Author(s):

Maha Maabar ◽

Andrew J. Davison ◽

Massimo Palmarini ◽

Joseph Hughes

Keyword(s):

High Throughput ◽

High Throughput Sequencing ◽

Clinical Sample ◽

High Sensitivity ◽

Clinical Samples ◽

Upper Respiratory Tract ◽

Sequencing Data ◽

High Throughput Sequencing Data ◽

Tract Infections ◽

Human Viruses

Abstract High-throughput sequencing (HTS) enables most pathogens in a clinical sample to be detected from a single analysis, thereby providing novel opportunities for diagnosis, surveillance and epidemiology. However, this powerful technology is difficult to apply in diagnostic laboratories because of its computational and bioinformatic demands. We have developed DisCVR, which detects known human viruses in clinical samples by matching sample k-mers (22 nucleotide sequences) to k-mers from taxonomically labelled viral genomes. DisCVR was validated using published HTS data for 89 clinical samples from adults with upper respiratory tract infections. These samples had been tested for viruses metagenomically and also by real-time polymerase chain reaction assay, which is the standard diagnostic method. DisCVR detected human viruses with high sensitivity (79%) and specificity (100%), and was able to detect mixed infections. Moreover, it produced results comparable to those in a published metagenomic analysis of 177 blood samples from patients in Nigeria. DisCVR has been designed as a user-friendly tool for detecting human viruses from HTS data using computers with limited RAM and processing power, and includes a graphical user interface to help users interpret and validate the output. It is written in Java and is publicly available from http://bioinformatics.cvr.ac.uk/discvr.php.

Faculty Opinions recommendation of Coalescent Inference Using Serially Sampled, High-Throughput Sequencing Data from Intrahost HIV Infection.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽

10.3410/f.726132071.793531014 ◽

2017 ◽

Author(s):

Sarah Rowland-Jones ◽

Sophie Andrews

Keyword(s):

Hiv Infection ◽

High Throughput ◽

High Throughput Sequencing ◽

Sequencing Data ◽

High Throughput Sequencing Data

BlindCall: ultra-fast base-calling of high-throughput sequencing data by blind deconvolution

Bioinformatics ◽

10.1093/bioinformatics/btu010 ◽

2014 ◽

Vol 30 (9) ◽

pp. 1214-1219 ◽

Cited By ~ 6

Author(s):

C. Ye ◽

C. Hsiao ◽

H. Corrada Bravo

Keyword(s):

High Throughput ◽

High Throughput Sequencing ◽

Blind Deconvolution ◽

Sequencing Data ◽

Base Calling ◽

High Throughput Sequencing Data

Emerging Options for the Diagnosis of Bacterial Infections and the Characterization of Antimicrobial Resistance

International Journal of Molecular Sciences ◽

10.3390/ijms22010456 ◽

2021 ◽

Vol 22 (1) ◽

pp. 456

Author(s):

Simone Rentschler ◽

Lars Kaiser ◽

Hans-Peter Deigner

Keyword(s):

Antimicrobial Resistance ◽

Bacterial Infections ◽

Point Of Care ◽

Rapid Identification ◽

Adequate Treatment ◽

Resistance Patterns ◽

Detection And Diagnosis ◽

Patient Prognosis ◽

Point Of Care Detection

Precise and rapid identification and characterization of pathogens and antimicrobial resistance patterns are critical for the adequate treatment of infections, which represent an increasing problem in intensive care medicine. The current situation remains far from satisfactory in terms of turnaround times and overall efficacy. Application of an ineffective antimicrobial agent or the unnecessary use of broad-spectrum antibiotics worsens the patient prognosis and further accelerates the generation of resistant mutants. Here, we provide an overview that includes an evaluation and comparison of existing tools used to diagnose bacterial infections, together with a consideration of the underlying molecular principles and technologies. Special emphasis is placed on emerging developments that may lead to significant improvements in point of care detection and diagnosis of multi-resistant pathogens, and new directions that may be used to guide antibiotic therapy.

INSIGHT: A population-scale COVID-19 testing strategy combining point-of-care diagnosis with centralized high-throughput sequencing

Science Advances ◽

10.1126/sciadv.abe5054 ◽

2021 ◽

Vol 7 (7) ◽

pp. eabe5054

Author(s):

Qianxin Wu ◽

Chenqu Suo ◽

Tom Brown ◽

Tengyao Wang ◽

Sarah A. Teichmann ◽

...

Keyword(s):

High Throughput ◽

High Throughput Sequencing ◽

Limit Of Detection ◽

Point Of Care ◽

Population Level ◽

Reaction Products ◽

Testing Strategy ◽

Asymptomatic Patients ◽

Testing Stage ◽

Population Scale

We present INSIGHT [isothermal NASBA (nucleic acid sequence–based amplification) sequencing–based high-throughput test], a two-stage coronavirus disease 2019 testing strategy, using a barcoded isothermal NASBA reaction. It combines point-of-care diagnosis with next-generation sequencing, aiming to achieve population-scale testing. Stage 1 allows a quick decentralized readout for early isolation of presymptomatic or asymptomatic patients. It gives results within 1 to 2 hours, using either fluorescence detection or a lateral flow readout, while simultaneously incorporating sample-specific barcodes. The same reaction products from potentially hundreds of thousands of samples can then be pooled and used in a highly multiplexed sequencing–based assay in stage 2. This second stage confirms the near-patient testing results and facilitates centralized data collection. The 95% limit of detection is <50 copies of viral RNA per reaction. INSIGHT is suitable for further development into a rapid home-based, point-of-care assay and is potentially scalable to the population level.
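
The pooling step in stage 2 depends on sample-specific barcodes, so that reads from many samples can be sequenced together and then assigned back to their source. A minimal sketch of that assignment is given below. It assumes, purely for illustration, a fixed-length barcode at the start of each read and exact matching; the abstract does not describe the actual barcode layout or any error tolerance, and all names here are hypothetical.

```python
from collections import defaultdict


def demultiplex(reads, barcode_to_sample, barcode_len):
    """Group pooled reads by sample using a leading barcode.

    Assumes (hypothetically) that the first barcode_len bases of each read
    are the barcode and that only exact matches are accepted.
    Returns (reads grouped by sample with barcodes trimmed, unassigned reads).
    """
    by_sample = defaultdict(list)
    unassigned = []
    for read in reads:
        barcode, insert = read[:barcode_len], read[barcode_len:]
        sample = barcode_to_sample.get(barcode)
        if sample is None:
            unassigned.append(read)
        else:
            by_sample[sample].append(insert)
    return dict(by_sample), unassigned


# Invented barcodes and reads for illustration.
barcodes = {"ACGT": "sample_1", "TTAG": "sample_2"}
pooled = ["ACGTGGGA", "TTAGCCCA", "ACGTTTTT", "GGGGAAAA"]
assigned, leftover = demultiplex(pooled, barcodes, barcode_len=4)
```

In practice a demultiplexer would usually tolerate a small number of barcode mismatches; exact matching is used here only to keep the sketch short.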

Great differences in performance and outcome of high-throughput sequencing data analysis platforms for fungal metabarcoding

MycoKeys ◽

10.3897/mycokeys.39.28109 ◽

2018 ◽

Vol 39 ◽

pp. 29-40 ◽

Cited By ~ 21

Author(s):

Sten Anslan ◽

R. Henrik Nilsson ◽

Christian Wurzbacher ◽

Petr Baldrian ◽

Leho Tedersoo ◽

...

Keyword(s):

High Throughput ◽

High Throughput Sequencing ◽

Computation Time ◽

Potential Effect ◽

Data Sets ◽

Sequencing Data ◽

Operational Taxonomic Units ◽

High Throughput Sequencing Data ◽

Recent Developments

Along with recent developments in high-throughput sequencing (HTS) technologies, and thus the fast accumulation of HTS data, there has been a growing need for and interest in developing tools for HTS data processing and communication. In particular, a number of bioinformatics tools have been designed for analysing metabarcoding data, each with specific features, assumptions and outputs. To evaluate the potential effect of applying different bioinformatics workflows on the results, we compared the performance of different analysis platforms on two contrasting high-throughput sequencing data sets. Our analysis revealed that the computation time, the quality of error filtering and hence the output of a specific bioinformatics process largely depend on the platform used. Our results show that none of the bioinformatics workflows appears to perfectly filter out the accumulated errors and generate Operational Taxonomic Units (OTUs), although PipeCraft, LotuS and PIPITS perform better than QIIME2 and Galaxy for the tested fungal amplicon dataset. We conclude that the output of each platform requires manual validation of the OTUs by examining the taxonomy assignment values.
