TETyper: a bioinformatic pipeline for classifying variation and genetic contexts of transposable elements from short-read whole-genome sequencing data

10.1101/288001 ◽

2018 ◽

Author(s):

Anna E Sheppard ◽

Nicole Stoesser ◽

Ian German-Mesner ◽

Kasi Vegesana ◽

A Sarah Walker ◽

...

Keyword(s):

Antibiotic Resistance ◽

Transposable Elements ◽

Genome Sequencing ◽

Resistance Genes ◽

Whole Genome Sequencing Data ◽

Sequence Variants ◽

Whole Genome ◽

Sequencing Data ◽

Bioinformatic Pipeline ◽

Short Read

ABSTRACT Much of the worldwide dissemination of antibiotic resistance has been driven by resistance gene associations with mobile genetic elements (MGEs), such as plasmids and transposons. Although increasing, our understanding of resistance spread remains relatively limited, as methods for tracking mobile resistance genes through multiple species, strains and plasmids are lacking. We have developed a bioinformatic pipeline for tracking variation within, and mobility of, specific transposable elements (TEs), such as transposons carrying antibiotic resistance genes. TETyper takes short-read whole-genome sequencing data as input and identifies single-nucleotide mutations and deletions within the TE of interest, to enable tracking of specific sequence variants, as well as the surrounding genetic context(s), to enable identification of transposition events. To investigate global dissemination of Klebsiella pneumoniae carbapenemase (KPC) and its associated transposon Tn4401, we applied TETyper to a collection of >3000 publicly available Illumina datasets containing blaKPC. This revealed surprising diversity, with >200 distinct flanking genetic contexts for Tn4401, indicating high levels of transposition. Integration of sample metadata revealed insights into associations between geographic locations, host species, Tn4401 sequence variants and flanking genetic contexts. To demonstrate the ability of TETyper to cope with high copy number TEs and to track specific short-term evolutionary changes, we also applied it to the insertion sequence IS26 within a defined K. pneumoniae outbreak. TETyper is implemented in Python and is freely available at https://github.com/aesheppard/TETyper.

TETyper: a bioinformatic pipeline for classifying variation and genetic contexts of transposable elements from short-read whole-genome sequencing data

Microbial Genomics ◽

10.1099/mgen.0.000232 ◽

2018 ◽

Vol 4 (12) ◽

Cited By ~ 7

Author(s):

Anna E. Sheppard ◽

Nicole Stoesser ◽

Ian German-Mesner ◽

Kasi Vegesana ◽

A. Sarah Walker ◽

...

Keyword(s):

Transposable Elements ◽

Whole Genome Sequencing ◽

Genome Sequencing ◽

Whole Genome Sequencing Data ◽

Whole Genome ◽

Sequencing Data ◽

Bioinformatic Pipeline ◽

Short Read

Integrating Culture-based Antibiotic Resistance Profiles with Whole-genome Sequencing Data for 11,087 Clinical Isolates

Genomics Proteomics & Bioinformatics ◽

10.1016/j.gpb.2018.11.002 ◽

2019 ◽

Vol 17 (2) ◽

pp. 169-182 ◽

Cited By ~ 2

Author(s):

Valentina Galata ◽

Cédric C. Laczny ◽

Christina Backes ◽

Georg Hemmrich-Stanisak ◽

Susanne Schmolke ◽

...

Keyword(s):

Antibiotic Resistance ◽

Whole Genome Sequencing ◽

Genome Sequencing ◽

Clinical Isolates ◽

Whole Genome Sequencing Data ◽

Whole Genome ◽

Sequencing Data

Rapid Diagnosis of Spinocerebellar Ataxia 36 in a Three‐Generation Family Using Short‐Read Whole‐Genome Sequencing Data

Movement Disorders ◽

10.1002/mds.28105 ◽

2020 ◽

Vol 35 (9) ◽

pp. 1675-1679

Author(s):

Haloom Rafehi ◽

David J. Szmulewicz ◽

Kate Pope ◽

Mathew Wallis ◽

John Christodoulou ◽

...

Keyword(s):

Whole Genome Sequencing ◽

Spinocerebellar Ataxia ◽

Genome Sequencing ◽

Rapid Diagnosis ◽

Whole Genome Sequencing Data ◽

Whole Genome ◽

Sequencing Data ◽

Short Read ◽

Generation Family

Short-Read Whole-Genome Sequencing for Laboratory-Based Surveillance of Bordetella pertussis

Journal of Clinical Microbiology ◽

10.1128/jcm.02436-16 ◽

2017 ◽

Vol 55 (5) ◽

pp. 1446-1453 ◽

Cited By ~ 1

Author(s):

Alex Marchand-Austin ◽

Raymond S. W. Tsang ◽

Jennifer L. Guthrie ◽

Jennifer H. Ma ◽

Gillian H. Lim ◽

...

Keyword(s):

Whole Genome Sequencing ◽

Genome Sequencing ◽

Bordetella Pertussis ◽

Vaccine Effectiveness ◽

Vaccine Antigen ◽

Whole Genome Sequencing Data ◽

Whole Genome ◽

Sequencing Data ◽

Short Read ◽

Content Type

ABSTRACT Bordetella pertussis is a Gram-negative bacterium that causes respiratory infections in humans. Ongoing molecular surveillance of B. pertussis acellular vaccine (aP) antigens is critical for understanding the interaction between evolutionary pressures, disease pathogenesis, and vaccine effectiveness. Methods currently used to characterize aP components are relatively labor-intensive and low throughput. To address this challenge, we sought to derive aP antigen genotypes from minimally processed short-read whole-genome sequencing data generated from 40 clinical B. pertussis isolates and analyzed using the SRST2 bioinformatic package. SRST2 was able to identify aP antigen genotypes for all antigens with the exception of pertactin, possibly due to low read coverage in GC-rich low-complexity regions of variation. Two main genotypes were observed in addition to a singular third genotype that contained an 84-bp deletion that was identified by SRST2 despite the issues in allele calling. This method has the potential to generate large pools of B. pertussis molecular data that can be linked to clinical and epidemiological information to facilitate research of vaccine effectiveness and disease severity in the context of emerging vaccine antigen-deficient strains.

Genome-Wide Identification of Microsatellites and Transposable Elements in the Dromedary Camel Genome Using Whole-Genome Sequencing Data

Frontiers in Genetics ◽

10.3389/fgene.2019.00692 ◽

2019 ◽

Vol 10 ◽

Cited By ~ 1

Author(s):

Reza Khalkhali-Evrigh ◽

Nemat Hedayat-Evrigh ◽

Seyed Hasan Hafezian ◽

Ayoub Farhadi ◽

Mohammad Reza Bakhtiarizadeh

Keyword(s):

Transposable Elements ◽

Whole Genome Sequencing ◽

Genome Sequencing ◽

Whole Genome Sequencing Data ◽

Dromedary Camel ◽

Whole Genome ◽

Sequencing Data ◽

Genome Wide

Determining Streptococcus suis serotype from short-read whole-genome sequencing data

BMC Microbiology ◽

10.1186/s12866-016-0782-8 ◽

2016 ◽

Vol 16 (1) ◽

Cited By ~ 27

Author(s):

Taryn B. T. Athey ◽

Sarah Teatero ◽

Sonia Lacouture ◽

Daisuke Takamatsu ◽

Marcelo Gottschalk ◽

...

Keyword(s):

Whole Genome Sequencing ◽

Genome Sequencing ◽

Streptococcus Suis ◽

Whole Genome Sequencing Data ◽

Whole Genome ◽

Sequencing Data ◽

Short Read ◽

Suis Serotype

read_haps: using read haplotypes to detect same species contamination in DNA sequences

10.1101/2020.02.11.941773 ◽

2020 ◽

Author(s):

Hannes P. Eggertsson ◽

Bjarni V. Halldorsson

Keyword(s):

Data Analysis ◽

Genome Sequencing ◽

Dna Sequences ◽

Diploid Species ◽

Reliable Data ◽

Whole Genome Sequencing Data ◽

Whole Genome ◽

Sequencing Data ◽

Short Read ◽

Polymorphic Snps

Abstract Motivation Data analysis is requisite on reliable data. In genetics this includes verifying that the sample is not contaminated with another, a problem ubiquitous in biology. Results In human, and other diploid species, DNA contamination from the same species can be found by the presence of three haplotypes between polymorphic SNPs. read_haps is a tool that detects sample contamination from short read whole genome sequencing data. Availability github.com/DecodeGenetics/[email protected]

Prediction of antimicrobial resistance in clinical Campylobacter jejuni isolates from whole-genome sequencing data

European Journal of Clinical Microbiology & Infectious Diseases ◽

10.1007/s10096-020-04043-y ◽

2020 ◽

Author(s):

Louise Gade Dahl ◽

Katrine Grimstrup Joensen ◽

Mark Thomas Østerlund ◽

Kristoffer Kiil ◽

Eva Møller Nielsen

Keyword(s):

Antimicrobial Resistance ◽

Whole Genome Sequencing ◽

Campylobacter Jejuni ◽

Genome Sequencing ◽

Resistance Genes ◽

Point Mutations ◽

23S Rrna ◽

Whole Genome Sequencing Data ◽

Whole Genome ◽

Sequencing Data

Abstract Campylobacter jejuni is recognised as the leading cause of bacterial gastroenteritis in industrialised countries. Although the majority of Campylobacter infections are self-limiting, antimicrobial treatment is necessary in severe cases. Therefore, the development of antimicrobial resistance (AMR) in Campylobacter is a growing public health challenge and surveillance of AMR is important for bacterial disease control. The aim of this study was to predict antimicrobial resistance in C. jejuni from whole-genome sequencing data. A total of 516 clinical C. jejuni isolates collected between 2014 and 2017 were subjected to WGS. Resistance phenotypes were determined by standard broth dilution, categorising isolates as either susceptible or resistant based on epidemiological cutoffs for six antimicrobials: ciprofloxacin, nalidixic acid, erythromycin, gentamicin, streptomycin, and tetracycline. Resistance genotypes were identified using an in-house database containing reference genes with known point mutations and the presence of resistance genes was determined using the ResFinder database and four bioinformatical methods (modified KMA, ABRicate, ARIBA, and ResFinder Batch Upload). We identified seven resistance genes including tet(O), tet(O/32/O), ant(6)-Ia, aph(2″)-If, blaOXA, aph(3′)-III, and cat as well as mutations in three genes: gyrA, 23S rRNA, and rpsL. There was a high correlation between phenotypic resistance and the presence of known resistance genes and/or point mutations. A correlation above 98% was seen for all antimicrobials except streptomycin with a correlation of 92%. In conclusion, we found that WGS can predict antimicrobial resistance with a high degree of accuracy and has the potential to be a powerful tool for AMR surveillance.
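One way to read the per-antimicrobial "correlation" figures in the abstract above is as simple agreement between the measured phenotype and the genotype-based prediction. The sketch below illustrates that reading only; the function name, the "R"/"S" labels, and the toy data are assumptions for illustration and are not taken from the study.

```python
def concordance(phenotypes, predictions):
    """Fraction of isolates whose predicted label matches the measured one.

    Both arguments are equal-length sequences of "R" (resistant) or
    "S" (susceptible). The labels are a hypothetical encoding.
    """
    if len(phenotypes) != len(predictions):
        raise ValueError("inputs must have the same length")
    if not phenotypes:
        raise ValueError("inputs must not be empty")
    agree = sum(p == g for p, g in zip(phenotypes, predictions))
    return agree / len(phenotypes)


# Toy data for one antimicrobial (made up, not from the paper).
measured = ["R", "S", "R", "S", "R"]
predicted = ["R", "S", "R", "S", "S"]
print(f"{concordance(measured, predicted):.0%}")
```

Computing this once per antimicrobial would give a table comparable in shape to the percentages the abstract reports, under the agreement interpretation stated above.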

read_haps: using read haplotypes to detect same species contamination in DNA sequences

Bioinformatics ◽

10.1093/bioinformatics/btaa936 ◽

2020 ◽

Author(s):

Hannes P Eggertsson ◽

Bjarni V Halldorsson

Keyword(s):

Data Analysis ◽

Genome Sequencing ◽

Dna Sequences ◽

Diploid Species ◽

Reliable Data ◽

Whole Genome Sequencing Data ◽

Whole Genome ◽

Sequencing Data ◽

Short Read ◽

Polymorphic Snps

Abstract Motivation Data analysis is requisite on reliable data. In genetics this includes verifying that the sample is not contaminated with another, a problem ubiquitous in biology. Results In human, and other diploid species, DNA contamination from the same species can be found by the presence of three haplotypes between polymorphic SNPs. read_haps is a tool that detects sample contamination from short read whole genome sequencing data. Availability and implementation github.com/DecodeGenetics/read_haps. Contact [email protected]
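The idea stated in the abstract above, that a diploid sample should show at most two haplotypes across a pair of polymorphic SNPs, can be illustrated with a minimal Python sketch. This is not the read_haps implementation; the function names, the input layout (one allele pair per read spanning both SNPs), and the `min_reads` threshold are hypothetical choices made for illustration.

```python
from collections import Counter


def count_supported_haplotypes(read_alleles, min_reads=2):
    """Count distinct two-SNP haplotypes seen in at least `min_reads` reads.

    `read_alleles` holds one (allele_at_snp1, allele_at_snp2) tuple per
    read covering both SNPs. `min_reads` is an assumed cutoff that lets
    isolated sequencing errors be ignored.
    """
    counts = Counter(read_alleles)
    return sum(1 for n in counts.values() if n >= min_reads)


def looks_contaminated(read_alleles, min_reads=2):
    """A diploid sample supports at most two haplotypes; three suggests a mix."""
    return count_supported_haplotypes(read_alleles, min_reads) >= 3


# Toy reads (made up): the third haplotype appears once in `clean`
# (treated as an error) and four times in `mixed`.
clean = [("A", "C")] * 10 + [("G", "T")] * 9 + [("A", "T")]
mixed = [("A", "C")] * 10 + [("G", "T")] * 9 + [("A", "T")] * 4

print(looks_contaminated(clean), looks_contaminated(mixed))
```

A real tool would also need to weigh base quality, mapping errors, and evidence across many SNP pairs, none of which this sketch attempts.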

De Novo Assembly of Two Swedish Genomes Reveals Missing Segments from the Human GRCh38 Reference and Improves Variant Calling of Population-Scale Sequencing Data

Genes ◽

10.3390/genes9100486 ◽

2018 ◽

Vol 9 (10) ◽

pp. 486 ◽

Cited By ~ 22

Author(s):

Adam Ameur ◽

Huiwen Che ◽

Marcel Martin ◽

Ignas Bunikis ◽

Johan Dahlberg ◽

...

Keyword(s):

Genome Sequencing ◽

De Novo Assembly ◽

De Novo ◽

Variant Calling ◽

Whole Genome Sequencing Data ◽

Personal Genome ◽

Whole Genome ◽

Sequencing Data ◽

Short Read ◽

Population Scale

The current human reference sequence (GRCh38) is a foundation for large-scale sequencing projects. However, recent studies have suggested that GRCh38 may be incomplete and give a suboptimal representation of specific population groups. Here, we performed a de novo assembly of two Swedish genomes that revealed over 10 Mb of sequences absent from the human GRCh38 reference in each individual. Around 6 Mb of these novel sequences (NS) are shared with a Chinese personal genome. The NS are highly repetitive, have an elevated GC-content, and are primarily located in centromeric or telomeric regions. Up to 1 Mb of NS can be assigned to chromosome Y, and large segments are also missing from GRCh38 at chromosomes 14, 17, and 21. Inclusion of NS into the GRCh38 reference radically improves the alignment and variant calling from short-read whole-genome sequencing data at several genomic loci. A re-analysis of a Swedish population-scale sequencing project yields > 75,000 putative novel single nucleotide variants (SNVs) and removes > 10,000 false positive SNV calls per individual, some of which are located in protein coding regions. Our results highlight that the GRCh38 reference is not yet complete and demonstrate that personal genome assemblies from local populations can improve the analysis of short-read whole-genome sequencing data.
