TETyper: a bioinformatic pipeline for classifying variation and genetic contexts of transposable elements from short-read whole-genome sequencing data

2018 ◽  
Author(s):  
Anna E Sheppard ◽  
Nicole Stoesser ◽  
Ian German-Mesner ◽  
Kasi Vegesana ◽  
A Sarah Walker ◽  
...  

ABSTRACT Much of the worldwide dissemination of antibiotic resistance has been driven by resistance gene associations with mobile genetic elements (MGEs), such as plasmids and transposons. Although increasing, our understanding of resistance spread remains relatively limited, as methods for tracking mobile resistance genes through multiple species, strains and plasmids are lacking. We have developed a bioinformatic pipeline for tracking variation within, and mobility of, specific transposable elements (TEs), such as transposons carrying antibiotic resistance genes. TETyper takes short-read whole-genome sequencing data as input and identifies single-nucleotide mutations and deletions within the TE of interest, to enable tracking of specific sequence variants, as well as the surrounding genetic context(s), to enable identification of transposition events. To investigate global dissemination of Klebsiella pneumoniae carbapenemase (KPC) and its associated transposon Tn4401, we applied TETyper to a collection of >3000 publicly available Illumina datasets containing blaKPC. This revealed surprising diversity, with >200 distinct flanking genetic contexts for Tn4401, indicating high levels of transposition. Integration of sample metadata revealed insights into associations between geographic locations, host species, Tn4401 sequence variants and flanking genetic contexts. To demonstrate the ability of TETyper to cope with high copy number TEs and to track specific short-term evolutionary changes, we also applied it to the insertion sequence IS26 within a defined K. pneumoniae outbreak. TETyper is implemented in Python and is freely available at https://github.com/aesheppard/TETyper.
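The idea of a "flanking genetic context" can be illustrated with a toy example. The sketch below is not TETyper's implementation. It assumes error-free reads and exact string matching, and the function name `left_flanks`, the parameter `flank_len`, and all sequences are made up for illustration. It tallies the bases found immediately upstream of a TE boundary, so that several distinct flanks suggest several insertion sites.

```python
from collections import Counter


def left_flanks(reads, te_start, flank_len=5):
    """Tally the flank_len bases immediately upstream of te_start in each read.

    Hypothetical helper: exact matching only, first occurrence per read.
    """
    counts = Counter()
    for read in reads:
        pos = read.find(te_start)
        # Skip reads without the TE boundary or without a full-length flank.
        if pos >= flank_len:
            counts[read[pos - flank_len:pos]] += 1
    return counts


# Made-up reads; "GGCACTG" stands in for the first bases of a TE.
reads = [
    "ACGTTGGCACTG",
    "TTACGTTGGCACTGAA",
    "CCCATGGCACTGTT",
    "GGCACTGAAAA",  # TE begins at the read start, so no flank is available
]
flanks = left_flanks(reads, "GGCACTG")
print(flanks)
```

Here two distinct upstream flanks are observed, which in this toy setting would correspond to two genetic contexts for the same element.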

2019 ◽  
Vol 17 (2) ◽  
pp. 169-182 ◽  
Author(s):  
Valentina Galata ◽  
Cédric C. Laczny ◽  
Christina Backes ◽  
Georg Hemmrich-Stanisak ◽  
Susanne Schmolke ◽  
...  

2020 ◽  
Vol 35 (9) ◽  
pp. 1675-1679
Author(s):  
Haloom Rafehi ◽  
David J. Szmulewicz ◽  
Kate Pope ◽  
Mathew Wallis ◽  
John Christodoulou ◽  
...  

2017 ◽  
Vol 55 (5) ◽  
pp. 1446-1453 ◽  
Author(s):  
Alex Marchand-Austin ◽  
Raymond S. W. Tsang ◽  
Jennifer L. Guthrie ◽  
Jennifer H. Ma ◽  
Gillian H. Lim ◽  
...  

ABSTRACT Bordetella pertussis is a Gram-negative bacterium that causes respiratory infections in humans. Ongoing molecular surveillance of B. pertussis acellular vaccine (aP) antigens is critical for understanding the interaction between evolutionary pressures, disease pathogenesis, and vaccine effectiveness. Methods currently used to characterize aP components are relatively labor-intensive and low throughput. To address this challenge, we sought to derive aP antigen genotypes from minimally processed short-read whole-genome sequencing data generated from 40 clinical B. pertussis isolates and analyzed using the SRST2 bioinformatic package. SRST2 was able to identify aP antigen genotypes for all antigens with the exception of pertactin, possibly due to low read coverage in GC-rich low-complexity regions of variation. Two main genotypes were observed in addition to a singular third genotype that contained an 84-bp deletion that was identified by SRST2 despite the issues in allele calling. This method has the potential to generate large pools of B. pertussis molecular data that can be linked to clinical and epidemiological information to facilitate research of vaccine effectiveness and disease severity in the context of emerging vaccine antigen-deficient strains.


2016 ◽  
Vol 16 (1) ◽  
Author(s):  
Taryn B. T. Athey ◽  
Sarah Teatero ◽  
Sonia Lacouture ◽  
Daisuke Takamatsu ◽  
Marcelo Gottschalk ◽  
...  

2020 ◽  
Author(s):  
Hannes P. Eggertsson ◽  
Bjarni V. Halldorsson

Abstract Motivation Data analysis is requisite on reliable data. In genetics this includes verifying that the sample is not contaminated with another, a problem ubiquitous in biology. Results In human, and other diploid species, DNA contamination from the same species can be found by the presence of three haplotypes between polymorphic SNPs. read_haps is a tool that detects sample contamination from short read whole genome sequencing data. Availability github.com/DecodeGenetics/read_haps. Contact [email protected]


Author(s):  
Louise Gade Dahl ◽  
Katrine Grimstrup Joensen ◽  
Mark Thomas Østerlund ◽  
Kristoffer Kiil ◽  
Eva Møller Nielsen

Abstract Campylobacter jejuni is recognised as the leading cause of bacterial gastroenteritis in industrialised countries. Although the majority of Campylobacter infections are self-limiting, antimicrobial treatment is necessary in severe cases. Therefore, the development of antimicrobial resistance (AMR) in Campylobacter is a growing public health challenge and surveillance of AMR is important for bacterial disease control. The aim of this study was to predict antimicrobial resistance in C. jejuni from whole-genome sequencing data. A total of 516 clinical C. jejuni isolates collected between 2014 and 2017 were subjected to WGS. Resistance phenotypes were determined by standard broth dilution, categorising isolates as either susceptible or resistant based on epidemiological cutoffs for six antimicrobials: ciprofloxacin, nalidixic acid, erythromycin, gentamicin, streptomycin, and tetracycline. Resistance genotypes were identified using an in-house database containing reference genes with known point mutations and the presence of resistance genes was determined using the ResFinder database and four bioinformatical methods (modified KMA, ABRicate, ARIBA, and ResFinder Batch Upload). We identified seven resistance genes including tet(O), tet(O/32/O), ant(6)-Ia, aph(2″)-If, blaOXA, aph(3′)-III, and cat as well as mutations in three genes: gyrA, 23S rRNA, and rpsL. There was a high correlation between phenotypic resistance and the presence of known resistance genes and/or point mutations. A correlation above 98% was seen for all antimicrobials except streptomycin with a correlation of 92%. In conclusion, we found that WGS can predict antimicrobial resistance with a high degree of accuracy and has the potential to be a powerful tool for AMR surveillance.
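One plausible reading of the reported "correlation" is simple agreement between the measured phenotype and the genotype-based prediction. That interpretation is an assumption, since the abstract does not define the metric. The sketch below shows the calculation for a single antimicrobial, using a hypothetical function name and made-up isolate calls rather than data from the study.

```python
def concordance(phenotypes, predictions):
    """Fraction of isolates where the genotype-based call matches the phenotype.

    Both arguments are equal-length sequences of booleans (True = resistant).
    Hypothetical helper; the study's exact metric is not given in the abstract.
    """
    if len(phenotypes) != len(predictions):
        raise ValueError("inputs must have the same length")
    if not phenotypes:
        raise ValueError("at least one isolate is required")
    matches = sum(p == g for p, g in zip(phenotypes, predictions))
    return matches / len(phenotypes)


# Made-up calls for five isolates and one antimicrobial.
phenotypic = [True, True, False, False, True]
predicted = [True, False, False, False, True]
agreement = concordance(phenotypic, predicted)
print(f"{agreement:.0%}")
```

Computing this per antimicrobial would give one agreement value per drug, which matches the way the abstract reports a separate figure for streptomycin.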


Author(s):  
Hannes P Eggertsson ◽  
Bjarni V Halldorsson

Abstract Motivation Data analysis is requisite on reliable data. In genetics this includes verifying that the sample is not contaminated with another, a problem ubiquitous in biology. Results In human, and other diploid species, DNA contamination from the same species can be found by the presence of three haplotypes between polymorphic SNPs. read_haps is a tool that detects sample contamination from short read whole genome sequencing data. Availability and implementation github.com/DecodeGenetics/read_haps. Contact [email protected]
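The three-haplotype criterion can be sketched in a few lines. This is an illustration of the principle only, not the read_haps algorithm. It assumes the allele pairs have already been extracted from reads spanning two polymorphic SNPs, and the function names and the `min_support` threshold (used to ignore isolated sequencing errors) are hypothetical.

```python
from collections import Counter


def supported_haplotypes(read_alleles, min_support=2):
    """Return haplotypes (allele pairs) seen in at least min_support reads."""
    counts = Counter(read_alleles)
    return {hap for hap, n in counts.items() if n >= min_support}


def looks_contaminated(read_alleles, min_support=2):
    """A diploid sample should show at most two haplotypes across a SNP pair."""
    return len(supported_haplotypes(read_alleles, min_support)) > 2


# Made-up observations: (allele at SNP 1, allele at SNP 2) per spanning read.
clean = [("A", "C")] * 5 + [("G", "T")] * 4 + [("A", "T")]  # last one: likely error
mixed = clean + [("G", "C")] * 3  # a third well-supported haplotype

print(looks_contaminated(clean), looks_contaminated(mixed))
```

A real tool would aggregate evidence over many SNP pairs and model error rates; the fixed threshold here is only a stand-in for that.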


Genes ◽  
2018 ◽  
Vol 9 (10) ◽  
pp. 486 ◽  
Author(s):  
Adam Ameur ◽  
Huiwen Che ◽  
Marcel Martin ◽  
Ignas Bunikis ◽  
Johan Dahlberg ◽  
...  

The current human reference sequence (GRCh38) is a foundation for large-scale sequencing projects. However, recent studies have suggested that GRCh38 may be incomplete and give a suboptimal representation of specific population groups. Here, we performed a de novo assembly of two Swedish genomes that revealed over 10 Mb of sequences absent from the human GRCh38 reference in each individual. Around 6 Mb of these novel sequences (NS) are shared with a Chinese personal genome. The NS are highly repetitive, have an elevated GC-content, and are primarily located in centromeric or telomeric regions. Up to 1 Mb of NS can be assigned to chromosome Y, and large segments are also missing from GRCh38 at chromosomes 14, 17, and 21. Inclusion of NS into the GRCh38 reference radically improves the alignment and variant calling from short-read whole-genome sequencing data at several genomic loci. A re-analysis of a Swedish population-scale sequencing project yields > 75,000 putative novel single nucleotide variants (SNVs) and removes > 10,000 false positive SNV calls per individual, some of which are located in protein coding regions. Our results highlight that the GRCh38 reference is not yet complete and demonstrate that personal genome assemblies from local populations can improve the analysis of short-read whole-genome sequencing data.

