Nonsynonymous Polymorphism Counts in Bacterial Genomes: a Comparative Examination

ABSTRACT Genomic data reveal single-nucleotide polymorphisms (SNPs) that may carry information about the evolutionary history of bacteria. However, it remains unclear what inferences about selection can be made from genomic SNP data. Bacterial species are often sampled during epidemic outbreaks or within hosts during the course of chronic infections. SNPs obtained from genomic analysis of these data are not necessarily fixed. Treating them as fixed during analysis by using measures such as the ratio of nonsynonymous to synonymous evolutionary changes (dN/dS) may lead to incorrect inferences about the strength and direction of selection. In this study, we consider data from a range of whole-genome sequencing studies of bacterial pathogens and explore patterns of nonsynonymous variation to assess whether evidence of selection can be identified by investigating SNP counts alone across multiple WGS studies. We visualize these SNP data in ways that highlight their relationship to neutral baseline expectations. These neutral expectations are based on a simple model of mutation, from which we simulate SNP accumulation to investigate how SNP counts are distributed under alternative assumptions about positive and negative selection. We compare these patterns with empirical SNP data and illustrate the general difficulty of detecting positive selection from SNP data. Finally, we consider whether SNP counts observed at the between-host population level differ from those observed at the within-host level and find some evidence that suggests that dynamics across these two scales are driven by different underlying processes. IMPORTANCE Identifying selection from SNP data obtained from whole-genome sequencing studies is challenging. Some current measures used to identify and quantify selection acting on genomes rely on fixed differences; thus, these are inappropriate for SNP data where variants are not fixed. 
With the increase in whole-genome sequencing studies, it is important to consider SNP data in the context of evolutionary processes. How SNPs are counted and analyzed can help in understanding mutation accumulation and trajectories of strains. We developed a tool for identifying possible evidence of selection and for comparative analysis with other SNP data. We propose a model that provides a rule-of-thumb guideline and two new visualization techniques that can be used to interpret and compare SNP data. We quantify the expected proportion of nonsynonymous SNPs in coding regions under neutrality and demonstrate its use in identifying evidence of positive and negative selection from simulations and empirical data.
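The neutral expectation for the proportion of nonsynonymous SNPs can be illustrated with a simple calculation. The sketch below assumes a uniform mutation model (every single-nucleotide change from every sense codon equally likely, all 61 sense codons equally frequent) and the standard genetic code; the study's own model may weight mutations or codons differently. It tallies synonymous, missense and nonsense changes and returns the expected nonsynonymous fraction. The `selection_weight` parameter is a hypothetical illustration, not the authors' parameterization: values below 1 mimic purifying (negative) selection against nonsynonymous SNPs, values above 1 mimic positive selection.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1); codons in TTT, TTC, TTA, ..., GGG order
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def tally_point_mutations():
    """Classify every single-nucleotide change from every sense codon."""
    counts = {"synonymous": 0, "missense": 0, "nonsense": 0}
    for codon, aa in CODE.items():
        if aa == "*":
            continue  # stop codons are not used as starting states
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                new_aa = CODE[codon[:pos] + base + codon[pos + 1:]]
                if new_aa == aa:
                    counts["synonymous"] += 1
                elif new_aa == "*":
                    counts["nonsense"] += 1
                else:
                    counts["missense"] += 1
    return counts


def expected_nonsynonymous_fraction(selection_weight=1.0):
    """Expected share of nonsynonymous (missense + nonsense) SNPs.

    selection_weight is a hypothetical factor scaling how likely a
    nonsynonymous mutation is to be retained; 1.0 means neutrality.
    """
    counts = tally_point_mutations()
    nonsyn = (counts["missense"] + counts["nonsense"]) * selection_weight
    return nonsyn / (nonsyn + counts["synonymous"])


print(tally_point_mutations())  # {'synonymous': 134, 'missense': 392, 'nonsense': 23}
print(round(expected_nonsynonymous_fraction(), 3))  # 0.756
```

Under this simplification roughly three in four random coding mutations are nonsynonymous, so an observed proportion well below that value may hint at purifying selection, whereas a modest excess is harder to distinguish from noise, in line with the difficulty of detecting positive selection noted above.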

Salmonella enterica Phylogeny Based on Whole-Genome Sequencing Reveals Two New Clades and Novel Patterns of Horizontally Acquired Genetic Elements

mBio, Vol 9 (6), 2018. DOI: 10.1128/mbio.02303-18. Cited By ~ 24

Author(s): Jay Worley, Jianghong Meng, Marc W. Allard, Eric W. Brown, Ruth E. Timme

Keyword(s): Whole Genome Sequencing, Genome Sequencing, Foodborne Pathogens, Bacterial Species, Whole Genome Sequence, Nucleotide Sequencing, Whole Genome, Nucleotide Polymorphisms, Content Type, Genetic Elements

ABSTRACT Using whole-genome sequence (WGS) data from the GenomeTrakr network, a globally distributed network of laboratories sequencing foodborne pathogens, we present a new phylogeny of Salmonella enterica comprising 445 isolates from 266 distinct serovars and originating from 52 countries. This phylogeny includes two previously unidentified S. enterica subsp. enterica clades. Serovar Typhi is shown to be nested within clade A. Our findings are supported both by phylogenetic analysis based on a core genome alignment and by Bayesian approaches based on single-nucleotide polymorphisms. Serovar assignments were refined by in silico analysis using SeqSero. More than 10% of serovars were either polyphyletic or paraphyletic. We found variable genetic content in these isolates relating to gene mobilization and virulence factors, which have different distributions within clades. Gifsy-1- and Gifsy-2-like phages appear more prevalent in clade A; other viruses are more evenly distributed. Our analyses reveal that IncFII is the predominant plasmid replicon in S. enterica. Few core or clade-defining virulence genes are observed, and their distributions appear probabilistic in nature. Together, these patterns demonstrate that genetic exchange within S. enterica is more extensive and frequent than previously realized, which significantly alters how we view the genetic structure of the bacterial species. IMPORTANCE Rapid improvements in nucleotide sequencing access and affordability have led to a drastic increase in the availability of genetic information. This information will improve the accuracy of molecular descriptions, including serovars, within S. enterica. Although the concept of serovars continues to be useful, it may have more significant limitations than previously understood. Furthermore, the discrete absence or presence of specific genes can be an unstable indicator of phylogenetic identity. Whole-genome sequencing provides more rigorous tools for assessing the distributions of these genes. 
Our phylogenetic and genetic content analyses reveal how active genetic elements are dynamically distributed within a species, allowing us to better understand genetic reservoirs and underlying bacterial evolution.

Whole-genome sequencing reveals rare off-target mutations in CRISPR/Cas9-edited grapevine

Horticulture Research, Vol 8 (1), 2021. DOI: 10.1038/s41438-021-00549-4

Author(s): Xianhang Wang, Mingxing Tu, Ya Wang, Wuchen Yin, Yu Zhang, ...

Keyword(s): Whole Genome Sequencing, Genome Editing, Genome Sequencing, Plant Biotechnology, High Specificity, Fruit Trees, Whole Genome, Nucleotide Polymorphisms, Indel Mutation, Target Sites

Abstract The CRISPR (clustered regularly interspaced short palindromic repeats)-associated protein 9 (Cas9) system is a powerful tool for targeted genome editing, with applications that include plant biotechnology and functional genomics research. However, the specificity of Cas9 targeting has been poorly investigated in many plant species, including fruit trees. To assess the off-target mutation rate in grapevine (Vitis vinifera), we performed whole-genome sequencing (WGS) of three wild-type (WT) plants and of seven Cas9-edited grapevine plants in which one of two genes was targeted by CRISPR/Cas9. In total, we identified between 202,008 and 272,397 single nucleotide polymorphisms (SNPs) and between 26,391 and 55,414 insertions/deletions (indels) in the seven Cas9-edited grapevine plants compared with the three WT plants. Subsequently, 3272 potential off-target sites were selected for further analysis. Only one off-target indel mutation was identified from the WGS data and validated by Sanger sequencing. In addition, we found 243 newly generated off-target sites caused by genetic variants between the Thompson Seedless cultivar and the grape reference genome (PN40024), but no true off-target mutations. In conclusion, we observed high specificity of CRISPR/Cas9 for genome editing of grapevine.
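The abstract does not say how the 3272 potential off-target sites were chosen. A common approach is to search the genome for sequences that resemble the 20-nt guide and sit immediately 5' of an NGG PAM, tolerating a few mismatches. The sketch below is a hypothetical, forward-strand-only illustration of that idea; the function name, the mismatch limit and the toy sequences are all assumptions, and dedicated tools also scan the reverse strand, allow DNA/RNA bulges and consider alternative PAMs.

```python
def find_candidate_sites(genome, guide, max_mismatches=3):
    """Return (position, site, mismatches) for guide-like sites followed by an NGG PAM.

    Forward strand only; a hypothetical illustration, not a published tool.
    """
    genome, guide = genome.upper(), guide.upper()
    k = len(guide)
    hits = []
    for start in range(len(genome) - k - 2):
        pam = genome[start + k:start + k + 3]
        if pam[1:] != "GG":
            continue  # SpCas9 requires an NGG PAM 3' of the protospacer
        site = genome[start:start + k]
        mismatches = sum(a != b for a, b in zip(site, guide))
        if mismatches <= max_mismatches:
            hits.append((start, site, mismatches))
    return hits


guide = "GACGTTACCTGATTACAGCT"   # hypothetical 20-nt guide
mutant = "CACGTAACCTGATTACAGCT"  # same protospacer with 2 mismatches
genome = "AAAA" + guide + "TGG" + "CCCC" + mutant + "AGG" + "TTTT"
print(find_candidate_sites(genome, guide))
# [(4, 'GACGTTACCTGATTACAGCT', 0), (31, 'CACGTAACCTGATTACAGCT', 2)]
```

A site created by a cultivar-specific variant, like the 243 sites mentioned above, would only appear if the scan is run against the cultivar's own sequence rather than the reference genome.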

Risk prediction and marker selection in nonsynonymous single nucleotide polymorphisms using whole genome sequencing data

Animal Cells and Systems, Vol 24 (6), pp. 321-328, 2020. DOI: 10.1080/19768354.2020.1860125

Author(s): Young-Sup Lee, KyeongHye Won, Donghyun Shin, Jae-Don Oh

Keyword(s): Single Nucleotide Polymorphisms, Whole Genome Sequencing, Risk Prediction, Genome Sequencing, Whole Genome Sequencing Data, Whole Genome, Nucleotide Polymorphisms, Sequencing Data, Single Nucleotide, Marker Selection

A common protocol for the simultaneous processing of multiple clinically relevant bacterial species for whole genome sequencing

Scientific Reports, Vol 11 (1), 2021. DOI: 10.1038/s41598-020-80031-8

Author(s): Kathy E. Raven, Sophia T. Girgis, Asha Akram, Beth Blane, Danielle Leek, ...

Keyword(s): Whole Genome Sequencing, DNA Extraction, Genome Sequencing, Clinical Microbiology, Bacterial Species, Outbreak Detection, Whole Genome, Library Preparation, Simultaneous Processing, Antimicrobial Resistance Gene

Abstract Whole-genome sequencing is likely to become increasingly used by local clinical microbiology laboratories, where sequencing volume is low compared with national reference laboratories. Here, we describe a universal protocol for simultaneous DNA extraction and sequencing of numerous different bacterial species, allowing mixed species sequence runs to meet variable laboratory demand. We assembled test panels representing 20 clinically relevant bacterial species. The DNA extraction process used the QIAamp mini DNA kit, to which different combinations of reagents were added. Thereafter, a common protocol was used for library preparation and sequencing. The addition of lysostaphin, lysozyme or buffer ATL (a tissue lysis buffer) alone did not produce sufficient DNA for library preparation across the species tested. By contrast, lysozyme plus lysostaphin produced sufficient DNA across all 20 species. DNA from 15 of 20 species could be extracted from a 24-h culture plate, while the remainder required 48–72 h. The process demonstrated 100% reproducibility. Sequencing of the resulting DNA was used to recapitulate previous findings for species, outbreak detection, antimicrobial resistance gene detection and capsular type. This single protocol for simultaneous processing and sequencing of multiple bacterial species supports low volume and rapid turnaround time by local clinical microbiology laboratories.

Fast genetic mapping using insertion-deletion polymorphisms in Caenorhabditis elegans

Scientific Reports, Vol 11 (1), 2021. DOI: 10.1038/s41598-021-90190-x

Author(s): Ho-Yon Hwang, Jiou Wang

Keyword(s): Caenorhabditis elegans, Genetic Mapping, Whole Genome Sequencing, Genome Sequencing, Genetic Material, Mapping Method, Forward Genetics, Whole Genome, Nucleotide Polymorphisms, Large Populations

Abstract Genetic mapping is used in forward genetics to narrow the list of candidate mutations and genes corresponding to the mutant phenotype of interest. Even with modern advances in biology, such as efficient identification of candidate mutations by whole-genome sequencing, mapping remains critical in pinpointing the responsible mutation. Here we describe a simple, fast, and affordable mapping toolkit that is particularly suitable for mapping in Caenorhabditis elegans. Instead of single nucleotide polymorphisms, this method uses easily detected insertion-deletion polymorphisms (indels) in the commonly used Hawaiian CB4856 mapping strain. The materials and methods were optimized so that mapping can be performed using a tiny amount of genetic material, without growing many large populations of mutants for DNA purification. We mapped previously known and unknown mutations to show the strengths and weaknesses of this method and to present examples of completed mapping. For situations where Hawaiian CB4856 is unsuitable, we provide an annotated list of indels as a basis for fast and easy mapping using other wild isolates. Finally, we provide a rationale for using this mapping method over other alternatives as part of a comprehensive strategy that also involves whole-genome sequencing and other methods.
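The linkage logic behind indel mapping can be sketched as follows, assuming the usual scheme in which a mutant isolated in an N2 background is crossed to CB4856 and homozygous mutant F2 progeny are genotyped at indel markers (the abstract does not spell out the crossing scheme, so this is an assumption). Near the causal mutation almost every chromosome carries the N2 allele, so the CB4856 allele fraction at a marker approximates its recombination frequency with the mutation and dips toward zero. The class, marker names and genotype counts below are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MarkerCounts:
    """Genotype counts at one indel marker among homozygous mutant F2 animals (hypothetical)."""
    name: str
    n2_homozygous: int
    heterozygous: int
    hawaiian_homozygous: int

    def hawaiian_allele_fraction(self) -> float:
        # Each animal contributes two chromosomes to the allele count.
        total = 2 * (self.n2_homozygous + self.heterozygous + self.hawaiian_homozygous)
        return (self.heterozygous + 2 * self.hawaiian_homozygous) / total


def closest_marker(markers):
    """Marker with the lowest CB4856 allele fraction is the best linkage candidate."""
    return min(markers, key=lambda m: m.hawaiian_allele_fraction())


markers = [
    MarkerCounts("III:2Mb", 10, 20, 10),   # unlinked: about half CB4856 alleles
    MarkerCounts("III:8Mb", 30, 9, 1),
    MarkerCounts("III:13Mb", 18, 18, 4),
]
for m in markers:
    print(m.name, round(m.hawaiian_allele_fraction(), 3))
print(closest_marker(markers).name)  # III:8Mb
```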

Mycobacterium chimaera genomics with regard to epidemiological and clinical investigations conducted for the open-chest post-surgical Mycobacterium chimaera infections outbreak

Open Forum Infectious Diseases, 2021. DOI: 10.1093/ofid/ofab192

Author(s): Emmanuel Lecorche, Côme Daniau, Kevin La, Faiza Mougari, Hanaa Benmansour, ...

Keyword(s): Single Nucleotide Polymorphisms, Whole Genome Sequencing, Genome Sequencing, Clinical Isolates, Whole Genome, Nucleotide Polymorphisms, Single Nucleotide, Healthcare Facilities, Open Chest

Abstract Background Post-surgical infections due to Mycobacterium chimaera appeared as a novel nosocomial threat in 2015, with a worldwide outbreak caused by contaminated heater-cooler units (HCUs) used in open-chest surgery. We report the results of investigations conducted in France, including whole genome sequencing comparison of patient and HCU isolates. Methods We sought M. chimaera infection cases from 2010 onwards through national epidemiological investigations in healthcare facilities performing cardiopulmonary bypass, together with a survey on good practices and systematic microbial analyses of heater-cooler units. Clinical and HCU isolates were subjected to whole genome sequencing and analyzed with regard to the reference outbreak strain Zuerich-1. Results Only two clinical cases were shown to be related to the outbreak, although 23% (41/175) of heater-cooler units were declared positive for M. avium complex. Specific measures to prevent infection were applied in 89% (50/56) of healthcare facilities, although only 14% (8/56) of them followed the manufacturer's maintenance recommendations. Whole genome sequencing comparison showed that the clinical isolates and 72% (26/36) of heater-cooler unit isolates belonged to the epidemic cluster. Within clinical isolates, 5 to 9 non-synonymous single nucleotide polymorphisms were observed, including an in vivo mutation in a putative efflux pump gene in an isolate obtained from one patient under antimicrobial treatment. Conclusions Cases of post-surgical M. chimaera infection were declared to be rare in France, although heater-cooler units were contaminated as in other countries. Genomic analyses confirmed the connection to the outbreak and identified specific single nucleotide polymorphisms, including one suggesting fitness evolution in vivo.

Whole-Genome Sequencing for Bacterial Strain Typing Using the iSeq100 Platform

Infection Control and Hospital Epidemiology, Vol 41 (S1), pp. s434-s434, 2020. DOI: 10.1017/ice.2020.1098

Author(s): Grant Vestal, Steven Bruzek, Amanda Lasher, Amorce Lima, Suzane Silbert

Keyword(s): Whole Genome Sequencing, Genome Sequencing, Outbreak Detection, Epidemiological Surveillance, Whole Genome, Nucleotide Polymorphisms, DNA Libraries, Patient Health, Hospital Acquired, Reference Genomes

Background: Hospital-acquired infections pose a significant threat to patient health. Laboratories are starting to consider whole-genome sequencing (WGS) as a molecular method for outbreak detection and epidemiological surveillance. The objective of this study was to assess the use of the iSeq100 platform (Illumina, San Diego, CA) for accurate sequencing and WGS-based outbreak detection using the bioMérieux EPISEQ CS, a novel cloud-based software for sequence assembly and data analysis. Methods: In total, 25 isolates, including 19 MRSA isolates and 6 ATCC strains, were evaluated in this study: A. baumannii ATCC 19606, B. cepacia ATCC 25416, E. faecalis ATCC 29212, E. coli ATCC 25922, P. aeruginosa ATCC 27853 and S. aureus ATCC 25923. DNA extraction of all isolates was performed on the QIAcube (Qiagen, Hilden, Germany) using the DNEasy Ultra Clean Microbial kit extraction protocol. DNA libraries were prepared for WGS using the Nextera DNA Flex Library Prep Kit (Illumina) and sequenced at 2×150 bp on the iSeq100 according to the manufacturer’s instructions. The 19 MRSA isolates had previously been characterized by the DiversiLab system (bioMérieux, France). Upon validation of the iSeq100 platform, a new WGS-based outbreak analysis was performed using EPISEQ CS. ATCC sequences were compared to assembled reference genomes from NCBI GenBank to assess the accuracy of the iSeq100 platform. The FASTQ files were aligned with BowTie2 version 2.2.6 using default parameters, and FreeBayes version 1.1.0.46-0 was used to call homozygous single-nucleotide polymorphisms (SNPs) with a minimum coverage of 5 and an allele frequency threshold of 0.87, with other parameters left at their defaults. ATCC sequences were analyzed using ResFinder version 3.2 and compared in silico to the reference genome. 
Results: EPISEQ CS classified 8 MRSA isolates as unrelated and grouped 11 isolates into 2 separate clusters, cluster A (5 isolates) and cluster B (6 isolates), with similarity scores of ≥99.63% and ≥99.50%, respectively. This finding contrasted with the previous characterization by DiversiLab, which identified 3 clusters of 2, 8, and 11 isolates. The EPISEQ CS resistome data detected the mecA gene in 18 of 19 MRSA isolates. Comparison of the ATCC sequences with the reference genomes showed 99.9986% concordance of SNPs and 100.00% concordance between the resistance genes present. Conclusions: The iSeq100 platform accurately sequenced the bacterial isolates and, in conjunction with EPISEQ CS, could be an affordable option for epidemiological surveillance analysis and infection prevention. Funding: None. Disclosures: None
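The variant-filtering criteria in the Methods (minimum coverage of 5, allele-frequency threshold of 0.87) and the concordance figure in the Results can be illustrated with a small sketch. It assumes a simplified, hypothetical variant record holding read depth and alternate-allele count, treats 0.87 as a minimum alternate-allele fraction (an assumption about how the threshold was applied), and defines concordance as the share of aligned positions where two sequences agree. FreeBayes and EPISEQ CS compute these internally in their own ways; none of the names below belong to those tools.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VariantCall:
    """Simplified SNP record (hypothetical structure, not a VCF parser)."""
    position: int
    ref: str
    alt: str
    depth: int
    alt_count: int


def passes_filters(call, min_depth=5, min_alt_fraction=0.87):
    """Keep calls with enough coverage and a near-fixed alternate allele."""
    if call.depth == 0 or call.depth < min_depth:
        return False
    return call.alt_count / call.depth >= min_alt_fraction


def percent_concordance(sequence_a, sequence_b):
    """Percentage of aligned positions where two equal-length sequences agree."""
    if len(sequence_a) != len(sequence_b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(a == b for a, b in zip(sequence_a, sequence_b))
    return 100.0 * matches / len(sequence_a)


calls = [
    VariantCall(100, "A", "G", depth=30, alt_count=29),  # kept
    VariantCall(250, "C", "T", depth=4, alt_count=4),    # coverage below 5
    VariantCall(400, "G", "A", depth=20, alt_count=12),  # alt fraction 0.6
]
print([c.position for c in calls if passes_filters(c)])  # [100]
print(percent_concordance("ACGTACGTAC", "ACGTACGTAA"))   # 90.0
```

For scale, if the reported 99.9986% is read as a per-position rate, it corresponds to about 14 discordant positions per million compared (1 − 0.999986 = 0.000014).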

Transmission of ESBL-producing Enterobacteriaceae and their mobile genetic elements—identification of sources by whole genome sequencing: study protocol for an observational study in Switzerland

BMJ Open, Vol 8 (2), pp. e021823, 2018. DOI: 10.1136/bmjopen-2018-021823. Cited By ~ 12

Author(s): Tanja Stadler, Dominik Meinel, Lisandra Aguilar-Bultet, Jana S Huisman, Ruth Schindler, ...

Keyword(s): Observational Study, Whole Genome Sequencing, Genome Sequencing, Bacterial Species, Mobile Genetic Elements, University Hospital, Whole Genome, Hospital Acquired Infections, Genetic Elements, Hospital Acquired

Introduction Extended-spectrum beta-lactamase (ESBL)-producing Enterobacteriaceae were first described in relation to hospital-acquired infections. In the 2000s, the epidemiology of ESBL-producing organisms changed, as ESBL-producing Escherichia coli in particular was increasingly described as an important cause of community-acquired infections, supporting the hypothesis that in more recent years ESBL-producing Enterobacteriaceae have probably been imported into hospitals rather than vice versa. Transmission of ESBL-producing Enterobacteriaceae is complicated by ESBL genes being encoded on self-transmissible plasmids, which can be exchanged within and between bacterial species. The aim of this research project is to quantify hospital-wide transmission of ESBL-producing Enterobacteriaceae at the level of both bacterial species and mobile genetic elements, and to determine whether hospital-acquired infections caused by ESBL producers are related to strains and mobile genetic elements predominantly circulating in the community or in the healthcare setting. This distinction is critical for prevention, since the former emphasises the urgent need to establish or reinforce antibiotic stewardship programmes, and the latter would call for more rigorous infection control. Methods and analysis This protocol presents an observational study that will be performed at the University Hospital Basel and in the city of Basel, Switzerland. ESBL-producing Enterobacteriaceae will be collected from any specimens obtained by routine clinical practice or by active screening in both inpatient and outpatient settings, as well as from wastewater samples and foodstuffs, both collected monthly over a 12-month period for analyses by whole genome sequencing. 
Bacterial chromosomal, plasmid and ESBL-gene sequences will be compared within the cohort to determine genetic relatedness and migration between humans and their environment. Ethics and dissemination This study has been approved by the local ethics committee (Ethikkommission Nordwest-und Zentralschweiz) as a quality control project (Project-ID 2017–00100). The results of this study will be published in peer-reviewed medical journals, communicated to participants, the general public and all relevant stakeholders.

Transmission of Multidrug/Rifampicin-Resistant Mycobacterium Tuberculosis in Chongqing, China: A Retrospective Observational Study Using Whole-Genome Sequencing

2021. DOI: 10.21203/rs.3.rs-717466/v1

Author(s): Bing Zhao, Chunfa Liu, Jiale Fan, Aijing Ma, Wencong He, ...

Keyword(s): Whole Genome Sequencing, Genome Sequencing, Whole Genome, Nucleotide Polymorphisms, Resistance Mutations, Treatment Information, Recent Transmission, Drugs Analysis, Lineage 2, Cluster Analysis

Abstract Background: Multidrug/rifampicin-resistant tuberculosis (MDR/RR-TB) is a global barrier to the ‘Stop TB’ plan. China has the second highest MDR/RR-TB burden in the world. Understanding transmission dynamics facilitates disease control. Methods: Whole genome sequencing (WGS) data from patients of the Chongqing tuberculosis control institute were used for phylogenetic classification, resistance prediction, and cluster analysis as an indicator of recent transmission (RT). Factors associated with MDR/RR-TB were identified with a logistic regression model. Results: A total of 223 cases of MDR/RR-TB were recorded between Jan 1, 2018 and Dec 31, 2020, and treatment information was obtained for 200 cases. Patients older than 55 years were more likely to die. WGS data were obtained for 178 MDR/RR strains, of which 152 were classified as lineage 2. Eighty strains (44.9%, 80 of 178) fell into 20 genomic clusters that differed by 12 or fewer single nucleotide polymorphisms (SNPs), indicating RT. Infection with lineage 2 strains was a significant factor driving the epidemic towards MDR/RR-TB. Analysis of resistance mutations to first-line tuberculosis drugs found that 79 (98.8%) of the 80 strains defined as RT shared the same mutations within each cluster. Of these 80 MDR/RR-TB strains, 44 (55%) accumulated additional drug resistance mutations along the transmission chain, especially to fluoroquinolones (FQs) (63.6%, 28 of 44). Conclusions: Age is the most significant factor contributing to death among MDR/RR-TB patients. Recently transmitted MDR/RR strains not only drove the MDR/RR-TB epidemic but also accumulated more serious resistance along transmission chains.
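Genomic clusters such as those defined here (strains differing by 12 or fewer SNPs) are commonly built by single-linkage clustering: any two strains within the threshold are joined, and clusters are the resulting connected groups. The sketch below assumes a precomputed, symmetric pairwise SNP distance matrix and uses a union-find structure; the abstract does not state which linkage rule the authors used, so this is an illustration of the general technique rather than their pipeline, and the distance matrix is invented.

```python
def snp_clusters(distances, threshold=12):
    """Single-linkage clusters from a symmetric pairwise SNP distance matrix.

    Returns only groups of two or more strains; singletons are unclustered.
    """
    n = len(distances)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if distances[i][j] <= threshold:
                parent[find(i)] = find(j)  # merge the two groups

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(g for g in groups.values() if len(g) > 1)


# Hypothetical distances between six strains
d = [
    [0, 5, 20, 40, 41, 60],
    [5, 0, 10, 38, 39, 61],
    [20, 10, 0, 35, 36, 62],
    [40, 38, 35, 0, 3, 63],
    [41, 39, 36, 3, 0, 64],
    [60, 61, 62, 63, 64, 0],
]
print(snp_clusters(d))  # [[0, 1, 2], [3, 4]]
```

Single linkage allows chaining: strains 0 and 2 differ by 20 SNPs yet share a cluster through strain 1. The clustering rate reported above is then simply clustered strains over sequenced strains, 80/178 ≈ 44.9%.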

Application of Whole Genome Sequencing to Understand Diversity and Presence of Genes Associated with Sanitizer Tolerance in Listeria monocytogenes from Produce Handling Sources

Foods, Vol 10 (10), pp. 2454, 2021. DOI: 10.3390/foods10102454

Author(s): Rebecca N. Bland, Jared D. Johnson, Joy G. Waite-Cusic, Alexandra J. Weisberg, Elizabeth R. Riutta, ...

Keyword(s): Whole Genome Sequencing, Genome Sequencing, Whole Genome, Nucleotide Polymorphisms, Single Nucleotide, Phenotype Screening, Screening Assays, Contamination Events, Sequence Types, Potential Use

Recent listeriosis outbreaks linked to fresh produce suggest the need to better understand and mitigate L. monocytogenes contamination in packing and processing environments. Using whole genome sequencing (WGS) and phenotype screening assays for sanitizer tolerance, we characterized 48 L. monocytogenes isolates previously recovered from environmental samples in five produce handling facilities. Within the studied population there were 10 sequence types (STs) and 16 cgMLST types (CTs). Pairwise single nucleotide polymorphisms (SNPs) ranged from 0 to 3047 SNPs within a CT, revealing closely and distantly related isolates indicative of both sporadic and continuous contamination events within the facility. Within Facility 1, we identified a closely related cluster (0–2 SNPs) of isolates belonging to clonal complex 37 (CC37; CT9492), with isolates recovered during sampling events 1 year apart and in various locations inside and outside the facility. The accessory genome of these CC37 isolates ranged from 94 to 210 genes. Notable genetic elements and mutations amongst the isolates included the bcrABC cassette (2/48), associated with QAC tolerance; mutations in the actA gene on the Listeria pathogenicity island (LIPI) 1 (20/48); and the presence of LIPI-3 (21/48) and LIPI-4 (23/48). This work highlights the potential use of WGS in tracing the pathogen within a facility and understanding properties of L. monocytogenes in produce settings.
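Pairwise SNP counts such as the 0 to 3047 range reported here are typically computed from a core-genome alignment by counting positions where two isolates carry different, unambiguous bases. A minimal sketch, assuming aligned sequences of equal length and treating gaps and N as missing data (tools differ on this choice); the isolate names and sequences are hypothetical.

```python
from itertools import combinations

VALID = set("ACGT")


def snp_distance(seq_a, seq_b):
    """Count positions where both bases are unambiguous and differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a in VALID and b in VALID and a != b
    )


def pairwise_snp_distances(alignment):
    """Map each unordered pair of isolate names to its SNP distance."""
    return {
        (name_a, name_b): snp_distance(alignment[name_a], alignment[name_b])
        for name_a, name_b in combinations(sorted(alignment), 2)
    }


alignment = {
    "isolate_1": "ACGTACGTAC",
    "isolate_2": "ACGTACGTAT",
    "isolate_3": "ACNTAGGT-C",  # N and gap are skipped, not counted as SNPs
}
print(pairwise_snp_distances(alignment))
# {('isolate_1', 'isolate_2'): 1, ('isolate_1', 'isolate_3'): 1, ('isolate_2', 'isolate_3'): 2}
```

Skipping ambiguous positions keeps low-coverage regions from inflating distances, which matters when a tight threshold such as the 0–2 SNP cluster above is used to infer persistent contamination.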
