GeneMates: an R package for detecting horizontal gene co-transfer between bacteria using gene-gene associations controlled for population structure

BMC Genomics ◽  
2020 ◽  
Vol 21 (1) ◽  
Author(s):  
Yu Wan ◽  
Ryan R. Wick ◽  
Justin Zobel ◽  
Danielle J. Ingle ◽  
Michael Inouye ◽  
...  

Abstract Background Horizontal gene transfer contributes to bacterial evolution through mobilising genes across various taxonomical boundaries. It is frequently mediated by mobile genetic elements (MGEs), which may capture, maintain, and rearrange mobile genes and co-mobilise them between bacteria, causing horizontal gene co-transfer (HGcoT). This physical linkage between mobile genes poses a great threat to public health as it facilitates dissemination and co-selection of clinically important genes amongst bacteria. Although rapid accumulation of bacterial whole-genome sequencing (WGS) data since the 2000s enables study of HGcoT at the population level, results based on genetic co-occurrence counts and simple association tests are usually confounded by bacterial population structure when sampled bacteria belong to the same species, leading to spurious conclusions. Results We have developed a network approach to explore WGS data for evidence of intraspecies HGcoT and have implemented it in the R package GeneMates (github.com/wanyuac/GeneMates). The package takes as input an allelic presence-absence matrix of genes of interest and a matrix of core-genome single-nucleotide polymorphisms, performs association tests with linear mixed models controlled for population structure, produces a network of significantly associated alleles, and identifies clusters within the network as plausibly co-transferred alleles. GeneMates users may choose to score consistency of allelic physical distances measured in genome assemblies using a novel approach we have developed and overlay the scores on the network for further evidence of HGcoT.
Validation studies of GeneMates on known acquired antimicrobial resistance genes in Escherichia coli and Salmonella Typhimurium show advantages of our network approach over simple association analysis: (1) distinguishing between allelic co-occurrence driven by HGcoT and that driven by clonal reproduction, (2) evaluating effects of population structure on allelic co-occurrence, and (3) linking allele clusters in the network directly to MGEs when physical distances are incorporated. Conclusion GeneMates offers an effective approach to detection of intraspecies HGcoT using WGS data.

Genes ◽  
2021 ◽  
Vol 12 (2) ◽  
pp. 258
Author(s):  
Karim Karimi ◽  
Duy Ngoc Do ◽  
Mehdi Sargolzaei ◽  
Younes Miar

Characterizing the genetic structure and population history can facilitate the development of genomic breeding strategies for the American mink. In this study, we used the whole genome sequences of 100 mink from the Canadian Centre for Fur Animal Research (CCFAR) at the Dalhousie Faculty of Agriculture (Truro, NS, Canada) and Millbank Fur Farm (Rockwood, ON, Canada) to investigate their population structure, genetic diversity and linkage disequilibrium (LD) patterns. Analysis of molecular variance (AMOVA) indicated that the variation among color-types was significant (p < 0.001) and accounted for 18% of the total variation. The admixture analysis revealed that assuming three ancestral populations (K = 3) provided the lowest cross-validation error (0.49). The effective population size (Ne) at five generations ago was estimated to be 99 and 50 for CCFAR and Millbank Fur Farm, respectively. The LD patterns revealed that the average r² fell below 0.2 at genomic distances of >20 kb in CCFAR and >100 kb in Millbank Fur Farm, suggesting that densities of 120,000 and 24,000 single nucleotide polymorphisms (SNPs), respectively, would provide adequate accuracy of genomic evaluation in these populations. These results indicated that accounting for admixture is critical for designing the SNP panels for genotype-phenotype association studies of American mink.
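
As an illustration of the pairwise LD statistic reported in this abstract (the squared correlation between alleles at two loci), the following is a minimal sketch, not the authors' pipeline. It assumes phased haplotypes coded 0/1 for two biallelic SNPs; the function name `ld_r2` and the toy data are hypothetical.

```python
def ld_r2(hap_a, hap_b):
    """Squared allelic correlation between two biallelic SNPs.

    hap_a, hap_b: equal-length sequences of 0/1 alleles, one entry per
    phased haplotype (an assumption; unphased data need other estimators).
    """
    if len(hap_a) != len(hap_b) or not hap_a:
        raise ValueError("haplotype lists must be non-empty and equal in length")
    n = len(hap_a)
    p_a = sum(hap_a) / n                      # frequency of allele 1 at SNP A
    p_b = sum(hap_b) / n                      # frequency of allele 1 at SNP B
    p_ab = sum(1 for a, b in zip(hap_a, hap_b) if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                      # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when either SNP is monomorphic")
    return d * d / denom


# Toy data: two perfectly linked SNPs, then two independent ones.
linked = ld_r2([1, 1, 0, 0], [1, 1, 0, 0])
unlinked = ld_r2([1, 1, 0, 0], [1, 0, 1, 0])
print(linked, unlinked)
```

Averaging such values over SNP pairs binned by physical distance gives the kind of decay curve from which a threshold distance (for example, where the mean falls below 0.2) can be read.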


2018 ◽  
Vol 69 (3) ◽  
pp. 428-437 ◽  
Author(s):  
Eelco Franz ◽  
Ovidiu Rotariu ◽  
Bruno S Lopes ◽  
Marion MacRae ◽  
James L Bono ◽  
...  

Abstract Background Shiga toxin–producing Escherichia coli (STEC) O157:H7 is a zoonotic pathogen that causes numerous food and waterborne disease outbreaks. It is globally distributed, but its origin and the temporal sequence of its geographical spread are unknown. Methods We analyzed whole-genome sequencing data of 757 isolates from 4 continents, and performed a pan-genome analysis to identify the core genome and, from this, extracted single-nucleotide polymorphisms. A timed phylogeographic analysis was performed on a subset of the isolates to investigate its worldwide spread. Results The common ancestor of this set of isolates occurred around 1890 (1845–1925) and originated from the Netherlands. Phylogeographic analysis identified 34 major transmission events. The earliest were predominantly intercontinental, moving from Europe to Australia around 1937 (1909–1958), to the United States in 1941 (1921–1962), to Canada in 1960 (1943–1979), and from Australia to New Zealand in 1966 (1943–1982). This pre-dates the first reported human case of E. coli O157:H7, which was in 1975 from the United States. Conclusions Inter- and intra-continental transmission events have resulted in the current international distribution of E. coli O157:H7, and it is likely that these events were facilitated by animal movements (eg, Holstein Friesian cattle). These findings will inform policy on action that is crucial to reduce the further spread of E. coli O157:H7 and other (emerging) STEC strains globally.


2018 ◽  
Vol 5 (4) ◽  
pp. 171615 ◽  
Author(s):  
Matthew S. Leslie ◽  
Phillip A. Morin

Little is known about global patterns of genetic connectivity in pelagic dolphins, including how circumtropical pelagic dolphins spread globally following the rapid and recent radiation of the subfamily Delphininae. In this study, we tested phylogeographic hypotheses for two circumtropical species, the spinner dolphin (Stenella longirostris) and the pantropical spotted dolphin (Stenella attenuata), using more than 3000 nuclear DNA single nucleotide polymorphisms (SNPs) in each species. Analyses for population structure indicated significant genetic differentiation between almost all subspecies and populations in both species. Bayesian phylogeographic analyses of spinner dolphins showed deep divergence between Indo-Pacific, Atlantic and eastern tropical Pacific Ocean (ETP) lineages. Despite high morphological variation, our results show very close relationships between endemic ETP spinner subspecies in relation to global diversity. The dwarf spinner dolphin is a monophyletic subspecies nested within a major clade of pantropical spinner dolphins from the Indian and western Pacific Ocean populations. Population-level division among the dwarf spinner dolphins was detected, with the northern Australia population being very different from that in Indonesia. In contrast to spinner dolphins, the major boundary for spotted dolphins is between offshore and coastal habitats in the ETP, supporting the current subspecies-level taxonomy. Comparing these species underscores the different scale at which population structure can arise, even in species that are similar in habitat (i.e. pelagic) and distribution.


2020 ◽  
Author(s):  
Pengfei Hu ◽  
Yongyan Deng ◽  
Hengxing Ba ◽  
Chunyi Li

Abstract Sika deer (Cervus nippon) constitutes one of the most valuable animal genetic resources in east Asia. The aim of this study was to identify and validate single nucleotide polymorphisms (SNPs) from antler growth-related genes of sika deer. The whole genome sequencing data of sika deer were used to identify SNP markers. Among them, 31 SNPs from antler growth-related genes exhibited significant polymorphism using genotyping by mass spectrometry. The observed and expected heterozygosities ranged from 0.147 to 0.997 and 0.201 to 0.500, respectively. Significant deviation from the Hardy-Weinberg equilibrium was observed in 6 loci. These findings provide effective molecular detection markers for the study of variation in antler growth rate of sika deer.
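
To make the heterozygosity and Hardy-Weinberg figures in this abstract concrete, here is a minimal sketch of how such quantities are commonly computed for one biallelic SNP. It is an illustration under stated assumptions, not the authors' procedure: genotypes are assumed to be coded as the strings "AA", "AB" and "BB", the function name `hwe_summary` is hypothetical, and the test shown is a plain chi-square goodness-of-fit test (1 degree of freedom). For a biallelic locus the expected heterozygosity 2pq cannot exceed 0.5, which is consistent with the upper bound of 0.500 reported above.

```python
from collections import Counter


def hwe_summary(genotypes):
    """Return (observed het., expected het., chi-square statistic)
    for one biallelic SNP with genotypes coded 'AA', 'AB', 'BB'."""
    n = len(genotypes)
    if n == 0:
        raise ValueError("no genotypes supplied")
    counts = Counter(genotypes)
    n_aa, n_ab = counts["AA"], counts["AB"]
    p = (2 * n_aa + n_ab) / (2 * n)   # frequency of allele A
    q = 1 - p                         # frequency of allele B
    h_obs = n_ab / n                  # observed heterozygosity
    h_exp = 2 * p * q                 # expected heterozygosity under HWE
    # Genotype counts expected under Hardy-Weinberg proportions.
    expected = {"AA": p * p * n, "AB": 2 * p * q * n, "BB": q * q * n}
    chi2 = sum((counts[g] - e) ** 2 / e for g, e in expected.items() if e > 0)
    return h_obs, h_exp, chi2


# Toy data (hypothetical): a heterozygote deficit at allele frequency 0.5.
h_obs, h_exp, chi2 = hwe_summary(["AA"] * 40 + ["AB"] * 20 + ["BB"] * 40)
print(h_obs, h_exp, chi2)  # 0.2 0.5 36.0
```

A chi-square value above 3.84 (the 5% critical value for 1 degree of freedom) would indicate significant deviation from Hardy-Weinberg equilibrium at that locus; exact tests are often preferred for small samples or rare alleles.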


2019 ◽  
Author(s):  
Laure Olazcuaga ◽  
Anne Loiseau ◽  
Hugues Parrinello ◽  
Mathilde Paris ◽  
Antoine Fraimout ◽  
...  

Abstract Evidence is accumulating that evolutionary changes are not only common during biological invasions but may also contribute directly to invasion success. The genomic basis of such changes is still largely unexplored. Yet, understanding the genomic response to invasion may help to predict the conditions under which invasiveness can be enhanced or suppressed. Here we characterized the genome response of the spotted wing drosophila Drosophila suzukii during the worldwide invasion of this pest insect species, by conducting a genome-wide association study to identify genes involved in adaptive processes during invasion. Genomic data from 22 population samples were analyzed to detect genetic variants associated with the status (invasive versus native) of the sampled populations based on a newly developed statistic, which we call C2, that contrasts allele frequencies corrected for population structure. This new statistical framework has been implemented in an upgraded version of the program BayPass. We identified a relatively small set of single nucleotide polymorphisms (SNPs) that show a highly significant association with the invasive status of populations. In particular, two genes, RhoGEF64C and cpo, the latter contributing to natural variation in several life-history traits (including diapause) in Drosophila melanogaster, contained SNPs significantly associated with the invasive status in the two separate main invasion routes of D. suzukii. Our methodological approaches can be applied to any other invasive species, and more generally to any evolutionary model for species characterized by non-equilibrium demographic conditions for which binary covariables of interest can be defined at the population level.


2021 ◽  
Author(s):  
Sophie Hoffman ◽  
Zena Lapp ◽  
Joyce Wang ◽  
Evan Snitkin

Increasing evidence of regional pathogen transmission networks highlights the importance of investigating the dissemination of multidrug-resistant organisms (MDROs) across a region to identify where transmission is happening and how pathogens move across regions. We developed a framework for investigating MDRO regional transmission dynamics using whole-genome sequencing data and created regentrans, an easy-to-use, open source R package that implements these methods (https://github.com/Snitkin-Lab-Umich/regentrans). Using a dataset of over 400 carbapenem-resistant Klebsiella pneumoniae isolates collected from patients in 21 long-term acute care hospitals (LTACHs) over a one-year period, we demonstrate how to use our framework to gain insights into differences in inter- and intra-facility transmission across different LTACHs and over time. These tools will allow investigators to better understand the origins and transmission patterns of MDROs, which is the first step in understanding how to stop transmission at the regional level.


2020 ◽  
Vol 21 (1) ◽  
Author(s):  
Giulio Caravagna ◽  
Guido Sanguinetti ◽  
Trevor A. Graham ◽  
Andrea Sottoriva

Abstract Background The large-scale availability of whole-genome sequencing profiles from bulk DNA sequencing of cancer tissues is fueling the application of evolutionary theory to cancer. From a bulk biopsy, subclonal deconvolution methods are used to determine the composition of cancer subpopulations in the biopsy sample, a fundamental step to determine clonal expansions and their evolutionary trajectories. Results In a recent work we have developed a new model-based approach to carry out subclonal deconvolution from the site frequency spectrum of somatic mutations. This new method integrates, for the first time, an explicit model for neutral evolutionary forces that participate in clonal expansions; in that work we have also shown that our method improves largely over competing data-driven methods. In this Software paper we present mobster, an open source R package built around our new deconvolution approach, which provides several functions to plot data and fit models, assess their confidence and compute further evolutionary analyses that relate to subclonal deconvolution. Conclusions We present the mobster package for tumour subclonal deconvolution from bulk sequencing, the first approach to integrate Machine Learning and Population Genetics which can explicitly model co-existing neutral and positive selection in cancer. We showcase the analysis of two datasets, one simulated and one from a breast cancer patient, and overview all package functionalities.

