Genome-wide association, prediction and heritability in bacteria

2021 ◽  
Author(s):  
Sudaraka Mallawaarachchi ◽  
Gerry Tonkin-Hill ◽  
Nicholas J. Croucher ◽  
Paul Turner ◽  
Doug Speed ◽  
...  

Abstract: Advances in whole-genome genotyping and sequencing have allowed genome-wide analyses of association, prediction and heritability in many organisms. However, the application of such analyses to bacteria is still in its infancy, being limited by difficulties including the plasticity of bacterial genomes and their strong population structure. Here we propose a suite of genome-wide analyses for bacteria that combines methods from human genetics and previous bacterial studies, including linear mixed models, elastic net and LD-score regression. We introduce innovations such as frequency-based allele coding, testing for both insertion/deletion and nucleotide effects, and partitioning heritability by genome region. Using a previously published large cohort study, we analyse three phenotypes of a major human pathogen, Streptococcus pneumoniae, including the first analyses of minimum inhibitory concentrations (MIC) for each of two antibiotics, penicillin and ceftriaxone. We show that these are very highly heritable, leading to high prediction accuracy, which is explained by many genetic associations identified under good control of population structure effects. In the case of ceftriaxone MIC, these results are surprising because none of the isolates was resistant according to the inhibition zone diameter threshold. We estimate that just over half of the heritability of penicillin MIC is explained by a known drug-resistance region, which also contributes around a quarter of the heritability of ceftriaxone MIC. For the within-host survival phenotype carriage duration, no reliable associations were found, but we observed moderate heritability and prediction accuracy, indicating a polygenic trait. While generating important new results for S. pneumoniae, we have critically assessed existing methods and introduced innovations that will be useful for future large-scale population genomics studies to help decipher the genetic architecture of bacterial traits.

Author summary: Genome-wide association, prediction and heritability analyses in bacteria are beginning to help unravel the genetic underpinnings of traits such as antimicrobial resistance, virulence, within-host survival and transmissibility. Progress to date is limited by challenges including the effects of strong population structure and variable recombination, and the many gaps in sequence alignments, including the absence of entire genes in many isolates. More work is required to critically assess and develop methods for bacterial genomics. We address this task here, using a range of existing methods from bacterial and human genetics, such as linear mixed models, elastic net and LD-score regression. We adapt these methods to introduce new analyses, including separate assessment of gap and nucleotide effects, a new allele coding for association analyses and a method to partition heritability into genome regions. We analyse within-host survival and two antimicrobial response traits of Streptococcus pneumoniae, identifying many novel associations while demonstrating good control of population structure and accurate prediction. We present both new results for an important pathogen and methodological advances that will be useful in guiding future studies in bacterial population genomics.

PLoS Genetics ◽  
2016 ◽  
Vol 12 (3) ◽  
pp. e1005849 ◽  
Author(s):  
Jae Hoon Sul ◽  
Michael Bilow ◽  
Wen-Yun Yang ◽  
Emrah Kostem ◽  
Nick Furlotte ◽  
...  

2019 ◽  
Author(s):  
Morteza M. Saber ◽  
Jesse Shapiro

Abstract: Genome-Wide Association Studies (GWASs) have the potential to reveal the genetics of microbial phenotypes such as antibiotic resistance and virulence. Capitalizing on the growing wealth of bacterial sequence data, microbial GWAS methods aim to identify causal genetic variants while ignoring spurious associations. Bacteria reproduce clonally, leading to strong population structure and genome-wide linkage, making it challenging to separate true “hits” (i.e. mutations that cause a phenotype) from non-causal linked mutations. GWAS methods attempt to correct for population structure in different ways, but their performance has not yet been systematically evaluated. Here we developed a bacterial GWAS simulator (BacGWASim) to generate bacterial genomes with varying rates of mutation, recombination, and other evolutionary parameters, along with a subset of causal mutations underlying a phenotype of interest. We assessed the performance (recall and precision) of three widely used univariate GWAS approaches (cluster-based, dimensionality-reduction, and linear mixed models, implemented in PLINK, pySEER, and GEMMA) and one relatively new whole-genome elastic net model implemented in pySEER, across a range of simulated sample sizes, recombination rates, and causal mutation effect sizes. As expected, all methods performed better with larger sample sizes and effect sizes. The performance of clustering and dimensionality-reduction approaches to correcting for population structure varied considerably with the choice of parameters. Notably, the elastic net whole-genome model was consistently amongst the highest-performing methods and had the highest power in detecting causal variants with both low and high effect sizes. Most methods reached good performance (Recall > 0.75) in identifying causal mutations of strong effect size (log Odds Ratio >= 2) with a sample size of 2000 genomes. However, only elastic nets reached reasonable performance (Recall = 0.35) for detecting markers with weaker effects (log OR ∼1) in smaller samples. Elastic nets also showed superior precision and recall in controlling for genome-wide linkage, relative to univariate models. However, all methods performed relatively poorly on highly clonal (low-recombining) genomes, suggesting room for improvement in method development. These findings show the potential for whole-genome models to improve bacterial GWAS performance. BacGWASim code and simulated data are publicly available to enable further comparisons and benchmarking of new methods.

Author summary: Microbial populations contain measurable phenotypic differences with important clinical and environmental consequences, such as antibiotic resistance, virulence, host preference and transmissibility. A major challenge is to discover the genes and mutations in bacterial genomes that control these phenotypes. Bacterial Genome-Wide Association Studies (GWASs) are a family of methods to statistically associate phenotypes with genotypes, such as point mutations and other variants across the genome. However, compared to sexual organisms such as humans, bacteria reproduce clonally, meaning that causal mutations tend to be strongly linked to other mutations on the same chromosome. This genome-wide linkage makes it challenging to statistically separate causal mutations from non-causal false-positive associations. Several GWAS methods are currently available, but it is not clear which is the most powerful and accurate for bacteria. To systematically evaluate these methods, we developed BacGWASim, a computational pipeline to simulate the evolution of bacterial genomes and phenotypes. Using simulated genomes, we found that GWAS methods varied widely in their performance. In general, causal mutations of strong effect (e.g. those under strong selection for antibiotic resistance) could be easily identified with relatively small sample sizes of around 1000 genomes, but more complex phenotypes controlled by mutations of weaker effect required 3000 genomes or more. We found that a recently developed GWAS method called elastic net was particularly good at identifying causal mutations in highly clonal populations with strong linkage between mutations, although there is still room for improvement. The BacGWASim computer code is publicly available to enable further comparisons and benchmarking of new methods.
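
The recall and precision figures quoted in this abstract can be illustrated with a minimal sketch. It assumes the usual set-based definitions (recall = true hits / all causal variants; precision = true hits / all variants called significant). The variant names and the helper `precision_recall` are hypothetical and are not taken from BacGWASim.

```python
def precision_recall(called, causal):
    """Return (precision, recall) for a set of called variants.

    called: variants a GWAS method reported as significant (hypothetical IDs)
    causal: variants that truly affect the phenotype in the simulation
    """
    called, causal = set(called), set(causal)
    true_hits = len(called & causal)
    # Guard against empty sets so the function never divides by zero.
    precision = true_hits / len(called) if called else 0.0
    recall = true_hits / len(causal) if causal else 0.0
    return precision, recall


# Toy example: 4 simulated causal variants, 3 variants called by a method,
# 2 of which are truly causal.
causal = {"snp_12", "snp_40", "snp_77", "snp_93"}
called = {"snp_12", "snp_40", "snp_51"}
p, r = precision_recall(called, causal)
print(f"precision={p:.2f}, recall={r:.2f}")
```

In this toy case two of the three called variants are causal and two of the four causal variants are recovered, so a method can have moderate precision while still missing half of the true signal.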


2019 ◽  
Author(s):  
Yasin Kaymaz ◽  
Cliff I. Oduor ◽  
Ozkan Aydemir ◽  
Micah A. Luftig ◽  
Juliana A. Otieno ◽  
...  

Abstract: Endemic Burkitt lymphoma (eBL), the most prevalent pediatric cancer in sub-Saharan Africa, is associated with malaria and Epstein Barr virus (EBV). In order to better understand the role of EBV in eBL, we improved viral DNA enrichment methods and generated a total of 98 new EBV genomes from both eBL cases (N=58) and healthy controls (N=40) residing in the same geographic region in Kenya. Comparing cases and controls, we found that EBV type 1 was significantly associated with eBL, with 74.5% of patients (41/55) versus 47.5% of healthy children (19/40) carrying type 1 (OR=3.24, 95% CI=1.36-7.71, P=0.007). Controlling for EBV type, we also performed a genome-wide association study identifying 6 nonsynonymous variants in the genes EBNA1, EBNA2, BcLF1, and BARF1 that were enriched in eBL patients. Additionally, we observed that viruses isolated from plasma of eBL patients were identical to their tumor counterparts, consistent with circulating viral DNA originating from the tumor. We also detected three intertypic recombinants carrying type 1 EBNA2 and type 2 EBNA3 regions, as well as one novel genome with a 20 kb deletion resulting in the loss of multiple lytic and virion genes. Comparing EBV types, genes show differential variation rates, as type 1 appears to be more divergent. In addition, type 2 demonstrates novel substructures. Overall, our findings address the complexities of EBV population structure and provide new insight into viral variation, which has the potential to influence eBL oncogenesis.

Key Points: EBV type 1 is more prevalent in eBL patients compared to the geographically matched healthy control group. Genome-wide association analysis between cases and controls identifies 6 eBL-associated nonsynonymous variants in EBNA1, EBNA2, BcLF1, and BARF1 genes. Analysis of population structure reveals that EBV type 2 exists as two genomic subgroups.
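
The odds ratio quoted in this abstract can be checked from the reported counts (41 of 55 patients and 19 of 40 controls carrying type 1). The sketch below assumes a standard Wald interval on the log odds ratio. The abstract does not state which interval method the authors used, so this is an illustration only, and the function name `odds_ratio_wald_ci` is hypothetical.

```python
import math


def odds_ratio_wald_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald confidence interval for a 2x2 table.

    a: cases with the exposure      b: cases without it
    c: controls with the exposure   d: controls without it
    z: normal quantile (1.96 gives an approximate 95% interval)
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the Wald approximation.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Counts from the abstract: 41/55 eBL patients and 19/40 healthy children
# carried EBV type 1, so 14 patients and 21 controls did not.
or_, lo, hi = odds_ratio_wald_ci(41, 14, 19, 21)
print(f"OR={or_:.2f}, 95% CI={lo:.2f}-{hi:.2f}")
```

Under these assumptions the result agrees with the reported OR=3.24 and 95% CI=1.36-7.71 to two decimal places.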


2019 ◽  
Author(s):  
Maja Boczkowska ◽  
Katarzyna Bączek ◽  
Olga Kosakowska ◽  
Anna Rucińska ◽  
Wiesław Podyma ◽  
...  

Abstract

Background: Valeriana officinalis L. is one of the most important medicinal plants, with mild sedative, nervine, antispasmodic and relaxant effects. Despite a substantial number of studies on this species, its population genomics has not yet been analyzed. The main aims of this study were to characterize the genetic variation of natural populations of V. officinalis in Poland and to compare the variation of wild populations and the cultivated form using the Next Generation Sequencing-based DArTseq technique. We also aimed to establish foundations for future genetic monitoring of the species and to develop genetic fingerprint profiles for samples deposited in the gene bank and in natural sites, in order to assess the degree to which their genetic integrity and population structure are preserved in the future.

Results: The major and also the most astounding result of our work is the low level of observed heterozygosity of individual plants from natural populations, despite the fact that the species is widespread in the studied area. Inbreeding, in naturally outcrossing species such as valerian, decreases reproductive success. The analysis of the population structure indicated the potential presence of a metapopulation in a broad area of Poland and the formation of a distinct gene pool in the Bieszczady Mountains. The results also indicate the presence of individuals of the cultivated form in natural populations in the region where the species is cultivated for the needs of the pharmaceutical industry, and this could lead to structural and genetic imbalance in wild populations.

Conclusions: The DArTseq technology can be applied effectively in genetic studies of V. officinalis. The genetic variability of wild populations is in fact significantly lower than assumed. Individuals from the cultivated population are found in the natural environment, and their impact on wild populations should be monitored.


PLoS ONE ◽  
2019 ◽  
Vol 14 (10) ◽  
pp. e0224074 ◽  
Author(s):  
Namhee Jeong ◽  
Ki-Seung Kim ◽  
Seongmun Jeong ◽  
Jae-Yoon Kim ◽  
Soo-Kwon Park ◽  
...  
