Perspective: Genomic inference using diffusion models and the allele frequency spectrum

2018 ◽  
Author(s):  
Aaron P. Ragsdale ◽  
Claudia Moreau ◽  
Simon Gravel

Evolutionary, biological, and demographic processes combine to shape the variation observed in populations. Understanding how these processes are expected to influence variation allows us to infer past demographic events and the nature of selection in human populations. Forward models such as the diffusion approximation provide a powerful tool for analyzing the distribution of allele frequencies in contemporary populations because of their computational tractability and model flexibility. Here, we discuss recent computational developments and their application to reconstructing human demographic history and patterns of selection at new mutations. We also reexamine how some classical assumptions that are still commonly used in inference studies fare when applied to modern data. We use whole-genome sequence data for 797 French Canadian individuals to examine the neutrality of synonymous sites. We find that selection can lead to strong biases in the inferred demography, mutation rate, and distributions of fitness effects. We use these distributions of fitness effects together with demographic and phenotype-fitness models to predict the relationship between effect size and allele frequency, and contrast those predictions with commonly used models in statistical genetics. Thus, the simple evolutionary models investigated by Kimura and Ohta still provide important insight into modern genetic research.
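The allele frequency spectrum that diffusion methods model has a simple closed form in the baseline case. Under the standard neutral model with constant population size, the expected number of sites at which the derived allele is carried by i of n sampled chromosomes is θ/i, for i = 1, ..., n − 1. The Python sketch below computes that baseline and folds it by minor-allele count, as is done when the ancestral allele is unknown. It illustrates only this textbook result under those assumptions, not the diffusion machinery discussed in the article, and the θ values used are arbitrary.

```python
def expected_neutral_sfs(n: int, theta: float) -> list[float]:
    """Expected unfolded SFS under the standard neutral model with constant size.

    Entry i-1 holds E[xi_i] = theta / i, the expected number of sites where the
    derived allele appears i times among n sampled chromosomes (i = 1..n-1).
    """
    if n < 2:
        raise ValueError("need at least two sampled chromosomes")
    return [theta / i for i in range(1, n)]


def fold_sfs(sfs: list[float]) -> list[float]:
    """Fold an unfolded SFS (entries for i = 1..n-1) by minor-allele count."""
    n = len(sfs) + 1
    folded = []
    for j in range(1, n // 2 + 1):
        if j == n - j:
            # When n is even, the middle class pairs with itself: count it once.
            folded.append(sfs[j - 1])
        else:
            # Derived count j and derived count n-j share the same minor count.
            folded.append(sfs[j - 1] + sfs[n - j - 1])
    return folded
```

For example, expected_neutral_sfs(4, 1.0) gives roughly [1.0, 0.5, 0.333], which folds to roughly [1.333, 0.5]. Summing the unfolded spectrum recovers the familiar expectation for the number of segregating sites, θ multiplied by the sum of 1/i over i = 1..n−1.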

eLife ◽  
2020 ◽  
Vol 9 ◽  
Author(s):  
CJ Battey ◽  
Peter L Ralph ◽  
Andrew D Kern

Most organisms are more closely related to nearby than distant members of their species, creating spatial autocorrelations in genetic data. This allows us to predict the location of origin of a genetic sample by comparing it to a set of samples of known geographic origin. Here, we describe a deep learning method, which we call Locator, to accomplish this task faster and more accurately than existing approaches. In simulations, Locator infers sample location to within 4.1 generations of dispersal and runs at least an order of magnitude faster than a recent model-based approach. We leverage Locator’s computational efficiency to predict locations separately in windows across the genome, which allows us to both quantify uncertainty and describe the mosaic ancestry and patterns of geographic mixing that characterize many populations. Applied to whole-genome sequence data from Plasmodium parasites, Anopheles mosquitoes, and global human populations, this approach yields median test errors of 16.9 km, 5.7 km, and 85 km, respectively.
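Locator's reported test errors are distances between predicted and true sample locations. A minimal way to compute such a summary is the haversine formula followed by a median over test samples. This sketch assumes errors are great-circle distances on a spherical Earth of radius 6,371 km; that is an assumption of the sketch, not a documented detail of Locator, and the function names are hypothetical.

```python
import math
from statistics import median

# Mean Earth radius in km; treating Earth as a sphere is an assumption of this sketch.
EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    # min() guards against floating-point rounding pushing sqrt(a) slightly above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def median_test_error_km(predicted, observed) -> float:
    """Median great-circle distance between predicted and true (lat, lon) pairs."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("need equal-length, non-empty lists of (lat, lon) pairs")
    return median(
        haversine_km(p_lat, p_lon, o_lat, o_lon)
        for (p_lat, p_lon), (o_lat, o_lon) in zip(predicted, observed)
    )
```

For instance, haversine_km(0.0, 0.0, 1.0, 0.0) is about 111.19 km, the length of one degree of latitude on this sphere. Using the median rather than the mean keeps a few badly misplaced samples from dominating the summary.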


2020 ◽  
Author(s):  
Jorge da Rocha ◽  
Houcemeddine Othman ◽  
Caroline T. Tiemessen ◽  
Gerrit Botha ◽  
Michèle Ramsay ◽  
...  

Chloroquine/hydroxychloroquine have been proposed as potential treatments for COVID-19. These drugs have warning labels for use in individuals with glucose-6-phosphate dehydrogenase (G6PD) deficiency. Analysis of whole-genome sequence data of 458 individuals from sub-Saharan Africa showed significant G6PD variation across the continent. We identified nine variants: four are potentially deleterious to G6PD function, and one (rs1050828) is known to cause G6PD deficiency. We supplemented data for the rs1050828 variant with genotype array data from over 11,000 Africans. Although this variant is common in Africans overall, large allele frequency differences exist between sub-populations. African sub-populations in the same country can show significant differences in allele frequency (e.g. 16.0% in Tsonga vs 0.8% in Xhosa, both in South Africa, p = 2.4 × 10⁻³). The high prevalence of variants in the G6PD gene found in this analysis suggests that it may be a significant interaction factor in clinical trials of chloroquine and hydroxychloroquine for treatment of COVID-19 in Africans.


2019 ◽  
Vol 286 (1903) ◽  
pp. 20181976 ◽  
Author(s):  
Tanya N. Phung ◽  
Robert K. Wayne ◽  
Melissa A. Wilson ◽  
Kirk E. Lohmueller

The demographic history of dogs is complex, involving multiple bottlenecks, admixture events and artificial selection. However, existing genetic studies have not explored variance in the number of reproducing males and females, or whether it has changed across evolutionary time. While male-biased mating practices, such as male-biased migration and multiple paternity, have been observed in wolves, recent breeding practices could have led to female-biased mating patterns in breed dogs. For example, breed dogs are thought to have experienced a popular sire effect, where a small number of males father many offspring with a large number of females. Here we use genetic variation data to test how widespread sex-biased mating practices in canines have been at different evolutionary time points. Using whole-genome sequence data from 33 dogs and wolves, we show that patterns of diversity on the X chromosome and autosomes are consistent with a higher number of reproducing males than females over ancient evolutionary history in both dogs and wolves, suggesting that mating practices did not change during early dog domestication. By contrast, for the period since breed formation, we found evidence for a larger number of reproducing females than males in breed dogs, consistent with the popular sire effect. Our results confirm that canine demography has been complex, with opposing sex-biased processes occurring throughout their history. The signatures observed in genetic data are consistent with documented sex-biased mating practices in both the wild and domesticated populations, suggesting that these mating practices are pervasive.
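Comparing X-linked and autosomal diversity is informative because, with equal sex ratios, two-thirds of X chromosomes are carried by females. A standard textbook approximation (ideal Wright-Fisher populations, equal mutation rates on the X and autosomes, no selection) gives Ne_X = 9NmNf/(4Nm + 2Nf) and Ne_A = 4NmNf/(Nm + Nf), where Nm and Nf are the numbers of reproducing males and females. The expected diversity ratio Q = Ne_X/Ne_A is then 0.75 when equal numbers of males and females reproduce, falls below 0.75 when more males reproduce, and rises above it when more females reproduce. The sketch below encodes only that approximation; the study's own inference may include corrections (for example, for mutation-rate differences) that are not modelled here, and the function name is hypothetical.

```python
def x_to_autosome_diversity_ratio(n_males: float, n_females: float) -> float:
    """Expected neutral X/autosome diversity ratio Q = Ne_X / Ne_A.

    Uses Ne_A = 4*Nm*Nf / (Nm + Nf) and Ne_X = 9*Nm*Nf / (4*Nm + 2*Nf),
    which simplifies to Q = 9*(Nm + Nf) / (8*(2*Nm + Nf)).
    """
    if n_males <= 0 or n_females <= 0:
        raise ValueError("numbers of reproducing males and females must be positive")
    return 9 * (n_males + n_females) / (8 * (2 * n_males + n_females))


# Equal numbers, twice as many males, twice as many females.
for nm, nf in [(1, 1), (2, 1), (1, 2)]:
    print(nm, nf, x_to_autosome_diversity_ratio(nm, nf))
```

Under this approximation the three cases give 0.75, 0.675 and 0.84375, so a Q below 0.75 points to more reproducing males (the ancient pattern described above) and a Q above 0.75 to more reproducing females (the breed-dog pattern). Q is bounded between 9/16 and 9/8 as one sex becomes vanishingly rare.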


2019 ◽  
Vol 20 (1) ◽  
Author(s):  
Abraham Gihawi ◽  
Ghanasyam Rallapalli ◽  
Rachel Hurst ◽  
Colin S. Cooper ◽  
Richard M. Leggett ◽  
...  

Background: Human tissue is increasingly being whole-genome sequenced as we transition into an era of genomic medicine. This creates the potential to detect sequences originating from microorganisms, including pathogens, amid the plethora of human sequencing reads. In cancer research, the tumorigenic ability of pathogens is being recognized, for example, Helicobacter pylori in gastric non-cardia carcinoma and human papillomavirus in cervical carcinoma. To date, no benchmark has been carried out on the performance of computational approaches for bacterial and viral detection within host-dominated sequence data. Results: We present the results of benchmarking over 70 distinct combinations of tools and parameters on 100 simulated cancer datasets spiked with realistic proportions of bacteria. mOTUs2 and Kraken are the highest-performing individual tools, achieving median genus-level F1 scores of 0.90 and 0.91, respectively. mOTUs2 also performs well at estimating bacterial proportions. Employing Kraken on unassembled sequencing reads gives good but variable performance, depending on post-classification filtering parameters. These approaches are applied to a selection of cervical and gastric cancer whole-genome sequences, in which Alphapapillomavirus and Helicobacter are detected in addition to a variety of other interesting genera. Conclusions: We provide the top-performing pipelines from this benchmark in a unifying tool called SEPATH, which is amenable to high-throughput sequencing studies across a range of high-performance computing clusters. SEPATH provides a benchmarked and convenient approach to detect pathogens in tissue sequence data, helping to determine the relationship between metagenomics and disease.
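The genus-level F1 score used to rank tools is the harmonic mean of precision and recall. The sketch below scores presence/absence calls of genera for one simulated dataset and takes the median across datasets. It assumes a plain set comparison; the benchmark's exact scoring (for example, any abundance threshold applied before a genus counts as detected) may differ, and the function names and genus lists are hypothetical.

```python
from statistics import median


def genus_f1(predicted: set[str], truth: set[str]) -> float:
    """F1 score for presence/absence calls of genera in one sample."""
    tp = len(predicted & truth)   # genera correctly reported
    fp = len(predicted - truth)   # genera reported but not spiked in
    fn = len(truth - predicted)   # genera spiked in but missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def median_f1(runs: list[tuple[set[str], set[str]]]) -> float:
    """Median F1 across simulated datasets, each given as (predicted, truth)."""
    return median(genus_f1(pred, true) for pred, true in runs)
```

With predicted genera {"Helicobacter", "Escherichia", "Bacillus"} and true genera {"Helicobacter", "Escherichia", "Streptococcus", "Cutibacterium"}, precision is 2/3, recall is 1/2, and genus_f1 returns 4/7, about 0.571. Reporting the median across the 100 datasets keeps one pathological simulation from dragging the summary down.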


2018 ◽  
Author(s):  
Flora Jay ◽  
Simon Boitard ◽  
Frédéric Austerlitz

Species generally undergo a complex demographic history, consisting, in particular, of multiple changes in population size. Genome-wide sequencing data are potentially highly informative for reconstructing this demographic history. A crucial challenge is extracting the relevant information from these very large datasets. Here we designed an approach for inferring past demographic events from a moderate number of fully sequenced genomes. Our new approach uses Approximate Bayesian Computation (ABC), a simulation-based statistical framework that allows (i) identifying the best demographic scenario among several competing scenarios and (ii) estimating the best-fitting parameters under the chosen scenario. ABC relies on the computation of summary statistics. Using a cross-validation approach, we showed that statistics such as the lengths of haplotypes shared between individuals, or the decay of linkage disequilibrium with distance, can be combined with classical statistics (e.g., heterozygosity, Tajima’s D) to accurately infer complex demographic scenarios including bottlenecks and expansion periods. We also demonstrated the importance of simultaneously estimating the genotyping error rate. Applying our method to genome-wide human-sequence databases, we finally showed that a model consisting of a bottleneck followed by a Paleolithic and a Neolithic expansion was the most relevant for Eurasian populations.
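The rejection step at the core of ABC can be shown on a toy problem. The sketch below is a generic rejection-ABC illustration, not the authors' pipeline. It assumes a single non-recombining locus under the standard constant-size coalescent, uses one summary statistic (the number of segregating sites S) instead of the haplotype-sharing, LD-decay and classical statistics combined in the study, draws θ from a uniform prior whose upper bound is arbitrary, and keeps draws whose simulated S lies within a tolerance of the observed value. The accepted θ values approximate the posterior. All names and the observed value are hypothetical.

```python
import random
from statistics import mean


def simulate_segregating_sites(n: int, theta: float, rng: random.Random) -> int:
    """Draw S for n sequences under the standard coalescent (constant size,
    one non-recombining locus, infinite-sites mutation)."""
    total_length = 0.0
    for k in range(n, 1, -1):
        # Time with k lineages, in units of 2N generations: Exp(rate k(k-1)/2).
        total_length += k * rng.expovariate(k * (k - 1) / 2)
    # Mutations fall on the tree as a Poisson process with rate theta/2 per unit
    # of branch length; count unit-rate arrivals before time theta * length / 2.
    poisson_mean = theta * total_length / 2
    count, clock = 0, rng.expovariate(1.0)
    while clock < poisson_mean:
        count += 1
        clock += rng.expovariate(1.0)
    return count


def abc_rejection(s_obs: int, n: int, n_draws: int = 20_000,
                  theta_max: float = 20.0, tolerance: int = 0,
                  seed: int = 1) -> list[float]:
    """Rejection ABC: keep prior draws of theta whose simulated S is close to s_obs."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_draws):
        theta = rng.uniform(0.0, theta_max)          # draw from the prior
        s_sim = simulate_segregating_sites(n, theta, rng)
        if abs(s_sim - s_obs) <= tolerance:          # compare summary statistics
            accepted.append(theta)
    return accepted
```

Calling posterior = abc_rejection(s_obs=14, n=10) and then mean(posterior) gives a Monte Carlo estimate of the posterior mean of θ for a hypothetical sample of ten sequences with 14 segregating sites. Model choice, as in step (i) of the abstract, can be sketched the same way by drawing the scenario itself from a prior and recording which scenario produced each accepted simulation.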


2019 ◽  
Author(s):  
William Walton ◽  
Graham N Stone ◽  
Konrad Lohse

Signatures of changes in population size have been detected in genome-wide variation in many species. However, the causes of such changes and the extent to which they are shared across co-distributed species remain poorly understood. During Pleistocene glacial maxima, many temperate European species were confined to southern refugia. While vicariance and range expansion processes associated with glacial cycles have been widely studied, little is known about the demographic history of refugial populations, or about the extent and causes of demographic variation among co-distributed species. We used whole-genome sequence data to reconstruct and compare demographic histories during the Quaternary for Iberian refuge populations in a single ecological guild (seven species of chalcid parasitoid wasps associated with oak cynipid galls). We find support for large changes in effective population size (Ne) through the Pleistocene that coincide with major climate change events. However, there is little evidence that the timing, direction and magnitude of demographic change are shared across species, suggesting that demographic histories are largely idiosyncratic. Our results are compatible with the idea that specialist parasitoids attacking a narrow range of hosts experience greater fluctuations in Ne than generalists.


2021 ◽  
Author(s):  
Leah Kemp

Pseudocaranx georgianus supports a commercially important fishery in New Zealand. Currently, management of this fishery assumes that Quota Management Areas comprise single biological stocks of a single species. However, little is known about the population structure of New Zealand P. georgianus, and morphological data suggest that a cryptic Pseudocaranx species is included within these fisheries. Whole-genome sequence data were used to assemble and describe the first P. georgianus mitogenome. Primers were developed to produce the first genetic sequence data for New Zealand P. georgianus. The cytochrome c oxidase subunit I (COI) gene was sequenced for fourteen P. georgianus from New Zealand waters. These sequences were compared phylogenetically with existing COI sequence data for P. georgianus from Australia and for other Pseudocaranx species from a worldwide distribution. The hypervariable control region was also sequenced for 304 P. georgianus sampled throughout New Zealand’s North Island and 68 P. georgianus from three locations in Western Australia. These sequences were used to explore the population structure and demographic history of New Zealand P. georgianus using haplotype networks, AMOVAs, genetic diversity measures, Tajima’s D, Fu’s F and Bayesian migration analyses.
The P. georgianus mitogenome is typical of cartilaginous fish species, showing no major gene rearrangements and typical gene region lengths and start and stop codons. In assembling the P. georgianus mitogenome, this thesis demonstrates the importance of key methodological choices made when assembling mitogenomes in silico from whole-genome sequence data in Geneious version 11.1. The choice of reference mitogenome has the largest influence on the quality of the assembly, affecting the annotation of the final mitogenome and the resolution of uncertain DNA regions. Increasing the number of mapping iterations improved the quality of the assembly but had a limited ability to mitigate the effects of using a poor reference mitogenome. Overall, I demonstrate the need to investigate and report the quality of published mitogenomes.
All Pseudocaranx species were monophyletic on the COI gene, supporting the current taxonomy of the Pseudocaranx complex. P. georgianus from Western Australia and New Zealand’s North Island represent a monophyletic clade, pending taxonomic verification that two Pseudocaranx dentex sampled in Australia are in fact P. georgianus.
No evidence was found to suggest that either the New Zealand or the Western Australian population of P. georgianus is isolated by distance or clearly structured into distinct stocks. However, some populations of New Zealand P. georgianus were genetically distinct, including fish sampled from Raglan and the Bay of Plenty (FST = 0.02698, p = 0.00901 ± 0.0091) as well as from North Cape and the North Taranaki Bight (FST = 0.02698, p = 0.00901 ± 0.0091).
Some evidence was found to support the claim that P. georgianus along the west coast of New Zealand’s North Island is structured, and no evidence was found to refute the claim that fish from the Bay of Plenty belong to the same biological stock as fish from TRE2. Highly divergent control region sequences from fish sampled at the Three Kings Islands and the Kermadec Islands suggest that these fish could be a species distinct from P. georgianus. Two genetically distinct populations of P. georgianus were identified in New Zealand’s North Island and Western Australia (FST = 0.03517, p < 0.001), but further research would be required to determine whether they are distinct species or populations. One juvenile population sampled in Whangarei had a high level of genetic connectivity with adult P. georgianus throughout New Zealand’s North Island, likely reflecting the batch spawning and occasional long-distance migration behaviour of P. georgianus.
Negative Tajima’s D and Fu’s F statistics (D = -1.50612, p = 0.018; F = -23.54376, p = 0.011), unimodal mismatch distributions and skyline plots indicate that the New Zealand P. georgianus population has undergone a population expansion, possibly resulting from a geographic range expansion. The Western Australian population may also have undergone a population expansion (D = -1.27903, p = 0.086; F = -24.11497, p < 0.00001). However, a multimodal mismatch distribution (Harpending’s raggedness index = 0.00454591, p = 0.02) indicated some stability in the size of this population.
This thesis is the first genetic investigation of New Zealand P. georgianus and provides important biological insights into this species. The information it reveals will inform the management of New Zealand P. georgianus fisheries as inputs to stock assessment models. Additionally, several future research directions have been identified that will further extend our knowledge of this taonga. For example, future genetic and taxonomic analyses may reveal a cryptic Pseudocaranx species occurring at the Three Kings and Kermadec Islands.
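Tajima's D, reported above for both populations, contrasts two estimators of θ: the mean number of pairwise differences π and Watterson's estimator S/a1, where S is the number of segregating sites and a1 is the sum of 1/i for i = 1..n−1. An excess of rare variants, as expected after a population expansion, lowers π relative to S/a1 and makes D negative. The following Python sketch implements Tajima's (1989) formula for a set of aligned sequences. It assumes no gaps or missing data and counts each polymorphic column once (infinite-sites assumption); it is not the software used in the thesis, and the toy sequences are hypothetical.

```python
import math
from itertools import combinations


def tajimas_d(seqs: list[str]) -> float:
    """Tajima's D from aligned, gap-free sequences (Tajima 1989)."""
    n = len(seqs)
    if n < 4:
        # For n = 2 or 3 the variance terms below are exactly zero.
        raise ValueError("Tajima's D is undefined for fewer than four sequences")
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must be aligned to equal length")
    # S: number of segregating (polymorphic) columns, each counted once
    s = sum(1 for column in zip(*seqs) if len(set(column)) > 1)
    if s == 0:
        raise ValueError("no segregating sites, so D is undefined")
    # pi: mean number of differences over all pairs of sequences
    pairs = list(combinations(seqs, 2))
    pi = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)
    # Constants from Tajima (1989)
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    # D = (pi - Watterson's theta) / sqrt(estimated variance of the difference)
    return (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))
```

For example, tajimas_d(["TAAA", "ATAA", "AATA", "AAAT"]) is about -0.78 because every variant is a singleton, whereas ["AAAA", "AAAA", "TTTT", "TTTT"], where all variants sit at intermediate frequency, gives a positive D.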


2021 ◽  
Vol 99 (Supplement_3) ◽  
pp. 24-24
Author(s):  
Jicai Jiang ◽  
Li Ma ◽  
Jeffrey O’Connell

Partitioning SNP heritability by many functional annotations has been a successful tool for understanding the genetic architecture of complex traits in human genetic studies. Similar analyses are being extended to animal research, as (imputed) whole-genome sequence data for many individuals and various functional annotations have become available in livestock. Although many approaches have been developed for heritability partitioning (e.g., LDSC and HE-reg), they are mostly based on approximations tailored to human populations, and few can produce statistically efficient estimates for animal genomic studies, in which individuals are often related. To tackle this issue, we present a stochastic MINQUE (Minimum Norm Quadratic Unbiased Estimation) approach for partitioning SNP heritability, which we refer to as MPH. We provide a theoretical analysis comparing LDSC and HE-reg with REML and MPH, and show what LDSC and HE-reg (and similar methods) exploit in their approximations: sparse relationships between individuals and relatively weak linkage disequilibrium. We also show that our method is mathematically equivalent to the MC-REML approach implemented in BOLT. MPH has three key features. First, it is comparable to genomic REML in accuracy while being at least one order of magnitude faster than GCTA and BOLT and using only about a quarter of the memory required by GCTA, when applied to sequence data with many variance components (or functional annotation categories). Second, it can perform weighted analyses when residual variances are unequal (as with DYD). Third, it works with many overlapping functional annotations. Using simulations based on a human pedigree and a dairy cattle pedigree, we illustrate the benefits of our method for partitioning SNP heritability in pedigree-based studies. We also demonstrate that SNP heritability can be partitioned efficiently for animal genomes with strong, long-span LD.
MPH is freely available at https://jiang18.github.io/mph.
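The model behind partitioning SNP heritability can be written as a linear mixed model with one variance component per annotation category. The statement below is a generic formulation in our own notation, not necessarily MPH's. It assumes A_k is a relationship matrix built from the SNPs in category k, X and β are fixed effects, the genetic components g_k are mutually independent, and residuals are homoscedastic; a weighted analysis would replace I with a diagonal matrix derived from known weights. The methods compared in the abstract differ in how they estimate the σ_k² from data.

```latex
\begin{aligned}
\mathbf{y} &= \mathbf{X}\boldsymbol{\beta} + \sum_{k=1}^{K} \mathbf{g}_k + \mathbf{e},
  \quad \mathbf{g}_k \sim \mathcal{N}\left(\mathbf{0}, \sigma_k^2 \mathbf{A}_k\right),
  \quad \mathbf{e} \sim \mathcal{N}\left(\mathbf{0}, \sigma_e^2 \mathbf{I}\right) \\
\operatorname{Var}(\mathbf{y}) &= \sum_{k=1}^{K} \sigma_k^2 \mathbf{A}_k + \sigma_e^2 \mathbf{I} \\
h^2_{\mathrm{SNP}} &= \frac{\sum_{k=1}^{K} \sigma_k^2}{\sum_{k=1}^{K} \sigma_k^2 + \sigma_e^2},
  \qquad \text{share of category } k = \frac{\sigma_k^2}{\sum_{j=1}^{K} \sigma_j^2}
\end{aligned}
```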


2020 ◽  
Author(s):  
Mingliang Chen ◽  
Odile B. Harrison ◽  
Holly B. Bratcher ◽  
Zhiyan Bo ◽  
Keith A. Jolley ◽  
...  

The expansion of the quinolone-resistant Neisseria meningitidis clone ChinaCC4821-R1-C/B from the ST-4821 clonal complex (cc4821) caused a shift from serogroup A to serogroup C in invasive meningococcal disease (IMD) in China. To establish the relationships among globally distributed cc4821 meningococci, we analysed whole-genome sequence data from 173 cc4821 meningococci isolated on four continents between 1972 and 2019. These meningococci clustered into four sub-lineages (1-4), with sub-lineage 1 primarily comprising serogroup C IMD isolates (82%, 41/50). Most isolates from outside China formed a distinct sub-lineage (81.6%, 40/49; the Europe-USA cluster), with the typical strain designation B:P1.17-6,23:F3-36:ST-3200(cc4821) and harbouring mutations in penicillin-binding protein 2. These data show that the quinolone-resistant clone ChinaCC4821-R1-C/B has expanded to other countries. The increasing global distribution of B:cc4821 meningococci raises concern that cc4821 has the potential to cause a global pandemic, which would be challenging to control, although there is indirect evidence that the Trumenba® vaccine might afford some protection.

