AdmixSim 2: a forward-time simulator for modeling complex population admixture

2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Rui Zhang ◽  
Chang Liu ◽  
Kai Yuan ◽  
Xumin Ni ◽  
Yuwen Pan ◽  
...  

Background: Computer simulations have been widely applied in population genetics and evolutionary studies. A great deal of effort has been made over the past two decades in developing simulation tools. However, few simulation tools are suitable for studying population admixture. Results: We developed a forward-time simulator, AdmixSim 2, an individual-based tool that can flexibly and efficiently simulate population genomics data under complex evolutionary scenarios. Unlike its previous version, AdmixSim 2 is based on the extended Wright-Fisher model, and it implements many common evolutionary parameters covering gene flow, natural selection, recombination, and mutation, which allow users to freely design and simulate any complex scenario involving population admixture. AdmixSim 2 can be used to simulate data for dioecious or monoecious populations and for autosomes or sex chromosomes. To the best of our knowledge, no similar tools are available for simulating complex population admixture. Using empirical or previously simulated genomic data as input, AdmixSim 2 provides phased haplotype data for the convenience of further admixture-related analyses such as local ancestry inference, association studies, and other applications. We evaluated the performance of AdmixSim 2 on simulated data and validated its functions via comparative analysis of simulated data and empirical data of African American, Mexican, and Uyghur populations. Conclusions: AdmixSim 2 is a flexible simulation tool expected to facilitate the study of complex population admixture in various situations.

2016 ◽  
Author(s):  
Xiong Yang ◽  
Xumin Ni ◽  
Ying Zhou ◽  
Wei Guo ◽  
Kai Yuan ◽  
...  

Background: Population admixture has been a common phenomenon in humans, animals, and plants, and it plays a very important role in shaping individual genetic architecture and population genetic diversity. Inference of population admixture, however, is challenging and typically relies on in silico simulation. We are aware of the lack of a computer tool for such a purpose; in particular, no simulator is available for generating data under various complex admixture scenarios. Results: Here we developed a forward-time simulator (AdmixSim) under the standard Wright-Fisher model, which can simulate admixed populations with: 1) multiple ancestral populations; 2) multiple waves of admixture events; 3) fluctuating population size; and 4) fluctuating admixture proportions. Analysis of the data simulated by AdmixSim shows that our simulator can quickly and accurately generate data that resemble real data. We included in AdmixSim all possible parameters that allow users to modify and simulate any kind of admixture scenario easily, so it is very flexible. AdmixSim records recombination break points and the trace of each chromosomal segment from different ancestral populations, with which users can easily do further analysis and comparative studies with empirical data. Conclusions: AdmixSim is expected to facilitate the study of population admixture by providing a simulation framework with flexible implementation of various admixture models and parameters.
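
To make the forward-time idea in the abstract above concrete, here is a minimal sketch of a Wright-Fisher-style admixture loop with two source populations, several admixture waves, and a changing population size. It is not AdmixSim's implementation: the function `simulate_ancestry`, its parameters, and the simplification of tracking only each individual's genome-wide ancestry fraction from source A (instead of haplotypes and recombination break points) are all assumptions made for illustration.

```python
import random


def simulate_ancestry(sizes, influx_a, influx_b, seed=1):
    """Hypothetical toy model, not the AdmixSim algorithm.

    sizes[g]    -- admixed population size in generation g
    influx_a[g] -- probability that a parent in generation g is a new
                   migrant from source A (ancestry 1.0)
    influx_b[g] -- same for source B (ancestry 0.0); influx_b[0] is unused
                   because generation-0 founders are A with probability
                   influx_a[0] and B otherwise
    Returns the list of ancestry fractions (share of source A) in the
    final generation.
    """
    rng = random.Random(seed)
    # Generation 0: unadmixed founders drawn from A or B.
    pop = [1.0 if rng.random() < influx_a[0] else 0.0 for _ in range(sizes[0])]

    for g in range(1, len(sizes)):
        def draw_parent():
            u = rng.random()
            if u < influx_a[g]:
                return 1.0              # new migrant from source A
            if u < influx_a[g] + influx_b[g]:
                return 0.0              # new migrant from source B
            return rng.choice(pop)      # random parent from previous generation

        # Each child averages the ancestry of two independently drawn parents.
        pop = [(draw_parent() + draw_parent()) / 2 for _ in range(sizes[g])]
    return pop


if __name__ == "__main__":
    # Two waves from A (generations 0 and 3), population size fluctuating.
    final = simulate_ancestry(
        sizes=[200, 300, 250, 400, 400],
        influx_a=[0.3, 0.0, 0.0, 0.1, 0.0],
        influx_b=[0.0, 0.0, 0.0, 0.0, 0.0],
    )
    print(sum(final) / len(final))  # mean ancestry from source A
```

A haplotype-level simulator would replace the averaging step with explicit recombination between parental chromosomes, which is what allows break points and segment origins to be recorded.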


2019 ◽  
Author(s):  
Johan Pensar ◽  
Santeri Puranen ◽  
Neil MacAlasdair ◽  
Juri Kuronen ◽  
Gerry Tonkin-Hill ◽  
...  

Abstract: Discovery of polymorphisms under co-selective pressure or epistasis has received considerable recent attention in population genomics. Both statistical modeling of the population level co-variation of alleles across the chromosome and model-free testing of dependencies between pairs of polymorphisms have been shown to successfully uncover patterns of selection in bacterial populations. Here we introduce a model-free method, SpydrPick, whose computational efficiency enables analysis at the scale of pan-genomes of many bacteria. SpydrPick incorporates an efficient correction for population structure, which is demonstrated to maintain a very low rate of false positive findings among those SNP pairs highlighted to deviate significantly from the null hypothesis of neutral co-evolution in simulated data. We also introduce a new type of visualization of the results similar to the Manhattan plots used in genome-wide association studies, which enables rapid exploration of the identified signals of co-evolution. Application of the method to large population genomic data sets of two major human pathogens, Streptococcus pneumoniae and Neisseria meningitidis, revealed both previously identified and novel putative targets of co-selection related to virulence and antibiotic resistance, highlighting the potential of this approach to drive molecular discoveries, even in the absence of phenotypic data.


2020 ◽  
Vol 11 ◽  
Author(s):  
Xiong Yang ◽  
Kai Yuan ◽  
Xumin Ni ◽  
Ying Zhou ◽  
Wei Guo ◽  
...  

Background: Population admixture is a common phenomenon in humans, animals, and plants, and it plays a very important role in shaping individual genetic architecture and population genetic diversity. Inference of population admixture, however, is very challenging and typically relies on in silico simulation. We are aware of the lack of a computerized tool for such a purpose. A simulator capable of generating data under various complex admixture scenarios would facilitate the study of recombination, linkage disequilibrium, ancestry tracing, and admixture dynamics in admixed populations. We describe such a simulator here. Results: We developed a forward-time simulator (AdmixSim) under the standard Wright-Fisher model. It can simulate admixed populations with: (1) multiple ancestral populations; (2) multiple waves of admixture events; (3) fluctuating population size; and (4) fluctuating admixture proportions. Analysis of the data simulated by AdmixSim showed that our simulator can quickly and accurately generate data resembling real-world data. We included in AdmixSim all possible parameters that would allow users to modify and simulate any kind of admixture scenario easily, so it is very flexible. AdmixSim records recombination break points and traces of each chromosomal segment from different ancestral populations, with which users can easily perform further analysis and comparative studies with empirical data. Conclusions: AdmixSim facilitates the study of population admixture by providing a simulation framework with the flexible implementation of various admixture models and parameters.


2016 ◽  
Author(s):  
Ying Zhou ◽  
Kai Yuan ◽  
Yaoliang Yu ◽  
Xumin Ni ◽  
Pengtao Xie ◽  
...  

Abstract: To infer the histories of population admixture, one important challenge for methods based on admixture linkage disequilibrium (ALD) is to remove the effect of source LD (SLD), which is directly inherited from the source populations. In previous methods, only the decay curve of weighted LD between pairs of sites whose genetic distance was larger than a certain starting distance was fitted by single or multiple exponential functions, for the inference of recent single- or multiple-wave admixture. However, the effect of SLD has not been well defined, and no tool has been developed to estimate the effect of SLD on weighted LD decay. In this study, we defined the SLD in the formularized weighted LD statistic under the two-way admixture model and proposed a polynomial spectrum (p-spectrum) to study the weighted SLD and weighted LD. We also found that reference populations could be used to reduce the SLD in the weighted LD statistic. We further developed a method, iMAAPs, to infer Multiple-wave Admixture by fitting ALD using Polynomial spectrum. We evaluated the performance of iMAAPs under various admixture models in simulated data and applied iMAAPs to the analysis of genome-wide single nucleotide polymorphism data from the Human Genome Diversity Project (HGDP) and the HapMap Project. We showed that iMAAPs is a considerable improvement over other current methods and further facilitates the inference of the histories of complex population admixtures.
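
The single-exponential fit that the abstract above attributes to previous methods can be sketched as follows. Under a single-pulse model, weighted LD at genetic distance d (in Morgans) is commonly modeled as A·exp(−t·d), with t the number of generations since admixture. The sketch below recovers t by least squares on the log scale. It illustrates that earlier approach only, not iMAAPs or its polynomial spectrum; the function name `fit_single_pulse`, the starting distance, and the synthetic noise-free data are assumptions for illustration.

```python
import math


def fit_single_pulse(distances, weighted_ld):
    """Fit weighted_ld ~ A * exp(-t * d) by linear regression of log(LD) on d.

    Hypothetical helper: returns (t, A). Requires strictly positive LD values.
    """
    ys = [math.log(v) for v in weighted_ld]
    n = len(distances)
    mean_x = sum(distances) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in distances)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(distances, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return -slope, math.exp(intercept)


if __name__ == "__main__":
    # Synthetic decay curve: admixture 12 generations ago, amplitude 0.02.
    # Distances start at 0.5 cM (an assumed starting distance) to mimic
    # skipping short-range LD inherited from the source populations.
    d = [0.005 + 0.005 * i for i in range(60)]
    ld = [0.02 * math.exp(-12.0 * x) for x in d]
    t_hat, amp_hat = fit_single_pulse(d, ld)
    print(round(t_hat, 6), round(amp_hat, 6))
```

With real data the curve is noisy and may contain several exponential components, so a log-linear fit like this one is only a starting point.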


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Camilo Broc ◽  
Therese Truong ◽  
Benoit Liquet

Background: The increasing number of genome-wide association studies (GWAS) has revealed several loci that are associated with multiple distinct phenotypes, suggesting the existence of pleiotropic effects. Highlighting these cross-phenotype genetic associations could help to identify and understand common biological mechanisms underlying some diseases. Common approaches test the association between genetic variants and multiple traits at the SNP level. In this paper, we propose a novel gene-level and pathway-level approach for the case where several independent GWAS on independent traits are available. The method is based on a generalization of the sparse group Partial Least Squares (sgPLS) to take into account groups of variables, and a Lasso penalization that links all independent data sets. This method, called joint-sgPLS, is able to convincingly detect signal at the variable level and at the group level. Results: Our method has the advantage of proposing a global, readable model while coping with the architecture of the data. It can outperform traditional methods and provides wider insight in terms of a priori information. We compared the performance of the proposed method to other benchmark methods on simulated data and gave an example of application on real data with the aim of highlighting common susceptibility variants for breast and thyroid cancers. Conclusion: The joint-sgPLS shows interesting properties for detecting a signal. As an extension of the PLS, the method is suited for data with a large number of variables. The choice of Lasso penalization copes with architectures of groups of variables and observation sets. Furthermore, although the method has been applied to a genetic study, its formulation is adapted to any data with a high number of variables and an exposed a priori architecture in other application fields.


Genes ◽  
2021 ◽  
Vol 12 (2) ◽  
pp. 258
Author(s):  
Karim Karimi ◽  
Duy Ngoc Do ◽  
Mehdi Sargolzaei ◽  
Younes Miar

Characterizing the genetic structure and population history can facilitate the development of genomic breeding strategies for the American mink. In this study, we used the whole genome sequences of 100 mink from the Canadian Centre for Fur Animal Research (CCFAR) at the Dalhousie Faculty of Agriculture (Truro, NS, Canada) and Millbank Fur Farm (Rockwood, ON, Canada) to investigate their population structure, genetic diversity and linkage disequilibrium (LD) patterns. Analysis of molecular variance (AMOVA) indicated that the variation among color-types was significant (p < 0.001) and accounted for 18% of the total variation. The admixture analysis revealed that assuming three ancestral populations (K = 3) provided the lowest cross-validation error (0.49). The effective population size (Ne) five generations ago was estimated to be 99 and 50 for CCFAR and Millbank Fur Farm, respectively. The LD patterns revealed that the average r² fell to <0.2 at genomic distances of >20 kb and >100 kb in CCFAR and Millbank Fur Farm, respectively, suggesting that densities of 120,000 and 24,000 single nucleotide polymorphisms (SNPs) would provide adequate accuracy of genomic evaluation in these populations, respectively. These results indicated that accounting for admixture is critical for designing the SNP panels for genotype-phenotype association studies of American mink.
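
The LD statistic reported in the abstract above, r², can be computed for a pair of biallelic SNPs with the standard definition r² = D² / (p_A(1 − p_A) p_B(1 − p_B)), where D = p_AB − p_A p_B. The sketch below assumes phased haplotypes coded 0/1. The function name `ld_r2` and the toy data are illustrative and are not taken from the study's pipeline.

```python
def ld_r2(hap_a, hap_b):
    """r^2 between two biallelic SNPs from phased 0/1 haplotype vectors.

    hap_a[i] and hap_b[i] are the alleles carried by haplotype i at the
    two sites (hypothetical input layout for this sketch).
    """
    if len(hap_a) != len(hap_b) or not hap_a:
        raise ValueError("haplotype vectors must be non-empty and equal length")
    n = len(hap_a)
    p_a = sum(hap_a) / n                                    # allele frequency, site A
    p_b = sum(hap_b) / n                                    # allele frequency, site B
    p_ab = sum(a & b for a, b in zip(hap_a, hap_b)) / n     # joint 1-1 frequency
    d = p_ab - p_a * p_b                                    # LD coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when a site is monomorphic")
    return d * d / denom


if __name__ == "__main__":
    print(ld_r2([0, 1, 0, 1], [0, 1, 0, 1]))  # perfectly correlated sites
    print(ld_r2([0, 0, 1, 1], [0, 1, 0, 1]))  # independent sites
```

As a check on the quoted marker densities, 120,000 SNPs at 20 kb spacing and 24,000 SNPs at 100 kb spacing both correspond to about 2.4 Gb of genome; that genome length is inferred from the arithmetic and is not stated in the abstract.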


Genetics ◽  
2001 ◽  
Vol 159 (3) ◽  
pp. 1319-1323
Author(s):  
Hong-Wen Deng

Abstract: Association studies using random population samples are increasingly being applied in the identification and inference of genetic effects of genes underlying complex traits. It is well recognized that population admixture may yield false-positive identification of genetic effects for complex traits. However, it is less well appreciated that population admixture can appear to mask, change, or reverse true genetic effects for genes underlying complex traits. By employing a simple population genetics model, we explore the effects and the conditions of population admixture in masking, changing, or even reversing true genetic effects of genes underlying complex traits.
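
A small numerical example shows how pooling subpopulations can reverse a true effect, as the abstract above describes. It is a simplified carrier/non-carrier calculation, not the model used in the paper. The function `pooled_effect`, the mixing weights, the carrier frequencies, and the baseline trait means are all hypothetical values chosen for illustration.

```python
def pooled_effect(pops, beta):
    """Apparent effect (carrier mean minus non-carrier mean) in a pooled sample.

    pops -- list of (mixing_weight, carrier_frequency, baseline_trait_mean),
            one tuple per subpopulation (assumed layout for this sketch)
    beta -- true effect of carrying the allele, identical in every subpopulation
    """
    carrier_w = sum(w * p for w, p, _ in pops)
    noncarrier_w = sum(w * (1 - p) for w, p, _ in pops)
    carrier_mean = sum(w * p * (mu + beta) for w, p, mu in pops) / carrier_w
    noncarrier_mean = sum(w * (1 - p) * mu for w, p, mu in pops) / noncarrier_w
    return carrier_mean - noncarrier_mean


if __name__ == "__main__":
    # Same true effect (+1) in both subpopulations, but the allele is common
    # where the baseline trait mean is low and rare where it is high.
    mixed = [(0.5, 0.8, 0.0), (0.5, 0.2, 5.0)]
    print(pooled_effect(mixed, beta=1.0))       # negative: the effect appears reversed

    # With equal baselines the pooled estimate matches the true effect.
    same_base = [(0.5, 0.8, 3.0), (0.5, 0.2, 3.0)]
    print(pooled_effect(same_base, beta=1.0))
```

In this setup the pooled difference equals beta + 0.6 × (baseline of subpopulation 1 − baseline of subpopulation 2). A baseline gap of 5 therefore turns a true effect of +1 into an apparent effect of about −2, and a gap of about 1.67 would mask the effect entirely.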


2018 ◽  
Author(s):  
Paloma Medina ◽  
Bryan Thornlow ◽  
Rasmus Nielsen ◽  
Russell Corbett-Detig

Abstract: Admixture, the mixing of genetically distinct populations, is increasingly recognized as a fundamental biological process. One major goal of admixture analyses is to estimate the timing of admixture events. Whereas most methods today can only detect the most recent admixture event, here we present coalescent theory and associated software that can be used to estimate the timing of multiple admixture events in an admixed population. We extensively validate this approach and evaluate the conditions under which it can successfully distinguish one-pulse from two-pulse admixture models. We apply our approach to real and simulated data of Drosophila melanogaster. We find evidence of a single very recent pulse of cosmopolitan ancestry contributing to African populations as well as evidence for more ancient admixture among genetically differentiated populations in sub-Saharan Africa. These results suggest our method can quantify complex admixture histories involving genetic material introduced by multiple discrete admixture pulses. The new method facilitates the exploration of admixture and its contribution to adaptation, ecological divergence, and speciation.


2019 ◽  
Author(s):  
Nicolas C. Rochette ◽  
Angel G. Rivera-Colón ◽  
Julian M. Catchen

Abstract: For half a century, population genetics studies have put type II restriction endonucleases to work. Now, coupled with massively parallel, short-read sequencing, the family of RAD protocols that wields these enzymes has generated vast genetic knowledge from the natural world. Here we describe the first software capable of using paired-end sequencing to derive short contigs from de novo RAD data natively. Stacks version 2 employs a de Bruijn graph assembler to build contigs from paired-end reads and overlap those contigs with the corresponding single-end loci. The new architecture allows all the individuals in a metapopulation to be considered at the same time as each RAD locus is processed. This enables a Bayesian genotype caller to provide precise SNPs, and a robust algorithm to phase those SNPs into long haplotypes, generating RAD loci that are 400-800 bp in length. To prove its recall and precision, we test the software with simulated data and compare reference-aligned and de novo analyses of three empirical datasets. We show that the latest version of Stacks is highly accurate and outperforms other software in assembling and genotyping paired-end de novo datasets.


Author(s):  
Daniel L. Hartl

This chapter could as well be titled “Population Genomics,” although many aspects of population genomics are integrated throughout the other chapters. It includes estimates of mutational variance and standing variance, phenotypic evolution under directional selection as measured by the linear selection gradient, and phenotypic evolution under stabilizing selection. It explores the strengths and limitations of genome-wide association studies of quantitative trait loci (QTLs) and expression QTLs (eQTLs) to detect genetic factors influencing complex traits in natural populations and genetic risk factors for complex diseases such as heart disease or diabetes. The number of genes affecting complex traits is considered, as well as evidence bearing on the issue of whether complex diseases are primarily affected by a very large number of genes, almost all of small effect, and how this bears on direct-to-consumer and over-the-counter genetic testing. The population genomics of adaptation is considered, including drug resistance, domestication, and local selection versus gene flow. The chapter concludes with the population genomics of speciation as illustrated by reinforcement of mating barriers, the reproducibility of phenotypic and genetic changes, and the accumulation of genetic incompatibilities.

