The inference of ancestry has become a part of the services many forensic genetic laboratories provide. Ancestry information may provide investigative leads or help identify the region of origin in cases of unidentified missing persons. Many biostatistical methods have been developed in population genetics for the study of population structure. However, the challenges and questions are slightly different in the context of forensic genetics, where the interest lies in the origin of a specific sample rather than in understanding population histories and genealogies. In this paper, the methodologies for modelling population admixture and inferring ancestral populations are reviewed with a focus on their strengths and weaknesses in relation to ancestry inference in the forensic context.
The population genetics of digenic genotypes in diploid populations, that is, genotypes based on alleles at two loci, has been studied theoretically for decades, and digenic traits of medical interest have been known for over 25 years. Given the effects of linkage and linkage disequilibrium on two-locus genotypes, these factors can be expected to change the frequencies of digenic genotypes in many, sometimes unexpected, ways. In particular, linkage disequilibrium and inbreeding can combine to increase the frequencies of double homozygotes and double heterozygotes significantly over outbred comparisons. Given the prevalence of linkage disequilibrium in recently admixed populations, the combined effects of linkage disequilibrium and inbreeding can lead to large shifts in trait prevalence, such that it sometimes exceeds that of either original pre-admixed population. Here we investigate the frequencies of digenic genotypes under the combined effects of linkage, linkage disequilibrium, and inbreeding to analyze how these interact to increase or decrease the frequency of the genotypes across two loci.
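As a rough illustration of how linkage disequilibrium alone shifts digenic genotype frequencies, the sketch below computes the total frequency of double homozygotes and of double heterozygotes for two biallelic loci under random mating. It is not the model analyzed in the abstract above: inbreeding and recombination over generations are deliberately left out, and the function names, the allele frequencies `p_a` and `p_b`, and the disequilibrium coefficient `d` are illustrative assumptions.

```python
def haplotype_freqs(p_a, p_b, d):
    """Frequencies of haplotypes AB, Ab, aB, ab from the allele
    frequencies of A and B and the disequilibrium coefficient D."""
    freqs = {
        "AB": p_a * p_b + d,
        "Ab": p_a * (1 - p_b) - d,
        "aB": (1 - p_a) * p_b - d,
        "ab": (1 - p_a) * (1 - p_b) + d,
    }
    if min(freqs.values()) < 0:
        raise ValueError("D is outside the range allowed by the allele frequencies")
    return freqs


def digenic_freqs(p_a, p_b, d):
    """Total double-homozygote and double-heterozygote frequencies,
    assuming random union of gametes (no inbreeding)."""
    h = haplotype_freqs(p_a, p_b, d)
    # Double homozygotes: AB/AB, Ab/Ab, aB/aB, ab/ab.
    double_hom = sum(x * x for x in h.values())
    # Double heterozygotes: AB/ab (coupling) and Ab/aB (repulsion).
    double_het = 2 * (h["AB"] * h["ab"] + h["Ab"] * h["aB"])
    return double_hom, double_het
```

With `p_a = p_b = 0.5`, linkage equilibrium (`d = 0`) gives 0.25 for both classes, whereas maximal disequilibrium (`d = 0.25`) gives 0.5 for both, so both classes rise at the expense of single heterozygotes.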
Computer simulations have been widely applied in population genetics and evolutionary studies. A great deal of effort has been made over the past two decades in developing simulation tools. However, there are not many simulation tools suitable for studying population admixture.
Here we developed a forward-time simulator, AdmixSim 2, an individual-based tool that can flexibly and efficiently simulate population genomics data under complex evolutionary scenarios. Unlike its previous version, AdmixSim 2 is based on the extended Wright-Fisher model, and it implements many common evolutionary parameters, including gene flow, natural selection, recombination, and mutation, which allow users to freely design and simulate any complex scenario involving population admixture. AdmixSim 2 can be used to simulate data of dioecious or monoecious populations, autosomes, or sex chromosomes. To the best of our knowledge, there are no similar tools available for simulating complex population admixture. Using empirical or previously simulated genomic data as input, AdmixSim 2 provides phased haplotype data for the convenience of further admixture-related analyses such as local ancestry inference, association studies, and other applications. Here we evaluate the performance of AdmixSim 2 on simulated data and validate its functions via comparative analysis of simulated data and empirical data of African American, Mexican, and Uyghur populations.
AdmixSim 2 is a flexible simulation tool expected to facilitate the study of complex population admixture in various situations.
Cryptosporidium parvum is a globally distributed zoonotic parasite and a major cause of diarrhoea in humans and ruminants. The parasite's life cycle comprises an obligatory sexual phase, during which genetic exchanges can occur between previously isolated lineages. Here, we compare 32 whole genome sequences from human- and ruminant-derived parasite isolates collected across Europe, Egypt and China. We identify three strongly supported clusters that comprise a mix of isolates from different host species, geographic origins, and subtypes. We show that: (1) recombination transfers sequence from ruminant isolates into human isolates; (2) these recombinant regions can be passed on to other human subtypes through gene flow and population admixture; (3) there have been multiple genetic exchanges, and all are likely recent; (4) putative virulence genes are significantly enriched within these genetic exchanges; and (5) this results in an increase in their nucleotide diversity. We carefully dissect the phylogenetic sequence of two genetic exchanges, illustrating the long-term evolutionary consequences of these events. Our results suggest that increased globalisation and close human-animal contacts increase the opportunity for genetic exchanges between previously isolated parasite lineages, resulting in spillover and spillback events. We discuss how this can provide a novel substrate for natural selection at genes involved in host-parasite interactions, thereby potentially altering the dynamic coevolutionary equilibrium in the Red Queen arms race.
We present ARCHes, a fast and accurate haplotype-based approach for inferring an individual’s ancestry composition. Our approach works by modeling haplotype diversity from a large, admixed cohort of hundreds of thousands, then annotating those models with population information from reference panels of known ancestry.
The running time of ARCHes does not depend on the size of a reference panel because training and testing are separate processes, and the inferred population-annotated haplotype models can be written to disk and reused to label large test sets in parallel (in our experiments, assigning ancestry from 32 populations takes less than one minute on average using 10 CPUs). We test ARCHes on public data from the 1000 Genomes Project and the Human Genome Diversity Project (HGDP) as well as simulated examples of known admixture.
Our results demonstrate that ARCHes outperforms RFMix at correctly assigning both global and local ancestry at finer population scales regardless of the amount of population admixture.
qpAdm is a statistical tool for studying the ancestry of populations with histories that involve admixture between two or more source populations. Using qpAdm, it is possible to identify plausible models of admixture that fit the population history of a group of interest and to calculate the relative proportion of ancestry that can be ascribed to each source population in the model. Although qpAdm is widely used in studies of population history of human (and nonhuman) groups, relatively little has been done to assess its performance. We performed a simulation study to assess the behavior of qpAdm under various scenarios in order to identify areas of potential weakness and establish recommended best practices for use. We find that qpAdm is a robust tool that yields accurate results in many cases, including when data coverage is low, when there are high rates of missing data or ancient DNA damage, or when diploid calls cannot be made. However, we caution against co-analyzing ancient and present-day data, including an extremely large number of reference populations in a single model, and analyzing population histories involving extended periods of gene flow. We provide a user guide suggesting best practices for the use of qpAdm.
Background: Population admixture is a common phenomenon in humans, animals, and plants, and it plays a very important role in shaping individual genetic architecture and population genetic diversity. Inference of population admixture, however, is very challenging and typically relies on in silico simulation. To our knowledge, no software tool has been available for this purpose. A simulator capable of generating data under various complex admixture scenarios would facilitate the study of recombination, linkage disequilibrium, ancestry tracing, and admixture dynamics in admixed populations. We describe such a simulator here. Results: We developed a forward-time simulator (AdmixSim) under the standard Wright-Fisher model. It can simulate admixture scenarios involving: (1) multiple ancestral populations; (2) multiple waves of admixture events; (3) fluctuating population size; and (4) admixtures of fluctuating proportions. Analysis of the data simulated by AdmixSim showed that our simulator can quickly and accurately generate data resembling real-world values. We included in AdmixSim a wide range of parameters that allow users to modify and simulate many kinds of admixture scenario easily, so it is very flexible. AdmixSim records recombination break points and traces each chromosomal segment back to its ancestral population, so that users can easily perform further analysis and comparative studies with empirical data. Conclusions: AdmixSim facilitates the study of population admixture by providing a simulation framework with the flexible implementation of various admixture models and parameters.
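The core idea of a forward-time Wright-Fisher admixture simulation that records break points and segment ancestry can be sketched in a few lines. The code below is a minimal illustration, not AdmixSim's implementation: it assumes a single admixture pulse, a constant population size, monoecious random mating, one chromosome of unit length, and exactly one crossover per meiosis. All function names and parameters are hypothetical.

```python
import random


def merge(segments):
    """Join adjacent segments that carry the same ancestry label."""
    out = []
    for start, end, anc in segments:
        if out and out[-1][2] == anc:
            out[-1] = (out[-1][0], end, anc)
        else:
            out.append((start, end, anc))
    return out


def recombine(hap1, hap2, rng):
    """Build a gamete with one crossover at a uniform break point:
    the left part comes from one haplotype, the right from the other."""
    if rng.random() < 0.5:
        hap1, hap2 = hap2, hap1
    x = rng.random()
    left = [(s, min(e, x), a) for s, e, a in hap1 if s < x]
    right = [(max(s, x), e, a) for s, e, a in hap2 if e > x]
    return merge(left + right)


def simulate(proportions, n_individuals, n_generations, seed=0):
    """Simulate one admixture pulse followed by random mating.

    proportions maps each ancestral population label to its
    contribution to the founding generation (an assumed input format).
    Each haplotype is a list of (start, end, ancestry) segments.
    """
    rng = random.Random(seed)
    labels = list(proportions)
    weights = [proportions[k] for k in labels]
    pop = []
    for _ in range(n_individuals):
        anc = rng.choices(labels, weights=weights)[0]
        pop.append(([(0.0, 1.0, anc)], [(0.0, 1.0, anc)]))
    for _ in range(n_generations):
        new_pop = []
        for _ in range(n_individuals):
            # Standard Wright-Fisher sampling: parents drawn with replacement.
            mother, father = rng.choice(pop), rng.choice(pop)
            new_pop.append((recombine(*mother, rng), recombine(*father, rng)))
        pop = new_pop
    return pop


def ancestry_fraction(pop, label):
    """Average fraction of the genome derived from one ancestral population."""
    total = sum(e - s for ind in pop for hap in ind for s, e, a in hap if a == label)
    return total / (2 * len(pop))
```

For example, `simulate({"A": 0.7, "B": 0.3}, 50, 10, seed=1)` returns 50 diploid individuals whose haplotypes are mosaics of A and B segments. Multiple waves, changing population sizes, and changing proportions would each need an extra per-generation step that this sketch omits.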
We present ARCHes, a fast and accurate haplotype-based approach for inferring an individual’s ancestry composition. Our approach works by modeling haplotype diversity from a large, admixed cohort of hundreds of thousands, then annotating those models with population information from reference panels of known ancestry. The running time of ARCHes does not depend on the size of a reference panel because training and testing are separate processes, and the inferred population-annotated haplotype models can be written to disk and used to label large test sets in parallel (in our experiments, assigning ancestry from 32 populations to 1,001 sections of a genotype takes less than one minute on average using 10 CPUs). We test ARCHes on public data from the 1000 Genomes Project and HGDP as well as simulated examples of known admixture. Our results demonstrate that ARCHes outperforms RFMix at correctly assigning both global and local ancestry at regional levels regardless of the amount of population admixture.
Avocado is an important cash crop in Tanzania; however, its genetic diversity has not been thoroughly investigated. This study was undertaken to explore the genetic diversity of avocado in the southern highlands using microsatellite markers. A total of 226 local avocado trees originating from seeds were sampled in eight districts of the Mbeya, Njombe and Songwe regions. Each district was considered as a population. The diversity at 10 microsatellite loci was investigated.
A total of 167 alleles were detected across the 10 loci, with an average of 16.7 ± 1.3 alleles per locus. The average expected and observed heterozygosity were 0.84 ± 0.02 and 0.65 ± 0.04, respectively. All but two loci showed a significant deviation from Hardy-Weinberg expectations. Analysis of molecular variance showed that about 6% of the variation was partitioned among the eight geographic populations. Pairwise population FST comparisons revealed a lack of genetic differentiation for seven of the 28 population pairs tested. The principal components analysis (PCA) and hierarchical cluster analysis showed a mixing of avocado trees from different districts. The model-based STRUCTURE analysis subdivided the tree samples into four major genetic clusters.
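The expected and observed heterozygosity reported above can be computed per locus as in the following sketch. It uses the simple estimator H_e = 1 - Σ p_i² without a small-sample correction, which may differ from the estimator used in the study. The function name and the genotype encoding (pairs of allele sizes) are assumptions made for illustration.

```python
from collections import Counter


def heterozygosity(genotypes):
    """Return (expected, observed) heterozygosity for one locus.

    genotypes is a list of (allele1, allele2) pairs, one per diploid tree.
    Expected heterozygosity is 1 minus the sum of squared allele
    frequencies; observed heterozygosity is the fraction of
    heterozygous individuals.
    """
    n = len(genotypes)
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total_alleles = 2 * n
    expected = 1 - sum((c / total_alleles) ** 2 for c in counts.values())
    observed = sum(1 for a, b in genotypes if a != b) / n
    return expected, observed
```

An observed value below the expected one, as in the averages reported above (0.65 against 0.84), indicates a heterozygote deficit at that locus.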
The high diversity detected in the analysed avocado germplasm implies that this germplasm is a potentially valuable source of variable alleles that might be harnessed for genetic improvement of this crop in Tanzania. The mixing of avocado trees from different districts observed in the PCA and dendrogram points to strong gene flow among the avocado populations, which led to the population admixture revealed in the STRUCTURE analysis. However, there is still significant differentiation among the tree populations from different districts that can be utilized in the avocado breeding program.
A detailed derivation of the f-statistics formalism is given within a geometrical framework. It is shown that the f-statistics appear when a genetic distance matrix is constrained to describe a four-population phylogenetic tree. The choice of genetic metric is crucial and plays a central role in the tree-likeness criterion. A lack of treeness is interpreted in the formalism as the presence of population admixture. In this respect, four formulas are given to estimate the admixture proportions. One of them is the so-called f4-ratio estimate, and we show that a second one is related to a known result developed in terms of the fixation index FST. An illustrative numerical simulation of admixture proportion estimates is included. Relationships of the formalism with coalescence times and pairwise sequence differences are also provided.
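To make the f4-ratio estimate mentioned above concrete, the sketch below computes f4 statistics from per-SNP allele frequencies and forms the ratio f4(A, O; X, C) / f4(A, O; B, C), the commonly used form in which X is modelled as a mixture of populations related to B and C, A is related to B, and O is an outgroup. This does not reproduce the geometric derivation or the other three formulas of the paper. The function names and the toy allele frequencies in the usage note are made up for illustration.

```python
def f4(p_a, p_b, p_c, p_d):
    """f4(A, B; C, D): mean over SNPs of (a - b) * (c - d),
    where each argument is a list of allele frequencies."""
    products = [(a - b) * (c - d) for a, b, c, d in zip(p_a, p_b, p_c, p_d)]
    return sum(products) / len(products)


def f4_ratio(p_a, p_o, p_x, p_b, p_c):
    """Estimate the fraction of B-related ancestry in X as
    f4(A, O; X, C) / f4(A, O; B, C)."""
    denominator = f4(p_a, p_o, p_b, p_c)
    if denominator == 0:
        raise ValueError("f4(A, O; B, C) is zero, so the ratio is undefined")
    return f4(p_a, p_o, p_x, p_c) / denominator
```

As a sanity check, if the allele frequencies of X are built as an exact mixture `x = alpha * b + (1 - alpha) * c` at every SNP, then `f4_ratio` returns `alpha` up to floating-point error, because f4 is linear in each population's frequencies. With real data the estimate is subject to sampling noise and depends on the assumed population relationships.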