Ancestry Inference Using Reference Labeled Clusters of Haplotypes

Abstract: We present ARCHes, a fast and accurate haplotype-based approach for inferring an individual’s ancestry composition. Our approach works by modeling haplotype diversity from a large, admixed cohort of hundreds of thousands, then annotating those models with population information from reference panels of known ancestry. The running time of ARCHes does not depend on the size of a reference panel because training and testing are separate processes, and the inferred population-annotated haplotype models can be written to disk and used to label large test sets in parallel (in our experiments, it averages less than one minute to assign ancestry from 32 populations to 1,001 sections of a genotype using 10 CPUs). We test ARCHes on public data from the 1000 Genomes Project and HGDP as well as simulated examples of known admixture. Our results demonstrate that ARCHes outperforms RFMix at correctly assigning both global and local ancestry at regional levels regardless of the amount of population admixture.


Ancestry inference using reference labeled clusters of haplotypes

BMC Bioinformatics, DOI 10.1186/s12859-021-04350-x, 2021, Vol 22 (1)

Author(s): Yong Wang, Shiya Song, Joshua G. Schraiber, Alisa Sedghifar, Jake K. Byrnes, ...

Keyword(s): Haplotype Diversity, Population Admixture, Genome Diversity, 1000 Genomes Project, Local Ancestry, 1000 Genomes, Public Data, Population Information, Test Sets, Global And Local

Abstract. Background: We present ARCHes, a fast and accurate haplotype-based approach for inferring an individual’s ancestry composition. Our approach works by modeling haplotype diversity from a large, admixed cohort of hundreds of thousands, then annotating those models with population information from reference panels of known ancestry. Results: The running time of ARCHes does not depend on the size of a reference panel because training and testing are separate processes, and the inferred population-annotated haplotype models can be written to disk and reused to label large test sets in parallel (in our experiments, it averages less than one minute to assign ancestry from 32 populations using 10 CPUs). We test ARCHes on public data from the 1000 Genomes Project and the Human Genome Diversity Project (HGDP) as well as simulated examples of known admixture. Conclusions: Our results demonstrate that ARCHes outperforms RFMix at correctly assigning both global and local ancestry at finer population scales regardless of the amount of population admixture.
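The idea of annotating haplotype models with reference-panel labels can be illustrated with a toy sketch. This is not the ARCHes implementation: it assumes haplotype clusters per genomic window have already been inferred, and the cluster IDs, population names, and the functions `annotate_clusters` and `label_windows` are all hypothetical. It only shows how reference haplotypes might turn unlabeled clusters into population weights, and how those stored weights could then label a test haplotype without touching the reference panel again.

```python
from collections import Counter, defaultdict


def annotate_clusters(reference):
    """Turn reference-panel observations into per-cluster population weights.

    reference: iterable of (window_index, cluster_id, population) tuples,
    one per reference haplotype per window (hypothetical input format).
    Returns {(window_index, cluster_id): {population: weight}}.
    """
    counts = defaultdict(Counter)
    for window, cluster, population in reference:
        counts[(window, cluster)][population] += 1

    annotations = {}
    for key, counter in counts.items():
        total = sum(counter.values())
        annotations[key] = {pop: n / total for pop, n in counter.items()}
    return annotations


def label_windows(test_clusters, annotations):
    """Label each window of a test haplotype with its highest-weight population.

    test_clusters: list of cluster IDs, one per window, in window order.
    Windows whose cluster has no reference annotation are labeled None.
    """
    labels = []
    for window, cluster in enumerate(test_clusters):
        weights = annotations.get((window, cluster))
        labels.append(max(weights, key=weights.get) if weights else None)
    return labels


# Toy reference panel: two windows, made-up clusters and populations.
reference = [
    (0, "a", "POP1"), (0, "a", "POP1"), (0, "a", "POP2"),
    (0, "b", "POP2"),
    (1, "c", "POP2"), (1, "c", "POP2"),
]
annotations = annotate_clusters(reference)
labels = label_windows(["a", "c", "z"], annotations)
print(labels)
```

Because `annotations` is a plain dictionary, it could be serialized once and reused for many test haplotypes, which mirrors the separation of training and testing described in the abstract. A real method would also smooth labels across neighboring windows rather than labeling each window independently.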


Putting RFMix and ADMIXTURE to the test in a complex admixed population

DOI 10.1101/671727, 2019

Author(s): Caitlin Uren, Eileen G. Hoal, Marlo Möller

Keyword(s): Association Studies, Structured Populations, Human Populations, Computational Tools, Local Ancestry, Population Structure Analysis, Admixed Population, Ancestry Inference, Global And Local, Local Ancestry Inference

Abstract: Global and local ancestry inference in admixed human populations can be performed using computational tools implementing distinct algorithms, such as RFMix and ADMIXTURE. The accuracy of these tools has been tested largely on populations with relatively straightforward admixture histories, but little is known about how well they perform in more complex admixture scenarios. Using simulations, we show that RFMix outperforms ADMIXTURE in determining global ancestry proportions in a complex 5-way admixed population. In addition, RFMix correctly assigns local ancestry with an accuracy of 89%. The increase in reported local ancestry inference accuracy in this population (as compared to previous studies) can largely be attributed to the recent availability of large-scale genotyping data for more representative reference populations. The ability of RFMix to determine global and local ancestry to a high degree of accuracy allows for more reliable population structure analysis, scans for natural selection, admixture mapping and case-control association studies. This study highlights the utility of extending computational tools to make them more relevant to genetically structured populations, as seen with RFMix. This is particularly noteworthy as modern-day societies are becoming increasingly genetically complex and some genetic tools are therefore less appropriate. We therefore suggest that RFMix be used for both global and local ancestry estimation in complex admixture scenarios.
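The two quantities discussed above, global ancestry proportions and local ancestry accuracy, can be made concrete with a minimal sketch. It assumes local ancestry is given as one label per marker along a single haplotype and that simulated truth is available in the same format; the function names and the toy labels are hypothetical, and the study's exact accuracy metric is not stated in the abstract, so per-marker agreement is used here as an assumption.

```python
from collections import Counter


def global_proportions(local_calls):
    """Summarize per-marker local ancestry labels into global proportions.

    Each marker counts equally; weighting by genetic or physical segment
    length is a common refinement that is omitted in this sketch.
    """
    counts = Counter(local_calls)
    total = len(local_calls)
    return {ancestry: n / total for ancestry, n in counts.items()}


def local_accuracy(inferred, truth):
    """Fraction of markers whose inferred ancestry matches the simulated truth."""
    if len(inferred) != len(truth):
        raise ValueError("inferred and truth must have the same length")
    matches = sum(i == t for i, t in zip(inferred, truth))
    return matches / len(truth)


# Toy three-way example with made-up ancestry labels A, B, C.
truth = ["A", "A", "B", "B", "C", "C", "C", "A", "B", "C"]
inferred = ["A", "A", "B", "C", "C", "C", "C", "A", "B", "B"]

accuracy = local_accuracy(inferred, truth)
proportions = global_proportions(inferred)
print(accuracy, proportions)
```

In this toy case the two errors cancel in the global summary, so the global proportions are exactly right while local accuracy is below 100%. This illustrates why global and local ancestry performance are reported separately.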


Putting RFMix and ADMIXTURE to the test in a complex admixed population

DOI 10.21203/rs.2.14878/v3, 2020

Author(s): Caitlin Uren, Eileen G. Hoal, Marlo Möller

Keyword(s): World Wide, Association Studies, Structured Populations, Human Populations, Computational Tools, Association Analyses, Local Ancestry, Diverse World, Admixed Population, Global And Local

Abstract. Background: Global and local ancestry inference in admixed human populations can be performed using computational tools implementing distinct algorithms. These tools have been developed, and their accuracy tested, largely on populations with relatively straightforward admixture histories, but little is known about how well they perform in more complex admixture scenarios. Results: Using simulations, we show that RFMix outperforms ADMIXTURE in determining global ancestry proportions even in a complex 5-way admixed population, in addition to assigning local ancestry with an accuracy of 89%. The ability of RFMix to determine global and local ancestry to a high degree of accuracy, particularly in admixed populations, provides the opportunity for more accurate association analyses. Conclusion: This study highlights the utility of extending computational tools to make them more compatible with genetically structured populations, as well as the need to expand the sampling of diverse world-wide populations. This is particularly noteworthy as modern-day societies are becoming increasingly genetically complex and some genetic tools and commonly used ancestral populations are less appropriate. Based on these caveats and the results presented here, we suggest that RFMix be used for both global and local ancestry estimation in world-wide complex admixture scenarios, particularly when including these estimates in association studies.


Simulation-Based Evaluation of Three Methods for Local Ancestry Deconvolution of Non-model Crop Species Genomes

G3 Genes|Genomes|Genetics, DOI 10.1534/g3.119.400873, 2019, Vol 10 (2), pp. 569-579

Author(s): Aurélien Cottin, Benjamin Penaud, Jean-Christophe Glaszmann, Nabila Yahiaoui, Mathieu Gautier

Keyword(s): R Package, Source Population, Crop Species, Data Set, Reproduction Mode, Local Ancestry, Ancestry Inference, Source Populations, Inference Methods, Local Ancestry Inference

Hybridizations between species and subspecies represented major steps in the history of many crop species. Such events generally lead to genomes with mosaic patterns of chromosomal segments of various origins that may be assessed by local ancestry inference methods. However, these methods have mainly been developed in the context of human population genetics, with implicit assumptions that may not always fit plant models. The purpose of this study was to evaluate the suitability of three state-of-the-art inference methods (SABER, ELAI and WINPOP) for local ancestry inference under scenarios that can be encountered in plant species. For this, we developed an R package to simulate genotyping data under such scenarios. The tested inference methods performed similarly well as long as representatives of source populations were available. As expected, the higher the level of differentiation between ancestral source populations and the lower the number of generations since admixture, the more accurate the results were. Interestingly, the accuracy of the methods was only marginally affected by i) the number of ancestries (up to six tested); ii) the sample design (i.e., unbalanced representation of source populations); and iii) the reproduction mode (e.g., selfing, vegetative propagation). If a source population was not represented in the data set, no bias was observed in inference accuracy for regions originating from represented sources, while regions from the missing source were assigned differently depending on the method. Overall, the selected ancestry inference methods may be used for crop plant analysis if all ancestral sources are known.


Local ancestry inference gets faster and better

Nature Reviews Genetics, DOI 10.1038/nrg3571, 2013, Vol 14 (9), p. 599

Keyword(s): Local Ancestry, Ancestry Inference, Local Ancestry Inference


Whole‐genome resequencing reveals diversity, global and local ancestry proportions in Yunling cattle

Journal of Animal Breeding and Genetics, DOI 10.1111/jbg.12479, 2020, Vol 137 (6), pp. 641-650, Cited By ~ 2

Author(s): Qiuming Chen, Jingxi Zhan, Jiafei Shen, Kaixing Qu, Quratulain Hanif, ...

Keyword(s): Whole Genome, Genome Resequencing, Local Ancestry, Whole Genome Resequencing, Global And Local, Ancestry Proportions


Determinant Factors of Pedestrian Volume in Different Land-Use Zones: Combining Space Syntax Metrics with GIS-Based Built-Environment Measures

Sustainability, DOI 10.3390/su12208647, 2020, Vol 12 (20), pp. 8647

Author(s): Sugie Lee, Chisun Yoo, Kyung Wook Seo

Keyword(s): Land Use, Built Environment, Space Syntax, Global Integration, Street Design, Determinant Factors, Environment Variables, Public Data, Local Integration, Global And Local

This study combined space syntax metrics and geographic information systems (GIS)-based built-environment measures to analyze pedestrian volume in different land-use zones, as recorded in unique public data from a pedestrian volume survey of 10,000 locations in Seoul, Korea. The results indicate that most of the built-environment variables, such as density, land use, accessibility, and street design measures, showed statistically significant associations with pedestrian volume. Among the syntactic variables, global integration showed a statistically significant association with the average pedestrian volume in residential and commercial zones. In contrast, local integration turned out to be an important factor in the commercial zone. This study therefore concludes that the syntactic variables of global and local integration, as well as some built-environment variables, should be considered as determinant factors of pedestrian volume, though the effects of those variables varied by land-use zone. Accordingly, planning and public policies should use tailored approaches to promote urban vitality through pedestrian volume in accordance with each land-use zone’s characteristics.
