Ancestry Inference Using Reference Labeled Clusters of Haplotypes

2020
Author(s):
Keith Noto
Yong Wang
Shiya Song
Joshua G. Schraiber
Alisa Sedghifar
...

Abstract. We present ARCHes, a fast and accurate haplotype-based approach for inferring an individual’s ancestry composition. Our approach works by modeling haplotype diversity from a large, admixed cohort of hundreds of thousands, then annotating those models with population information from reference panels of known ancestry. The running time of ARCHes does not depend on the size of a reference panel because training and testing are separate processes, and the inferred population-annotated haplotype models can be written to disk and used to label large test sets in parallel (in our experiments, it averages less than one minute to assign ancestry from 32 populations to 1,001 sections of a genotype using 10 CPUs). We test ARCHes on public data from the 1000 Genomes Project and HGDP as well as simulated examples of known admixture. Our results demonstrate that ARCHes outperforms RFMix at correctly assigning both global and local ancestry at regional levels regardless of the amount of population admixture.

2021
Vol 22 (1)
Author(s):
Yong Wang
Shiya Song
Joshua G. Schraiber
Alisa Sedghifar
Jake K. Byrnes
...

Abstract. Background: We present ARCHes, a fast and accurate haplotype-based approach for inferring an individual’s ancestry composition. Our approach works by modeling haplotype diversity from a large, admixed cohort of hundreds of thousands, then annotating those models with population information from reference panels of known ancestry. Results: The running time of ARCHes does not depend on the size of a reference panel because training and testing are separate processes, and the inferred population-annotated haplotype models can be written to disk and reused to label large test sets in parallel (in our experiments, it averages less than one minute to assign ancestry from 32 populations using 10 CPUs). We test ARCHes on public data from the 1000 Genomes Project and the Human Genome Diversity Project (HGDP) as well as simulated examples of known admixture. Conclusions: Our results demonstrate that ARCHes outperforms RFMix at correctly assigning both global and local ancestry at finer population scales regardless of the amount of population admixture.


2019
Author(s):
Caitlin Uren
Eileen G. Hoal
Marlo Möller

Abstract. Global and local ancestry inference in admixed human populations can be performed using computational tools implementing distinct algorithms, such as RFMix and ADMIXTURE. The accuracy of these tools has been tested largely on populations with relatively straightforward admixture histories, but little is known about how well they perform in more complex admixture scenarios. Using simulations, we show that RFMix outperforms ADMIXTURE in determining global ancestry proportions in a complex 5-way admixed population. In addition, RFMix correctly assigns local ancestry with an accuracy of 89%. The increase in reported local ancestry inference accuracy in this population (as compared to previous studies) can largely be attributed to the recent availability of large-scale genotyping data for more representative reference populations. The ability of RFMix to determine global and local ancestry to a high degree of accuracy allows for more reliable population structure analysis, scans for natural selection, admixture mapping and case-control association studies. This study highlights the utility of extending computational tools to make them more relevant to genetically structured populations, as seen with RFMix. This is particularly noteworthy as modern-day societies are becoming increasingly genetically complex and some genetic tools are therefore less appropriate. We therefore suggest that RFMix be used for both global and local ancestry estimation in complex admixture scenarios.


2020
Author(s):
Caitlin Uren
Eileen G. Hoal
Marlo Möller

Abstract. Background: Global and local ancestry inference in admixed human populations can be performed using computational tools implementing distinct algorithms. These tools have been developed, and their accuracy tested, largely on populations with relatively straightforward admixture histories, but little is known about how well they perform in more complex admixture scenarios. Results: Using simulations, we show that RFMix outperforms ADMIXTURE in determining global ancestry proportions even in a complex 5-way admixed population, in addition to assigning local ancestry with an accuracy of 89%. The ability of RFMix to determine global and local ancestry to a high degree of accuracy, particularly in admixed populations, provides the opportunity for more accurate association analyses. Conclusion: This study highlights the utility of extending computational tools to make them more compatible with genetically structured populations, as well as the need to expand the sampling of diverse world-wide populations. This is particularly noteworthy as modern-day societies are becoming increasingly genetically complex and some genetic tools and commonly used ancestral populations are less appropriate. Based on these caveats and the results presented here, we suggest that RFMix be used for both global and local ancestry estimation in world-wide complex admixture scenarios, particularly when including these estimates in association studies.


2019
Vol 10 (2)
pp. 569-579
Author(s):
Aurélien Cottin
Benjamin Penaud
Jean-Christophe Glaszmann
Nabila Yahiaoui
Mathieu Gautier

Hybridizations between species and subspecies represented major steps in the history of many crop species. Such events generally lead to genomes with mosaic patterns of chromosomal segments of various origins that may be assessed by local ancestry inference methods. However, these methods have mainly been developed in the context of human population genetics, with implicit assumptions that may not always fit plant models. The purpose of this study was to evaluate the suitability of three state-of-the-art inference methods (SABER, ELAI and WINPOP) for local ancestry inference under scenarios that can be encountered in plant species. For this, we developed an R package to simulate genotyping data under such scenarios. The tested inference methods performed similarly well as long as representatives of source populations were available. As expected, the higher the level of differentiation between ancestral source populations and the lower the number of generations since admixture, the more accurate the results were. Interestingly, the accuracy of the methods was only marginally affected by i) the number of ancestries (up to six tested); ii) the sample design (i.e., unbalanced representation of source populations); and iii) the reproduction mode (e.g., selfing, vegetative propagation). If a source population was not represented in the data set, no bias was observed in inference accuracy for regions originating from represented sources, and regions from the missing source were assigned differently depending on the method. Overall, the selected ancestry inference methods may be used for crop plant analysis if all ancestral sources are known.


2020
Vol 137 (6)
pp. 641-650
Author(s):
Qiuming Chen
Jingxi Zhan
Jiafei Shen
Kaixing Qu
Quratulain Hanif
...

2020
Vol 12 (20)
pp. 8647
Author(s):
Sugie Lee
Chisun Yoo
Kyung Wook Seo

This study combined space syntax metrics and geographic information systems (GIS)-based built-environment measures to analyze pedestrian volume in different land-use zones, as recorded in unique public data from a pedestrian volume survey of 10,000 locations in Seoul, Korea. The results indicate that most of the built-environment variables, such as density, land use, accessibility, and street design measures, showed statistically significant associations with pedestrian volume. Among the syntactic variables, global integration showed a statistically significant association with the average pedestrian volume in residential and commercial zones. In contrast, local integration turned out to be an important factor in the commercial zone. This study therefore concludes that the syntactic variables of global and local integration, as well as some built-environment variables, should be considered as determinant factors of pedestrian volume, though the effects of those variables varied by land-use zone. Accordingly, planning and public policies should use tailored approaches to promote urban vitality through pedestrian volume in accordance with each land-use zone’s characteristics.


2013
Vol 93 (2)
pp. 278-288
Author(s):
Brian K. Maples
Simon Gravel
Eimear E. Kenny
Carlos D. Bustamante

BMC Genetics
2017
Vol 18 (1)
Author(s):
Daniel Hui
Zhou Fang
Jerome Lin
Qing Duan
Yun Li
...
