Comparing local ancestry inference models in populations of two- and three-way admixture

PeerJ, 2020, Vol. 8, e10090
Author(s): Ryan Schubert, Angela Andaleon, Heather E. Wheeler

Local ancestry estimation infers the regional ancestral origin of chromosomal segments in admixed populations using reference populations and a variety of statistical models. Integrating local ancestry into complex trait genetics has the potential to increase detection of genetic associations and improve genetic prediction models in understudied admixed populations, including African Americans and Hispanics. Five methods for local ancestry estimation that have been used in human complex trait genetics are LAMP-LD (2012), RFMix (2013), ELAI (2014), Loter (2018), and MOSAIC (2019). As users rather than developers, we sought to directly compare the accuracy, runtime, memory usage, and usability of these software tools to determine which is best suited for incorporation into association study pipelines. We find that in the majority of cases RFMix has the highest median accuracy, with the ranking of the remaining software dependent on the ancestral architecture of the population tested. Additionally, we estimate the asymptotic complexity (big-O) of memory usage and runtime for each software and find that most tools scale linearly with sample size in both time and memory. The only exception is RFMix, whose runtime scales quadratically and whose memory scales linearly with sample size. Effective local ancestry estimation tools are necessary to increase diversity and prevent population disparities in human genetics studies. RFMix performs best across methods; however, depending on the application, other methods perform just as well with the benefit of shorter runtimes. Scripts used to format data, run software, and estimate accuracy can be found at https://github.com/WheelerLab/LAI_benchmarking.
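
The abstract reports median accuracy but does not define the per-sample metric. A minimal sketch, assuming accuracy is the fraction of sites at which the inferred ancestry label matches the simulated (true) label, summarized as a median across haplotypes. The benchmarking scripts in the linked repository may use a different metric, and the haplotype IDs and population labels below are hypothetical.

```python
from statistics import median


def haplotype_accuracy(true_labels, inferred_labels):
    """Fraction of sites where the inferred ancestry label matches the truth."""
    if len(true_labels) != len(inferred_labels):
        raise ValueError("label sequences must cover the same sites")
    if not true_labels:
        raise ValueError("need at least one site")
    matches = sum(t == i for t, i in zip(true_labels, inferred_labels))
    return matches / len(true_labels)


def median_accuracy(truth, inferred):
    """Median per-haplotype accuracy; both dicts map haplotype IDs to label lists."""
    return median(haplotype_accuracy(truth[h], inferred[h]) for h in truth)


# Hypothetical simulated truth and inferred calls at four sites per haplotype.
truth = {
    "hap1": ["AFR", "AFR", "EUR", "EUR"],
    "hap2": ["EUR", "EUR", "EUR", "NAT"],
    "hap3": ["NAT", "NAT", "AFR", "AFR"],
}
inferred = {
    "hap1": ["AFR", "AFR", "EUR", "AFR"],  # 3 of 4 correct
    "hap2": ["EUR", "EUR", "EUR", "NAT"],  # 4 of 4 correct
    "hap3": ["NAT", "AFR", "AFR", "EUR"],  # 2 of 4 correct
}
print(median_accuracy(truth, inferred))  # 0.75
```

Reporting the median rather than the mean keeps a few badly inferred haplotypes from dominating the summary.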

2020
Author(s): Ryan Schubert, Angela Andaleon, Heather E. Wheeler

Background: Local ancestry estimation infers the regional ancestral origin of chromosomal segments in admixed populations using reference populations and a variety of statistical models. Integrating local ancestry into complex trait genetics has the potential to increase detection of genetic associations and improve genetic prediction models in understudied admixed populations, including African Americans and Hispanics. Five methods for local ancestry estimation are LAMP-LD (2012), RFMix (2013), ELAI (2014), Loter (2018), and MOSAIC (2019), but direct comparisons of the accuracy, runtime, and memory usage of all these software tools across common patterns of human admixture have not previously been reported. Results: We found that in cases of two-way admixture, RFMix and ELAI had the highest median accuracy, depending on population structure, while in cases of three-way admixture, RFMix, MOSAIC, and LAMP-LD had the highest median accuracy. Additionally, we estimate the asymptotic complexity (big-O) of memory usage and runtime for each software and find that most tools scale linearly with sample size in both time and memory. The only exception is RFMix, whose runtime scales quadratically and whose memory scales linearly with sample size. Conclusions: Effective local ancestry estimation tools are necessary to combat population disparities in human genetics studies. RFMix performs best across methods; however, depending on the application, other methods perform similarly well with the benefit of shorter runtimes. Scripts used to format data, run software, and estimate accuracy can be found at https://github.com/WheelerLab/LAI_benchmarking.
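
A scaling estimate of this kind can be sketched from benchmark measurements, assuming runtime or memory follows a power law c·n^k in sample size n: the slope of a least-squares line through log(measurement) versus log(n) estimates k, so a slope near 1 suggests linear and near 2 quadratic growth. This is an illustrative approach, not necessarily the procedure the authors used, and the timings below are made up.

```python
import math


def scaling_exponent(sample_sizes, measurements):
    """Least-squares slope of log(measurement) versus log(sample size).

    Assumes measurement ~ c * n**k; the returned slope estimates k.
    """
    if len(sample_sizes) != len(measurements) or len(sample_sizes) < 2:
        raise ValueError("need at least two paired observations")
    xs = [math.log(n) for n in sample_sizes]
    ys = [math.log(m) for m in measurements]
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("sample sizes must not all be identical")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sxy / sxx


# Hypothetical timings (seconds), not results from the study.
sizes = [100, 200, 400, 800]
linear_runtime = [3.0 * n for n in sizes]
quadratic_runtime = [0.01 * n ** 2 for n in sizes]
print(round(scaling_exponent(sizes, linear_runtime), 3))     # 1.0
print(round(scaling_exponent(sizes, quadratic_runtime), 3))  # 2.0
```

Real benchmark timings are noisy and include fixed start-up costs, so the fitted exponent is only an approximate guide at small sample sizes.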


2020
Author(s): Arvind Kumar, Daniel Mas Montserrat, Carlos Bustamante, Alexander Ioannidis

Genomic medicine promises increased resolution for accurate diagnosis, for personalized treatment, and for identification of population-wide health burdens at rapidly decreasing cost (with a genotype now cheaper than an MRI and dropping). The benefits of this emerging form of affordable, data-driven medicine will accrue predominantly to those populations whose genetic associations have been mapped, so it is of increasing concern that over 80% of such genome-wide association studies (GWAS) have been conducted solely within individuals of European ancestry [1]. The severe under-representation of most of the world's populations in genetic association studies stems in part from an addressable algorithmic weakness: the lack of simple, accurate, and easily trained methods for identifying and annotating ancestry along the genome (local ancestry). Here we present such a method, XGMix, based on gradient boosted trees, which is accurate, simple to use, and fast to train, taking minutes on consumer-level laptops.
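
XGMix itself uses gradient boosted trees. The sketch below swaps in a much simpler per-window maximum-likelihood classifier based on reference allele frequencies, only to illustrate the general idea of splitting a haplotype into windows and labelling each window with an ancestry. It assumes SNPs within a window are independent, and the population labels, frequencies, and window size are hypothetical.

```python
import math


def window_log_likelihood(alleles, freqs, eps=1e-3):
    """Log-likelihood of 0/1 alleles under per-SNP alternate-allele frequencies."""
    total = 0.0
    for a, p in zip(alleles, freqs):
        p = min(max(p, eps), 1 - eps)  # clamp to avoid log(0)
        total += math.log(p if a == 1 else 1 - p)
    return total


def assign_windows(haplotype, reference_freqs, window_size):
    """Label each window with the reference population of highest likelihood."""
    calls = []
    for start in range(0, len(haplotype), window_size):
        window = haplotype[start:start + window_size]
        best = max(
            reference_freqs,
            key=lambda pop: window_log_likelihood(
                window, reference_freqs[pop][start:start + window_size]
            ),
        )
        calls.append(best)
    return calls


# Hypothetical reference allele frequencies at six SNPs.
ref = {
    "AFR": [0.9, 0.9, 0.9, 0.1, 0.1, 0.1],
    "EUR": [0.1, 0.1, 0.1, 0.9, 0.9, 0.9],
}
haplotype = [1, 1, 1, 1, 1, 1]
print(assign_windows(haplotype, ref, window_size=3))  # ['AFR', 'EUR']
```

A learned model such as gradient boosted trees can capture linkage between SNPs within a window, which this independence-based likelihood ignores.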


2019
Author(s): Molly Schumer, Daniel L. Powell, Russ Corbett-Detig

It is now clear that hybridization between species is much more common than previously recognized. As a result, we now know that the genomes of many modern species, including our own, are a patchwork of regions derived from past hybridization events. Increasingly, researchers are interested in disentangling which regions of the genome originated from each parental species using local ancestry inference methods. Because admixture has diverse effects, this interest is shared across disparate fields, from human genetics to ecology and evolutionary biology. However, local ancestry inference methods are sensitive to a range of biological and technical parameters that can affect accuracy. Here we present paired simulation and ancestry inference pipelines, mixnmatch and ancestryinfer, to help researchers plan and execute local ancestry inference studies. mixnmatch can simulate arbitrarily complex demographic histories in the parental and hybrid populations, selection on hybrids, and technical variables such as coverage and contamination. ancestryinfer takes as input sequencing reads from simulated or real individuals and implements an efficient local ancestry inference pipeline. We perform a series of simulations with mixnmatch to pinpoint factors that influence the accuracy of local ancestry inference and highlight useful features of the two pipelines. Together, mixnmatch and ancestryinfer are powerful tools for predicting the performance of local ancestry inference methods on real data.
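
The abstract does not describe mixnmatch's simulation engine. As a rough illustration of what simulating a hybrid genome as a patchwork of ancestry tracts involves, the sketch below uses a textbook approximation of a single admixture pulse: breakpoints along one haplotype follow a Poisson process with rate equal to the number of generations since admixture (per Morgan), and each segment's ancestry is drawn independently from the admixture proportions. It ignores drift, finite population size, and selection, and the function name and parameters are hypothetical, not mixnmatch's interface.

```python
import random


def simulate_tracts(length_morgans, generations, proportions, rng):
    """Simulate (start, end, ancestry) tracts on one haplotype after a single pulse.

    Breakpoints arrive at rate `generations` per Morgan; each segment draws its
    ancestry independently from `proportions`, and adjacent segments with the
    same ancestry are merged into one tract.
    """
    pops = list(proportions)
    weights = [proportions[p] for p in pops]
    tracts = []
    start = 0.0
    while start < length_morgans:
        end = min(start + rng.expovariate(generations), length_morgans)
        ancestry = rng.choices(pops, weights=weights)[0]
        if tracts and tracts[-1][2] == ancestry:
            tracts[-1] = (tracts[-1][0], end, ancestry)  # extend previous tract
        else:
            tracts.append((start, end, ancestry))
        start = end
    return tracts


rng = random.Random(42)
for tract in simulate_tracts(1.5, 10, {"AFR": 0.8, "EUR": 0.2}, rng):
    print(tract)
```

Fixing the random seed makes a simulated truth set reproducible, which is useful when the same tracts are later compared against inferred calls.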


Pharmacology, 2021, pp. 1-9
Author(s): Vanessa Gonzalez-Covarrubias, Héctor Sánchez-Ibarra, Karla Lozano-Gonzalez, Sergio Villicaña, Tomas Texis, ...

Introduction: Genetic variants associated with markers of glucose control, such as glycated hemoglobin (HbA1c), could aid in predicting antidiabetic drug response. However, pharmacogenetic implementation for antidiabetics is still under development, as the list of actionable markers is still being populated and validated. This study explores potential associations between genetic variants and plasma levels of HbA1c in 100 patients under treatment with metformin. Methods: HbA1c was measured on a clinical chemistry analyzer (Roche), genotyping was performed on the Illumina Global Screening Array (GSA), and data were analyzed using PLINK. Association and prediction models were developed in R using 10-fold cross-validation. Results: We identified genetic variants in SLC47A1, SLC28A1, ABCG2, TBC1D4, and ARID5B that can explain up to 55% of the interindividual variability in HbA1c plasma levels in diabetic patients under treatment. Variants in SLC47A1, SLC28A1, and ABCG2 likely affect the pharmacokinetics (PK) of metformin, while the roles of the latter two genes may relate to insulin resistance and the regulation of adipogenesis. Conclusions: Our results confirm previous genetic associations and identify gene variants not previously associated with metformin PK and glucose control.
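
The study built its models in R; the abstract does not say whether folds were stratified or repeated. Purely to illustrate how a 10-fold split partitions 100 patients, here is a minimal Python sketch of unstratified k-fold index assignment (the function name and seed are hypothetical).

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train, test) index lists for unstratified k-fold cross-validation."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and the number of samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # fold sizes differ by at most one
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


for train, test in k_fold_indices(100, k=10):
    # Fit the association/prediction model on `train`, then evaluate on `test`.
    print(len(train), len(test))  # 90 10 for each of the 10 folds
```

Every patient appears in exactly one test fold, so the pooled held-out predictions cover the whole cohort once.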


2019, Vol. 10 (2), pp. 569-579
Author(s): Aurélien Cottin, Benjamin Penaud, Jean-Christophe Glaszmann, Nabila Yahiaoui, Mathieu Gautier

Hybridizations between species and subspecies represented major steps in the history of many crop species. Such events generally lead to genomes with mosaic patterns of chromosomal segments of various origins that may be assessed by local ancestry inference methods. However, these methods have mainly been developed in the context of human population genetics, with implicit assumptions that may not always fit plant models. The purpose of this study was to evaluate the suitability of three state-of-the-art inference methods (SABER, ELAI, and WINPOP) for local ancestry inference under scenarios that can be encountered in plant species. For this, we developed an R package to simulate genotyping data under such scenarios. The tested inference methods performed similarly well as long as representatives of the source populations were available. As expected, the higher the level of differentiation between ancestral source populations and the lower the number of generations since admixture, the more accurate the results. Interestingly, the accuracy of the methods was only marginally affected by i) the number of ancestries (up to six tested); ii) the sample design (i.e., unbalanced representation of source populations); and iii) the reproduction mode (e.g., selfing, vegetative propagation). If a source population was not represented in the data set, no bias was observed in inference accuracy for regions originating from represented sources, and regions from the missing source were assigned differently depending on the method. Overall, the selected ancestry inference methods may be used for crop plant analysis if all ancestral sources are known.
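
The observation that accuracy is higher when fewer generations have passed since admixture is consistent with how ancestry tracts shorten over time. Under a standard idealized model (a single two-way admixture pulse T generations ago, random mating in a very large population, no selection; assumptions that may not hold for selfing or vegetatively propagated crops), the switch density along a haplotype and the mean tract lengths are approximately:

```latex
% Two-way pulse admixture: proportion m from source A and 1 - m from source B,
% T generations ago; lengths in Morgans (infinite-population approximation).
\[
  \rho \approx 2\,m\,(1 - m)\,T \ \text{switches per Morgan}, \qquad
  \mathbb{E}[\ell_A] \approx \frac{1}{(1 - m)\,T}, \qquad
  \mathbb{E}[\ell_B] \approx \frac{1}{m\,T}.
\]
```

For example, with m = 0.5 and T = 10, ρ ≈ 5 switches per Morgan and tracts of either ancestry average about 0.2 Morgans (20 cM), so methods must resolve ever shorter segments as T grows.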


2019, Vol. 21 (5), pp. 1837-1845
Author(s): Ephifania Geza, Nicola J Mulder, Emile R Chimusa, Gaston K Mazandu

Several thousand genomes have been completed, with millions of variants identified in human DNA sequences. These genomic variations, especially those introduced by admixture, contribute substantially to remarkable phenotypic variability with medical and/or evolutionary implications. Local ancestry estimates are necessary for a better understanding of genomic variation patterns throughout modern human evolution and adaptive processes, and of their consequences for human heredity and health. However, existing local ancestry deconvolution tools are available as individual scripts, each requiring input and producing output in its own complex format. This limits users' ability to obtain local ancestry estimates. We introduce FRANC, a unified framework for multi-way local ancestry inference that integrates eight existing state-of-the-art local ancestry deconvolution tools. FRANC is an adaptable, expandable, and portable tool that prepares tool-specific inputs, deconvolutes ancestry, and standardizes tool-specific results. To facilitate both medical and population genetics studies, FRANC takes convenient, easy-to-manipulate input files and allows users to choose output formats that ease their use in further local ancestry deconvolution applications.
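
FRANC's actual input and output formats are not described in this abstract. As a hypothetical illustration of the kind of standardization step it describes, the sketch below collapses per-site ancestry calls, as many tools emit them, into a common tract table. The function names, column layout, and example values are assumptions, and each tract simply ends at its last called site.

```python
def sites_to_tracts(positions, labels):
    """Collapse per-site ancestry calls into (start, end, ancestry) tracts."""
    if len(positions) != len(labels):
        raise ValueError("positions and labels must have the same length")
    tracts = []
    for pos, label in zip(positions, labels):
        if tracts and tracts[-1][2] == label:
            tracts[-1][1] = pos  # extend the current tract to this site
        else:
            tracts.append([pos, pos, label])  # start a new tract
    return [tuple(t) for t in tracts]


def to_tsv(chrom, tracts):
    """Render tracts in a simple, hypothetical tab-separated layout."""
    return "\n".join(f"{chrom}\t{s}\t{e}\t{a}" for s, e, a in tracts)


positions = [1000, 2500, 4000, 5200, 7100]
calls = ["AFR", "AFR", "EUR", "EUR", "AFR"]
print(to_tsv("chr22", sites_to_tracts(positions, calls)))
```

Once every tool's output is reduced to the same tract table, results from different methods can be compared or merged with one downstream script.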


Genetics, 2016, Vol. 202 (2), pp. 377-379
Author(s): Peter M. Visscher
