Abstract
Background: Local ancestry estimation infers the regional ancestral origin of chromosomal segments in admixed populations using reference populations and a variety of statistical models. Integrating local ancestry into complex trait genetics has the potential to increase detection of genetic associations and improve genetic prediction models in understudied admixed populations, including African Americans and Hispanics. Five methods for local ancestry estimation are LAMP-LD (2012), RFMix (2013), ELAI (2014), Loter (2018), and MOSAIC (2019), but direct comparisons of the accuracy, runtime, and memory usage of all five tools have not previously been reported across common patterns of human admixture.

Results: In cases of two-way admixture, RFMix and ELAI had the highest median accuracy, depending on population structure, while in cases of three-way admixture, RFMix, MOSAIC, and LAMP-LD had the highest median accuracy. Additionally, we estimated how the runtime and memory usage of each tool scale with sample size (their big-O complexity) and found that, for most tools, both grow linearly. The only exception is RFMix, whose runtime grows quadratically while its memory usage grows linearly.

Conclusions: Effective local ancestry estimation tools are necessary to combat population disparities in human genetics studies. RFMix performs best overall; however, depending on the application, other methods perform similarly well with the benefit of shorter runtimes. Scripts used to format data, run software, and estimate accuracy can be found at https://github.com/WheelerLab/LAI_benchmarking.
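
As an illustration of how scaling with sample size can be estimated empirically, the sketch below fits a least-squares line to log(cost) against log(sample size); a slope near 1 suggests linear growth and a slope near 2 suggests quadratic growth. This is a generic approach under stated assumptions, not necessarily the procedure used in this study. The function name `scaling_exponent` and all measurements are hypothetical, and the numbers are synthetic rather than benchmark results.

```python
import math


def scaling_exponent(sizes, costs):
    """Return the least-squares slope of log(cost) against log(size).

    If cost is roughly c * n**k, the slope estimates k.
    """
    xs = [math.log(n) for n in sizes]
    ys = [math.log(c) for c in costs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Synthetic, noise-free measurements for illustration only
# (assumed values, NOT taken from the study).
sizes = [100, 200, 400, 800, 1600]                 # hypothetical sample sizes
linear_cost = [0.5 * n for n in sizes]             # cost growing like n
quadratic_cost = [0.002 * n ** 2 for n in sizes]   # cost growing like n**2

linear_slope = scaling_exponent(sizes, linear_cost)
quadratic_slope = scaling_exponent(sizes, quadratic_cost)

print(f"linear example slope:    {linear_slope:.3f}")
print(f"quadratic example slope: {quadratic_slope:.3f}")
```

With real timings the fitted slope will be noisy, so it is worth measuring several sample sizes spanning at least an order of magnitude before drawing conclusions about growth rate.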