Modelling complex population structure using F-statistics and Principal Component Analysis
Human genetic diversity is shaped by our complex history. Population genetic tools to understand this variation can broadly be classified into data-driven methods such as Principal Component Analysis (PCA), and model-based approaches such as F-statistics. Here, I show that these two perspectives are closely related, and I derive explicit connections between the two approaches. I show that F-statistics have a simple geometrical interpretation in the context of PCA, and that orthogonal projections are the key concept to establish this link. I illustrate my results on two examples, one of local and one of global human diversity. In both examples, I find that population structure is sparse, and only a few components contribute to most statistics. Based on these results, I develop novel visualizations that allow for investigating specific hypotheses and for checking the assumptions of more sophisticated models. My results extend F-statistics to non-discrete populations, moving towards more complete and less biased descriptions of human genetic variation.
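The geometric link between F-statistics and PCA can be illustrated with a minimal numerical sketch. It assumes that population allele frequencies are known without sampling error (so no finite-sample bias correction is applied), that F2 and F4 are averaged over SNPs, and that PCA is computed by SVD of the mean-centered frequency matrix with all components retained. The simulated frequencies and the helper names `f2` and `f4` are hypothetical and only serve the illustration; this is not the implementation used for the results summarized above.

```python
import numpy as np

def f2(x, y):
    """F2 as the mean squared allele-frequency difference between two populations."""
    return float(np.mean((x - y) ** 2))

def f4(a, b, c, d):
    """F4 as the mean product of two allele-frequency differences (an inner product)."""
    return float(np.mean((a - b) * (c - d)))

# Hypothetical data: 5 populations, 1000 SNPs, frequencies drawn at random.
rng = np.random.default_rng(0)
n_pops, n_snps = 5, 1000
freqs = rng.uniform(0.05, 0.95, size=(n_pops, n_snps))

# PCA via SVD of the mean-centered matrix; rows of `pcs` are PC coordinates.
centered = freqs - freqs.mean(axis=0)
U, S, Vt = np.linalg.svd(centered, full_matrices=False)
pcs = U * S

# F2 is a squared Euclidean distance, so it splits into per-PC contributions.
f2_direct = f2(freqs[0], freqs[1])
f2_per_pc = (pcs[0] - pcs[1]) ** 2 / n_snps
f2_from_pcs = float(f2_per_pc.sum())

# F4 is an inner product of two difference vectors, also preserved by the rotation.
f4_direct = f4(freqs[0], freqs[1], freqs[2], freqs[3])
f4_per_pc = (pcs[0] - pcs[1]) * (pcs[2] - pcs[3]) / n_snps
f4_from_pcs = float(f4_per_pc.sum())

print(np.isclose(f2_direct, f2_from_pcs), np.isclose(f4_direct, f4_from_pcs))
```

Because the PCA rotation preserves distances and inner products, each statistic equals the sum of its per-component terms (`f2_per_pc`, `f4_per_pc`). In this picture an F4 of zero corresponds to the two difference vectors being orthogonal, and inspecting the per-component terms shows how many components contribute to a given statistic.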