Trees, Population Structure, F-statistics!
Many questions about human genetic history can be addressed by examining the patterns of shared genetic variation between sets of populations. A useful methodological framework for this purpose is F-statistics, which measure shared genetic drift between sets of two, three and four populations and can be used to test simple and complex hypotheses about admixture between populations. Here, we put these statistics in the context of phylogenetic and population genetic theory. We show how measures of genetic drift can be interpreted as branch lengths, as paths through an admixture graph, or in terms of the internal branches in coalescent trees. We show that the admixture tests can be interpreted as testing general properties of phylogenies, allowing us to generalize applications to arbitrary phylogenetic trees. Furthermore, we derive novel expressions for the F-statistics, which enable us to explore the behavior of F-statistics under population structure models. In particular, we show that population substructure may complicate inference.
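To make the two-, three- and four-population statistics concrete, the following minimal sketch computes them from per-locus allele frequencies, assuming the standard definitions as expected products of allele-frequency differences. The function names and the toy frequencies are illustrative assumptions and are not taken from any particular software package. The sketch treats the frequencies as known population values, so it omits the finite-sample bias corrections that estimators applied to real data require.

```python
from statistics import fmean

# Each argument is a sequence of allele frequencies, one entry per locus,
# with loci in the same order across populations.

def f2(p1, p2):
    """F2(P1, P2): mean squared allele-frequency difference."""
    return fmean((a - b) ** 2 for a, b in zip(p1, p2))

def f3(px, p1, p2):
    """F3(PX; P1, P2): mean of (px - p1) * (px - p2)."""
    return fmean((x - a) * (x - b) for x, a, b in zip(px, p1, p2))

def f4(p1, p2, p3, p4):
    """F4(P1, P2; P3, P4): mean of (p1 - p2) * (p3 - p4)."""
    return fmean((a - b) * (c - d) for a, b, c, d in zip(p1, p2, p3, p4))

# Hypothetical allele frequencies at four loci in four populations.
pa = [0.1, 0.5, 0.9, 0.3]
pb = [0.2, 0.4, 0.7, 0.3]
pc = [0.6, 0.5, 0.2, 0.8]
pd = [0.5, 0.6, 0.1, 0.9]

print("F2(A, B)       =", f2(pa, pb))
print("F3(C; A, B)    =", f3(pc, pa, pb))
print("F4(A, B; C, D) =", f4(pa, pb, pc, pd))
```

Under these definitions, F3 and F4 can be written as linear combinations of F2 values; for example, F3(C; A, B) equals (F2(C, A) + F2(C, B) - F2(A, B)) / 2, which can be checked numerically on the toy data.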