Towards an Information-Theoretic Approach to Population Structure
This paper uses an information-theoretic perspective to propose multi-locus informativeness measures for ancestry inference. These measures describe the potential for correctly classifying unknown individuals to their source populations, given genetic data on population structure. Motivated by Shannon's axiomatic approach to deriving a unique information measure for communication (Shannon 1948), we first identify a set of intuitively justifiable criteria that any such quantitative information measure should satisfy, and then select measures that comply with these criteria. We show that standard information-theoretic measures such as multidimensional mutual information cannot completely account for informativeness when source populations differ in size, which motivates a decision-theoretic approach.
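The gap between mutual information and classification potential under unequal population sizes can be illustrated with a toy calculation. The sketch below is not the paper's own construction: it assumes a single biallelic locus, two source populations, and made-up allele frequencies and population proportions, and all function names are hypothetical. It compares the mutual information between population label and observed allele with the probability of correct assignment under the Bayes-optimal (maximum posterior) rule.

```python
import math

def joint_distribution(priors, allele_freqs):
    """Joint probability P(population k, allele x) = prior_k * freq_k[x]."""
    return [[pi * f for f in freqs] for pi, freqs in zip(priors, allele_freqs)]

def mutual_information_bits(priors, allele_freqs):
    """Mutual information (in bits) between population label and allele."""
    joint = joint_distribution(priors, allele_freqs)
    n_alleles = len(allele_freqs[0])
    # Marginal allele distribution in the pooled population.
    marginal = [sum(row[x] for row in joint) for x in range(n_alleles)]
    mi = 0.0
    for k, row in enumerate(joint):
        for x, p in enumerate(row):
            if p > 0:
                mi += p * math.log2(p / (priors[k] * marginal[x]))
    return mi

def bayes_accuracy(priors, allele_freqs):
    """Probability of correct assignment when each observed allele is
    assigned to the population with the highest posterior probability."""
    joint = joint_distribution(priors, allele_freqs)
    n_alleles = len(allele_freqs[0])
    return sum(max(row[x] for row in joint) for x in range(n_alleles))

# Illustrative (assumed) allele frequencies for two populations.
freqs = [[0.6, 0.4], [0.4, 0.6]]
equal_priors = [0.5, 0.5]    # equally sized source populations
unequal_priors = [0.9, 0.1]  # one population much larger than the other

for label, priors in [("equal", equal_priors), ("unequal", unequal_priors)]:
    mi = mutual_information_bits(priors, freqs)
    acc = bayes_accuracy(priors, freqs)
    baseline = max(priors)  # accuracy of always guessing the larger population
    print(f"{label}: MI = {mi:.4f} bits, "
          f"Bayes accuracy = {acc:.3f}, baseline = {baseline:.3f}")
```

With these assumed numbers, the equal-size case improves on the baseline (accuracy 0.6 against 0.5). In the unequal case the optimal rule assigns every individual to the larger population regardless of the observed allele, so accuracy equals the baseline of 0.9 even though the mutual information is still positive.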