Inference of past demography, dormancy and self-fertilization rates from whole genome sequence data
Abstract
Several methods based on the Sequential Markovian Coalescent (SMC) have been developed to infer population demographic history from full genome sequence data. Demographic history is of interest in its own right and is a key requirement for building a null model for selection tests. While these methods can in principle be applied to any species, they assume sexual reproduction at each generation and non-overlapping generations. However, in many plant, invertebrate, fungal and other species, these assumptions are often violated by ecological and life-history traits such as self-fertilization or long-term dormant structures (seed or egg-banking). We develop a novel SMC-based method to infer 1) the rates of seed/egg-banking and of self-fertilization, and 2) the populations’ past demographic history. Using simulated data sets, we demonstrate the accuracy of our method for a wide range of demographic scenarios and for sequence lengths from one to 30 Mb using four sampled genomes. We then apply our method to a Swedish and a German population of Arabidopsis thaliana and infer a selfing rate of ca. 0.8 and the absence of any detectable seed-bank. In contrast, we show that the water flea Daphnia pulex exhibits a long-lived egg-bank of three to 18 generations. In conclusion, we present a novel method to infer accurate demographies and life-history traits for species with selfing and/or seed/egg-banks. Finally, we provide recommendations on the use of SMC-based methods for non-model organisms, highlighting the importance of the per-site and the effective ratios of recombination over mutation.