Abstract

Population stratification is a strong confounding factor in human genetic association studies. In analyses of rare variants, the main correction strategies, based on principal components (PC) and linear mixed models (LMM), may yield conflicting conclusions, due to both the specific type of structure induced by rare variants and the particular statistical features of association tests. Studies evaluating these approaches have generally focused on specific situations with limited types of simulated structure and large sample sizes. We investigated the properties of several correction methods in a large simulation study using real exome data and several within- and between-continent stratification scenarios. We also considered different sample sizes, including situations with as few as 50 cases, to account for the analysis of rare disorders. In this context, we focused on a genetic model in which the phenotype is driven by rare deleterious variants, a model well suited to a burden test. For analyses of large samples, we found that accounting for stratification was more difficult with a continental structure than with a worldwide structure. LMM failed to maintain a correct type I error in many scenarios, whereas PCs based on common variants failed only in the presence of extreme continental stratification. When a sample of 50 cases was considered, an inflation of type I errors was observed with PC for small numbers of controls (≤100), and with LMM for large numbers of controls (≥1000). We also tested a promising novel adapted local permutation method (LocPerm), which maintained a correct type I error in all situations. All approaches capable of properly correcting for stratification had similar power to detect true associations, indicating that the key issue is the proper control of type I errors. Finally, we found that adding a large panel of external controls (e.g., extracted from publicly available databases) was an efficient way to increase the power of analyses including small numbers of cases, provided an appropriate stratification correction was used.

Author Summary

Genetic association studies focusing on rare variants using next generation sequencing (NGS) data have become a common strategy to overcome the shortcomings of classical genome-wide association studies for the analysis of rare and common diseases. Population stratification, however, remains a substantial issue that has not been fully resolved in the analysis of NGS data. In this work, we propose a comprehensive evaluation of the main strategies to account for stratification, namely principal components and linear mixed models, along with a novel approach based on local permutations (LocPerm). We compared these correction methods in many different settings, considering several types of population structure, sample sizes, and types of variants. Our results highlighted important limitations of some classical methods, such as those using principal components (in particular in small samples) and linear mixed models (in several situations). In contrast, LocPerm maintained a correct type I error in all situations. We also showed that adding a large panel of external controls, e.g., from publicly available databases, is an efficient strategy to increase the power of an analysis including a low number of cases, as long as an appropriate stratification correction is used. Our findings provide helpful guidelines for many researchers working on rare variant association studies.
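
To make the idea of a permutation-based correction more concrete, the following Python sketch illustrates a burden-type test whose null distribution is obtained by shuffling case/control labels only among genetically similar individuals. It is not the LocPerm algorithm itself, whose details are not given in this summary. It is a simplified illustration under stated assumptions: neighbourhoods are approximated by fixed-size blocks of individuals ordered along the first principal component, the test statistic is the total count of rare alleles carried by cases, and all function names and parameters (`make_blocks`, `permute_within_blocks`, `burden_statistic`, `local_permutation_pvalue`, `block_size`, `n_perm`) are hypothetical.

```python
import numpy as np


def make_blocks(pcs, block_size):
    """Group individuals into blocks of neighbours along the first PC.

    Assumption: ordering on PC1 is a crude stand-in for a genetic
    neighbourhood; a real method would likely use several PCs.
    """
    order = np.argsort(pcs[:, 0], kind="stable")
    return [order[i:i + block_size] for i in range(0, len(order), block_size)]


def permute_within_blocks(y, blocks, rng):
    """Shuffle case/control labels inside each block only."""
    y_perm = y.copy()
    for idx in blocks:
        y_perm[idx] = rng.permutation(y[idx])
    return y_perm


def burden_statistic(burden, y):
    """Total number of rare alleles carried by cases (y == 1)."""
    return burden[y == 1].sum()


def local_permutation_pvalue(burden, y, pcs, block_size=10, n_perm=999, seed=0):
    """One-sided permutation p-value for an excess burden in cases."""
    rng = np.random.default_rng(seed)
    blocks = make_blocks(pcs, block_size)
    observed = burden_statistic(burden, y)
    null = np.array([
        burden_statistic(burden, permute_within_blocks(y, blocks, rng))
        for _ in range(n_perm)
    ])
    # Add-one correction so the estimated p-value is never exactly zero.
    return (1 + np.sum(null >= observed)) / (n_perm + 1)


if __name__ == "__main__":
    # Toy data: 50 cases and 200 controls, two PCs, sparse rare-allele counts.
    rng = np.random.default_rng(42)
    pcs = rng.normal(size=(250, 2))
    y = np.array([1] * 50 + [0] * 200)
    burden = rng.poisson(0.2, size=250)
    print(local_permutation_pvalue(burden, y, pcs, block_size=25))
```

Because labels are exchanged only within a block, each block keeps its own number of cases, so a difference in rare-variant load between ancestry groups does not by itself produce a small p-value. The block size trades off how local the permutations are against how many distinct permutations are available, which matters most when the number of cases is small.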