Abstract
Background: Taking a representative sample to determine the prevalence of variables such as disease is difficult when little is known about the target population. Several methods have been proposed, including a recent revision of the World Health Organization’s Expanded Programme on Immunization (EPI) surveys. The original EPI method uses probability proportional to size to sample towns and a nearest neighbour approach to sample households within towns. The new version samples from relatively small areas and conducts a probability sample of households within those areas. Other techniques sample within towns from circles around randomly identified points (‘Circle’) or from randomly sampled squares in a superimposed grid (‘Square’). We compared these sampling methods in multiple virtual populations using computer simulation.

Methods: We constructed 50 virtual populations with varying characteristics. Each population comprised about a million people across 300 towns. We created three more populations with different prevalences of disease but with uniform characteristics across each population. We created a binary exposure variable and allocated disease statuses to individuals assuming different Relative Risks (RRs) associated with exposure. We simulated thirteen methods of sampling: simple random sampling; the original EPI method and variants; the Square and Circle methods; and the new EPI method. For each population, each sampling method, and each of three sample sizes per cluster (7, 15, and 30), we simulated 1,000 samples. For most sampling methods, the clusters were towns. We conducted simulations both using the same 30 clusters and using a freshly chosen set of clusters. For each simulation we estimated prevalence and RRs and computed the Root Mean Squared Error (RMSE) across the 1,000 samples.

Results: The Circle and Square methods produced almost identical results, so we report only the Square method results. Measured relative to simple random sampling, the RMSE for the Square method was almost universally the best for estimating prevalence, and generally the best for estimating RRs. The new EPI method performed less well than the Square method, but generally better than the original EPI method.

Conclusions: We recommend the Square method as statistically optimal, unless practical considerations favour another approach.
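
To make the evaluation criterion concrete, the following is a minimal sketch of how a root mean squared error for a prevalence estimate might be computed over repeated cluster samples. It is not the simulation code used in the study. The function names, the toy population (300 towns of 50 to 500 people, far smaller than the populations described above), and the within-town step are all assumptions made for illustration: towns are drawn with probability proportional to size, with replacement, and individuals are then drawn by simple random sampling within each town, which stands in for the EPI, Square, and Circle household selection rules.

```python
import math
import random


def make_population(n_towns=300, prevalence=0.2, seed=0):
    """Hypothetical toy population: each town is a list of 0/1 disease statuses."""
    rng = random.Random(seed)
    towns = []
    for _ in range(n_towns):
        size = rng.randint(50, 500)  # assumed town sizes, for illustration only
        towns.append([1 if rng.random() < prevalence else 0 for _ in range(size)])
    return towns


def pps_sample_towns(towns, n_clusters, rng):
    """Draw town indices with probability proportional to size, with replacement."""
    sizes = [len(town) for town in towns]
    return rng.choices(range(len(towns)), weights=sizes, k=n_clusters)


def estimate_prevalence(towns, cluster_ids, per_cluster, rng):
    """Pool a simple random sample of individuals from each selected town.

    Assumption: this within-town step replaces the household selection rules
    compared in the study; it is only a placeholder.
    """
    sampled = []
    for i in cluster_ids:
        town = towns[i]
        sampled.extend(rng.sample(town, min(per_cluster, len(town))))
    return sum(sampled) / len(sampled)


def rmse(estimates, truth):
    """Root mean squared error of a list of estimates around the true value."""
    return math.sqrt(sum((e - truth) ** 2 for e in estimates) / len(estimates))


def simulate(towns, n_clusters=30, per_cluster=7, n_sims=1000, seed=1):
    """Return (true prevalence, RMSE) over n_sims samples with freshly chosen clusters."""
    rng = random.Random(seed)
    everyone = [status for town in towns for status in town]
    truth = sum(everyone) / len(everyone)
    estimates = []
    for _ in range(n_sims):
        clusters = pps_sample_towns(towns, n_clusters, rng)
        estimates.append(estimate_prevalence(towns, clusters, per_cluster, rng))
    return truth, rmse(estimates, truth)


if __name__ == "__main__":
    towns = make_population()
    for per_cluster in (7, 15, 30):
        truth, error = simulate(towns, per_cluster=per_cluster)
        print(f"per cluster {per_cluster}: true prevalence {truth:.3f}, RMSE {error:.4f}")
```

In this sketch a fresh set of 30 clusters is drawn for every simulated sample; holding the clusters fixed would amount to calling `pps_sample_towns` once, outside the loop. Dividing each method's error by the corresponding value for simple random sampling would give a relative measure of the kind reported in the Results.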