Sampling Theory
Latest Publications


TOTAL DOCUMENTS

13
(FIVE YEARS 13)

H-INDEX

0
(FIVE YEARS 0)

Published By Oxford University Press

9780198815792, 9780191853463

2019 ◽  
pp. 92-103
Author(s):  
David G. Hankin ◽  
Michael S. Mohr ◽  
Ken B. Newman

Similar to strata, population units may instead be grouped into clusters. Usually, units within clusters are geographically or genetically close to one another—all households on a city block, individuals within a single family. In (single-stage) equal size cluster sampling, the total population consists of N clusters, with equal numbers of population units within each cluster. A sample of n clusters is selected by SRS, y values of all population units within clusters are measured, and an unbiased estimator of the population mean is the simple average of cluster means in the sample. An ANOVA sum of squares partition can be used to show that this strategy will outperform SRS with mean-per-unit estimation whenever the mean square between clusters is less than the finite population variance. This means that it is desirable for clusters to have similar means but a great deal of variability within clusters, a contrast with the desirable characteristics of strata (little variability within strata, substantial difference between stratum means). Many methods of collection of samples in fisheries (seines, nets) and wildlife (mist nets, live traps) involve collection of individuals as clusters. Unfortunately, clusters (e.g., human families) often consist of closely related or similar individuals. Because within cluster variation is often relatively low, it is often advantageous and cost-effective to instead adopt two-stage cluster sampling (considered in Chapter 9) for which only a sample of units within each selected cluster are examined, thereby allowing more clusters to be examined for the same total survey cost.
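The unbiasedness claim and the comparison with SRS can be checked by enumerating a small sample space. The sketch below is illustrative only: the four clusters and their y values are made up, and the finite population variance is assumed to use the N − 1 divisor.

```python
from itertools import combinations
from statistics import mean, variance

# Hypothetical population: N = 4 clusters of M = 3 units each (made-up y values).
# Cluster means are similar (5, 6, 6, 5) but values vary widely within clusters.
clusters = [[1, 5, 9], [2, 6, 10], [3, 4, 11], [0, 7, 8]]
n = 2  # number of clusters selected by SRS

units = [y for c in clusters for y in c]
pop_mean = mean(units)

def cluster_estimate(sample):
    """Simple average of the means of the sampled clusters."""
    return mean(mean(c) for c in sample)

# Every SRS of n clusters is equally likely, so average over all of them.
estimates = [cluster_estimate(s) for s in combinations(clusters, n)]
expected_value = mean(estimates)
var_cluster = mean((e - expected_value) ** 2 for e in estimates)

# SRS of the same number of units (n * M) with mean-per-unit estimation.
n_units = n * len(clusters[0])
S2 = variance(units)  # finite population variance, N - 1 divisor (assumption)
var_srs = (1 - n_units / len(units)) * S2 / n_units

print(pop_mean, expected_value, var_cluster, var_srs)
```

In this made-up population the cluster design has the smaller sampling variance, which matches the condition that clusters have similar means and high within-cluster variability.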


2019 ◽  
pp. 23-47
Author(s):  
David G. Hankin ◽  
Michael S. Mohr ◽  
Ken B. Newman

This chapter presents a formal quantitative treatment of material covered conceptually in Chapter 2, all with respect to equal probability selection with replacement (SWR) and without replacement (simple random sampling, SRS) of samples of size n from a finite population of size N. Small sample space examples are used to illustrate unbiasedness of mean-per-unit estimators of the mean, total and proportion of the target variable, y, for SWR and SRS. Explicit formulas for sampling variance indicate how estimator uncertainty depends on finite population variance, sample size and sampling fraction. Measures of the relative performance of alternative sampling strategies (relative precision, relative efficiency, net relative efficiency) are introduced and applied to mean-per-unit estimators used for the SWR and SRS selection methods. Normality of the sampling distribution of the SRS mean-per-unit estimator depends on sample size but also on the shape of the distribution of the target variable, y, values over the finite population units. Normality of the sampling distribution is required to justify construction of valid 95% confidence intervals that may be constructed around sample estimates based on unbiased estimates of sampling variance. Methods to calculate sample size to achieve accuracy objectives are presented. Additional topics include Bernoulli sampling (a without replacement selection scheme for which sample size is a random variable), the Rao–Blackwell theorem (which allows improvement of estimators that are based on selection methods which may result in repeated selection of the same units), oversampling and nonresponse.
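A small sample space example of the kind described can be sketched as follows. The population values are hypothetical, and the variance formulas used are the standard ones for the mean-per-unit estimator: (1 − n/N)·S²/n under SRS and σ²/n under SWR, where S² uses the N − 1 divisor and σ² the N divisor.

```python
from fractions import Fraction
from itertools import combinations, product

y = [2, 4, 7, 11]  # hypothetical finite population, N = 4
N, n = len(y), 2

pop_mean = Fraction(sum(y), N)
S2 = sum((v - pop_mean) ** 2 for v in y) / (N - 1)  # finite population variance
sigma2 = S2 * (N - 1) / N

def expectation_and_variance(samples):
    """Exact mean and variance of the sample mean over equally likely samples."""
    means = [Fraction(sum(s), n) for s in samples]
    expected = sum(means) / len(means)
    var = sum((m - expected) ** 2 for m in means) / len(means)
    return expected, var

# SRS: all unordered samples without replacement are equally likely.
srs_expected, srs_var = expectation_and_variance(list(combinations(y, n)))
# SWR: all ordered samples with replacement are equally likely.
swr_expected, swr_var = expectation_and_variance(list(product(y, repeat=n)))

print(srs_expected, srs_var, (1 - Fraction(n, N)) * S2 / n)
print(swr_expected, swr_var, sigma2 / n)
```

Exact fractions are used so the enumerated variances can be compared with the formulas without rounding error.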


2019 ◽  
pp. 219-239
Author(s):  
David G. Hankin ◽  
Michael S. Mohr ◽  
Ken B. Newman

The abundance of rare species of plants and animals may often prove difficult to estimate due to the isolated patchy distribution of individuals. Adaptive sampling may prove more effective than other sampling strategies for such species. In adaptive cluster sampling an initial SRS of population units is selected. Further adaptive sampling in the neighborhood of these units is then carried out whenever the value of y in a selected unit meets or exceeds a criterion value, c, which may often be just a single individual. This sampling procedure can be shown to lead to selection of clusters of units for which, with the exception of edge units, all units in the selected clusters have y≥c. If the initial sample is large enough to encounter some isolated patches of individuals, this approach may outperform SRS with mean-per-unit estimation. Drawbacks of this approach include the facts that the eventual number of population units which will need to be measured is random and unknown prior to execution of the survey, and it is difficult to specify the magnitude of the adaptive sampling criterion, c. Therefore, the total cost and time needed to complete an adaptive sampling survey can be highly unpredictable. Nevertheless, the theory is intriguing and has obvious intuitive appeal. Once a very rare individual has been encountered, it makes good sense to search very carefully in the neighborhood of the location where that rare individual has been found.
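The neighborhood search can be sketched as a breadth-first expansion on a grid. This is a simplified illustration, not the book's procedure: the grid counts are made up, the neighborhood is assumed to be the four adjacent cells, and an initially selected unit that fails the criterion is treated here as a network of size one.

```python
from collections import deque

# Hypothetical 5 x 5 grid of counts with one isolated patch.
grid = [
    [0, 0, 0, 0, 0],
    [0, 3, 5, 0, 0],
    [0, 0, 2, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 1],
]
c = 1  # criterion value: sample neighbors whenever y >= c

def neighbors(r, col):
    """Yield the in-grid cells directly above, below, left and right."""
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        rr, cc = r + dr, col + dc
        if 0 <= rr < len(grid) and 0 <= cc < len(grid[0]):
            yield rr, cc

def adaptive_cluster(start):
    """Return (network, edge_units) reached from one initially selected unit."""
    if grid[start[0]][start[1]] < c:
        return {start}, set()
    network, edge = set(), set()
    queue, seen = deque([start]), {start}
    while queue:
        r, col = queue.popleft()
        if grid[r][col] >= c:
            network.add((r, col))
            for nb in neighbors(r, col):
                if nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        else:
            edge.add((r, col))  # measured, but triggers no further sampling
    return network, edge

network, edge = adaptive_cluster((1, 1))
print(len(network), len(edge))
```

The number of units measured (network plus edge units) depends on where the initial unit lands, which illustrates why total survey effort is not known in advance.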


2019 ◽  
pp. 200-218
Author(s):  
David G. Hankin ◽  
Michael S. Mohr ◽  
Ken B. Newman

Attention is restricted to two-phase or double sampling. A large first-phase sample is used to generate a very good estimate of the mean or total of an auxiliary variable, x, which is relatively cheap to measure. Then, a second-phase sample is selected, usually from the first-phase sample, and both auxiliary and target variables are measured in selected second-phase population units. Two-phase ratio or regression estimators can be used effectively in this context. Errors of estimation reflect first-phase uncertainty in the mean or total of the auxiliary variable, and second-phase errors reflect the nature of the relation and correlation between auxiliary and target variables. Accuracy of the two-phase estimator of a proportion depends on sensitivity and specificity. Sensitivity is the probability that a unit possessing a trait (y = 1) will be correctly classified as such whenever the auxiliary variable, x, has value 1, whereas specificity is the probability that a unit not possessing a trait (y = 0) will be correctly classified as such whenever the auxiliary variable, x, has value 0. Optimal allocation results for estimation of means, totals, and proportions allow the most cost-effective allocation of total sampling effort to the first- and second-phases. In double sampling with stratification, a large first-phase sample estimates stratum weights, a second-phase sample estimates stratum means, and a stratified estimator gives an estimate of the overall population mean or total.
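A two-phase ratio estimator of the mean can be sketched in a few lines. The function name and the data are hypothetical; the form assumed is the usual one, in which the second-phase ratio of y to x is multiplied by the first-phase mean of x.

```python
def two_phase_ratio_mean(x_phase1, xy_phase2):
    """Estimate the mean of y from a two-phase sample.

    x_phase1  -- x values measured on the large first-phase sample
    xy_phase2 -- (x, y) pairs measured on the second-phase sample
    """
    xbar_phase1 = sum(x_phase1) / len(x_phase1)
    ratio = sum(y for _, y in xy_phase2) / sum(x for x, _ in xy_phase2)
    return ratio * xbar_phase1

# Made-up data: cheap x on four units, expensive y on two of them.
x_first = [10, 20, 30, 40]
xy_second = [(10, 21), (30, 59)]
estimate = two_phase_ratio_mean(x_first, xy_second)
print(estimate)
```

The first-phase sample contributes only through the mean of x, so its size controls the first-phase part of the error, while the second-phase sample determines how well the ratio is estimated.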


2019 ◽  
pp. 140-172
Author(s):  
David G. Hankin ◽  
Michael S. Mohr ◽  
Ken B. Newman

Equal probability selection is a special case of the general theory of probability sampling in which population units may be selected with unequal probabilities. Unequal selection probabilities are often based on auxiliary variable values which are measures of the sizes of population units, thus leading to the acronym PPS (“Probability Proportional to Size”). The Horvitz–Thompson (1952) theorem provides a unifying framework for design-based sampling theory. A sampling design specifies the sample space (set of all possible samples) and associated first and second order inclusion probabilities (probabilities that unit i, or units i and j, respectively, are included in a sample of size n selected from N according to some selection method). A valid probability sampling scheme must have all first order inclusion probabilities > 0 (i.e., every population unit must have a chance of being in the sample). Unbiased variance estimation is possible only for those schemes that guarantee that all second order inclusion probabilities exceed zero, thus providing theoretical justification for the absence of unbiased estimators of sampling variance in systematic sampling and other schemes for which some second order inclusion probabilities are zero. Numerous generalized Horvitz–Thompson (HT) estimators can be formed and all are consistent estimators because they are functions of consistent HT estimators. Unequal probability systematic sampling and Poisson sampling (the unequal probability counterpart to Bernoulli sampling for which sample size is a random variable) are also considered. Several R programs for selecting unequal probability samples and for calculating first and second order inclusion probabilities are posted at http://global.oup.com/uk/companion/hankin.
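Inclusion probabilities and the HT estimator of a total can be illustrated on a tiny sample space. The sketch below is in Python rather than R and is unrelated to the programs posted on the companion site; the unit labels, y values and sample probabilities are all hypothetical.

```python
from fractions import Fraction as F
from itertools import combinations

y = {"a": 3, "b": 8, "c": 12}  # hypothetical population, N = 3

# Hypothetical design: every sample of size n = 2 with its selection probability.
design = {("a", "b"): F(1, 6), ("a", "c"): F(2, 6), ("b", "c"): F(3, 6)}

# First order inclusion probabilities: total probability of samples containing i.
pi = {i: sum(p for s, p in design.items() if i in s) for i in y}

# Second order inclusion probabilities: samples containing both i and j.
pi2 = {
    (i, j): sum(p for s, p in design.items() if i in s and j in s)
    for i, j in combinations(y, 2)
}

def ht_total(sample):
    """Horvitz-Thompson estimate of the total: sum of y_i / pi_i over the sample."""
    return sum(F(y[i]) / pi[i] for i in sample)

# Expected value of the estimator over the whole sample space.
expected = sum(p * ht_total(s) for s, p in design.items())
print(pi, pi2, expected, sum(y.values()))
```

Here all first and second order inclusion probabilities are positive, so the design meets both conditions stated above, and the expected value of the estimator equals the population total.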


2019 ◽  
pp. 173-199
Author(s):  
David G. Hankin ◽  
Michael S. Mohr ◽  
Ken B. Newman

In multi-stage sampling, there are two or more stages of sampling and the simplest version, which the chapter emphasizes, is called two-stage sampling. In two-stage sampling, an initial first-stage sample of n primary units (or clusters) is selected. Then, at the second stage of sampling, m_i subunits are selected from the M_i subunits in each selected primary unit. First- and second-stage units may be selected with equal or unequal probabilities and a wide variety of estimators may be used to estimate totals within selected primary units and to estimate the total of the target variable in the finite population. Illustrative sample spaces are provided for equal sized two-stage cluster sampling with SRS selection at both stages, and for two-stage unequal size cluster sampling, with clusters selected by PPSWOR and units within clusters selected by SRS. Sampling variance is shown to originate from two sources: variation between primary unit totals or means (first-stage variance), and errors of estimation of primary unit totals (second-stage variance). Topics of optimal allocation and net relative efficiency are addressed in the two-stage context with equal and unequal size clusters. General expressions for sampling variance are presented for three or more stages of sampling. The multi-stage framework can take powerful advantage of all of the concepts and sampling designs considered in previous chapters and the ecologist or natural resource scientist can apply everything he/she knows about an ecological or natural resource setting to guide development of an intelligent multi-stage sampling strategy.
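One of the many possible two-stage estimators can be sketched as follows. It assumes SRS at both stages and uses mean-per-unit expansion within each selected primary unit; the function name and the numbers are hypothetical, and other selection methods would call for different estimators.

```python
def two_stage_total(N, sampled):
    """Estimate the population total from a two-stage sample.

    N       -- number of primary units in the population
    sampled -- list of (M_i, ys) pairs, one per selected primary unit, where
               M_i is the number of subunits in the unit and ys holds the
               y values of the m_i subunits selected from it
    """
    n = len(sampled)
    # Second stage: expand each subsample mean to an estimated primary unit total.
    unit_totals = [M_i * sum(ys) / len(ys) for M_i, ys in sampled]
    # First stage: expand the sampled primary unit totals to the population.
    return N / n * sum(unit_totals)

# Made-up example: 2 of 10 primary units selected, then subsampled.
estimate = two_stage_total(10, [(4, [2, 4]), (6, [1, 3, 5])])
print(estimate)
```

The two expansion steps mirror the two variance sources named above: the outer step carries first-stage variance, and each inner step carries second-stage estimation error.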


2019 ◽  
pp. 268-294
Author(s):  
David G. Hankin ◽  
Michael S. Mohr ◽  
Ken B. Newman

Many ecological research and resource monitoring programs must deliver good estimates of both current resource status and long-term trend. The simple two-occasion context frames the trade-offs in design of surveys to achieve these objectives. If the objective is to estimate change in status (trend), then most precise estimation is achieved by full retention of a random sample selected at time 1. If the objective is to estimate average status, then most precise estimation is achieved by selecting independent random samples. If a survey has both objectives, then a compromise design, involving partial retention and partial replacement of the initial sample, is optimal (i.e., will have intermediate performance for status and trend). Sampling designs for long-term monitoring (and before/after assessment monitoring) have two distinct components: a membership design which specifies selection of groups of units to be designated as sample panels, and a revisit design that specifies when these panels of units should be visited (revisited). For example, some randomly selected panels might be visited in years one to three, then dropped out of rotation for three years and then revisited in years seven to nine, and so on. One panel might be revisited every year, and other panels might be visited only a single time. Design-based estimates of measures of status and trend are derived for some simple membership and revisit designs. The theory of dual frame sampling is applied to estimation of the number of active bald eagle nests on a wildlife refuge.
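The two-occasion trade-off can be made concrete with a deliberately simplified calculation. The sketch assumes equal variance s2 and equal sample size n on both occasions, a correlation rho between a unit's values at the two times, simple sample means as estimators, and no finite population correction; it is not the chapter's derivation, and the function name is hypothetical.

```python
def two_occasion_variances(s2, n, rho, retained):
    """Variances of estimated change and estimated average status.

    retained -- fraction of the time 1 sample that is revisited at time 2
    """
    var_mean = s2 / n
    # Only retained units contribute covariance between the two sample means.
    cov = retained * rho * s2 / n
    var_change = 2 * var_mean - 2 * cov           # Var(ybar2 - ybar1)
    var_average = (2 * var_mean + 2 * cov) / 4    # Var((ybar1 + ybar2) / 2)
    return var_change, var_average

full = two_occasion_variances(4.0, 10, 0.8, 1.0)         # full retention
half = two_occasion_variances(4.0, 10, 0.8, 0.5)         # partial replacement
independent = two_occasion_variances(4.0, 10, 0.8, 0.0)  # independent samples
print(full, half, independent)
```

With positive correlation, full retention gives the smallest variance for change, independent samples give the smallest variance for average status, and partial replacement falls between the two on both measures.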


2019 ◽  
pp. 240-268
Author(s):  
David G. Hankin ◽  
Michael S. Mohr ◽  
Ken B. Newman

In many ecological and natural resource settings, there may be a high degree of spatial structure or pattern to the distribution of target variable values across the landscape. For example, the number of trees per hectare killed by a bark beetle infestation may be exceptionally high in one region of a national forest and near zero elsewhere. In such circumstances it may be highly desirable or even required that a sample survey directed at estimation of total tree mortality across a forest be based on selection of random locations that have good spatial balance, i.e., locations are well spread over the landscape with relatively even distances between them. A simple random sample cannot guarantee good spatial balance. We present two methods that have been proposed for selection of spatially balanced samples: GRTS (Generalized Random Tessellation Stratified Sampling) and BAS (Balanced Acceptance Sampling). Selection of samples using the GRTS approach involves a complicated series of sequential steps that allows generation of spatially balanced samples selected from finite populations or from infinite study areas. Selection of samples using BAS relies on the Halton sequence, is conceptually simpler, and produces samples that generally have better spatial balance than those produced by GRTS. Both approaches rely on use of software that is available in the R statistical/programming environment. Estimation relies on the Horvitz–Thompson estimator. Illustrative examples of running the SPSURVEY software package (used for GRTS) and links to the SDraw package (used for BAS) are provided at http://global.oup.com/uk/companion/hankin.
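The Halton sequence underlying BAS is simple to compute. The sketch below generates two-dimensional Halton points (bases 2 and 3) and keeps those that fall inside a study area. It is a simplified illustration, not the SDraw implementation: the function names and the circular study area are hypothetical, and a real application would start the sequence at a random index.

```python
def halton(index, base):
    """Radical inverse of a positive integer index in the given base."""
    result, fraction = 0.0, 1.0
    while index > 0:
        fraction /= base
        result += fraction * (index % base)
        index //= base
    return result

def bas_like_sample(n, inside, start=1):
    """Accept successive 2-D Halton points until n of them lie in the study area."""
    points, k = [], start
    while len(points) < n:
        point = (halton(k, 2), halton(k, 3))
        if inside(point):
            points.append(point)
        k += 1
    return points

def in_disc(point):
    """Hypothetical study area: a disc of radius 0.5 centered in the unit square."""
    return (point[0] - 0.5) ** 2 + (point[1] - 0.5) ** 2 <= 0.25

sample = bas_like_sample(10, in_disc)
print(sample)
```

Successive Halton points fill the unit square evenly, which is the property that gives the accepted points their spatial balance.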


2019 ◽  
pp. 104-139
Author(s):  
David G. Hankin ◽  
Michael S. Mohr ◽  
Ken B. Newman

Values of an auxiliary variable, x, may often be available at little or no cost. If these variables are highly correlated with the target variable, y, then use of ratio or regression estimators may greatly reduce sampling variance. These estimators are not unbiased, but bias is generally small compared to the target of estimation and contributes a very small proportion of overall mean square error, the relevant measure of accuracy for biased estimators. Ratio estimation can also be incorporated in the context of stratified designs, again possibly offering a reduction in overall sampling variance. Model-based prediction offers an alternative to the design-based ratio and regression estimators and we present an overview of this approach. In model-based prediction, the y values associated with population units are viewed as realizations of random variables which are assumed to be related to auxiliary variables according to specified models. The realized values of the target variable are known for the sample, but must be predicted using an assumed model dependency on the auxiliary variable for the non-sampled units in the population. Insights from model-based thinking may assist the design-based sampling theorist in selection of an appropriate estimator. Similarly, we show that insights from design-based estimation may improve estimation of uncertainty in model-based mark-recapture estimation.
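The bias and mean square error of the ratio estimator can be examined by enumerating all SRS samples from a small population. The x and y values below are made up so that the two variables are highly correlated; the comparison is with the mean-per-unit estimator of the mean.

```python
from itertools import combinations

# Hypothetical population of N = 5 units with highly correlated x and y.
x = [1, 2, 3, 4, 5]
y = [3, 4, 7, 8, 11]
N, n = len(y), 2

x_mean = sum(x) / N  # assumed known from the auxiliary information
y_mean = sum(y) / N  # target of estimation

ratio_estimates, mpu_estimates = [], []
for sample in combinations(range(N), n):
    y_sum = sum(y[i] for i in sample)
    x_sum = sum(x[i] for i in sample)
    ratio_estimates.append(y_sum / x_sum * x_mean)  # ratio estimator of the mean
    mpu_estimates.append(y_sum / n)                 # mean-per-unit estimator

def mse(estimates):
    """Mean square error around the true mean over equally likely samples."""
    return sum((e - y_mean) ** 2 for e in estimates) / len(estimates)

bias = sum(ratio_estimates) / len(ratio_estimates) - y_mean
mse_ratio = mse(ratio_estimates)
mse_mpu = mse(mpu_estimates)  # equals the variance, since this estimator is unbiased
print(bias, mse_ratio, mse_mpu)
```

In this example the ratio estimator is slightly biased, but the squared bias is a small share of its mean square error, which in turn is far below the variance of the mean-per-unit estimator.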


2019 ◽  
pp. 68-91
Author(s):  
David G. Hankin ◽  
Michael S. Mohr ◽  
Ken B. Newman

In stratified sampling, the N population units are grouped into L strata, independent samples are selected from within each stratum, and unbiased estimation is achieved as a weighted average of stratum-specific estimates. Strata may be natural—pool, riffle, and run habitat unit types in a small stream—or strata may be constructed to ensure that some units from specific groups of population units will always be included in the sample. Within strata, any unbiased method of selection can be used. If SRS is used within strata, this is a stratified SRS design. Allocation of the total stratified sample of size n across the L strata can affect sampling variance of stratified estimators. Optimal allocation theory shows that optimal stratum-specific sample sizes depend on relative numbers of units in strata, and stratum-specific costs per unit of sampling and variances of y values. An ANOVA sums of squares partition can be used to show that a proportionally allocated stratified SRS strategy will outperform selection of a single SRS with mean-per-unit estimation whenever the average variation within strata is less than the finite population variance. Therefore, it is desirable to minimize variation within strata and maximize the variation in stratum means. For a variety of reasons, post-stratification, in which one large SRS is stratified after the sample has been selected, may often be a good alternative to selection of a (pre-) stratified sample.
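The stratified estimator and the optimal allocation rule can be sketched briefly. The allocation assumed here is the standard textbook result, in which stratum sample sizes are proportional to N_h·S_h/√c_h (stratum size, standard deviation of y, and cost per unit). The function names and stratum values are hypothetical.

```python
from math import sqrt

def stratified_mean(N_h, ybar_h):
    """Weighted average of stratum means, with weights N_h / N."""
    N = sum(N_h)
    return sum(size * ybar for size, ybar in zip(N_h, ybar_h)) / N

def allocation_shares(N_h, S_h, c_h):
    """Share of the total sample for each stratum, proportional to N_h*S_h/sqrt(c_h)."""
    weights = [size * sd / sqrt(cost) for size, sd, cost in zip(N_h, S_h, c_h)]
    total = sum(weights)
    return [w / total for w in weights]

# Made-up example with L = 2 strata.
N_h = [100, 300]    # stratum sizes
S_h = [4.0, 2.0]    # stratum standard deviations of y
c_h = [1.0, 4.0]    # stratum costs per sampled unit

shares = allocation_shares(N_h, S_h, c_h)
estimate = stratified_mean(N_h, [10.0, 20.0])
print(shares, estimate)
```

The smaller but more variable and cheaper stratum receives the larger share of the sample here, even though it holds only a quarter of the population units.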

