Circular Systematic Sampling with Varying Probabilities

1987 ◽  
Vol 36 (3-4) ◽  
pp. 193-196 ◽  
Author(s):  
Arijit Chaudhuri ◽  
Arun Kumar Adhikary

Certain conditions connecting the population size, sample size and the sampling interval in circular systematic sampling with equal probabilities are known. We present here a simple “condition” connecting the sample size, size-measures and the sampling interval in pps circular systematic sampling. The condition is important in noting limitations on sample-sizes when a sampling interval is pre-assigned.
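As a rough illustration of the selection scheme this abstract refers to, the sketch below draws a pps circular systematic sample: units are laid end to end on a circle whose circumference is the total of the size-measures, and n points spaced k apart are read off from a random start. The function name, argument names, and the assumption of positive integer size-measures are illustrative choices, not the authors' notation, and the paper's actual condition is not reproduced here.

```python
import bisect
import itertools
import random


def pps_circular_systematic(sizes, n, k, start=None, rng=random):
    """Return the indices of the units hit by n points spaced k apart.

    sizes : positive integer size-measures, one per unit (assumption)
    n     : sample size
    k     : sampling interval
    start : optional fixed start in 1..sum(sizes); random if omitted
    """
    total = sum(sizes)
    # cum[i] is the upper end of unit i's segment on the cumulative scale.
    cum = list(itertools.accumulate(sizes))
    if start is None:
        start = rng.randint(1, total)
    picks = []
    for j in range(n):
        # Wrap around the circle so every point stays in 1..total.
        point = (start - 1 + j * k) % total + 1
        # First unit whose cumulative size reaches the point.
        picks.append(bisect.bisect_left(cum, point))
    return picks


if __name__ == "__main__":
    sizes = [3, 1, 4, 2]
    # Unit 2 has size 4 > k = 3, so it can be hit twice.
    print(pps_circular_systematic(sizes, n=3, k=3, start=2))
    # With k = 4 and this start, the three hits are distinct units.
    print(pps_circular_systematic(sizes, n=3, k=4, start=9))
```

Repeated indices in the output show how the size-measures, the sample size, and the interval interact: for a fixed interval, some sample sizes cannot be reached with distinct units, which is the kind of limitation the abstract mentions.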

PeerJ ◽  
2020 ◽  
Vol 8 ◽  
pp. e9939
Author(s):  
Jessica F. McLaughlin ◽  
Kevin Winker

Sample size is a critical aspect of study design in population genomics research, yet few empirical studies have examined the impacts of small sample sizes. We used datasets from eight diverging bird lineages to make pairwise comparisons at different levels of taxonomic divergence (populations, subspecies, and species). Our data are from loci linked to ultraconserved elements and our analyses used one single nucleotide polymorphism per locus. All individuals were genotyped at all loci, effectively doubling sample size for coalescent analyses. We estimated population demographic parameters (effective population size, migration rate, and time since divergence) in a coalescent framework using Diffusion Approximation for Demographic Inference, an allele frequency spectrum method. Using divergence-with-gene-flow models optimized with full datasets, we subsampled at sequentially smaller sample sizes from full datasets of 6–8 diploid individuals per population (with both alleles called) down to 1:1, and then we compared estimates and their changes in accuracy. Accuracy was strongly affected by sample size, with considerable differences among estimated parameters and among lineages. Effective population size parameters (ν) tended to be underestimated at low sample sizes (fewer than three diploid individuals per population, or 6:6 haplotypes in coalescent terms). Migration (m) was fairly consistently estimated until <2 individuals per population, and no consistent trend of over- or underestimation was found in either time since divergence (T) or theta (Θ = 4Nrefμ). Lineages that were taxonomically recognized above the population level (subspecies and species pairs; that is, deeper divergences) tended to have lower variation in scaled root mean square error of parameter estimation at smaller sample sizes than population-level divergences, and many parameters were estimated accurately down to three diploid individuals per population. Shallower divergence levels (i.e., populations) often required at least five individuals per population for reliable demographic inferences using this approach. Although divergence levels might be unknown at the outset of study design, our results provide a framework for planning appropriate sampling and for interpreting results if smaller sample sizes must be used.


Author(s):  
SHASHIBHUSHAN B. MAHADIK

Variable sample size and sampling interval (VSSI) T² charts are substantially more efficient than static T² charts. However, the frequent switches between sample sizes and sampling interval lengths can be a complicating factor during the implementation of these charts. In this paper, runs rules are proposed for switching between sample sizes and sampling interval lengths of VSSI T² charts in order to reduce the frequency of switches. The expressions for performance measures for the charts with these runs rules are developed. The effects of different runs rules on performances of the charts are evaluated numerically. In general, runs rules substantially reduce the frequency of switches. However, some runs rules significantly affect statistical performances of the charts.


Author(s):  
Jessica F. McLaughlin ◽  
Kevin Winker

Sample size is a critical aspect of study design in population genomics research, yet few empirical studies have examined the impacts of small sample sizes. We used datasets from eight diverging bird lineages to make pairwise comparisons at different levels of taxonomic divergence (populations, subspecies, and species). Our data are from loci linked to ultraconserved elements (UCEs) and our analyses used one SNP per locus. All individuals were genotyped at all loci (McLaughlin et al. 2020). We estimated population demographic parameters (effective population size, migration rate, and time since divergence) in a coalescent framework using Diffusion Approximation for Demographic Inference (δaδi; Gutenkunst et al. 2009), an allele frequency spectrum (AFS) method. Using divergence-with-gene-flow models optimized with full datasets, we subsampled at sequentially smaller sample sizes from full datasets of 6–8 diploid individuals per population (with both alleles called) down to 1:1, and then we compared estimates and their changes in accuracy. Accuracy was strongly affected by sample size, with considerable differences among estimated parameters and among lineages. Effective population size parameters (ν) tended to be underestimated at low sample sizes (fewer than 3 diploid individuals per population, or 6:6 haplotypes in coalescent terms). Migration (m) was fairly consistently estimated until ≤ 2 individuals per population, and no consistent trend of over- or underestimation was found in either time since divergence (T) or Θ (4Nrefμ). Lineages that were taxonomically recognized above the population level (subspecies and species pairs; i.e., deeper divergences) tended to have lower variation in scaled root mean square error (SRMSE) of parameter estimation at smaller sample sizes than population-level divergences, and many parameters were estimated accurately down to 3 diploid individuals per population. Shallower divergence levels (i.e., populations) often required at least 5 individuals per population for reliable demographic inferences using this approach. Although divergence levels might be unknown at the outset of study design, our results provide a framework for planning appropriate sampling and for interpreting results if smaller sample sizes must be used.


2009 ◽  
Vol 39 (6) ◽  
pp. 1061-1068 ◽  
Author(s):  
Harry T. Valentine ◽  
David L.R. Affleck ◽  
Timothy G. Gregoire

Systematic sampling is easy, efficient, and widely used, though it is not generally recognized that a systematic sample may be drawn from the population of interest with or without restrictions on randomization. The restrictions or the lack of them determine which estimators are unbiased, when using the sampling design as the basis for inference. We describe the selection of a systematic sample, with and without restriction, from populations of discrete elements and from linear and areal continuums (continuous populations). We also provide unbiased estimators for both restricted and unrestricted selection. When the population size is known at the outset, systematic sampling with unrestricted selection is most likely the best choice. Restricted selection affords estimation of attribute totals for a population when the population size — for example, the area of an areal continuum — is unknown. Ratio estimation, however, is most likely a more precise option when the selection is restricted and the population size becomes known at the end of the sampling. There is no difference between restricted and unrestricted selection if the sampling interval or grid tessellates the frame in such a way that all samples contain an equal number of measurements. Moreover, all the estimators are unbiased and identical in this situation.


2010 ◽  
Vol 29 (1) ◽  
pp. 125-148 ◽  
Author(s):  
Lucas A. Hoogduin ◽  
Thomas W. Hall ◽  
Jeffrey J. Tsay

SUMMARY: Widely used probability-proportional-to-size (PPS) selection methods are not well adapted to circumstances requiring sample augmentation. Limitations include: (1) an inability to augment selections while maintaining PPS properties, (2) a failure to recognize changes in census stratum membership which result from sample augmentation, and (3) imprecise control over line item sample size. This paper presents a new method of PPS selection, a modified version of sieve sampling which overcomes these limitations. Simulations indicate the new method effectively maintains sampling stratum PPS properties in single- and multi-stage samples, appropriately recognizes changes in census stratum membership which result from sample augmentation, and provides precise control over line item sample sizes. In single-stage applications the method provides reliable control of sampling risk over varied tainting levels and error bunching patterns. Tightness and efficiency measures are comparable to randomized systematic sampling and superior to sieve sampling.


2021 ◽  
Vol 13 (3) ◽  
pp. 368
Author(s):  
Christopher A. Ramezan ◽  
Timothy A. Warner ◽  
Aaron E. Maxwell ◽  
Bradley S. Price

The size of the training data set is a major determinant of classification accuracy. Nevertheless, the collection of a large training data set for supervised classifiers can be a challenge, especially for studies covering a large area, which may be typical of many real-world applied projects. This work investigates how variations in training set size, ranging from a large sample size (n = 10,000) to a very small sample size (n = 40), affect the performance of six supervised machine-learning algorithms applied to classify large-area high-spatial-resolution (HR) (1–5 m) remotely sensed data within the context of a geographic object-based image analysis (GEOBIA) approach. GEOBIA, in which adjacent similar pixels are grouped into image-objects that form the unit of the classification, offers the potential benefit of allowing multiple additional variables, such as measures of object geometry and texture, thus increasing the dimensionality of the classification input data. The six supervised machine-learning algorithms are support vector machines (SVM), random forests (RF), k-nearest neighbors (k-NN), single-layer perceptron neural networks (NEU), learning vector quantization (LVQ), and gradient-boosted trees (GBM). RF, the algorithm with the highest overall accuracy, was notable for its negligible decrease in overall accuracy, 1.0%, when training sample size decreased from 10,000 to 315 samples. GBM provided similar overall accuracy to RF; however, the algorithm was very expensive in terms of training time and computational resources, especially with large training sets. In contrast to RF and GBM, NEU, and SVM were particularly sensitive to decreasing sample size, with NEU classifications generally producing overall accuracies that were on average slightly higher than SVM classifications for larger sample sizes, but lower than SVM for the smallest sample sizes. NEU however required a longer processing time. 
The k-NN classifier saw less of a drop in overall accuracy than NEU and SVM as training set size decreased; however, the overall accuracies of k-NN were typically less than RF, NEU, and SVM classifiers. LVQ generally had the lowest overall accuracy of all six methods, but was relatively insensitive to sample size, down to the smallest sample sizes. Overall, due to its relatively high accuracy with small training sample sets, and minimal variations in overall accuracy between very large and small sample sets, as well as relatively short processing time, RF was a good classifier for large-area land-cover classifications of HR remotely sensed data, especially when training data are scarce. However, as performance of different supervised classifiers varies in response to training set size, investigating multiple classification algorithms is recommended to achieve optimal accuracy for a project.


2021 ◽  
Vol 6 (1) ◽  
Author(s):  
Georgia Kourlaba ◽  
Eleni Kourkouni ◽  
Stefania Maistreli ◽  
Christina-Grammatiki Tsopela ◽  
Nafsika-Maria Molocha ◽  
...  

Background: Epidemiological data indicate that a large part of the population needs to be vaccinated to achieve herd immunity. Hence, it is of high importance for public health officials to know whether people are going to get vaccinated for COVID-19. The objective of the present study was to examine the willingness of adult residents in Greece to receive a COVID-19 vaccine. Methods: A cross-sectional survey was conducted among the adult general population of Greece between April 28, 2020 and May 3, 2020 (last week of lockdown), using a mixed methodology for data collection: Computer Assisted Telephone Interviewing (CATI) and Computer Assisted Web Interviewing (CAWI). Using a sample size calculator, the target sample size was found to be around 1000 respondents. To ensure a nationally representative sample of the urban/rural population according to the Greek census 2011, a systematic sampling procedure with proportionate stratification by region was used to recruit participants. Data collection was guided through a structured questionnaire. Regarding willingness to receive a COVID-19 vaccination, participants were asked to answer the following question: “If there was a vaccine available for the novel coronavirus, would you do it?” Results: Of 1004 respondents, only 57.7% stated that they are going to get vaccinated for COVID-19. Respondents aged > 65 years, those who either themselves or a member of their household belonged to a vulnerable group, those believing that the COVID-19 virus was not developed in laboratories by humans, those believing that coronavirus is far more contagious and lethal compared to the H1N1 virus, and those believing that next waves are coming were statistically significantly more likely to be willing to get a COVID-19 vaccine. A higher knowledge score regarding symptoms, transmission routes, and prevention and control measures against COVID-19 was significantly associated with higher willingness of respondents to get vaccinated. Conclusion: A significant proportion of individuals in the general population are unwilling to receive a COVID-19 vaccine, stressing the need for public health officials to take immediate awareness-raising measures.


2013 ◽  
Vol 113 (1) ◽  
pp. 221-224 ◽  
Author(s):  
David R. Johnson ◽  
Lauren K. Bachan

In a recent article, Regan, Lakhanpal, and Anguiano (2012) highlighted the lack of evidence for different relationship outcomes between arranged and love-based marriages. Yet the sample size (n = 58) used in the study is insufficient for making such inferences. This reply discusses and demonstrates how small sample sizes reduce the utility of this research.


2012 ◽  
Vol 2012 ◽  
pp. 1-8 ◽  
Author(s):  
Louis M. Houston

We derive a general equation for the probability that a measurement falls within a range of n standard deviations from an estimate of the mean. So, we provide a format that is compatible with a confidence interval centered about the mean that is naturally independent of the sample size. The equation is derived by interpolating theoretical results for extreme sample sizes. The intermediate value of the equation is confirmed with a computational test.
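The abstract does not state the equation itself, so the sketch below only illustrates the kind of computational test it alludes to, under stated assumptions: normally distributed data, a sample mean and sample standard deviation estimated from `sample_size` draws, and one further independent measurement checked against the interval of `n_sd` standard deviations around the estimated mean. The function name, its parameters, and the simulation design are hypothetical and are not taken from the paper.

```python
import random
import statistics


def coverage_probability(sample_size, n_sd, trials=4000, seed=0):
    """Monte Carlo estimate of P(|new value - sample mean| <= n_sd * sample sd).

    Assumes standard normal data; sample_size must be at least 2 so that
    the sample standard deviation is defined.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(sample_size)]
        mean = statistics.fmean(sample)
        sd = statistics.stdev(sample)  # sample standard deviation (n - 1)
        new_value = rng.gauss(0.0, 1.0)  # independent further measurement
        if abs(new_value - mean) <= n_sd * sd:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    for size in (2, 5, 100):
        print(size, coverage_probability(size, n_sd=1.0))
```

Under these assumptions the estimate for large samples should be close to the familiar normal value of about 0.683 for one standard deviation, while very small samples give noticeably lower values; how the paper's equation handles that dependence is not something this sketch can confirm.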

