Modified Sieve Sampling: A Method for Single- and Multi-Stage Probability-Proportional-to-Size Sampling

2010 ◽  
Vol 29 (1) ◽  
pp. 125-148 ◽  
Author(s):  
Lucas A. Hoogduin ◽  
Thomas W. Hall ◽  
Jeffrey J. Tsay

SUMMARY: Widely used probability-proportional-to-size (PPS) selection methods are not well adapted to circumstances requiring sample augmentation. Limitations include: (1) an inability to augment selections while maintaining PPS properties, (2) a failure to recognize changes in census stratum membership that result from sample augmentation, and (3) imprecise control over line item sample size. This paper presents a new method of PPS selection, a modified version of sieve sampling that overcomes these limitations. Simulations indicate the new method effectively maintains sampling stratum PPS properties in single- and multi-stage samples, appropriately recognizes changes in census stratum membership that result from sample augmentation, and provides precise control over line item sample sizes. In single-stage applications, the method provides reliable control of sampling risk over varied tainting levels and error bunching patterns. Tightness and efficiency measures are comparable to randomized systematic sampling and superior to sieve sampling.
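
The paper's modified algorithm is not reproduced in the summary above. For orientation, here is a minimal sketch of classic sieve sampling, the method being extended; the book values and the monetary-unit interval `J` are illustrative assumptions:

```python
import random

def sieve_sample(book_values, J, seed=None):
    """Classic sieve sampling: draw an independent uniform u_i for each
    line item and select the item when y_i > u_i * J, giving item i an
    inclusion probability of min(y_i / J, 1)."""
    rng = random.Random(seed)
    return [i for i, y in enumerate(book_values) if y > rng.random() * J]

# Illustrative book values and a monetary-unit interval of 5,000.
population = [1200, 8400, 300, 4700, 2500, 9100]
print(sieve_sample(population, J=5000, seed=42))
```

Note that the realized sample size under this scheme is random, which is precisely the "imprecise control over line item sample size" limitation the modified method is designed to overcome.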

1987 ◽  
Vol 36 (3-4) ◽  
pp. 193-196 ◽  
Author(s):  
Arijit Chaudhuri ◽  
Arun Kumar Adhikary

Certain conditions connecting the population size, sample size, and sampling interval in circular systematic sampling with equal probabilities are known. We present here a simple “condition” connecting the sample size, size measures, and sampling interval in PPS circular systematic sampling. The condition is important for noting limitations on sample sizes when a sampling interval is pre-assigned.
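
The condition itself is stated in the paper rather than in this abstract. As a point of reference, a minimal sketch of PPS circular systematic sampling with a pre-assigned interval might look as follows; the size measures, sample size, and interval are illustrative assumptions:

```python
import bisect
import itertools
import random

def pps_circular_systematic(sizes, n, interval, seed=None):
    """PPS circular systematic sampling: choose a random start r on the
    circle of cumulated size measures, then take the points
    r, r + I, ..., r + (n - 1) * I modulo the total size; a unit is
    selected once for each point that falls in its cumulative segment."""
    total = sum(sizes)
    cum = list(itertools.accumulate(sizes))
    r = random.Random(seed).uniform(0, total)
    return [bisect.bisect_right(cum, (r + k * interval) % total)
            for k in range(n)]

# Illustrative size measures; a unit can be selected more than once when
# its size measure is large relative to the interval.
print(pps_circular_systematic([10, 25, 5, 40, 20], n=3, interval=33, seed=1))
```

The sketch makes the concern visible: once the interval is pre-assigned, only certain sample sizes are compatible with the size measures, which is the kind of limitation the stated condition formalizes.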


Author(s):  
Lerato Moeti ◽  
Madira Litedu ◽  
Jacques Joubert

Abstract
Background: The aim of the study was to investigate the common deficiencies observed in the Finished Pharmaceutical Product (FPP) section of generic product applications submitted to SAHPRA. The study was conducted retrospectively over a 7-year period (2011–2017) for products that were finalised by the Pharmaceutical and Analytical pre-registration Unit.
Methods: There were 3148 finalised products in 2011–2017, 667 of which were sterile while 2089 were non-sterile. To attain a representative sample for the study, statistical sampling was conducted. The sample size was obtained using statistical tables found in the literature and confirmed by a sample size calculation at a 95% confidence level. Products were selected according to therapeutic category using the multi-stage method of stratified-systematic sampling. This resulted in the selection of 325 applications for non-sterile products and 244 applications for sterile products. Subsequently, all the deficiencies were collected and categorised according to the Common Technical Document (CTD) subsections of the FPP section (3.2.P).
Results: A total of 3253 deficiencies were collected from the 325 non-sterile applications, while 2742 deficiencies were collected from the 244 sterile applications. The most common deficiencies in the FPP section for non-sterile products concerned the following subsections: Specifications (15%), Description and Composition (14%), Description of the Manufacturing Process (13%), Stability Data (7.6%), and the Container Closure System (7.3%). For the sterile products, the deficiencies were quantified and the subsection Validation and/or Evaluation (18%) had the most deficiencies. Comparisons of the deficiencies with those reported by other agencies, such as the USFDA, EMA, TFDA, and WHO PQTm, are discussed, with similarities outlined.
Conclusions: The overall top five most common deficiencies observed by SAHPRA are discussed extensively for generic products. The findings provide an overview of the submissions and regulatory considerations for generic applications in South Africa, which is useful to FPP manufacturers in compiling their dossiers and will assist in accelerating the registration process.
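
The sampling frame itself is not given in the abstract; the following is a minimal sketch of the proportionate stratified-systematic selection described, with the therapeutic-category strata and their counts invented purely for illustration:

```python
import random

def systematic_select(ids, k, seed=None):
    """Systematic sampling: a random start within the first interval,
    then every (N / k)-th element."""
    step = len(ids) / k
    start = random.Random(seed).random() * step
    return [ids[int(start + i * step)] for i in range(k)]

def stratified_systematic(strata, total_n, seed=None):
    """Proportionate allocation: each stratum contributes a share of the
    sample proportional to its share of the population."""
    population = sum(len(ids) for ids in strata.values())
    return {name: systematic_select(ids, round(total_n * len(ids) / population), seed)
            for name, ids in strata.items()}

# Hypothetical strata totalling the 2089 non-sterile products.
strata = {"cardiovascular": list(range(600)),
          "anti-infective": list(range(900)),
          "CNS": list(range(589))}
sample = stratified_systematic(strata, total_n=325, seed=7)
print({name: len(ids) for name, ids in sample.items()})  # allocation sums to 325
```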


2021 ◽  
Vol 13 (3) ◽  
pp. 368
Author(s):  
Christopher A. Ramezan ◽  
Timothy A. Warner ◽  
Aaron E. Maxwell ◽  
Bradley S. Price

The size of the training data set is a major determinant of classification accuracy. Nevertheless, collecting a large training data set for supervised classifiers can be a challenge, especially for studies covering a large area, as is typical of many real-world applied projects. This work investigates how variations in training set size, ranging from a large sample (n = 10,000) to a very small sample (n = 40), affect the performance of six supervised machine-learning algorithms applied to classify large-area high-spatial-resolution (HR) (1–5 m) remotely sensed data within the context of a geographic object-based image analysis (GEOBIA) approach. GEOBIA, in which adjacent similar pixels are grouped into image-objects that form the unit of classification, offers the potential benefit of allowing multiple additional variables, such as measures of object geometry and texture, thus increasing the dimensionality of the classification input data. The six supervised machine-learning algorithms are support vector machines (SVM), random forests (RF), k-nearest neighbors (k-NN), single-layer perceptron neural networks (NEU), learning vector quantization (LVQ), and gradient-boosted trees (GBM). RF, the algorithm with the highest overall accuracy, was notable for its negligible decrease in overall accuracy, 1.0%, when the training sample size decreased from 10,000 to 315 samples. GBM provided overall accuracy similar to RF; however, the algorithm was very expensive in terms of training time and computational resources, especially with large training sets. In contrast to RF and GBM, NEU and SVM were particularly sensitive to decreasing sample size, with NEU classifications generally producing overall accuracies that were on average slightly higher than those of SVM for larger sample sizes, but lower than SVM for the smallest sample sizes. NEU, however, required a longer processing time. The k-NN classifier saw less of a drop in overall accuracy than NEU and SVM as training set size decreased; however, the overall accuracies of k-NN were typically lower than those of the RF, NEU, and SVM classifiers. LVQ generally had the lowest overall accuracy of all six methods, but was relatively insensitive to sample size, down to the smallest sample sizes. Overall, due to its relatively high accuracy with small training sets, minimal variation in overall accuracy between very large and small training sets, and relatively short processing time, RF was a good classifier for large-area land-cover classifications of HR remotely sensed data, especially when training data are scarce. However, as the performance of different supervised classifiers varies in response to training set size, investigating multiple classification algorithms is recommended to achieve optimal accuracy for a project.
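
A minimal sketch of the core experimental manipulation, shown here with scikit-learn's random forest on synthetic feature data rather than the paper's GEOBIA image-objects (the feature set and sizes are assumptions for illustration):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for object-based features (spectra, geometry, texture).
X, y = make_classification(n_samples=12000, n_features=20, n_informative=12,
                           n_classes=5, n_clusters_per_class=1, random_state=0)
X_pool, X_test, y_pool, y_test = train_test_split(X, y, test_size=2000,
                                                  random_state=0)

# Refit on progressively smaller training sets, as in the study design.
for n_train in (10000, 315, 40):
    rf = RandomForestClassifier(n_estimators=500, random_state=0)
    rf.fit(X_pool[:n_train], y_pool[:n_train])
    acc = accuracy_score(y_test, rf.predict(X_test))
    print(f"n = {n_train:>5}: overall accuracy = {acc:.3f}")
```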


2021 ◽  
Vol 6 (1) ◽  
Author(s):  
Georgia Kourlaba ◽  
Eleni Kourkouni ◽  
Stefania Maistreli ◽  
Christina-Grammatiki Tsopela ◽  
Nafsika-Maria Molocha ◽  
...  

Abstract
Background: Epidemiological data indicate that a large part of the population needs to be vaccinated to achieve herd immunity. Hence, it is of high importance for public health officials to know whether people are going to get vaccinated against COVID-19. The objective of the present study was to examine the willingness of adult residents of Greece to receive a COVID-19 vaccine.
Methods: A cross-sectional survey was conducted among the adult general population of Greece from April 28 to May 3, 2020 (the last week of lockdown), using a mixed methodology for data collection: Computer Assisted Telephone Interviewing (CATI) and Computer Assisted Web Interviewing (CAWI). Using a sample size calculator, the target sample size was found to be around 1000 respondents. To ensure a nationally representative sample of the urban/rural population according to the 2011 Greek census, a proportionate, stratified-by-region systematic sampling procedure was used to recruit participants. Data collection was guided by a structured questionnaire. Regarding willingness to receive COVID-19 vaccination, participants were asked the following question: “If there was a vaccine available for the novel coronavirus, would you do it?”
Results: Of the 1004 respondents, only 57.7% stated that they intend to get vaccinated against COVID-19. Respondents aged over 65 years, those who themselves or a member of their household belonged to a vulnerable group, those believing that the COVID-19 virus was not developed in laboratories by humans, those believing that coronavirus is far more contagious and lethal than the H1N1 virus, and those believing that further waves are coming were statistically significantly more likely to be willing to get a COVID-19 vaccine. A higher knowledge score regarding symptoms, transmission routes, and prevention and control measures against COVID-19 was significantly associated with higher willingness to get vaccinated.
Conclusion: A significant proportion of individuals in the general population are unwilling to receive a COVID-19 vaccine, stressing the need for public health officials to take immediate awareness-raising measures.
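
The calculator's inputs are not reported; under the usual assumptions of 95% confidence, maximum variability (p = 0.5), and a margin of error of about 3.1%, Cochran's formula reproduces a target of roughly 1000 respondents:

```python
import math

def cochran_n(z=1.96, p=0.5, margin=0.031):
    """Cochran's sample size for estimating a proportion:
    n = z^2 * p * (1 - p) / e^2."""
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)

print(cochran_n())  # -> 1000
```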


2013 ◽  
Vol 113 (1) ◽  
pp. 221-224 ◽  
Author(s):  
David R. Johnson ◽  
Lauren K. Bachan

In a recent article, Regan, Lakhanpal, and Anguiano (2012) highlighted the lack of evidence for different relationship outcomes between arranged and love-based marriages. Yet the sample size (n = 58) used in the study is insufficient for making such inferences. This reply discusses and demonstrates how small sample sizes reduce the utility of this research.
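
To make the critique concrete, a quick power calculation, assuming a two-sided two-group comparison with 29 couples per group and a medium standardized effect (neither of which is stated in the original study), shows how little power n = 58 affords:

```python
from statsmodels.stats.power import TTestIndPower

# Assumed design: two independent groups of 29, alpha = .05, two-sided test.
power = TTestIndPower().power(effect_size=0.5, nobs1=29, alpha=0.05)
print(f"Power to detect d = 0.5 with n = 58: {power:.2f}")  # roughly 0.47
```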


2012 ◽  
Vol 2012 ◽  
pp. 1-8 ◽  
Author(s):  
Louis M. Houston

We derive a general equation for the probability that a measurement falls within a range of n standard deviations from an estimate of the mean. In doing so, we provide a format compatible with a confidence interval centered about the mean that is naturally independent of the sample size. The equation is derived by interpolating theoretical results for extreme sample sizes, and its intermediate values are confirmed with a computational test.
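
The interpolated equation itself is not reproduced in the abstract; in the large-sample (normal) limit it must reduce to the familiar result P(|x − μ| ≤ nσ) = erf(n/√2), which is easy to check numerically:

```python
import math

def prob_within_n_sigma(n):
    """Probability that a normally distributed measurement falls within
    n standard deviations of the mean: erf(n / sqrt(2))."""
    return math.erf(n / math.sqrt(2))

for n in (1, 2, 3):
    print(f"P(|x - mu| <= {n} sigma) = {prob_within_n_sigma(n):.4f}")
# 0.6827, 0.9545, 0.9973 -- the familiar 68-95-99.7 rule.
```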


2019 ◽  
Author(s):  
Peter E Clayson ◽  
Kaylie Amanda Carbine ◽  
Scott Baldwin ◽  
Michael J. Larson

Methodological reporting guidelines for studies of event-related potentials (ERPs) were updated in Psychophysiology in 2014. These guidelines facilitate the communication of key methodological parameters (e.g., preprocessing steps). Failing to report key parameters represents a barrier to replication efforts, and difficulty with replicability increases in the presence of small sample sizes and low statistical power. We assessed whether the guidelines are followed and estimated the average sample size and power in recent research. Reporting behavior, sample sizes, and statistical designs were coded for 150 randomly sampled articles published from 2011 to 2017 in five high-impact journals that frequently publish ERP research. An average of 63% of guideline items were reported, and reporting behavior was similar across journals, suggesting that gaps in reporting are a shortcoming of the field rather than of any specific journal. Publication of the guidelines paper had no impact on reporting behavior, suggesting that editors and peer reviewers are not enforcing these recommendations. The average sample size per group was 21. Statistical power was conservatively estimated at .72–.98 for a large effect size, .35–.73 for a medium effect, and .10–.18 for a small effect. These findings indicate that failure to report key guideline items is ubiquitous and that ERP studies are powered primarily to detect large effects. Such low power and insufficient adherence to reporting guidelines represent substantial barriers to replication efforts. The methodological transparency and replicability of studies can be improved by open sharing of processing code and experimental tasks and by a priori sample size calculations that ensure adequately powered studies.
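
The lower ends of the reported power ranges are consistent with a standard between-subjects power calculation at the average group size of 21; a sketch of only the simplest case, since the paper's designs vary:

```python
from statsmodels.stats.power import TTestIndPower

# Two independent groups of 21, alpha = .05, two-sided test.
for label, d in (("large", 0.8), ("medium", 0.5), ("small", 0.2)):
    power = TTestIndPower().power(effect_size=d, nobs1=21, alpha=0.05)
    print(f"{label} effect (d = {d}): power = {power:.2f}")
# Approximately .72, .35, and .10 -- the lower ends of the reported ranges.
```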


2019 ◽  
Author(s):  
Patrick Bergman ◽  
Maria Hagströmer

Abstract
BACKGROUND: Measuring physical activity and sedentary behavior accurately remains a challenge. When describing the uncertainty of mean values or when making group comparisons, minimising the Standard Error of the Mean (SEM) is important. Both the sample size and the number of repeated observations within each subject influence the size of the SEM. In this study we investigated how different combinations of sample size and number of repeated observations influence the magnitude of the SEM.
METHODS: A convenience sample was asked to wear an accelerometer for 28 consecutive days. Based on the within- and between-subject variances, the SEM was calculated for each combination of sample size and number of monitored days.
RESULTS: Fifty subjects (67% women; mean ± SD age 41 ± 19 years) were included. The analyses showed, independent of physical activity intensity level or measurement protocol design, that the largest reductions in SEM were achieved by increasing the sample size. Increasing the number of repeated measurement days within each subject did not reduce the SEM to the same degree.
CONCLUSION: The most effective way to reduce the SEM is a large sample size rather than a long observation period within each individual. Even though the importance of reducing the SEM to increase the power to detect differences between groups is well known, it is seldom considered when developing protocols for accelerometer-based research. The results presented here therefore serve to highlight this fact and have the potential to stimulate debate and challenge current best-practice recommendations for accelerometer-based physical activity research.
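
The trade-off the authors quantify follows from the two variance components: with between-subject variance σ²_b, within-subject (day-to-day) variance σ²_w, n subjects, and d monitored days, the SEM of the group mean is √((σ²_b + σ²_w/d)/n). A sketch with illustrative variance components (not the study's estimates):

```python
import math

def sem(s2_between, s2_within, n_subjects, n_days):
    """SEM of a group mean when each subject contributes the average of
    n_days repeated observations: sqrt((s2_b + s2_w / d) / n)."""
    return math.sqrt((s2_between + s2_within / n_days) / n_subjects)

# Illustrative variance components.
s2_b, s2_w = 100.0, 400.0
for n, d in ((25, 7), (50, 7), (25, 28), (50, 28)):
    print(f"n = {n:>2}, days = {d:>2}: SEM = {sem(s2_b, s2_w, n, d):.2f}")
```

Because the between-subject term is divided only by n, doubling the sample size shrinks the SEM more than quadrupling the monitoring period, which is the pattern the study reports.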


2020 ◽  
Author(s):  
Miles D. Witham ◽  
James Wason ◽  
Richard M Dodds ◽  
Avan A Sayer

Abstract
Introduction: Frailty is the loss of ability to withstand a physiological stressor and is associated with multiple adverse outcomes in older people. Trials to prevent or ameliorate frailty are in their infancy. A range of outcome measures have been proposed, but current measures require either large sample sizes or long follow-up, or do not directly measure the construct of frailty.
Methods: We propose a composite outcome for frailty prevention trials, comprising progression to the frail state, death, or being too unwell to continue in a trial. To determine likely event rates, we used data from the English Longitudinal Study of Ageing, collected 4 years apart. We calculated transition rates between the non-frail, prefrail, and frail states and loss to follow-up due to death or illness. We used Markov state transition models to interpolate one- and two-year transition rates, and performed sample size calculations for a range of differences in transition rates using simple and composite outcomes.
Results: The frailty category was calculable for 4650 individuals at baseline (2226 non-frail, 1907 prefrail, 517 frail); at follow-up, 1282 were non-frail, 1108 were prefrail, 318 were frail, and 1936 had dropped out or were unable to complete all tests for frailty. For those prefrail at baseline, transition probabilities measured at wave 4 were 0.176, 0.286, 0.096, and 0.442 to non-frail, prefrail, frail, and dead/dropped out, respectively. Interpolated transition probabilities were 0.159, 0.494, 0.113, and 0.234 at two years, and 0.108, 0.688, 0.087, and 0.117 at one year. Required sample sizes for a two-year outcome were between 1000 and 7200 for transition from prefrailty to frailty alone, 250 to 1600 for transition to the composite measure, and 75 to 350 using the composite measure with an ordinal logistic regression approach.
Conclusion: Use of a composite outcome in frailty trials offers reduced sample sizes and could ameliorate the effect of the high loss to follow-up inherent in such trials due to death and illness.
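
As a sketch of the kind of two-arm calculation described, the normal approximation for a difference in the proportion reaching the endpoint can be applied to the interpolated two-year control rates above; the relative risk reductions below are assumptions for illustration:

```python
from math import ceil
from scipy.stats import norm

def n_per_arm(p1, p2, alpha=0.05, power=0.80):
    """Normal-approximation sample size per arm for detecting a
    difference between two proportions."""
    z_a, z_b = norm.ppf(1 - alpha / 2), norm.ppf(power)
    return ceil((z_a + z_b) ** 2 * (p1 * (1 - p1) + p2 * (1 - p2))
                / (p1 - p2) ** 2)

# Two-year control rates for the prefrail group: progression to frailty
# alone (0.113) versus the composite of progression plus death/drop-out.
frail_only, composite = 0.113, 0.113 + 0.234
for rrr in (0.25, 0.40):  # assumed relative risk reductions
    print(f"RRR {rrr:.0%}: frailty alone n = "
          f"{n_per_arm(frail_only, frail_only * (1 - rrr))}, "
          f"composite n = {n_per_arm(composite, composite * (1 - rrr))}")
```

The composite's higher event rate is what drives the smaller required sample sizes reported in the Results.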

