Modified Sieve Sampling: A Method for Single- and Multi-Stage Probability-Proportional-to-Size Sampling

2010 ◽  
Vol 29 (1) ◽  
pp. 125-148 ◽  
Author(s):  
Lucas A. Hoogduin ◽  
Thomas W. Hall ◽  
Jeffrey J. Tsay

SUMMARY: Widely used probability-proportional-to-size (PPS) selection methods are not well adapted to circumstances requiring sample augmentation. Limitations include: (1) an inability to augment selections while maintaining PPS properties, (2) a failure to recognize changes in census stratum membership that result from sample augmentation, and (3) imprecise control over line item sample size. This paper presents a new method of PPS selection, a modified version of sieve sampling that overcomes these limitations. Simulations indicate the new method effectively maintains sampling stratum PPS properties in single- and multi-stage samples, appropriately recognizes changes in census stratum membership that result from sample augmentation, and provides precise control over line item sample sizes. In single-stage applications, the method provides reliable control of sampling risk over varied tainting levels and error bunching patterns. Tightness and efficiency measures are comparable to randomized systematic sampling and superior to sieve sampling.
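
The paper's modified algorithm is not reproduced in the summary above. For orientation, here is a minimal sketch of classic sieve sampling, the method being extended; the book values and the monetary-unit interval `J` are illustrative assumptions:

```python
import random

def sieve_sample(book_values, J, seed=None):
    """Classic sieve sampling: draw an independent uniform u_i for each
    line item and select the item when y_i > u_i * J, giving item i an
    inclusion probability of min(y_i / J, 1)."""
    rng = random.Random(seed)
    return [i for i, y in enumerate(book_values) if y > rng.random() * J]

# Illustrative book values and a monetary-unit interval of 5,000.
population = [1200, 8400, 300, 4700, 2500, 9100]
print(sieve_sample(population, J=5000, seed=42))
```

Note that the realized sample size under this scheme is random, which is precisely the "imprecise control over line item sample size" limitation the modified method is designed to overcome.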

1987 ◽  
Vol 36 (3-4) ◽  
pp. 193-196 ◽  
Author(s):  
Arijit Chaudhuri ◽  
Arun Kumar Adhikary

Certain conditions connecting the population size, sample size, and sampling interval in circular systematic sampling with equal probabilities are known. We present here a simple “condition” connecting the sample size, size measures, and sampling interval in PPS circular systematic sampling. The condition is important for noting limitations on sample sizes when a sampling interval is pre-assigned.
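
The condition itself is stated in the paper rather than in this abstract. As a point of reference, a minimal sketch of PPS circular systematic sampling with a pre-assigned interval might look as follows; the size measures, sample size, and interval are illustrative assumptions:

```python
import bisect
import itertools
import random

def pps_circular_systematic(sizes, n, interval, seed=None):
    """PPS circular systematic sampling: choose a random start r on the
    circle of cumulated size measures, then take the points
    r, r + I, ..., r + (n - 1) * I modulo the total size; a unit is
    selected once for each point that falls in its cumulative segment."""
    total = sum(sizes)
    cum = list(itertools.accumulate(sizes))
    r = random.Random(seed).uniform(0, total)
    return [bisect.bisect_right(cum, (r + k * interval) % total)
            for k in range(n)]

# Illustrative size measures; a unit can be selected more than once when
# its size measure is large relative to the interval.
print(pps_circular_systematic([10, 25, 5, 40, 20], n=3, interval=33, seed=1))
```

The sketch makes the concern visible: once the interval is pre-assigned, only certain sample sizes are compatible with the size measures, which is the kind of limitation the stated condition formalizes.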


Author(s):  
Lerato Moeti ◽  
Madira Litedu ◽  
Jacques Joubert

Abstract
Background: The aim of the study was to investigate the common deficiencies observed in the Finished Pharmaceutical Product (FPP) section of generic product applications submitted to SAHPRA. The study was conducted retrospectively over a 7-year period (2011–2017) for products that were finalised by the Pharmaceutical and Analytical pre-registration Unit.
Methods: There were 3148 finalised products in 2011–2017, 667 of which were sterile while 2089 were non-sterile. To attain a representative sample for the study, statistical sampling was conducted. The sample size was obtained using statistical tables found in the literature and confirmed by a sample size calculation at a 95% confidence level. Products were selected according to therapeutic category using the multi-stage method of stratified-systematic sampling. This resulted in the selection of 325 applications for non-sterile products and 244 applications for sterile products. Subsequently, all the deficiencies were collected and categorised according to the Common Technical Document (CTD) subsections of the FPP section (3.2.P).
Results: A total of 3253 deficiencies were collected from the 325 non-sterile applications, while 2742 deficiencies were collected from the 244 sterile applications. The most common deficiencies in the FPP section for non-sterile products concerned the following subsections: Specifications (15%), Description and Composition (14%), Description of the Manufacturing Process (13%), Stability Data (7.6%), and the Container Closure System (7.3%). For the sterile products, the deficiencies were quantified and the subsection Validation and/or Evaluation (18%) had the most deficiencies. Comparisons of the deficiencies with those reported by other agencies, such as the USFDA, EMA, TFDA, and WHO PQTm, are discussed, with similarities outlined.
Conclusions: The overall top five most common deficiencies observed by SAHPRA are discussed extensively for generic products. The findings provide an overview of the submissions and regulatory considerations for generic applications in South Africa, which is useful to FPP manufacturers in compiling their dossiers and will assist in accelerating the registration process.
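
The sampling frame itself is not given in the abstract; the following is a minimal sketch of the proportionate stratified-systematic selection described, with the therapeutic-category strata and their counts invented purely for illustration:

```python
import random

def systematic_select(ids, k, seed=None):
    """Systematic sampling: a random start within the first interval,
    then every (N / k)-th element."""
    step = len(ids) / k
    start = random.Random(seed).random() * step
    return [ids[int(start + i * step)] for i in range(k)]

def stratified_systematic(strata, total_n, seed=None):
    """Proportionate allocation: each stratum contributes a share of the
    sample proportional to its share of the population."""
    population = sum(len(ids) for ids in strata.values())
    return {name: systematic_select(ids, round(total_n * len(ids) / population), seed)
            for name, ids in strata.items()}

# Hypothetical strata totalling the 2089 non-sterile products.
strata = {"cardiovascular": list(range(600)),
          "anti-infective": list(range(900)),
          "CNS": list(range(589))}
sample = stratified_systematic(strata, total_n=325, seed=7)
print({name: len(ids) for name, ids in sample.items()})  # allocation sums to 325
```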


2021 ◽  
Vol 13 (3) ◽  
pp. 368
Author(s):  
Christopher A. Ramezan ◽  
Timothy A. Warner ◽  
Aaron E. Maxwell ◽  
Bradley S. Price

The size of the training data set is a major determinant of classification accuracy. Nevertheless, collecting a large training data set for supervised classifiers can be a challenge, especially for studies covering a large area, as is typical of many real-world applied projects. This work investigates how variations in training set size, ranging from a large sample (n = 10,000) to a very small sample (n = 40), affect the performance of six supervised machine-learning algorithms applied to classify large-area high-spatial-resolution (HR) (1–5 m) remotely sensed data within the context of a geographic object-based image analysis (GEOBIA) approach. GEOBIA, in which adjacent similar pixels are grouped into image-objects that form the unit of classification, offers the potential benefit of allowing multiple additional variables, such as measures of object geometry and texture, thus increasing the dimensionality of the classification input data. The six supervised machine-learning algorithms are support vector machines (SVM), random forests (RF), k-nearest neighbors (k-NN), single-layer perceptron neural networks (NEU), learning vector quantization (LVQ), and gradient-boosted trees (GBM). RF, the algorithm with the highest overall accuracy, was notable for its negligible decrease in overall accuracy, 1.0%, when the training sample size decreased from 10,000 to 315 samples. GBM provided overall accuracy similar to RF; however, the algorithm was very expensive in terms of training time and computational resources, especially with large training sets. In contrast to RF and GBM, NEU and SVM were particularly sensitive to decreasing sample size, with NEU classifications generally producing overall accuracies that were on average slightly higher than those of SVM for larger sample sizes, but lower than SVM for the smallest sample sizes. NEU, however, required a longer processing time. The k-NN classifier saw less of a drop in overall accuracy than NEU and SVM as training set size decreased; however, the overall accuracies of k-NN were typically lower than those of the RF, NEU, and SVM classifiers. LVQ generally had the lowest overall accuracy of all six methods, but was relatively insensitive to sample size, down to the smallest sample sizes. Overall, due to its relatively high accuracy with small training sets, minimal variation in overall accuracy between very large and small training sets, and relatively short processing time, RF was a good classifier for large-area land-cover classifications of HR remotely sensed data, especially when training data are scarce. However, as the performance of different supervised classifiers varies in response to training set size, investigating multiple classification algorithms is recommended to achieve optimal accuracy for a project.
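
A minimal sketch of the core experimental manipulation, shown here with scikit-learn's random forest on synthetic feature data rather than the paper's GEOBIA image-objects (the feature set and sizes are assumptions for illustration):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for object-based features (spectra, geometry, texture).
X, y = make_classification(n_samples=12000, n_features=20, n_informative=12,
                           n_classes=5, n_clusters_per_class=1, random_state=0)
X_pool, X_test, y_pool, y_test = train_test_split(X, y, test_size=2000,
                                                  random_state=0)

# Refit on progressively smaller training sets, as in the study design.
for n_train in (10000, 315, 40):
    rf = RandomForestClassifier(n_estimators=500, random_state=0)
    rf.fit(X_pool[:n_train], y_pool[:n_train])
    acc = accuracy_score(y_test, rf.predict(X_test))
    print(f"n = {n_train:>5}: overall accuracy = {acc:.3f}")
```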


2021 ◽  
Vol 6 (1) ◽  
Author(s):  
Georgia Kourlaba ◽  
Eleni Kourkouni ◽  
Stefania Maistreli ◽  
Christina-Grammatiki Tsopela ◽  
Nafsika-Maria Molocha ◽  
...  

Abstract
Background: Epidemiological data indicate that a large part of the population needs to be vaccinated to achieve herd immunity. Hence, it is of high importance for public health officials to know whether people are going to get vaccinated against COVID-19. The objective of the present study was to examine the willingness of adult residents of Greece to receive a COVID-19 vaccine.
Methods: A cross-sectional survey was conducted among the adult general population of Greece from April 28 to May 3, 2020 (the last week of lockdown), using a mixed methodology for data collection: Computer Assisted Telephone Interviewing (CATI) and Computer Assisted Web Interviewing (CAWI). Using a sample size calculator, the target sample size was found to be around 1000 respondents. To ensure a nationally representative sample of the urban/rural population according to the 2011 Greek census, a proportionate, stratified-by-region systematic sampling procedure was used to recruit participants. Data collection was guided by a structured questionnaire. Regarding willingness to receive COVID-19 vaccination, participants were asked the following question: “If there was a vaccine available for the novel coronavirus, would you do it?”
Results: Of the 1004 respondents, only 57.7% stated that they intend to get vaccinated against COVID-19. Respondents aged over 65 years, those who themselves or a member of their household belonged to a vulnerable group, those believing that the COVID-19 virus was not developed in laboratories by humans, those believing that coronavirus is far more contagious and lethal than the H1N1 virus, and those believing that further waves are coming were statistically significantly more likely to be willing to get a COVID-19 vaccine. A higher knowledge score regarding symptoms, transmission routes, and prevention and control measures against COVID-19 was significantly associated with higher willingness to get vaccinated.
Conclusion: A significant proportion of individuals in the general population are unwilling to receive a COVID-19 vaccine, stressing the need for public health officials to take immediate awareness-raising measures.
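
The calculator's inputs are not reported; under the usual assumptions of 95% confidence, maximum variability (p = 0.5), and a margin of error of about 3.1%, Cochran's formula reproduces a target of roughly 1000 respondents:

```python
import math

def cochran_n(z=1.96, p=0.5, margin=0.031):
    """Cochran's sample size for estimating a proportion:
    n = z^2 * p * (1 - p) / e^2."""
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)

print(cochran_n())  # -> 1000
```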


2013 ◽  
Vol 113 (1) ◽  
pp. 221-224 ◽  
Author(s):  
David R. Johnson ◽  
Lauren K. Bachan

In a recent article, Regan, Lakhanpal, and Anguiano (2012) highlighted the lack of evidence for different relationship outcomes between arranged and love-based marriages. Yet the sample size (n = 58) used in the study is insufficient for making such inferences. This reply discusses and demonstrates how small sample sizes reduce the utility of this research.
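
To make the critique concrete, a quick power calculation, assuming a two-sided two-group comparison with 29 couples per group and a medium standardized effect (neither of which is stated in the original study), shows how little power n = 58 affords:

```python
from statsmodels.stats.power import TTestIndPower

# Assumed design: two independent groups of 29, alpha = .05, two-sided test.
power = TTestIndPower().power(effect_size=0.5, nobs1=29, alpha=0.05)
print(f"Power to detect d = 0.5 with n = 58: {power:.2f}")  # roughly 0.47
```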


2012 ◽  
Vol 2012 ◽  
pp. 1-8 ◽  
Author(s):  
Louis M. Houston

We derive a general equation for the probability that a measurement falls within a range of n standard deviations from an estimate of the mean. In doing so, we provide a format compatible with a confidence interval centered about the mean that is naturally independent of the sample size. The equation is derived by interpolating theoretical results for extreme sample sizes, and its intermediate values are confirmed with a computational test.
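
The interpolated equation itself is not reproduced in the abstract; in the large-sample (normal) limit it must reduce to the familiar result P(|x − μ| ≤ nσ) = erf(n/√2), which is easy to check numerically:

```python
import math

def prob_within_n_sigma(n):
    """Probability that a normally distributed measurement falls within
    n standard deviations of the mean: erf(n / sqrt(2))."""
    return math.erf(n / math.sqrt(2))

for n in (1, 2, 3):
    print(f"P(|x - mu| <= {n} sigma) = {prob_within_n_sigma(n):.4f}")
# 0.6827, 0.9545, 0.9973 -- the familiar 68-95-99.7 rule.
```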


2019 ◽  
Author(s):  
Peter E Clayson ◽  
Kaylie Amanda Carbine ◽  
Scott Baldwin ◽  
Michael J. Larson

Methodological reporting guidelines for studies of event-related potentials (ERPs) were updated in Psychophysiology in 2014. These guidelines facilitate the communication of key methodological parameters (e.g., preprocessing steps). Failing to report key parameters represents a barrier to replication efforts, and difficulty with replicability increases in the presence of small sample sizes and low statistical power. We assessed whether the guidelines are followed and estimated the average sample size and power in recent research. Reporting behavior, sample sizes, and statistical designs were coded for 150 randomly sampled articles published from 2011 to 2017 in five high-impact journals that frequently publish ERP research. An average of 63% of guideline items were reported, and reporting behavior was similar across journals, suggesting that gaps in reporting are a shortcoming of the field rather than of any specific journal. Publication of the guidelines paper had no impact on reporting behavior, suggesting that editors and peer reviewers are not enforcing these recommendations. The average sample size per group was 21. Statistical power was conservatively estimated at .72–.98 for a large effect size, .35–.73 for a medium effect, and .10–.18 for a small effect. These findings indicate that failure to report key guideline items is ubiquitous and that ERP studies are powered primarily to detect large effects. Such low power and insufficient adherence to reporting guidelines represent substantial barriers to replication efforts. The methodological transparency and replicability of studies can be improved by open sharing of processing code and experimental tasks and by a priori sample size calculations that ensure adequately powered studies.
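
The lower ends of the reported power ranges are consistent with a standard between-subjects power calculation at the average group size of 21; a sketch of only the simplest case, since the paper's designs vary:

```python
from statsmodels.stats.power import TTestIndPower

# Two independent groups of 21, alpha = .05, two-sided test.
for label, d in (("large", 0.8), ("medium", 0.5), ("small", 0.2)):
    power = TTestIndPower().power(effect_size=d, nobs1=21, alpha=0.05)
    print(f"{label} effect (d = {d}): power = {power:.2f}")
# Approximately .72, .35, and .10 -- the lower ends of the reported ranges.
```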


2019 ◽  
Author(s):  
Patrick Bergman ◽  
Maria Hagströmer

Abstract
BACKGROUND: Measuring physical activity and sedentary behavior accurately remains a challenge. When describing the uncertainty of mean values or when making group comparisons, minimising the Standard Error of the Mean (SEM) is important. Both the sample size and the number of repeated observations within each subject influence the size of the SEM. In this study we investigated how different combinations of sample size and number of repeated observations influence the magnitude of the SEM.
METHODS: A convenience sample was asked to wear an accelerometer for 28 consecutive days. Based on the within- and between-subject variances, the SEM was calculated for each combination of sample size and number of monitored days.
RESULTS: Fifty subjects (67% women; mean ± SD age 41 ± 19 years) were included. The analyses showed, independent of physical activity intensity level or measurement protocol design, that the largest reductions in SEM were achieved by increasing the sample size. Increasing the number of repeated measurement days within each subject did not reduce the SEM to the same degree.
CONCLUSION: The most effective way to reduce the SEM is a large sample size rather than a long observation period within each individual. Even though the importance of reducing the SEM to increase the power to detect differences between groups is well known, it is seldom considered when developing protocols for accelerometer-based research. The results presented here therefore serve to highlight this fact and have the potential to stimulate debate and challenge current best-practice recommendations for accelerometer-based physical activity research.
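
The trade-off the authors quantify follows from the two variance components: with between-subject variance σ²_b, within-subject (day-to-day) variance σ²_w, n subjects, and d monitored days, the SEM of the group mean is √((σ²_b + σ²_w/d)/n). A sketch with illustrative variance components (not the study's estimates):

```python
import math

def sem(s2_between, s2_within, n_subjects, n_days):
    """SEM of a group mean when each subject contributes the average of
    n_days repeated observations: sqrt((s2_b + s2_w / d) / n)."""
    return math.sqrt((s2_between + s2_within / n_days) / n_subjects)

# Illustrative variance components.
s2_b, s2_w = 100.0, 400.0
for n, d in ((25, 7), (50, 7), (25, 28), (50, 28)):
    print(f"n = {n:>2}, days = {d:>2}: SEM = {sem(s2_b, s2_w, n, d):.2f}")
```

Because the between-subject term is divided only by n, doubling the sample size shrinks the SEM more than quadrupling the monitoring period, which is the pattern the study reports.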


2020 ◽  
Author(s):  
Miles D. Witham ◽  
James Wason ◽  
Richard M Dodds ◽  
Avan A Sayer

Abstract
Introduction: Frailty is the loss of ability to withstand a physiological stressor and is associated with multiple adverse outcomes in older people. Trials to prevent or ameliorate frailty are in their infancy. A range of outcome measures have been proposed, but current measures require either large sample sizes or long follow-up, or do not directly measure the construct of frailty.
Methods: We propose a composite outcome for frailty prevention trials, comprising progression to the frail state, death, or being too unwell to continue in a trial. To determine likely event rates, we used data from the English Longitudinal Study of Ageing, collected 4 years apart. We calculated transition rates between the non-frail, prefrail, and frail states and loss to follow-up due to death or illness. We used Markov state transition models to interpolate one- and two-year transition rates, and performed sample size calculations for a range of differences in transition rates using simple and composite outcomes.
Results: The frailty category was calculable for 4650 individuals at baseline (2226 non-frail, 1907 prefrail, 517 frail); at follow-up, 1282 were non-frail, 1108 were prefrail, 318 were frail, and 1936 had dropped out or were unable to complete all tests for frailty. For those prefrail at baseline, transition probabilities measured at wave 4 were 0.176, 0.286, 0.096, and 0.442 to non-frail, prefrail, frail, and dead/dropped out, respectively. Interpolated transition probabilities were 0.159, 0.494, 0.113, and 0.234 at two years, and 0.108, 0.688, 0.087, and 0.117 at one year. Required sample sizes for a two-year outcome were between 1000 and 7200 for transition from prefrailty to frailty alone, 250 to 1600 for transition to the composite measure, and 75 to 350 using the composite measure with an ordinal logistic regression approach.
Conclusion: Use of a composite outcome in frailty trials offers reduced sample sizes and could ameliorate the effect of the high loss to follow-up inherent in such trials due to death and illness.
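
As a sketch of the kind of two-arm calculation described, the normal approximation for a difference in the proportion reaching the endpoint can be applied to the interpolated two-year control rates above; the relative risk reductions below are assumptions for illustration:

```python
from math import ceil
from scipy.stats import norm

def n_per_arm(p1, p2, alpha=0.05, power=0.80):
    """Normal-approximation sample size per arm for detecting a
    difference between two proportions."""
    z_a, z_b = norm.ppf(1 - alpha / 2), norm.ppf(power)
    return ceil((z_a + z_b) ** 2 * (p1 * (1 - p1) + p2 * (1 - p2))
                / (p1 - p2) ** 2)

# Two-year control rates for the prefrail group: progression to frailty
# alone (0.113) versus the composite of progression plus death/drop-out.
frail_only, composite = 0.113, 0.113 + 0.234
for rrr in (0.25, 0.40):  # assumed relative risk reductions
    print(f"RRR {rrr:.0%}: frailty alone n = "
          f"{n_per_arm(frail_only, frail_only * (1 - rrr))}, "
          f"composite n = {n_per_arm(composite, composite * (1 - rrr))}")
```

The composite's higher event rate is what drives the smaller required sample sizes reported in the Results.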

