Combining Cluster Sampling and Link-Tracing Sampling to Estimate Totals and Means of Hidden Populations in Presence of Heterogeneous Probabilities of Links

2021 ◽  
Vol 37 (4) ◽  
pp. 865-905
Author(s):  
Martín Humberto Félix-Medina

Abstract We propose Horvitz-Thompson-like and Hájek-like estimators of the total and mean of a response variable associated with the elements of a hard-to-reach population, such as drug users and sex workers. A portion of the population is assumed to be covered by a frame of venues where the members of the population tend to gather. An initial cluster sample of elements is selected from the frame, where the clusters are the venues, and the elements in the sample are asked to name their contacts who belong to the population. The sample size is increased by including in the sample the named elements who are not in the initial sample. The proposed estimators do not use design-based inclusion probabilities, but model-based inclusion probabilities which are derived from a Rasch model and are estimated by maximum likelihood estimators. The inclusion probabilities are assumed to be heterogeneous, that is, they depend on the sampled people. Variance estimates are obtained by bootstrap and are used to construct confidence intervals. The performance of the proposed estimators and confidence intervals is evaluated by two numerical studies, one of them based on real data, and the results show that their performance is acceptable.
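The two estimator forms named in the abstract can be illustrated with a minimal sketch. It assumes the inclusion probabilities `pi` are already available (in the paper they are estimated from a Rasch model, which is not reproduced here), and it uses a naive resampling bootstrap with a percentile interval as a stand-in, not the paper's bootstrap procedure. All function names and the toy data are hypothetical.

```python
import random


def horvitz_thompson_total(y, pi):
    """Horvitz-Thompson-type total: sum of y_i / pi_i over sampled elements."""
    return sum(yi / p for yi, p in zip(y, pi))


def hajek_mean(y, pi):
    """Hajek-type mean: weighted mean of y with weights 1 / pi_i."""
    weights = [1.0 / p for p in pi]
    return sum(w * yi for w, yi in zip(weights, y)) / sum(weights)


def bootstrap_percentile_ci(y, pi, estimator, n_boot=2000, level=0.95, seed=0):
    """Naive bootstrap (assumption: resample elements with replacement)
    followed by a percentile confidence interval."""
    rng = random.Random(seed)
    n = len(y)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(estimator([y[i] for i in idx], [pi[i] for i in idx]))
    stats.sort()
    tail = (1.0 - level) / 2.0
    lower = stats[int(tail * (n_boot - 1))]
    upper = stats[int((1.0 - tail) * (n_boot - 1))]
    return lower, upper


# Hypothetical toy sample: binary responses and assumed inclusion probabilities.
y_sample = [1, 0, 1, 1]
pi_sample = [0.5, 0.25, 0.5, 1.0]

total_hat = horvitz_thompson_total(y_sample, pi_sample)
mean_hat = hajek_mean(y_sample, pi_sample)
ci_low, ci_high = bootstrap_percentile_ci(y_sample, pi_sample, hajek_mean)
```

The Hájek form divides by the estimated population size (the sum of the weights) rather than a known size, which is why it suits hidden populations whose size is itself unknown.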

2014 ◽  
Vol 2 (2) ◽  
pp. 298-301 ◽  
Author(s):  
JACOB C. FISHER ◽  
M. GIOVANNA MERLI

Respondent-driven sampling (RDS) is an increasingly popular chain-referral sampling method. It has proved effective at generating samples of hard-to-reach populations quickly and cost-effectively, where hard-to-reach populations are those for which sampling frames are not available because they are hidden or socially stigmatized, such as sex workers or injecting drug users. However, the ease of collecting the sample comes with a cost: bias or inefficiency in the estimates of population parameters (Gile & Handcock, 2010; Goel & Salganik, 2010). One way that RDS can produce inefficient estimates is if one or more of the recruitment chains gets stuck among members of a cohesive subpopulation, preventing the RDS sampling process from exploring other areas of the network. If that happens, members of the population subgroup recruit one another repeatedly, leading to an increase in sample size without increasing the diversity of the sample. This type of stickiness is particularly likely when hidden populations are stratified, and the stratified groups are organized into venues that provide opportunities to recruit other members of the same stratum. Female sex workers (FSW) in China, who are stratified into tiers of sex work that are correlated with marital status, age, and risk behaviors, are a prime example (Merli et al., 2014; Yamanis et al., 2013). Chinese FSW recruit clients from venues such as karaoke bars, massage parlors, or street corners. At larger venues, sex workers who participate in an RDS study might recruit other members of the same venue into the study at a higher rate than expected, leading to inefficient estimates. In short, the chain could get stuck in a venue.


2020 ◽  
Author(s):  
Liu Chuchu ◽  
Cao Ziqiang ◽  
Lu Xin

Abstract Understanding the demographics of hidden populations, such as men who have sex with men (MSM), sex workers, or injecting drug users, is of great importance for the adequate deployment of intervention strategies and for public health decision making. However, because such populations are hard to access (e.g., lack of a sampling frame, sensitivity issues, reporting error), traditional survey methods are largely limited when studying them. With data extracted from the very active online community of MSM in China, in this study we adopt and develop location-inference methods to achieve a high-resolution mapping of users in this community at the national level. The performances of popular inference algorithms are compared to identify the most suitable approach. In addition, we propose a new hybrid model, which is proven to achieve the highest accuracy for inferring locations of online users based only on text content. This method helps overcome the sparse location labeling problem in user profiles, and can be extended to the inference of geo-statistics for other hidden populations.


2007 ◽  
Vol 59 (3-4) ◽  
pp. 223-238
Author(s):  
Amitava Saha

Abstract: The confidence intervals (CIs) conventionally constructed in large‐scale sample surveys assuming asymptotic normality often lead to unsatisfactory results when the population under study is rare or clustered. Adaptive cluster sampling is a promising sampling technique to effectively catch rare, geographically clustered or localized population elements. Christman and Pontius (2000) applied several bootstrap techniques to construct confidence intervals when simple random samples are selected without replacement (SRSWOR) and adaptive cluster sampling is used to sample localized population units. Here we extend Sitter's (1992a, b) ‘mirror‐match’ (MM) bootstrap to a practical survey set‐up using varying selection probability. We also demonstrate, using real data from the Indian Economic Census, how the extended procedure can be applied to adaptive cluster sampling adopted for simultaneously estimating the numbers of earners engaged in a number of localized unorganized rural industries. AMS (2000) Subject Classification: 62D05.


Author(s):  
Roberto Benedetti ◽  
Maria Michela Dickson ◽  
Giuseppe Espa ◽  
Francesco Pantalone ◽  
Federica Piersimoni

Abstract Balanced sampling is a random method for sample selection, the use of which is preferable when auxiliary information is available for all units of a population. However, implementing balanced sampling can be a challenging task, due in part to the computational effort required and the need to respect balancing constraints and inclusion probabilities. In the present paper, a new algorithm for selecting balanced samples is proposed. This method is inspired by simulated annealing algorithms, as a balanced sample selection can be interpreted as an optimization problem. A set of simulation experiments and an example using real data show the efficiency and the accuracy of the proposed algorithm.


Symmetry ◽  
2021 ◽  
Vol 13 (4) ◽  
pp. 726
Author(s):  
Lamya A. Baharith ◽  
Wedad H. Aljuhani

This article presents a new method for generating distributions. This method combines two techniques, the transformed-transformer and alpha power transformation approaches, allowing for tremendous flexibility in the resulting distributions. The new approach is applied to introduce the alpha power Weibull–exponential distribution. The density of this distribution can take asymmetric and near-symmetric shapes. Various asymmetric shapes, such as decreasing, increasing, L-shaped, near-symmetrical, and right-skewed shapes, are observed for the related failure rate function, making it more tractable for many modeling applications. Some significant mathematical features of the suggested distribution are determined. Estimates of the unknown parameters of the proposed distribution are obtained using the maximum likelihood method. Furthermore, some numerical studies were carried out in order to evaluate the estimation performance. Three practical datasets are considered to analyze the usefulness and flexibility of the introduced distribution. The proposed alpha power Weibull–exponential distribution can outperform other well-known distributions, showing its great adaptability in the context of real data analysis.


Animals ◽  
2021 ◽  
Vol 11 (5) ◽  
pp. 1445
Author(s):  
Mauro Giammarino ◽  
Silvana Mattiello ◽  
Monica Battini ◽  
Piero Quatto ◽  
Luca Maria Battaglini ◽  
...  

This study focuses on the problem of assessing inter-observer reliability (IOR) in the case of dichotomous categorical animal-based welfare indicators and the presence of two observers. Based on observations obtained from Animal Welfare Indicators (AWIN) project surveys conducted on nine dairy goat farms, and using udder asymmetry as an indicator, we compared the performance of the most popular agreement indexes available in the literature: Scott’s π, Cohen’s k, kPABAK, Holsti’s H, Krippendorff’s α, Hubert’s Γ, Janson and Vegelius’ J, Bangdiwala’s B, Andrés and Marzo’s ∆, and Gwet’s γ(AC1). Confidence intervals were calculated using closed formulas of variance estimates for π, k, kPABAK, H, α, Γ, J, ∆, and γ(AC1), while the bootstrap and exact bootstrap methods were used for all the indexes. All the indexes and closed formulas of variance estimates were calculated using Microsoft Excel. The bootstrap method was performed with R software, while the exact bootstrap method was performed with SAS software. k, π, and α exhibited a paradoxical behavior, showing unacceptably low values even in the presence of very high concordance rates. B and γ(AC1) showed values very close to the concordance rate, independently of its value. Both bootstrap and exact bootstrap methods turned out to be simpler compared to the implementation of closed variance formulas and provided effective confidence intervals for all the considered indexes. The best approach for measuring IOR in these cases is the use of B or γ(AC1), with bootstrap or exact bootstrap methods for confidence interval calculation.
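The paradox described above, where chance-corrected indexes collapse despite a high concordance rate, can be reproduced with a minimal sketch for two observers and a dichotomous indicator. The 2×2 counts below are hypothetical, not the AWIN data, and the function name is an assumption. The formulas are the standard two-category definitions of Cohen's kappa, Scott's π, PABAK, and Gwet's AC1.

```python
def agreement_indexes(a, b, c, d):
    """Agreement indexes for a 2x2 table of two observers.

    a: both observers score 'present'   b: observer 1 only
    c: observer 2 only                  d: both score 'absent'
    """
    n = a + b + c + d
    p_o = (a + d) / n                  # observed concordance rate
    p1_row = (a + b) / n               # observer 1 'present' rate
    p1_col = (a + c) / n               # observer 2 'present' rate
    q = (p1_row + p1_col) / 2.0        # pooled 'present' rate

    # Chance agreement under each index's own model.
    pe_cohen = p1_row * p1_col + (1 - p1_row) * (1 - p1_col)
    pe_scott = q ** 2 + (1 - q) ** 2
    pe_gwet = 2 * q * (1 - q)

    # Note: each ratio is undefined when its chance term equals 1
    # (e.g. all observations in one cell); that case is not handled here.
    return {
        "concordance": p_o,
        "cohen_kappa": (p_o - pe_cohen) / (1 - pe_cohen),
        "scott_pi": (p_o - pe_scott) / (1 - pe_scott),
        "pabak": 2 * p_o - 1,
        "gwet_ac1": (p_o - pe_gwet) / (1 - pe_gwet),
    }


# Hypothetical rare-trait table: 96 of 100 animals scored identically.
result = agreement_indexes(a=1, b=2, c=2, d=95)
```

With these counts the concordance rate is 0.96, yet Cohen's kappa and Scott's π are only about 0.31, while Gwet's AC1 is about 0.96. The gap arises because the chance-agreement term of kappa and π approaches 1 when one category is rare.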


2010 ◽  
Vol 26 (5) ◽  
pp. 605-608 ◽  
Author(s):  
Eric Sanders-Buell ◽  
Meera Bose ◽  
Abdul Nasir ◽  
Catherine S. Todd ◽  
M. Raza Stanekzai ◽  
...  

2017 ◽  
Vol 49 (4) ◽  
pp. 1067-1090 ◽  
Author(s):  
Nicolás García Trillos ◽  
Dejan Slepčev ◽  
James von Brecht

Abstract We investigate the estimation of the perimeter of a set by a graph cut of a random geometric graph. For Ω ⊆ D = (0, 1)^d with d ≥ 2, we are given n random independent and identically distributed points on D whose membership in Ω is known. We consider the sample as a random geometric graph with connection distance ε > 0. We estimate the perimeter of Ω (relative to D) by the appropriately rescaled graph cut between the vertices in Ω and the vertices in D ∖ Ω. We obtain bias and variance estimates on the error, which are optimal in scaling with respect to n and ε. We consider two scaling regimes: the dense (when the average degree of the vertices goes to ∞) and the sparse one (when the degree goes to 0). In the dense regime, there is a crossover in the nature of the approximation at dimension d = 5: we show that in low dimensions d = 2, 3, 4 one can obtain confidence intervals for the approximation error, while in higher dimensions one can obtain only error estimates for testing the hypothesis that the perimeter is less than a given number.

