Unequal Sampling
Recently Published Documents

Total documents: 13 (last five years: 4)
H-index: 4 (last five years: 0)

Author(s): D. Orynbassar, N. Madani

This work addresses the problem of geostatistical simulation of cross-correlated variables by factorization approaches when the sampling pattern is unequal. A solution is presented, based on a Co-Gibbs sampler algorithm, by which the missing values can be imputed. In this algorithm, a heterotopic simple cokriging approach is introduced to take into account the cross-dependency of the undersampled variable with the secondary variable, which is more widely available over the region. A real gold deposit is used to test the algorithm. The imputation results are compared with those of other Gibbs sampler techniques based on simple cokriging and simple kriging; heterotopic simple cokriging outperforms both. The imputed values are then employed for resource estimation using principal component analysis (PCA) as a factorization technique, and the output is compared with traditional factorization approaches in which the heterotopic part of the data is removed. The comparison shows that removing the heterotopic data leads to substantial losses of important information under an unequal sampling pattern, while the imputation-based approach reproduces the recovery functions better.
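As a rough illustration of the imputation idea, the sketch below strips the algorithm to its core: standardized Gaussian variables and only the collocated secondary datum in the cokriging system (the paper's heterotopic simple cokriging also conditions on spatial neighbors, which is what makes iterating the sweeps meaningful). All data and parameters here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
rho = 0.7  # assumed cross-correlation between the two standardized variables

# Hypothetical data: Z2 (secondary) is sampled everywhere, Z1 is undersampled.
n = 200
z2 = rng.standard_normal(n)
z1_true = rho * z2 + np.sqrt(1 - rho**2) * rng.standard_normal(n)
missing = rng.random(n) < 0.6                    # 60% of Z1 values unsampled

# Co-Gibbs-style sweeps: redraw each missing Z1 from its conditional
# distribution given the collocated Z2 value. With only the collocated
# secondary datum, simple cokriging reduces to
#   E[Z1 | Z2] = rho * Z2,   Var[Z1 | Z2] = 1 - rho**2.
z1 = np.where(missing, 0.0, z1_true)             # start missing values at the mean
for _ in range(100):
    z1[missing] = (rho * z2[missing]
                   + np.sqrt(1 - rho**2) * rng.standard_normal(missing.sum()))

print("corr(imputed, true) at unsampled nodes:",
      np.corrcoef(z1[missing], z1_true[missing])[0, 1])
```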


2021
Author(s): Melissa Middleton, Cattram Nguyen, Margarita Moreno-Betancur, John B Carlin, Katherine J Lee

Abstract

Background: In case-cohort studies, a random subcohort is selected from the inception cohort and acts as the sample of controls for several outcome investigations. Analysis is conducted using only the cases and the subcohort, with inverse probability weighting (IPW) used to account for the unequal sampling probabilities resulting from the study design. Like all epidemiological studies, case-cohort studies are susceptible to missing data. Multiple imputation (MI) has become increasingly popular for addressing missing data in epidemiological studies, but it is currently unclear how best to incorporate the weights from a case-cohort analysis into the MI procedures used to address missing covariate data.

Methods: A simulation study was conducted with missingness in two covariates, motivated by a case study within the Barwon Infant Study. The MI methods considered were: using the outcome (a proxy for the weights in the simple case-cohort design considered) as a predictor in the imputation model, with and without exposure and covariate interactions; imputing separately within each weight category; and using a weighted imputation model. These methods were compared with a complete case analysis (CCA) within the context of a standard IPW analysis model estimating either the risk or the odds ratio. The strength of associations, the missing data mechanism, the proportion of observations with incomplete covariate data, and the subcohort selection probability were varied across the simulation scenarios. The methods were also applied to the case study.

Results: All MI methods performed similarly in terms of relative bias and precision across the scenarios considered, with the expected improvements over the CCA. Slight underestimation of the standard error was seen throughout, but the nominal level of coverage (95%) was generally achieved. All MI methods showed a similar increase in precision as the subcohort selection probability increased, irrespective of the scenario. A similar pattern of results was seen in the case study.

Conclusions: How the weights were incorporated into the imputation model had minimal effect on the performance of MI; this may be because case-cohort studies have only two weight categories. In this context, inclusion of the outcome in the imputation model was sufficient to account for the unequal sampling probabilities in the analysis model.
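As a concrete illustration of the approach that proved sufficient (include the outcome, the proxy for the weights, as a predictor in the imputation model, then fit the usual IPW analysis), here is a hedged sketch on synthetic data. Variable names and the single stochastic imputation are simplifications of my own; a real MI analysis would draw several imputations and pool the estimates with Rubin's rules.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 5000
x = rng.standard_normal(n)                        # exposure (complete)
c = 0.5 * x + rng.standard_normal(n)              # covariate, later made missing
y = rng.binomial(1, 1 / (1 + np.exp(-(-2 + 0.5 * x + 0.5 * c))))

# Case-cohort design: all cases plus a random subcohort of the full cohort.
p_sub = 0.2
in_sub = rng.random(n) < p_sub
sampled = (y == 1) | in_sub
w = np.where(y == 1, 1.0, 1 / p_sub)              # IPW weights

# Impose missingness in c among sampled records, then impute from a model
# that includes the outcome y (standing in for the weights) as a predictor.
miss = sampled & (rng.random(n) < 0.3)
fit_idx = sampled & ~miss
X_imp = sm.add_constant(np.column_stack([x[fit_idx], y[fit_idx]]))
imp_model = sm.OLS(c[fit_idx], X_imp).fit()
X_mis = sm.add_constant(np.column_stack([x[miss], y[miss]]))
c_imp = c.copy()
c_imp[miss] = (imp_model.predict(X_mis)
               + rng.standard_normal(miss.sum()) * np.sqrt(imp_model.scale))

# IPW logistic analysis on the case-cohort sample with the imputed covariate.
X_an = sm.add_constant(np.column_stack([x[sampled], c_imp[sampled]]))
res = sm.GLM(y[sampled], X_an, family=sm.families.Binomial(),
             freq_weights=w[sampled]).fit()
print(res.params)   # log-odds ratios for x and c; the true values are 0.5
```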


2021, Vol 50 (Supplement_1)
Author(s): Melissa Middleton, Margarita Moreno-Betancur, John Carlin, Katherine J Lee

Abstract

Background: Multiple imputation (MI) is commonly used to address missing data in epidemiological studies, but valid use requires compatibility between the imputation and analysis models. Case-cohort studies use unequal sampling probabilities for cases and controls, which are often accounted for during analysis through inverse probability weighting (IPW). It is unclear how to apply MI for missing covariates while achieving compatibility in this setting.

Methods: A simulation study was conducted with missingness in two covariates, motivated by a case-cohort investigation within the Barwon Infant Study. The MI methods considered involved including interactions between the outcome (as a proxy for the weights) and the analysis variables, stratification by the weights, and ignoring the weights, within the context of an IPW analysis. Factors such as the target estimand, the proportion of incomplete observations, the missing data mechanism, and the subcohort selection probabilities were varied to assess the performance of the MI methods.

Results: Performance was similar in terms of bias and efficiency across the MI methods, with the expected improvements compared with IPW applied to the complete cases. Precision tended to decrease as the subcohort selection probability decreased. Similar results were observed irrespective of the proportion of incomplete cases.

Conclusions: Our results suggest that it makes little difference how the weights are incorporated in the MI model in the analysis of case-cohort studies, potentially because there are only two weight classes in this setting.

Key messages: If and how the weights are incorporated in the imputation model may have little impact in the analysis of case-cohort studies with incomplete covariates.
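The conjectured explanation, that a simple case-cohort design induces only two non-zero weight classes, is easy to make concrete. A minimal sketch of the weight construction, with the subcohort selection probability as an assumed input (the function name and data are hypothetical):

```python
import numpy as np

def case_cohort_weights(y, in_subcohort, p_subcohort):
    """IPW weights for a simple case-cohort analysis.

    Cases get weight 1 (all are sampled); non-case subcohort members get
    weight 1/p_subcohort; everyone else is unsampled (weight 0). With only
    two non-zero weight classes, 'stratify by the weights' collapses to
    'condition on the outcome'.
    """
    y = np.asarray(y, dtype=bool)
    in_subcohort = np.asarray(in_subcohort, dtype=bool)
    w = np.zeros(len(y))
    w[y] = 1.0
    w[~y & in_subcohort] = 1.0 / p_subcohort
    return w

print(case_cohort_weights([1, 0, 0, 0], [0, 1, 1, 0], 0.25))  # [1. 4. 4. 0.]
```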


Author(s): Thiago R D Carvalho, Leandro J C L Moraes, Albertina P Lima, Antoine Fouquet, Pedro L V Peloso, ...

Abstract

A large proportion of the biodiversity of Amazonia, one of the most diverse rainforest regions in the world, is yet to be formally described. One such case is the Neotropical frog genus Adenomera. Here we evaluate the species richness and historical biogeography of the Adenomera heyeri clade by integrating molecular phylogenetic and species delimitation analyses with morphological and acoustic data. Our results uncovered ten new candidate species with interfluve-associated distributions across Amazonia, six of which are formally named and described in this study. The new species partly correspond to the previously identified candidate lineages ‘sp. F’ and ‘sp. G’ and also to previously unreported lineages. Because of the rarity of these species and the unequal sampling effort across the range of the A. heyeri clade in Amazonia, conservation assessments for the six newly described species are still premature. Regarding the biogeography of the A. heyeri clade, our data support a northern Amazonian origin with two independent dispersals into the South American Dry Diagonal. Although riverine barriers play a relevant role as environmental filters by isolating lineages in interfluves, dispersal rather than vicariance must have played a central role in the diversification of this frog clade.


2013, Vol 12 (08), pp. 1341014
Author(s): Robert J. Petrella

Physics-based computational approaches to predicting the structure of macromolecules such as proteins are gaining increased use, but challenges remain. In the current work, it is demonstrated that in energy-based prediction methods, the degree of optimization of the sampled structures can influence the prediction results. In particular, discrepancies in the degree of local sampling can bias the predictions in favor of the oversampled structures by shifting the local probability distributions of the minimum sampled energies. In simple systems, it is shown that the magnitude of the errors can be calculated from the energy surface and, for certain model systems, derived analytically. Further, it is shown that for energy wells whose forms differ only by a randomly assigned energy shift, the optimal accuracy of prediction is achieved when the sampling around each structure is equal. Energy correction terms can be used in cases of unequal sampling to reproduce the total probabilities that would occur under equal sampling, but optimal corrections only partially restore the prediction accuracy lost to unequal sampling. For multiwell systems, the determination of the correction terms is a multibody problem; it is shown that the cross-correlation multiple integrals involved can be reduced to simpler integrals. The possible implications of this analysis for macromolecular structure prediction are discussed.
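The sampling-bias effect is easy to demonstrate numerically: two energy wells whose sampled-energy distributions are identical up to a random shift, one sampled ten times more densely than the other. A minimal sketch under assumed Gaussian energy distributions (these choices are illustrative, not the paper's model systems):

```python
import numpy as np

rng = np.random.default_rng(2)
trials = 20000
k_a, k_b = 100, 10          # well A is locally oversampled 10:1
wins_a = 0
for _ in range(trials):
    shift_a, shift_b = rng.standard_normal(2)   # random energy offsets per well
    e_a = shift_a + rng.standard_normal(k_a)    # sampled energies in well A
    e_b = shift_b + rng.standard_normal(k_b)    # sampled energies in well B
    if e_a.min() < e_b.min():                   # predict the lowest sampled energy
        wins_a += 1

# With symmetric shifts an unbiased method would pick A half the time;
# oversampling A drags its minimum sampled energy down and inflates P(A).
print("P(A predicted) =", wins_a / trials)
```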


The proposed IT acquisition model builds on the predictive behavior of Tier-I influencers and suggests that Tier-II influencers need to contribute collectively to attain organizational synergy. The most critical aspect of this collectiveness is the heterogeneous organizational behavior across the hierarchy: the strategic, tactical, and operational layers of an organization have different tasks, motivations, roles, and responsibilities. This heterogeneity must nevertheless be given a collective orientation if the IT acquisition model is to succeed holistically. The model therefore considers it important to identify the controlling agency in the hierarchy so that the controlled elements contribute effectively to the IT acquisition process. Identifying the “controlling” and “controlled” elements, and assessing the collective contributions of users, information systems, and information technologies in the IT acquisition process, requires in-depth study through an appropriate stratified, unequal-probability sampling plan. This chapter discusses the validation of Tier-II influencers with quantitative methods.


2003, Vol 02 (02), pp. 299-311
Author(s): Timothy H. Lee, Ming Zhang

A credit scoring model is a statistical model that uses empirical data to predict the creditworthiness of credit applicants. A simple but very powerful approach to developing a credit scoring model is logistic regression. Because of heterogeneity in the population, segmentation into reasonably homogeneous subpopulations is desirable to enhance model performance. However, one often needs to use unequal sampling ratios across the segments to extract the development sample. The resulting models are therefore unevenly biased and need to be adjusted to make score comparisons across different segments meaningful. In this paper, we focus on detecting this uneven bias and correcting for it in segmented scoring models. A statistical test based on large-sample theory is proposed for detecting the uneven bias, along with its mathematical derivation and simulation results. When uneven bias across segments has been detected, a formula to alleviate its effects is suggested, along with a heuristic derivation.
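For context, the classical intercept correction for unequal sampling in logistic regression is sketched below: under outcome-dependent sampling the slope coefficients remain consistent, but the intercept absorbs the log of the sampling-rate ratio. This is the standard prior-correction result, not the specific test or formula proposed in the paper; the function name and the numbers are illustrative.

```python
import numpy as np

def align_segment_score(intercept, r_bad, r_good):
    """Remove the intercept bias induced by unequal sampling ratios.

    If bads are drawn into the development sample at rate r_bad and goods
    at rate r_good, the fitted logistic intercept is shifted by
    log(r_bad / r_good) relative to the population intercept. Subtracting
    that offset makes log-odds scores comparable across segments that were
    sampled at different ratios. (Classical prior-correction result.)
    """
    return intercept - np.log(r_bad / r_good)

# Segment 1 sampled all bads and 10% of goods; segment 2 sampled 50% / 25%.
b0_seg1 = align_segment_score(-1.2, r_bad=1.0, r_good=0.10)
b0_seg2 = align_segment_score(-2.0, r_bad=0.50, r_good=0.25)
print(b0_seg1, b0_seg2)   # intercepts on a common population scale
```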


2002, Vol 39 (3), pp. 261-270
Author(s): James G. Booth, Brian S. Caffo

Genetics, 2002, Vol 160 (3), pp. 1179-1189
Author(s): Molly Przeworski

Abstract

In Drosophila and humans, there are accumulating examples of loci with a significant excess of high-frequency derived alleles or high levels of linkage disequilibrium, relative to a neutral model of a random-mating population of constant size. These are features expected after a recent selective sweep, and their prevalence suggests that positive directional selection may be widespread in both species. However, as I show here, these features do not persist long after the sweep ends: the high-frequency alleles drift to fixation and no longer contribute to polymorphism, while linkage disequilibrium is broken down by recombination. As a result, loci chosen without independent evidence of recent selection are not expected to exhibit either of these features, even if they have been affected by numerous sweeps in their genealogical history. How then can we explain the patterns in the data? One possibility is population structure, with unequal sampling from different subpopulations. Alternatively, positive selection may not operate as is commonly modeled; in particular, the rate of fixation of advantageous mutations may have increased in the recent past.


1994, Vol 31 (4), pp. 940-948
Author(s): Chris A. J. Klaassen

At which (random) sample size will every population element have been drawn at least m times? This special coupon collector's problem is often referred to as the Dixie cup problem. Some asymptotic properties of the Dixie cup problem with unequal sampling probabilities are described.
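A quick Monte Carlo sketch of the quantity studied, the random sample size at which every element has been drawn at least m times, under hypothetical unequal drawing probabilities:

```python
import numpy as np

def dixie_cup_sample_size(p, m, rng):
    """Draw from categories with probabilities p until every category has
    been drawn at least m times; return the (random) sample size."""
    counts = np.zeros(len(p), dtype=int)
    n = 0
    while counts.min() < m:
        counts[rng.choice(len(p), p=p)] += 1
        n += 1
    return n

rng = np.random.default_rng(3)
p = np.array([0.4, 0.3, 0.2, 0.1])   # hypothetical unequal probabilities
sizes = [dixie_cup_sample_size(p, m=3, rng=rng) for _ in range(2000)]
print("mean sample size:", np.mean(sizes))  # dominated by the rarest category
```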

