MISSING VALUE IMPUTATION VIA GRAPH COMPLETION IN QUESTIONNAIRE SCORES FROM PERSONS WITH DEMENTIA

2019 ◽  
Vol 3 (Supplement_1) ◽  
pp. S972-S972
Author(s):  
Chen Kan ◽  
Won Hwa Kim ◽  
Ling Xu ◽  
Noelle L Fields

Abstract Background: Questionnaires are widely used to evaluate cognitive function, depression, and loneliness in persons with dementia (PWDs). Successful assessment and treatment of dementia hinge on effective analysis of PWDs' answers. However, many studies, especially pilot studies, have small sample sizes. Furthermore, most contain missing data because PWDs skip some study sessions due to their clinical condition. Conventional imputation strategies are not well suited, since the limited number of samples introduces bias. Method: A novel machine learning framework based on harmonic analysis on graphs was developed to robustly handle missing values. Participants were first embedded as nodes in a graph, with edges derived from their similarities in demographic information, activities of daily living, etc. Questionnaire scores with missing values were then regarded as a function on the nodes and estimated by spectral analysis of the graph under a smoothness constraint. The proposed approach was evaluated using data from our pilot study of dementia subjects (N=15) with 15% of the data missing. Result: A few complete variables (binary or ordinal) were available for all participants. For each such variable, we randomly removed 5 scores to mimic missing values. With our approach, we recovered the removed values with 90% accuracy on average, and we were also able to impute the actual missing values in the dataset within reasonable ranges. Conclusion: The proposed approach imputes missing values with high accuracy despite the small sample size and can significantly boost the statistical power of small-scale studies with missing data.
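For readers who want to see the idea in code, the following is a minimal sketch of graph-signal imputation with a Laplacian smoothness penalty, in the spirit of the approach described above. It is not the authors' implementation; the similarity matrix W, the penalty weight lam, and the toy scores are illustrative assumptions.

```python
# Minimal sketch (not the authors' code) of graph-signal imputation with a
# Laplacian smoothness penalty. W, lam, and the toy data are assumptions.
import numpy as np

def impute_on_graph(W, scores, lam=1.0):
    """Fill missing entries (NaN) of a score vector defined on graph nodes.

    Solves  min_x  ||M (x - y)||^2 + lam * x^T L x,
    where y is the zero-filled score vector, M selects observed nodes,
    and L is the combinatorial graph Laplacian built from W.
    """
    L = np.diag(W.sum(axis=1)) - W          # graph Laplacian
    observed = ~np.isnan(scores)
    M = np.diag(observed.astype(float))     # selection of observed nodes
    y = np.where(observed, scores, 0.0)
    # Closed-form solution of the regularised least-squares problem
    return np.linalg.solve(M + lam * L, M @ y)

# Toy example: 5 participants, similarity edges from made-up demographics
W = np.array([[0, 1, 1, 0, 0],
              [1, 0, 1, 0, 0],
              [1, 1, 0, 1, 0],
              [0, 0, 1, 0, 1],
              [0, 0, 0, 1, 0]], dtype=float)
scores = np.array([3.0, np.nan, 4.0, np.nan, 2.0])
print(impute_on_graph(W, scores, lam=0.5))
```

The closed-form solve works because adding the Laplacian penalty to the observation mask yields a positive-definite system whenever the graph is connected and at least one score is observed.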

2019 ◽  
Vol 374 (1780) ◽  
pp. 20180076 ◽  
Author(s):  
Monique Borgerhoff Mulder ◽  
Mary C. Towner ◽  
Ryan Baldini ◽  
Bret A. Beheim ◽  
Samuel Bowles ◽  
...  

Persistent interest lies in gender inequality, especially with regard to the favouring of sons over daughters. Economists are concerned with how privilege is transmitted across generations, and anthropologists have long studied sex-biased inheritance norms. There has, however, been no focused cross-cultural investigation of how parent–offspring correlations in wealth vary by offspring sex. We estimate these correlations for 38 wealth measures, including somatic and relational wealth, from 15 populations ranging from hunter–gatherers to small-scale farmers. Although small sample sizes limit our statistical power, we find no evidence of ubiquitous male bias, at least as inferred from comparing parent–son and parent–daughter correlations. Rather, we find wide variation in signatures of sex bias, with evidence of both son- and daughter-biased transmission. Further, we introduce a model that helps pinpoint the conditions under which simple mid-point parent–offspring wealth correlations can reveal information about sex-biased parental investment. Our findings are relevant to the study of female-biased kinship by revealing just how little normative descriptors of kinship systems, such as patrilineal inheritance, capture intergenerational correlations in wealth, and how variable parent–son and parent–daughter correlations can be. This article is part of the theme issue ‘The evolution of female-biased kinship in humans and other mammals’.
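As an illustration of the kind of quantity the paper estimates, the sketch below computes naive sex-stratified mid-point-parent/offspring wealth correlations with a percentile bootstrap. It is not the authors' analysis pipeline; the column names and the made-up data are assumptions purely for illustration.

```python
# Illustrative only: sex-stratified mid-parent/offspring wealth correlation
# with a percentile bootstrap. Column names and data layout are assumptions.
import numpy as np
import pandas as pd

def sex_specific_correlations(df, n_boot=2000, seed=0):
    """Mid-parent/offspring wealth correlation, stratified by offspring sex."""
    rng = np.random.default_rng(seed)
    df = df.assign(midparent=(df["father_wealth"] + df["mother_wealth"]) / 2)
    results = {}
    for sex, grp in df.groupby("offspring_sex"):
        r = np.corrcoef(grp["midparent"], grp["offspring_wealth"])[0, 1]
        boots = []
        for _ in range(n_boot):
            idx = rng.integers(0, len(grp), len(grp))   # resample with replacement
            g = grp.iloc[idx]
            boots.append(np.corrcoef(g["midparent"], g["offspring_wealth"])[0, 1])
        results[sex] = (r, tuple(np.percentile(boots, [2.5, 97.5])))
    return results

# Made-up example data for illustration only
rng = np.random.default_rng(1)
n = 60
demo = pd.DataFrame({
    "father_wealth": rng.lognormal(1, 0.5, n),
    "mother_wealth": rng.lognormal(1, 0.5, n),
    "offspring_sex": rng.choice(["son", "daughter"], n),
})
demo["offspring_wealth"] = 0.5 * (demo["father_wealth"] + demo["mother_wealth"]) \
    + rng.normal(0, 0.5, n)
print(sex_specific_correlations(demo))
```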


Marketing ZFP ◽  
2019 ◽  
Vol 41 (4) ◽  
pp. 21-32
Author(s):  
Dirk Temme ◽  
Sarah Jensen

Missing values are ubiquitous in empirical marketing research. If missing data are not handled properly, the result can be a loss of statistical power and distorted parameter estimates. While traditional approaches to handling missing data (e.g., listwise deletion) are still widely used, researchers can nowadays choose among advanced techniques such as multiple imputation or full-information maximum likelihood estimation. Thanks to the available software, applying these modern missing-data methods poses no major obstacle; still, their application requires a sound understanding of their prerequisites and limitations, as well as a deeper understanding of the processes that led to the missing values in an empirical study. This article is Part 1: it first introduces Rubin's classical definition of missing-data mechanisms and an alternative, variable-based taxonomy that provides a graphical representation, and then presents a selection of visualization tools available in different R packages for describing and exploring missing-data structures.
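To make Rubin's taxonomy concrete, the toy simulation below generates data whose missingness is MCAR, MAR, and MNAR, respectively, and shows how the observed mean is distorted mainly when missingness depends on the unobserved value itself. The example is in Python rather than the R packages the article discusses, and all parameters are illustrative assumptions.

```python
# Toy simulation of Rubin's three missing-data mechanisms. All numbers are
# illustrative assumptions, not from the article.
import numpy as np

rng = np.random.default_rng(42)
n = 1000
income = rng.normal(50, 10, n)        # variable that may go missing
age = rng.normal(40, 12, n)           # fully observed covariate

# MCAR: missingness is independent of everything
mcar_mask = rng.random(n) < 0.2

# MAR: missingness depends only on the observed covariate (age)
p_mar = 1 / (1 + np.exp(-(age - 40) / 5))
mar_mask = rng.random(n) < 0.3 * p_mar

# MNAR: missingness depends on the (unobserved) value of income itself
p_mnar = 1 / (1 + np.exp(-(income - 55) / 5))
mnar_mask = rng.random(n) < 0.5 * p_mnar

for name, mask in [("MCAR", mcar_mask), ("MAR", mar_mask), ("MNAR", mnar_mask)]:
    print(f"{name}: {mask.mean():.0%} missing, "
          f"observed mean = {income[~mask].mean():.1f} (true mean ~50)")
```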


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Florent Le Borgne ◽  
Arthur Chatton ◽  
Maxime Léger ◽  
Rémi Lenain ◽  
Yohann Foucher

Abstract In clinical research, there is growing interest in propensity score-based methods for estimating causal effects. G-computation is an alternative with high statistical power, and machine learning is increasingly used for its possible robustness to model misspecification. In this paper, we propose an approach that combines machine learning and G-computation when both the outcome and the exposure status are binary, and that can handle small samples. Through simulations, we evaluated the performance of several methods, including penalized logistic regressions, a neural network, a support vector machine, boosted classification and regression trees, and a super learner. We considered six scenarios characterised by various sample sizes, numbers of covariates, and relationships between covariates, exposure status, and outcome. We also illustrate the application of these methods by estimating the efficacy of barbiturates prescribed during the first 24 h of an episode of intracranial hypertension. For estimating the individual outcome probabilities in the two counterfactual worlds required by G-computation, the super learner tended to outperform the other approaches in terms of both bias and variance, especially for small sample sizes. The support vector machine also performed well, but its mean bias was slightly higher than that of the super learner. In the investigated scenarios, G-computation combined with the super learner was a reliable method for drawing causal inferences, even from small samples.
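The core G-computation step described above can be sketched compactly. In the example below, a plain logistic regression stands in for the super learner discussed in the paper, and the simulated data, coefficients, and sample size are illustrative assumptions.

```python
# Minimal sketch of G-computation for a binary exposure and outcome. A plain
# logistic regression stands in for the super learner; data are simulated.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n = 200                                    # deliberately small sample
X = rng.normal(size=(n, 3))                # baseline covariates
A = rng.binomial(1, 1 / (1 + np.exp(-X[:, 0])))            # binary exposure
logit_y = -0.5 + 0.8 * A + 0.6 * X[:, 0] - 0.4 * X[:, 1]
Y = rng.binomial(1, 1 / (1 + np.exp(-logit_y)))            # binary outcome

# 1) Fit an outcome model Q(A, X) = P(Y=1 | A, X)
q_model = LogisticRegression(max_iter=1000).fit(np.column_stack([A, X]), Y)

# 2) Predict the two counterfactual outcome probabilities for every subject
p1 = q_model.predict_proba(np.column_stack([np.ones(n), X]))[:, 1]   # set A=1
p0 = q_model.predict_proba(np.column_stack([np.zeros(n), X]))[:, 1]  # set A=0

# 3) Average over the sample to obtain marginal causal contrasts
risk_diff = p1.mean() - p0.mean()
odds_ratio = (p1.mean() / (1 - p1.mean())) / (p0.mean() / (1 - p0.mean()))
print(f"risk difference = {risk_diff:.3f}, odds ratio of marginal risks = {odds_ratio:.2f}")
```

Swapping the logistic regression for a cross-validated ensemble (a super learner) changes only step 1; steps 2 and 3 are the G-computation itself.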


1980 ◽  
Vol 95 (1) ◽  
pp. 29-34 ◽  
Author(s):  
J. A. Blackman ◽  
A. A. Gill

Summary Twenty-five winter wheat varieties and breeders' lines, including hard- and soft-textured types of good or poor bread- and biscuit-making quality, were grown at two locations in the U.K. in 1977 to provide the test samples. Small-scale tests of bread-making quality, including extensometer, sodium dodecyl sulphate (SDS) sedimentation volume, residue protein, urea-dispersible protein and Pelshenke tests, were compared with loaf volumes and loaf scores. Averaged over the two sites, a modified extensometer test and the SDS test gave the closest correlation with loaf volume and loaf score and were only poorly correlated with Hagberg Falling Number and percentage protein. The SDS test gave the closest correlation between sites, followed by the extensometer readings; loaf volume and score had much lower values. The SDS values and extensometer readings give a better measure of the genetic differences in protein quality of varieties than loaf volume and score, being less affected by growing conditions. With its small sample requirement and high throughput, the SDS sedimentation test is likely to be the most useful screening test for wheat breeding programmes.


1990 ◽  
Vol 47 (1) ◽  
pp. 2-15 ◽  
Author(s):  
Randall M. Peterman

Ninety-eight percent of recently surveyed papers in fisheries and aquatic sciences that did not reject some null hypothesis (H0) failed to report β, the probability of making a type II error (not rejecting H0 when it should have been rejected), or statistical power (1 – β). However, 52% of those papers drew conclusions as if H0 were true. A false H0 could have been missed because of a low-power experiment, caused by small sample size or large sampling variability. The costs of type II errors can be large (for example, failing to detect harmful effects of an industrial effluent or a significant effect of fishing on stock depletion). Past statistical power analyses show that abundance estimation techniques usually have high β and that only large effects are detectable. I review the relationships among β, power, detectable effect size, sample size, and sampling variability. I show how statistical power analysis can help interpret past results and improve the design of future experiments, impact assessments, and management regulations. I make recommendations for researchers and decision makers, including routine application of power analysis, more cautious management, and reversal of the burden of proof to place it on industry rather than management agencies.
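The relationships among power, effect size, and sample size that the review discusses can be illustrated with standard power routines. The sketch below uses statsmodels' TTestIndPower for a two-sample t-test; the effect sizes, sample sizes, and α level are arbitrary assumptions, not values from the paper.

```python
# Illustration (not from the paper) of how power (1 - beta) of a two-sample
# t-test varies with effect size and sample size. All values are assumptions.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
for effect_size in (0.2, 0.5, 0.8):          # Cohen's d
    for n_per_group in (10, 30, 100):
        power = analysis.power(effect_size=effect_size, nobs1=n_per_group,
                               alpha=0.05, ratio=1.0, alternative="two-sided")
        print(f"d={effect_size}, n={n_per_group:>3}: power = {power:.2f}")

# Sample size needed per group to detect d = 0.5 with 80% power at alpha = 0.05
n_needed = analysis.solve_power(effect_size=0.5, power=0.8, alpha=0.05)
print(f"n per group for d=0.5, power=0.8: {n_needed:.0f}")
```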


PEDIATRICS ◽  
1993 ◽  
Vol 92 (2) ◽  
pp. 300-301
Author(s):  
DOREN FREDRICKSON

To the Editor.— I wish to comment on the study reported by Cronenwett et al,1 a fascinating prospective study among married white women who planned to breast-feed. Women were randomly assigned to either exclusive breast-feeding or partial breast-feeding with bottled human milk supplements, to determine the impact of infant temperament and limited bottle-feeding on breast-feeding duration. The authors acknowledge that the small sample size and lack of statistical power make a false-negative result possible.


2014 ◽  
Vol 26 (2) ◽  
pp. 598-614 ◽  
Author(s):  
Julia Poirier ◽  
GY Zou ◽  
John Koval

Cluster randomization trials, in which intact social units are randomized to different interventions, have become popular in the last 25 years. Outcomes from these trials are in many cases positively skewed, following approximately lognormal distributions. When inference focuses on the difference between treatment-arm arithmetic means, existing confidence interval procedures either make restrictive assumptions or are complex to implement. We approach this problem by assuming that log-transformed outcomes from each treatment arm follow a one-way random effects model. The treatment-arm means are functions of multiple parameters for which separate confidence intervals are readily available, suggesting that the method of variance estimates recovery (MOVER) may be applied to obtain closed-form confidence intervals. A simulation study showed that this simple approach performs well at small sample sizes in terms of empirical coverage, relatively balanced tail errors, and interval widths compared with existing methods. The methods are illustrated using data from a cluster randomization trial investigating a critical pathway for the treatment of community-acquired pneumonia.
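The MOVER idea can be sketched in a simplified form. Unlike the paper's method, the example below ignores the cluster-level random effect and treats observations within each arm as independent lognormal draws; the simulated data and confidence level are illustrative assumptions.

```python
# Simplified sketch of MOVER for the difference of two lognormal arm means.
# It ignores the cluster random effect handled in the paper; data are made up.
import numpy as np
from scipy import stats

def lognormal_mean_ci(x, alpha=0.05):
    """MOVER CI for the lognormal mean exp(mu + sigma^2/2) from data x > 0."""
    logx = np.log(x)
    n, m, s2 = len(logx), logx.mean(), logx.var(ddof=1)
    t = stats.t.ppf(1 - alpha / 2, n - 1)
    l_mu, u_mu = m - t * np.sqrt(s2 / n), m + t * np.sqrt(s2 / n)
    l_s = (n - 1) * s2 / stats.chi2.ppf(1 - alpha / 2, n - 1) / 2   # CI for sigma^2/2
    u_s = (n - 1) * s2 / stats.chi2.ppf(alpha / 2, n - 1) / 2
    est = m + s2 / 2
    lo = est - np.sqrt((m - l_mu) ** 2 + (s2 / 2 - l_s) ** 2)
    hi = est + np.sqrt((u_mu - m) ** 2 + (u_s - s2 / 2) ** 2)
    return np.exp(est), np.exp(lo), np.exp(hi)

def mover_difference(x1, x2, alpha=0.05):
    """MOVER CI for the difference of two lognormal arm means."""
    t1, l1, u1 = lognormal_mean_ci(x1, alpha)
    t2, l2, u2 = lognormal_mean_ci(x2, alpha)
    d = t1 - t2
    lo = d - np.sqrt((t1 - l1) ** 2 + (u2 - t2) ** 2)
    hi = d + np.sqrt((u1 - t1) ** 2 + (t2 - l2) ** 2)
    return d, lo, hi

rng = np.random.default_rng(7)
arm_a = rng.lognormal(mean=1.0, sigma=0.8, size=25)
arm_b = rng.lognormal(mean=0.7, sigma=0.8, size=25)
print(mover_difference(arm_a, arm_b))
```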


Author(s):  
Marina Jankovic ◽  
Marija Milicic ◽  
Dimitrije Radisic ◽  
Dubravka Milic ◽  
Ante Vujic

With environmental pressures on the rise, the establishment of protected areas is a key strategy for preserving biodiversity. The fact that many species are losing their battle against extinction despite being within protected areas raises the question of their effectiveness. The aim of this study was to evaluate established Priority Hoverfly Areas (PHAs) and areas that are not yet, but could potentially be, included in the PHA network, using data from new field surveys. Additionally, species distribution models were created for two new species recognized as important and added to the list of key hoverfly species. Maps of the potential distribution of these species were superimposed on maps of protected areas and PHAs to quantify the percentage of overlap. The results of this study are not statistically significant, which may reflect the small sample size. However, the results of the species distribution models and the extent of overlap with PHAs confirm the utility of these expert-generated designations.
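As an illustration of the overlap quantification described above (not the authors' GIS workflow), the sketch below thresholds a hypothetical suitability raster and computes the share of predicted presence cells falling inside a rasterised PHA mask; the arrays and the threshold are assumptions.

```python
# Illustrative sketch: overlap between a thresholded species-distribution-model
# raster and a protected-area / PHA mask on a common grid. All values are made up.
import numpy as np

rng = np.random.default_rng(3)
suitability = rng.random((100, 100))          # hypothetical SDM output in [0, 1]
predicted_presence = suitability > 0.7        # threshold the model output
pha_mask = np.zeros((100, 100), dtype=bool)   # hypothetical PHA polygon, rasterised
pha_mask[20:60, 30:80] = True

overlap = (predicted_presence & pha_mask).sum() / predicted_presence.sum()
print(f"{overlap:.1%} of the predicted distribution falls inside PHAs")
```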


2020 ◽  
Author(s):  
Chia-Lung Shih ◽  
Te-Yu Hung

Abstract Background: A small sample size (n < 30 per treatment group) is usually enrolled to investigate differences in efficacy between treatments for knee osteoarthritis (OA). The objective of this study was to use simulation to compare the power of four statistical methods for analysing small samples when detecting differences in efficacy between two treatments for knee OA. Methods: A total of 10,000 replicates of 5 sample sizes (n=10, 15, 20, 25, and 30 per group) were generated based on previously reported measures of treatment efficacy. Four statistical methods were used to compare the differences in efficacy between treatments: the two-sample t-test (t-test), the Mann-Whitney U-test (M-W test), the Kolmogorov-Smirnov test (K-S test), and the permutation test (perm-test). Results: The bias of the simulated parameter means decreased with sample size, but the CV% of the simulated parameter means varied with sample size for all parameters. For the largest sample size (n=30), the CV% reached a small level (<20%) for almost all parameters, but the bias did not. Among the non-parametric tests for small samples, the perm-test had the highest statistical power, and its false-positive rate was not affected by sample size. However, the power of the perm-test did not reach a high value (80%) even at the largest sample size (n=30). Conclusion: The perm-test is recommended for analysing small samples when comparing the differences in efficacy between two treatments for knee OA.
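The simulation logic described in the Methods can be sketched as follows. The example estimates the power of a two-sided permutation test for a difference in means on small samples; the effect size, outcome distribution, and replicate counts are illustrative assumptions rather than the study's settings.

```python
# Compact sketch of a power simulation for a permutation test on small samples.
# Effect size, distribution, and replicate counts are illustrative assumptions.
import numpy as np

def perm_test_pvalue(x, y, rng, n_perm=500):
    """Two-sided permutation test for a difference in means."""
    observed = abs(x.mean() - y.mean())
    pooled = np.concatenate([x, y])
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(pooled[:len(x)].mean() - pooled[len(x):].mean())
        count += diff >= observed
    return (count + 1) / (n_perm + 1)

def estimate_power(n_per_group, effect=1.0, sd=1.5, n_rep=300, alpha=0.05):
    """Fraction of simulated trials in which the permutation test rejects H0."""
    rng = np.random.default_rng(0)
    rejections = 0
    for _ in range(n_rep):
        x = rng.normal(0.0, sd, n_per_group)
        y = rng.normal(effect, sd, n_per_group)
        rejections += perm_test_pvalue(x, y, rng) < alpha
    return rejections / n_rep

for n in (10, 15, 20, 25, 30):
    print(f"n={n}: estimated power = {estimate_power(n):.2f}")
```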


2021 ◽  
pp. bjophthalmol-2021-319067
Author(s):  
Felix Friedrich Reichel ◽  
Stylianos Michalakis ◽  
Barbara Wilhelm ◽  
Ditta Zobor ◽  
Regine Muehlfriedel ◽  
...  

Aims: To determine long-term safety and efficacy outcomes of a subretinal gene therapy for CNGA3-associated achromatopsia. We present data from an open-label, non-randomised controlled trial (NCT02610582). Methods: Details of the study design have been previously described. Briefly, nine patients were treated in three escalating dose groups with subretinal AAV8.CNGA3 gene therapy between November 2015 and October 2016. After the first year, patients were seen on a yearly basis. Safety assessment constituted the primary endpoint; on a secondary level, multiple functional tests were carried out to determine the efficacy of the therapy. Results: No adverse or serious adverse events deemed related to the study drug occurred after year 1. The safety of the therapy, the primary endpoint of this trial, can therefore be confirmed. The functional benefits noted in the treated eye at year 1 persisted throughout the following visits at years 2 and 3. While functional improvement in the treated eye reached statistical significance for some secondary endpoints, for most endpoints this was not the case when the treated eye was compared with the untreated fellow eye. Conclusion: The results demonstrate a very good safety profile of the therapy even at the highest dose administered. The small sample size limits the statistical power of the efficacy analyses; however, the trial results inform the most promising design and endpoints for future clinical trials. Such trials will have to determine whether treating younger patients results in greater functional gains by avoiding amblyopia as a potential limiting factor.

