sparse data bias
Recently Published Documents



Author(s):  
Mohammad Hossein Panahi ◽  
Kazem Mohammad ◽  
Razieh Bidhendi Yarandi ◽  
Fahimeh Ramezani Tehrani

This study illustrates the problem of (quasi-)complete separation in the sparse data patterns that occur in medical data. We present the failure of traditional methods and then give an overview of popular remedial approaches that reduce bias, using worked examples. Penalized maximum likelihood estimation and Bayesian methods are among the remedial tools introduced. Data from the Tehran Thyroid and Pregnancy Study, a two-phase cohort study conducted from September 2013 through February 2016, were used for illustration. The bias reduction in the estimates showed how well these methods perform compared with the traditional method. Extremely large measures of association, such as risk ratios, together with extraordinarily wide confidence intervals, showed that traditional estimation methods are futile for sparse data, although they are still widely applied and reported. In this review paper, we introduce some advanced methods, such as data augmentation, to provide unbiased estimates.
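The separation problem described in this abstract can be seen in a single 2×2 table. What follows is a minimal sketch, not the analysis used in the study: the counts are hypothetical, the function names are illustrative, and the add-0.5 (Haldane-Anscombe) correction stands in as the simplest form of data augmentation.

```python
import math


def odds_ratio(a, b, c, d):
    """Maximum likelihood odds ratio for a 2x2 table.

    a, b = exposed cases, exposed non-cases
    c, d = unexposed cases, unexposed non-cases
    """
    if b * c == 0:
        # Empty cell in the denominator: the estimate diverges
        # (or is undefined when the numerator is also zero).
        return math.inf if a * d > 0 else math.nan
    return (a * d) / (b * c)


def augmented_odds_ratio(a, b, c, d, k=0.5):
    """Odds ratio after adding k pseudo-observations to every cell."""
    return ((a + k) * (d + k)) / ((b + k) * (c + k))


def augmented_wald_ci(a, b, c, d, k=0.5, z=1.96):
    """Approximate 95% Wald interval for the augmented odds ratio."""
    log_or = math.log(augmented_odds_ratio(a, b, c, d, k))
    se = math.sqrt(sum(1 / (x + k) for x in (a, b, c, d)))
    return math.exp(log_or - z * se), math.exp(log_or + z * se)


# Hypothetical sparse table with no unexposed cases
# (quasi-complete separation for a single binary exposure).
table = (6, 14, 0, 20)

print("ML odds ratio:       ", odds_ratio(*table))
print("Augmented odds ratio:", augmented_odds_ratio(*table))
print("Augmented 95% CI:    ", augmented_wald_ci(*table))
```

The unadjusted estimate is infinite, while the augmented estimate is finite but still has a wide interval, which reflects how little information the sparse table carries.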


Author(s):  
David B Richardson ◽  
Stephen R Cole ◽  
Rachael K Ross ◽  
Charles Poole ◽  
Haitao Chu ◽  
...  

Meta-analyses are undertaken to combine information from a set of studies, often in settings where some of the individual study-specific estimates are based on relatively small study samples. Finite sample bias may occur when maximum likelihood estimates of associations are obtained by fitting logistic regression models to sparse data sets. Here we show that combining information from small studies by undertaking a meta-analytical summary of logistic regression estimates can propagate such sparse-data bias. In simulations, we illustrate 2 challenges encountered in meta-analyses of logistic regression results in settings of sparse data: 1) bias in the summary meta-analytical result and 2) confidence interval coverage that can worsen rather than improve, in terms of being less than nominal, as the number of studies in the meta-analysis increases.
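The second challenge follows from how pooling works: averaging does not remove a bias shared by every study, but it does shrink the standard error, so the interval tightens around the wrong value. The sketch below is a deterministic illustration of that reasoning, not the simulation reported in the paper. It assumes fixed-effect inverse-variance pooling, and the true log odds ratio, the biased study estimate, and the standard error are all hypothetical values.

```python
import math


def pool_fixed_effect(estimates, std_errors):
    """Inverse-variance (fixed-effect) summary of study log odds ratios."""
    weights = [1 / se**2 for se in std_errors]
    pooled = sum(w * b for w, b in zip(weights, estimates)) / sum(weights)
    pooled_se = math.sqrt(1 / sum(weights))
    return pooled, pooled_se


def covers(truth, estimate, se, z=1.96):
    """True if the Wald interval around the estimate contains the truth."""
    return estimate - z * se <= truth <= estimate + z * se


# Hypothetical values: every small study overestimates the log odds ratio
# by the same amount because of sparse-data bias.
TRUE_LOG_OR = 0.5
STUDY_LOG_OR = 0.8
STUDY_SE = 0.6

for n_studies in (4, 25):
    est, se = pool_fixed_effect([STUDY_LOG_OR] * n_studies,
                                [STUDY_SE] * n_studies)
    print(n_studies, round(est, 3), round(se, 3),
          covers(TRUE_LOG_OR, est, se))
```

With these assumed values, the pooled interval from 4 studies still contains the true value, while the interval from 25 studies no longer does, even though the pooled point estimate has not moved.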


BMJ Open ◽  
2018 ◽  
Vol 8 (12) ◽  
pp. e020642 ◽  
Author(s):  
Zahra Cheraghi ◽  
Saharnaz Nedjat ◽  
Parvin Mirmiran ◽  
Nazanin Moslehi ◽  
Nasrin Mansournia ◽  
...  

Objectives: Diet and nutrition might play an important role in the aetiology of metabolic syndrome (MetS). Most studies that examine the effects of food intake on MetS have used conventional statistical analyses, which usually investigate only a limited number of food items and are subject to sparse data bias. This study was undertaken to investigate the concurrent effect of numerous food items and related nutrients on the incidence of MetS using Bayesian multilevel modelling, which can control for sparse data bias.
Design: Prospective cohort study.
Setting: This prospective study was a subcohort of the Tehran Lipid and Glucose Study. We analysed dietary intake as well as pertinent covariates for cohort members in the fourth (2008–2011) and fifth (2011–2014) follow-up examinations. We fitted a Bayesian multilevel model and compared the results with two logistic regression models: (1) a full model that included all variables and (2) a reduced model obtained through backward selection of dietary variables.
Participants: 3616 healthy Iranian adults, aged ≥20 years.
Primary and secondary outcome measures: Incident cases of MetS.
Results: The Bayesian multilevel approach produced results that were more precise and biologically plausible than those of the conventional logistic regression models. The ORs and 95% confidence limits for the effects of four foods, comparing the Bayesian multilevel model with the full conventional model, were as follows: (1) noodle soup (1.20 (0.67 to 2.14) vs 1.91 (0.65 to 5.64)), (2) beans (0.96 (0.5 to 1.85) vs 0.55 (0.03 to 11.41)), (3) turnip (1.23 (0.68 to 2.23) vs 2.48 (0.82 to 7.52)) and (4) eggplant (1.01 (0.51 to 2.00) vs 1 09 396 (0.152×10⁻⁶ to 768×10¹²)). For most food items, the Bayesian multilevel analysis gave narrower confidence limits than both logistic regression models, and hence provided the highest precision.
Conclusions: This study demonstrates that conventional regression methods do not perform well and might even be biased when assessing highly correlated exposures such as food items in dietary epidemiological studies. Despite the complexity of the Bayesian multilevel models and their inherent assumptions, this approach outperforms conventional statistical models in studies that examine multiple nutritional exposures that are highly correlated.
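The precision claim in the results can be checked directly from the reported limits. One common summary, assumed here rather than taken from the paper, is the confidence limit ratio (upper limit divided by lower limit): the smaller the ratio, the more precise the estimate. The sketch uses only the limits quoted in the abstract for three of the four foods; eggplant is left out because its conventional interval spans many orders of magnitude.

```python
def confidence_limit_ratio(lower, upper):
    """Upper limit divided by lower limit; smaller means more precise."""
    return upper / lower


# 95% confidence limits as reported in the abstract:
# food -> (Bayesian multilevel limits, full conventional model limits)
reported_limits = {
    "noodle soup": ((0.67, 2.14), (0.65, 5.64)),
    "beans": ((0.5, 1.85), (0.03, 11.41)),
    "turnip": ((0.68, 2.23), (0.82, 7.52)),
}

for food, (bayes, conventional) in reported_limits.items():
    print(
        f"{food}: Bayesian CLR = {confidence_limit_ratio(*bayes):.2f}, "
        f"conventional CLR = {confidence_limit_ratio(*conventional):.2f}"
    )
```

For each of these three foods the Bayesian multilevel ratio is smaller than the conventional one, which is consistent with the abstract's statement about narrower limits.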


BMJ ◽  
2016 ◽  
pp. i1981 ◽  
Author(s):  
Sander Greenland ◽  
Mohammad Ali Mansournia ◽  
Douglas G Altman