Data Missingness Patterns in Homicide Datasets: An Applied Test on a Primary Data Set

2020 ◽  
Vol 35 (4) ◽  
pp. 589-614
Author(s):  
Melanie-Angela Neuilly ◽  
Ming-Li Hsieh ◽  
Alex Kigerl ◽  
Zachary K. Hamilton

Research on homicide missing data conventionally posits a Missing At Random pattern despite the relationship between missing data and clearance. The latter, however, cannot be satisfactorily modeled using variables traditionally available in homicide datasets. For this reason, it has been argued that missingness in homicide data follows a Nonignorable pattern instead. Hence, the use of multiple imputation strategies as recommended in the field for ignorable patterns would thus pose a threat to the validity of results obtained in such a way. This study examines missing data mechanisms by using a set of primary data collected in New Jersey. After comparing Listwise Deletion, Multiple Imputation, Propensity Score Matching, and Log-Multiplicative Association Models, our findings underscore that data in homicide datasets are indeed Missing Not At Random.

Author(s):  
Tra My Pham ◽  
Irene Petersen ◽  
James Carpenter ◽  
Tim Morris

ABSTRACT BackgroundEthnicity is an important factor to be considered in health research because of its association with inequality in disease prevalence and the utilisation of healthcare. Ethnicity recording has been incorporated in primary care electronic health records, and hence is available in large UK primary care databases such as The Health Improvement Network (THIN). However, since primary care data are routinely collected for clinical purposes, a large amount of data that are relevant for research including ethnicity is often missing. A popular approach for missing data is multiple imputation (MI). However, the conventional MI method assuming data are missing at random does not give plausible estimates of the ethnicity distribution in THIN compared to the general UK population. This might be due to the fact that ethnicity data in primary care are likely to be missing not at random. ObjectivesI propose a new MI method, termed ‘weighted multiple imputation’, to deal with data that are missing not at random in categorical variables.MethodsWeighted MI combines MI and probability weights which are calculated using external data sources. Census summary statistics for ethnicity can be used to form weights in weighted MI such that the correct marginal ethnic breakdown is recovered in THIN. I conducted a simulation study to examine weighted MI when ethnicity data are missing not at random. In this simulation study which resembled a THIN dataset, ethnicity was an independent variable in a survival model alongside other covariates. Weighted MI was compared to the conventional MI and other traditional missing data methods including complete case analysis and single imputation.ResultsWhile a small bias was still present in ethnicity coefficient estimates under weighted MI, it was less severe compared to MI assuming missing at random. Complete case analysis and single imputation were inadequate to handle data that are missing not at random in ethnicity.ConclusionsAlthough not a total cure, weighted MI represents a pragmatic approach that has potential applications not only in ethnicity but also in other incomplete categorical health indicators in electronic health records.


2018 ◽  
Vol 26 (4) ◽  
pp. 480-488 ◽  
Author(s):  
Thomas B. Pepinsky

This letter compares the performance of multiple imputation and listwise deletion using a simulation approach. The focus is on data that are “missing not at random” (MNAR), in which case both multiple imputation and listwise deletion are known to be biased. In these simulations, multiple imputation yields results that are frequently more biased, less efficient, and with worse coverage than listwise deletion when data are MNAR. This is the case even with very strong correlations between fully observed variables and variables with missing values, such that the data are very nearly “missing at random.” These results recommend caution when comparing the results from multiple imputation and listwise deletion, when the true data generating process is unknown.


2020 ◽  
Author(s):  
Matthew Sperrin ◽  
Glen P. Martin

Abstract Background : Within routinely collected health data, missing data for an individual might provide useful information in itself. This occurs, for example, in the case of electronic health records, where the presence or absence of data is informative. While the naive use of missing indicators to try to exploit such information can introduce bias, its use in conjunction with multiple imputation may unlock the potential value of missingness to reduce bias in causal effect estimation, particularly in missing not at random scenarios and where missingness might be associated with unmeasured confounders. Methods: We conducted a simulation study to determine when the use of a missing indicator, combined with multiple imputation, would reduce bias for causal effect estimation, under a range of scenarios including unmeasured variables, missing not at random, and missing at random mechanisms. We use directed acyclic graphs and structural models to elucidate a variety of causal structures of interest. We handled missing data using complete case analysis, and multiple imputation with and without missing indicator terms. Results: We find that multiple imputation combined with a missing indicator gives minimal bias for causal effect estimation in most scenarios. In particular the approach: 1) does not introduce bias in missing (completely) at random scenarios; 2)reduces bias in missing not at random scenarios where the missing mechanism depends on the missing variable itself; and 3) may reduce or increase bias when unmeasured confounding is present. Conclusion : In the presence of missing data, careful use of missing indicators, combined with multiple imputation, can improve causal effect estimation when missingness is informative, and is not detrimental when missingness is at random.


2020 ◽  
Vol 29 (10) ◽  
pp. 3076-3092 ◽  
Author(s):  
Susan Gachau ◽  
Matteo Quartagno ◽  
Edmund Njeru Njagi ◽  
Nelson Owuor ◽  
Mike English ◽  
...  

Missing information is a major drawback in analyzing data collected in many routine health care settings. Multiple imputation assuming a missing at random mechanism is a popular method to handle missing data. The missing at random assumption cannot be confirmed from the observed data alone, hence the need for sensitivity analysis to assess robustness of inference. However, sensitivity analysis is rarely conducted and reported in practice. We analyzed routine paediatric data collected during a cluster randomized trial conducted in Kenyan hospitals. We imputed missing patient and clinician-level variables assuming the missing at random mechanism. We also imputed missing clinician-level variables assuming a missing not at random mechanism. We incorporated opinions from 15 clinical experts in the form of prior distributions and shift parameters in the delta adjustment method. An interaction between trial intervention arm and follow-up time, hospital, clinician and patient-level factors were included in a proportional odds random-effects analysis model. We performed these analyses using R functions derived from the jomo package. Parameter estimates from multiple imputation under the missing at random mechanism were similar to multiple imputation estimates assuming the missing not at random mechanism. Our inferences were insensitive to departures from the missing at random assumption using either the prior distributions or shift parameters sensitivity analysis approach.


2020 ◽  
Author(s):  
Matthew Sperrin ◽  
Glen P. Martin

Abstract Background: Within routinely collected health data, missing data for an individual might provide useful information in itself. This occurs, for example, in the case of electronic health records, where the presence or absence of data is informative. While the naive use of missing indicators to try to exploit such information can introduce bias, its use in conjunction with multiple imputation may unlock the potential value of missingness to reduce bias in causal effect estimation, particularly in missing not at random scenarios and where missingness might be associated with unmeasured confounders.Methods: We conducted a simulation study to determine when the use of a missing indicator, combined with multiple imputation, would reduce bias for causal effect estimation, under a range of scenarios including unmeasured variables, missing not at random, and missing at random mechanisms. We use directed acyclic graphs and structural models to elucidate a variety of causal structures of interest. We handled missing data using complete case analysis, and multiple imputation with and without missing indicator terms.Results: We find that multiple imputation combined with a missing indicator gives minimal bias for causal effect estimation in most scenarios. It does not introduce bias in missing (completely) at random scenarios, while reducing bias in missing not at random scenarios where the missing mechanism depends on the missing variable itself. The incorporation of a missing indicator can reduce or increase bias when unmeasured confounding is present.Conclusion: In the presence of missing data, careful use of missing indicators, combined with multiple imputation, can improve causal effect estimation when missingness is informative, and is not detrimental when missingness is at random.


2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Vijay Kumar Shrotryia ◽  
Kirti Saroha ◽  
Upasana Dhanda

PurposeThe purpose of this paper is to shed light on the relationship between organizational commitment (OC) and organizational citizenship behavior (OCB) as mediated by employee engagement (EE). The impact of different facets of OC (affective, continuance and normative) and EE (alignment, affectiveness and action-orientation) is examined with respect to OCB.Design/methodology/approachInsights from the literature underpin the hypotheses on how EE mediates the relationship between OC and OCB. Primary data using survey questionnaire were collected from 881 permanent employees of Delhi Metro Rail Corporation (DMRC) in India. Hayes' model 4 has been used for the mediation analysis.FindingsThe analyses show that only one facet of OC- affective commitment and the alignment and action-orientation dimensions of EE positively affect OCB. The relationship between OC and OCB is fully mediated by EE.Practical implicationsThe results imply that engaging employees is pivotal for effectively fostering citizenship behavior among employees. Organizations should be willing to implement strategies and interventions which enhance the emotional experience of employees to foster a sense of belongingness with the organization and engage them.Originality/valueThe paper draws on a unique data set of a prestigious organization in India to provide insights with substantial degree of generalizability into the relationship between OC, OCB and EE, whilst applying a comprehensive definition of these constructs. It is the first study to examine the inter-relationship among different facets of these constructs.


2021 ◽  
Author(s):  
Adrienne D. Woods ◽  
Pamela Davis-Kean ◽  
Max Andrew Halvorson ◽  
Kevin Michael King ◽  
Jessica A. R. Logan ◽  
...  

A common challenge in developmental research is the amount of incomplete and missing data that occurs from respondents failing to complete tasks or questionnaires, as well as from disengaging from the study (i.e., attrition). This missingness can lead to biases in parameter estimates and, hence, in the interpretation of findings. These biases can be addressed through statistical techniques that adjust for missing data, such as multiple imputation. Although this technique is highly effective, it has not been widely adopted by developmental scientists given barriers such as lack of training or misconceptions about imputation methods and instead utilizing default methods within software like listwise deletion. This manuscript is intended to provide practical guidelines for developmental researchers to follow when examining their data for missingness, making decisions about how to handle that missingness, and reporting the extent of missing data biases and specific multiple imputation procedures in publications.


2021 ◽  
Vol 22 (17) ◽  
pp. 9650
Author(s):  
Miranda L. Gardner ◽  
Michael A. Freitas

Analysis of differential abundance in proteomics data sets requires careful application of missing value imputation. Missing abundance values widely vary when performing comparisons across different sample treatments. For example, one would expect a consistent rate of “missing at random” (MAR) across batches of samples and varying rates of “missing not at random” (MNAR) depending on the inherent difference in sample treatments within the study. The missing value imputation strategy must thus be selected that best accounts for both MAR and MNAR simultaneously. Several important issues must be considered when deciding the appropriate missing value imputation strategy: (1) when it is appropriate to impute data; (2) how to choose a method that reflects the combinatorial manner of MAR and MNAR that occurs in an experiment. This paper provides an evaluation of missing value imputation strategies used in proteomics and presents a case for the use of hybrid left-censored missing value imputation approaches that can handle the MNAR problem common to proteomics data.


10.2196/26749 ◽  
2021 ◽  
Vol 23 (6) ◽  
pp. e26749
Author(s):  
Simon B Goldberg ◽  
Daniel M Bolt ◽  
Richard J Davidson

Background Missing data are common in mobile health (mHealth) research. There has been little systematic investigation of how missingness is handled statistically in mHealth randomized controlled trials (RCTs). Although some missing data patterns (ie, missing at random [MAR]) may be adequately addressed using modern missing data methods such as multiple imputation and maximum likelihood techniques, these methods do not address bias when data are missing not at random (MNAR). It is typically not possible to determine whether the missing data are MAR. However, higher attrition in active (ie, intervention) versus passive (ie, waitlist or no treatment) conditions in mHealth RCTs raise a strong likelihood of MNAR, such as if active participants who benefit less from the intervention are more likely to drop out. Objective This study aims to systematically evaluate differential attrition and methods used for handling missingness in a sample of mHealth RCTs comparing active and passive control conditions. We also aim to illustrate a modern model-based sensitivity analysis and a simpler fixed-value replacement approach that can be used to evaluate the influence of MNAR. Methods We reanalyzed attrition rates and predictors of differential attrition in a sample of 36 mHealth RCTs drawn from a recent meta-analysis of smartphone-based mental health interventions. We systematically evaluated the design features related to missingness and its handling. Data from a recent mHealth RCT were used to illustrate 2 sensitivity analysis approaches (pattern-mixture model and fixed-value replacement approach). Results Attrition in active conditions was, on average, roughly twice that of passive controls. Differential attrition was higher in larger studies and was associated with the use of MAR-based multiple imputation or maximum likelihood methods. Half of the studies (18/36, 50%) used these modern missing data techniques. None of the 36 mHealth RCTs reviewed conducted a sensitivity analysis to evaluate the possible consequences of data MNAR. A pattern-mixture model and fixed-value replacement sensitivity analysis approaches were introduced. Results from a recent mHealth RCT were shown to be robust to missing data, reflecting worse outcomes in missing versus nonmissing scores in some but not all scenarios. A review of such scenarios helps to qualify the observations of significant treatment effects. Conclusions MNAR data because of differential attrition are likely in mHealth RCTs using passive controls. Sensitivity analyses are recommended to allow researchers to assess the potential impact of MNAR on trial results.


Sign in / Sign up

Export Citation Format

Share Document