listwise deletion
Recently Published Documents

TOTAL DOCUMENTS: 35 (five years: 11)
H-INDEX: 9 (five years: 1)

2021 ◽  
Author(s):  
Maxwell Hong ◽  
Matt Carter ◽  
Cheyeon Kim ◽  
Ying Cheng

Data preprocessing is an integral step prior to analyzing data in the social sciences. The purpose of this article is to report the current practices psychological researchers use to address data preprocessing or quality concerns, with a focus on issues pertaining to aberrant responses and missing data in self-report measures. A total of 240 articles were sampled from four journals: Psychological Science, Journal of Personality and Social Psychology, Developmental Psychology, and Abnormal Psychology, from 2012 to 2018. We found that nearly half of the studies did not report any missing data treatment (111/240; 46.25%), and when they did, the most common approach to handling missing data was listwise deletion (71/240; 29.6%). Studies that removed data due to missingness removed, on average, 12% of the sample. We also found that most studies did not report any methodology to address aberrant responses (194/240; 80.83%). Studies that did report such issues classified, on average, 4% of the sample as suspect responses. These results suggest that most studies are either not transparent enough about their data preprocessing steps or may be leveraging suboptimal procedures. We outline recommendations for researchers to improve the transparency and/or the data quality of their studies.
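The listwise deletion practice the survey describes can be illustrated in a few lines. This is a toy sketch (assuming Python with pandas; the data values are invented), showing how dropping every row with at least one missing item shrinks the sample:

```python
import numpy as np
import pandas as pd

# Toy self-report dataset with scattered missing responses (illustrative only).
df = pd.DataFrame({
    "item_1": [4, 2, np.nan, 5, 3, 1],
    "item_2": [3, np.nan, 2, 4, 4, 2],
    "item_3": [5, 1, 3, np.nan, 2, 3],
})

# Listwise deletion: drop any row with at least one missing value.
complete = df.dropna()

removed_pct = 100 * (len(df) - len(complete)) / len(df)
print(f"Rows removed: {removed_pct:.0f}%")  # 3 of 6 rows -> 50%
```

Note how three scattered missing values cost half the sample, which is why the article's observed average of 12% of cases removed can understate the information lost per variable.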


2021 ◽  
Author(s):  
Adrienne D. Woods ◽  
Pamela Davis-Kean ◽  
Max Andrew Halvorson ◽  
Kevin Michael King ◽  
Jessica A. R. Logan ◽  
...  

A common challenge in developmental research is the amount of incomplete and missing data that results from respondents failing to complete tasks or questionnaires, as well as from disengaging from the study (i.e., attrition). This missingness can lead to biases in parameter estimates and, hence, in the interpretation of findings. These biases can be addressed through statistical techniques that adjust for missing data, such as multiple imputation. Although this technique is highly effective, it has not been widely adopted by developmental scientists, owing to barriers such as a lack of training and misconceptions about imputation methods; instead, researchers often fall back on software defaults such as listwise deletion. This manuscript provides practical guidelines for developmental researchers to follow when examining their data for missingness, deciding how to handle that missingness, and reporting the extent of missing data biases and the specific multiple imputation procedures used in publications.
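The core of multiple imputation that the guidelines advocate is to fill the gaps several times, analyze each completed dataset, and pool the results with Rubin's rules. A minimal numpy sketch (illustrative data; the crude hot-deck-style imputation model is my simplification, not the paper's recommended model):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy variable with ~30% of values missing completely at random (illustrative).
y = rng.normal(50, 10, size=200)
miss = rng.random(200) < 0.3
y_obs = y[~miss]

m = 20  # number of imputed datasets
means, variances = [], []
for _ in range(m):
    # Stochastic imputation: fill each gap with a random draw from the
    # observed values (a crude hot-deck-style stand-in for a real model).
    filled = y.copy()
    filled[miss] = rng.choice(y_obs, size=miss.sum(), replace=True)
    means.append(filled.mean())
    variances.append(filled.var(ddof=1) / len(filled))  # variance of the mean

# Rubin's rules: pool the point estimates and combine the variances.
q_bar = np.mean(means)               # pooled estimate
w = np.mean(variances)               # within-imputation variance
b = np.var(means, ddof=1)            # between-imputation variance
total_var = w + (1 + 1 / m) * b
print(q_bar, np.sqrt(total_var))
```

The between-imputation term is what single imputation omits; it keeps the pooled standard error honest about the uncertainty the missing values introduce.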


2021 ◽  
Vol 15 ◽  
Author(s):  
Timothy D. Nelson ◽  
Rebecca L. Brock ◽  
Sonja Yokum ◽  
Cara C. Tomaso ◽  
Cary R. Savage ◽  
...  

The current paper leveraged a large multi-study functional magnetic resonance imaging (fMRI) dataset (N = 363) and a generated missingness paradigm to demonstrate different approaches for handling missing fMRI data under a variety of conditions. The performance of full information maximum likelihood (FIML) estimation, both with and without auxiliary variables, and listwise deletion were compared under different conditions of generated missing data volumes (i.e., 20, 35, and 50%). FIML generally performed better than listwise deletion in replicating results from the full dataset, but differences were small in the absence of auxiliary variables that correlated strongly with fMRI task data. However, when an auxiliary variable created to correlate r = 0.5 with fMRI task data was included, the performance of the FIML model improved, suggesting the potential value of FIML-based approaches for missing fMRI data when a strong auxiliary variable is available. In addition to primary methodological insights, the current study also makes an important contribution to the literature on neural vulnerability factors for obesity. Specifically, results from the full data model show that greater activation in regions implicated in reward processing (caudate and putamen) in response to tastes of milkshake significantly predicted weight gain over the following year. Implications of both methodological and substantive findings are discussed.
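The generated-missingness paradigm the paper uses can be sketched directly: start from complete data, delete a known fraction, and check how well a complete-case (listwise) estimate replicates the full-data result. This toy numpy version borrows only the study's sample size and missingness rates; the data and effect are invented, and FIML itself would require SEM software rather than this sketch:

```python
import numpy as np

rng = np.random.default_rng(42)

# Complete "full data" stand-in for an fMRI ROI activation measure
# (N = 363 matches the study's sample; values are illustrative).
full = rng.normal(0.4, 1.0, size=363)
full_mean = full.mean()

for pct in (0.20, 0.35, 0.50):
    # Delete a known fraction MCAR, then compare the complete-case
    # (listwise) estimate against the known full-data benchmark.
    keep = rng.random(full.size) >= pct
    cc_mean = full[keep].mean()
    print(f"{int(pct * 100)}% missing: complete-case mean = {cc_mean:.3f} "
          f"(full-data benchmark = {full_mean:.3f})")
```

Because the full-data answer is known by construction, any estimator's replication error can be measured exactly, which is what lets the paper rank FIML against listwise deletion at each missingness level.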


2021 ◽  
Vol 80 (Suppl 1) ◽  
pp. 324.2-325
Author(s):  
G. L. Erre ◽  
F. Cacciapaglia ◽  
G. Sakellariou ◽  
A. Manfredi ◽  
E. Bartoloni Bocci ◽  
...  

Background: Rheumatoid arthritis (RA) is associated with an increased risk of atherosclerotic cardiovascular disease (CVD). The Expanded Cardiovascular Risk Prediction Score for Rheumatoid Arthritis (ERS-RA) estimates the 10-year risk of myocardial infarction, stroke, or CVD-related death based on conventional and RA-specific (clinical disease activity index, CDAI, disease duration, glucocorticoid use) risk factors (1).
Objectives: We evaluated the associations between ERS-RA 10-year risk of CVD, high-sensitivity C-reactive protein (hs-CRP) concentrations, and pharmacological treatment in 1,251 RA patients collected by the “Cardiovascular Obesity and Rheumatic Disease Study (CORDIS)” group of the Italian Society of Rheumatology (SIR).
Methods: We assessed independent associations between the ERS-RA risk score and each relevant variable using multivariate regression (ENTER approach; listwise deletion analysis). Given the relatively high number of missing hs-CRP data (n=385), regression analysis was also performed using multiple imputation (10 sets, Stata 16.1). Regression models were not adjusted for independent variables included in the ERS-RA score.
Results: Among 1,251 RA patients [mean (SD) age 60.4(9.3), range 40-80 years; 78% female; mean (SD) disease duration, 11.6(8) years; mean (SD) CDAI, 9(9); mean (SD) HAQ, 0.77(0.7); mean (SD) hs-CRP, 6.8(12) mg/L], the estimated 10-year CVD risk was 11.6(0.9)% [mean (SD)]. Regarding treatment, 539 (43%) received glucocorticoids, 676 (54%) a biological or targeted synthetic disease-modifying anti-rheumatic drug (b/tsDMARD) (n missing=1), and 885 (81%) at least one conventional synthetic DMARD (csDMARD). Ninety-three (7.4%) patients did not receive any treatment.
After adjusting for the use of b/tsDMARD and csDMARD, hs-CRP concentrations were significantly associated with 10-year risk of CVD, both in standard multiple regression (n=865; coefficient=0.005 per 10 mg/L hs-CRP increment, 95% confidence interval 0.000-0.100, p=0.043) and after multiple imputation (n=1,251; coefficient=0.005 per 10 mg/L hs-CRP increment, 95% confidence interval 0.000-0.114, p=0.035) (Table 1). This corresponds to an increase in 10-year CV risk of 1% for every 20 mg/L increase in hs-CRP concentrations.
Conclusion: In a large cohort of RA patients, we observed a significant, positive, and independent association between hs-CRP concentrations and 10-year CV risk estimated by ERS-RA. The cross-sectional design of the study did not allow us to establish a cause-effect relationship between hs-CRP and CV risk. Given that conventional CV risk factors and inflammation-related variables are accounted for in the ERS-RA risk score, other, unexplored, mechanisms may underlie the observed association between hs-CRP and CV risk.
References:
[1] Solomon, D. H., et al. “Derivation and internal validation of an expanded cardiovascular risk prediction score for rheumatoid arthritis: a Consortium of Rheumatology Researchers of North America Registry Study.” Arthritis & Rheumatology 67.8 (2015): 1995-2003.

Table 1. Multiple regression models (dependent variable: ERS-RA score)

                                    Model 1 (n=865)                 Model 2 (n=1,251)
                                    Coefficient  95% CI, p          Coefficient  95% CI, p
hs-CRP, every 10 mg/L increment     0.005   0.000 to 0.100, 0.043   0.005    0.000 to 0.011, 0.035
b/tsDMARD use                       -0.002  -0.005 to 0.001, 0.199  -0.000   -0.002 to 0.002, 0.963
csDMARD use                         0.002   -0.003 to 0.007, 0.394  0.002    -0.002 to 0.006, 0.371
Prob > F, model with only CRP       0.03                            0.03
Prob > F, full model                0.07                            0.08

A multiple linear regression (ENTER method) was performed for the dependent variable ERS-RA score using listwise deletion (Model 1) and multiple imputation (Model 2).
Disclosure of Interests: None declared


2021 ◽  
Vol 1 (1) ◽  
Author(s):  
Danielle M. Rodgers ◽  
Ross Jacobucci ◽  
Kevin J. Grimm

Decision trees (DTs) are a machine learning technique that searches the predictor space for the variable and observed value that lead to the best prediction when the data are split into two nodes based on that variable and splitting value. The algorithm repeats its search within each partition of the data until a stopping rule ends the search. Missing data can be problematic in DTs because an observation with a missing value cannot be placed into a node based on the chosen splitting variable; for the same reason, missing data can also alter the variable selection process. Simple missing data approaches (e.g., listwise deletion, majority rule, and surrogate splits) have been implemented in DT algorithms; however, more sophisticated missing data techniques have not been thoroughly examined. We propose a modified multiple imputation approach to handling missing data in DTs and compare it, via Monte Carlo simulation, with the simple approaches as well as with single imputation and multiple imputation with prediction averaging. The study evaluated the performance of each missing data approach when data were missing at random (MAR) or missing completely at random (MCAR). The proposed multiple imputation approach and surrogate splits had superior performance, with the proposed multiple imputation approach performing best in the more severe missing data conditions. We conclude with recommendations for handling missing data in DTs.
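One of the comparison methods named above, multiple imputation with prediction averaging, can be sketched concretely: impute the data m times, fit one tree per imputed dataset, and average the m trees' predictions. This toy sketch (assuming Python with numpy and scikit-learn; the data, the crude random-draw imputation, and all parameter choices are illustrative, not the paper's algorithm):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(1)

# Toy regression data with ~30% MCAR missingness in one predictor.
n = 300
X = rng.normal(size=(n, 2))
y = 2 * X[:, 0] - X[:, 1] + rng.normal(scale=0.5, size=n)
X_miss = X.copy()
X_miss[rng.random(n) < 0.3, 0] = np.nan

m = 10
preds = []
for i in range(m):
    # Stochastic imputation: replace NaNs with random draws from the
    # observed values of the same column.
    Xi = X_miss.copy()
    nan_rows = np.isnan(Xi[:, 0])
    Xi[nan_rows, 0] = rng.choice(Xi[~nan_rows, 0], size=nan_rows.sum())
    tree = DecisionTreeRegressor(max_depth=3, random_state=i).fit(Xi, y)
    preds.append(tree.predict(Xi))

# Prediction averaging: pool the m trees' predictions into one forecast.
y_hat = np.mean(preds, axis=0)
```

Averaging over imputations smooths out the noise any single random fill injects into the tree's splits, which is the motivation for this family of approaches.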


2020 ◽  
Vol 35 (4) ◽  
pp. 589-614
Author(s):  
Melanie-Angela Neuilly ◽  
Ming-Li Hsieh ◽  
Alex Kigerl ◽  
Zachary K. Hamilton

Research on homicide missing data conventionally posits a Missing At Random pattern, despite the relationship between missing data and clearance. The latter, however, cannot be satisfactorily modeled using variables traditionally available in homicide datasets. For this reason, it has been argued that missingness in homicide data instead follows a nonignorable pattern. Hence, the use of multiple imputation strategies, as recommended in the field for ignorable patterns, would pose a threat to the validity of results obtained in that way. This study examines missing data mechanisms using a set of primary data collected in New Jersey. After comparing listwise deletion, multiple imputation, propensity score matching, and log-multiplicative association models, our findings underscore that data in homicide datasets are indeed Missing Not At Random.
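The MCAR-versus-nonignorable distinction at the heart of this study is easy to demonstrate numerically: when missingness depends on the unobserved value itself (MNAR), complete-case estimates are biased even in huge samples. A toy numpy sketch (all values and the clearance-flavored interpretation are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(7)

# Stand-in for a homicide-case attribute (values are illustrative).
x = rng.normal(100, 15, size=10_000)

# MCAR: missingness unrelated to the value itself.
mcar_obs = x[rng.random(x.size) >= 0.4]

# MNAR (nonignorable): larger values are more likely to go missing,
# e.g., because harder-to-clear cases are less completely recorded.
p_miss = 1 / (1 + np.exp(-(x - 100) / 10))  # higher x -> higher missingness
mnar_obs = x[rng.random(x.size) >= p_miss]

print(x.mean(), mcar_obs.mean(), mnar_obs.mean())
```

Under MCAR the observed mean stays on target; under MNAR it is pulled well below the true mean, and no amount of imputation from the observed values alone can recover it, which is the study's warning about applying ignorable-pattern methods to homicide data.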


2020 ◽  
Vol 10 (2) ◽  
pp. 165
Author(s):  
Ryan Vernando Putra ◽  
Muhartini Salim ◽  
Sularsih Anggarawati

Abstract. The purpose of this study was to examine the influence of Self-Relevant Value, Quality Value, and Perceived Informational Utility on Electronic Word-of-Mouth (eWOM) Intention, moderated by Opinion Leadership, among consumers of Bombaru Bars and Restaurant in Bengkulu, Indonesia. The respondents of this study were 17-45 years old. Respondent data were collected with the survey questionnaire provided. After applying the listwise deletion method via Mahalanobis distance in SEM-AMOS, 133 usable questionnaires were available for analysis. Data analysis used Confirmatory Factor Analysis (CFA), assessment of normality, and regression weights. The results indicate that (1) Self-Relevant Value has a positive effect on eWOM Intention; (2) Quality Value has a positive effect on eWOM Intention; (3) Perceived Informational Utility has a positive effect on eWOM Intention; (4) Opinion Leadership moderates the influence of Self-Relevant Value on eWOM Intention; (5) Opinion Leadership moderates the influence of Quality Value on eWOM Intention; and (6) Opinion Leadership moderates the influence of Perceived Informational Utility on eWOM Intention.
Keywords: eWOM Intention, Opinion Leadership, Perceived Informational Utility, Quality Value, Self-Relevant Value.
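The Mahalanobis-distance screening step used above (flag multivariate outliers against a chi-square cutoff, then delete those rows listwise) can be sketched outside AMOS. A toy version assuming Python with numpy and scipy; the data, the planted outliers, and the p < .001 cutoff are illustrative choices, not the study's exact settings:

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(3)

# Toy survey responses (3 correlated items) plus two planted outliers.
X = rng.multivariate_normal(
    [4, 4, 4],
    [[1, .5, .5], [.5, 1, .5], [.5, .5, 1]],
    size=150,
)
X = np.vstack([X, [[20, -5, 18], [-10, 19, -8]]])

mu = X.mean(axis=0)
cov_inv = np.linalg.inv(np.cov(X, rowvar=False))
diff = X - mu
# Squared Mahalanobis distance of each row from the centroid.
d2 = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)

# Flag cases beyond the chi-square cutoff (p < .001, df = number of items),
# then delete the flagged rows listwise.
cutoff = chi2.ppf(0.999, df=X.shape[1])
clean = X[d2 <= cutoff]
```

Unlike a per-item z-score screen, the Mahalanobis distance accounts for the items' correlations, so it also catches response patterns that are individually plausible but jointly inconsistent.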


2020 ◽  
Vol 5 (1) ◽  
pp. e000353
Author(s):  
Kathleen E Adair ◽  
Joshua D Patrick ◽  
Eric J Kliber ◽  
Matthew N Peterson ◽  
Seth R Holland

Background: The use of tranexamic acid (TXA) has become increasingly prevalent for hemorrhage prevention in military trauma patients due to its known survival benefits. There is concern about increased venous thromboembolism (VTE) subsequent to receiving TXA. The purpose of this retrospective study was to determine the rate of VTE in severely injured military personnel during Operation Enduring Freedom (2009–2014).
Methods: An analysis of 859 military trauma patients from the 2009–2014 Department of Defense Trauma Registry included subjects with an injury severity score (ISS) >10 and a massive transfusion (MT) (>10 units of blood products in the first 24 hours). Outcomes included a documented VTE (e.g., deep vein thrombosis (DVT) or pulmonary embolism (PE)) during the patient’s hospital course. Comparisons between those who did and did not receive TXA were analyzed using three separate multiple regression analyses employing listwise deletion, systematic replacement, and multiple imputation.
Results: Subjects (n=620) met inclusion criteria, with 27% (n=169) having a documented VTE. A total of 30% of those who received TXA had a documented VTE, 26% of those who did not receive TXA had a documented VTE, and 43% of the sample (264/620) did not have TXA documented as either given or not given. Multiple regression analyses using listwise deletion and systematic replacement of the TXA variable demonstrated no difference in odds of VTE, whereas the multiple imputation analysis demonstrated a 3% increased odds of VTE, a 9.4% increased odds of PE, and an 8.1% decreased odds of DVT with TXA administration.
Discussion: TXA use with an ISS >10 and MT resuscitation was associated with a 3% increased odds of VTE and an increased odds of PE, whereas the odds of DVT were found to be decreased after multiple imputation analysis. Further research on the long-term risks and benefits of TXA use in the military population is recommended.
Level of evidence: IV (therapeutic).


2019 ◽  
Author(s):  
Tabea Kossen ◽  
Michelle Livne ◽  
Vince I Madai ◽  
Ivana Galinovic ◽  
Dietmar Frey ◽  
...  

Abstract
Background and purpose: Handling missing values is a prevalent challenge in the analysis of clinical data. The rise of data-driven models demands an efficient use of the available data, so methods to impute missing values are crucial. Here, we developed a publicly available framework to test different imputation methods and compared their impact in a typical clinical stroke dataset as a use case.
Methods: A clinical dataset based on the 1000Plus stroke study, with 380 patients having complete entries, was used. Thirteen common clinical parameters, including numerical and categorical values, were selected. Missing values were simulated in a missing-at-random (MAR) and a missing-completely-at-random (MCAR) fashion, from 0% to 60%, and subsequently imputed using the mean, hot-deck, multiple imputation by chained equations, and expectation-maximization methods, as well as listwise deletion. Performance was assessed by the root mean squared error, the absolute bias, and the performance of a linear model for discharge mRS prediction.
Results: Listwise deletion was the worst-performing method and became significantly worse than every imputation method from 2% (MAR) and 3% (MCAR) missing values onward. The underlying missing value mechanism appeared to have a crucial influence on which imputation method performed best; consequently, no single imputation method outperformed all others. A significant performance drop of the linear model started from 11% (MAR+MCAR) and 18% (MCAR) missing values.
Conclusions: In the presented case study of a typical clinical stroke dataset, we confirmed that listwise deletion should be avoided when dealing with missing values. Our findings indicate that the underlying missing value mechanism and other dataset characteristics strongly influence the best choice of imputation method. For future studies with a similar data structure, we therefore suggest using the framework developed in this study to select the most suitable imputation method for a given dataset prior to analysis.
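The evaluation loop such a framework runs can be sketched in miniature: start from complete data, simulate MCAR missingness at several rates, impute, and score the imputations by RMSE against the known true values while tracking how many rows listwise deletion would leave. A toy numpy version (the matrix size echoes the study's 380 patients, but the data, the mean-imputation baseline, and the rates are illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)

# Complete toy "clinical" matrix: 380 patients x 5 numeric parameters.
true = rng.normal(0, 1, size=(380, 5))

for frac in (0.05, 0.20, 0.40):
    # Simulate MCAR missingness at the given rate.
    mask = rng.random(true.shape) < frac
    data = true.copy()
    data[mask] = np.nan

    # Mean imputation, scored by RMSE against the known true values.
    col_means = np.nanmean(data, axis=0)
    imputed = np.where(mask, col_means, data)
    rmse = np.sqrt(np.mean((imputed[mask] - true[mask]) ** 2))

    # Listwise deletion discards whole rows instead of estimating values.
    rows_left = (~mask.any(axis=1)).sum()
    print(f"{frac:.0%} missing: RMSE(mean imp) = {rmse:.2f}, "
          f"rows left after listwise deletion = {rows_left}")
```

Even this crude baseline shows the study's central point: imputation error grows gently with the missingness rate, while the complete-case sample collapses, since a row survives listwise deletion only if all of its parameters are observed.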


Author(s):  
Chanintorn Jittawiriyanukoon

<span>Multiple Regression-Based Prediction (MRBP) is an emerging analysis technique for predicting future outcomes from compiled historical data. The MRBP characteristic includes an approximation of the associations between physical observations and predictions. MRBP is a predictive model and thus an important source of knowledge about trends to be followed in the future. However, the MRBP dataset can be impaired: missing and noisy data cause errors and make records unavailable for further analysis. To overcome this unavailability, so that the data analytics can proceed, two treatment approaches are introduced. First, the given dataset is denoised; next, listwise deletion (LD) is applied to handle the missing data. The performance of the proposed technique is investigated on datasets that could otherwise not be processed. Using the Massive Online Analysis (MOA) software, the proposed model is evaluated and the results are summarized. Performance metrics such as mean squared error (MSE), correlation coefficient (COEF), mean absolute error (MAE), root mean squared error (RMSE), and the average error percentage are used to validate the proposed mechanism. The proposed LD projection is confirmed against actual values. The proposed LD outperforms other treatments as it requires less state space, which reflects a low computation cost, and it proves its capability to overcome the limitations of the analysis.</span>
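The two-step treatment described above (denoise first, then listwise deletion) can be sketched outside MOA. A toy pandas version, assuming a simple robust rule for the denoising step: values beyond three IQRs of the column median are treated as noise and marked missing (this particular filter is my illustrative choice, not the paper's):

```python
import numpy as np
import pandas as pd

# Toy history of observations with a noise spike (250.0) and gaps.
df = pd.DataFrame({
    "x": [1.0, 2.0, 2.1, 250.0, 3.0, np.nan, 4.0],
    "y": [1.1, 2.2, 2.0, 2.9, np.nan, 3.5, 4.2],
})

# Step 1: denoise -- mark values beyond 3 IQRs of the median as missing.
for col in df:
    q1, q3 = df[col].quantile([0.25, 0.75])
    med = df[col].median()
    df.loc[(df[col] - med).abs() > 3 * (q3 - q1), col] = np.nan

# Step 2: listwise deletion (LD) -- drop any row still containing a gap.
clean = df.dropna()
```

Running denoising first matters: the noise spike is converted into an ordinary missing value, so a single LD pass then removes both noisy and incomplete records before the regression is fit.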

