listwise deletion
Recently Published Documents

TOTAL DOCUMENTS: 35 (five years: 11)
H-INDEX: 9 (five years: 1)

2021 ◽  
Author(s):  
Maxwell Hong ◽  
Matt Carter ◽  
Cheyeon Kim ◽  
Ying Cheng

Data preprocessing is an integral step prior to analyzing data in the social sciences. The purpose of this article is to report the current practices psychological researchers use to address data preprocessing or quality concerns, with a focus on issues pertaining to aberrant responses and missing data in self-report measures. A total of 240 articles were sampled from four journals: Psychological Science, Journal of Personality and Social Psychology, Developmental Psychology, and Abnormal Psychology, from 2012 to 2018. We found that nearly half of the studies did not report any missing data treatment (111/240; 46.25%), and when they did, the most common approach to handling missing data was listwise deletion (71/240; 29.6%). Studies that removed data due to missingness removed, on average, 12% of the sample. We also found that most studies did not report any methodology to address aberrant responses (194/240; 80.83%). Studies that did report such issues classified, on average, 4% of the sample as suspect responses. These results suggest that most studies are either not transparent enough about their data preprocessing steps or may be leveraging suboptimal procedures. We outline recommendations for researchers to improve the transparency and/or the data quality of their studies.
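The listwise deletion practice the survey describes can be illustrated in a few lines. This is a toy sketch (assuming Python with pandas; the data values are invented), showing how dropping every row with at least one missing item shrinks the sample:

```python
import numpy as np
import pandas as pd

# Toy self-report dataset with scattered missing responses (illustrative only).
df = pd.DataFrame({
    "item_1": [4, 2, np.nan, 5, 3, 1],
    "item_2": [3, np.nan, 2, 4, 4, 2],
    "item_3": [5, 1, 3, np.nan, 2, 3],
})

# Listwise deletion: drop any row with at least one missing value.
complete = df.dropna()

removed_pct = 100 * (len(df) - len(complete)) / len(df)
print(f"Rows removed: {removed_pct:.0f}%")  # 3 of 6 rows -> 50%
```

Note how three scattered missing values cost half the sample, which is why the article's observed average of 12% of cases removed can understate the information lost per variable.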


2021 ◽  
Author(s):  
Adrienne D. Woods ◽  
Pamela Davis-Kean ◽  
Max Andrew Halvorson ◽  
Kevin Michael King ◽  
Jessica A. R. Logan ◽  
...  

A common challenge in developmental research is the amount of incomplete and missing data that results from respondents failing to complete tasks or questionnaires, as well as from disengaging from the study (i.e., attrition). This missingness can lead to biases in parameter estimates and, hence, in the interpretation of findings. These biases can be addressed through statistical techniques that adjust for missing data, such as multiple imputation. Although this technique is highly effective, it has not been widely adopted by developmental scientists, owing to barriers such as a lack of training and misconceptions about imputation methods; instead, researchers often fall back on software defaults such as listwise deletion. This manuscript provides practical guidelines for developmental researchers to follow when examining their data for missingness, deciding how to handle that missingness, and reporting the extent of missing data biases and the specific multiple imputation procedures used in publications.
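The core of multiple imputation that the guidelines advocate is to fill the gaps several times, analyze each completed dataset, and pool the results with Rubin's rules. A minimal numpy sketch (illustrative data; the crude hot-deck-style imputation model is my simplification, not the paper's recommended model):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy variable with ~30% of values missing completely at random (illustrative).
y = rng.normal(50, 10, size=200)
miss = rng.random(200) < 0.3
y_obs = y[~miss]

m = 20  # number of imputed datasets
means, variances = [], []
for _ in range(m):
    # Stochastic imputation: fill each gap with a random draw from the
    # observed values (a crude hot-deck-style stand-in for a real model).
    filled = y.copy()
    filled[miss] = rng.choice(y_obs, size=miss.sum(), replace=True)
    means.append(filled.mean())
    variances.append(filled.var(ddof=1) / len(filled))  # variance of the mean

# Rubin's rules: pool the point estimates and combine the variances.
q_bar = np.mean(means)               # pooled estimate
w = np.mean(variances)               # within-imputation variance
b = np.var(means, ddof=1)            # between-imputation variance
total_var = w + (1 + 1 / m) * b
print(q_bar, np.sqrt(total_var))
```

The between-imputation term is what single imputation omits; it keeps the pooled standard error honest about the uncertainty the missing values introduce.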


2021 ◽  
Vol 15 ◽  
Author(s):  
Timothy D. Nelson ◽  
Rebecca L. Brock ◽  
Sonja Yokum ◽  
Cara C. Tomaso ◽  
Cary R. Savage ◽  
...  

The current paper leveraged a large multi-study functional magnetic resonance imaging (fMRI) dataset (N = 363) and a generated missingness paradigm to demonstrate different approaches for handling missing fMRI data under a variety of conditions. The performance of full information maximum likelihood (FIML) estimation, both with and without auxiliary variables, and listwise deletion were compared under different conditions of generated missing data volumes (i.e., 20, 35, and 50%). FIML generally performed better than listwise deletion in replicating results from the full dataset, but differences were small in the absence of auxiliary variables that correlated strongly with fMRI task data. However, when an auxiliary variable created to correlate r = 0.5 with fMRI task data was included, the performance of the FIML model improved, suggesting the potential value of FIML-based approaches for missing fMRI data when a strong auxiliary variable is available. In addition to primary methodological insights, the current study also makes an important contribution to the literature on neural vulnerability factors for obesity. Specifically, results from the full data model show that greater activation in regions implicated in reward processing (caudate and putamen) in response to tastes of milkshake significantly predicted weight gain over the following year. Implications of both methodological and substantive findings are discussed.
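The generated-missingness paradigm the paper uses can be sketched directly: start from complete data, delete a known fraction, and check how well a complete-case (listwise) estimate replicates the full-data result. This toy numpy version borrows only the study's sample size and missingness rates; the data and effect are invented, and FIML itself would require SEM software rather than this sketch:

```python
import numpy as np

rng = np.random.default_rng(42)

# Complete "full data" stand-in for an fMRI ROI activation measure
# (N = 363 matches the study's sample; values are illustrative).
full = rng.normal(0.4, 1.0, size=363)
full_mean = full.mean()

for pct in (0.20, 0.35, 0.50):
    # Delete a known fraction MCAR, then compare the complete-case
    # (listwise) estimate against the known full-data benchmark.
    keep = rng.random(full.size) >= pct
    cc_mean = full[keep].mean()
    print(f"{int(pct * 100)}% missing: complete-case mean = {cc_mean:.3f} "
          f"(full-data benchmark = {full_mean:.3f})")
```

Because the full-data answer is known by construction, any estimator's replication error can be measured exactly, which is what lets the paper rank FIML against listwise deletion at each missingness level.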


2021 ◽  
Vol 80 (Suppl 1) ◽  
pp. 324.2-325
Author(s):  
G. L. Erre ◽  
F. Cacciapaglia ◽  
G. Sakellariou ◽  
A. Manfredi ◽  
E. Bartoloni Bocci ◽  
...  

Background: Rheumatoid arthritis (RA) is associated with an increased risk of atherosclerotic cardiovascular disease (CVD). The Expanded Cardiovascular Risk Prediction Score for Rheumatoid Arthritis (ERS-RA) estimates the 10-year risk of myocardial infarction, stroke, or CVD-related death based on conventional and RA-specific (clinical disease activity index, CDAI, disease duration, glucocorticoid use) risk factors (1).
Objectives: We evaluated the associations between ERS-RA 10-year risk of CVD, high-sensitivity C-reactive protein (hs-CRP) concentrations, and pharmacological treatment in 1,251 RA patients collected by the “Cardiovascular Obesity and Rheumatic Disease Study (CORDIS)” group of the Italian Society of Rheumatology (SIR).
Methods: We assessed independent associations between the ERS-RA risk score and each relevant variable using multivariate regression (ENTER approach; listwise deletion analysis). Given the relatively high number of missing hs-CRP data (n=385), regression analysis was also performed using multiple imputation (10 sets, Stata 16.1). Regression models were not adjusted for independent variables included in the ERS-RA score.
Results: Among 1,251 RA patients [mean (SD) age 60.4(9.3), range 40-80 years; 78% female; mean (SD) disease duration, 11.6(8) years; mean (SD) CDAI, 9(9); mean (SD) HAQ, 0.77(0.7); mean (SD) hs-CRP, 6.8(12) mg/L], the estimated 10-year CVD risk was 11.6(0.9)% [mean (SD)]. Regarding treatment, 539 (43%) received glucocorticoids, 676 (54%) a biological or targeted synthetic disease-modifying anti-rheumatic drug (b/tsDMARD) (n missing=1), and 885 (81%) at least one conventional synthetic DMARD (csDMARD). Ninety-three (7.4%) patients did not receive any treatment.
After adjusting for the use of b/tsDMARD and csDMARD, hs-CRP concentrations were significantly associated with 10-year risk of CVD, both in standard multiple regression (n=865; coefficient=0.005 per 10 mg/L hs-CRP increment, 95% confidence interval 0.000-0.100, p=0.043) and after multiple imputation (n=1,251; coefficient=0.005 per 10 mg/L hs-CRP increment, 95% confidence interval 0.000-0.114, p=0.035) (Table 1). This corresponds to an increase in 10-year CV risk of 1% for every 20 mg/L increase in hs-CRP concentrations.
Conclusion: In a large cohort of RA patients, we observed a significant, positive, and independent association between hs-CRP concentrations and 10-year CV risk estimated by ERS-RA. The cross-sectional design of the study did not allow us to establish a cause-effect relationship between hs-CRP and CV risk. Given that conventional CV risk factors and inflammation-related variables are accounted for in the ERS-RA risk score, other, unexplored, mechanisms may underlie the observed association between hs-CRP and CV risk.
References:
[1] Solomon, D. H., et al. “Derivation and internal validation of an expanded cardiovascular risk prediction score for rheumatoid arthritis: a Consortium of Rheumatology Researchers of North America Registry Study.” Arthritis & Rheumatology 67.8 (2015): 1995-2003.

Table 1. Multiple regression models (dependent variable: ERS-RA score)

                                    Model 1 (n=865)                 Model 2 (n=1,251)
                                    Coefficient  95% CI, p          Coefficient  95% CI, p
hs-CRP, every 10 mg/L increment     0.005   0.000 to 0.100, 0.043   0.005    0.000 to 0.011, 0.035
b/tsDMARD use                       -0.002  -0.005 to 0.001, 0.199  -0.000   -0.002 to 0.002, 0.963
csDMARD use                         0.002   -0.003 to 0.007, 0.394  0.002    -0.002 to 0.006, 0.371
Prob > F, model with only CRP       0.03                            0.03
Prob > F, full model                0.07                            0.08

A multiple linear regression (ENTER method) was performed for the dependent variable ERS-RA score using listwise deletion (Model 1) and multiple imputation (Model 2).
Disclosure of Interests: None declared


2021 ◽  
Vol 1 (1) ◽  
Author(s):  
Danielle M. Rodgers ◽  
Ross Jacobucci ◽  
Kevin J. Grimm

Decision trees (DTs) are a machine learning technique that searches the predictor space for the variable and observed value that lead to the best prediction when the data are split into two nodes based on that variable and splitting value. The algorithm repeats its search within each partition of the data until a stopping rule ends the search. Missing data can be problematic in DTs because an observation with a missing value cannot be placed into a node based on the chosen splitting variable; for the same reason, missing data can also alter the variable selection process. Simple missing data approaches (e.g., listwise deletion, majority rule, and surrogate splits) have been implemented in DT algorithms; however, more sophisticated missing data techniques have not been thoroughly examined. We propose a modified multiple imputation approach to handling missing data in DTs and compare it, via Monte Carlo simulation, with the simple approaches as well as with single imputation and multiple imputation with prediction averaging. The study evaluated the performance of each missing data approach when data were missing at random (MAR) or missing completely at random (MCAR). The proposed multiple imputation approach and surrogate splits had superior performance, with the proposed multiple imputation approach performing best in the more severe missing data conditions. We conclude with recommendations for handling missing data in DTs.
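One of the comparison methods named above, multiple imputation with prediction averaging, can be sketched concretely: impute the data m times, fit one tree per imputed dataset, and average the m trees' predictions. This toy sketch (assuming Python with numpy and scikit-learn; the data, the crude random-draw imputation, and all parameter choices are illustrative, not the paper's algorithm):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(1)

# Toy regression data with ~30% MCAR missingness in one predictor.
n = 300
X = rng.normal(size=(n, 2))
y = 2 * X[:, 0] - X[:, 1] + rng.normal(scale=0.5, size=n)
X_miss = X.copy()
X_miss[rng.random(n) < 0.3, 0] = np.nan

m = 10
preds = []
for i in range(m):
    # Stochastic imputation: replace NaNs with random draws from the
    # observed values of the same column.
    Xi = X_miss.copy()
    nan_rows = np.isnan(Xi[:, 0])
    Xi[nan_rows, 0] = rng.choice(Xi[~nan_rows, 0], size=nan_rows.sum())
    tree = DecisionTreeRegressor(max_depth=3, random_state=i).fit(Xi, y)
    preds.append(tree.predict(Xi))

# Prediction averaging: pool the m trees' predictions into one forecast.
y_hat = np.mean(preds, axis=0)
```

Averaging over imputations smooths out the noise any single random fill injects into the tree's splits, which is the motivation for this family of approaches.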


2020 ◽  
Vol 35 (4) ◽  
pp. 589-614
Author(s):  
Melanie-Angela Neuilly ◽  
Ming-Li Hsieh ◽  
Alex Kigerl ◽  
Zachary K. Hamilton

Research on homicide missing data conventionally posits a Missing At Random pattern, despite the relationship between missing data and clearance. The latter, however, cannot be satisfactorily modeled using variables traditionally available in homicide datasets. For this reason, it has been argued that missingness in homicide data instead follows a nonignorable pattern. Hence, the use of multiple imputation strategies, as recommended in the field for ignorable patterns, would pose a threat to the validity of results obtained in that way. This study examines missing data mechanisms using a set of primary data collected in New Jersey. After comparing listwise deletion, multiple imputation, propensity score matching, and log-multiplicative association models, our findings underscore that data in homicide datasets are indeed Missing Not At Random.
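The MCAR-versus-nonignorable distinction at the heart of this study is easy to demonstrate numerically: when missingness depends on the unobserved value itself (MNAR), complete-case estimates are biased even in huge samples. A toy numpy sketch (all values and the clearance-flavored interpretation are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(7)

# Stand-in for a homicide-case attribute (values are illustrative).
x = rng.normal(100, 15, size=10_000)

# MCAR: missingness unrelated to the value itself.
mcar_obs = x[rng.random(x.size) >= 0.4]

# MNAR (nonignorable): larger values are more likely to go missing,
# e.g., because harder-to-clear cases are less completely recorded.
p_miss = 1 / (1 + np.exp(-(x - 100) / 10))  # higher x -> higher missingness
mnar_obs = x[rng.random(x.size) >= p_miss]

print(x.mean(), mcar_obs.mean(), mnar_obs.mean())
```

Under MCAR the observed mean stays on target; under MNAR it is pulled well below the true mean, and no amount of imputation from the observed values alone can recover it, which is the study's warning about applying ignorable-pattern methods to homicide data.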


2020 ◽  
Vol 10 (2) ◽  
pp. 165
Author(s):  
Ryan Vernando Putra ◽  
Muhartini Salim ◽  
Sularsih Anggarawati

Abstract. The purpose of this study was to examine the influence of Self-Relevant Value, Quality Value, and Perceived Informational Utility on Electronic Word-of-Mouth (eWOM) Intention, moderated by Opinion Leadership, among consumers of Bombaru Bars and Restaurant in Bengkulu, Indonesia. The respondents of this study were 17-45 years old. Respondent data were collected with the survey questionnaire provided. After applying the listwise deletion method via Mahalanobis distance in SEM-AMOS, 133 usable questionnaires were available for analysis. Data analysis used Confirmatory Factor Analysis (CFA), assessment of normality, and regression weights. The results indicate that (1) Self-Relevant Value has a positive effect on eWOM Intention; (2) Quality Value has a positive effect on eWOM Intention; (3) Perceived Informational Utility has a positive effect on eWOM Intention; (4) Opinion Leadership moderates the influence of Self-Relevant Value on eWOM Intention; (5) Opinion Leadership moderates the influence of Quality Value on eWOM Intention; and (6) Opinion Leadership moderates the influence of Perceived Informational Utility on eWOM Intention.
Keywords: eWOM Intention, Opinion Leadership, Perceived Informational Utility, Quality Value, Self-Relevant Value.
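The Mahalanobis-distance screening step used above (flag multivariate outliers against a chi-square cutoff, then delete those rows listwise) can be sketched outside AMOS. A toy version assuming Python with numpy and scipy; the data, the planted outliers, and the p < .001 cutoff are illustrative choices, not the study's exact settings:

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(3)

# Toy survey responses (3 correlated items) plus two planted outliers.
X = rng.multivariate_normal(
    [4, 4, 4],
    [[1, .5, .5], [.5, 1, .5], [.5, .5, 1]],
    size=150,
)
X = np.vstack([X, [[20, -5, 18], [-10, 19, -8]]])

mu = X.mean(axis=0)
cov_inv = np.linalg.inv(np.cov(X, rowvar=False))
diff = X - mu
# Squared Mahalanobis distance of each row from the centroid.
d2 = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)

# Flag cases beyond the chi-square cutoff (p < .001, df = number of items),
# then delete the flagged rows listwise.
cutoff = chi2.ppf(0.999, df=X.shape[1])
clean = X[d2 <= cutoff]
```

Unlike a per-item z-score screen, the Mahalanobis distance accounts for the items' correlations, so it also catches response patterns that are individually plausible but jointly inconsistent.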


2020 ◽  
Vol 5 (1) ◽  
pp. e000353
Author(s):  
Kathleen E Adair ◽  
Joshua D Patrick ◽  
Eric J Kliber ◽  
Matthew N Peterson ◽  
Seth R Holland

Background: The use of tranexamic acid (TXA) has become increasingly prevalent for hemorrhage prevention in military trauma patients due to its known survival benefits. There is concern about increased venous thromboembolism (VTE) subsequent to receiving TXA. The purpose of this retrospective study was to determine the rate of VTE in severely injured military personnel during Operation Enduring Freedom (2009–2014).
Methods: An analysis of 859 military trauma patients from the 2009–2014 Department of Defense Trauma Registry included subjects with an injury severity score (ISS) >10 and a massive transfusion (MT) (>10 units of blood products in the first 24 hours). Outcomes included a documented VTE (e.g., deep vein thrombosis (DVT) or pulmonary embolism (PE)) during the patient’s hospital course. Comparisons between those who did and did not receive TXA were analyzed using three separate multiple regression analyses employing listwise deletion, systematic replacement, and multiple imputation.
Results: Subjects (n=620) met inclusion criteria, with 27% (n=169) having a documented VTE. A total of 30% of those who received TXA had a documented VTE, 26% of those who did not receive TXA had a documented VTE, and 43% of the sample (264/620) did not have TXA documented as either given or not given. Multiple regression analyses using listwise deletion and systematic replacement of the TXA variable demonstrated no difference in odds of VTE, whereas the multiple imputation analysis demonstrated a 3% increased odds of VTE, a 9.4% increased odds of PE, and an 8.1% decreased odds of DVT with TXA administration.
Discussion: TXA use with an ISS >10 and MT resuscitation was associated with a 3% increased odds of VTE and an increased odds of PE, whereas the odds of DVT were found to be decreased after multiple imputation analysis. Further research on the long-term risks and benefits of TXA use in the military population is recommended.
Level of evidence: IV (therapeutic).


2019 ◽  
Author(s):  
Tabea Kossen ◽  
Michelle Livne ◽  
Vince I Madai ◽  
Ivana Galinovic ◽  
Dietmar Frey ◽  
...  

Abstract
Background and purpose: Handling missing values is a prevalent challenge in the analysis of clinical data. The rise of data-driven models demands an efficient use of the available data, so methods to impute missing values are crucial. Here, we developed a publicly available framework to test different imputation methods and compared their impact in a typical clinical stroke dataset as a use case.
Methods: A clinical dataset based on the 1000Plus stroke study, with 380 patients having complete entries, was used. Thirteen common clinical parameters, including numerical and categorical values, were selected. Missing values were simulated in a missing-at-random (MAR) and a missing-completely-at-random (MCAR) fashion, from 0% to 60%, and subsequently imputed using the mean, hot-deck, multiple imputation by chained equations, and expectation-maximization methods, as well as listwise deletion. Performance was assessed by the root mean squared error, the absolute bias, and the performance of a linear model for discharge mRS prediction.
Results: Listwise deletion was the worst-performing method and became significantly worse than every imputation method from 2% (MAR) and 3% (MCAR) missing values onward. The underlying missing value mechanism appeared to have a crucial influence on which imputation method performed best; consequently, no single imputation method outperformed all others. A significant performance drop of the linear model started from 11% (MAR+MCAR) and 18% (MCAR) missing values.
Conclusions: In the presented case study of a typical clinical stroke dataset, we confirmed that listwise deletion should be avoided when dealing with missing values. Our findings indicate that the underlying missing value mechanism and other dataset characteristics strongly influence the best choice of imputation method. For future studies with a similar data structure, we therefore suggest using the framework developed in this study to select the most suitable imputation method for a given dataset prior to analysis.
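The evaluation loop such a framework runs can be sketched in miniature: start from complete data, simulate MCAR missingness at several rates, impute, and score the imputations by RMSE against the known true values while tracking how many rows listwise deletion would leave. A toy numpy version (the matrix size echoes the study's 380 patients, but the data, the mean-imputation baseline, and the rates are illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)

# Complete toy "clinical" matrix: 380 patients x 5 numeric parameters.
true = rng.normal(0, 1, size=(380, 5))

for frac in (0.05, 0.20, 0.40):
    # Simulate MCAR missingness at the given rate.
    mask = rng.random(true.shape) < frac
    data = true.copy()
    data[mask] = np.nan

    # Mean imputation, scored by RMSE against the known true values.
    col_means = np.nanmean(data, axis=0)
    imputed = np.where(mask, col_means, data)
    rmse = np.sqrt(np.mean((imputed[mask] - true[mask]) ** 2))

    # Listwise deletion discards whole rows instead of estimating values.
    rows_left = (~mask.any(axis=1)).sum()
    print(f"{frac:.0%} missing: RMSE(mean imp) = {rmse:.2f}, "
          f"rows left after listwise deletion = {rows_left}")
```

Even this crude baseline shows the study's central point: imputation error grows gently with the missingness rate, while the complete-case sample collapses, since a row survives listwise deletion only if all of its parameters are observed.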


Author(s):  
Chanintorn Jittawiriyanukoon

<span>Multiple Regression-Based Prediction (MRBP) is an emerging analysis technique for predicting future outcomes from compiled historical data. The MRBP characteristic includes an approximation of the associations between physical observations and predictions. MRBP is a predictive model and thus an important source of knowledge about trends to be followed in the future. However, the MRBP dataset can be impaired: missing and noisy data cause errors and make records unavailable for further analysis. To overcome this unavailability, so that the data analytics can proceed, two treatment approaches are introduced. First, the given dataset is denoised; next, listwise deletion (LD) is applied to handle the missing data. The performance of the proposed technique is investigated on datasets that could otherwise not be processed. Using the Massive Online Analysis (MOA) software, the proposed model is evaluated and the results are summarized. Performance metrics such as mean squared error (MSE), correlation coefficient (COEF), mean absolute error (MAE), root mean squared error (RMSE), and the average error percentage are used to validate the proposed mechanism. The proposed LD projection is confirmed against actual values. The proposed LD outperforms other treatments as it requires less state space, which reflects a low computation cost, and it proves its capability to overcome the limitations of the analysis.</span>
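The two-step treatment described above (denoise first, then listwise deletion) can be sketched outside MOA. A toy pandas version, assuming a simple robust rule for the denoising step: values beyond three IQRs of the column median are treated as noise and marked missing (this particular filter is my illustrative choice, not the paper's):

```python
import numpy as np
import pandas as pd

# Toy history of observations with a noise spike (250.0) and gaps.
df = pd.DataFrame({
    "x": [1.0, 2.0, 2.1, 250.0, 3.0, np.nan, 4.0],
    "y": [1.1, 2.2, 2.0, 2.9, np.nan, 3.5, 4.2],
})

# Step 1: denoise -- mark values beyond 3 IQRs of the median as missing.
for col in df:
    q1, q3 = df[col].quantile([0.25, 0.75])
    med = df[col].median()
    df.loc[(df[col] - med).abs() > 3 * (q3 - q1), col] = np.nan

# Step 2: listwise deletion (LD) -- drop any row still containing a gap.
clean = df.dropna()
```

Running denoising first matters: the noise spike is converted into an ordinary missing value, so a single LD pass then removes both noisy and incomplete records before the regression is fit.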

