missing data
Recently Published Documents





2022 ◽  
Vol 9 (3) ◽  
pp. 0-0

Missing data is universal complexity for most part of the research fields which introduces the part of uncertainty into data analysis. We can take place due to many types of motives such as samples mishandling, unable to collect an observation, measurement errors, aberrant value deleted, or merely be short of study. The nourishment area is not an exemption to the difficulty of data missing. Most frequently, this difficulty is determined by manipulative means or medians from the existing datasets which need improvements. The paper proposed hybrid schemes of MICE and ANN known as extended ANN to search and analyze the missing values and perform imputations in the given dataset. The proposed mechanism is efficiently able to analyze the blank entries and fill them with proper examining their neighboring records in order to improve the accuracy of the dataset. In order to validate the proposed scheme, the extended ANN is further compared against various recent algorithms or mechanisms to analyze the efficiency as well as the accuracy of the results.

2022 ◽  
Vol 16 (4) ◽  
pp. 1-24
Kui Yu ◽  
Yajing Yang ◽  
Wei Ding

Causal feature selection aims at learning the Markov blanket (MB) of a class variable for feature selection. The MB of a class variable implies the local causal structure among the class variable and its MB and all other features are probabilistically independent of the class variable conditioning on its MB, this enables causal feature selection to identify potential causal features for feature selection for building robust and physically meaningful prediction models. Missing data, ubiquitous in many real-world applications, remain an open research problem in causal feature selection due to its technical complexity. In this article, we discuss a novel multiple imputation MB (MimMB) framework for causal feature selection with missing data. MimMB integrates Data Imputation with MB Learning in a unified framework to enable the two key components to engage with each other. MB Learning enables Data Imputation in a potentially causal feature space for achieving accurate data imputation, while accurate Data Imputation helps MB Learning identify a reliable MB of the class variable in turn. Then, we further design an enhanced kNN estimator for imputing missing values and instantiate the MimMB. In our comprehensively experimental evaluation, our new approach can effectively learn the MB of a given variable in a Bayesian network and outperforms other rival algorithms using synthetic and real-world datasets.

2022 ◽  
Vol 421 ◽  
pp. 126915
A.S.M. Bakibillah ◽  
Yong Hwa Tan ◽  
Junn Yong Loo ◽  
Chee Pin Tan ◽  
M.A.S. Kamal ◽  

2022 ◽  
pp. annrheumdis-2021-221477
Denis Mongin ◽  
Kim Lauper ◽  
Axel Finckh ◽  
Thomas Frisell ◽  
Delphine Sophie Courvoisier

ObjectivesTo assess the performance of statistical methods used to compare the effectiveness between drugs in an observational setting in the presence of attrition.MethodsIn this simulation study, we compared the estimations of low disease activity (LDA) at 1 year produced by complete case analysis (CC), last observation carried forward (LOCF), LUNDEX, non-responder imputation (NRI), inverse probability weighting (IPW) and multiple imputations of the outcome. All methods were adjusted for confounders. The reasons to stop the treatments were included in the multiple imputation method (confounder-adjusted response rate with attrition correction, CARRAC) and were either included (IPW2) or not (IPW1) in the IPW method. A realistic simulation data set was generated from a real-world data collection. The amount of missing data caused by attrition and its dependence on the ‘true’ value of the data missing were varied to assess the robustness of each method to these changes.ResultsLUNDEX and NRI strongly underestimated the absolute LDA difference between two treatments, and their estimates were highly sensitive to the amount of attrition. IPW1 and CC overestimated the absolute LDA difference between the two treatments and the overestimation increased with increasing attrition or when missingness depended on disease activity at 1 year. IPW2 and CARRAC produced unbiased estimations, but IPW2 had a greater sensitivity to the missing pattern of data and the amount of attrition than CARRAC.ConclusionsOnly multiple imputation and IPW2, which considered both confounding and treatment cessation reasons, produced accurate comparative effectiveness estimates.

2022 ◽  
Vol 2022 ◽  
pp. 1-8
Xiaoying Lv ◽  
Ruonan Zhao ◽  
Tongsheng Su ◽  
Liyun He ◽  
Rui Song ◽  

Objective. To explore the optimal fitting path of missing data of the Scale to make the fitting data close to the real situation of patients’ data. Methods. Based on the complete data set of the SDS of 507 patients with stroke, the data simulation sets of Missing Completely at Random (MCAR), Missing at Random (MAR), and Missing Not at Random (MNAR) were constructed by R software, respectively, with missing rates of 5%, 10%, 15%, 20%, 25%, 30%, 35%, and 40% under three missing mechanisms. Mean substitution (MS), random forest regression (RFR), and predictive mean matching (PMM) were used to fit the data. Root mean square error (RMSE), the width of 95% confidence intervals (95% CI), and Spearman correlation coefficient (SCC) were used to evaluate the fitting effect and determine the optimal fitting path. Results. when dealing with the problem of missing data in scales, the optimal fitting path is ① under the MCAR deletion mechanism, when the deletion proportion is less than 20%, the MS method is the most convenient; when the missing ratio is greater than 20%, RFR algorithm is the best fitting method. ② Under the Mar mechanism, when the deletion ratio is less than 35%, the MS method is the most convenient. When the deletion ratio is greater than 35%, RFR has a better correlation. ③ Under the mechanism of MNAR, RFR is the best data fitting method, especially when the missing proportion is greater than 30%. In reality, when the deletion ratio is small, the complete case deletion method is the most commonly used, but the RFR algorithm can greatly expand the application scope of samples and save the cost of clinical research when the deletion ratio is less than 30%. The best way to deal with data missing should be based on the missing mechanism and proportion of actual data, and choose the best method between the statistical analysis ability of the research team, the effectiveness of the method, and the understanding of readers.

2022 ◽  
Vol 22 (1) ◽  
Huimin Wang ◽  
Jianxiang Tang ◽  
Mengyao Wu ◽  
Xiaoyu Wang ◽  
Tao Zhang

Abstract Background There are often many missing values in medical data, which directly affect the accuracy of clinical decision making. Discharge assessment is an important part of clinical decision making. Taking the discharge assessment of patients with spontaneous supratentorial intracerebral hemorrhage as an example, this study adopted the missing data processing evaluation criteria more suitable for clinical decision making, aiming at systematically exploring the performance and applicability of single machine learning algorithms and ensemble learning (EL) under different data missing scenarios, as well as whether they had more advantages than traditional methods, so as to provide basis and reference for the selection of suitable missing data processing method in practical clinical decision making. Methods The whole process consisted of four main steps: (1) Based on the original complete data set, missing data was generated by simulation under different missing scenarios (missing mechanisms, missing proportions and ratios of missing proportions of each group). (2) Machine learning and traditional methods (eight methods in total) were applied to impute missing values. (3) The performances of imputation techniques were evaluated and compared by estimating the sensitivity, AUC and Kappa values of prediction models. (4) Statistical tests were used to evaluate whether the observed performance differences were statistically significant. Results The performances of missing data processing methods were different to a certain extent in different missing scenarios. On the whole, machine learning had better imputation performance than traditional methods, especially in scenarios with high missing proportions. Compared with single machine learning algorithms, the performance of EL was more prominent, followed by neural networks. Meanwhile, EL was most suitable for missing imputation under MAR (the ratio of missing proportion 2:1) mechanism, and its average sensitivity, AUC and Kappa values reached 0.908, 0.924 and 0.596 respectively. Conclusions In clinical decision making, the characteristics of missing data should be actively explored before formulating missing data processing strategies. The outstanding imputation performance of machine learning methods, especially EL, shed light on the development of missing data processing technology, and provided methodological support for clinical decision making in presence of incomplete data.

Rie Toyomoto ◽  
Satoshi Funada ◽  
Toshi A. Furukawa

Ryan J. Van Lieshout ◽  
Calan Savoy ◽  
Steven Hanna

Sign in / Sign up

Export Citation Format

Share Document