Missing Data: Latest Research Papers

Futuristic Prediction of Missing Value Imputation Methods Using Extended ANN

International Journal of Business Analytics ◽

10.4018/ijban.292055 ◽

2022 ◽

Vol 9 (3) ◽

pp. 0-0

Keyword(s):

Data Analysis ◽

Missing Data ◽

Measurement Errors ◽

Missing Values ◽

Missing Value ◽

Hybrid Schemes ◽

Imputation Methods ◽

Research Fields ◽

Data Missing ◽

The Given

Missing data is a universal complication across most research fields and introduces uncertainty into data analysis. It can arise for many reasons, such as mishandled samples, failure to collect an observation, measurement errors, deletion of aberrant values, or simply an incomplete study. The nutrition field is no exception to the problem of missing data. Most often the problem is handled by substituting means or medians computed from the existing dataset, an approach that needs improvement. The paper proposes a hybrid scheme of MICE and ANN, called extended ANN, to locate and analyze missing values and perform imputation in a given dataset. The proposed mechanism analyzes blank entries and fills them by examining their neighboring records in order to improve the accuracy of the dataset. To validate the proposed scheme, the extended ANN is compared against various recent algorithms to assess the efficiency and accuracy of its results.
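The mean or median substitution baseline mentioned in this abstract can be sketched as follows. This is only a minimal illustration of that baseline, not the paper's extended ANN or MICE; the function name `impute_column` and the sample values are hypothetical.

```python
from statistics import mean, median


def impute_column(values, strategy="mean"):
    """Replace None entries with the mean or median of the observed entries."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("column has no observed values to impute from")
    fill = mean(observed) if strategy == "mean" else median(observed)
    return [fill if v is None else v for v in values]


# Hypothetical column with two blanks and one extreme value.
column = [2.0, None, 4.0, 100.0, None]
mean_filled = impute_column(column, "mean")      # blanks become 106/3
median_filled = impute_column(column, "median")  # blanks become 4.0
```

The extreme value pulls the mean fill far above the typical entries while the median fill stays at 4.0, which is one reason a single summary statistic is often considered a weak imputation strategy.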

Causal Feature Selection with Missing Data

ACM Transactions on Knowledge Discovery from Data ◽

10.1145/3488055 ◽

2022 ◽

Vol 16 (4) ◽

pp. 1-24

Author(s):

Kui Yu ◽

Yajing Yang ◽

Wei Ding

Keyword(s):

Feature Selection ◽

Missing Data ◽

Real World ◽

Missing Values ◽

Prediction Models ◽

Causal Structure ◽

Data Imputation ◽

Accurate Data ◽

Unified Framework ◽

Class Variable

Causal feature selection aims at learning the Markov blanket (MB) of a class variable for feature selection. The MB of a class variable implies the local causal structure among the class variable and its MB, and all other features are probabilistically independent of the class variable conditional on its MB. This enables causal feature selection to identify potential causal features for building robust and physically meaningful prediction models. Missing data, ubiquitous in many real-world applications, remain an open research problem in causal feature selection owing to their technical complexity. In this article, we discuss a novel multiple imputation MB (MimMB) framework for causal feature selection with missing data. MimMB integrates Data Imputation with MB Learning in a unified framework so that the two key components can engage with each other. MB Learning enables Data Imputation in a potentially causal feature space for achieving accurate data imputation, while accurate Data Imputation in turn helps MB Learning identify a reliable MB of the class variable. We further design an enhanced kNN estimator for imputing missing values and instantiate the MimMB. In our comprehensive experimental evaluation, the new approach effectively learns the MB of a given variable in a Bayesian network and outperforms rival algorithms on synthetic and real-world datasets.
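For readers unfamiliar with kNN imputation, a plain version can be sketched as below. It is not the enhanced kNN estimator or the MimMB framework from the article; the function `knn_impute`, its distance rule (root mean squared difference over features observed in both rows), and the sample data are assumptions made for illustration.

```python
import math


def knn_impute(rows, k=2):
    """Fill None entries with the mean of that feature over the k nearest
    rows in which the feature is observed."""

    def distance(a, b):
        # Compare only features that are observed in both rows.
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not shared:
            return math.inf
        return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    completed = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Candidate donors: other rows with feature j observed.
            donors = sorted(
                (distance(row, other), other[j])
                for m, other in enumerate(rows)
                if m != i and other[j] is not None
            )
            donors = [d for d in donors if d[0] != math.inf][:k]
            if donors:  # leave the entry as None when no donor is comparable
                completed[i][j] = sum(v for _, v in donors) / len(donors)
    return completed


# Hypothetical data: the last row is missing its third feature.
data = [
    [1.0, 2.0, 10.0],
    [1.1, 2.1, 12.0],
    [9.0, 9.0, 50.0],
    [1.0, 2.0, None],
]
filled = knn_impute(data, k=2)
```

Here the two closest rows to the incomplete record are the first two, so the blank is filled with the average of 10.0 and 12.0. In the article's setting the neighbor search would be restricted to a learned, potentially causal feature space, which this sketch does not attempt.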

Penalized Regression for Multiple Types of Many Features With Missing Data

Statistica Sinica ◽

10.5705/ss.202020.0401 ◽

2023 ◽

Author(s):

Kin Yau Wong ◽

Donglin Zeng ◽

Danyu Lin

Keyword(s):

Missing Data ◽

Penalized Regression

Robust estimation of traffic density with missing data using an adaptive-R extended Kalman filter

Applied Mathematics and Computation ◽

10.1016/j.amc.2022.126915 ◽

2022 ◽

Vol 421 ◽

pp. 126915

Author(s):

A.S.M. Bakibillah ◽

Yong Hwa Tan ◽

Junn Yong Loo ◽

Chee Pin Tan ◽

M.A.S. Kamal ◽

...

Keyword(s):

Kalman Filter ◽

Missing Data ◽

Extended Kalman Filter ◽

Robust Estimation ◽

Traffic Density

Accounting for missing data caused by drug cessation in observational comparative effectiveness research: a simulation study

Annals of the Rheumatic Diseases ◽

10.1136/annrheumdis-2021-221477 ◽

2022 ◽

pp. annrheumdis-2021-221477

Author(s):

Denis Mongin ◽

Kim Lauper ◽

Axel Finckh ◽

Thomas Frisell ◽

Delphine Sophie Courvoisier

Keyword(s):

Missing Data ◽

Disease Activity ◽

Multiple Imputation ◽

Simulation Study ◽

Comparative Effectiveness ◽

Real World Data ◽

Data Set ◽

Multiple Imputations ◽

True Value ◽

The Absolute

Objectives. To assess the performance of statistical methods used to compare the effectiveness between drugs in an observational setting in the presence of attrition. Methods. In this simulation study, we compared the estimations of low disease activity (LDA) at 1 year produced by complete case analysis (CC), last observation carried forward (LOCF), LUNDEX, non-responder imputation (NRI), inverse probability weighting (IPW) and multiple imputations of the outcome. All methods were adjusted for confounders. The reasons to stop the treatments were included in the multiple imputation method (confounder-adjusted response rate with attrition correction, CARRAC) and were either included (IPW2) or not (IPW1) in the IPW method. A realistic simulation data set was generated from a real-world data collection. The amount of missing data caused by attrition and its dependence on the ‘true’ value of the data missing were varied to assess the robustness of each method to these changes. Results. LUNDEX and NRI strongly underestimated the absolute LDA difference between two treatments, and their estimates were highly sensitive to the amount of attrition. IPW1 and CC overestimated the absolute LDA difference between the two treatments and the overestimation increased with increasing attrition or when missingness depended on disease activity at 1 year. IPW2 and CARRAC produced unbiased estimations, but IPW2 had a greater sensitivity to the missing pattern of data and the amount of attrition than CARRAC. Conclusions. Only multiple imputation and IPW2, which considered both confounding and treatment cessation reasons, produced accurate comparative effectiveness estimates.
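Three of the simpler approaches named in this abstract (CC, LOCF and NRI) can be illustrated on a toy cohort. This is a rough sketch without confounder adjustment, so it does not reproduce the study's estimators; the visit records are invented, and `None` is assumed to mark a visit missed after treatment cessation.

```python
def complete_case_rate(patients):
    """LDA rate among patients whose final (1-year) visit was observed."""
    observed = [visits[-1] for visits in patients if visits[-1] is not None]
    return sum(observed) / len(observed)


def locf_rate(patients):
    """LDA rate after carrying each patient's last observed status forward."""
    outcomes = []
    for visits in patients:
        seen = [v for v in visits if v is not None]
        if seen:
            outcomes.append(seen[-1])
    return sum(outcomes) / len(outcomes)


def nri_rate(patients):
    """LDA rate when every patient without a final visit counts as a non-responder."""
    return sum(1 for visits in patients if visits[-1] is True) / len(patients)


# Hypothetical cohort: each list holds LDA status at successive visits.
patients = [
    [False, True, True],    # completed follow-up, in LDA at 1 year
    [False, False, False],  # completed follow-up, not in LDA
    [True, None, None],     # stopped treatment after the first visit
    [False, None, None],    # stopped treatment after the first visit
    [False, True, None],    # stopped treatment before the final visit
]

cc = complete_case_rate(patients)  # 0.5
locf = locf_rate(patients)         # 0.6
nri = nri_rate(patients)           # 0.2
```

Even on five invented records the three rules give different response rates, which shows why the choice of attrition handling matters when comparing treatments.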

The Fitting Optimization Path Analysis on Scale Missing Data: Based on the 507 Patients of Poststroke Depression Measured by SDS

Evidence-based Complementary and Alternative Medicine ◽

10.1155/2022/5630748 ◽

2022 ◽

Vol 2022 ◽

pp. 1-8

Author(s):

Xiaoying Lv ◽

Ruonan Zhao ◽

Tongsheng Su ◽

Liyun He ◽

Rui Song ◽

...

Keyword(s):

Missing Data ◽

Missing At Random ◽

Data Simulation ◽

Random Forest Regression ◽

Data Set ◽

Missing Completely At Random ◽

Predictive Mean Matching ◽

Fitting Method ◽

Best Fitting ◽

The Cost

Objective. To explore the optimal fitting path for missing scale data so that the fitted data are close to the patients’ real data. Methods. Based on the complete SDS data set of 507 patients with stroke, simulated data sets under Missing Completely at Random (MCAR), Missing at Random (MAR), and Missing Not at Random (MNAR) were constructed in R, with missing rates of 5%, 10%, 15%, 20%, 25%, 30%, 35%, and 40% under each of the three missing mechanisms. Mean substitution (MS), random forest regression (RFR), and predictive mean matching (PMM) were used to fit the data. Root mean square error (RMSE), the width of the 95% confidence interval (95% CI), and the Spearman correlation coefficient (SCC) were used to evaluate the fitting effect and determine the optimal fitting path. Results. When dealing with missing data in scales, the optimal fitting path is: ① Under the MCAR mechanism, when the missing proportion is less than 20%, the MS method is the most convenient; when the missing proportion is greater than 20%, the RFR algorithm is the best fitting method. ② Under the MAR mechanism, when the missing proportion is less than 35%, the MS method is the most convenient; when it is greater than 35%, RFR has a better correlation. ③ Under the MNAR mechanism, RFR is the best data fitting method, especially when the missing proportion is greater than 30%. In practice, when the missing proportion is small, complete case deletion is the most commonly used method, but the RFR algorithm can greatly expand the usable sample and save clinical research costs when the missing proportion is less than 30%. The best way to deal with missing data should be based on the missing mechanism and proportion of the actual data, weighing the statistical analysis ability of the research team, the effectiveness of the method, and the understanding of readers.
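The simulate-then-score procedure described in this abstract can be sketched for the simplest case: MCAR masking, mean substitution, and RMSE over the masked entries. The study itself used R and also covered MAR, MNAR, RFR and PMM, none of which appear here; the function names, seeds, and randomly generated scores are assumptions, not the study's data.

```python
import math
import random


def make_mcar(values, rate, seed=0):
    """Mask a fixed share of entries completely at random."""
    rng = random.Random(seed)
    n_missing = round(len(values) * rate)
    missing_idx = set(rng.sample(range(len(values)), n_missing))
    masked = [None if i in missing_idx else v for i, v in enumerate(values)]
    return masked, missing_idx


def mean_substitution(values):
    """Fill None entries with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    fill = sum(observed) / len(observed)
    return [fill if v is None else v for v in values]


def rmse(truth, imputed, idx):
    """Root mean square error over the entries that were masked."""
    return math.sqrt(sum((truth[i] - imputed[i]) ** 2 for i in idx) / len(idx))


# Hypothetical complete data: 507 invented integer scores.
score_rng = random.Random(42)
scores = [score_rng.randint(20, 80) for _ in range(507)]

results = {}
for rate in (0.05, 0.20, 0.40):
    masked, idx = make_mcar(scores, rate, seed=1)
    results[rate] = rmse(scores, mean_substitution(masked), idx)
```

Because the true values are known before masking, the RMSE measures how far the imputed entries fall from them; repeating the loop with other imputers or missing mechanisms gives the kind of comparison the abstract reports.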

Application of machine learning missing data imputation techniques in clinical decision making: taking the discharge assessment of patients with spontaneous supratentorial intracerebral hemorrhage as an example

BMC Medical Informatics and Decision Making ◽

10.1186/s12911-022-01752-6 ◽

2022 ◽

Vol 22 (1) ◽

Author(s):

Huimin Wang ◽

Jianxiang Tang ◽

Mengyao Wu ◽

Xiaoyu Wang ◽

Tao Zhang

Keyword(s):

Machine Learning ◽

Decision Making ◽

Missing Data ◽

Intracerebral Hemorrhage ◽

Data Processing ◽

Missing Values ◽

Clinical Decision Making ◽

Clinical Decision ◽

Machine Learning Algorithms ◽

Traditional Methods

Background. There are often many missing values in medical data, which directly affect the accuracy of clinical decision making. Discharge assessment is an important part of clinical decision making. Taking the discharge assessment of patients with spontaneous supratentorial intracerebral hemorrhage as an example, this study adopted the missing data processing evaluation criteria more suitable for clinical decision making, aiming at systematically exploring the performance and applicability of single machine learning algorithms and ensemble learning (EL) under different data missing scenarios, as well as whether they had more advantages than traditional methods, so as to provide basis and reference for the selection of suitable missing data processing method in practical clinical decision making. Methods. The whole process consisted of four main steps: (1) Based on the original complete data set, missing data was generated by simulation under different missing scenarios (missing mechanisms, missing proportions and ratios of missing proportions of each group). (2) Machine learning and traditional methods (eight methods in total) were applied to impute missing values. (3) The performances of imputation techniques were evaluated and compared by estimating the sensitivity, AUC and Kappa values of prediction models. (4) Statistical tests were used to evaluate whether the observed performance differences were statistically significant. Results. The performances of missing data processing methods were different to a certain extent in different missing scenarios. On the whole, machine learning had better imputation performance than traditional methods, especially in scenarios with high missing proportions. Compared with single machine learning algorithms, the performance of EL was more prominent, followed by neural networks. Meanwhile, EL was most suitable for missing imputation under MAR (the ratio of missing proportion 2:1) mechanism, and its average sensitivity, AUC and Kappa values reached 0.908, 0.924 and 0.596 respectively. Conclusions. In clinical decision making, the characteristics of missing data should be actively explored before formulating missing data processing strategies. The outstanding imputation performance of machine learning methods, especially EL, shed light on the development of missing data processing technology, and provided methodological support for clinical decision making in presence of incomplete data.
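Two of the evaluation measures named in this abstract, sensitivity and Cohen's Kappa, follow standard definitions and can be computed as in the sketch below. The labels are invented for illustration and have no connection to the study's data or its reported values; AUC is omitted for brevity.

```python
def sensitivity(y_true, y_pred):
    """True positive rate: TP / (TP + FN), with 1 as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn)


def cohen_kappa(y_true, y_pred):
    """Agreement between two label lists, corrected for chance agreement."""
    n = len(y_true)
    observed = sum(1 for t, p in zip(y_true, y_pred) if t == p) / n
    labels = set(y_true) | set(y_pred)
    expected = sum(
        (y_true.count(label) / n) * (y_pred.count(label) / n) for label in labels
    )
    return (observed - expected) / (1 - expected)


# Hypothetical outcomes (1 = positive class) and model predictions.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]

sens = sensitivity(y_true, y_pred)   # 3 of 4 positives found: 0.75
kappa = cohen_kappa(y_true, y_pred)  # (0.75 - 0.5) / (1 - 0.5) = 0.5
```

In a comparison like the one described, each imputation method would yield its own completed data set, a prediction model would be fitted on it, and these measures would then be computed on that model's predictions.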

Some Concerns About Imputation Methods for Missing Data

JAMA Psychiatry ◽

10.1001/jamapsychiatry.2021.3894 ◽

2022 ◽

Author(s):

Rie Toyomoto ◽

Satoshi Funada ◽

Toshi A. Furukawa

Keyword(s):

Missing Data ◽

Imputation Methods

Performance of Model Fit and Selection Indices for Bayesian Structural Equation Modeling with Missing Data

Structural Equation Modeling A Multidisciplinary Journal ◽

10.1080/10705511.2021.2018656 ◽

2022 ◽

pp. 1-19

Author(s):

Sonja D. Winter ◽

Sarah Depaoli

Keyword(s):

Structural Equation Modeling ◽

Missing Data ◽

Structural Equation ◽

Model Fit ◽

Equation Modeling ◽

Selection Indices ◽

Bayesian Structural Equation Modeling

Some Concerns About Imputation Methods for Missing Data—Reply

JAMA Psychiatry ◽

10.1001/jamapsychiatry.2021.3897 ◽

2022 ◽

Author(s):

Ryan J. Van Lieshout ◽

Calan Savoy ◽

Steven Hanna

Keyword(s):

Missing Data ◽

Imputation Methods
