Imputation Methods: Latest Research Papers

Missing data is a pervasive problem in most research fields, and it introduces uncertainty into data analysis. Values can go missing for many reasons, such as mishandled samples, an observation that could not be collected, measurement errors, deletion of aberrant values, or simply a shortfall in the study. The nutrition domain is no exception. Most frequently, missing entries are handled by filling them with means or medians computed from the available data, an approach that leaves room for improvement. The paper proposes a hybrid of MICE (multiple imputation by chained equations) and an ANN, called extended ANN, to detect and analyze the missing values and impute them in the given dataset. The proposed mechanism efficiently analyzes the blank entries and fills them by examining neighboring records, improving the accuracy of the dataset. To validate the scheme, the extended ANN is compared against several recent algorithms in terms of both efficiency and accuracy.
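The abstract contrasts filling gaps with column means or medians against filling them from neighboring records. The sketch below illustrates that contrast with a plain k-nearest-neighbor imputer in pure Python. It is not the paper's extended ANN or its MICE component; the helper names `fill_with_statistic` and `knn_impute`, the RMS distance over shared columns, and the toy data are assumptions made for illustration.

```python
from math import sqrt
from statistics import mean

# Hypothetical helpers for illustration only; NOT the paper's extended ANN.
# Each row is a list of floats; None marks a missing entry.

def fill_with_statistic(rows, stat=mean):
    """Baseline: replace every missing entry with a column statistic (mean by default)."""
    n_cols = len(rows[0])
    fills = [stat([r[c] for r in rows if r[c] is not None]) for c in range(n_cols)]
    return [[fills[c] if v is None else v for c, v in enumerate(r)] for r in rows]


def knn_impute(rows, k=2):
    """Replace each missing entry with the average of that column over the k
    nearest rows that observe it. Distance is an RMS difference computed only
    over columns observed in both rows (assumes at least one such donor row)."""
    out = [list(r) for r in rows]  # copy so the input is not mutated
    for i, row in enumerate(rows):
        for c, v in enumerate(row):
            if v is not None:
                continue
            candidates = []
            for j, other in enumerate(rows):
                if j == i or other[c] is None:
                    continue
                shared = [(a, b) for a, b in zip(row, other)
                          if a is not None and b is not None]
                if not shared:
                    continue
                d = sqrt(sum((a - b) ** 2 for a, b in shared) / len(shared))
                candidates.append((d, other[c]))
            candidates.sort(key=lambda t: t[0])
            out[i][c] = mean(val for _, val in candidates[:k])
    return out


# Toy data (illustrative values only): rows 1 and 3 each have one gap.
data = [
    [1.0, 2.0, 3.0],
    [1.1, None, 3.1],
    [5.0, 6.0, 7.0],
    [1.2, 2.2, None],
]
by_mean = fill_with_statistic(data)
by_knn = knn_impute(data, k=2)
print(by_mean[1][1], by_knn[1][1])
```

Here the outlying third row drags the column mean for row 1 up to about 3.4, while the neighbor-based fill uses only the two similar rows and gives about 2.1. Passing `statistics.median` as `stat` gives the median baseline.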

Imputation by feature importance (IBFI): A methodology to envelop machine learning method for imputing missing patterns in time series data

PLoS ONE, 2022, Vol 17 (1), pp. e0262131
DOI: 10.1371/journal.pone.0262131

Author(s): Adil Aslam Mir, Kimberlee Jane Kearfott, Fatih Vehbi Çelebi, Muhammad Rafique

Keyword(s): Machine Learning, Time Series Data, Mean Squared Error, Learning Algorithm, Series Data, Machine Learning Method, Learning Method, Imputation Methods, Squared Error, Feature Importance

A new methodology, imputation by feature importance (IBFI), is studied; it can be wrapped around any machine learning method to fill in missing or irregularly sampled data efficiently. It applies to data missing completely at random (MCAR), missing at random (MAR), and missing not at random (MNAR). IBFI uses feature importance and iteratively imputes missing values with any base learning algorithm. In this work, IBFI is tested on soil radon gas concentration (SRGC) data, with XGBoost as the learning algorithm and missing data simulated in R for different missingness scenarios. IBFI rests on the physically meaningful assumption that SRGC depends on environmental parameters such as temperature and relative humidity. Under this assumption, a model is fitted on the complete part of the multivariate series, where the controls (predictors) are available, with the attribute of interest as the response variable. IBFI is compared against other frequently used imputation methods, namely mean, median, mode, predictive mean matching (PMM), and hot-deck procedures. The performance of the different imputation methods was assessed using root mean squared error (RMSE), mean squared log error (MSLE), mean absolute percentage error (MAPE), percent bias (PB), and mean squared error (MSE). Imputation requires more care when multiple variables are missing in different samples, because some of the controls that machine learning methods rely on are then missing; IBFI appears to have an advantage in such circumstances. The radon time series (RTS) data used for testing were collected from 1 March 2017 to 11 May 2018, a period that included four seismic events.
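The evaluation statistics named above can be computed on held-out values that were deliberately masked and then imputed. A minimal pure-Python sketch, assuming the usual textbook definitions; percent bias in particular has more than one sign convention, and the one used here (`100 * sum(pred - obs) / sum(obs)`) is an assumption, as are the function names and the toy values.

```python
from math import log1p, sqrt

def mse(obs, pred):
    """Mean squared error."""
    return sum((p - o) ** 2 for o, p in zip(obs, pred)) / len(obs)

def rmse(obs, pred):
    """Root mean squared error."""
    return sqrt(mse(obs, pred))

def msle(obs, pred):
    """Mean squared log error, using log(1 + x); values must be > -1."""
    return sum((log1p(p) - log1p(o)) ** 2 for o, p in zip(obs, pred)) / len(obs)

def mape(obs, pred):
    """Mean absolute percentage error (%); undefined if any observed value is 0."""
    return 100 * sum(abs((o - p) / o) for o, p in zip(obs, pred)) / len(obs)

def percent_bias(obs, pred):
    """Percent bias under an ASSUMED convention, 100 * sum(pred - obs) / sum(obs):
    positive means the imputations overestimate on aggregate."""
    return 100 * sum(p - o for o, p in zip(obs, pred)) / sum(obs)

# Toy check: true values that were masked vs. the values imputed for them.
true_vals = [10.0, 20.0, 30.0]
imputed = [12.0, 18.0, 30.0]
metrics = [("MSE", mse), ("RMSE", rmse), ("MSLE", msle),
           ("MAPE", mape), ("PB", percent_bias)]
report = {name: f(true_vals, imputed) for name, f in metrics}
print(report)
```

In the toy check the over- and underestimate cancel, so PB is 0 while RMSE and MAPE are not; this is one reason to report several statistics side by side.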

Some Concerns About Imputation Methods for Missing Data

JAMA Psychiatry, 2022
DOI: 10.1001/jamapsychiatry.2021.3894

Author(s): Rie Toyomoto, Satoshi Funada, Toshi A. Furukawa

Keyword(s): Missing Data, Imputation Methods

Some Concerns About Imputation Methods for Missing Data—Reply

JAMA Psychiatry, 2022
DOI: 10.1001/jamapsychiatry.2021.3897

Author(s): Ryan J. Van Lieshout, Calan Savoy, Steven Hanna

Keyword(s): Missing Data, Imputation Methods

Comparison of Missing Data Imputation Methods in Time Series Forecasting

Computers Materials & Continua, 2022, Vol 70 (1), pp. 767-779
DOI: 10.32604/cmc.2022.019369

Author(s): Hyun Ahn, Kyunghee Sun, Kwanghoon Pio Kim

Keyword(s): Time Series, Missing Data, Time Series Forecasting, Data Imputation, Missing Data Imputation, Imputation Methods

Single Imputation Methods and Confidence Intervals for the Gini Index

Mathematics, 2021, Vol 9 (24), pp. 3252
DOI: 10.3390/math9243252

Author(s): Encarnación Álvarez-Verdejo, Pablo J. Moya-Fernández, Juan F. Muñoz-Rosas

Keyword(s): Missing Data, Correlation Coefficient, Confidence Intervals, Gini Index, Imputation Method, Empirical Measures, Imputation Methods, Single Imputation, Regression Imputation, Mean Square Errors

Missing data is a common problem in almost any study, and a single imputation method is often applied to deal with it. The first contribution of this paper is to analyse the empirical performance of some traditional single imputation methods when applied to the estimation of the Gini index, a popular measure of inequality used in many studies. Various methods for constructing confidence intervals for the Gini index are also empirically evaluated. We consider several empirical measures to analyse the performance of estimators and confidence intervals, which allows us to quantify the magnitude of the non-response bias problem. We find extremely large biases under certain non-response mechanisms, and the problem gets noticeably worse as the proportion of missing data increases. When the correlation coefficient between the target and auxiliary variables is large, regression imputation may notably mitigate this bias, yielding appropriate mean square errors. We also find that confidence intervals have poor coverage rates when the probability of data being missing is not uniform, and that regression imputation substantially improves the handling of this problem as the correlation coefficient increases.
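To make the bias discussion concrete, the sketch below compares mean imputation with regression imputation on a toy variable whose auxiliary variable is perfectly correlated with it, with non-response concentrated in the upper tail. The Gini formula is one common form (the paper may use a different estimator), and the function names and data are hypothetical.

```python
from statistics import mean

def gini(values):
    """Gini index in one common (assumed) form: the sum of |x_i - x_j| over
    all ordered pairs, divided by 2 * n^2 * mean."""
    n = len(values)
    total = sum(abs(a - b) for a in values for b in values)
    return total / (2 * n * n * mean(values))

def mean_impute(y):
    """Replace missing targets (None) with the respondents' mean."""
    fill = mean(v for v in y if v is not None)
    return [fill if v is None else v for v in y]

def regression_impute(y, x):
    """Replace missing targets with a + b*x from an OLS fit of y on the
    auxiliary variable x, fitted on respondents only."""
    pairs = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    mx = mean(xi for xi, _ in pairs)
    my = mean(yi for _, yi in pairs)
    sxx = sum((xi - mx) ** 2 for xi, _ in pairs)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in pairs)
    b = sxy / sxx
    a = my - b * mx
    return [a + b * xi if yi is None else yi for xi, yi in zip(x, y)]

# Toy example: y = 2x exactly, and the two largest values did not respond.
x = [1, 2, 3, 4, 5, 6]
y_true = [2, 4, 6, 8, 10, 12]
y_obs = [2, 4, 6, 8, None, None]

print(round(gini(y_true), 4))                       # full data: 0.2778
print(round(gini(mean_impute(y_obs)), 4))           # mean imputation: 0.2
print(round(gini(regression_impute(y_obs, x)), 4))  # regression imputation: 0.2778
```

Mean imputation understates inequality here, while regression imputation recovers the full-data value only because the correlation is perfect; with a weaker auxiliary variable the correction would be partial.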

Identification and Estimation of Graphical Models with Nonignorable Nonresponse

Journal of Mathematics, 2021, Vol 2021, pp. 1-8
DOI: 10.1155/2021/7570222

Author(s): Lingju Chen, Shaoxin Hong, Bo Tang

Keyword(s): Sensitivity Analysis, Graphical Models, Nonlinear Models, Real Data, Finite Sample, Imputation Methods, Observable Variable, The Mean, Nonignorable Nonresponse, Finite Sample Simulations

We study the identification and estimation of graphical models with nonignorable nonresponse. To identify the response mean in an otherwise unidentifiable model, an observable variable correlated with the nonresponse is added. An approach to estimating the marginal mean of the response is proposed, based on simulation imputation methods introduced for a variety of models, including linear, generalized linear, and monotone nonlinear models. The proposed mean estimators are √N-consistent, where N is the sample size. Finite-sample simulations confirm the effectiveness of the proposed method. A sensitivity analysis of the untestable assumption in the augmented model is also conducted, and a real-data example illustrates the use of the proposed methodology.
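Why nonignorable nonresponse is hard can be seen in a short simulation: when the probability of responding depends on the unobserved value itself, the mean over respondents is biased, whereas under MCAR it is not. This only illustrates the problem. It does not implement the paper's identification strategy or its simulation-imputation estimator, and the logistic response model, the helper name `respondent_mean`, and the sample size are assumptions.

```python
import random
from math import exp
from statistics import mean

def respondent_mean(y, respond_prob, rng):
    """Mean of y over the units that respond; unit i responds with
    probability respond_prob(y_i)."""
    return mean(v for v in y if rng.random() < respond_prob(v))

rng = random.Random(42)
y = [rng.gauss(0.0, 1.0) for _ in range(20000)]
full_mean = mean(y)

# MCAR: every unit responds with probability 0.5, independent of y.
mcar_mean = respondent_mean(y, lambda v: 0.5, rng)

# Nonignorable (assumed logistic model): larger values respond more often.
mnar_mean = respondent_mean(y, lambda v: 1.0 / (1.0 + exp(-2.0 * v)), rng)

print(f"full {full_mean:.3f}  MCAR {mcar_mean:.3f}  MNAR {mnar_mean:.3f}")
```

Under MCAR the respondent mean stays close to the full-sample mean, while under the nonignorable mechanism it is shifted upward by roughly 0.6 standard deviations, and no amount of extra data from the same mechanism removes that shift without additional identifying information.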

Comparison of Imputation Methods for End-User Demands in Water Distribution Systems

Journal of Water Resources Planning and Management, 2021, Vol 147 (12), pp. 04021080
DOI: 10.1061/(asce)wr.1943-5452.0001477

Author(s): Sanghoon Jun, Donghwi Jung, Kevin E. Lansey

Keyword(s): Distribution Systems, Water Distribution, Water Distribution Systems, End User, Imputation Methods

A New Multispectral Data Augmentation Technique Based on Data Imputation

Remote Sensing, 2021, Vol 13 (23), pp. 4875
DOI: 10.3390/rs13234875

Author(s): Álvaro Acción, Francisco Argüello, Dora B. Heras

Keyword(s): Remote Sensing, Data Augmentation, Matrix Completion, Computational Cost, High Energy, Classification Performance, Training Data, High Spectral Resolution, Data Imputation, Imputation Methods

Deep Learning (DL) has recently been introduced into hyperspectral and multispectral image classification. Despite the success of DL in remote sensing, DL models are computationally intensive because of the large number of parameters they must learn. The high density of information in remote sensing imagery with high spectral resolution can make applying DL models to large scenes challenging. Methods such as patch-based classification require large amounts of data to be processed during training and prediction, which translates into long processing times and high energy consumption. One way to reduce the computational cost of these models is segment-based classification. Segment-based classification schemes can significantly decrease training and prediction times, and they also offer advantages over simply shrinking the training dataset by random sampling. However, the lack of a sufficiently large number of samples can pose an additional challenge, causing these models to generalize poorly. Data augmentation methods generate new synthetic samples from existing data to increase classification performance. In this work, we propose a new data augmentation scheme for segment-based classification that uses data imputation and matrix completion methods. The proposal has been validated on two high-resolution multispectral datasets from the literature. The results show that the proposed approach increases classification performance across all the scenes tested and that data imputation methods applied to multispectral imagery are a valid means of performing data augmentation. A comparison of classification accuracy between different imputation methods applied to the proposed scheme was also carried out.
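The general idea of augmenting data through imputation can be sketched as: copy a real sample, hide some of its values, and let an imputer fill them back in, so the refilled sample is new but plausible. The abstract does not say which imputation or matrix completion methods the authors use, so this sketch swaps in a simple nearest-neighbor (hot-deck style) fill within one class. The function `augment_by_imputation`, its parameters, and the toy band values are hypothetical.

```python
import random
from math import dist

def augment_by_imputation(samples, n_new, mask_frac=0.25, seed=0):
    """HYPOTHETICAL imputation-based augmentation for ONE class of segments.
    Each synthetic sample copies a real sample, hides a random subset of its
    bands, and refills them from the nearest other sample of the same class
    (nearest on the bands that were kept). Needs at least 2 samples."""
    rng = random.Random(seed)
    n_bands = len(samples[0])
    n_mask = max(1, int(mask_frac * n_bands))
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(samples))
        base = samples[i]
        masked = set(rng.sample(range(n_bands), n_mask))
        kept = [b for b in range(n_bands) if b not in masked]
        donors = [s for j, s in enumerate(samples) if j != i]
        # Donor = most similar other sample on the observed (kept) bands.
        donor = min(donors, key=lambda s: dist([base[b] for b in kept],
                                                [s[b] for b in kept]))
        synthetic.append([donor[b] if b in masked else base[b]
                          for b in range(n_bands)])
    return synthetic

# Toy per-segment band means for a single class (illustrative values only).
class_samples = [
    [0.10, 0.20, 0.30, 0.40],
    [0.12, 0.21, 0.33, 0.41],
    [0.09, 0.19, 0.31, 0.38],
]
new_samples = augment_by_imputation(class_samples, n_new=5)
print(new_samples)
```

Restricting donors to the same class keeps the synthetic samples' labels valid; a matrix completion method would replace the donor lookup with a low-rank reconstruction of the masked entries.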

imputation methods
Recently Published Documents

A case study in the tropical region to evaluate univariate imputation methods for solar irradiance data with different weather types

Futuristic Prediction of Missing Value Imputation Methods Using Extended ANN

Imputation by feature importance (IBFI): A methodology to envelop machine learning method for imputing missing patterns in time series data

Some Concerns About Imputation Methods for Missing Data

Some Concerns About Imputation Methods for Missing Data—Reply

Comparison of Missing Data Imputation Methods in Time Series Forecasting

Single Imputation Methods and Confidence Intervals for the Gini Index

Identification and Estimation of Graphical Models with Nonignorable Nonresponse

Comparison of Imputation Methods for End-User Demands in Water Distribution Systems

A New Multispectral Data Augmentation Technique Based on Data Imputation
