imputation method Latest Research Papers

Data pre-processing plays a vital role in the life cycle of data mining for accomplishing quality outcomes. In this paper, it is experimentally shown the importance of data pre-processing to achieve highly accurate classifier outcomes by imputing missing values using a novel imputation method, CLUSTPRO, by selecting highly correlated features using Correlation-based Variable Selection (CVS) and by handling imbalanced data using Synthetic Minority Over-sampling Technique (SMOTE). The proposed CLUSTPRO method makes use of Random Forest (RF) and Expectation Maximization (EM) algorithms to impute missing. The imputed results are evaluated using standard evaluation metrics. The CLUSTPRO imputation method outperforms existing, state-of-the-art imputation methods. The combined approach of imputation, feature selection, and imbalanced data handling techniques has significantly contributed to attaining an improved classification accuracy (AUC curve) of 40%–50% in comparison with results obtained without any pre-processing.

Download Full-text

Imputation Method for Missing Values Considering Temperature Fluctuations with High Locality

Total Quality Science ◽

10.17929/tqs.7.1 ◽

2021 ◽

Vol 7 (1) ◽

pp. 1-9

Author(s):

Keitoku Yoshino ◽

Shizu Itaka ◽

Tomomichi Suzuki

Keyword(s):

Missing Values ◽

Temperature Fluctuations ◽

Imputation Method

Download Full-text

A Short-Term Solar Forecasting Platform Using a Physics-Based Smart Persistence Model and Data Imputation Method

10.2172/1837967 ◽

2021 ◽

Author(s):

Srinath Yelchuri ◽

A. Rangaraj ◽

Yu Xie ◽

Aron Habte ◽

Mohit Joshi ◽

...

Keyword(s):

Imputation Method ◽

Data Imputation ◽

Solar Forecasting ◽

Short Term ◽

Persistence Model

Download Full-text

Single Imputation Methods and Confidence Intervals for the Gini Index

Mathematics ◽

10.3390/math9243252 ◽

2021 ◽

Vol 9 (24) ◽

pp. 3252

Author(s):

Encarnación Álvarez-Verdejo ◽

Pablo J. Moya-Fernández ◽

Juan F. Muñoz-Rosas

Keyword(s):

Missing Data ◽

Correlation Coefficient ◽

Confidence Intervals ◽

Gini Index ◽

Imputation Method ◽

Empirical Measures ◽

Imputation Methods ◽

Single Imputation ◽

Regression Imputation ◽

Mean Square Errors

The problem of missing data is a common feature in any study, and a single imputation method is often applied to deal with this problem. The first contribution of this paper is to analyse the empirical performance of some traditional single imputation methods when they are applied to the estimation of the Gini index, a popular measure of inequality used in many studies. Various methods for constructing confidence intervals for the Gini index are also empirically evaluated. We consider several empirical measures to analyse the performance of estimators and confidence intervals, allowing us to quantify the magnitude of the non-response bias problem. We find extremely large biases under certain non-response mechanisms, and this problem gets noticeably worse as the proportion of missing data increases. For a large correlation coefficient between the target and auxiliary variables, the regression imputation method may notably mitigate this bias problem, yielding appropriate mean square errors. We also find that confidence intervals have poor coverage rates when the probability of data being missing is not uniform, and that the regression imputation method substantially improves the handling of this problem as the correlation coefficient increases.

Download Full-text

A Two-Stage Missing Value Imputation Method Based on Autoencoder Neural Network

10.1109/bigdata52589.2021.9671338 ◽

2021 ◽

Author(s):

Jiayin Yu ◽

Yulin He ◽

Joshua Zhexue Huang

Keyword(s):

Neural Network ◽

Imputation Method ◽

Two Stage ◽

Missing Value ◽

Missing Value Imputation

Download Full-text

Improvement of Accuracy and Handling of Missing Value Data in the Naive Bayes Kernel Algorithm

Journal of Applied Intelligent System ◽

10.33633/jais.v6i2.5288 ◽

2021 ◽

Vol 6 (2) ◽

pp. 134-143

Author(s):

Bijanto Bijanto ◽

Ryan Yunus

Keyword(s):

Naive Bayes ◽

Research Process ◽

Naïve Bayes ◽

Imputation Method ◽

Parameter Estimates ◽

Process Data ◽

Missing Value ◽

Mean Imputation ◽

Public Data ◽

The Mean

The lost impact on the research process, can be serious in classifying results leading to biased parameter estimates, statistical information, decreased quality, increased standard error, and weak generalization of the findings. In this paper, we discuss the problems that exist in one of the algorithms, namely the Naive Bayes Kernel algorithm. The Naive Bayes kernel algorithm has the disadvantage of not being able to process data with the mission value. Therefore, in order to process missing value data, there is one method that we propose to overcome, namely using the mean imputation method. The data we use is public data from UCI, namely the HCV (Hepatisis C Virus) dataset. The input method used to correct the missing data so that it can be filled with the average value of the existing data. Before the imputation process means, the dataset uses yahoo bootstrap first. The data that has been corrected using the mean imputation method has just been processed using the Naive Bayes Kernel Algorithm. From the results of the research tests that have been carried out, it can be obtained an accuracy value of 96.05% and the speed of the data computing process with 1 second.

Download Full-text

Handling Cellwise Outliers by Sparse Regression and Robust Covariance

Journal of Data Science, Statistics, and Visualisation ◽

10.52933/jdssv.v1i3.18 ◽

2021 ◽

Vol 1 (3) ◽

Author(s):

Jakob Raymaekers ◽

Peter Rousseeuw

Keyword(s):

Covariance Matrix ◽

Physical Interpretation ◽

Real Data ◽

Imputation Method ◽

Analytic Method ◽

Lasso Regression ◽

Penalty Term ◽

The Em Algorithm ◽

Volatile Organic ◽

Data Analytic

We propose a data-analytic method for detecting cellwise outliers. Given a robust covariance matrix, outlying cells (entries) in a row are found by the cellFlagger technique which combines lasso regression with a stepwise application of constructed cutoff values. The penalty term of the lasso has a physical interpretation as the total distance that suspicious cells need to move in order to bring their row into the fold. For estimating a cellwise robust covariance matrix we construct a detection-imputation method which alternates between flagging outlying cells and updating the covariance matrix as in the EM algorithm. The proposed methods are illustrated by simulations and on real data about volatile organic compounds in children.

Download Full-text

An improved tucker decomposition‐based imputation method for recovering lane‐level missing values in traffic data

IET Intelligent Transport Systems ◽

10.1049/itr2.12148 ◽

2021 ◽

Author(s):

Wenqi Lu ◽

Tian Zhou ◽

Linheng Li ◽

Yuanli Gu ◽

Yikang Rui ◽

...

Keyword(s):

Missing Values ◽

Imputation Method ◽

Traffic Data ◽

Tucker Decomposition

Download Full-text

An efficient scRNA-seq dropout imputation method using graph attention network

BMC Bioinformatics ◽

10.1186/s12859-021-04493-x ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Chenyang Xu ◽

Lei Cai ◽

Jingyang Gao

Keyword(s):

Single Cell ◽

Pearson Correlation ◽

Absolute Error ◽

Imputation Method ◽

Adjusted Rand Index ◽

Normalized Mutual Information ◽

Cell Library ◽

Shared Information ◽

Cell Data ◽

High Chance

Abstract Background Single-cell sequencing technology can address the amount of single-cell library data at the same time and display the heterogeneity of different cells. However, analyzing single-cell data is a computationally challenging problem. Because there are low counts in the gene expression region, it has a high chance of recognizing the non-zero entity as zero, which are called dropout events. At present, the mainstream dropout imputation methods cannot effectively recover the true expression of cells from dropout noise such as DCA, MAGIC, scVI, scImpute and SAVER. Results In this paper, we propose an autoencoder structure network, named GNNImpute. GNNImpute uses graph attention convolution to aggregate multi-level similar cell information and implements convolution operations on non-Euclidean space on scRNA-seq data. Distinct from current imputation tools, GNNImpute can accurately and effectively impute the dropout and reduce dropout noise. We use mean square error (MSE), mean absolute error (MAE), Pearson correlation coefficient (PCC) and Cosine similarity (CS) to measure the performance of different methods with GNNImpute. We analyze four real datasets, and our results show that the GNNImpute achieves 3.0130 MSE, 0.6781 MAE, 0.9073 PCC and 0.9134 CS. Furthermore, we use Adjusted rand index (ARI) and Normalized mutual information (NMI) to measure the clustering effect. The GNNImpute achieves 0.8199 (ARI) and 0.8368 (NMI), respectively. Conclusions In this investigation, we propose a single-cell dropout imputation method (GNNImpute), which effectively utilizes shared information for imputing the dropout of scRNA-seq data. We test it with different real datasets and evaluate its effectiveness in MSE, MAE, PCC and CS. The results show that graph attention convolution and autoencoder structure have great potential in single-cell dropout imputation.

Download Full-text

imputation method
Recently Published Documents

TOTAL DOCUMENTS

H-INDEX

Correction: SDImpute: A statistical block imputation method based on cell-level and gene-level information for dropouts in single-cell RNA-seq data

Efficient Pre-Processing Techniques for Improving Classifiers Performance

Imputation Method for Missing Values Considering Temperature Fluctuations with High Locality

A Short-Term Solar Forecasting Platform Using a Physics-Based Smart Persistence Model and Data Imputation Method

Single Imputation Methods and Confidence Intervals for the Gini Index

A Two-Stage Missing Value Imputation Method Based on Autoencoder Neural Network

Improvement of Accuracy and Handling of Missing Value Data in the Naive Bayes Kernel Algorithm

Handling Cellwise Outliers by Sparse Regression and Robust Covariance

An improved tucker decomposition‐based imputation method for recovering lane‐level missing values in traffic data

An efficient scRNA-seq dropout imputation method using graph attention network

Export Citation Format

imputation methodRecently Published Documents

TOTAL DOCUMENTS

H-INDEX

Correction: SDImpute: A statistical block imputation method based on cell-level and gene-level information for dropouts in single-cell RNA-seq data

Efficient Pre-Processing Techniques for Improving Classifiers Performance

Imputation Method for Missing Values Considering Temperature Fluctuations with High Locality

A Short-Term Solar Forecasting Platform Using a Physics-Based Smart Persistence Model and Data Imputation Method

Single Imputation Methods and Confidence Intervals for the Gini Index

A Two-Stage Missing Value Imputation Method Based on Autoencoder Neural Network

Improvement of Accuracy and Handling of Missing Value Data in the Naive Bayes Kernel Algorithm

Handling Cellwise Outliers by Sparse Regression and Robust Covariance

An improved tucker decomposition‐based imputation method for recovering lane‐level missing values in traffic data

An efficient scRNA-seq dropout imputation method using graph attention network

imputation method
Recently Published Documents