missing value imputation Latest Research Papers

Recently, the industry of healthcare started generating a large volume of datasets. If hospitals can employ the data, they could easily predict the outcomes and provide better treatments at early stages with low cost. Here, data analytics (DA) was used to make correct decisions through proper analysis and prediction. However, inappropriate data may lead to flawed analysis and thus yield unacceptable conclusions. Hence, transforming the improper data from the entire data set into useful data is essential. Machine learning (ML) technique was used to overcome the issues due to incomplete data. A new architecture, automatic missing value imputation (AMVI) was developed to predict missing values in the dataset, including data sampling and feature selection. Four prediction models (i.e., logistic regression, support vector machine (SVM), AdaBoost, and random forest algorithms) were selected from the well-known classification. The complete AMVI architecture performance was evaluated using a structured data set obtained from the UCI repository. Accuracy of around 90% was achieved. It was also confirmed from cross-validation that the trained ML model is suitable and not over-fitted. This trained model is developed based on the dataset, which is not dependent on a specific environment. It will train and obtain the outperformed model depending on the data available.

Download Full-text

Smartic: A smart tool for Big Data analytics and IoT

F1000Research ◽

10.12688/f1000research.73613.1 ◽

2022 ◽

Vol 11 ◽

pp. 17

Author(s):

Shohel Sayeed ◽

Abu Fuad Ahmad ◽

Tan Choo Peng

Keyword(s):

Big Data ◽

Data Analytics ◽

Big Data Analytics ◽

Model Performance ◽

Principal Component ◽

Missing Value ◽

Missing Value Imputation ◽

Standard Data ◽

Digital World ◽

Massive Scale

The Internet of Things (IoT) is leading the physical and digital world of technology to converge. Real-time and massive scale connections produce a large amount of versatile data, where Big Data comes into the picture. Big Data refers to large, diverse sets of information with dimensions that go beyond the capabilities of widely used database management systems, or standard data processing software tools to manage within a given limit. Almost every big dataset is dirty and may contain missing data, mistyping, inaccuracies, and many more issues that impact Big Data analytics performances. One of the biggest challenges in Big Data analytics is to discover and repair dirty data; failure to do this can lead to inaccurate analytics results and unpredictable conclusions. We experimented with different missing value imputation techniques and compared machine learning (ML) model performances with different imputation methods. We propose a hybrid model for missing value imputation combining ML and sample-based statistical techniques. Furthermore, we continued with the best missing value inputted dataset, chosen based on ML model performance for feature engineering and hyperparameter tuning. We used k-means clustering and principal component analysis. Accuracy, the evaluated outcome, improved dramatically and proved that the XGBoost model gives very high accuracy at around 0.125 root mean squared logarithmic error (RMSLE). To overcome overfitting, we used K-fold cross-validation.

Download Full-text

Deep learning for missing value imputation of continuous data and the effect of data discretization

Knowledge-Based Systems ◽

10.1016/j.knosys.2021.108079 ◽

2022 ◽

pp. 108079

Author(s):

Wei-Chao Lin ◽

Chih-Fong Tsai ◽

Jia Rong Zhong

Keyword(s):

Deep Learning ◽

Continuous Data ◽

Missing Value ◽

Missing Value Imputation ◽

Data Discretization

Download Full-text

A Two-Stage Missing Value Imputation Method Based on Autoencoder Neural Network

10.1109/bigdata52589.2021.9671338 ◽

2021 ◽

Author(s):

Jiayin Yu ◽

Yulin He ◽

Joshua Zhexue Huang

Keyword(s):

Neural Network ◽

Imputation Method ◽

Two Stage ◽

Missing Value ◽

Missing Value Imputation

Download Full-text

Missing Value Imputation of Time-Series Air-Quality Data via Deep Neural Networks

International Journal of Environmental Research and Public Health ◽

10.3390/ijerph182212213 ◽

2021 ◽

Vol 18 (22) ◽

pp. 12213

Author(s):

Taesung Kim ◽

Jinhee Kim ◽

Wonho Yang ◽

Hunjoo Lee ◽

Jaegul Choo

Keyword(s):

Time Series ◽

Deep Learning ◽

Air Quality ◽

Time Series Data ◽

Quality Data ◽

Series Data ◽

Missing Value ◽

Missing Value Imputation ◽

Spatio Temporal ◽

Air Quality Data

To prevent severe air pollution, it is important to analyze time-series air quality data, but this is often challenging as the time-series data is usually partially missing, especially when it is collected from multiple locations simultaneously. To solve this problem, various deep-learning-based missing value imputation models have been proposed. However, often they are barely interpretable, which makes it difficult to analyze the imputed data. Thus, we propose a novel deep learning-based imputation model that achieves high interpretability as well as shows great performance in missing value imputation for spatio-temporal data. We verify the effectiveness of our method through quantitative and qualitative results on a publicly available air-quality dataset.

Download Full-text

Missing value imputation affects the performance of machine learning: A review and analysis of the literature (2010–2021)

Informatics in Medicine Unlocked ◽

10.1016/j.imu.2021.100799 ◽

2021 ◽

pp. 100799

Author(s):

Md. Kamrul Hasan ◽

Md. Ashraful Alam ◽

Shidhartho Roy ◽

Aishwariya Dutta ◽

Md. Tasnim Jawad ◽

...

Keyword(s):

Machine Learning ◽

Missing Value ◽

Missing Value Imputation

Download Full-text

Multilevel and time-series missing value imputation for combined survey and longitudinal context data

Quality & Quantity ◽

10.1007/s11135-021-01186-8 ◽

2021 ◽

Author(s):

David Wutchiett ◽

Claire Durand

Keyword(s):

Time Series ◽

Missing Value ◽

Missing Value Imputation ◽

Context Data

Download Full-text

Normalization and outlier removal in class center-based firefly algorithm for missing value imputation

Journal Of Big Data ◽

10.1186/s40537-021-00518-7 ◽

2021 ◽

Vol 8 (1) ◽

Author(s):

Heru Nugroho ◽

Nugraha Priya Utama ◽

Kridanto Surendro

Keyword(s):

Incomplete Data ◽

Statistical Power ◽

Firefly Algorithm ◽

Missing Values ◽

Outlier Removal ◽

Processing Stage ◽

Missing Value ◽

Missing Value Imputation ◽

Almost All ◽

True Values

AbstractA missing value is one of the factors that often cause incomplete data in almost all studies, even those that are well-designed and controlled. It can also decrease a study’s statistical power or result in inaccurate estimations and conclusions. Hence, data normalization and missing value handling are considered the major problems in the data pre-processing stage, while classification algorithms are adopted to handle numerical features. In cases where the observed data contained outliers, the missing value estimated results are sometimes unreliable or even differ greatly from the true values. Therefore, this study aims to propose the combination of normalization and outlier removals before imputing missing values on the class center-based firefly algorithm method (ON + C3FA). Moreover, some standard imputation techniques like mean, a random value, regression, as well as multiple imputation, KNN imputation, and decision tree (DT)-based missing value imputation were utilized as a comparison of the proposed method. Experimental results on the sonar dataset showed normalization and outlier removals effect in the methods. According to the proposed method (ON + C3FA), AUC, accuracy, F1-Score, Precision, Recall, and AUC-PR had 0.972, 0.906, 0.906, 0.908, 0.906, 0.61 respectively. The result showed combining normalization and outlier removals in C3-FA (ON + C3FA) was an efficient technique for obtaining actual data in handling missing values, and it also outperformed the previous studies methods with r and RMSE values of 0.935 and 0.02. Meanwhile, the Dks value obtained from this technique was 0.04, which indicated that it could maintain the values or distribution accuracy.

Download Full-text

Multivariate Missing Value Imputation using Weighted Cluster Softmax Technique

10.1201/9781003202240-21 ◽

2021 ◽

pp. 128-133

Author(s):

Arijit Maji ◽

Sultan Amed ◽

Tanmay Sen

Keyword(s):

Missing Value ◽

Missing Value Imputation

Download Full-text

Multiple Imputation Approaches Applied to the Missing Value Problem in Bottom-Up Proteomics

International Journal of Molecular Sciences ◽

10.3390/ijms22179650 ◽

2021 ◽

Vol 22 (17) ◽

pp. 9650

Author(s):

Miranda L. Gardner ◽

Michael A. Freitas

Keyword(s):

Multiple Imputation ◽

Missing At Random ◽

Data Sets ◽

Proteomics Data ◽

Missing Not At Random ◽

Differential Abundance ◽

Missing Value ◽

Bottom Up ◽

Missing Value Imputation ◽

Impute Data

Analysis of differential abundance in proteomics data sets requires careful application of missing value imputation. Missing abundance values widely vary when performing comparisons across different sample treatments. For example, one would expect a consistent rate of “missing at random” (MAR) across batches of samples and varying rates of “missing not at random” (MNAR) depending on the inherent difference in sample treatments within the study. The missing value imputation strategy must thus be selected that best accounts for both MAR and MNAR simultaneously. Several important issues must be considered when deciding the appropriate missing value imputation strategy: (1) when it is appropriate to impute data; (2) how to choose a method that reflects the combinatorial manner of MAR and MNAR that occurs in an experiment. This paper provides an evaluation of missing value imputation strategies used in proteomics and presents a case for the use of hybrid left-censored missing value imputation approaches that can handle the MNAR problem common to proteomics data.

Download Full-text

missing value imputation
Recently Published Documents

TOTAL DOCUMENTS

H-INDEX

Automatic missing value imputation for cleaning phase of diabetic’s readmission prediction model

Smartic: A smart tool for Big Data analytics and IoT

Deep learning for missing value imputation of continuous data and the effect of data discretization

A Two-Stage Missing Value Imputation Method Based on Autoencoder Neural Network

Missing Value Imputation of Time-Series Air-Quality Data via Deep Neural Networks

Missing value imputation affects the performance of machine learning: A review and analysis of the literature (2010–2021)

Multilevel and time-series missing value imputation for combined survey and longitudinal context data

Normalization and outlier removal in class center-based firefly algorithm for missing value imputation

Multivariate Missing Value Imputation using Weighted Cluster Softmax Technique

Multiple Imputation Approaches Applied to the Missing Value Problem in Bottom-Up Proteomics

Export Citation Format

missing value imputationRecently Published Documents

TOTAL DOCUMENTS

H-INDEX

Automatic missing value imputation for cleaning phase of diabetic’s readmission prediction model

Smartic: A smart tool for Big Data analytics and IoT

Deep learning for missing value imputation of continuous data and the effect of data discretization

A Two-Stage Missing Value Imputation Method Based on Autoencoder Neural Network

Missing Value Imputation of Time-Series Air-Quality Data via Deep Neural Networks

Missing value imputation affects the performance of machine learning: A review and analysis of the literature (2010–2021)

Multilevel and time-series missing value imputation for combined survey and longitudinal context data

Normalization and outlier removal in class center-based firefly algorithm for missing value imputation

Multivariate Missing Value Imputation using Weighted Cluster Softmax Technique

Multiple Imputation Approaches Applied to the Missing Value Problem in Bottom-Up Proteomics

missing value imputation
Recently Published Documents