A Hybrid Imputation Method for Multi-Pattern Missing Data: A Case Study on Type II Diabetes Diagnosis

Electronics ◽  
2021 ◽  
Vol 10 (24) ◽  
pp. 3167
Author(s):  
Mohammad H. Nadimi-Shahraki ◽  
Saeed Mohammadi ◽  
Hoda Zamani ◽  
Mostafa Gandomi ◽  
Amir H. Gandomi

Real medical datasets usually contain missing data with different patterns, which decrease the performance of classifiers used in intelligent healthcare and disease diagnosis systems. Many methods have been proposed to impute missing data; however, they do not meet data-quality requirements, especially in real datasets with different missing data patterns. In this paper, a four-layer model is introduced, and a hybrid imputation (HIMP) method based on this model is proposed to impute multi-pattern missing data, including non-random, random, and completely random patterns. In HIMP, non-random missing data patterns are imputed first, and the resulting dataset is then decomposed into two datasets containing random and completely random missing data patterns, respectively. Next, depending on the missing data pattern in each dataset, different single or multiple imputation methods are applied. Finally, the best-imputed datasets obtained for the random and completely random patterns are merged to form the final dataset. The experimental evaluation was conducted on a real dataset named IRDia that includes all three missing data patterns. The proposed method and the comparative methods were compared using different classifiers in terms of accuracy, precision, recall, and F1-score. The classifiers’ performance shows that HIMP imputes multi-pattern missing values more effectively than the comparative methods.
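
HIMP and the comparative methods are ranked by training classifiers on each imputed dataset and scoring them with accuracy, precision, recall, and F1-score. A minimal sketch of how these four metrics are computed for a binary diagnosis task follows; the labels are hypothetical (1 = diabetic, 0 = non-diabetic) and the function is illustrative, not code from the paper.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> Dict[str, float]:
    """Accuracy, precision, recall and F1-score for a binary classifier."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # true positives
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false positives
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # false negatives
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical labels: 1 = diabetic, 0 = non-diabetic.
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0]
print(binary_metrics(y_true, y_pred))
```

When comparing imputation methods this way, the same classifier and test split would be reused for every imputed dataset, so that differences in these scores reflect the imputation rather than the model.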

2020 ◽  
Vol 2020 ◽  
pp. 1-6
Author(s):  
Kanisa Chodjuntug ◽  
Nuanpan Lawson

Because of its impact on health and quality of life, ozone pollution in Thailand has become a major concern among public health investigators. Saraburi Province is one of the areas of Thailand with high air pollution levels, as it is an important industrialized area of the country. Unfortunately, the August 2018 Pollution Control Department (PCD) report contains some missing values of ozone concentration for Saraburi Province. Missing data can significantly affect the data analysis process, so they must be handled properly before standard statistical techniques are applied. In the presence of missing data, we focus on estimating the ozone mean using an improved compromised imputation method that utilizes a chain ratio exponential technique. Expressions for the bias and mean square error (MSE) of the estimator obtained from the proposed imputation method are derived using the Taylor series method. The proposed estimator is then compared theoretically with existing estimators on the basis of their MSEs. In this case study, the percent relative efficiencies indicate that the proposed estimator performs best under certain conditions; it is then applied to estimate the ozone mean for Saraburi Province in August 2018.
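
The proposed chain ratio exponential estimator is not reproduced here. As a simpler point of reference, the sketch below shows classical ratio imputation, in which a missing y_i is replaced by b·x_i with b = ȳ_r / x̄_r, using an auxiliary variable x observed for every sampled unit. It then estimates, by Monte Carlo, the percent relative efficiency PRE = 100 · MSE(reference) / MSE(candidate) of ratio imputation against mean imputation. The population, sample size, and response rate are hypothetical assumptions; a PRE above 100 means the candidate estimator is more efficient.

```python
import random
from statistics import fmean
from typing import Sequence, Tuple


def mean_imputed_mean(y: Sequence[float], responded: Sequence[bool]) -> float:
    """Sample mean after mean imputation; it equals the respondent mean."""
    return fmean(yi for yi, r in zip(y, responded) if r)


def ratio_imputed_mean(y: Sequence[float], x: Sequence[float], responded: Sequence[bool]) -> float:
    """Sample mean after classical ratio imputation: missing y_i -> (ybar_r / xbar_r) * x_i."""
    b = fmean(yi for yi, r in zip(y, responded) if r) / fmean(xi for xi, r in zip(x, responded) if r)
    return fmean(yi if r else b * xi for yi, xi, r in zip(y, x, responded))


def pre(mse_reference: float, mse_candidate: float) -> float:
    """Percent relative efficiency of the candidate estimator w.r.t. the reference."""
    return 100.0 * mse_reference / mse_candidate


def simulate(pop_y, pop_x, n=40, response_rate=0.7, reps=2000, seed=2018) -> Tuple[float, float]:
    """Monte Carlo MSEs of both estimators under SRS without replacement and random nonresponse."""
    rng = random.Random(seed)
    true_mean = fmean(pop_y)
    se_mean = se_ratio = 0.0
    used = 0
    for _ in range(reps):
        idx = rng.sample(range(len(pop_y)), n)
        y = [pop_y[i] for i in idx]
        x = [pop_x[i] for i in idx]
        responded = [rng.random() < response_rate for _ in range(n)]
        if not any(responded):
            continue  # no respondents: neither estimator is defined
        used += 1
        se_mean += (mean_imputed_mean(y, responded) - true_mean) ** 2
        se_ratio += (ratio_imputed_mean(y, x, responded) - true_mean) ** 2
    return se_mean / used, se_ratio / used


# Hypothetical population in which y is roughly proportional to the auxiliary x.
gen = random.Random(1)
pop_x = [gen.uniform(10, 50) for _ in range(1000)]
pop_y = [2.0 * xi + gen.gauss(0, 3) for xi in pop_x]

mse_mean, mse_ratio = simulate(pop_y, pop_x)
print(f"PRE of ratio vs mean imputation: {pre(mse_mean, mse_ratio):.1f}")
```

Algebraically, the ratio-imputed mean reduces to ȳ_r · x̄_n / x̄_r, the familiar ratio estimator, which is why it gains efficiency when y and x are strongly correlated.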


2017 ◽  
Vol 23 (3) ◽  
pp. 260-278 ◽  
Author(s):  
Panagiotis Loukopoulos ◽  
George Zolkiewski ◽  
Ian Bennett ◽  
Pericles Pilidis ◽  
Fang Duan ◽  
...  

Purpose: Centrifugal compressors are integral components in the oil industry, so effective maintenance is required. Condition-based maintenance and prognostics and health management (CBM/PHM) have been gaining popularity, and CBM/PHM can also be performed remotely, leading to e-maintenance. Its success depends on the quality of the data used for analysis and decision making. A major issue is missing data, whose presence may compromise the information within a data set and cause biased or misleading results, so addressing it is crucial. The purpose of this paper is to review and compare the most widely used imputation techniques in a case study using condition monitoring measurements from an operational industrial centrifugal compressor.

Design/methodology/approach: The most widely used imputation techniques are briefly reviewed and compared on a complete data set with artificially introduced missing values. They were tested for the effects of the amount of missing data, its location within the set, and the variable containing it.

Findings: Univariate and multivariate imputation techniques were compared, with the latter giving the smallest errors. These techniques appeared unaffected by the amount or location of the missing data, although they were affected by the variable containing it.

Research limitations/implications: The analysis assumed that only one variable contained missing data at any time. Further research is required to address this point.

Originality/value: This study can serve as a guide for selecting an appropriate imputation method for missing values in centrifugal compressor condition monitoring data.
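
The study's design, removing values from a complete record and scoring imputations against the known truth, can be sketched for the univariate case as follows. The synthetic signal, the 10% removal rate, and the three univariate methods (mean, last observation carried forward, linear interpolation) are illustrative assumptions, not the paper's data or method list; the multivariate techniques that performed best in the paper would additionally draw on the other monitored variables.

```python
import math
import random
from typing import List, Optional, Sequence

Series = Sequence[Optional[float]]


def impute_mean(s: Series) -> List[float]:
    """Replace every gap with the mean of the observed values."""
    obs = [v for v in s if v is not None]
    m = sum(obs) / len(obs)
    return [m if v is None else v for v in s]


def impute_locf(s: Series) -> List[float]:
    """Last observation carried forward (leading gaps take the first observation)."""
    first = next(v for v in s if v is not None)
    out, last = [], first
    for v in s:
        if v is not None:
            last = v
        out.append(last)
    return out


def impute_linear(s: Series) -> List[float]:
    """Linear interpolation between the nearest observed neighbours."""
    known = [i for i, v in enumerate(s) if v is not None]
    out = []
    for i, v in enumerate(s):
        if v is not None:
            out.append(v)
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            out.append(s[right])
        elif right is None:
            out.append(s[left])
        else:
            w = (i - left) / (right - left)
            out.append(s[left] + w * (s[right] - s[left]))
    return out


def rmse_at(truth: Sequence[float], imputed: Sequence[float], idx: Sequence[int]) -> float:
    """RMSE evaluated only at the artificially removed positions."""
    return math.sqrt(sum((truth[i] - imputed[i]) ** 2 for i in idx) / len(idx))


# Hypothetical complete sensor record: trend + slow cycle + measurement noise.
rng = random.Random(0)
truth = [50 + 0.05 * t + 3 * math.sin(t / 10) + rng.gauss(0, 0.2) for t in range(500)]

# Remove 10% of the values at random to create artificial gaps.
removed = sorted(rng.sample(range(500), 50))
removed_set = set(removed)
series = [None if i in removed_set else v for i, v in enumerate(truth)]

scores = {name: rmse_at(truth, f(series), removed)
          for name, f in [("mean", impute_mean), ("locf", impute_locf), ("linear", impute_linear)]}
print(scores)
```

Repeating the loop with different removal rates, with removals confined to particular stretches, or with a different variable would mirror the three factors (amount, location, variable) examined in the paper.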


Marketing ZFP ◽  
2019 ◽  
Vol 41 (4) ◽  
pp. 21-32
Author(s):  
Dirk Temme ◽  
Sarah Jensen

Missing values are ubiquitous in empirical marketing research. If missing data are not handled properly, the result can be a loss of statistical power and distorted parameter estimates. While traditional approaches for handling missing data (e.g., listwise deletion) are still widely used, researchers can nowadays choose among various advanced techniques, such as multiple imputation or full-information maximum likelihood estimation. Thanks to the available software, using these modern missing data methods is no longer a major obstacle. Still, applying them requires a sound understanding of their prerequisites and limitations, as well as a deeper understanding of the processes that led to the missing values in an empirical study. This article, Part 1 of the series, first introduces Rubin’s classical definition of missing data mechanisms and an alternative, variable-based taxonomy that provides a graphical representation. Second, it presents a selection of visualization tools, available in different R packages, for describing and exploring missing data structures.
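
Rubin's three mechanisms can be made concrete by simulation. In the sketch below, whose distributions and coefficients are all hypothetical, values of y are deleted completely at random (MCAR), with a probability that depends on the always-observed x (MAR), or with a probability that depends on y itself (MNAR). The complete-case (listwise-deletion) mean of y is then compared with its true value of 0.

```python
import math
import random
from statistics import fmean

rng = random.Random(42)
n = 20_000

# Hypothetical complete data: x is always observed, y may go missing.
x = [rng.gauss(0, 1) for _ in range(n)]
y = [0.8 * xi + rng.gauss(0, 0.6) for xi in x]   # corr(x, y) = 0.8, E[y] = 0


def logistic(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


# True marks a missing y value under each mechanism.
missing = {
    "MCAR": [rng.random() < 0.3 for _ in range(n)],               # unrelated to any data
    "MAR": [rng.random() < logistic(-1 + 2 * xi) for xi in x],    # depends on observed x only
    "MNAR": [rng.random() < logistic(-1 + 2 * yi) for yi in y],   # depends on the unobserved y itself
}

for name, miss in missing.items():
    observed_y = [yi for yi, m in zip(y, miss) if not m]
    share = sum(miss) / n
    print(f"{name}: {share:.0%} missing, complete-case mean of y = {fmean(observed_y):+.3f}")
```

Under MCAR the complete-case mean stays close to 0, whereas under MAR and MNAR it is pulled downward because high values are more likely to be missing. Under MAR, methods that condition on x (such as multiple imputation or full-information maximum likelihood) can in principle remove this bias; under MNAR, the observed data alone cannot.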


Author(s):  
Dr. Akshay H. Malshikare ◽  
Dr. Sharada Chikurte

Diabetes is a major health problem worldwide. Despite the many drugs available, uncontrolled diabetes remains a challenge. Moreover, some anti-diabetic drugs are on the verge of withdrawal because of their adverse effects, so there is an acute need for new, effective, and safe drugs. In this case study, we therefore used the Ayurvedic medicine ‘Mustadi Kwatha’, described in Bhaishajya Ratnawali under Prameha Chikitsa. A single-patient case study was conducted on the use of Mustadi Kwatha in Type II Diabetes Mellitus. A significant reduction was observed in both fasting and post-prandial blood sugar levels.


2020 ◽  
Author(s):  
Richard P Bartlett ◽  
Alexandria Watkins

Background: This outpatient case study examines two patients in the United States whose unique cases involve oncology, hypertension, Type II Diabetes Mellitus, and Severe Acute Respiratory Syndrome Coronavirus-2 (SARS-CoV-2) infection, i.e., COVID-19. Both patients, with laboratory-confirmed SARS-CoV-2 infection, were treated in the outpatient setting via telemedicine in the West Texas region between March 29th, 2020, and May 14th, 2020.

Case Report: The first patient is a 63-year-old female non-smoker diagnosed with Waldenstrom’s Macroglobulinemia (2012) and Primary Cutaneous Marginal Zone Lymphoma (2020); the second patient is a 38-year-old male non-smoker with the following comorbidities: Type II Diabetes Mellitus (DM), hypertension, and gout. Both patients were empirically started on budesonide 0.5 mg by nebulizer twice daily, clarithromycin (Biaxin) 500 mg tablets twice daily for ten days, zinc 50 mg tablets twice daily, and aspirin 81 mg tablets daily. Both patients fully recovered with no residual effects.

Conclusion: The goal is to call attention to the success of proactive, early empirical treatment combining a classic corticosteroid (budesonide), administered via nebulizer, with an oral macrolide antibiotic, clarithromycin (Biaxin).


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Rahi Jain ◽  
Wei Xu

Background: Developing statistical and machine learning methods on studies with missing information is a ubiquitous challenge in real-world biological research. Strategies in the literature either remove the samples with missing values, as in complete case analysis (CCA), or impute the missing information, for example with predictive mean matching (PMM) as implemented in MICE. Limitations of these strategies include information loss and the question of how close the imputed values are to the actual missing values. Further, with piecemeal medical data, these strategies must wait for data collection to finish before a complete dataset is available for statistical modeling.

Method and results: This study proposes a dynamic model updating (DMU) approach, a different strategy for developing statistical models with missing data. DMU uses only the information available in the dataset to build the statistical models. It segments the original dataset into small complete datasets using hierarchical clustering and then fits a Bayesian regression on each of them. Predictor estimates are updated using the posterior estimates from each dataset. DMU was evaluated on both simulated data and real studies and performed better than, or on par with, other approaches such as CCA and PMM.

Conclusion: The DMU approach provides an alternative to the existing approaches of information elimination and imputation for processing datasets with missing values. Although the study applied the approach to continuous cross-sectional data, it can also be applied to longitudinal, categorical, and time-to-event biological data.
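
The updating step of DMU can be illustrated with a conjugate Bayesian linear regression in which the posterior from one complete segment serves as the prior for the next. This is a minimal sketch under assumptions not taken from the paper: a single predictor, a known noise variance, a Gaussian prior, and hypothetical segments supplied directly (the hierarchical clustering that would produce them is not reproduced).

```python
from typing import List, Sequence, Tuple

Vec = Tuple[float, float]
Mat = Tuple[Tuple[float, float], Tuple[float, float]]


def inv2(m: Mat) -> Mat:
    """Inverse of a 2x2 matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return ((d / det, -b / det), (-c / det, a / det))


def matvec(m: Mat, v: Vec) -> Vec:
    return (m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1])


def bayes_update(mean: Vec, cov: Mat, xs: Sequence[float], ys: Sequence[float],
                 noise_var: float) -> Tuple[Vec, Mat]:
    """Conjugate Gaussian update for y = b0 + b1*x + e, e ~ N(0, noise_var), noise_var known."""
    prec = inv2(cov)                                   # prior precision
    n, sx = len(xs), sum(xs)
    sxx = sum(x * x for x in xs)
    sy, sxy = sum(ys), sum(x * y for x, y in zip(xs, ys))
    # Posterior precision = prior precision + X^T X / noise_var, with design columns [1, x].
    post_prec = ((prec[0][0] + n / noise_var, prec[0][1] + sx / noise_var),
                 (prec[1][0] + sx / noise_var, prec[1][1] + sxx / noise_var))
    post_cov = inv2(post_prec)
    pm = matvec(prec, mean)
    # Posterior mean = post_cov @ (prior precision @ prior mean + X^T y / noise_var).
    return matvec(post_cov, (pm[0] + sy / noise_var, pm[1] + sxy / noise_var)), post_cov


# Hypothetical complete segments, e.g. as produced by a clustering step (not shown).
segments: List[Tuple[List[float], List[float]]] = [
    ([0.0, 1.0, 2.0], [1.1, 2.9, 5.2]),
    ([3.0, 4.0], [7.1, 8.8]),
    ([5.0, 6.0, 7.0], [11.2, 12.9, 15.1]),
]

prior_mean: Vec = (0.0, 0.0)
prior_cov: Mat = ((100.0, 0.0), (0.0, 100.0))   # weakly informative prior
noise_var = 0.25                                 # assumed known

mean, cov = prior_mean, prior_cov
for xs, ys in segments:
    mean, cov = bayes_update(mean, cov, xs, ys, noise_var)  # posterior becomes the next prior
    print(f"segment with {len(xs)} rows: intercept={mean[0]:.3f}, slope={mean[1]:.3f}")
```

With a conjugate prior and known noise variance, updating segment by segment yields the same posterior as fitting all complete rows at once, which is what makes incremental updating attractive when data arrive piecemeal.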


Author(s):  
Ahmad R. Alsaber ◽  
Jiazhu Pan ◽  
Adeeba Al-Hurban 

In environmental research, missing data are often a challenge for statistical modeling. This paper addresses advanced techniques for handling missing values in an air quality data set using a multiple imputation (MI) approach. The MCAR, MAR, and NMAR missing data mechanisms are applied to the data set at five missingness levels: 5%, 10%, 20%, 30%, and 40%. The imputation method used is missForest, an iterative imputation method based on random forests. Air quality data were collected at five monitoring stations in Kuwait and aggregated to daily values. All pollutant data were log-transformed to normalize their distributions and reduce skewness. We found high levels of missing values for NO2 (18.4%), CO (18.5%), PM10 (57.4%), SO2 (19.0%), and O3 (18.2%). Climatological data (i.e., air temperature, relative humidity, wind direction, and wind speed) were used as control variables for better estimation. The results show that the MAR mechanism yielded the lowest RMSE and MAE. We conclude that MI using the missForest approach estimates missing values with a high level of accuracy: missForest had the lowest imputation error (RMSE and MAE) among the imputation methods compared and can therefore be considered appropriate for analyzing air quality data.
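
The evaluation design described above (deleting values at several missingness levels, working on the log scale, using a climatological control variable, and scoring with RMSE and MAE) can be sketched as follows. missForest itself iterates random-forest fits over every incomplete variable until a stopping criterion is met; to keep the example short and dependency-free, a one-predictor least-squares fit stands in for the forest, only MCAR deletion is shown, and all data are synthetic. It is not the paper's code.

```python
import math
import random
from statistics import fmean
from typing import Sequence, Tuple


def rmse(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(fmean((x - y) ** 2 for x, y in zip(a, b)))


def mae(a: Sequence[float], b: Sequence[float]) -> float:
    return fmean(abs(x - y) for x, y in zip(a, b))


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares intercept and slope (stand-in for a random forest)."""
    mx, my = fmean(xs), fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


rng = random.Random(7)
n = 1000
temp = [rng.uniform(10, 45) for _ in range(n)]                       # hypothetical control variable
conc = [math.exp(1.0 + 0.05 * t + rng.gauss(0, 0.2)) for t in temp]  # hypothetical skewed pollutant
log_conc = [math.log(c) for c in conc]                               # log scale reduces skewness

results = {}
for level in (0.05, 0.10, 0.20, 0.30, 0.40):
    missing = sorted(rng.sample(range(n), round(level * n)))   # MCAR deletion at this level
    missing_set = set(missing)
    obs = [i for i in range(n) if i not in missing_set]
    truth = [log_conc[i] for i in missing]
    mean_fill = [fmean(log_conc[i] for i in obs)] * len(missing)
    a, b = fit_line([temp[i] for i in obs], [log_conc[i] for i in obs])
    reg_fill = [a + b * temp[i] for i in missing]
    results[level] = {
        "mean": (rmse(truth, mean_fill), mae(truth, mean_fill)),
        "regression": (rmse(truth, reg_fill), mae(truth, reg_fill)),
    }
    print(f"{level:.0%}: mean RMSE={results[level]['mean'][0]:.3f}, "
          f"regression RMSE={results[level]['regression'][0]:.3f}")
```

Swapping `fit_line` for a multivariate model that cycles over all incomplete pollutants would move the sketch closer to missForest's chained, iterative structure.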


Author(s):  
Ken Wei Tan ◽  
Joel R. Koo ◽  
Jue Tao Lim ◽  
Alex R. Cook ◽  
Borame L. Dickens

Chronic disease burdens continue to rise in highly dense urban environments, where type II diabetes mellitus, acute myocardial infarction, stroke, or any combination of these three conditions occur in clusters. Many individuals with these conditions will require longer-term care and access to clinics specializing in managing their illness. With Singapore as a case study, we used census data in an individual-level agent-based modeling approach to estimate prevalence in 2020 and found high-risk clusters with >14,000 type II diabetes mellitus cases and 2000–2500 estimated stroke cases. For comorbidities, 10% of those with type II diabetes mellitus had a past acute myocardial infarction episode, while 6% had a past stroke. The western region of Singapore had the highest number of high-risk individuals with at least one chronic condition (173,000), followed by the east (169,000), with the north having the fewest (137,000). Such estimates can support healthcare resource planning, which requires these spatial distributions for evidence-based policymaking and for investigating why such heterogeneities exist. The methodology presented can be applied in any urban setting where census data exist.
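
An individual-level prevalence estimate of this kind can be sketched as a simple microsimulation: each resident in the census becomes an agent, the agent's condition status is drawn from an age-specific probability, and cases are aggregated by region; repeating the draw gives an uncertainty range. The regions, age bands, counts, and probabilities below are hypothetical placeholders, not Singapore data, and this is not the authors' model.

```python
import random
from collections import Counter
from typing import Dict, Tuple

# Hypothetical age-specific probabilities of type II diabetes (illustrative only).
PREVALENCE: Dict[str, float] = {"20-39": 0.03, "40-59": 0.12, "60+": 0.25}

# Hypothetical census counts by (region, age group).
CENSUS: Dict[Tuple[str, str], int] = {
    ("west", "20-39"): 4000, ("west", "40-59"): 3500, ("west", "60+"): 1500,
    ("east", "20-39"): 4200, ("east", "40-59"): 3000, ("east", "60+"): 1700,
    ("north", "20-39"): 3600, ("north", "40-59"): 2800, ("north", "60+"): 1100,
}


def simulate_cases(census, prevalence, seed: int = 0) -> Counter:
    """One agent per resident; each agent independently has the condition with its age group's probability."""
    rng = random.Random(seed)
    cases: Counter = Counter()
    for (region, age), count in census.items():
        p = prevalence[age]
        cases[region] += sum(rng.random() < p for _ in range(count))
    return cases


def expected_cases(census, prevalence) -> Dict[str, float]:
    """Analytical expectation of the simulated counts, for comparison."""
    out: Dict[str, float] = {}
    for (region, age), count in census.items():
        out[region] = out.get(region, 0.0) + count * prevalence[age]
    return out


# Repeat the simulation to express uncertainty in the regional estimates.
runs = [simulate_cases(CENSUS, PREVALENCE, seed=s) for s in range(100)]
for region in ("west", "east", "north"):
    draws = sorted(r[region] for r in runs)
    print(f"{region}: median {draws[50]}, central ~95% range {draws[2]}-{draws[97]}")
```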


Author(s):  
Maria Lucia Parrella ◽  
Giuseppina Albano ◽  
Cira Perna ◽  
Michele La Rocca

Missing data reconstruction is a critical step in the analysis and mining of spatio-temporal data. However, few studies comprehensively consider missing data patterns, sample selection, and spatio-temporal relationships. To account for the uncertainty in the point forecast, prediction intervals may be of interest; in particular, for (possibly long) missing sequences of consecutive time points, joint prediction regions are desirable. In this paper, we propose a bootstrap resampling scheme to construct joint prediction regions that approximately contain the missing paths of a time component in a spatio-temporal framework with global probability $1-\alpha$. In many applications, requiring coverage of the whole missing sample path may be too restrictive. To obtain more informative inference, we also derive smaller joint prediction regions that contain all elements of the missing path except at most a small number $k$ of them, with probability $1-\alpha$. A simulation experiment is performed to validate the empirical performance of the proposed joint bootstrap prediction and to compare it with some alternative procedures based on a simple nominal coverage correction, loosely inspired by the Bonferroni approach, which are expected to work well in standard scenarios.


Agriculture ◽  
2021 ◽  
Vol 11 (8) ◽  
pp. 727
Author(s):  
Yingpeng Fu ◽  
Hongjian Liao ◽  
Longlong Lv

UNSODA, a free international soil database, is very popular and has been used in many fields. However, missing soil property data have limited the utility of this dataset, especially for data-driven models. Here, three machine learning-based methods, i.e., random forest (RF) regression, support vector regression (SVR), and artificial neural network (ANN) regression, and two statistics-based methods, i.e., mean imputation and multiple imputation (MI), were used to impute the missing soil property data, including pH, saturated hydraulic conductivity (SHC), organic matter content (OMC), porosity (PO), and particle density (PD). The missing upper depths (DU) and lower depths (DL) of the sampling locations were also imputed. Before the missing values in UNSODA were imputed, a missing value simulation was performed and evaluated quantitatively. Next, nonparametric tests and multiple linear regression were performed to qualitatively evaluate the reliability of these five imputation methods. The results showed that the RMSEs and MAEs of all features fluctuated within acceptable ranges. RF imputation and MI gave the lowest RMSEs and MAEs, and both methods explained the variability of the data well. The standard error, coefficient of variation, and standard deviation decreased significantly after imputation, and there were no significant differences before and after imputation. Together, DU, pH, SHC, OMC, PO, and PD explained 91.0%, 63.9%, 88.5%, 59.4%, and 90.2% of the variation in bulk density (BD) using RF, SVR, ANN, mean, and MI, respectively, and 99.8% when missing values were discarded. This study suggests that RF and MI may be better suited to imputing the missing data in UNSODA.
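
The missing value simulation step (hiding known values, imputing them, and scoring the result against the truth) can be sketched for the simplest of the five methods, mean imputation. The data are synthetic and the 20% hiding rate is an assumption. The sketch also shows one mechanical reason why dispersion statistics can shrink after mean imputation: every filled-in value sits exactly at the observed mean, so the standard deviation of the completed column cannot increase.

```python
import math
import random
from statistics import fmean, stdev


def cv(values):
    """Coefficient of variation: standard deviation divided by the mean."""
    return stdev(values) / fmean(values)


rng = random.Random(3)
truth = [rng.gauss(6.5, 1.0) for _ in range(500)]   # hypothetical complete soil pH records

# Missing value simulation: hide 20% of the known values.
hidden = set(rng.sample(range(len(truth)), 100))
observed_mean = fmean(v for i, v in enumerate(truth) if i not in hidden)
imputed = [observed_mean if i in hidden else v for i, v in enumerate(truth)]

# Score the imputation only at the hidden positions.
errors = [imputed[i] - truth[i] for i in hidden]
rmse = math.sqrt(fmean(e * e for e in errors))
mae = fmean(abs(e) for e in errors)

print(f"RMSE={rmse:.3f}, MAE={mae:.3f}")
print(f"SD before={stdev(truth):.3f}, after={stdev(imputed):.3f}")
print(f"CV before={cv(truth):.3f}, after={cv(imputed):.3f}")
```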

