Improving Power Grid Monitoring Data Quality: An Efficient Machine Learning Framework for Missing Data Prediction

Author(s): Weiwei Shi, Yongxin Zhu, Jinkui Zhang, Xiang Tao, Gehao Sheng, ...
Processes, 2019, Vol 7 (5), pp. 265
Author(s): Mingrui Sun, Tengfei Min, Tianyi Zang, Yadong Wang

(1) Background: Recommendation algorithms play a vital role in personalized recommendation for clinical decision support systems (CDSSs). Machine learning methods are powerful tools for disease diagnosis. Unfortunately, they must deal with missing data, which introduces data errors and limits the patterns and features available for reaching a clinical decision; (2) Methods: In recent years, collaborative filtering (CF) has proven to be a valuable means of predicting missing data. To address the challenges of missing data prediction and latent feature extraction, neighbor-based and latent feature-based CF methods are presented for clinical disease diagnosis. A novel discriminative restricted Boltzmann machine (DRBM) model is proposed to extract the latent features, with a deep learning technique adopted to analyze the clinical data; (3) Results: The proposed methods were compared with machine learning models on two publicly available clinical datasets, which have various types of inputs and different amounts of missing data. We also evaluated the performance of our algorithm on clinical datasets with data missing at random (MAR) to varying degrees; and (4) Conclusions: The experimental results demonstrate that DRBM can effectively capture the latent features of real clinical data and exhibits excellent performance in predicting missing values and classifying results.
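The neighbor-based CF idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation and it does not cover the DRBM model; it assumes a small numeric patient-by-feature matrix with NaN marking missing entries, cosine similarity computed over co-observed features, and a hypothetical helper name `knn_cf_impute` with a neighbor count `k` chosen only for illustration.

```python
import numpy as np

def knn_cf_impute(X, k=2):
    """Fill NaN entries using a similarity-weighted mean over the k most similar rows.

    Illustrative neighbor-based collaborative filtering; the function name,
    the cosine similarity, and the fallback rule are assumptions.
    """
    X = np.asarray(X, dtype=float)
    observed = ~np.isnan(X)
    filled = X.copy()
    n = X.shape[0]
    for i in range(n):
        # Cosine similarity between row i and every other row,
        # computed only on columns observed in both rows.
        sims = np.zeros(n)
        for j in range(n):
            if i == j:
                continue
            common = observed[i] & observed[j]
            if common.sum() < 2:
                continue
            a, b = X[i, common], X[j, common]
            denom = np.linalg.norm(a) * np.linalg.norm(b)
            sims[j] = (a @ b) / denom if denom > 0 else 0.0
        for c in np.where(~observed[i])[0]:
            # Neighbors must have observed column c and be positively similar.
            cand = [j for j in range(n) if observed[j, c] and sims[j] > 0]
            cand.sort(key=lambda j: sims[j], reverse=True)
            top = cand[:k]
            if top:
                w = sims[top]
                filled[i, c] = (w @ X[top, c]) / w.sum()
            else:
                # Fallback: mean of the observed values in that column, if any.
                col = X[observed[:, c], c]
                filled[i, c] = col.mean() if col.size else np.nan
    return filled

if __name__ == "__main__":
    data = [[1.0, 2.0, 3.0],
            [1.0, 2.0, np.nan],
            [2.0, 4.0, 6.0]]
    print(knn_cf_impute(data, k=2))
```

Observed entries are left untouched; only the missing cells are predicted, each from the neighbors that actually recorded that feature.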


2019, Vol 15 (1), pp. 13-17
Author(s): Nurul Latiffah Abd Rani, Azman Azid, Muhamad Shirwan Abdullah Sani, Mohd Saiful Samsudin, Ku Mohd Kalkausar Ku Yusof, ...

Carbon monoxide (CO) is one of the most important pollutants because it is used in the API calculation. It is therefore paramount to ensure that no CO data are missing during the analysis. A number of occurrences can contribute to missing data, such as the inability of the instrument to record certain parameters. A CO prediction model therefore needs to be developed to address this problem. A dataset of meteorological and air pollutant values was obtained from the Air Quality Division, Department of Environment Malaysia (DOE). A total of 113112 datasets were used to develop the model using sensitivity analysis (SA) through an artificial neural network (ANN). SA showed that particulate matter (PM10) and ozone (O3) were the most significant input variables for the missing data prediction model of CO. Three hidden nodes were the optimum number to develop the ANN model, with an R² value of 0.5311. Both models, artificial neural network-carbon monoxide-all parameters (ANN-CO-AP) and artificial neural network-carbon monoxide-leave out (ANN-CO-LO), showed high values of R² (0.7639 and 0.5311, respectively) and low values of RMSE (0.2482 and 0.3506, respectively). These values indicate that the models might employ only the most significant input variables to represent CO rather than all input variables.
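For readers unfamiliar with the two metrics reported above, the following is a minimal sketch of how RMSE and R² are commonly computed from observed and predicted values. It assumes the coefficient-of-determination form of R² (one minus the ratio of residual to total sum of squares); the abstract does not state which definition the authors used, and the function names are illustrative.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed definition)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)          # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
    return float(1.0 - ss_res / ss_tot)

if __name__ == "__main__":
    observed = [1.0, 2.0, 3.0, 4.0]    # made-up values, for illustration only
    predicted = [1.1, 1.9, 3.2, 3.8]
    print(rmse(observed, predicted), r_squared(observed, predicted))
```

A lower RMSE and a higher R² both indicate closer agreement between predicted and observed values, which is how the two model variants are compared in the abstract.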

