Multiple Imputation by Chained Equations–K-Nearest Neighbors and Deep Neural Network Architecture for Kidney Disease Prediction

Author(s):  
M. Dhilsath Fathima ◽  
R. Hariharan ◽  
S. P. Raja

Chronic kidney disease (CKD) is a health concern that affects people all over the world. CKD is caused by kidney dysfunction or impaired kidney function. Machine learning-based prediction models are used to determine the risk level of CKD and to help healthcare practitioners delay and prevent the progression of the disease. Researchers have proposed many prediction models for determining the CKD risk level. Although these models performed well, their precision is limited because they do not adequately handle missing values in the clinical dataset. Missing values in a clinical dataset can degrade training outcomes, which leads to false predictions. Imputing missing values can therefore improve prediction model performance. This work develops a novel imputation technique that combines Multiple Imputation by Chained Equations and K-Nearest Neighbors (MICE–KNN) to impute the missing values. The experimental results show that MICE–KNN accurately predicts the missing values, and that a Deep Neural Network (DNN) improves the prediction performance of the CKD model. Various metrics, including mean absolute error, accuracy, specificity, Matthews correlation coefficient, area under the curve, F1-score, sensitivity, and precision, were used to evaluate the performance of the proposed CKD model. The performance analysis shows that MICE–KNN with deep learning outperforms the other classifiers. According to our experimental study, the MICE–KNN imputation algorithm with a DNN is the more appropriate approach for predicting kidney disease.
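The abstract does not say exactly how MICE and KNN are combined. One plausible reading is sketched below as an assumption, not as the authors' algorithm. It runs the MICE chained-equation loop and uses a k-nearest-neighbor regressor as the model for each column. The sketch yields a single completed dataset. Full MICE would repeat the loop with random draws to produce several datasets and then pool the results. The helper names and toy data are hypothetical.

```python
import math


def knn_regress(train_X, train_y, x, k):
    """Predict x as the mean target of its k nearest training rows (Euclidean distance)."""
    nearest = sorted(zip((math.dist(row, x) for row in train_X), train_y))[:k]
    return sum(target for _, target in nearest) / len(nearest)


def drop_column(row, j):
    """Return the row without column j (the column currently being imputed)."""
    return [v for c, v in enumerate(row) if c != j]


def mice_knn_impute(data, k=3, n_iter=10):
    """Chained-equation imputation with KNN regression as the per-column model (hypothetical).

    `data` is a list of rows of floats, with None marking a missing value.
    Returns one completed copy of the data; the input is not modified.
    """
    n_rows, n_cols = len(data), len(data[0])
    missing = [[v is None for v in row] for row in data]

    # Step 1: start every missing entry at its column's observed mean.
    means = []
    for j in range(n_cols):
        observed = [row[j] for row in data if row[j] is not None]
        means.append(sum(observed) / len(observed))
    filled = [[means[j] if v is None else v for j, v in enumerate(row)] for row in data]

    # Step 2: cycle over the columns, re-imputing each column's missing entries
    # from the other (currently filled) columns of the rows where it was observed.
    for _ in range(n_iter):
        for j in range(n_cols):
            observed_rows = [i for i in range(n_rows) if not missing[i][j]]
            missing_rows = [i for i in range(n_rows) if missing[i][j]]
            if not missing_rows:
                continue
            train_X = [drop_column(filled[i], j) for i in observed_rows]
            train_y = [filled[i][j] for i in observed_rows]
            for i in missing_rows:
                filled[i][j] = knn_regress(train_X, train_y, drop_column(filled[i], j), k)
    return filled


rows = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, None], [5.0, 10.0], [6.0, 12.0]]
print(mice_knn_impute(rows, k=2)[3])  # -> [4.0, 8.0]
```

With k=2, the missing value in row 4 is replaced by the mean of its two nearest neighbors' values (6.0 and 10.0). This works because the neighbors are found using the other columns.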

Author(s):  
Byron C. Jaeger ◽  
Ryan Cantor ◽  
Venkata Sthanam ◽  
Rongbing Xie ◽  
James K. Kirklin ◽  
...  

Background: Risk prediction models play an important role in clinical decision making. When developing risk prediction models, practitioners often impute missing values with the mean. We evaluated how other strategies for imputing missing values affect the prognostic accuracy of downstream risk prediction models, that is, models fitted to the imputed data. A secondary objective was to compare the accuracy of the imputation methods themselves on artificially induced missing values. To meet these objectives, we used data from the Interagency Registry for Mechanically Assisted Circulatory Support. Methods: We applied 12 imputation strategies in combination with 2 different modeling strategies for predicting mortality and transplant risk following surgery to receive mechanical circulatory support. Model performance was evaluated using Monte Carlo cross-validation and was measured on outcomes 6 months after surgery using the scaled Brier score, concordance index, and calibration error. We used Bayesian hierarchical models to compare model performance. Results: Multiple imputation with random forests emerged as a robust strategy for imputing missing values. It increased model concordance by 0.0030 (25th–75th percentile: 0.0008–0.0052) compared with mean imputation for mortality risk prediction using a downstream proportional hazards model. The posterior probabilities that single and multiple imputation using random forests would improve concordance versus mean imputation were 0.464 and >0.999, respectively. Conclusions: Selecting an optimal imputation strategy, such as random forests, and applying multiple imputation can improve the prognostic accuracy of downstream risk prediction models.
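Two of the metrics named above are sketched below, computed on made-up data. The concordance function follows Harrell's definition for right-censored data. The scaled Brier score compares the model's Brier score with that of a null model that always predicts the outcome prevalence. Here it assumes a binary outcome that is fully observed at the horizon. That is a simplification: censored survival data are usually handled with inverse-probability-of-censoring weights, which this sketch omits.

```python
def harrell_c_index(times, events, risks):
    """Harrell's C: share of comparable pairs in which the higher-risk subject fails first.

    A pair (i, j) is comparable when subject i has an observed event (events[i] == 1)
    strictly before subject j's time. Tied risk scores count as half-concordant.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    return concordant / comparable


def scaled_brier(outcomes, probs):
    """1 - Brier(model) / Brier(null), where the null model predicts the outcome prevalence.

    Assumes binary outcomes (0/1) fully observed at the prediction horizon.
    """
    n = len(outcomes)
    brier = sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / n
    prevalence = sum(outcomes) / n
    brier_null = sum((prevalence - y) ** 2 for y in outcomes) / n
    return 1.0 - brier / brier_null


print(harrell_c_index([1, 2, 3, 4], [1, 1, 1, 1], [4, 3, 2, 1]))  # -> 1.0
print(scaled_brier([1, 0, 1, 0], [0.5, 0.5, 0.5, 0.5]))           # -> 0.0
```

A scaled Brier score of 0 means the model is no better than predicting the prevalence, and 1 means perfect predictions. This scale makes small differences between imputation strategies easier to compare.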


Diagnostics ◽  
2022 ◽  
Vol 12 (1) ◽  
pp. 116
Author(s):  
Vijendra Singh ◽  
Vijayan K. Asari ◽  
Rajkumar Rajasekaran

Diabetes and high blood pressure are the primary causes of Chronic Kidney Disease (CKD). Researchers around the world use the Glomerular Filtration Rate (GFR) and kidney damage markers to identify CKD, a condition in which renal function declines over time. A person with CKD has a higher risk of premature death. Diagnosing the diseases linked to CKD early enough to prevent the disease is a difficult task for doctors. This research presents a novel deep learning model for the early detection and prediction of CKD. The objective of this research is to create a deep neural network and compare its performance with that of other contemporary machine learning techniques. In the experiments, all missing values in the database were replaced with the mean of the corresponding feature. The optimal parameters of the neural network were then determined by running multiple trials with different parameter settings. The most important features were selected by Recursive Feature Elimination (RFE). RFE identified hemoglobin, specific gravity, serum creatinine, red blood cell count, albumin, packed cell volume, and hypertension as key features. The selected features were passed to machine learning models for classification. The proposed deep neural model outperformed the five other classifiers (Support Vector Machine (SVM), K-Nearest Neighbor (KNN), Logistic Regression, Random Forest, and Naive Bayes) by achieving 100% accuracy. The proposed approach could be a useful tool for nephrologists in detecting CKD.
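The mean-replacement step described above can be sketched as follows. The sketch assumes numeric features stored as rows, with None marking a missing entry. The example values are hypothetical. For categorical features such as hypertension, the column mode rather than the mean would normally be used, and this sketch does not handle that case.

```python
def mean_impute(rows):
    """Replace each None with the mean of the observed values in the same column."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [row[j] for row in rows if row[j] is not None]
        means.append(sum(observed) / len(observed))
    return [[means[j] if v is None else v for j, v in enumerate(row)] for row in rows]


# Hypothetical two-feature records (e.g. hemoglobin, serum creatinine).
records = [[14.0, 1.0], [None, 3.0], [12.0, None]]
print(mean_impute(records))  # -> [[14.0, 1.0], [13.0, 3.0], [12.0, 2.0]]
```

Mean imputation is simple and fast, but it shrinks the variance of each feature and ignores relationships between features. That limitation is what model-based imputation methods try to address.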


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Li-Hsin Cheng ◽  
Te-Cheng Hsu ◽  
Che Lin

Breast cancer is a heterogeneous disease. To guide proper treatment decisions for each patient, robust prognostic biomarkers that allow reliable prognosis prediction are necessary. Gene feature selection based on microarray data is an approach to discovering potential biomarkers systematically. However, standard purely statistical feature selection approaches often fail to incorporate prior biological knowledge and select genes that lack biological insight. Moreover, because microarray data are high-dimensional and have small sample sizes, selecting robust gene features is intrinsically challenging. In this study, we therefore combined systems biology feature selection with ensemble learning. Our aim was to select genes with biological insight and robust prognostic predictive power. To capture the complex molecular processes of breast cancer, we also adopted a multi-gene approach to predict prognosis status using deep learning classifiers. We found that all ensemble approaches improved feature selection robustness, and the hybrid ensemble approach gave the most robust result. Among all prognosis prediction models, the bimodal deep neural network (DNN) achieved the highest test performance, which was further verified by survival analysis. In summary, this study demonstrated the potential of combining ensemble learning and a bimodal DNN to guide precision medicine.
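Ensemble feature selection needs a step that merges several gene rankings into one. Common terminology distinguishes three kinds of ensemble. Homogeneous ensembles apply one selector to resampled data. Heterogeneous ensembles apply different selectors to the same data. Hybrid ensembles mix both. The sketch below shows one generic way to do the merging step, a Borda-style rank sum. It is an assumption for illustration; the study's own aggregation rule is not stated in the abstract. The gene names are placeholders.

```python
from collections import defaultdict


def aggregate_rankings(rankings, top_k):
    """Borda-style aggregation: an item at position p in a ranking of length n scores n - p."""
    scores = defaultdict(float)
    for ranking in rankings:
        n = len(ranking)
        for position, gene in enumerate(ranking):
            scores[gene] += n - position
    # Highest total score first; ties are broken alphabetically for reproducibility.
    ordered = sorted(scores, key=lambda g: (-scores[g], g))
    return ordered[:top_k]


# Hypothetical rankings produced by three selectors (or three bootstrap runs).
rankings = [
    ["GENE_A", "GENE_B", "GENE_C", "GENE_D"],
    ["GENE_B", "GENE_A", "GENE_D", "GENE_C"],
    ["GENE_A", "GENE_C", "GENE_B", "GENE_D"],
]
print(aggregate_rankings(rankings, top_k=2))  # -> ['GENE_A', 'GENE_B']
```

Genes that rank consistently high across members are favored over genes that rank first in only one run. This favoring of consistent genes is the source of the robustness gain that ensemble selection aims for.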


2020 ◽  
Author(s):  
Hamza Turabieh ◽  
Alaa Sheta ◽  
Malik Braik ◽  
Elvira Kovač-Andrić

To meet national air quality standards, many countries have established emissions monitoring strategies. Nowadays, policymakers and air quality executives depend on scientific computation and prediction models to monitor the pollutants that cause air pollution, especially in industrial cities. Air pollution is considered one of the primary causes of many human health problems, such as asthma, lung damage, and even death. In this study, we develop forecasting models for air pollutant attributes, including particulate matter (PM2.5, PM10), ground-level ozone (O3), and nitrogen dioxide (NO2). The dataset was collected in the city of Dubrovnik, Croatia, and contains missing values. We therefore propose using a Layered Recurrent Neural Network (L-RNN) to impute the missing values of the air pollutant attributes before building the forecasting models. We adopted four regression models to forecast the air pollutant attributes: Multiple Linear Regression (MLR), Decision Tree Regression (DTR), Artificial Neural Network (ANN), and L-RNN. The results show that the proposed method improves the overall performance of the other forecasting models.
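Regression models such as MLR, DTR, or an ANN need fixed-size inputs to forecast a pollutant series. A common way to get them is to cut the series into sliding windows of past values. The sketch below shows that generic step only. It does not implement L-RNN imputation. It assumes the series has already been imputed and raises an error if any gap remains. The function name and parameters are hypothetical.

```python
def make_windows(series, n_lags, horizon=1):
    """Turn a series into (inputs, target) pairs.

    Each input is the n_lags values before time t; the target is the value
    `horizon` steps ahead (index t + horizon - 1).
    """
    if any(v is None for v in series):
        raise ValueError("impute missing values before building forecasting windows")
    X, y = [], []
    for t in range(n_lags, len(series) - horizon + 1):
        X.append(series[t - n_lags:t])
        y.append(series[t + horizon - 1])
    return X, y


# Hypothetical hourly concentrations after imputation.
X, y = make_windows([1, 2, 3, 4, 5], n_lags=2)
print(X, y)  # -> [[1, 2], [2, 3], [3, 4]] [3, 4, 5]
```

Each row of `X` can then be passed to any of the regression models as a feature vector, with the matching entry of `y` as the value to forecast.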


2021 ◽  
Author(s):  
Jeong-Beom Lee ◽  
Jae-Bum Lee ◽  
Youn-Seo Koo ◽  
Hee-Yong Kwon ◽  
Min-Hyeok Choi ◽  
...  

This study aims to develop a deep neural network (DNN) model, a type of artificial neural network (ANN), to predict 6-hour average fine particulate matter (PM2.5) concentrations over a three-day period: the day of prediction (D+0), one day after prediction (D+1), and two days after prediction (D+2). The model uses observation data and forecast data obtained from numerical models. The performance of the DNN model was evaluated against that of the currently operational Community Multiscale Air Quality (CMAQ) modelling system for air quality forecasting in South Korea. In addition, the effect of different training data on the predictive performance of the DNN model was analyzed. For the D+0 forecast, the DNN model outperformed the CMAQ model, and its performance did not depend significantly on the training data. For the D+1 and D+2 forecasts, the DNN model trained on both observation and forecast data (DNN-ALL) outperformed the CMAQ model. The root-mean-squared error (RMSE) of DNN-ALL was lower than that of the CMAQ model by 2.2 μg m−3 and 3.0 μg m−3 for the D+1 and D+2 forecasts, respectively, because the overprediction of higher concentrations was curtailed. The DNN-ALL model was also compared with the DNN model trained only on observation data (DNN-OBS). DNN-ALL increased the index of agreement (IOA) by 0.46 for the D+1 prediction and 0.59 for the D+2 prediction. It decreased the RMSE by 7.2 μg m−3 for the D+1 prediction and 6.3 μg m−3 for the D+2 prediction. These differences indicate that including forecast data in the training data greatly affected DNN model performance. For the 6-hour average PM2.5 concentration, the RMSE of the DNN-ALL model (8.8 μg m−3) was 2.7 μg m−3 lower than that of the CMAQ model, indicating the superior prediction performance of the former.
These results suggest that the DNN model could serve as a better-performing air quality forecasting model than CMAQ. They also suggest that observation data play an important role in determining the prediction performance of the DNN model for D+0 forecasting, while forecast data play that role for D+1 and D+2 forecasting. Using the proposed DNN model for forecasting may reduce the economic losses caused by pollution-mitigation policies and help better protect public health.
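The two scores reported above can be computed as shown in the sketch below. The abstract does not define IOA. This sketch assumes it is Willmott's index of agreement, which is commonly used in air quality model evaluation and ranges from 0 (no agreement) to 1 (perfect agreement). RMSE is in the same units as the data, here μg m−3. The concentration values are made up.

```python
import math


def rmse(obs, pred):
    """Root-mean-squared error, in the units of the data."""
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(obs, pred)) / len(obs))


def index_of_agreement(obs, pred):
    """Willmott's index of agreement d (assumed definition of IOA)."""
    mean_obs = sum(obs) / len(obs)
    num = sum((p - o) ** 2 for o, p in zip(obs, pred))
    den = sum((abs(p - mean_obs) + abs(o - mean_obs)) ** 2 for o, p in zip(obs, pred))
    return 1.0 - num / den


observed = [10.0, 20.0, 30.0]   # hypothetical PM2.5, μg m−3
predicted = [12.0, 18.0, 33.0]
print(round(rmse(observed, predicted), 3))                # -> 2.38
print(round(index_of_agreement(observed, predicted), 3))  # -> 0.98
```

RMSE penalizes large errors, such as overpredicted peaks, heavily. IOA also rewards predictions that follow the observed variation around the mean, so the two scores complement each other.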


Complexity ◽  
2021 ◽  
Vol 2021 ◽  
pp. 1-10
Author(s):  
Maria Bibi ◽  
Muhammad Kashif Hanif ◽  
Muhammad Umer Sarwar ◽  
Muhammad Irfan Khan ◽  
Shouket Zaman Khan ◽  
...  

The Asian citrus psyllid, Diaphorina citri Kuwayama (Liviidae: Hemiptera), is a menacing and notorious pest of citrus plants. It vectors the phloem-dwelling bacterium Candidatus Liberibacter asiaticus, the causative pathogen of the serious citrus disease known as Huanglongbing. Huanglongbing is a major bottleneck in the export of citrus fruits from Pakistan and is responsible for huge economic losses in citrus production globally. In the current study, several prediction models were developed based on machine learning regression algorithms. The models monitor different phenological stages of the Asian citrus psyllid and predict its population in citrus-growing regions of Pakistan. The predictors are abiotic variables (average maximum temperature, average minimum temperature, average weekly temperature, average weekly relative humidity, and average weekly rainfall) and a biotic variable (host plant phenological patterns). Such pest prediction models can guide pesticide use so that pesticides are applied only when needed, reducing their environmental and economic costs. Pearson's correlation analysis was performed to find the relationship between the predictor (abiotic and biotic) variables and the pest infestation rate on citrus plants. Multiple linear regression, random forest regressor, and deep neural network approaches were compared for predicting the population dynamics of the Asian citrus psyllid. Compared with the other regression techniques, the deep neural network-based prediction model yielded the lowest root mean squared error values when predicting egg, nymph, and adult populations.
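Pearson's correlation coefficient, used above to relate each predictor to the infestation rate, is the covariance of two variables divided by the product of their standard deviations. The sketch below computes it directly. The weekly temperature and psyllid count values are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


weekly_temp = [1.0, 2.0, 3.0, 4.0]     # hypothetical predictor values
psyllid_count = [2.0, 4.0, 6.0, 8.0]   # hypothetical pest counts
print(pearson_r(weekly_temp, psyllid_count))  # -> 1.0
```

A value near +1 or −1 indicates a strong linear relationship, and a value near 0 indicates none. The coefficient captures only linear association, which is one reason nonlinear models such as random forests and deep neural networks are also compared.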


2021 ◽  
Vol 6 (9) ◽  
pp. 129
Author(s):  
T. Pradeep ◽  
Abidhan Bardhan ◽  
Avijit Burman ◽  
Pijush Samui

The majority of natural ground vibrations are caused by the release of strain energy accumulated in rock strata. This strain is associated with the formation of crack patterns and the failure of rock strata, so predicting rock strain is important for assessing the failure of rock material. The purpose of this paper is to develop a new approach for predicting strain in rock samples using deep neural network (DNN) and hybrid ANFIS (adaptive neuro-fuzzy inference system) models. Four optimization algorithms were used to optimize the learning parameters of ANFIS: particle swarm optimization (PSO), the firefly algorithm (FF), the genetic algorithm (GA), and the grey wolf optimizer (GWO). These yielded the ANFIS-PSO, ANFIS-FF, ANFIS-GA, and ANFIS-GWO models. The necessary datasets were obtained from an experimental setup of unconfined compression tests of rocks in the lateral and longitudinal directions. Various statistical parameters were used to investigate the accuracy of the proposed prediction models, and rank analysis was performed to select the most robust model for accurate rock strain prediction. Based on the experimental results, the constructed DNN has strong potential as a new alternative for helping engineers estimate rock strain in the design phase of many engineering projects.
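In hybrid models such as ANFIS-PSO, the optimizer searches over the learning parameters of ANFIS to minimize a training error. The sketch below shows a generic global-best PSO minimizing a stand-in objective (a sphere function) rather than an actual ANFIS error. The parameter values (inertia w = 0.72, acceleration coefficients c1 = c2 = 1.49) are common textbook choices. They are not necessarily those used in the paper.

```python
import random


def pso_minimize(objective, dim, bounds, n_particles=20, n_iter=100,
                 w=0.72, c1=1.49, c2=1.49, seed=0):
    """Global-best particle swarm optimization over the box [lo, hi]^dim."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                 # best position seen by each particle
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # best position seen by the swarm
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward swarm best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Move, then clamp the particle to the search box.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def sphere(x):
    """Stand-in objective with its minimum of 0 at the origin."""
    return sum(v * v for v in x)


best, best_val = pso_minimize(sphere, dim=2, bounds=(-5.0, 5.0))
print(best, best_val)
```

To tune ANFIS, each particle position would encode the membership-function and consequent parameters, and `objective` would return the model's training error. That mapping is not shown here.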

