Using Machine Learning to Identify Metabolomic Signatures of Pediatric Chronic Kidney Disease Etiology

2022 ◽  
pp. ASN.2021040538
Author(s):  
Arthur M. Lee ◽  
Jian Hu ◽  
Yunwen Xu ◽  
Alison G. Abraham ◽  
Rui Xiao ◽  
...  

BackgroundUntargeted plasma metabolomic profiling combined with machine learning (ML) may lead to discovery of metabolic profiles that inform our understanding of pediatric CKD causes. We sought to identify metabolomic signatures in pediatric CKD based on diagnosis: FSGS, obstructive uropathy (OU), aplasia/dysplasia/hypoplasia (A/D/H), and reflux nephropathy (RN).MethodsUntargeted metabolomic quantification (GC-MS/LC-MS, Metabolon) was performed on plasma from 702 Chronic Kidney Disease in Children study participants (n: FSGS=63, OU=122, A/D/H=109, and RN=86). Lasso regression was used for feature selection, adjusting for clinical covariates. Four methods were then applied to stratify significance: logistic regression, support vector machine, random forest, and extreme gradient boosting. ML training was performed on 80% total cohort subsets and validated on 20% holdout subsets. Important features were selected based on being significant in at least two of the four modeling approaches. We additionally performed pathway enrichment analysis to identify metabolic subpathways associated with CKD cause.ResultsML models were evaluated on holdout subsets with receiver-operator and precision-recall area-under-the-curve, F1 score, and Matthews correlation coefficient. ML models outperformed no-skill prediction. Metabolomic profiles were identified based on cause. FSGS was associated with the sphingomyelin-ceramide axis. FSGS was also associated with individual plasmalogen metabolites and the subpathway. OU was associated with gut microbiome–derived histidine metabolites.ConclusionML models identified metabolomic signatures based on CKD cause. Using ML techniques in conjunction with traditional biostatistics, we demonstrated that sphingomyelin-ceramide and plasmalogen dysmetabolism are associated with FSGS and that gut microbiome–derived histidine metabolites are associated with OU.

Diagnostics ◽  
2021 ◽  
Vol 11 (12) ◽  
pp. 2267
Author(s):  
Nakib Hayat Chowdhury ◽  
Mamun Bin Ibne Reaz ◽  
Fahmida Haque ◽  
Shamim Ahmad ◽  
Sawal Hamid Md Ali ◽  
...  

Chronic kidney disease (CKD) is one of the severe side effects of type 1 diabetes mellitus (T1DM). However, the detection and diagnosis of CKD are often delayed because of its asymptomatic nature. In addition, patients often tend to bypass the traditional urine protein (urinary albumin)-based CKD detection test. Even though disease detection using machine learning (ML) is a well-established field of study, it is rarely used to diagnose CKD in T1DM patients. This research aimed to employ and evaluate several ML algorithms to develop models to quickly predict CKD in patients with T1DM using easily available routine checkup data. This study analyzed 16 years of data of 1375 T1DM patients, obtained from the Epidemiology of Diabetes Interventions and Complications (EDIC) clinical trials directed by the National Institute of Diabetes, Digestive, and Kidney Diseases, USA. Three data imputation techniques (RF, KNN, and MICE) and the SMOTETomek resampling technique were used to preprocess the primary dataset. Ten ML algorithms including logistic regression (LR), k-nearest neighbor (KNN), Gaussian naïve Bayes (GNB), support vector machine (SVM), stochastic gradient descent (SGD), decision tree (DT), gradient boosting (GB), random forest (RF), extreme gradient boosting (XGB), and light gradient-boosted machine (LightGBM) were applied to developed prediction models. Each model included 19 demographic, medical history, behavioral, and biochemical features, and every feature’s effect was ranked using three feature ranking techniques (XGB, RF, and Extra Tree). Lastly, each model’s ROC, sensitivity (recall), specificity, accuracy, precision, and F-1 score were estimated to find the best-performing model. The RF classifier model exhibited the best performance with 0.96 (±0.01) accuracy, 0.98 (±0.01) sensitivity, and 0.93 (±0.02) specificity. LightGBM performed second best and was quite close to RF with 0.95 (±0.06) accuracy. In addition to these two models, KNN, SVM, DT, GB, and XGB models also achieved more than 90% accuracy.


Author(s):  
Pedro Pedrosa Rebouças Filho ◽  
Suane Pires Pinheiro Da Silva ◽  
Jefferson Silva Almeida ◽  
Elene Firmeza Ohata ◽  
Shara Shami Araujo Alves ◽  
...  

Chronic kidney diseases cause over a million deaths worldwide every year. One of the techniques used to diagnose the diseases is renal scintigraphy. However, the way that is processed can vary depending on hospitals and doctors, compromising the reproducibility of the method. In this context, we propose an approach to process the exam using computer vision and machine learning to classify the stage of chronic kidney disease. An analysis of different features extraction methods, such as Gray-Level Co-occurrence Matrix, Structural Co-occurrence Matrix, Local Binary Patters (LBP), Hu's Moments and Zernike's Moments in combination with machine learning methods, such as Bayes, Multi-layer Perceptron, k-Nearest Neighbors, Random Forest and Support Vector Machines (SVM), was performed. The best result was obtained by combining LBP feature extractor with SVM classifier. This combination achieved accuracy of 92.00% and F1-score of 91.00%, indicating that the proposed method is adequate to classify chronic kidney disease in two stages, being a high risk of developing end-stage renal failure and other outcomes, and otherwise.


Chronic Kidney Disease (CKD) is a worldwide concern that influences roughly 10% of the grown-up population on the world. For most of the people the early diagnosis of CKD is often not possible. Therefore, the utilization of present-day Computer aided supported strategies is important to help the conventional CKD finding framework to be progressively effective and precise. In this project, six modern machine learning techniques namely Multilayer Perceptron Neural Network, Support Vector Machine, Naïve Bayes, K-Nearest Neighbor, Decision Tree, Logistic regression were used and then to enhance the performance of the model Ensemble Algorithms such as ADABoost, Gradient Boosting, Random Forest, Majority Voting, Bagging and Weighted Average were used on the Chronic Kidney Disease dataset from the UCI Repository. The model was tuned finely to get the best hyper parameters to train the model. The performance metrics used to evaluate the model was measured using Accuracy, Precision, Recall, F1-score, Mathew`s Correlation Coefficient and ROC-AUC curve. The experiment was first performed on the individual classifiers and then on the Ensemble classifiers. The ensemble classifier like Random Forest and ADABoost performed better with 100% Accuracy, Precision and Recall when compared to the individual classifiers with 99.16% accuracy, 98.8% Precision and 100% Recall obtained from Decision Tree Algorithm


Author(s):  
Claire Salkar

Detection of disease at earlier stages is the most challenging one. Datasets of different diseases are available online with different number of features corresponding to a particular disease. Many dimensionalities reduction and feature extraction techniques are used nowadays to reduce the number of features in dataset and finding the most appropriate ones. This paper explores the difference in performance of different machine learning models using Principal Component Analysis dimensionality reduction technique on the datasets of Chronic kidney disease and Cardiovascular disease. Further, the authors apply Logistic Regression, K Nearest Neighbour, Naïve Bayes, Support Vector Machine and Random Forest Model on the datasets and compare the performance of the model with and without PCA. A key challenge in the field of data mining and machine learning is building accurate and computationally efficient classifiers for medical applications. With an accuracy of 100% in chronic kidney disease and 85% for heart disease, KNN classifier and logistic regression were revealed to be the most optimal method of predictions for kidney and heart disease respectively.


Author(s):  
Harsh Vardhan Singh

Chronic Kidney Disease (CKD) is a disease which doesn't shows symptoms at all or in some cases it doesn't show any disease specific symptoms it is hard to predict, detect and prevent such a disease and this could be lead to permanently health damage, but machine learning can be hope in this problem it is best in prediction and analysis. The objective of paper is to build the model for predicting the Chronic Kidney Disease using various machine learning classification algorithm. Classification is a powerful machine learning technique that is commonly used for prediction. Some of the classification algorithm are Logistic Regression, Support Vector Machine, Naïve Bayes, Random Forest Classifier, KNN. This paper investigate which algorithm is used for the improving the accuracy in the prediction of Chronic Kidney Disease. And, a comparative analysis on the accuracy and mean squared error is to done for predicting the best model.


Diagnostics ◽  
2022 ◽  
Vol 12 (1) ◽  
pp. 116
Author(s):  
Vijendra Singh ◽  
Vijayan K. Asari ◽  
Rajkumar Rajasekaran

Diabetes and high blood pressure are the primary causes of Chronic Kidney Disease (CKD). Glomerular Filtration Rate (GFR) and kidney damage markers are used by researchers around the world to identify CKD as a condition that leads to reduced renal function over time. A person with CKD has a higher chance of dying young. Doctors face a difficult task in diagnosing the different diseases linked to CKD at an early stage in order to prevent the disease. This research presents a novel deep learning model for the early detection and prediction of CKD. This research objectives to create a deep neural network and compare its performance to that of other contemporary machine learning techniques. In tests, the average of the associated features was used to replace all missing values in the database. After that, the neural network’s optimum parameters were fixed by establishing the parameters and running multiple trials. The foremost important features were selected by Recursive Feature Elimination (RFE). Hemoglobin, Specific Gravity, Serum Creatinine, Red Blood Cell Count, Albumin, Packed Cell Volume, and Hypertension were found as key features in the RFE. Selected features were passed to machine learning models for classification purposes. The proposed Deep neural model outperformed the other four classifiers (Support Vector Machine (SVM), K-Nearest Neighbor (KNN), Logistic regression, Random Forest, and Naive Bayes classifier) by achieving 100% accuracy. The proposed approach could be a useful tool for nephrologists in detecting CKD.


According to the health statistics of India on Chronic Kidney Disease (CKD) a total of 63538 cases has been registered. Average age of men and women prone to kidney disease lies in the range of 48 to 70 years. CKD is more prevalent among male than among female. India ranks 17th position in CKD during 2015[1]. This paper focus on the predictive analytics architecture to analyse CKD dataset using feature engineering and classification algorithm. The proposed model incorporates techniques to validate the feasibility of the data points used for analysis. The main focus of this research work is to analyze the dataset of chronic kidney failure and perform the classification of CKD and Non CKD cases. The feasibility of the proposed dataset is determined through the Learning curve performance. The features which play a vital role in classification are determined using sequential forward selection algorithm. The training dataset with the selected features is fed into various classifier to determine which classifier plays a vital and accurate role in detection of CKD. The proposed dataset is classified using various Classification algorithms like Linear Regression(LR), Linear Discriminant Analysis(LDA), K-Nearest Neighbour(KNN), Classification and Regression Tree(CART), Naive Bayes(NB), Support Vector Machine(SVM), Random Forest(RF), eXtreme Gradient Boosting(XGBoost) and Ada Boost Regressor (ABR). It was found that for the given CKD dataset with 25 attributes of 11 Numeric and 14 Nominal the following classifier like LR, LDA, CART,NB,RF,XGB and ABR provides an accuracy ranging from 98% to 100% . The proposed architecture validates the dataset against the thumb rule when working with less number of data points used for classification and the classifier is validated against under fit, over fit conditions. The performance of the classifier is evaluated using accuracy and F-Score. The proposed architecture indicates that LR, RF and ABR provides a very high accuracy and F-Score


Author(s):  
Reena Chandra, Et. al.

Detection of disease at earlier stages is the most challenging one. Datasets of different diseases are available online with different number of features corresponding to a particular disease. Many dimensionality reduction and feature extraction techniques are used nowadays to reduce the number of features in dataset and finding the most appropriate ones. This paper explores the difference in performance of different machine learning models using Principal Component Analysis dimensionality reduction technique on the datasets of Chronic kidney disease and Cardiovascular disease. Further, the authors apply Logistic Regression, K Nearest Neighbour, Naïve Bayes, Support Vector Machine and Random Forest Model on the datasets and compare the performance of the model with and without PCA. A key challenge in the field of data mining and machine learning is building accurate and computationally efficient classifiers for medical applications. With an accuracy of 100% in chronic kidney disease and 85% for heart disease, KNN classifier and logistic regression were revealed to be the most optimal method of predictions for kidney and heart disease respectively.


Author(s):  
Komal Kumar N ◽  
R. Lakshmi Tulasi ◽  
Vigneswari D

<span lang="EN-US">Chronic Kidney Disease (CKD) is a type of lifelong kidney disease that leads to the gradual loss of kidney function over time; the main function of the kidney is to filter the wastein the human body. When the kidney malfunctions, the wastes accumulate in our body leading to complete failure. Machine learning algorithms can be used in prediction of the kidney disease at early stages by analyzing the symptoms. The aim of this paper is to propose an ensemble learning technique for predicting Chronic Kidney Disease (CKD). We propose a new hybrid classifier called as ABC4.5, which is ensemble learning for predicting Chronic Kidney Disease (CKD). The proposed hybrid classifier is compared with the machine learning classifiers such as Support Vector Machine (SVM), Decision Tree (DT), C4.5, Particle Swarm Optimized Multi Layer Perceptron (PSO-MLP). The proposed classifier accurately predicts the occurrences of kidney disease by analysis various medical factors. The work comprises of two stages, the first stage consists of obtaining weak decision tree classifiers from C4.5 and in the second stage, the weak classifiers are added to the weighted sum to represent the final output for improved performance of the classifier.</span>


Author(s):  
Anik Das ◽  
Mohamed M. Ahmed

Accurate lane-change prediction information in real time is essential to safely operate Autonomous Vehicles (AVs) on the roadways, especially at the early stage of AVs deployment, where there will be an interaction between AVs and human-driven vehicles. This study proposed reliable lane-change prediction models considering features from vehicle kinematics, machine vision, driver, and roadway geometric characteristics using the trajectory-level SHRP2 Naturalistic Driving Study and Roadway Information Database. Several machine learning algorithms were trained, validated, tested, and comparatively analyzed including, Classification And Regression Trees (CART), Random Forest (RF), eXtreme Gradient Boosting (XGBoost), Adaptive Boosting (AdaBoost), Support Vector Machine (SVM), K Nearest Neighbor (KNN), and Naïve Bayes (NB) based on six different sets of features. In each feature set, relevant features were extracted through a wrapper-based algorithm named Boruta. The results showed that the XGBoost model outperformed all other models in relation to its highest overall prediction accuracy (97%) and F1-score (95.5%) considering all features. However, the highest overall prediction accuracy of 97.3% and F1-score of 95.9% were observed in the XGBoost model based on vehicle kinematics features. Moreover, it was found that XGBoost was the only model that achieved a reliable and balanced prediction performance across all six feature sets. Furthermore, a simplified XGBoost model was developed for each feature set considering the practical implementation of the model. The proposed prediction model could help in trajectory planning for AVs and could be used to develop more reliable advanced driver assistance systems (ADAS) in a cooperative connected and automated vehicle environment.


Sign in / Sign up

Export Citation Format

Share Document