Machine learning based models for prediction of subtype diagnosis of primary aldosteronism using blood test

2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Hiroki Kaneko ◽  
Hironobu Umakoshi ◽  
Masatoshi Ogata ◽  
Norio Wada ◽  
Norifusa Iwahashi ◽  
...  

Abstract Primary aldosteronism (PA) is associated with an increased risk of cardiometabolic diseases, especially in the unilateral subtype. Despite its high prevalence, the case detection rate of PA is limited, partly because no clinical models are available in general practice to identify patients highly suspicious of the unilateral subtype of PA, who should be referred to specialized centers. The aim of this retrospective cross-sectional study was to develop a predictive model for subtype diagnosis of PA based on machine learning methods using clinical data available in general practice. Overall, 91 patients with unilateral and 138 patients with bilateral PA were randomly assigned to the training and test cohorts. Four supervised machine learning classifiers, namely logistic regression, support vector machines, random forests (RF), and gradient boosting decision trees, were used to develop predictive models from 21 clinical variables. The accuracy and the area under the receiver operating characteristic curve (AUC) for predicting the subtype diagnosis of PA in the test cohort were compared among the optimized classifiers. Of the four classifiers, the accuracy and AUC were highest in RF, with 95.7% and 0.990, respectively. Serum potassium, plasma aldosterone, and serum sodium levels were highlighted as important variables in this model. For feature-selected RF with the three variables, the accuracy and AUC were 89.1% and 0.950, respectively. With an independent external PA cohort, we confirmed a similar accuracy for feature-selected RF (accuracy: 85.1%). Machine learning models developed using blood tests can help predict subtype diagnosis of PA in general practice.

2021 ◽  
Vol 5 (Supplement_1) ◽  
pp. A88-A89
Author(s):  
Hiroki Kaneko ◽  
Hironobu Umakoshi ◽  
Masatoshi Ogata ◽  
Norio Wada ◽  
Norifusa Iwahashi ◽  
...  

Abstract Context: Primary aldosteronism (PA) is a common cause of secondary hypertension and is associated with an increased risk of cardiometabolic diseases, especially in the unilateral subtype. Despite its high prevalence, the case detection rate of PA is limited, partly because no clinical models are available in general practice to identify patients highly suspicious of the unilateral subtype of PA, who can be cured by unilateral adrenalectomy and should be referred to specialized centers. Machine learning has been introduced in some fields of clinical prediction. Combining machine learning with clinical data could lead to the development of new models for predicting the unilateral subtype of PA. Objective: The aim of this study was to develop a predictive model of subtype diagnosis of PA based on machine learning methods using clinical data available in general practice. Design and setting: This was a retrospective cross-sectional study in referral centers. Patients: We retrospectively analyzed 91 patients with the unilateral and 138 patients with the bilateral subtype of PA, diagnosed according to adrenal venous sampling findings, who were randomly split with stratification into the training (80%) and test (20%) cohorts. Four supervised machine learning classifiers, namely logistic regression, support vector machines, random forests (RF), and gradient boosting decision trees, were used to develop prediction models from 21 clinical variables. Classifiers were trained using stratified 10-fold cross-validation of the training cohort, and the hyperparameters of each classifier were adjusted using grid search to optimize the accuracy in the training cohort. Main Outcome Measures: The accuracy and the area under the receiver operating characteristic curve (AUC) for subtype prediction in the test cohort were compared among the optimized classifiers. Results: Of the four classifiers, the accuracy and AUC were highest in RF, with 95.7% and 0.990, respectively. Serum potassium levels, plasma aldosterone concentrations, and serum sodium levels were highlighted as important variables in this model. For feature-selected RF with the three variables, the accuracy and AUC were 89.1% and 0.950, respectively. With an independent external PA cohort, we confirmed a similar accuracy for feature-selected RF. Conclusions: Machine learning models developed using blood tests can help predict subtype diagnosis of PA in general practice.
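
The stratified 80%/20% split described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the function name `stratified_split`, the seed, and the rounding rule are assumptions made for illustration, and only the class counts (91 unilateral, 138 bilateral) come from the abstract. With this rounding the test cohort has 46 patients; the abstract does not state the cohort size, although 46 would be consistent with the reported accuracies (44/46 is about 95.7%, 41/46 about 89.1%).

```python
import random


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Split indices into train/test so each class keeps roughly the same share.

    Hypothetical helper for illustration; the rounding rule is an assumption.
    """
    rng = random.Random(seed)
    # Group sample indices by class label.
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    train_idx, test_idx = [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Class counts taken from the abstract; the labels themselves are synthetic.
labels = ["unilateral"] * 91 + ["bilateral"] * 138
train_idx, test_idx = stratified_split(labels)
n_test_unilateral = sum(1 for i in test_idx if labels[i] == "unilateral")
print(len(train_idx), len(test_idx), n_test_unilateral)
```

The same grouping idea extends to stratified k-fold cross-validation by dealing each class's shuffled indices into k folds instead of two cohorts.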


2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Toktam Khatibi ◽  
Elham Hanifi ◽  
Mohammad Mehdi Sepehri ◽  
Leila Allahqoli

Abstract Background The WHO defines stillbirth as fetal loss in pregnancy beyond 28 weeks. In this study, a machine-learning based method is proposed to distinguish stillbirth from livebirth, to discriminate stillbirth before delivery from stillbirth during delivery, and to rank the features. Method A two-step stack ensemble (SE) classifier is proposed that classifies the instances into stillbirth and livebirth in the first step and then separates stillbirth before delivery from stillbirth during labor in the second step. The proposed SE has two consecutive layers that include the same classifiers. The base classifiers in each layer are decision tree, gradient boosting classifier, logistic regression, random forest and support vector machines, which are trained independently and aggregated based on the Vote boosting method. Moreover, a new feature ranking method based on mean decrease accuracy, Gini index and model coefficients is proposed to find high-ranked features. Results The IMAN registry dataset is used in this study, considering all births at or beyond the 28th gestational week from 2016/04/01 to 2017/01/01, including 1,415,623 live births and 5,502 stillbirth cases. A combination of maternal demographic features, clinical history, fetal properties, delivery descriptors, environmental features, healthcare service provider descriptors and socio-demographic features is considered. The experimental results show that the proposed SE outperforms the compared classifiers, with an average accuracy of 90%, sensitivity of 91% and specificity of 88%. The discrimination of the proposed SE is assessed, and an average AUC (±95% CI) of 90.51% ±1.08 and 90% ±1.12 is obtained on the training dataset for model development and on the test dataset for external validation, respectively. The proposed SE is calibrated using the isotonic nonparametric calibration method, with a score of 0.07. The process is repeated 10,000 times, and the AUCs of SE classifiers trained on different random training datasets are used as the null distribution. The obtained p-value to assess the specificity of the proposed SE is 0.0126, which shows the significance of the proposed SE. Conclusions Gestational age and fetal height are the two most important features for discriminating livebirth from stillbirth. Moreover, hospital, province, delivery main cause, perinatal abnormality, miscarriage number and maternal age are the most important features for classifying stillbirth before and during delivery.


2018 ◽  
Vol 129 (4) ◽  
pp. 675-688 ◽  
Author(s):  
Samir Kendale ◽  
Prathamesh Kulkarni ◽  
Andrew D. Rosenberg ◽  
Jing Wang

Abstract Background Hypotension is a risk factor for adverse perioperative outcomes. Machine-learning methods allow large amounts of data to be used for the development of robust predictive analytics. The authors hypothesized that machine-learning methods can provide prediction for the risk of postinduction hypotension. Methods Data were extracted from the electronic health record of a single quaternary care center from November 2015 to May 2016 for patients over age 12 who underwent general anesthesia, without procedure exclusions. Multiple supervised machine-learning classification techniques were attempted, with postinduction hypotension (mean arterial pressure less than 55 mmHg within 10 min of induction by any measurement) as the primary outcome, and preoperative medications, medical comorbidities, induction medications, and intraoperative vital signs as features. Discrimination was assessed using the cross-validated area under the receiver operating characteristic curve. The best performing model was tuned and final performance assessed using split-set validation. Results Out of 13,323 cases, 1,185 (8.9%) experienced postinduction hypotension. Area under the receiver operating characteristic curve using logistic regression was 0.71 (95% CI, 0.70 to 0.72), support vector machines was 0.63 (95% CI, 0.58 to 0.60), naive Bayes was 0.69 (95% CI, 0.67 to 0.69), k-nearest neighbor was 0.64 (95% CI, 0.63 to 0.65), linear discriminant analysis was 0.72 (95% CI, 0.71 to 0.73), random forest was 0.74 (95% CI, 0.73 to 0.75), neural nets 0.71 (95% CI, 0.69 to 0.71), and gradient boosting machine 0.76 (95% CI, 0.75 to 0.77). Test set area for the gradient boosting machine was 0.74 (95% CI, 0.72 to 0.77). Conclusions The success of this technique in predicting postinduction hypotension demonstrates the feasibility of machine-learning models for predictive analytics in the field of anesthesiology, with performance dependent on model selection and appropriate tuning.
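
The area under the receiver operating characteristic curve used here as the discrimination measure can be computed directly from its rank interpretation: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, counting ties as one half. The sketch below is a minimal illustration under that definition; the function name `roc_auc` and the toy labels and scores are assumptions, not data from the study. In a cross-validated setting, the same function would be applied to each held-out fold and the results averaged.

```python
def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): 1 = hypotension, 0 = no hypotension.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(y_true, scores)
print(auc)
```

This pairwise form is quadratic in the number of cases, which is fine for illustration; sorting-based implementations are preferable for datasets of the size reported above.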


2021 ◽  
Vol 10 (6) ◽  
pp. 3369-3376
Author(s):  
Saima Afrin ◽  
F. M. Javed Mehedi Shamrat ◽  
Tafsirul Islam Nibir ◽  
Mst. Fahmida Muntasim ◽  
Md. Shakil Moharram ◽  
...  

In the contemporary era, the use of machine learning techniques is increasing rapidly in the field of medical science for detecting various diseases such as liver disease (LD). Around the globe, a large number of people die because of this deadly disease. Diagnosing the disease at an early stage allows early treatment, which can help cure the patient. In this research paper, a method is proposed to diagnose LD using supervised machine learning classification algorithms, namely logistic regression, decision tree, random forest, AdaBoost, KNN, linear discriminant analysis, gradient boosting and support vector machine (SVM). We also applied a least absolute shrinkage and selection operator (LASSO) feature selection technique to our dataset to suggest the attributes most highly correlated with LD. The predictions made by the algorithms with 10-fold cross-validation (CV) are evaluated in terms of accuracy, sensitivity, precision and f1-score. The decision tree algorithm has the best performance, with accuracy, precision, sensitivity and f1-score values of 94.295%, 92%, 99% and 96%, respectively, with the inclusion of LASSO. Furthermore, a comparison with recent studies is shown to demonstrate the significance of the proposed system.
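
The four evaluation measures named in this abstract all derive from the counts of a binary confusion matrix. The following minimal sketch shows the standard definitions; the function name `classification_metrics` and the example counts are hypothetical and are not taken from the paper.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, sensitivity (recall) and F1 from confusion counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)        # share of predicted positives that are correct
    sensitivity = tp / (tp + fn)      # share of actual positives that are found
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "sensitivity": sensitivity,
        "f1": f1,
    }


# Hypothetical counts, chosen only to exercise the formulas.
metrics = classification_metrics(tp=90, fp=10, fn=5, tn=45)
print(metrics)
```

In a 10-fold cross-validation these quantities would typically be computed per fold and then averaged, or computed once from the pooled out-of-fold predictions; the abstract does not say which was done.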


2019 ◽  
Vol 58 (01) ◽  
pp. 031-041 ◽  
Author(s):  
Sara Rabhi ◽  
Jérémie Jakubowicz ◽  
Marie-Helene Metzger

Objective The objective of this article was to compare the performance of deep learning and conventional machine learning (ML) methods for detecting health care-associated infection (HAI) in French medical reports. Methods The corpus consisted of different types of medical reports (discharge summaries, surgery reports, consultation reports, etc.). A total of 1,531 medical text documents were extracted and deidentified in three French university hospitals. Each of them was labeled as presence (1) or absence (0) of HAI. We started by normalizing the records using a list of preprocessing techniques. We calculated an overall performance metric, the F1 Score, to compare a deep learning method (convolutional neural network [CNN]) with the most popular conventional ML models (Bernoulli and multi-naïve Bayes, k-nearest neighbors, logistic regression, random forests, extra-trees, gradient boosting, support vector machines). We applied Bayesian hyperparameter optimization to each model based on its HAI identification performance. We included the text representation as an additional hyperparameter for each model, using four different text representations (bag of words, term frequency–inverse document frequency, word2vec, and GloVe). Results CNN outperforms all other conventional ML algorithms for HAI classification. The best F1 Score of 97.7% ± 3.6% and best area under the curve score of 99.8% ± 0.41% were achieved when CNN was directly applied to the processed clinical notes without a pretrained word2vec embedding. Through receiver operating characteristic curve analysis, we could achieve a good balance between false notifications (with a specificity equal to 0.937) and system detection capability (with a sensitivity equal to 0.962) using the Youden's index reference. Conclusions The main drawback of CNNs is their opacity. To address this issue, we investigated CNN inner layers' activation values to visualize the most meaningful phrases in a document. This method could be used to build a phrase-based medical assistant algorithm to help the infection control practitioner select relevant medical records. Our study demonstrated that a deep learning approach outperforms other classification learning algorithms for automatically identifying HAIs in medical reports.
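
Youden's index, used above to pick the operating point, is J = sensitivity + specificity - 1, and the chosen threshold is the one that maximizes J along the ROC curve. The sketch below is a minimal illustration with toy scores; the function name `youden_threshold` and the data are assumptions, while the two values in `reported_j` are the sensitivity and specificity quoted in the abstract.

```python
def youden_threshold(y_true, scores):
    """Return (threshold, J) maximizing J = sensitivity + specificity - 1.

    A case is predicted positive when its score is >= threshold.
    """
    n_pos = sum(1 for y in y_true if y == 1)
    n_neg = len(y_true) - n_pos
    best_threshold, best_j = None, None
    for t in sorted(set(scores)):
        tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= t)
        tn = sum(1 for y, s in zip(y_true, scores) if y == 0 and s < t)
        j = tp / n_pos + tn / n_neg - 1
        if best_j is None or j > best_j:
            best_threshold, best_j = t, j
    return best_threshold, best_j


# Toy data (hypothetical): 1 = HAI present, 0 = HAI absent.
threshold, j = youden_threshold([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])

# Youden's J implied by the sensitivity and specificity reported in the abstract.
reported_j = 0.962 + 0.937 - 1
print(threshold, j, round(reported_j, 3))
```

When several thresholds tie on J, this sketch keeps the lowest one, which favors sensitivity; that tie-breaking rule is a design choice of the example, not something stated in the source.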


Hypertension ◽  
2020 ◽  
Vol 76 (Suppl_1) ◽  
Author(s):  
Sachin Aryal ◽  
Ahmad Alimadadi ◽  
Ishan Manandhar ◽  
Bina Joe ◽  
Xi Cheng

In recent years, the microbiome has been recognized as an important factor associated with cardiovascular disease (CVD), which is the leading cause of human mortality worldwide. Disparities in gut microbial compositions between individuals with and without CVD have been reported, and we therefore hypothesized that training supervised machine learning (ML) models on such microbiome-based data could be exploited as a new strategy for evaluation of cardiovascular health. To test our hypothesis, we analyzed the metagenomics data extracted from the American Gut Project. Specifically, 16S rRNA reads from stool samples of 478 CVD and 473 non-CVD control samples were analyzed using five supervised ML algorithms: random forest (RF), support vector machine with radial kernel (svmRadial), decision tree (DT), elastic net (ENet) and neural networks (NN). Thirty-nine differential bacterial taxa (LEfSe: LDA > 2) were identified between CVD and non-CVD groups. ML classifications using these taxonomic features achieved an AUC (area under the receiver operating characteristic curve) of ~0.58 (RF). However, by choosing the top 500 high-variance features of operational taxonomic units (OTUs) for training ML models, an improved AUC of ~0.65 (RF) was achieved. Further, by limiting the selection to only the top 25 highly contributing OTU features to reduce the dimensionality of the feature space, the AUC was further significantly enhanced to ~0.70 (RF). In summary, this study is the first to demonstrate the successful development of an ML model using microbiome-based datasets for a systematic diagnostic screening of CVD.
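
Selecting the highest-variance features, as described above for the OTU table, can be sketched in a few lines of Python. This is an illustration only: the function name `top_variance_features`, the use of population variance, and the tiny abundance matrix are assumptions, and the abstract does not say how variance was computed or normalized.

```python
from statistics import pvariance


def top_variance_features(rows, k):
    """Return the column indices of the k features with the largest variance."""
    n_features = len(rows[0])
    variances = [pvariance([row[j] for row in rows]) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: variances[j], reverse=True)
    return sorted(ranked[:k])


# Hypothetical abundance table: 3 samples (rows) x 3 OTUs (columns).
otu_table = [
    [1, 10, 5],
    [1, 20, 6],
    [1, 30, 7],
]
selected = top_variance_features(otu_table, k=2)
reduced = [[row[j] for j in selected] for row in otu_table]
print(selected, reduced)
```

Because variance is computed without looking at the class labels, this filter can be applied before the train/test split; label-dependent rankings such as model-based feature importance should be fitted on the training data only.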


Symmetry ◽  
2021 ◽  
Vol 13 (3) ◽  
pp. 403
Author(s):  
Muhammad Waleed ◽  
Tai-Won Um ◽  
Tariq Kamal ◽  
Syed Muhammad Usman

In this paper, we apply multi-class supervised machine learning techniques to classify agricultural farm machinery. The classification of farm machinery is important when performing the automatic authentication of field activity in a remote setup. In the absence of a sound machine recognition system, there is every possibility of fraudulent activity taking place. To address this need, we classify the machinery using five machine learning techniques: K-Nearest Neighbor (KNN), Support Vector Machine (SVM), Decision Tree (DT), Random Forest (RF) and Gradient Boosting (GB). For training of the model, we use the vibration and tilt of the machinery, recorded using accelerometer and gyroscope sensors, respectively. The machinery included the leveler, rotavator and cultivator. The preliminary analysis of the collected data revealed that the farm machinery (when in operation) showed large variations in vibration and tilt but had similar means. Additionally, vibration-based and tilt-based classifications of farm machinery each show good accuracy when used alone (with vibration showing slightly better numbers than tilt). However, the accuracies improve further when tilt and vibration are used together. Furthermore, all five machine learning algorithms used for classification have an accuracy of more than 82%, with random forest performing best. Gradient boosting and random forest show slight over-fitting (about 9%), but both algorithms produce high testing accuracy. In terms of execution time, the decision tree takes the least time to train, while gradient boosting takes the most.


2021 ◽  
Vol 19 (1) ◽  
Author(s):  
Yihan Zhang ◽  
Dong Yang ◽  
Zifeng Liu ◽  
Chaojin Chen ◽  
Mian Ge ◽  
...  

Abstract Background Early prediction of acute kidney injury (AKI) after liver transplantation (LT) facilitates timely recognition and intervention. We aimed to build a risk predictor of post-LT AKI via supervised machine learning and visualize the mechanism driving within to assist clinical decision-making. Methods Data of 894 cases that underwent liver transplantation from January 2015 to September 2019 were collected, covering demographics, donor characteristics, etiology, peri-operative laboratory results, co-morbidities and medications. The primary outcome was new-onset AKI after LT according to Kidney Disease Improving Global Outcomes guidelines. The predictive performance of five classifiers, including logistic regression, support vector machine, random forest, gradient boosting machine (GBM) and adaptive boosting, was evaluated by the area under the receiver-operating characteristic curve (AUC), accuracy, F1-score, sensitivity and specificity. The model with the best performance was validated in an independent dataset involving 195 adult LT cases from October 2019 to March 2021. The SHapley Additive exPlanations (SHAP) method was applied to evaluate feature importance and explain the predictions made by ML algorithms. Results 430 AKI cases (55.1%) were diagnosed out of 780 included cases. The GBM model achieved the highest AUC (0.76, CI 0.70 to 0.82), F1-score (0.73, CI 0.66 to 0.79) and sensitivity (0.74, CI 0.66 to 0.8) in the internal validation set, and a comparable AUC (0.75, CI 0.67 to 0.81) in the external validation set. High preoperative indirect bilirubin, low intraoperative urine output, long anesthesia time, low preoperative platelets, and graft steatosis graded NASH CRN 1 and above were revealed by the SHAP method as the top 5 important variables contributing to the diagnosis of post-LT AKI made by the GBM model. Conclusions Our GBM-based predictor of post-LT AKI provides a highly interoperable tool across institutions to assist decision-making after LT.


Author(s):  
Nelson Yego ◽  
Juma Kasozi ◽  
Joseph Nkrunziza

The role of insurance in financial inclusion as well as in economic growth is immense. However, low uptake seems to impede the growth of the sector, hence the need for a model that robustly predicts uptake of insurance among potential clients. In this research, we compared the performance of eight (8) machine learning models in predicting the uptake of insurance. The classifiers considered were Logistic Regression, Gaussian Naive Bayes, Support Vector Machines, K Nearest Neighbors, Decision Tree, Random Forest, Gradient Boosting Machines and Extreme Gradient Boosting. The data used in the classification were from the 2016 Kenya FinAccess Household Survey. Because the data were imbalanced, performance was compared for both upsampled and downsampled data. For upsampled data, the Random Forest classifier showed the highest accuracy and precision compared to other classifiers, but for downsampled data, gradient boosting was optimal. It is noteworthy that for both upsampled and downsampled data, tree-based classifiers were more robust than others in insurance uptake prediction. However, in spite of hyper-parameter optimization, the area under the receiver operating characteristic curve remained highest for Random Forest as compared to other tree-based models. Also, the confusion matrix for Random Forest showed the fewest false positives and the most true positives, so it could be construed as the most robust model for predicting insurance uptake. Finally, the most important feature in predicting uptake was having a bank product, hence bancassurance could be said to be a plausible channel of distribution of insurance products.
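
The upsampling and downsampling mentioned above are two simple ways to balance classes before training. The sketch below is a minimal illustration under assumptions: the function name `resample_balanced`, the `mode` values, and the toy data are hypothetical, and the abstract does not describe the authors' exact resampling procedure. Upsampling draws extra minority-class rows with replacement; downsampling keeps a random subset of the majority class.

```python
import random


def resample_balanced(rows, labels, mode, seed=0):
    """Balance classes by random upsampling ("up") or downsampling ("down")."""
    if mode not in ("up", "down"):
        raise ValueError("mode must be 'up' or 'down'")
    rng = random.Random(seed)
    by_class = {}
    for row, y in zip(rows, labels):
        by_class.setdefault(y, []).append(row)
    sizes = [len(members) for members in by_class.values()]
    target = max(sizes) if mode == "up" else min(sizes)
    out_rows, out_labels = [], []
    for y, members in by_class.items():
        if len(members) >= target:
            # Keep a random subset without replacement (all rows if already at target).
            chosen = rng.sample(members, target)
        else:
            # Keep every row and add random duplicates until the target is reached.
            chosen = members + rng.choices(members, k=target - len(members))
        out_rows.extend(chosen)
        out_labels.extend([y] * target)
    return out_rows, out_labels


# Toy imbalanced data (hypothetical): 8 non-uptake rows, 2 uptake rows.
rows = [[i] for i in range(10)]
labels = [0] * 8 + [1] * 2
up_rows, up_labels = resample_balanced(rows, labels, mode="up")
down_rows, down_labels = resample_balanced(rows, labels, mode="down")
print(up_labels.count(0), up_labels.count(1), down_labels.count(0), down_labels.count(1))
```

Resampling is normally applied to the training data only, so that the test set keeps the original class distribution and duplicated rows cannot leak across the split.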


2021 ◽  
Author(s):  
Yihan Zhang ◽  
Dong Yang ◽  
Zifeng Liu ◽  
Chaojin Chen ◽  
Mian Ge ◽  
...  

Abstract Background: Early prediction of acute kidney injury (AKI) after liver transplantation (LT) facilitates timely recognition and intervention. We aimed to build a risk predictor of post-LT AKI via supervised machine learning and visualize the mechanism driving within to assist clinical decision-making. Methods: Data of 894 cases that underwent liver transplantation from January 2015 to September 2019 were collected, covering demographics, donor characteristics, etiology, peri-operative laboratory results, co-morbidities and medications. The primary outcome was new-onset AKI after LT according to Kidney Disease Improving Global Outcomes guidelines. The predictive performance of five classifiers, including logistic regression, support vector machine, random forest, gradient boosting machine (GBM) and adaptive boosting, was evaluated by the area under the receiver-operating characteristic curve (AUC), accuracy, F1-score, sensitivity and specificity. The SHapley Additive exPlanations (SHAP) method was applied to evaluate feature importance and explain the predictions made by ML algorithms. Results: 430 AKI cases (55.1%) were diagnosed out of 780 included cases. The GBM model achieved the highest AUC (0.76, CI 0.70 to 0.82), F1-score (0.73, CI 0.66 to 0.79) and sensitivity (0.74, CI 0.66 to 0.8). High preoperative indirect bilirubin, low intraoperative urine output, long anesthesia time, low preoperative platelets, and graft steatosis graded NASH CRN 1 and above were revealed by the SHAP method as the top 5 important variables contributing to the diagnosis of post-LT AKI made by the GBM model. Conclusions: Our GBM-based predictor of post-LT AKI provides a highly interoperable tool across institutions to assist decision-making after LT.

