Development of Predictive Risk Models for All-cause Mortality in Pulmonary Hypertension using Machine Learning

Author(s):  
Jiandong Zhou ◽  
Ka Hei Gabriel Wong ◽  
Sharen Lee ◽  
Tong Liu ◽  
Keith Sai Kit Leung ◽  
...  

Abstract. Background: Pulmonary hypertension, a progressive lung disorder with symptoms such as breathlessness and loss of exercise capacity, is highly debilitating and has a negative impact on quality of life. In this study, we examined whether a multi-parametric approach using machine learning can improve mortality prediction. Methods: A population-based, territory-wide cohort of pulmonary hypertension patients from January 1, 2000 to December 31, 2017 was retrospectively analyzed. Significant predictors of all-cause mortality were identified. Easy-to-use frailty indexes predicting mortality in primary and secondary pulmonary hypertension were derived, and the stratification performances of the derived scores were compared. A factorization machine model was used to develop an accurate predictive risk model, and the results were compared with multivariate logistic regression, support vector machine, random forest, and multilayer perceptron models. Results: The cohort comprised 2562 patients with either primary (n=1009) or secondary (n=1553) pulmonary hypertension. Multivariate Cox regression showed that age, prior cardiovascular, respiratory, and kidney diseases, hypertension, and the number of emergency readmissions within 28 days of discharge were all predictors of all-cause mortality. Easy-to-use frailty scores were developed from the Cox regression. A factorization machine model demonstrated superior risk prediction for both primary (precision: 0.90, recall: 0.89, F1-score: 0.91, AUC: 0.91) and secondary pulmonary hypertension (precision: 0.87, recall: 0.86, F1-score: 0.89, AUC: 0.88) patients. Conclusion: We derived easy-to-use frailty scores predicting mortality in primary and secondary pulmonary hypertension. A machine learning model incorporating multi-modality clinical data significantly improves risk stratification performance.
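
A minimal sketch of the model-comparison step described above, using scikit-learn baselines (logistic regression, SVM, random forest, MLP) and the reported metrics; the factorization machine itself would come from a dedicated library and is not shown. All feature names and the synthetic cohort below are assumptions for illustration, not the paper's data.

```python
# Hedged sketch: baseline classifiers for all-cause mortality prediction on a
# synthetic tabular cohort (column names loosely mirror the abstract's predictors).
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score, f1_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n = 2562
X = pd.DataFrame({
    "age": rng.normal(65, 12, n),
    "prior_cvd": rng.integers(0, 2, n),
    "prior_respiratory": rng.integers(0, 2, n),
    "prior_kidney": rng.integers(0, 2, n),
    "hypertension": rng.integers(0, 2, n),
    "readmissions_28d": rng.poisson(0.5, n),
})
y = rng.integers(0, 2, n)  # placeholder mortality labels

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

models = {
    "logistic": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "svm": make_pipeline(StandardScaler(), SVC(probability=True)),
    "random_forest": RandomForestClassifier(n_estimators=300, random_state=0),
    "mlp": make_pipeline(StandardScaler(), MLPClassifier(max_iter=500, random_state=0)),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    proba = model.predict_proba(X_test)[:, 1]
    pred = (proba >= 0.5).astype(int)
    print(name,
          f"precision={precision_score(y_test, pred):.2f}",
          f"recall={recall_score(y_test, pred):.2f}",
          f"F1={f1_score(y_test, pred):.2f}",
          f"AUC={roc_auc_score(y_test, proba):.2f}")
```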

2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Arturo Moncada-Torres ◽  
Marissa C. van Maaren ◽  
Mathijs P. Hendriks ◽  
Sabine Siesling ◽  
Gijs Geleijnse

Abstract. Cox Proportional Hazards (CPH) analysis is the standard for survival analysis in oncology. Recently, several machine learning (ML) techniques have been adapted for this task. Although they have been shown to yield results at least as good as classical methods, they are often disregarded because of their lack of transparency and little to no explainability, which are key for their adoption in clinical settings. In this paper, we used data from the Netherlands Cancer Registry of 36,658 non-metastatic breast cancer patients to compare the performance of CPH with ML techniques (Random Survival Forests, Survival Support Vector Machines, and Extreme Gradient Boosting [XGB]) in predicting survival using the c-index. We demonstrated that in our dataset, ML-based models can perform at least as well as the classical CPH regression (c-index ∼0.63), and in the case of XGB even better (c-index ∼0.73). Furthermore, we used Shapley Additive Explanation (SHAP) values to explain the models’ predictions. We concluded that the difference in performance can be attributed to XGB’s ability to model nonlinearities and complex interactions. We also investigated the impact of specific features on the models’ predictions as well as their corresponding insights. Lastly, we showed that explainable ML can generate explicit knowledge of how models make their predictions, which is crucial in increasing the trust and adoption of innovative ML techniques in oncology and healthcare overall.
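
A hedged sketch of the comparison described: a classical Cox model (lifelines) versus a gradient-boosted survival model (XGBoost's Cox objective), scored by the c-index and explained with SHAP. The synthetic features, follow-up times, and hyperparameters are illustrative assumptions, not the registry data or the authors' exact pipeline.

```python
import numpy as np
import pandas as pd
import shap
import xgboost as xgb
from lifelines import CoxPHFitter
from lifelines.utils import concordance_index

rng = np.random.default_rng(1)
n = 1000
df = pd.DataFrame({
    "age": rng.normal(60, 10, n),
    "tumor_size_mm": rng.gamma(2.0, 10.0, n),
    "nodes_positive": rng.poisson(1.0, n),
})
time = rng.exponential(60, n)            # synthetic follow-up times (months)
event = rng.integers(0, 2, n)            # 1 = death observed, 0 = censored

# Classical Cox proportional hazards (lifelines).
cox_df = df.assign(time=time, event=event)
cph = CoxPHFitter().fit(cox_df, duration_col="time", event_col="event")
print("CPH c-index:", cph.concordance_index_)

# XGBoost with a Cox objective: censored rows are encoded with negative times.
y_xgb = np.where(event == 1, time, -time)
model = xgb.XGBRegressor(objective="survival:cox", n_estimators=200, max_depth=3)
model.fit(df, y_xgb)
risk = model.predict(df)                 # higher score = higher predicted hazard
print("XGB c-index:", concordance_index(time, -risk, event))

# SHAP values explain which features drive each prediction.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(df)
print("mean |SHAP| per feature:", np.abs(shap_values).mean(axis=0))
```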


2020 ◽  
Author(s):  
Wanjun Zhao ◽  
Yong Zhang ◽  
Xinming Li ◽  
Yonghong Mao ◽  
Changwei Wu ◽  
...  

Abstract. Background: By extracting spectral features from urinary proteomics based on an advanced mass spectrometer and machine learning algorithms, more accurate results can be achieved for disease classification. We attempted to establish a novel diagnostic model of kidney diseases by combining machine learning with an extreme gradient boosting (XGBoost) algorithm and complete mass spectrum information from the urinary proteomics. Methods: We enrolled 134 patients (with IgA nephropathy, membranous nephropathy, or diabetic kidney disease) and 68 healthy participants as controls, and applied a total of 610,102 mass spectra from their urinary proteomics, produced using high-resolution mass spectrometry, for training and validation of the diagnostic model. We divided the mass spectrum data into a training dataset (80%) and a validation dataset (20%). The training dataset was used directly to create diagnostic models using XGBoost, random forest (RF), a support vector machine (SVM), and artificial neural networks (ANNs). Diagnostic accuracy was evaluated using a confusion matrix. We also constructed receiver operating characteristic, Lorenz, and gain curves to evaluate the models. Results: Compared with RF, the SVM, and ANNs, the modified XGBoost model, called the Kidney Disease Classifier (KDClassifier), showed the best performance. The accuracy of the XGBoost model was 96.03% (CI: 95.17%-96.77%; Kappa = 0.943; McNemar’s test, P = 0.00027). The area under the curve of the XGBoost model was 0.952 (CI: 0.9307-0.9733). The Kolmogorov-Smirnov (KS) value of the Lorenz curve was 0.8514. The Lorenz and gain curves showed the strong robustness of the developed model. Conclusions: This study presents the first XGBoost diagnostic model, the KDClassifier, combined with complete mass spectrum information from the urinary proteomics for distinguishing different kidney diseases. The KDClassifier achieves high accuracy and robustness, providing a potential tool for the classification of all types of kidney diseases.
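
A minimal sketch of the core workflow: an XGBoost multiclass classifier trained on an 80/20 split and evaluated with a confusion matrix and Cohen's kappa. The random spectra, four-class label coding, and hyperparameters below are assumptions standing in for the actual mass-spectrum features.

```python
import numpy as np
from sklearn.metrics import accuracy_score, cohen_kappa_score, confusion_matrix
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

rng = np.random.default_rng(2)
n_spectra, n_features = 2000, 50           # placeholder dimensions
X = rng.normal(size=(n_spectra, n_features))
y = rng.integers(0, 4, n_spectra)          # 0=control, 1=IgA, 2=membranous, 3=diabetic (assumed coding)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=2)

# XGBoost handles the multiclass objective automatically from the labels.
clf = XGBClassifier(n_estimators=300, max_depth=4, learning_rate=0.1)
clf.fit(X_train, y_train)
pred = clf.predict(X_test)

print("accuracy:", accuracy_score(y_test, pred))
print("kappa:", cohen_kappa_score(y_test, pred))
print(confusion_matrix(y_test, pred))
```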


PeerJ ◽  
2021 ◽  
Vol 9 ◽  
pp. e10884
Author(s):  
Xin Yu ◽  
Qian Yang ◽  
Dong Wang ◽  
Zhaoyang Li ◽  
Nianhang Chen ◽  
...  

Applying the knowledge that methyltransferases and demethylases can modify adjacent cytosine-phosphate-guanine (CpG) sites in the same DNA strand, we found that combining multiple CpGs into a single block may improve cancer diagnosis. However, survival prediction remains a challenge. In this study, we developed a pipeline named “stacked ensemble of machine learning models for methylation-correlated blocks” (EnMCB) that combined Cox regression, support vector regression (SVR), and elastic-net models to construct signatures based on DNA methylation-correlated blocks for lung adenocarcinoma (LUAD) survival prediction. We used methylation profiles from The Cancer Genome Atlas (TCGA) as the training set, and profiles from the Gene Expression Omnibus (GEO) as validation and testing sets. First, we partitioned the genome into blocks of tightly co-methylated CpG sites, which we termed methylation-correlated blocks (MCBs). After partitioning and feature selection, we observed different diagnostic capacities for predicting patient survival across the models. We combined the multiple models into a single stacking ensemble model. The stacking ensemble model based on the top-ranked block had an area under the receiver operating characteristic curve of 0.622 in the TCGA training set, 0.773 in the validation set, and 0.698 in the testing set. When stratified by clinicopathological risk factors, the risk score predicted by the top-ranked MCB was an independent prognostic factor. Our results showed that our pipeline is a reliable tool that may facilitate MCB selection and survival prediction.
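
A simplified stacking sketch in the spirit of EnMCB: three base learners (elastic net, SVR, and a Cox model) produce risk scores that a linear meta-learner combines into one score. The synthetic "mcb_*" features, the regression-on-time simplification for the first two learners, and all hyperparameters are assumptions, not the published pipeline.

```python
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter
from lifelines.utils import concordance_index
from sklearn.linear_model import ElasticNet, LinearRegression
from sklearn.svm import SVR

rng = np.random.default_rng(3)
n, p = 400, 20
X = pd.DataFrame(rng.normal(size=(n, p)), columns=[f"mcb_{i}" for i in range(p)])
time = rng.exponential(36, n)                   # synthetic survival times (months)
event = rng.integers(0, 2, n)                   # 1 = death, 0 = censored

train, test = slice(0, 300), slice(300, n)

# Base learner 1: elastic net regressed on survival time (simplification).
enet = ElasticNet(alpha=0.1).fit(X[train], time[train])
# Base learner 2: support vector regression on survival time.
svr = SVR().fit(X[train], time[train])
# Base learner 3: Cox proportional hazards, producing a partial-hazard risk score.
cox = CoxPHFitter(penalizer=0.1).fit(
    X[train].assign(time=time[train], event=event[train]),
    duration_col="time", event_col="event")

def base_scores(idx):
    # Negate the time predictions so that higher always means higher risk.
    return np.column_stack([
        -enet.predict(X[idx]),
        -svr.predict(X[idx]),
        cox.predict_partial_hazard(X[idx]).to_numpy(),
    ])

# The meta-learner stacks the three base scores into one risk score.
meta = LinearRegression().fit(base_scores(train), -time[train])
risk = meta.predict(base_scores(test))
print("stacked c-index:", concordance_index(time[test], -risk, event[test]))
```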


2021 ◽  
Vol 7 (2) ◽  
pp. 203-206
Author(s):  
Herag Arabian ◽  
Verena Wagner-Hartl ◽  
Knut Moeller

Abstract. Facial emotion recognition (FER) is a topic that has gained interest over the years for its role in bridging the gap between human and machine interaction. This study explores the potential of real-time FER modelling, to be integrated in a closed-loop system, to help in the treatment of children with Autism Spectrum Disorder (ASD). The aim of this study is to show the differences between traditional machine learning and deep learning approaches to FER modelling. Two classification approaches were taken: the first was based on classic machine learning techniques using Histogram of Oriented Gradients (HOG) for feature extraction, with a k-Nearest Neighbor and a Support Vector Machine model as classifiers. The second approach uses transfer learning based on the popular “Alex Net” neural network architecture. The performance of the approaches was assessed by the accuracy on randomly selected validation sets after training on random training sets from the Oulu-CASIA database. The analysis shows that traditional machine learning methods are as effective as deep neural network models and offer a good compromise between accuracy, extracted features, computational speed, and cost.
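
A hedged sketch of the classical branch only: HOG descriptors feeding a k-NN and an SVM classifier. The random arrays stand in for Oulu-CASIA face crops, and the six-class label coding and HOG parameters are assumptions for illustration.

```python
import numpy as np
from skimage.feature import hog
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

rng = np.random.default_rng(4)
images = rng.random((300, 64, 64))          # placeholder grayscale face images
labels = rng.integers(0, 6, 300)            # six basic emotions (assumed coding)

# Extract one HOG descriptor per image.
features = np.array([hog(img, orientations=8, pixels_per_cell=(8, 8),
                         cells_per_block=(2, 2)) for img in images])

X_train, X_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.2, random_state=4)

for name, clf in [("kNN", KNeighborsClassifier(n_neighbors=5)), ("SVM", SVC(kernel="rbf"))]:
    clf.fit(X_train, y_train)
    print(name, "accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```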


Circulation ◽  
2020 ◽  
Vol 142 (Suppl_3) ◽  
Author(s):  
vardhmaan jain ◽  
Vikram Sharma ◽  
Agam Bansal ◽  
Cerise Kleb ◽  
Chirag Sheth ◽  
...  

Background: Post-transplant major adverse cardiovascular events (MACE) are among the leading causes of death among orthotopic liver transplant (OLT) recipients. Despite years of guideline-directed therapy, there are limited data on predictors of post-OLT MACE. We assessed whether machine learning algorithms (MLA) can predict MACE and all-cause mortality in patients undergoing OLT. Methods: We tested three MLAs, support vector machine, extreme gradient boosting (XGBoost), and random forest, against traditional logistic regression for prediction of MACE and all-cause mortality in a cohort of consecutive patients undergoing OLT at our center between 2008 and 2019. The cohort was randomly split into a training (80%) and testing (20%) cohort. Model performance was assessed using the c-statistic (AUC). Results: We included 1,459 consecutive patients (mean ± SD age 54.2 ± 13.8 years, 32% female) who underwent OLT. There were 199 (13.6%) MACE and 289 (20%) deaths at a mean follow-up of 4.56 ± 3.3 years. The random forest MLA was the best performing model for predicting MACE [AUC: 0.78, 95% CI: 0.70-0.85] as well as mortality [AUC: 0.69, 95% CI: 0.61-0.76], with all models performing better when predicting MACE vs mortality. See Table and Figure. Conclusion: Random forest machine learning algorithms were more predictive and discriminative than traditional regression models for predicting major adverse cardiovascular events and all-cause mortality in patients undergoing OLT. Validation and subsequent incorporation of MLA into clinical decision making for OLT candidacy could help risk stratify patients for post-transplant adverse cardiovascular events.
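
A minimal sketch of the reported approach: a random forest on an 80/20 split, scored by AUC with a bootstrap 95% CI. The synthetic features and the ~14% event rate are stand-ins; the real predictors come from the transplant cohort and are not reproduced here.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(5)
X = rng.normal(size=(1459, 12))             # placeholder clinical features
y = rng.binomial(1, 0.14, 1459)             # ~14% event rate, as in the abstract

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=5)

rf = RandomForestClassifier(n_estimators=500, random_state=5).fit(X_train, y_train)
proba = rf.predict_proba(X_test)[:, 1]
print("AUC:", roc_auc_score(y_test, proba))

# Bootstrap the test set to get an approximate 95% CI for the AUC.
aucs = []
idx = np.arange(len(y_test))
for _ in range(1000):
    b = rng.choice(idx, size=len(idx), replace=True)
    if len(np.unique(y_test[b])) == 2:      # need both classes in the resample
        aucs.append(roc_auc_score(y_test[b], proba[b]))
print("95% CI:", np.percentile(aucs, [2.5, 97.5]))
```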


Sensors ◽  
2020 ◽  
Vol 20 (5) ◽  
pp. 1393 ◽  
Author(s):  
Yanwei Yang ◽  
Xiaojian Hao ◽  
Lili Zhang ◽  
Long Ren

Due to the complexity of, and low accuracy in, iron ore classification, a method of Laser-Induced Breakdown Spectroscopy (LIBS) combined with machine learning is proposed. In this research, we collected LIBS spectra from 10 iron ore samples. First, the principal component analysis algorithm was employed to reduce the dimensionality of the spectral data; we then applied a k-nearest neighbor model, a neural network model, and a support vector machine model to the classification. The results showed that the accuracies of the three models were 82.96%, 93.33%, and 94.07%, respectively. The results also demonstrated that LIBS with a machine learning model exhibits excellent classification performance. Therefore, the LIBS technique combined with machine learning can achieve rapid, precise classification of iron ores, and can provide a completely new method for iron ore selection in the metallurgical industry.
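
A hedged sketch of this pipeline: PCA for dimensionality reduction of the spectra, followed by the three classifiers named above. The random intensity vectors, number of spectral channels, and the choice of 20 principal components are assumptions, not the experimental setup.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(6)
spectra = rng.random((500, 2048))           # placeholder emission intensity vectors
ore_class = rng.integers(0, 10, 500)        # 10 iron ore samples/classes

X_train, X_test, y_train, y_test = train_test_split(
    spectra, ore_class, test_size=0.3, random_state=6)

classifiers = {
    "kNN": KNeighborsClassifier(n_neighbors=3),
    "NN": MLPClassifier(hidden_layer_sizes=(64,), max_iter=500, random_state=6),
    "SVM": SVC(kernel="rbf"),
}
for name, clf in classifiers.items():
    # Standardize, project onto 20 principal components, then classify.
    pipe = make_pipeline(StandardScaler(), PCA(n_components=20), clf)
    pipe.fit(X_train, y_train)
    print(name, "accuracy:", accuracy_score(y_test, pipe.predict(X_test)))
```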


2017 ◽  
Vol 36 (3) ◽  
pp. 267-269 ◽  
Author(s):  
Matt Hall ◽  
Brendon Hall

The Geophysical Tutorial in the October issue of The Leading Edge was the first we've done on the topic of machine learning. Brendon Hall's article (Hall, 2016) showed readers how to take a small data set — wireline logs and geologic facies data from nine wells in the Hugoton natural gas and helium field of southwest Kansas (Dubois et al., 2007) — and predict the facies in two wells for which the facies data were not available. The article demonstrated with 25 lines of code how to explore the data set, then create, train and test a machine learning model for facies classification, and finally visualize the results. The workflow took a deliberately naive approach using a support vector machine model. It achieved a sort of baseline accuracy rate — a first-order prediction, if you will — of 0.42. That might sound low, but it's not untypical for a naive approach to this kind of problem. For comparison, random draws from the facies distribution score 0.16, which is therefore the true baseline.
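
A brief sketch of the two scores being compared, in the tutorial's spirit: the "true baseline" from random draws on the facies distribution versus a deliberately naive SVM. The log features and nine-class facies labels below are synthetic placeholders, not the Hugoton wireline data.

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(7)
X = rng.normal(size=(3000, 7))               # placeholder wireline log features
facies = rng.integers(0, 9, 3000)            # nine facies classes (assumed)

X_train, X_test, y_train, y_test = train_test_split(X, facies, test_size=0.2, random_state=7)

# Baseline: draw predictions from the training facies distribution.
random_pred = rng.choice(y_train, size=len(y_test))
print("random-draw baseline:", accuracy_score(y_test, random_pred))

# Deliberately naive SVM, as in the original workflow.
svm = make_pipeline(StandardScaler(), SVC()).fit(X_train, y_train)
print("SVM accuracy:", accuracy_score(y_test, svm.predict(X_test)))
```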


Author(s):  
Pedro Pedrosa Rebouças Filho ◽  
Suane Pires Pinheiro Da Silva ◽  
Jefferson Silva Almeida ◽  
Elene Firmeza Ohata ◽  
Shara Shami Araujo Alves ◽  
...  

Chronic kidney diseases cause over a million deaths worldwide every year. One of the techniques used to diagnose these diseases is renal scintigraphy. However, the way the exam is processed can vary between hospitals and doctors, compromising the reproducibility of the method. In this context, we propose an approach to process the exam using computer vision and machine learning to classify the stage of chronic kidney disease. An analysis of different feature extraction methods, such as Gray-Level Co-occurrence Matrix, Structural Co-occurrence Matrix, Local Binary Patterns (LBP), Hu's Moments and Zernike's Moments, in combination with machine learning methods, such as Bayes, Multi-layer Perceptron, k-Nearest Neighbors, Random Forest and Support Vector Machines (SVM), was performed. The best result was obtained by combining the LBP feature extractor with the SVM classifier. This combination achieved an accuracy of 92.00% and an F1-score of 91.00%, indicating that the proposed method is adequate for classifying chronic kidney disease into two classes: high risk of progressing to end-stage renal failure, and otherwise.
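
A hedged sketch of the best-performing combination reported, LBP texture histograms as features and an SVM classifier. The random arrays stand in for renal scintigraphy images, and the LBP radius, number of points, and two-class label coding are assumptions.

```python
import numpy as np
from skimage.feature import local_binary_pattern
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(8)
images = rng.random((200, 128, 128))    # placeholder scintigraphy images
labels = rng.integers(0, 2, 200)        # 1 = high risk of end-stage renal failure (assumed)

def lbp_histogram(img, radius=2, n_points=16):
    # Uniform LBP codes summarized as a normalized histogram.
    img_u8 = (img * 255).astype(np.uint8)
    codes = local_binary_pattern(img_u8, n_points, radius, method="uniform")
    hist, _ = np.histogram(codes, bins=n_points + 2, range=(0, n_points + 2), density=True)
    return hist

features = np.array([lbp_histogram(img) for img in images])
X_train, X_test, y_train, y_test = train_test_split(features, labels, test_size=0.25, random_state=8)

svm = SVC(kernel="rbf").fit(X_train, y_train)
pred = svm.predict(X_test)
print("accuracy:", accuracy_score(y_test, pred), "F1:", f1_score(y_test, pred))
```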


2021 ◽  
Vol 15 (58) ◽  
pp. 308-318
Author(s):  
Tran-Hieu Nguyen ◽  
Anh-Tuan Vu

In this paper, a machine learning-based framework is developed to quickly evaluate the structural safety of trusses. Three numerical examples of a 10-bar truss, a 25-bar truss, and a 47-bar truss are used to illustrate the proposed framework. Firstly, several truss cases with different cross-sectional areas are generated by employing the Latin Hypercube Sampling method. Stresses inside truss members as well as displacements of nodes are determined through finite element analyses, and the obtained values are compared with design constraints. According to the constraint verification, the safety state is assigned as safe or unsafe. Members’ sectional areas and the safety state are stored as the inputs and outputs of the training dataset, respectively. Three popular machine learning classifiers, Support Vector Machine, Deep Neural Network, and Adaptive Boosting, are used for evaluating the safety of structures. The comparison is conducted based on two metrics: the accuracy and the area under the ROC curve. For the first two examples, all three classifiers achieve more than 90% accuracy. For the 47-bar truss, the accuracies of the Support Vector Machine model and the Deep Neural Network model are lower than 70%, but the Adaptive Boosting model still retains a high accuracy of approximately 98%. In terms of the area under the ROC curve, the comparative results are similar. Overall, the Adaptive Boosting model outperforms the remaining models. In addition, an investigation is carried out to show the influence of the parameters on the performance of the Adaptive Boosting model.
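
A hedged sketch of the framework's data-generation and classification steps: Latin Hypercube Sampling of member areas, a placeholder response standing in for the finite element constraint check, and an AdaBoost classifier for the safe/unsafe label. The area bounds, the surrogate response, and the safe/unsafe threshold are assumptions for illustration only.

```python
import numpy as np
from scipy.stats import qmc
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split

n_members, n_samples = 10, 2000                            # 10-bar truss example
sampler = qmc.LatinHypercube(d=n_members, seed=9)
areas = qmc.scale(sampler.random(n_samples), 0.6, 22.9)    # assumed area bounds

# Placeholder for the FE analysis: a synthetic response compared with a limit.
rng = np.random.default_rng(9)
response = (1.0 / areas).sum(axis=1) + rng.normal(0, 0.05, n_samples)
safe = (response < np.median(response)).astype(int)        # 1 = safe, 0 = unsafe

X_train, X_test, y_train, y_test = train_test_split(areas, safe, test_size=0.2, random_state=9)

ada = AdaBoostClassifier(n_estimators=200, random_state=9).fit(X_train, y_train)
proba = ada.predict_proba(X_test)[:, 1]
print("accuracy:", accuracy_score(y_test, ada.predict(X_test)))
print("AUC:", roc_auc_score(y_test, proba))
```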


2021 ◽  
Author(s):  
Kaiho Cheung ◽  
Ishmael Rico ◽  
Tao Li ◽  
Yu Sun

In recent years the popularity of anime has steadily grown. Similar to other forms of media, consumers often face a pressing issue: “What do I watch next?”. In this study, we thoroughly examined the current method of solving this issue and determined that the learning curve to effectively utilize the current solution is too high. We developed a program to provide easier answers to the issue. The program uses a Python-based machine learning algorithm from scikit-learn and data from MyAnimeList to create a model that delivers what consumers want: good recommendations [9]. We also carried out different experiments with several iterations to study the difference in accuracy when applying different factors. Through these tests, we created a support vector machine model with 57% accuracy in recommending what users should watch.
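
A hedged sketch of one way to frame recommendation as binary classification with a scikit-learn SVM, roughly in the spirit described above. The genre columns, episode counts, and "liked" label are illustrative assumptions, not the MyAnimeList schema or the authors' feature set.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(10)
n_titles = 1500
anime = pd.DataFrame({
    "genre_action": rng.integers(0, 2, n_titles),
    "genre_romance": rng.integers(0, 2, n_titles),
    "genre_comedy": rng.integers(0, 2, n_titles),
    "episodes": rng.integers(1, 60, n_titles),
    "community_score": rng.uniform(4, 9.5, n_titles),
})
liked = rng.integers(0, 2, n_titles)        # placeholder user feedback

X_train, X_test, y_train, y_test = train_test_split(anime, liked, test_size=0.2, random_state=10)
svm = SVC(kernel="rbf").fit(X_train, y_train)
print("accuracy:", accuracy_score(y_test, svm.predict(X_test)))

# Recommend unseen titles the model predicts the user would like.
recommended = X_test[svm.predict(X_test) == 1]
print("number of recommended titles:", len(recommended))
```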

