Comparing Ensemble-Based Machine Learning Classifiers Developed for Distinguishing Hypokinetic Dysarthria from Presbyphonia

Understanding how the voice changes during normal aging is essential for accurately distinguishing presbyphonia from neurological voice disorders. This study developed ensemble-based machine learning classifiers for distinguishing hypokinetic dysarthria from presbyphonia using classification and regression tree (CART), random forest, gradient boosting algorithm (GBM), and XGBoost, and compared the accuracy, sensitivity, and specificity of the resulting models. The subjects were 76 elderly patients diagnosed with hypokinetic dysarthria and 174 patients with presbyphonia. Random forest had the best prediction performance on the test dataset (accuracy = 0.83, sensitivity = 0.90, specificity = 0.80, and area under the curve (AUC) = 0.85). The main predictors for detecting hypokinetic dysarthria were, in order of importance, cepstral peak prominence (CPP), jitter, shimmer, L/H ratio, L/H ratio_SD, CPP max (dB), CPP min (dB), and CPPF0. Among them, CPP was the most important predictor for identifying hypokinetic dysarthria.
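The three headline metrics in this abstract all derive from a binary confusion matrix. The following is a minimal sketch of that arithmetic, not the authors' code; the function name `classification_metrics` and the toy labels are hypothetical, and it assumes both classes are present so no denominator is zero.

```python
def classification_metrics(y_true, y_pred):
    """Return (accuracy, sensitivity, specificity) for binary labels, 1 = positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    accuracy = (tp + tn) / len(pairs)
    sensitivity = tp / (tp + fn)  # share of true cases that were detected
    specificity = tn / (tn + fp)  # share of non-cases correctly rejected
    return accuracy, sensitivity, specificity


# Hypothetical toy labels (1 = hypokinetic dysarthria), not data from the study.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 1, 0]
acc, sens, spec = classification_metrics(y_true, y_pred)
```

With unbalanced groups such as 76 versus 174 subjects, accuracy alone can hide poor detection of the smaller class, which is presumably why sensitivity and specificity are reported alongside it.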

Modeling Road Accident Severity with Comparisons of Logistic Regression, Decision Tree and Random Forest

Information ◽ 10.3390/info11050270 ◽ 2020 ◽ Vol 11 (5) ◽ pp. 270 ◽ Cited By ~ 1

Author(s): Mu-Ming Chen ◽ Mu-Chen Chen

Keyword(s): Logistic Regression ◽ Random Forest ◽ Sensitivity And Specificity ◽ Prediction Models ◽ Regression Tree ◽ Positive Influence ◽ Road Accident ◽ Prediction Performance ◽ Classification And Regression Tree ◽ Accident Severity

To reduce the damage caused by road accidents, researchers have applied different techniques to explore correlated factors and develop efficient prediction models. The main purpose of this study is to use one statistical and two nonparametric data mining techniques, namely, logistic regression (LR), classification and regression tree (CART), and random forest (RF), to compare their prediction capability, identify the significant variables (identified by LR) and important variables (identified by CART or RF) that are strongly correlated with road accident severity, and distinguish the variables that have significant positive influence on prediction performance. In this study, three prediction performance evaluation measures, accuracy, sensitivity and specificity, are used to find the best integrated method which consists of the most effective prediction model and the input variables that have higher positive influence on accuracy, sensitivity and specificity.

In silico Prediction of Inhibitory Constant of Thrombin Inhibitors Using Machine Learning

Combinatorial Chemistry & High Throughput Screening ◽ 10.2174/1386207322666181220130232 ◽ 2019 ◽ Vol 21 (9) ◽ pp. 662-669 ◽ Cited By ~ 1

Author(s): Junnan Zhao ◽ Lu Zhu ◽ Weineng Zhou ◽ Lingfeng Yin ◽ Yuchen Wang ◽ ...

Keyword(s): Machine Learning ◽ Prediction Models ◽ Regression Tree ◽ Large Data ◽ Thrombin Inhibitors ◽ Coagulation Cascade ◽ Gradient Boosting ◽ Support Vector ◽ Data Set ◽ Descriptor Selection

Background: Thrombin is the central protease of the vertebrate blood coagulation cascade, which is closely related to cardiovascular diseases. The inhibitory constant Ki is the most significant property of thrombin inhibitors. Method: This study was carried out to predict Ki values of thrombin inhibitors based on a large data set by using machine learning methods. Because machine learning can find non-intuitive regularities in high-dimensional datasets, it can be used to build effective predictive models. A total of 6554 descriptors for each compound were collected and an efficient descriptor selection method was chosen to find the appropriate descriptors. Four different methods, multiple linear regression (MLR), K Nearest Neighbors (KNN), Gradient Boosting Regression Tree (GBRT) and Support Vector Machine (SVM), were implemented to build prediction models with these selected descriptors. Results: The SVM model was the best among these methods, with R² = 0.84 and MSE = 0.55 for the training set and R² = 0.83 and MSE = 0.56 for the test set. Several validation methods, such as the Y-randomization test and applicability domain evaluation, were adopted to assess the robustness and generalization ability of the model. The final model shows excellent stability and predictive ability and can be employed for rapid estimation of the inhibitory constant, which is helpful for designing novel thrombin inhibitors.
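The regression results above are reported as the coefficient of determination and the mean squared error. As a rough illustration of how those two figures are computed, here is a small sketch; the helper names `mse` and `r_squared` and the toy values are hypothetical and unrelated to the thrombin data set.

```python
def mse(y_true, y_pred):
    """Mean squared error: average squared difference between observed and predicted."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Hypothetical toy values, not taken from the paper.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 2.0, 3.0, 5.0]
toy_mse = mse(observed, predicted)
toy_r2 = r_squared(observed, predicted)
```

Similar values on the training and test sets, as reported in the abstract, are the usual sign that a model has not simply memorized its training data.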

Predicting mortality in hemodialysis patients using machine learning analysis

Clinical Kidney Journal ◽ 10.1093/ckj/sfaa126 ◽ 2020

Author(s): Victoria Garcia-Montemayor ◽ Alejandro Martin-Malo ◽ Carlo Barbieri ◽ Francesco Bellocchio ◽ Sagrario Soriano ◽ ...

Keyword(s): Machine Learning ◽ Logistic Regression ◽ Random Forest ◽ Prediction Models ◽ Initial Period ◽ Area Under The Curve ◽ Mortality Prediction ◽ Machine Learning Techniques ◽ Prediction Of Mortality ◽ Mortality Prediction Models

Background: Besides the classic logistic regression analysis, non-parametric methods based on machine learning techniques such as random forest are presently used to generate predictive models. The aim of this study was to evaluate random forest mortality prediction models in haemodialysis patients. Methods: Data were acquired from incident haemodialysis patients between 1995 and 2015. Prediction of mortality at 6 months, 1 year and 2 years of haemodialysis was calculated using random forest and the accuracy was compared with logistic regression. Baseline data were constructed with the information obtained during the initial period of regular haemodialysis. Aiming to increase accuracy concerning baseline information of each patient, the period of time used to collect data was set at 30, 60 and 90 days after the first haemodialysis session. Results: There were 1571 incident haemodialysis patients included. The mean age was 62.3 years and the average Charlson comorbidity index was 5.99. The mortality prediction models obtained by random forest appear to be adequate in terms of accuracy [area under the curve (AUC) 0.68–0.73] and superior to logistic regression models (ΔAUC 0.007–0.046). Results indicate that both random forest and logistic regression develop mortality prediction models using different variables. Conclusions: Random forest is an adequate method, and superior to logistic regression, to generate mortality prediction models in haemodialysis patients.

Estimating the Optimal Dexketoprofen Pharmaceutical Formulation with Machine Learning Methods and Statistical Approaches

Healthcare Informatics Research ◽ 10.4258/hir.2021.27.4.279 ◽ 2021 ◽ Vol 27 (4) ◽ pp. 279-286

Author(s): Atakan Başkor ◽ Yağmur Pirinçci Tok ◽ Burcu Mesut ◽ Yıldız Özsoy ◽ Tamer Uçar

Keyword(s): Machine Learning ◽ Regression Tree ◽ Cost Effective ◽ Pharmaceutical Formulation ◽ Classification And Regression Tree ◽ Gradient Boosting ◽ Support Vector ◽ Disintegration Time ◽ Pharmaceutical Dosage Form ◽ Extreme Gradient Boosting

Objectives: Orally disintegrating tablets (ODTs) can be taken without drinking water; this feature makes ODTs easy to use and suitable for specific groups of patients. Oral administration of drugs is the most commonly used route, and tablets constitute the most preferable pharmaceutical dosage form. However, the preparation of ODTs is costly and requires long trials, which creates obstacles for dosage trials. The aim of this study was to identify the most appropriate formulation using machine learning (ML) models of ODT dexketoprofen formulations, with the goal of providing a cost-effective and time-reducing solution. Methods: This research utilized nonlinear regression models, including the k-nearest neighbors (k-NN), support vector regression (SVR), classification and regression tree (CART), bootstrap aggregating (bagging), random forest (RF), gradient boosting machine (GBM), and extreme gradient boosting (XGBoost) methods, as well as the t-test, to predict the quantity of various components in the dexketoprofen formulation within fixed criteria. Results: All the models were developed with Python libraries. The performance of the ML models was evaluated with R² values and the root mean square error. The GBM algorithm gave the best results, with R² and root mean square error values of 0.99 and 2.88 for hardness, 0.92 and 0.02 for friability, and 0.97 and 10.09 for disintegration time, respectively. Conclusions: In this study, we developed a computational approach to estimate the optimal pharmaceutical formulation of dexketoprofen. The results were evaluated by an expert, and it was found that they complied with Food and Drug Administration criteria.

Comparison of machine learning algorithms applied to symptoms to determine infectious causes of death in children: national survey of 18,000 verbal autopsies in the Million Death Study in India

BMC Public Health ◽ 10.1186/s12889-021-11829-y ◽ 2021 ◽ Vol 21 (1)

Author(s): Susan Idicula-Thomas ◽ Ulka Gawde ◽ Prabhat Jha

Keyword(s): Machine Learning ◽ Verbal Autopsy ◽ Causes Of Death ◽ Prediction Models ◽ Machine Learning Algorithms ◽ Classification And Regression Tree ◽ Gradient Boosting ◽ Support Vector ◽ Diarrhoeal Diseases

Background: Machine learning (ML) algorithms have been successfully employed for prediction of outcomes in clinical research. In this study, we explored the application of ML-based algorithms to predict cause of death (CoD) from verbal autopsy records available through the Million Death Study (MDS). Methods: From the MDS, 18,826 unique childhood deaths at ages 1–59 months during the period 2004–13 were selected for generating the prediction models; over 70% of these deaths were caused by six infectious diseases (pneumonia, diarrhoeal diseases, malaria, fever of unknown origin, meningitis/encephalitis, and measles). Six popular ML-based algorithms (support vector machine (SVM), gradient boosting modeling, C5.0, artificial neural network, k-nearest neighbor, and classification and regression tree) were used for building the CoD prediction models. Results: The SVM algorithm was the best performer, with a prediction accuracy of over 0.8. The highest accuracy was found for diarrhoeal diseases (accuracy = 0.97) and the lowest was for meningitis/encephalitis (accuracy = 0.80). The top signs/symptoms for classification of these CoDs were also extracted for each of the diseases. A combination of signs/symptoms presented by the deceased individual can effectively lead to the CoD diagnosis. Conclusions: Overall, this study affirms that verbal autopsy tools are efficient in CoD diagnosis and that automated classification parameters captured through ML could be added to verbal autopsies to improve classification of causes of death.

Why choose Random Forest to predict rare species distribution with few samples in large undersampled areas? Three Asian crane species models provide supporting evidence

PeerJ ◽ 10.7717/peerj.2849 ◽ 2017 ◽ Vol 5 ◽ pp. e2849 ◽ Cited By ~ 44

Author(s): Chunrong Mi ◽ Falk Huettmann ◽ Yumin Guo ◽ Xuesong Han ◽ Lijia Wen

Keyword(s): Random Forest ◽ Species Distribution ◽ Performance Metrics ◽ Regression Tree ◽ Machine Learning Algorithms ◽ Classification And Regression Tree ◽ Gradient Boosting ◽ Supporting Evidence ◽ Boosted Regression Tree ◽ Stochastic Gradient Boosting

Species distribution models (SDMs) have become an essential tool in ecology, biogeography, evolution and, more recently, conservation biology. How to generalize species distributions in large undersampled areas, especially with few samples, is a fundamental issue for SDMs. To explore this issue, we used the best available presence records for the Hooded Crane (Grus monacha, n = 33), White-naped Crane (Grus vipio, n = 40), and Black-necked Crane (Grus nigricollis, n = 75) in China as three case studies, employing four powerful and commonly used machine learning algorithms to map the breeding distributions of the three species: TreeNet (Stochastic Gradient Boosting, Boosted Regression Tree Model), Random Forest, CART (Classification and Regression Tree) and Maxent (Maximum Entropy Models). In addition, we developed an ensemble forecast by averaging the predicted probabilities of these four models. Commonly used model performance metrics (area under the ROC curve (AUC) and true skill statistic (TSS)) were employed to evaluate model accuracy. The latest satellite tracking data and compiled literature data were used as two independent testing datasets to test model predictions. We found that Random Forest demonstrated the best performance on most assessment methods, provided a better model fit to the testing data, and achieved better species range maps for each crane species in undersampled areas. Random Forest has been generally available for more than 20 years and is known to perform extremely well in ecological predictions. However, although its use is on the rise, its potential is still widely underused in conservation, (spatial) ecological applications and inference. Our results show that it informs ecological and biogeographical theories and is suitable for conservation applications, specifically when the study area is undersampled. This method helps to save model-selection time and effort, and allows robust and rapid assessments and decisions for efficient conservation.
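The ensemble forecast and the true skill statistic mentioned in this abstract can be illustrated with a short sketch. It is only an assumption-laden toy, not the authors' workflow: the function names, the four probability lists, and the 0.5 presence threshold are all hypothetical. TSS is computed as sensitivity + specificity - 1.

```python
def ensemble_average(model_probs):
    """Average the predicted probability at each site across all models."""
    n_models = len(model_probs)
    return [sum(site_probs) / n_models for site_probs in zip(*model_probs)]


def true_skill_statistic(y_true, y_pred):
    """TSS = sensitivity + specificity - 1 for binary presence (1) / absence (0)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp / (tp + fn) + tn / (tn + fp) - 1


# Hypothetical predicted probabilities from four models at four sites.
model_probs = [
    [0.9, 0.2, 0.6, 0.1],
    [0.7, 0.4, 0.8, 0.3],
    [0.8, 0.1, 0.4, 0.2],
    [0.6, 0.3, 0.6, 0.0],
]
averaged = ensemble_average(model_probs)

# Assumed threshold of 0.5 to turn probabilities into presence/absence.
predicted_presence = [1 if p >= 0.5 else 0 for p in averaged]
observed_presence = [1, 0, 0, 0]  # hypothetical test observations
tss = true_skill_statistic(observed_presence, predicted_presence)
```

TSS ranges from -1 to 1, with values near 0 indicating performance no better than chance, so it complements the threshold-free AUC.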

Why to choose Random Forest to predict rare species distribution with few samples in large undersampled areas? Three Asian crane species models provide supporting evidence

10.7287/peerj.preprints.2517 ◽ 2016

Author(s): Chunrong Mi ◽ Falk Huettmann ◽ Yumin Guo ◽ Xuesong Han ◽ Lijia Wen

Keyword(s): Random Forest ◽ Species Distribution ◽ Performance Metrics ◽ Regression Tree ◽ Machine Learning Algorithms ◽ Classification And Regression Tree ◽ Gradient Boosting ◽ Supporting Evidence ◽ Boosted Regression Tree ◽ Stochastic Gradient Boosting

Species distribution models (SDMs) have become an essential tool in ecology, biogeography, evolution, and more recently, in conservation biology. How to generalize species distributions in large undersampled areas, especially with few samples, is a fundamental issue for SDMs. To explore this issue, we used the best available presence records for the Hooded Crane (Grus monacha, n = 33), White-naped Crane (Grus vipio, n = 40), and Black-necked Crane (Grus nigricollis, n = 75) in China as three case studies, employing four powerful and commonly used machine learning algorithms to map the breeding distributions of the three species: TreeNet (Stochastic Gradient Boosting, Boosted Regression Tree Model), Random Forest, CART (Classification and Regression Tree) and Maxent (Maximum Entropy Models). In addition, we developed an ensemble forecast by averaging the predicted probabilities of these four models. Commonly used model performance metrics (area under the ROC curve (AUC) and true skill statistic (TSS)) were employed to evaluate model accuracy. The latest satellite tracking data and compiled literature data were used as two independent testing datasets to test model predictions. We found that Random Forest demonstrated the best performance on most assessment methods, provided a better model fit to the testing data, and achieved better species range maps for each crane species in undersampled areas. Random Forest has been generally available for more than 20 years and, by now, is known to perform extremely well in ecological predictions. However, although its use is on the rise, its potential is still widely underused in conservation, (spatial) ecological applications and inference. Our results show that it informs ecological and biogeographical theories and is suitable for conservation applications, specifically when the study area is undersampled. This method helps to save model-selection time and effort, and it allows robust and rapid assessments and decisions for efficient conservation.

Machine learning-based prediction models for accidental hypothermia patients

Journal of Intensive Care ◽ 10.1186/s40560-021-00525-z ◽ 2021 ◽ Vol 9 (1)

Author(s): Yohei Okada ◽ Tasuku Matsuyama ◽ Sachiko Morita ◽ Naoki Ehara ◽ Nobuhiro Miyamae ◽ ...

Keyword(s): Machine Learning ◽ Cohort Study ◽ Random Forest ◽ Hospital Mortality ◽ Retrospective Cohort Study ◽ Retrospective Cohort ◽ Prediction Models ◽ Validation Cohort ◽ Accidental Hypothermia ◽ Gradient Boosting

Background: Accidental hypothermia is a critical condition with high risks of fatal arrhythmia, multiple organ failure, and mortality; however, there is no established model to predict mortality. The present study aimed to develop and validate machine learning-based models for predicting in-hospital mortality among patients with accidental hypothermia using data easily available at hospital admission. Method: This study was a secondary analysis of a multi-center retrospective cohort study (J-point registry) including patients with accidental hypothermia. Adult patients with a body temperature of 35.0 °C or less at the emergency department were included. Prediction models for in-hospital mortality using machine learning (lasso, random forest, and gradient boosting tree) were built in a development cohort from six hospitals, and their predictive performance was assessed in a validation cohort from six other hospitals. As a reference, we compared the SOFA score and 5A score. Results: We included a total of 532 patients: the development cohort [N = 288, six hospitals, in-hospital mortality 22.0% (64/288)] and the validation cohort [N = 244, six hospitals, in-hospital mortality 27.0% (66/244)]. The C-statistics [95% CI] of the models in the validation cohort were as follows: lasso 0.784 [0.717–0.851], random forest 0.794 [0.735–0.853], gradient boosting tree 0.780 [0.714–0.847], SOFA 0.787 [0.722–0.851], and 5A score 0.750 [0.681–0.820]. The calibration plot showed that these models were well calibrated to observed in-hospital mortality. Decision curve analysis indicated that these models obtained clinical net benefit. Conclusion: This multi-center retrospective cohort study indicated that machine learning-based prediction models could accurately predict in-hospital mortality in the validation cohort of accidental hypothermia patients. These models might be able to support physicians' and patients' decision-making. However, their applicability to clinical settings and their actual clinical utility are still unclear; thus, further prospective study is warranted to evaluate their clinical usefulness.
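The C-statistic reported for each model is, for a binary outcome, the probability that a randomly chosen patient who died received a higher predicted risk than a randomly chosen survivor. The sketch below shows one simple way to compute it by comparing every such pair; the function name `c_statistic` and the toy risks are hypothetical, and ties are counted as half-concordant by assumption.

```python
def c_statistic(y_true, scores):
    """Fraction of (event, non-event) pairs in which the event has the higher score."""
    event_scores = [s for t, s in zip(y_true, scores) if t == 1]
    non_event_scores = [s for t, s in zip(y_true, scores) if t == 0]
    concordant = 0.0
    for e in event_scores:
        for n in non_event_scores:
            if e > n:
                concordant += 1.0  # correctly ranked pair
            elif e == n:
                concordant += 0.5  # tie counts as half
    return concordant / (len(event_scores) * len(non_event_scores))


# Hypothetical outcomes (1 = in-hospital death) and predicted risks.
outcomes = [1, 1, 0, 0, 0]
predicted_risk = [0.9, 0.4, 0.5, 0.3, 0.1]
c_stat = c_statistic(outcomes, predicted_risk)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which puts the reported range of roughly 0.75 to 0.79 in context. The pairwise loop is quadratic in cohort size, which is acceptable for a few hundred patients.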

Machine learning in perioperative medicine: a systematic review

Journal of Anesthesia, Analgesia and Critical Care ◽ 10.1186/s44158-022-00033-y ◽ 2022 ◽ Vol 2 (1)

Author(s): Valentina Bellini ◽ Marina Valente ◽ Giorgia Bertorelli ◽ Barbara Pifferi ◽ Michelangelo Craca ◽ ...

Keyword(s): Machine Learning ◽ Systematic Review ◽ Health Care ◽ Random Forest ◽ Risk Stratification ◽ Prediction Models ◽ Cochrane Library ◽ Gradient Boosting ◽ Support Vector ◽ Systemic Complications

Background: Risk stratification plays a central role in anesthetic evaluation. The use of Big Data and machine learning (ML) offers considerable advantages for collection and evaluation of large amounts of complex health-care data. We conducted a systematic review to understand the role of ML in the development of predictive post-surgical outcome models and risk stratification. Methods: Following the Preferred Reporting Items for Systematic Reviews and Meta-analyses (PRISMA) guidelines, we selected studies published from 1 January 2015 up to 30 March 2021. A systematic search in Scopus, CINAHL, the Cochrane Library, PubMed, and MeSH databases was performed; the search strings included different combinations of keywords: “risk prediction,” “surgery,” “machine learning,” “intensive care unit (ICU),” “anesthesia,” and “perioperative.” We identified 36 eligible studies. This study evaluates the quality of reporting of prediction models using the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis (TRIPOD) checklist. Results: The most considered outcomes were mortality risk, systemic complications (pulmonary, cardiovascular, acute kidney injury (AKI), etc.), ICU admission, anesthesiologic risk and prolonged length of hospital stay. Not all the studies completely followed the TRIPOD checklist, but the quality was overall acceptable, with 75% of studies showing an adherence rate to TRIPOD of more than 60%. The most frequently used algorithms were gradient boosting (n = 13), random forest (n = 10), logistic regression (LR; n = 7), artificial neural networks (ANNs; n = 6), and support vector machines (SVM; n = 6). The models with the best performance were random forest and gradient boosting, with AUC > 0.90. Conclusions: The application of ML in medicine appears to have great potential. From our analysis, depending on the input features considered and on the specific prediction task, ML algorithms seem to predict outcomes more accurately than validated prognostic scores and traditional statistics. Thus, our review encourages the healthcare domain and artificial intelligence (AI) developers to adopt an interdisciplinary and systemic approach to evaluate the overall impact of AI on perioperative risk assessment and on further health care settings as well.

Improvement of Adequate Digoxin Dosage: An Application of Machine Learning Approach

Journal of Healthcare Engineering ◽ 10.1155/2018/3948245 ◽ 2018 ◽ Vol 2018 ◽ pp. 1-9 ◽ Cited By ~ 6

Author(s): Ya-Han Hu ◽ Chun-Tien Tai ◽ Chih-Fong Tsai ◽ Min-Wei Huang

Keyword(s): Machine Learning ◽ Decision Tree ◽ Prediction Models ◽ Medical Center ◽ Laboratory Data ◽ Regression Tree ◽ Digoxin Concentration ◽ Classification And Regression Tree ◽ Machine Learning Techniques ◽ Learning Techniques

Digoxin is a high-alert medication because of its narrow therapeutic range and high drug-to-drug interactions (DDIs). Approximately 50% of digoxin toxicity cases are preventable, which motivated us to improve the treatment outcomes of digoxin. The objective of this study is to apply machine learning techniques to predict the appropriateness of initial digoxin dosage. A total of 307 inpatients who were treated with digoxin between 2004 and 2013 at a medical center in Taiwan were included in the study. Ten independent variables, including demographic information, laboratory data, and whether the patients had CHF, were also noted. A patient whose serum digoxin concentration was controlled at 0.5–0.9 ng/mL after the initial digoxin dosage was defined as having an appropriate use of digoxin; otherwise, the patient was defined as having an inappropriate use of digoxin. Weka 3.7.3, an open source machine learning software, was adopted to develop prediction models. Six machine learning techniques were considered, including decision tree (C4.5), k-nearest neighbors (kNN), classification and regression tree (CART), random forest (RF), multilayer perceptron (MLP), and logistic regression (LGR). In the non-DDI group, the area under ROC curve (AUC) of RF (0.912) was excellent, followed by that of MLP (0.813), CART (0.791), and C4.5 (0.784); the remaining classifiers performed poorly. For the DDI group, the AUC of RF (0.892) was the best, followed by CART (0.795), MLP (0.777), and C4.5 (0.774); the other classifiers’ performances were less than ideal. The decision tree-based approaches and MLP exhibited markedly superior accuracy performance, regardless of DDI status. Although digoxin is a high-alert medication, its initial dose can be accurately determined by using data mining techniques such as decision tree-based and MLP approaches. Developing a dosage decision support system may serve as a supplementary tool for clinicians and also increase drug safety in clinical practice.
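The outcome label in this study follows a simple threshold rule on serum digoxin concentration. A minimal sketch of that rule is given here for illustration; the function name `label_digoxin_use` is hypothetical, and treating both ends of the 0.5–0.9 ng/mL range as inclusive is an assumption, since the abstract does not say how boundary values were handled.

```python
def label_digoxin_use(serum_ng_per_ml, low=0.5, high=0.9):
    """Label a patient's initial dosing from the serum digoxin concentration (ng/mL).

    Assumption: both bounds of the target range are inclusive.
    """
    if low <= serum_ng_per_ml <= high:
        return "appropriate"
    return "inappropriate"


# Hypothetical concentrations, not patient data from the study.
labels = [label_digoxin_use(c) for c in (0.3, 0.5, 0.7, 1.2)]
```

A binary label of this kind is what the six classifiers listed in the abstract would be trained to predict from the ten independent variables.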
