Efficient and Private Scoring of Decision Trees, based on Pre-Computation Technique with Support Vector Machines and Logistic Regression Model

doi:10.30534/ijccn/2018/16722018

Prediction of all-cause mortality in coronary artery disease patients with atrial fibrillation based on machine learning models

BMC Cardiovascular Disorders, Vol 21 (1), 2021. doi:10.1186/s12872-021-02314-w

Author(s): Xinyun Liu, Jicheng Jiang, Lili Wei, Wenlu Xing, Hailong Shang, ...

Keyword(s): Machine Learning; Atrial Fibrillation; Coronary Artery Disease; Logistic Regression; Regression Model; Logistic Regression Model; Support Vector; Vector Machines; All Cause Mortality; Artery Disease

Abstract
Background: Machine learning (ML) can incorporate more diverse and more complex variables when constructing models. This study aimed to develop ML-based models to predict all-cause mortality in patients with coronary artery disease (CAD) and atrial fibrillation (AF).
Methods: A total of 2037 CAD patients with AF were included in this study. Three ML methods were used: regularized logistic regression, random forest, and support vector machines. Model performance was evaluated with fivefold cross-validation and quantified by the area under the curve (AUC) with 95% confidence intervals (CI), sensitivity, specificity, and accuracy.
Results: After univariate analysis, 24 variables with statistically significant differences were included in the models. The AUCs of the regularized logistic regression, random forest, and support vector machine models were 0.732 (95% CI 0.649–0.816), 0.728 (95% CI 0.642–0.813), and 0.712 (95% CI 0.630–0.794), respectively. The regularized logistic regression model had the highest AUC (0.732 vs 0.728 vs 0.712), specificity (0.699 vs 0.663 vs 0.668), and accuracy (0.936 vs 0.935 vs 0.935) of the three models. However, the receiver operating characteristic (ROC) curves of the three models did not differ significantly (all P > 0.05).
Conclusion: Considering all aspects of model performance, the regularized logistic regression model is recommended for use in clinical practice.
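
The evaluation protocol described above (fivefold cross-validation scored by AUC, sensitivity, specificity, and accuracy) can be sketched with the Python standard library alone. This is a minimal illustration under stated assumptions, not the authors' code: the non-stratified fold assignment, the 0.5 decision threshold, and the function names `k_fold_indices`, `auc`, and `threshold_metrics` are all hypothetical, and the three models themselves are omitted. AUC is computed as the Mann-Whitney probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one.

```python
import random
from typing import Dict, List, Sequence


def k_fold_indices(n: int, k: int = 5, seed: int = 0) -> List[List[int]]:
    """Shuffle indices 0..n-1 and deal them into k disjoint folds (non-stratified)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def threshold_metrics(
    scores: Sequence[float], labels: Sequence[int], threshold: float = 0.5
) -> Dict[str, float]:
    """Sensitivity, specificity and accuracy after cutting scores at `threshold`."""
    tp = fn = tn = fp = 0
    for s, y in zip(scores, labels):
        predicted = 1 if s >= threshold else 0
        if y == 1:
            tp += predicted
            fn += 1 - predicted
        else:
            fp += predicted
            tn += 1 - predicted
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
    }


if __name__ == "__main__":
    # In a real run, each held-out fold is scored by a model fitted on the other four.
    folds = k_fold_indices(20, k=5)
    print([len(f) for f in folds])  # [4, 4, 4, 4, 4]
    scores, labels = [0.9, 0.4, 0.6, 0.2], [1, 1, 0, 0]
    print(auc(scores, labels))  # 0.75
    print(threshold_metrics(scores, labels))
    # {'sensitivity': 0.5, 'specificity': 0.5, 'accuracy': 0.5}
```

Fold-level AUCs can be averaged, or the held-out predictions pooled before computing a single AUC; the abstract does not say which was done or how the confidence intervals were obtained.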


Propensity score estimation: neural networks, support vector machines, decision trees (CART), and meta-classifiers as alternatives to logistic regression

Journal of Clinical Epidemiology, Vol 63 (8), pp. 826-833, 2010. doi:10.1016/j.jclinepi.2009.11.020. Cited by ~168.

Author(s): Daniel Westreich, Justin Lessler, Michele Jonsson Funk

Keyword(s): Neural Networks; Logistic Regression; Support Vector Machines; Propensity Score; Decision Trees; Support Vector; Vector Machines


Maschinelles Lernen mit Aussagen zur Modellkompetenz [Machine learning with statements on model competence]

Zeitschrift für Didaktik der Naturwissenschaften, Vol 26 (1), pp. 157-172, 2020. doi:10.1007/s40573-020-00118-7

Author(s): Dirk Krüger, Moritz Krell

Keyword(s): Logistic Regression; Support Vector Machines; Decision Trees; Naive Bayes; Naïve Bayes; Support Vector; Machine Learning; Vector Machines

Abstract: Machine learning methods can help analyze statements made in open-format tasks in large samples. Using statements by biology teachers, biology student teachers, and biology education researchers on the five sub-competencies of model competence (N_training = 456; N_classification = 260), the study examines the quality of machine learning with four algorithms (naïve Bayes, logistic regression, support vector machines, and decision trees). Evidence for the validity of interpreting the codings of individual algorithms comes from satisfactory to good agreement between human and computer-based coding during training (345–607 statements, depending on the sub-competency); during classification (157–260 statements, depending on the sub-competency), this drops to moderate agreement. Positive correlations between the coded level and the external criterion of response length indicate that coding with naïve Bayes does not yield valid results. Important attributes that the algorithms use for classification correspond to relevant terms in the level definitions of the underlying coding manual. Finally, the paper discusses to what extent machine learning with the algorithms used reaches the quality of human coding for statements on model competence, and could therefore be used for second coding or in teaching situations.
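
The two checks in this abstract, agreement between human and computer-based coding and correlation of the coded level with response length, can be sketched as follows. The abstract does not name the agreement statistic, so Cohen's kappa is assumed here as a common choice for categorical codings; the function names, the coded levels, and the response lengths are hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(coder_a: Sequence[Hashable], coder_b: Sequence[Hashable]) -> float:
    """Agreement between two codings of the same statements, corrected for chance."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("codings must be non-empty and of equal length")
    n = len(coder_a)
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    # Agreement expected if both coders assigned levels independently at their own base rates.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation, e.g. between coded level and response length."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    human = [0, 1, 2, 2, 1, 0]    # hypothetical levels assigned by a human coder
    machine = [0, 1, 2, 1, 1, 0]  # hypothetical levels assigned by an algorithm
    print(round(cohens_kappa(human, machine), 3))  # 0.75
    lengths = [12, 30, 55, 60, 28, 10]  # hypothetical response lengths in words
    print(round(pearson(machine, lengths), 3))
```

A strong positive correlation between an algorithm's coded level and response length is a warning sign that the algorithm may be rewarding longer answers rather than the content the coding manual describes.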


An investigation of the factors influencing cost system functionality using decision trees, support vector machines and logistic regression

International Journal of Accounting and Information Management, Vol 27 (1), pp. 27-55, 2019. doi:10.1108/ijaim-04-2017-0052. Cited by ~1.

Author(s): Cemil Kuzey, Ali Uyar, Dursun Delen

Keyword(s): Machine Learning; Logistic Regression; Support Vector Machines; Decision Trees; Prediction Models; Support Vector; Content Type; Cost System; Factors Influencing; Vector Machines

Purpose: The paper aims to identify and critically analyze the factors influencing cost system functionality (CSF) using several machine learning techniques, including decision trees, support vector machines and logistic regression.
Design/methodology/approach: The study used a self-administered survey to collect the necessary data from companies conducting business in Turkey. Several prediction models are developed and tested, and a series of sensitivity analyses is performed on them to rank the importance of the factors/variables.
Findings: Certain factors/variables influence CSF much more than others. The findings suggest that using management accounting practices requires a functional cost system, which is supported by a comprehensive cost data management process (i.e. acquisition, storage and utilization).
Research limitations/implications: First, the underlying data were collected with a questionnaire survey; they are therefore subjective and reflect the respondents' perceptions, whereas ideally they would reflect the firms' actual practices objectively. Second, the authors measured CSF on a “Yes”/“No” basis, which does not allow respondents to give an intermediate answer and thus may have limited their choices. Third, the Likert scales used to measure the other constructs might have limited the respondents' answers.
Practical implications: Information technology plays a very important role in the success of CSF practices. That is, successful implementation of a functional cost system relies heavily on a fully integrated information infrastructure capable of constantly feeding CSF with accurate, relevant and timely data.
Originality/value: In addition to providing evidence, drawn from a broad range of industries, on the factors underlying CSF, this study illustrates the viability of machine learning methods as a research framework for critically analyzing domain-specific data.
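
The sensitivity analyses mentioned under Design/methodology/approach rank variables by how strongly a fitted model depends on them. The abstract does not specify the procedure, so the sketch below substitutes one simple variant: replace one input column at a time with its mean and record the resulting drop in accuracy. The toy model, the data, and all function names are hypothetical stand-ins, not the authors' method.

```python
from typing import Callable, List, Sequence

Row = List[float]
Model = Callable[[Row], int]


def accuracy(model: Model, X: Sequence[Row], y: Sequence[int]) -> float:
    return sum(model(row) == target for row, target in zip(X, y)) / len(y)


def mean_substitution_drops(model: Model, X: Sequence[Row], y: Sequence[int]) -> List[float]:
    """For each feature j, the accuracy lost when column j is replaced by its mean."""
    base = accuracy(model, X, y)
    drops = []
    for j in range(len(X[0])):
        mean_j = sum(row[j] for row in X) / len(X)
        X_j = [row[:j] + [mean_j] + row[j + 1:] for row in X]
        drops.append(base - accuracy(model, X_j, y))
    return drops


def rank_features(drops: Sequence[float]) -> List[int]:
    """Feature indices ordered from most to least influential."""
    return sorted(range(len(drops)), key=lambda j: drops[j], reverse=True)


if __name__ == "__main__":
    # Hypothetical data: feature 0 separates the classes, feature 1 is noise.
    X = [[0.9, 0.1], [0.8, 0.7], [0.7, 0.3], [0.6, 0.9],
         [0.4, 0.2], [0.3, 0.8], [0.2, 0.5], [0.1, 0.6]]
    y = [1, 1, 1, 1, 0, 0, 0, 0]

    def model(row: Row) -> int:
        # Stand-in for a fitted classifier that only looks at feature 0.
        return 1 if row[0] > 0.5 else 0

    drops = mean_substitution_drops(model, X, y)
    print(drops)                 # [0.5, 0.0]
    print(rank_features(drops))  # [0, 1]
```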


Harnessing Multi-label Classification Approaches for Economic Phenomena Categorization

ASEAN Journal on Science and Technology for Development, Vol 38 (2), 2021. doi:10.29037/ajstd.680

Author(s): Nofriani, Novianto Budi Kurniawan

Keyword(s): Logistic Regression; Support Vector Machines; Decision Trees; Classification Accuracy; Naive Bayes; Support Vector; Base Classifier; Binary Relevance; Vector Machines; Power Set

One way to report a country’s economic state is to compile economic phenomena from several sources. The collected data can then be explored by sentiment and by economic category. This research performed and analyzed multiple approaches to multi-label text classification, in addition to sentiment analysis of the economic phenomena. Sentiment classification and single-label category classification were performed with a logistic regression model. Multi-label category classification used logistic regression, support vector machines, k-nearest neighbor, naïve Bayes, and decision trees as base classifiers, combined with binary relevance, classifier chain, and label power set as the implementation approaches. The results showed that logistic regression works well for sentiment and single-label classification, with classification accuracies of 80.08% and 92.71%, respectively. However, it works poorly as a base classifier in multi-label classification: classification accuracy drops to 13.35%, 15.40%, and 30.65% for binary relevance, classifier chain, and label power set, respectively. By contrast, naïve Bayes works best as a base classifier in the label power set approach, with a classification accuracy of 63.22%, followed by decision trees and support vector machines.
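
The three multi-label approaches named in this abstract differ mainly in how a set of labels is turned into training targets. The sketch below shows only those transformations, with each base classifier abstracted as a plain Python callable; the category labels, data, and helper names are hypothetical. Subset accuracy (exact match of the whole label set) is included as one strict way to score multi-label output; the abstract does not state which accuracy definition was used.

```python
from typing import Callable, Dict, FrozenSet, Iterable, List, Sequence, Set, Tuple


def binary_relevance_targets(
    label_sets: Sequence[Set[str]], labels: Iterable[str]
) -> Dict[str, List[int]]:
    """Binary relevance: one independent 0/1 target (and one binary classifier) per label."""
    return {lab: [1 if lab in s else 0 for s in label_sets] for lab in labels}


def label_powerset_targets(
    label_sets: Sequence[Set[str]],
) -> Tuple[List[int], Dict[FrozenSet[str], int]]:
    """Label power set: every distinct label combination becomes one multi-class target."""
    classes: Dict[FrozenSet[str], int] = {}
    ids: List[int] = []
    for s in label_sets:
        key = frozenset(s)
        classes.setdefault(key, len(classes))  # new combinations get the next class id
        ids.append(classes[key])
    return ids, classes


def chain_predict(x: Sequence[float], classifiers: Sequence[Callable[[list], int]]) -> List[int]:
    """Classifier chain: classifier k sees the features plus the predictions for labels 0..k-1."""
    preds: List[int] = []
    for clf in classifiers:
        preds.append(clf(list(x) + preds))
    return preds


def subset_accuracy(true_sets: Sequence[Set[str]], pred_sets: Sequence[Set[str]]) -> float:
    """Share of documents whose predicted label set matches the true set exactly."""
    return sum(set(t) == set(p) for t, p in zip(true_sets, pred_sets)) / len(true_sets)


if __name__ == "__main__":
    # Hypothetical category labels for four economic news items.
    docs = [{"inflation"}, {"inflation", "trade"}, {"trade"}, {"inflation"}]
    print(binary_relevance_targets(docs, ["inflation", "trade"]))
    # {'inflation': [1, 1, 0, 1], 'trade': [0, 1, 1, 0]}
    print(label_powerset_targets(docs)[0])  # [0, 1, 2, 0]
    print(subset_accuracy(docs, [{"inflation"}, {"trade"}, {"trade"}, {"inflation"}]))  # 0.75
```

Binary relevance ignores correlations between labels, the classifier chain passes earlier predictions forward as extra features, and the label power set models each observed combination directly at the cost of many sparse classes.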


Efficient and Private Scoring of Decision Trees, Support Vector Machines and Logistic Regression Models Based on Pre-Computation

IEEE Transactions on Dependable and Secure Computing, Vol 16 (2), pp. 217-230, 2019. doi:10.1109/tdsc.2017.2679189. Cited by ~20.

Author(s): Martine De Cock, Rafael Dowsley, Caleb Horst, Raj Katti, Anderson C. A. Nascimento, ...

Keyword(s): Logistic Regression; Support Vector Machines; Decision Trees; Regression Models; Support Vector; Logistic Regression Models; Vector Machines


Performance Evaluation of Diagnosis Chronic Kidney Disease using Support Vector Machine and Logistic Regression Model

Journal of Engineering and Applied Sciences, Vol 14 (15), pp. 5167-5175, 2019. doi:10.36478/jeasci.2019.5167.5175

Author(s): Rizgar Maghdid Ahmed, Omar Qusay Alshebly

Keyword(s): Chronic Kidney Disease; Support Vector Machine; Logistic Regression; Performance Evaluation; Kidney Disease; Regression Model; Logistic Regression Model; Support Vector


Bankruptcy Prediction of Engineering Companies in the EU Using Classification Methods

Acta Universitatis Agriculturae et Silviculturae Mendelianae Brunensis, Vol 66 (5), pp. 1347-1356, 2018. doi:10.11118/actaun201866051347. Cited by ~1.

Author(s): Michaela Staňková, David Hampel

Keyword(s): Logistic Regression; Support Vector Machines; Binary Classification; Classification Tree; Classification Trees; Bankruptcy Prediction; Support Vector; Type I; Vector Machines; The EU

This article addresses the binary classification of 902 small and medium-sized engineering companies active in the EU, together with an additional 51 companies that went bankrupt in 2014. For classification, the basic statistical method of logistic regression was selected, together with representatives of machine learning (support vector machines and the classification tree method), to construct bankruptcy prediction models. Different settings were tested for each method. Furthermore, the models were estimated both on the complete data and on identified artificial factors. To evaluate prediction quality, we consider not only total accuracy together with type I and type II errors but also the area under the ROC curve. The results clearly show that the predictive ability of all models decreases as the distance to bankruptcy increases. The classification tree method yields rather simple models. The best classification results were achieved by logistic regression based on artificial factors; moreover, this procedure gives good and stable results regardless of the other settings. Artificial factors also seem to be suitable input variables for support vector machine models, but classification trees achieved better results with the original data.
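
With 51 bankrupt and 902 active firms, total accuracy alone can be misleading, which is why type I and type II errors are reported alongside it. The sketch below assumes the convention common in bankruptcy studies, where a type I error classifies a bankrupt firm as healthy and a type II error classifies a healthy firm as bankrupt; the function name and the 1/0 class coding are hypothetical. It scores a trivial baseline that labels every firm healthy.

```python
from typing import Dict, Sequence


def classification_errors(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Type I/II error rates and accuracy; 1 = bankrupt, 0 = healthy (assumed coding)."""
    n_bankrupt = sum(1 for t in y_true if t == 1)
    n_healthy = len(y_true) - n_bankrupt
    # Type I (assumed convention): a bankrupt firm predicted as healthy.
    missed = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    # Type II (assumed convention): a healthy firm predicted as bankrupt.
    false_alarms = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return {
        "type_I": missed / n_bankrupt,
        "type_II": false_alarms / n_healthy,
        "accuracy": correct / len(y_true),
    }


if __name__ == "__main__":
    # Class sizes from the abstract: 51 bankrupt and 902 active firms.
    y_true = [1] * 51 + [0] * 902
    always_healthy = [0] * len(y_true)  # trivial baseline, not a model from the paper
    result = classification_errors(y_true, always_healthy)
    print(result["type_I"], result["type_II"], round(result["accuracy"], 3))
    # 1.0 0.0 0.946
```

The baseline reaches about 94.6% accuracy while missing every bankruptcy (a type I error rate of 1.0), which shows why accuracy is read together with the two error types and the area under the ROC curve.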
