Efficient and Private Scoring of Decision Trees, based on Pre-Computation Technique with Support Vector Machines and Logistic Regression Model

2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Xinyun Liu ◽  
Jicheng Jiang ◽  
Lili Wei ◽  
Wenlu Xing ◽  
Hailong Shang ◽  
...  

Abstract. Background: Machine learning (ML) can incorporate more diverse and more complex variables when constructing models. This study aimed to develop ML-based models to predict all-cause mortality in coronary artery disease (CAD) patients with atrial fibrillation (AF). Methods: A total of 2037 CAD patients with AF were included in this study. Three ML methods were used: regularized logistic regression, random forest, and support vector machines. Fivefold cross-validation was used to evaluate model performance, which was quantified by the area under the curve (AUC) with 95% confidence intervals (CI), sensitivity, specificity, and accuracy. Results: After univariate analysis, 24 variables with statistically significant differences were included in the models. The AUCs of the regularized logistic regression, random forest, and support vector machines models were 0.732 (95% CI 0.649–0.816), 0.728 (95% CI 0.642–0.813), and 0.712 (95% CI 0.630–0.794), respectively. The regularized logistic regression model had the highest AUC (0.732 vs 0.728 vs 0.712), specificity (0.699 vs 0.663 vs 0.668), and accuracy (0.936 vs 0.935 vs 0.935) among the three models. However, no statistically significant differences were observed among the receiver operating characteristic (ROC) curves of the three models (all P > 0.05). Conclusion: Considering all aspects of model performance, the regularized logistic regression model is recommended for use in clinical practice.
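The four performance measures named in this abstract can be illustrated with a short sketch. This is not the authors' code: the function names `auc` and `threshold_metrics`, the 0.5 decision threshold, and the toy labels and scores are all assumptions made for illustration, and the study's cross-validation and confidence intervals are not reproduced here.

```python
from typing import Dict, Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via the rank (Mann-Whitney) formulation."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    # Count positive/negative pairs ranked correctly; ties count as half.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def threshold_metrics(
    labels: Sequence[int], scores: Sequence[float], threshold: float = 0.5
) -> Dict[str, float]:
    """Sensitivity, specificity, and accuracy at a fixed decision threshold.

    Assumes both classes are present; otherwise a division by zero occurs.
    """
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    tn = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 0)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(labels),
    }


# Toy data (hypothetical): 1 = event, 0 = no event; scores are model outputs.
labels = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.4, 0.7, 0.3, 0.2, 0.2, 0.1]

print(auc(labels, scores))
print(threshold_metrics(labels, scores))
```

AUC is threshold-free, whereas sensitivity, specificity, and accuracy depend on the chosen cut-off, which is one reason models with similar AUCs can still differ on the other three measures.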


2020 ◽  
Vol 26 (1) ◽  
pp. 157-172
Author(s):  
Dirk Krüger ◽  
Moritz Krell

Abstract. Machine learning methods can help analyze responses to open-ended tasks in large samples. Using statements by biology teachers, pre-service biology teachers, and biology education researchers on the five sub-competencies of modeling competence (N_training = 456; N_classification = 260) as an example, this study examines the quality of machine learning with four algorithms (naïve Bayes, logistic regression, support vector machines, and decision trees). Evidence for the validity of interpreting the codings of individual algorithms comes from satisfactory to good agreement between human and computer-based coding in training (345–607 statements depending on the sub-competency); in classification (157–260 statements depending on the sub-competency), this drops to moderate agreement. Positive correlations between the coded level and the external criterion of response length indicate that coding with naïve Bayes does not yield valid results. The important attributes that the algorithms use in classification correspond to relevant terms from the level definitions in the underlying coding manual. Finally, the paper discusses the extent to which machine learning with these algorithms reaches the quality of human coding for statements on modeling competence and could therefore be used for second coding or in teaching situations.


Author(s):  
Cemil Kuzey ◽  
Ali Uyar ◽  
Dursun Delen

Purpose: The paper aims to identify and critically analyze the factors influencing cost system functionality (CSF) using several machine learning techniques, including decision trees, support vector machines and logistic regression. Design/methodology/approach: The study used a self-administered survey to collect the necessary data from companies conducting business in Turkey. Several prediction models were developed and tested; a series of sensitivity analyses was performed on these models to assess the ranked importance of factors/variables. Findings: Certain factors/variables influence CSF much more than others. The findings suggest that utilization of management accounting practices requires a functional cost system, which is supported by a comprehensive cost data management process (i.e. acquisition, storage and utilization). Research limitations/implications: First, the underlying data were collected using a questionnaire survey; thus, they are subjective and reflect the perceptions of the respondents, whereas ideally they would reflect the objective practices of the firms. Second, the authors measured CSF on a “Yes” or “No” basis, which does not allow survey respondents to reply in between; this might have limited the respondents' choices. Third, the Likert scales adopted to measure the other constructs might have limited the respondents' answers. Practical implications: Information technology plays a very important role in the success of CSF practices. That is, successful implementation of a functional cost system relies heavily on a fully integrated information infrastructure capable of constantly feeding CSF with accurate, relevant and timely data. Originality/value: In addition to providing evidence regarding the factors underlying CSF based on a broad range of industries, this study also illustrates the viability of machine learning methods as a research framework to critically analyze domain-specific data.


Author(s):  
Nofriani ◽  
Novianto Budi Kurniawan

One way to report a country’s economic state is to compile economic phenomena from several sources. The collected data may be explored by sentiment and by economic category. This research performed and analyzed multiple approaches to multi-label text classification, in addition to providing sentiment analysis of the economic phenomena. The sentiment and single-label category classification was performed using a logistic regression model. The multi-label category classification was carried out using logistic regression, support vector machines, k-nearest neighbor, naïve Bayes, and decision trees as base classifiers, with binary relevance, classifier chain, and label power set as the implementation approaches. The results showed that logistic regression works well in sentiment and single-label classification, with classification accuracies of 80.08% and 92.71%, respectively. However, it works poorly as a base classifier in multi-label classification, with classification accuracy dropping to 13.35%, 15.40%, and 30.65% for binary relevance, classifier chain, and label power set, respectively. By contrast, naïve Bayes works best as a base classifier in the label power set approach for multi-label classification, with a classification accuracy of 63.22%, followed by decision trees and support vector machines.
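The three multi-label approaches named in this abstract differ mainly in how they reshape the label matrix before a base classifier is trained. The sketch below shows only that reshaping step, under stated assumptions: the function names, the single toy feature, and the three-label toy matrix are hypothetical and are not taken from the study.

```python
from typing import Dict, List, Sequence, Tuple


def binary_relevance_targets(Y: Sequence[Sequence[int]]) -> List[List[int]]:
    """Binary relevance: one independent binary target vector per label."""
    n_labels = len(Y[0])
    return [[row[j] for row in Y] for j in range(n_labels)]


def label_powerset_targets(
    Y: Sequence[Sequence[int]],
) -> Tuple[List[int], Dict[Tuple[int, ...], int]]:
    """Label power set: each distinct label combination becomes one class id."""
    class_ids: Dict[Tuple[int, ...], int] = {}
    encoded: List[int] = []
    for row in Y:
        key = tuple(row)
        if key not in class_ids:
            class_ids[key] = len(class_ids)
        encoded.append(class_ids[key])
    return encoded, class_ids


def classifier_chain_inputs(
    X: Sequence[Sequence[float]], Y: Sequence[Sequence[int]], j: int
) -> List[List[float]]:
    """Classifier chain: training inputs for the j-th classifier are the
    original features plus the true values of labels 0..j-1.
    (At prediction time the earlier classifiers' predictions are used instead.)
    """
    return [list(x) + list(y[:j]) for x, y in zip(X, Y)]


# Toy data (hypothetical): 4 documents, 1 feature, 3 category labels.
X = [[0.2], [0.5], [0.1], [0.9]]
Y = [[1, 0, 1], [0, 1, 0], [1, 0, 1], [1, 1, 0]]

print(binary_relevance_targets(Y))
print(label_powerset_targets(Y))
print(classifier_chain_inputs(X, Y, 2))
```

Binary relevance ignores dependencies between labels, the chain passes earlier labels forward as features, and label power set treats every observed combination as its own class, so the same base classifier can behave very differently under each scheme.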


2019 ◽  
Vol 16 (2) ◽  
pp. 217-230 ◽  
Author(s):  
Martine De Cock ◽  
Rafael Dowsley ◽  
Caleb Horst ◽  
Raj Katti ◽  
Anderson C. A. Nascimento ◽  
...  

Author(s):  
Michaela Staňková ◽  
David Hampel

This article focuses on the problem of binary classification of 902 small- and medium-sized engineering companies active in the EU, together with an additional 51 companies which went bankrupt in 2014. For classification purposes, the basic statistical method of logistic regression was selected, together with two representatives of machine learning (support vector machines and classification trees), to construct models for bankruptcy prediction. Different settings were tested for each method. Furthermore, the models were estimated both on the complete data and on identified artificial factors. To evaluate the quality of prediction, we consider not only the total accuracy and the type I and II errors but also the area under the ROC curve. The results clearly show that increasing distance to bankruptcy decreases the predictive ability of all models. The classification tree method leads to rather simple models. The best classification results were achieved through logistic regression based on artificial factors. Moreover, this procedure provides good and stable results regardless of other settings. Artificial factors also seem to be suitable variables for support vector machines models, but classification trees achieved better results using the original data.

