Support vector machines versus logistic regression: improving prospective performance in clinical decision-making

2006 ◽  
Vol 27 (6) ◽  
pp. 607-608 ◽  
Author(s):  
N. L. M. M. Pochet ◽  
J. A. K. Suykens
Author(s):  
Michaela Staňková ◽  
David Hampel

This article addresses the binary classification of 902 small and medium-sized engineering companies active in the EU, together with an additional 51 companies that went bankrupt in 2014. Logistic regression was selected as the basic statistical method, alongside two representatives of machine learning (support vector machines and classification trees), to construct models for bankruptcy prediction. Different settings were tested for each method. The models were estimated both on the complete data and on identified artificial factors. To evaluate prediction quality, we consider not only total accuracy and the type I and II errors but also the area under the ROC curve. The results clearly show that increasing distance to bankruptcy decreases the predictive ability of all models. The classification tree method leads to rather simple models. The best classification results were achieved by logistic regression based on artificial factors; this procedure also gives good and stable results regardless of other settings. Artificial factors also appear to be suitable inputs for support vector machine models, whereas classification trees achieved better results using the original data.
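The evaluation criteria named in this abstract (total accuracy, type I and II errors, and the area under the ROC curve) can be illustrated with a minimal Python sketch. The labels and scores are invented toy values, not data from the study, and the function names are hypothetical. The sketch assumes bankruptcy is the positive class and treats a type I error as a healthy company flagged as bankrupt; some bankruptcy studies use the opposite convention, and the abstract does not say which one applies.

```python
def error_rates(y_true, y_pred):
    """Return (accuracy, type_i, type_ii), with label 1 = bankrupt (positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    # Assumed convention: type I = false positive rate, type II = false negative rate.
    type_i = fp / (fp + tn) if (fp + tn) else 0.0
    type_ii = fn / (fn + tp) if (fn + tp) else 0.0
    return accuracy, type_i, type_ii


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random negative."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    # Ties between a positive and a negative score count as half a win.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example: four healthy companies (0) and two bankrupt ones (1).
y_true = [0, 0, 0, 0, 1, 1]
scores = [0.10, 0.40, 0.35, 0.80, 0.70, 0.90]    # hypothetical model outputs
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # hypothetical 0.5 threshold

accuracy, type_i, type_ii = error_rates(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(accuracy, type_i, type_ii, auc)
```

With a sample as imbalanced as 902 active versus 51 bankrupt companies, total accuracy alone can look high even when most bankruptcies are missed, which is one reason to report the two error types and the AUC alongside it.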


2019 ◽  
Vol 3 (2) ◽  
pp. 77
Author(s):  
Herlina Herlina ◽  
Ahmad Ridho’i ◽  
Anggie Erma Yunita ◽  
Mega Puja Azhari ◽  
Ade Reynaldi Saputra

Financial distress is a stage that a company passes through before going bankrupt. For this reason, the ability to predict financial distress can provide useful information for both companies and investors. Research on financial distress began with Altman's 1968 study, which used Multiple Discriminant Analysis (MDA). Following Altman, further studies applied more advanced statistical methods such as logistic regression. Later work moved from statistical methods to artificial intelligence methods and evolutionary algorithms in an effort to obtain accurate financial distress prediction models. The aim of this study is to compare the accuracy of financial distress prediction models for publicly listed manufacturing companies in the consumer goods industry sector of the Indonesia Stock Exchange, using an artificial intelligence method and an evolutionary algorithm. The artificial intelligence method is Support Vector Machines, and the evolutionary algorithm model is Particle Swarm Optimization-Support Vector Machines. The accuracy of each method is measured by the lowest misclassification percentage it produces. In model testing, Support Vector Machines achieved a lowest misclassification rate of 11.11% with a linear kernel, and Particle Swarm Optimization-Support Vector Machines achieved a lowest misclassification rate of 5.56% with an RBF kernel, ? = 2.
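The abstract does not describe how particle swarm optimization was coupled to the SVM, so the following Python sketch only shows the general idea of a particle swarm searching over two hyperparameters. Everything in it is an assumption made for illustration: the search variables (log C and log gamma), the swarm settings, and above all the objective `surrogate_error`, a made-up smooth function that stands in for the validation misclassification rate of a trained SVM.

```python
import random


def pso_minimize(objective, bounds, n_particles=20, n_iters=100,
                 inertia=0.7, c1=1.5, c2=1.5, seed=0):
    """Basic global-best particle swarm optimization over a box-bounded space."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]              # personal best positions
    best_val = [objective(p) for p in pos]      # personal best values
    g = min(range(n_particles), key=lambda i: best_val[i])
    g_pos, g_val = best_pos[g][:], best_val[g]  # global best so far
    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity blends inertia, pull to personal best, pull to global best.
                vel[i][d] = (inertia * vel[i][d]
                             + c1 * r1 * (best_pos[i][d] - pos[i][d])
                             + c2 * r2 * (g_pos[d] - pos[i][d]))
                lo, hi = bounds[d]
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            if val < best_val[i]:
                best_val[i], best_pos[i] = val, pos[i][:]
                if val < g_val:
                    g_val, g_pos = val, pos[i][:]
    return g_pos, g_val


def surrogate_error(params):
    """Hypothetical stand-in for SVM validation error; minimum at (1, -2)."""
    log_c, log_gamma = params
    return 0.05 + 0.01 * ((log_c - 1.0) ** 2 + (log_gamma + 2.0) ** 2)


best_params, best_err = pso_minimize(surrogate_error, [(-5.0, 5.0), (-5.0, 5.0)])
print(best_params, best_err)
```

In a real pipeline the surrogate would be replaced by a function that trains an SVM with the candidate hyperparameters and returns its misclassification rate on held-out data.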


Author(s):  
Mojtaba Montazery ◽  
Nic Wilson

Support Vector Machines (SVM) are among the most well-known machine learning methods, with broad use in different scientific areas. However, one necessary pre-processing phase for SVM is normalization (scaling) of features, since SVM is not invariant to the scales of the features’ spaces, i.e., different ways of scaling may lead to different results. We define a more robust decision-making approach for binary classification, in which one sample strongly belongs to a class if it belongs to that class for all possible rescalings of features. We derive a way of characterising the approach for binary SVM that allows determining when an instance strongly belongs to a class and when the classification is invariant to rescaling. The characterisation leads to a computation method to determine whether one sample is strongly positive, strongly negative or neither. Our experimental results back up the intuition that being strongly positive suggests stronger confidence that an instance really is positive.
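As a toy illustration of why scaling matters, consider a linear hard-margin SVM trained on just one positive and one negative point. In that special case the decision boundary is the perpendicular bisector of the segment joining the two points, so the effect of rescaling can be written down directly. The Python sketch below covers only this two-point case and is not the authors' characterisation or computation method; the function names and example points are hypothetical.

```python
def two_point_decision(pos, neg, x, scales):
    """Value with the same sign as the hard-margin linear SVM decision for x,
    after multiplying feature i of every point by scales[i] (> 0).

    With one training point per class the separator is the perpendicular
    bisector, so the sign equals that of sum_i s_i^2 (x_i - m_i)(p_i - n_i),
    where m is the midpoint of pos and neg.
    """
    total = 0.0
    for p, n, xi, s in zip(pos, neg, x, scales):
        mid = (p + n) / 2.0
        total += (s * s) * (xi - mid) * (p - n)
    return total


def strong_label(pos, neg, x):
    """Classify x as strongly positive/negative if the sign holds for all scalings."""
    terms = [(xi - (p + n) / 2.0) * (p - n) for p, n, xi in zip(pos, neg, x)]
    # A positively weighted sum is positive for every choice of weights only if
    # no term is negative and at least one is positive (and symmetrically).
    if all(t >= 0 for t in terms) and any(t > 0 for t in terms):
        return "strongly positive"
    if all(t <= 0 for t in terms) and any(t < 0 for t in terms):
        return "strongly negative"
    return "neither"


pos, neg = (2.0, 0.0), (0.0, 2.0)   # hypothetical training points
x = (3.0, 2.0)                       # hypothetical test point

print(two_point_decision(pos, neg, x, (1.0, 1.0)))  # positive side
print(two_point_decision(pos, neg, x, (1.0, 2.0)))  # flips to negative side
print(strong_label(pos, neg, x))
print(strong_label(pos, neg, (3.0, 0.0)))
```

Here the point (3, 2) changes class when the second feature is doubled, so it is neither strongly positive nor strongly negative, while (3, 0) stays positive under every rescaling.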


2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Hao Sen Andrew Fang ◽  
Ngiap Chuan Tan ◽  
Wei Ying Tan ◽  
Ronald Wihal Oei ◽  
Mong Li Lee ◽  
...  

Abstract
Background Clinical risk prediction models (CRPMs) use patient characteristics to estimate the probability of having or developing a particular disease and/or outcome. While CRPMs are gaining in popularity, they have yet to be widely adopted in clinical practice. The lack of explainability and interpretability has limited their utility. Explainability is the extent to which a model's prediction process can be described. Interpretability is the degree to which a user can understand the predictions made by a model.
Methods The study aimed to demonstrate the utility of patient similarity analytics in developing an explainable and interpretable CRPM. Data were extracted from the electronic medical records of patients with type-2 diabetes mellitus, hypertension and dyslipidaemia in a Singapore public primary care clinic. We used a modified K-nearest neighbour method, which incorporated expert input, to develop a patient similarity model on this real-world training dataset (n = 7,041) and validated it on a testing dataset (n = 3,018). The results were compared with those of logistic regression, random forest (RF) and support vector machine (SVM) models built from the same dataset. The patient similarity model was then implemented in a prototype system to demonstrate the identification, explainability and interpretability of similar patients and the prediction process.
Results The patient similarity model (AUROC = 0.718) was comparable to the logistic regression (AUROC = 0.695), RF (AUROC = 0.764) and SVM models (AUROC = 0.766). We packaged the patient similarity model in a prototype web application. A proof of concept demonstrated how the application provided both quantitative and qualitative information, in the form of patient narratives. This information was used to better inform and influence clinical decision-making, such as getting a patient to agree to start insulin therapy.
Conclusions Patient similarity analytics is a feasible approach to developing an explainable and interpretable CRPM. While the approach is generalizable, it can be used to develop locally relevant information, based on the database it searches. Ultimately, such an approach can generate more informative CRPMs, which can be deployed as part of clinical decision support tools to better facilitate shared decision-making in clinical practice.
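The abstract does not specify how the K-nearest neighbour method was modified, so the following Python sketch shows only a generic form of similarity-based risk estimation. It uses a weighted Euclidean distance, with the weights standing in (as an assumption) for expert input, and takes the predicted risk to be the share of the k most similar patients who had the outcome. All records, feature choices and weights are invented for illustration.

```python
import math


def weighted_distance(a, b, weights):
    """Euclidean distance with a per-feature weight on each squared difference."""
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))


def similar_patients(query, patients, weights, k):
    """Return the k records closest to the query under the weighted distance."""
    ranked = sorted(
        patients,
        key=lambda rec: weighted_distance(query, rec["features"], weights),
    )
    return ranked[:k]


def predicted_risk(query, patients, weights, k):
    """Estimate risk as the fraction of the k nearest records with outcome 1."""
    neighbours = similar_patients(query, patients, weights, k)
    return sum(rec["outcome"] for rec in neighbours) / len(neighbours)


# Hypothetical records: features are (HbA1c in %, systolic blood pressure in mmHg).
patients = [
    {"id": "A", "features": (7.0, 130.0), "outcome": 0},
    {"id": "B", "features": (9.5, 150.0), "outcome": 1},
    {"id": "C", "features": (9.0, 145.0), "outcome": 1},
    {"id": "D", "features": (6.5, 120.0), "outcome": 0},
    {"id": "E", "features": (8.8, 160.0), "outcome": 0},
]
weights = (1.0, 0.01)   # assumed weights; puts the two features on comparable scales
query = (9.2, 148.0)

neighbour_ids = {rec["id"] for rec in similar_patients(query, patients, weights, k=3)}
risk = predicted_risk(query, patients, weights, k=3)
print(sorted(neighbour_ids), risk)
```

Because the prediction is derived from identifiable neighbouring records, the same lookup that produces the risk estimate also yields the similar patients that can be shown to a clinician, which is the kind of explainability the abstract emphasises.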

