DISCRIMINANT ANALYSIS AND CLASSIFICATION TREE

2018 ◽  
Author(s):  
Παντελής Σταυρούλιας

Reliable predictions of financial crises have always safeguarded the stability of the financial system as a whole and of the banking sector in particular. This thesis predicts systemic banking crises for EU-14 countries several quarters before they become apparent, using the most widely used variables (macroeconomic, banking, and market) through two approaches, binary and multilevel. Under the binary approach, classification models are derived by applying Discriminant Analysis, Linear Regression, Logistic Regression, and Probit Regression for the early prediction of crises 12 to 7 quarters before their onset. In addition, the performance of this analysis is compared with that of the newer and more promising methods of the Classification Tree, Random Forest, and C5. A new measure for threshold selection and goodness of fit (GoF) of the prediction models is also proposed, together with a new combined classification method. To assess the performance of the analysis, out-of-sample testing is used with country-blocked cross validation. Under this method, the analysis is carried out and the prediction models are derived using thirteen of the fourteen countries in the sample (in-sample), the resulting models are applied to the fourteenth country, which was excluded from the initial sample (out-of-sample), and the predictions are checked against that country's actual data. This procedure is repeated fourteen times, leaving a different country out of the sample each time, and the results of the repetitions are averaged.
In this thesis, under out-of-sample testing, the models achieve 82.4% correct classification (Accuracy), a 78.4% True Positive Rate (TPR), and an 80.6% Positive Predictive Value (PPV). Under the multilevel approach, two levels (periods) of prediction of systemic banking crises are distinguished. The first level is called early warning and covers the period 12 to 7 quarters before the onset of the crisis, while the second level is called late warning and covers the period 6 to 1 quarters before the onset of the crisis. For this multilevel classification, Neural Networks, Multinomial Logistic Regression, and Multinomial Discriminant Analysis are used. Applying the same out-of-sample testing as in the first approach, 85.7% correct classification is achieved with the best method, which proves to be Multinomial Discriminant Analysis. By applying this analysis, interested policy makers can detect a crisis up to three years ahead with the proposed models, using only publicly available data, and can thereby pursue the macroprudential policy appropriate to each case.
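The country-blocked cross validation described in this abstract can be sketched as follows. This is a minimal illustration only, not the thesis's implementation: the function names, the toy threshold classifier, and the data are all hypothetical, and only the fold structure (hold out one whole country, train on the rest, average the out-of-sample scores) is taken from the abstract.

```python
from collections import defaultdict
from statistics import mean

def leave_one_country_out(countries):
    """Yield (held_out_country, train_indices, test_indices), one fold per country."""
    by_country = defaultdict(list)
    for i, country in enumerate(countries):
        by_country[country].append(i)
    for held_out in sorted(by_country):
        test_idx = by_country[held_out]
        train_idx = [i for i, c in enumerate(countries) if c != held_out]
        yield held_out, train_idx, test_idx

def blocked_cv_accuracy(countries, X, y, fit, predict):
    """Average the out-of-sample accuracy over the per-country folds."""
    fold_scores = []
    for _, train_idx, test_idx in leave_one_country_out(countries):
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        hits = sum(predict(model, X[i]) == y[i] for i in test_idx)
        fold_scores.append(hits / len(test_idx))
    return mean(fold_scores)

# Hypothetical stand-in model: flag an observation as pre-crisis (1) when a
# single indicator exceeds the midpoint between the two class means.
def fit_threshold(X, y):
    pos = [x for x, label in zip(X, y) if label == 1]
    neg = [x for x, label in zip(X, y) if label == 0]
    return (mean(pos) + mean(neg)) / 2

def predict_threshold(threshold, x):
    return 1 if x > threshold else 0

# Made-up toy data: three countries, two quarterly observations each.
countries = ["AT", "AT", "BE", "BE", "DE", "DE"]
X = [0.1, 0.9, 0.2, 0.8, 0.3, 0.7]
y = [0, 1, 0, 1, 0, 1]
print(blocked_cv_accuracy(countries, X, y, fit_threshold, predict_threshold))
```

Blocking by country keeps every observation from the held-out country out of training, so the score reflects how a model transfers to a country it has never seen.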


Worldwide, breast cancer is the leading type of cancer in women, accounting for 25% of all cases. Survival rates in developed countries are higher than those in developing countries. This has increased the importance of computer-aided diagnostic methods for the early detection of breast cancer, which in turn reduces the death rate. This paper examines the scope of biomarkers that can be used to predict breast cancer from anthropometric data. This experimental study computes and compares various classification models (Binary Logistic Regression, Ball Vector Machine (BVM), C4.5, Partial Least Squares (PLS) for Classification, Classification Tree, Cost-sensitive Classification Tree, Cost-sensitive Decision Tree, Support Vector Machine for Classification, Core Vector Machine, ID3, K-Nearest Neighbor, Linear Discriminant Analysis (LDA), Log-Reg TRIRLS, Multi Layer Perceptron (MLP), Multinomial Logistic Regression (MLR), Naïve Bayes (NB), PLS for Discriminant Analysis, PLS for LDA, Random Tree (RT), Support Vector Machine (SVM)) on the UCI Coimbra breast cancer dataset. Feature selection algorithms (Backward Logit, Fisher Filtering, Forward Logit, ReliefF, Step disc) are applied to find the minimum set of attributes that achieves better accuracy. To ascertain the accuracy results, jackknife cross validation is conducted for the algorithms. The Core Vector Machine classification algorithm outperforms the other nineteen algorithms, with an accuracy of 82.76%, sensitivity of 76.92% and specificity of 87.50% for the three selected attributes, Age, Glucose and Resistin, using the ReliefF feature selection algorithm.
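The three reported figures are standard confusion-matrix ratios. The sketch below shows how they are computed; the counts are an assumption, chosen as the smallest integers that reproduce the reported percentages (any common multiple of them gives the same values). The abstract itself does not state the confusion matrix.

```python
def confusion_metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity, and specificity from confusion-matrix counts."""
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,     # share of all cases classified correctly
        "sensitivity": tp / (tp + fn),     # true positive rate
        "specificity": tn / (tn + fp),     # true negative rate
    }

# Hypothetical counts, not taken from the paper.
metrics = confusion_metrics(tp=10, fn=3, tn=14, fp=2)
print({name: round(100 * value, 2) for name, value in metrics.items()})
# {'accuracy': 82.76, 'sensitivity': 76.92, 'specificity': 87.5}
```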


Neurology ◽  
2020 ◽  
Vol 95 (9) ◽  
pp. e1163-e1173
Author(s):  
Aleksandra Mineyko ◽  
Alberto Nettel-Aguirre ◽  
Pauline de Jesus ◽  
Susanne Benseler ◽  
Kamran Yusuf ◽  
...  

Objective: To examine the relationship between neonatal inflammatory cytokines and perinatal stroke using a systems biology approach analyzing serum and blood-spot cytokines from 47 patients.
Methods: This was a population-based, controlled cohort study with prospective and retrospective case ascertainment. Participants were recruited through the Alberta Perinatal Stroke Project. Stroke was classified as neonatal arterial ischemic stroke (NAIS), arterial presumed perinatal ischemic stroke (APPIS), or periventricular venous infarction (PVI). Biosamples were stored blood spots (retrospective) and acute serum (prospective). Controls had comparable gestational and maternal ages. Sixty-five cytokines were measured (Luminex). Hierarchical clustering analysis was performed to create heat maps. The Fisher linear discriminant analysis was used to create projection models to determine discriminatory boundaries between stroke types and controls.
Results: A total of 197 participants were analyzed (27 with NAIS, 8 with APPIS, 12 with PVI, 150 controls). Cytokines were quantifiable with quality control measures satisfied (standards testing, decay analysis). Linear discriminant analysis had high accuracy in using cytokine profiles to separate groups. Profiles in participants with PVI and controls were similar. NAIS separation was accurate (sensitivity 77%, specificity 97%). APPIS mapping was also distinguishable from NAIS (sensitivity 86%, specificity 99%). Classification tree analysis generated similar diagnostic accuracy.
Conclusions: Unique inflammatory biomarker signatures are associated with specific perinatal stroke diseases. Findings support an acquired pathophysiology and suggest the possibility that at-risk pregnancies might be identified to develop prevention strategies.
Classification of evidence: This study provides Class III evidence that differences in acute neonatal serum cytokine profiles can discriminate between patients with specific perinatal stroke diseases and controls.
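For readers unfamiliar with the Fisher linear discriminant mentioned in this abstract, here is a minimal two-class, two-feature sketch. It is not the study's pipeline (which used 65 cytokines and several groups); the data points and all names are made up for illustration. The projection direction is w = S_w^{-1}(m_1 - m_0), where S_w is the pooled within-class scatter matrix, and the decision threshold is placed at the projected midpoint of the class means.

```python
def mean_vec(rows):
    """Mean of a list of 2-D points."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(2)]

def scatter(rows, m):
    """2x2 within-class scatter: sum of outer products of deviations from m."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for r in rows:
        d = [r[0] - m[0], r[1] - m[1]]
        for a in range(2):
            for b in range(2):
                s[a][b] += d[a] * d[b]
    return s

def fisher_direction(class0, class1):
    """Return (w, threshold) with w = Sw^-1 (m1 - m0)."""
    m0, m1 = mean_vec(class0), mean_vec(class1)
    s0, s1 = scatter(class0, m0), scatter(class1, m1)
    sw = [[s0[a][b] + s1[a][b] for b in range(2)] for a in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]  # assumed non-zero
    inv = [[sw[1][1] / det, -sw[0][1] / det],
           [-sw[1][0] / det, sw[0][0] / det]]
    diff = [m1[0] - m0[0], m1[1] - m0[1]]
    w = [inv[0][0] * diff[0] + inv[0][1] * diff[1],
         inv[1][0] * diff[0] + inv[1][1] * diff[1]]
    midpoint = [(m0[j] + m1[j]) / 2 for j in range(2)]
    threshold = w[0] * midpoint[0] + w[1] * midpoint[1]
    return w, threshold

def classify(x, w, threshold):
    """Assign class 1 when the projection of x onto w exceeds the threshold."""
    return 1 if w[0] * x[0] + w[1] * x[1] > threshold else 0

# Hypothetical measurements of two markers for each group.
controls = [(1.0, 2.0), (2.0, 1.0), (1.5, 1.5), (2.0, 2.5)]
cases = [(4.0, 5.0), (5.0, 4.0), (4.5, 5.5), (5.5, 4.5)]
w, threshold = fisher_direction(controls, cases)
print([classify(x, w, threshold) for x in controls + cases])
```

The midpoint threshold assumes equal priors and similar spread in both groups; a real analysis would choose the cut-off from the data and validate it on held-out samples.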


1987 ◽  
Vol 17 (9) ◽  
pp. 1150-1152 ◽  
Author(s):  
David L. Verbyla

Classification trees are discriminant models structured as dichotomous keys. A simple classification tree is presented and contrasted with a linear discriminant function. Classification trees have several advantages when compared with linear discriminant analysis. The method is robust with respect to outlier cases. It is nonparametric and can use nominal, ordinal, interval, and ratio scaled predictor variables. Cross-validation is used during tree development to prevent overfitting the tree with too many predictor variables. Missing values are handled by using surrogate splits based on nonmissing predictor variables. Classification trees, like linear discriminant analysis, have potential prediction bias and therefore should be validated before being accepted.
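Each node of such a dichotomous key is a single binary split on one predictor. The sketch below finds the best split for one numeric variable, assuming Gini impurity as the splitting criterion (a common choice; the abstract does not say which criterion is used). The data are invented, and the extreme value 500 is included to illustrate why the method is robust to outliers: only the ordering of the values matters, not their magnitude.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split(values, labels):
    """Return (threshold, weighted_gini) for the best split 'value <= threshold'."""
    best = (None, float("inf"))
    distinct = sorted(set(values))
    # Candidate thresholds are midpoints between consecutive distinct values.
    for lo, hi in zip(distinct, distinct[1:]):
        threshold = (lo + hi) / 2
        left = [l for v, l in zip(values, labels) if v <= threshold]
        right = [l for v, l in zip(values, labels) if v > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[1]:
            best = (threshold, score)
    return best

# Hypothetical predictor with one extreme outlier in class "b".
values = [1, 2, 3, 10, 11, 12, 500]
labels = ["a", "a", "a", "b", "b", "b", "b"]
print(best_split(values, labels))  # (6.5, 0.0)
```

A full tree applies this search recursively to each resulting subset and across all predictors, with cross-validation deciding how far to grow it.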


Phytotaxa ◽  
2018 ◽  
Vol 333 (1) ◽  
pp. 41 ◽  
Author(s):  
JOAQUÍN MORENO ◽  
ALEJANDRO TERRONES ◽  
MARÍA ÁNGELES ALONSO ◽  
ANA JUAN ◽  
MANUEL B. CRESPO

Limonium latebracteatum is a plant species from the central and northeastern Iberian Peninsula, characterised by an inner bract wider than long, glaucous leaves, and wide petioles, which belongs to the Limonium delicatulum group. The L. delicatulum group is a complex group of around fifteen Iberian and Balearic species, including endemics with narrow distributions, and is highly diversified in the Mediterranean territories of the Iberian Peninsula. The species in the group are similar to each other and are occasionally not well delimited morphologically. In this framework, a taxonomic revision of L. latebracteatum and closely related species in the Iberian Peninsula has been carried out to clarify the taxonomy of L. latebracteatum and L. carpetanicum. The revision is based on morphological features and supported by ordination analyses such as Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA). In addition, a classification tree was built to support these analyses. As a result of this study, L. latebracteatum has been separated into two different species: L. latebracteatum and L. admirabile, a new species endemic to Albacete province, which is described here. Finally, a diagnostic key is provided for the L. delicatulum group to allow identification of the species in this group.


1999 ◽  
Vol 56 (4) ◽  
pp. 661-669 ◽  
Author(s):  
Edward E Emmons ◽  
Martin J Jennings ◽  
Clayton Edwards

Wisconsin has nearly 15 000 lakes with great variation in limnology, morphometry, and origin. Classification of lakes into groups is a continuing goal. This study examines two alternative approaches to lake classification, one common and the other somewhat novel. Both approaches used lake morphometry and limnological variables and were compared for ability to form groups and assign lakes to groups with a high probability of correct classification. The first approach used nonhierarchical cluster analysis to form lake groups and discriminant analysis to put lakes into these groups. The second approach formed lake groups by iterative dichotomous splitting of the sampling space into smaller and smaller subspaces. Each binary split was done using nonhierarchical cluster analyses on a subset of the original variables. This iterative splitting resulted in a hierarchical classification tree with reduced dimensionality in comparison with the original data set. At each branch, multiple logistic regression was used to place lakes into nodes of the tree. Validation of both approaches was performed with a resubstitution analysis of the model building data set as well as a separate validation data set. The decision tree method yielded significantly lower rates of misclassification and was more easily interpreted than the discriminant analysis approach.


Author(s):  
Vadim V. Belenky ◽  
O. V. Leontiev ◽  
O. A. Klicenko ◽  
V. Ya. Gelman ◽  
E. M. Koroleva ◽  
...  

Dystonia is a debilitating movement disorder of the central nervous system, often inherited, that appears as involuntary movements caused by a deficiency or excess of neurotransmitters. The penetrance of dystonia is 30%, which means that inherited dystonia manifests in only 30% of carriers of the mutated gene, while the rest have latent forms, the so-called formes frustes of the disorder. So far only a few mutations responsible for dystonia have been identified, but up to 100 such mutations are expected to exist. Until all mutations responsible for dystonia are uncovered, a reliable test for diagnosing latent forms of dystonia is required, which explains the importance of the present study. The purpose of this research was to develop a way to discriminate dystonia on the basis of the characteristics of biogenic amine metabolism. This was an observational case-control study. The control group was randomly composed of patients who were examined for neuroglial tumors. We measured catecholamine and serotonin metabolites in the plasma and urine of 12 dystonia patients (the main group) by chromatography and compared the results from the two groups using the decision tree method, discriminant analysis, and factor analysis. We found increased serotonin turnover in dystonia and, on the basis of the increased plasma metabolites 5-hydroxytryptophan and 5-hydroxyindoleacetic acid, developed a sensitive and specific test for the diagnosis of dystonia using advanced statistical methods. We recommend introducing into clinical practice diagnostic tests for dystonia based on plasma levels of 5-hydroxytryptophan and 5-hydroxyindoleacetic acid, analyzed by discriminant analysis and the classification tree method, because of the high sensitivity and specificity of those methods.

