Multi-Step Classification Trees

High software reliability is an important attribute of high-assurance systems. Software quality models yield timely predictions of quality indicators on a module-by-module basis, enabling one to focus on finding faults early in development. This paper introduces the Classification And Regression Trees (CART) algorithm to practitioners in high-assurance systems engineering. It presents practical lessons learned on building classification trees for software quality modeling, including an innovative way to control the balance between misclassification rates. A case study of a very large telecommunications system used CART to build software quality models. The models predicted whether or not modules would have faults discovered by customers, based on various sets of software product and process metrics as independent variables. We found that a model based on two software product metrics had comparable accuracy to a model based on forty product and process metrics.
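
The abstract does not say how the balance between misclassification rates is controlled, so the following is only a minimal sketch of one common approach: weighting the two error types differently when choosing a split. The function `best_split`, the cost parameters `cost_fn` and `cost_fp`, and the toy data are all hypothetical and are not taken from the paper or from CART itself.

```python
def best_split(values, labels, cost_fn=1.0, cost_fp=1.0):
    """Choose a threshold on a single metric by minimising weighted error cost.

    Rule assumed here: a module is predicted fault-prone (1) when its
    metric value is strictly greater than the threshold.
    Returns (threshold, false_positives, false_negatives).
    """
    best = None
    for t in sorted(set(values)):
        # Not fault-prone modules wrongly flagged as fault-prone.
        fp = sum(1 for v, y in zip(values, labels) if v > t and y == 0)
        # Fault-prone modules that the rule misses.
        fn = sum(1 for v, y in zip(values, labels) if v <= t and y == 1)
        cost = cost_fp * fp + cost_fn * fn
        # Strict comparison keeps the lowest threshold among ties.
        if best is None or cost < best[0]:
            best = (cost, t, fp, fn)
    return best[1], best[2], best[3]


# Hypothetical data: one product metric per module, 1 = fault-prone.
metric = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
faulty = [0, 0, 0, 1, 0, 0, 0, 1, 0, 1]

equal_costs = best_split(metric, faulty)
costly_misses = best_split(metric, faulty, cost_fn=5.0)
print(equal_costs)    # (threshold, false positives, false negatives)
print(costly_misses)
```

In this toy setting, raising `cost_fn` pushes the threshold down, so fewer fault-prone modules are missed at the price of more false alarms. A full tree would apply the same trade-off at every node.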

Classification Trees

Generalized Linear Models ◽

10.1201/9781482293456-31 ◽

2000 ◽

pp. 383-390

Keyword(s):

Classification Trees

The Application of Classification Trees to Pharmacy School Admissions

American Journal of Pharmaceutical Education ◽

10.5688/ajpe6980 ◽

2018 ◽

Vol 82 (7) ◽

pp. 6980

Author(s):

Samuel C. Karpen ◽

Steve C. Ellis

Keyword(s):

Classification Trees

Mood Disorder Detection in Adolescents by Classification Trees, Random Forests and XGBoost in Presence of Missing Data

Entropy ◽

10.3390/e23091210 ◽

2021 ◽

Vol 23 (9) ◽

pp. 1210

Author(s):

Elzbieta Turska ◽

Szymon Jurga ◽

Jaroslaw Piskorski

Keyword(s):

Missing Data ◽

Secondary School ◽

Mood Disorder ◽

Random Forests ◽

Economic Status ◽

Related Result ◽

Classification Trees ◽

Lower Secondary School ◽

School Students ◽

Surrogate Variables

We apply tree-based classification algorithms, namely classification trees (built with the rpart algorithm), random forests and XGBoost, to detect mood disorder in a group of 2508 lower secondary school students. The dataset presents many challenges, the most important of which are the large amount of missing data and heavy class imbalance (there are few severe mood disorder cases). We find that all algorithms are specific, but only the rpart algorithm is sensitive; i.e., it is able to detect real cases of mood disorder. The conclusion of this paper is that this is caused by the fact that the rpart algorithm uses surrogate variables to handle missing data. The most important social-studies-related result is that the adolescents’ relationships with their parents are the single most important factor in developing mood disorders, far more important than other factors such as socio-economic status or school success.
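
To make the distinction between "specific" and "sensitive" concrete, here is a minimal sketch of how the two rates are computed and why a heavily unbalanced dataset can yield a classifier that is specific without being sensitive. The helper `sensitivity_specificity` and the 5-in-100 toy labels are hypothetical illustrations, not data or code from the study.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels, 1 = positive case."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    # Sensitivity: share of real cases that are detected.
    sens = tp / (tp + fn) if (tp + fn) else float("nan")
    # Specificity: share of non-cases that are correctly cleared.
    spec = tn / (tn + fp) if (tn + fp) else float("nan")
    return sens, spec


# Hypothetical unbalanced sample: 5 positive cases among 100 students.
y_true = [1] * 5 + [0] * 95

# A degenerate classifier that never predicts the rare class.
always_negative = [0] * 100

sens, spec = sensitivity_specificity(y_true, always_negative)
print(sens, spec)
```

The degenerate classifier clears every non-case, so its specificity is perfect, yet it detects no real cases at all. This is why sensitivity has to be reported separately when the positive class is rare.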
