classification tree model
Recently Published Documents


Total documents: 45 (five years: 18)
H-index: 10 (five years: 2)

Author(s):  
V. Dudnyk ◽  
O. Grishchyn ◽  
V. Netrebko ◽  
R. Prus ◽  
M. Voloshcuk

This paper proposes an effective mechanism for synthesizing classification trees from fixed initial information (a training sample) for the task of recognizing the technical condition of samples of weapons and military equipment. The constructed algorithmic classification tree (model) classifies (recognizes) without error the entire training sample (situational objects) on which the classification scheme is built. It has a minimal structure (structural complexity) and consists of components (modules): autonomous classification and recognition algorithms that serve as vertices of the structure (attributes of the tree). The developed method for building models of algorithm trees (classification schemes) can work with large training samples of heterogeneous, discrete-type information. It provides high accuracy, speed and economical use of hardware resources when generating the final classification scheme, and builds classification trees (models) with a predetermined accuracy. An approach is offered for synthesizing new recognition (classification) algorithms from a library (set) of already known algorithms (schemes) and methods. Based on the proposed concept of algorithmic classification trees, a set of models was built that provided effective classification and prediction of the technical condition of samples. The paper also proposes a set of general indicators (parameters) that effectively summarize the characteristics of a classification tree model and can be used to select the best tree of algorithms from a set generated by random classification tree methods. Practical tests confirmed the efficiency of the mathematical software and the algorithm tree models.


2021 ◽  
Vol 16 (3) ◽  
pp. 21-25
Author(s):  
Paolo Giudici ◽  
Giulia Marini

The detection of money laundering is a very important problem, especially in the financial sector. We propose a mathematical specification of the problem in terms of a classification tree model that “automates” expert-based manual decisions. We operationally validate the model on a concrete application that originates from a large Italian bank. Applying the model to the data shows good predictive accuracy and, even more importantly, a reduction in false positives relative to the “manual” expert-based activity. From an interpretational viewpoint, some drivers of suspicious laundering activity are in line with the daily business practices of the bank’s anti-money-laundering operations, while others are new discoveries.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Jia-Cheng Shi ◽  
Xiao-Huan Chen ◽  
Qiong Yang ◽  
Cai-Mei Wang ◽  
Qian Huang ◽  
...  

Abstract Currently, the most widely used screening methods for hyperuricemia (HUA) involve invasive laboratory tests, which are lacking in many rural hospitals in China. This study explored the use of non-invasive physical examinations to construct a simple prediction model for HUA, in order to reduce the economic burden and invasive operations such as blood sampling, and to support the health management of people in poor areas with limited medical resources. Data of 9252 adults from April to June 2017 in the Affiliated Hospital of Guilin Medical College were collected and divided randomly into a training set (n = 6364) and a validation set (n = 2888) at a ratio of 7:3. In the training set, the non-invasive physical examination indicators of age, gender, body mass index (BMI) and prevalence of hypertension were included in a logistic regression analysis, and a nomogram model was established. The classification and regression tree (CART) algorithm was used to build a classification tree model. Receiver operating characteristic (ROC) curve, calibration curve and decision curve analyses (DCA) were used to test the discrimination, accuracy and clinical applicability of the two models. The results showed that age, gender, BMI and prevalence of hypertension were all related to the occurrence of HUA. The area under the ROC curve (AUC) of the nomogram model was 0.806 in the training set and 0.791 in the validation set. The AUC of the classification tree model was 0.802 and 0.794 in the two sets, respectively; the AUCs of the two models were not statistically different. The calibration curves and DCAs of the two models performed well on accuracy and clinical practicality, which suggests these models may be suitable for predicting HUA in rural settings.
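The AUC values reported in the abstract above summarize how well each model ranks HUA cases above non-cases. A minimal Python sketch of that statistic follows, computed as the fraction of (positive, negative) pairs that are ordered correctly, with ties counted as half. The function name `roc_auc` and the toy labels and scores are hypothetical illustrations, not the study's code or data.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0      # positive ranked above negative
            elif p == n:
                wins += 0.5      # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data: 1 = case, 0 = non-case; scores are model outputs.
toy_labels = [1, 0, 1, 0, 1, 0]
toy_scores = [0.9, 0.2, 0.6, 0.6, 0.4, 0.1]
toy_auc = roc_auc(toy_labels, toy_scores)
print(f"toy AUC = {toy_auc:.3f}")
```

The pairwise form is quadratic in the sample size, which is acceptable for illustration; a rank-based computation would be the usual choice for data sets of the size described above.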


2021 ◽  
Vol 20 (2) ◽  
pp. 147-163
Author(s):  
M. Mandorino ◽  
A.J. Figueiredo ◽  
G. Cima ◽  
A. Tessitore

Abstract Predicting and avoiding an injury is a challenging task. By exploiting data mining techniques, this paper aims to identify existing relationships between modifiable and non-modifiable risk factors, with the final goal of predicting non-contact injuries. Twenty-three young soccer players were monitored during an entire season, with a total of fifty-seven non-contact injuries identified. Anthropometric data were collected, and the maturity offset was calculated for each player. To quantify the internal training/match load and recovery status of the players, we employed the session-RPE method and the total quality recovery (TQR) scale daily. Cumulative workloads and the acute:chronic workload ratio (ACWR) were calculated. To explore the relationship between the various risk factors and the onset of non-contact injuries, we performed a classification tree analysis. The classification tree model exhibited acceptable discrimination (AUC = 0.76) in receiver operating characteristic (ROC) curve analysis. A low state of recovery, a rapid increase in the training load, cumulative workload, and maturity offset were recognized by the data mining algorithm as the most important injury risk factors.
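The workload quantities named in the abstract above can be illustrated with a short Python sketch. It assumes the common session-RPE definition (perceived exertion multiplied by session duration in minutes) and a rolling-average ACWR with a 7-day acute window and a 28-day chronic window. The abstract does not state which windows or ACWR variant the authors used, so the window lengths, function names and toy loads here are assumptions.

```python
def session_load(rpe, minutes):
    """Session-RPE load in arbitrary units: perceived exertion x duration."""
    return rpe * minutes


def acwr(daily_loads, acute_days=7, chronic_days=28):
    """Acute:chronic workload ratio from a list of daily loads (oldest first).

    Assumed variant: acute load is the sum over the last `acute_days`;
    chronic load is the average load per acute-length window over the
    last `chronic_days`.
    """
    if len(daily_loads) < chronic_days:
        raise ValueError("not enough history for the chronic window")
    acute = sum(daily_loads[-acute_days:])
    chronic = sum(daily_loads[-chronic_days:]) * acute_days / chronic_days
    if chronic == 0:
        raise ValueError("chronic load is zero; ratio undefined")
    return acute / chronic


# Hypothetical player: three steady weeks, then one week of harder sessions.
steady = [session_load(5, 60)] * 21   # 300 units per day
spike = [session_load(8, 75)] * 7     # 600 units per day
ratio = acwr(steady + spike)
print(f"ACWR after the spike week = {ratio:.2f}")
```

A ratio well above 1 reflects the kind of rapid increase in training load that the classification tree flagged as an injury risk factor.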


2021 ◽  
Author(s):  
Youcef Azeli ◽  
Alberto Fernández ◽  
Federico Capriles ◽  
Wojciech Rojewski ◽  
Vanesa Lopez-Madrid ◽  
...  

Abstract The early detection of symptoms and rapid testing are the basis of an efficient screening strategy to control COVID-19 transmission. Most COVID-19 patients show olfactory dysfunction, and in many cases this is the first symptom. This study aims to develop a machine learning COVID-19 predictive tool based on symptoms and a simple olfactory test, which consists of identifying the smell of an aromatized hydroalcoholic gel (CovidGel Test). A multi-centre population-based prospective study was carried out in the city of Reus (Catalonia, Spain). A total of 519 patients were included, of whom 386 (74.4%) had at least one symptom and 133 (25.6%) were asymptomatic. A classification tree model including sex, age, relevant symptoms and the CovidGel Test results obtained a sensitivity of 0.97 (95% CI 0.91–0.99), a specificity of 0.39 (95% CI 0.34–0.44) and an AUC of 0.87 (95% CI 0.83–0.92). This shows that the CovidGel Test is a promising mass screening tool for predicting COVID-19.
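Sensitivity and specificity, as reported in the abstract above, are both read off a confusion matrix. The following Python sketch shows the calculation on hypothetical toy predictions (not the study's data); the function name `sensitivity_specificity` is an illustrative choice.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels, 1 = positive."""
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 1 and pred == 0:
            fn += 1
        elif truth == 0 and pred == 0:
            tn += 1
        else:
            fp += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical screening result: most true cases are caught,
# but many non-cases are also flagged.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 1, 1, 1, 0, 0]
sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```

A screening tool tuned like this toy example trades specificity for sensitivity: few infected people are missed, at the cost of more false alarms that need confirmatory testing.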


2021 ◽  
Author(s):  
Jiacheng Shi ◽  
Xiaohuan Chen ◽  
Qiong Yang ◽  
Cai-Mei Wang ◽  
Qian Huang ◽  
...  

Abstract Currently, the most widely used screening methods for hyperuricemia (HUA) involve invasive laboratory tests, which are lacking in many rural hospitals in China. This study explores the use of non-invasive physical examinations to construct a simple prediction model for HUA. Data of 9,252 adults from July to October 2019 in the Affiliated Hospital of Guilin Medical College were collected and divided randomly into a training set (n = 6,364) and a validation set (n = 2,888) at a ratio of 7:3. In the training set, the non-invasive physical examination indicators of age, gender, body mass index (BMI) and prevalence of hypertension were included in a logistic regression analysis, and a nomogram model was established. The classification and regression tree (CART) algorithm was used to build a classification tree model. Receiver operating characteristic (ROC) curve, calibration curve and decision curve analyses (DCA) were used to test the discrimination, accuracy and clinical applicability of the two models. The results showed that age, gender, BMI and prevalence of hypertension were all related to the occurrence of HUA. The area under the ROC curve (AUC) of the nomogram model was 0.806 in the training set and 0.791 in the validation set. The AUC of the classification tree model was 0.802 and 0.794 in the two sets, respectively; the AUCs of the two models were not statistically different. The calibration curves and DCAs of the two models performed well on accuracy and clinical practicality, which suggests these models may be suitable for predicting HUA in rural settings.


2021 ◽  
Author(s):  
Li Lu Wei ◽  
Yu jian

Abstract Background: Hypertension is a common chronic disease worldwide and a common underlying condition for cardiovascular and brain complications. Overweight and obesity are high risk factors for hypertension. In this study, three statistical methods (classification tree model, logistic regression model and BP neural network) were used to screen the risk factors for hypertension in the overweight and obese population, and the interaction of risk factors was analyzed. The findings have some clinical significance for the early detection, diagnosis and treatment of hypertension and for reducing the risk of hypertension complications. Methods: The classification tree model, logistic regression model and BP neural network model were used to screen the risk factors for hypertension in overweight and obese people. The specificity, sensitivity and accuracy of the three models were evaluated using receiver operating characteristic (ROC) curves. Finally, the classification tree CRT model was used to screen the risk factors related to hypertension in overweight and obese people, and the unconditional logistic regression multiplicative model was used to quantitatively analyze the interaction. Results: The Youden indices of the ROC curves of the classification tree model, logistic regression model and BP neural network model were 39.20%, 37.02% and 34.85%; the sensitivities were 61.63%, 76.59% and 82.85%; the specificities were 77.58%, 60.44% and 52.00%; and the areas under the curve (AUC) were 0.721, 0.734 and 0.733, respectively. There was no significant difference in AUC between the three models (P > 0.05). The classification tree CRT model and the logistic regression multiplicative model suggested that the interaction between NAFLD and FPG was closely related to the prevalence of hypertension in overweight and obese people. Conclusion: NAFLD, FPG, age, TG, UA and LDL-C were risk factors for hypertension in overweight and obese people. The interaction between NAFLD and FPG increased the risk of hypertension.
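The Youden index quoted in the results above is defined as sensitivity plus specificity minus one. As a quick consistency check, the Python sketch below recomputes it from the sensitivities and specificities given in the abstract (all in percent) and compares the result with the reported indices. The function name `youden_index_pct` and the dictionary layout are illustrative choices.

```python
def youden_index_pct(sensitivity_pct, specificity_pct):
    """Youden's J in percent: sensitivity + specificity - 100."""
    return sensitivity_pct + specificity_pct - 100.0


# (sensitivity %, specificity %, reported Youden index %) from the abstract.
reported = {
    "classification tree": (61.63, 77.58, 39.20),
    "logistic regression": (76.59, 60.44, 37.02),
    "BP neural network": (82.85, 52.00, 34.85),
}

recomputed = {}
for model, (sens, spec, stated) in reported.items():
    recomputed[model] = youden_index_pct(sens, spec)
    print(f"{model}: recomputed {recomputed[model]:.2f}%, reported {stated:.2f}%")
```

The recomputed values agree with the reported ones to within rounding of the published two-decimal figures.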


Author(s):  
Rui Fu ◽  
Nicholas Mitsakakis ◽  
Michael Chaiton

Aim: The popularity of electronic cigarettes (e-cigarettes) is soaring in Canada. Understanding person-level correlates of current e-cigarette use (vaping) is crucial to guide tobacco policy, but prior studies have not fully identified these correlates because of model overfitting caused by multicollinearity. This study addressed this issue by using a classification tree, a machine learning algorithm. Methods: This population-based cross-sectional study used the 2017 Canadian Tobacco, Alcohol, and Drugs Survey (CTADS), which targeted residents aged 15 or older. Forty-six person-level characteristics were first screened in a logistic mixed-effects regression procedure for their strength in predicting vaper type (current vs. former vaper) among people who reported having ever vaped. A 9:1 ratio was used to randomly split the data into a training set and a validation set. A classification tree model was developed on the training set using cross-validation and the selected predictors, and was assessed on the validation set using sensitivity, specificity and accuracy. Results: Among the 3,059 people who had ever vaped, the average age was 24.4 years (standard deviation = 11.0); 41.9% were female and 8.5% were aboriginal. There were 556 (18.2%) current vapers. The classification tree model performed relatively well and suggested that attraction to e-cigarette flavors was the most important correlate of current vaping, followed by young age (< 18) and believing vaping to be less harmful to oneself than cigarette smoking. Conclusions: People who vape because of flavors are at very high risk of becoming current vapers. The findings of this study provide evidence that supports the ongoing ban on flavored vaping products in the US and suggest that a similar regulatory intervention may be effective in Canada.
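The data handling described in the Methods above (a random 9:1 split, then cross-validation inside the training set) can be sketched with the Python standard library. The row count is taken from the 3,059 ever-vapers reported in the Results; the number of folds (10), the random seed and both function names are assumptions, since the abstract does not specify them.

```python
import random


def train_validation_split(n_rows, train_fraction=0.9, seed=0):
    """Shuffle row indices and cut them into training and validation parts."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    cut = round(n_rows * train_fraction)
    return indices[:cut], indices[cut:]


def k_fold_indices(train_indices, k=10):
    """Yield (fit, held_out) index lists for k-fold cross-validation."""
    folds = [train_indices[i::k] for i in range(k)]
    for i in range(k):
        held_out = folds[i]
        fit = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield fit, held_out


train_idx, valid_idx = train_validation_split(3059)
splits = list(k_fold_indices(train_idx, k=10))
print(len(train_idx), len(valid_idx), len(splits))
```

Keeping the validation rows out of the cross-validation loop means tree tuning never sees them, so the sensitivity, specificity and accuracy measured on that set remain an honest estimate.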


Author(s):  
Elena Ballante ◽  
Marta Galvani ◽  
Pierpaolo Uberti ◽  
Silvia Figini

Abstract In this paper, a new approach to classification models, called the Polarized Classification Tree model, is introduced. From a methodological perspective, a new index of polarization is proposed to measure the goodness of splits in the growth of a classification tree. The newly introduced measure tackles weaknesses of the classical ones used in classification trees (Gini and Information Gain), because it not only measures the impurity but also reflects the distribution of each covariate in the node, i.e., it employs more discriminating covariates to split the data at each node. From a computational perspective, a new algorithm is proposed and implemented that employs the proposed measure in the growth of a tree. In order to show how our proposal works, a simulation exercise has been carried out. The results obtained in the simulation framework suggest that our proposal significantly outperforms impurity measures commonly adopted in classification tree modeling. Moreover, the empirical evidence on real data shows that Polarized Classification Tree models are competitive with, and sometimes better than, classical classification tree models.
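For reference, the two classical split criteria that the abstract above compares against, Gini impurity and information gain, can be sketched in a few lines of Python. This shows only the classical measures; the polarization index itself is not defined in the abstract and is not reproduced here. The function names and the toy node are hypothetical.

```python
from collections import Counter
from math import log2


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        raise ValueError("empty node")
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def entropy(labels):
    """Shannon entropy of the class distribution, in bits."""
    n = len(labels)
    if n == 0:
        raise ValueError("empty node")
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def split_gain(parent, left, right, impurity):
    """Impurity decrease of a binary split, weighting children by size."""
    n = len(parent)
    weighted = (len(left) / n) * impurity(left) + (len(right) / n) * impurity(right)
    return impurity(parent) - weighted


# Hypothetical node with two balanced classes and one candidate split.
parent = ["a"] * 4 + ["b"] * 4
left, right = ["a", "a", "a", "b"], ["a", "b", "b", "b"]
gini_gain = split_gain(parent, left, right, gini)
info_gain = split_gain(parent, left, right, entropy)
print(f"Gini decrease = {gini_gain:.3f}, information gain = {info_gain:.3f} bits")
```

Both criteria depend only on the class counts in the parent and child nodes, which is the property the abstract contrasts with a measure that also reflects the covariate's distribution.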

