THE METHOD OF BOUNDED CONSTRUCTIONS OF LOGICAL CLASSIFICATION TREES IN THE PROBLEM OF DISCRETE OBJECTS CLASSIFICATION

2021, Vol 3 (1), pp. 22-29
Author(s): I. F. Povkhan

The problem of constructing logical classification tree models for geological data arrays, based on a limited method of selecting elementary features, is considered. A method is proposed for approximating an array of real data with a set of elementary features, using a fixed criterion for stopping the branching procedure while the classification tree is being constructed. This approach makes it possible to ensure the required accuracy of the model, reduce its structural complexity, and achieve the required performance indicators. A limited method for constructing classification trees has been developed that completes only those paths (tiers) of the tree structure with the greatest number of classification errors (of all types). This approach to synthesizing the recognition model makes it possible to regulate the complexity (accuracy) of the classification tree model being built. It is advisable in situations with restrictions on the hardware resources of the information system, on the accuracy and structural complexity of the model, or on the structure, sequence and depth of recognition of the training sample data array. The limited scheme of classification tree synthesis builds models almost 20 % faster. The constructed logical classification tree accurately classifies (recognizes) the entire training sample on which the model is based, has a minimal structure (structural complexity), and consists of components: sets of elementary features that serve as the vertices (attributes) of the tree. Based on the proposed modification of the elementary feature selection method, software has been developed that handles a range of different types of applied problems. An approach to synthesizing new recognition models based on a limited logical tree scheme and the selection of pre-pruning parameters is proposed.
In other words, an effective scheme for recognizing discrete objects has been developed based on step-by-step evaluation and selection of sets of attributes (generalized features) based on selected paths in the classification tree structure at each stage of scheme synthesis.
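The idea of completing only the paths with the most classification errors, under a fixed stopping criterion, can be illustrated with a small sketch. This is not the author's algorithm: the binary features, the error-count split criterion, the `max_splits` budget and all function names are assumptions made for illustration only.

```python
from collections import Counter

def majority(labels):
    # Class predicted by a leaf: its most frequent training label.
    return Counter(labels).most_common(1)[0][0]

def errors(labels):
    # Training errors a leaf makes when it predicts its majority class.
    return len(labels) - Counter(labels).most_common(1)[0][1]

def make_leaf(rows, labels, used):
    return {"rows": rows, "labels": labels, "used": used,
            "feature": None, "kids": None}

def best_split(rows, labels, used):
    # Unused binary feature whose split leaves the fewest errors, or None.
    best = None
    for f in range(len(rows[0])):
        if f in used:
            continue
        left = [y for x, y in zip(rows, labels) if x[f] == 0]
        right = [y for x, y in zip(rows, labels) if x[f] == 1]
        if not left or not right:
            continue  # this feature does not separate the leaf
        cost = errors(left) + errors(right)
        if best is None or cost < best[0]:
            best = (cost, f)
    return None if best is None else best[1]

def grow_bounded(rows, labels, max_splits):
    # Assumes a non-empty training sample of 0/1 feature tuples.
    root = make_leaf(rows, labels, frozenset())
    leaves = [root]
    for _ in range(max_splits):  # fixed stopping criterion (assumed)
        # Visit leaves from most to fewest errors; split the first one
        # that still makes errors and has a usable feature.
        chosen = None
        for leaf in sorted(leaves, key=lambda l: errors(l["labels"]), reverse=True):
            if errors(leaf["labels"]) == 0:
                break
            f = best_split(leaf["rows"], leaf["labels"], leaf["used"])
            if f is not None:
                chosen = (leaf, f)
                break
        if chosen is None:
            break  # nothing left to improve
        leaf, f = chosen
        kids = {}
        for v in (0, 1):
            pairs = [(x, y) for x, y in zip(leaf["rows"], leaf["labels"]) if x[f] == v]
            kids[v] = make_leaf([x for x, _ in pairs], [y for _, y in pairs],
                                leaf["used"] | {f})
        leaf["feature"], leaf["kids"] = f, kids
        leaves = [l for l in leaves if l is not leaf] + list(kids.values())
    return root

def predict(node, x):
    while node["kids"] is not None:
        node = node["kids"][x[node["feature"]]]
    return majority(node["labels"])

rows = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = ["a", "a", "a", "b"]
tree = grow_bounded(rows, labels, max_splits=5)
```

With a budget of zero splits the sketch returns a single leaf that predicts the majority class; raising the budget lets it refine only the branch that still misclassifies training objects, which is the trade-off between complexity and accuracy that the abstract describes.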

Author(s): V. Dudnyk, O. Grishchyn, V. Netrebko, R. Prus, M. Voloshcuk

An effective mechanism is proposed for synthesizing classification trees from fixed initial information (in the form of a training sample) for the task of recognizing the technical condition of samples of weapons and military equipment. The constructed algorithmic classification tree (model) classifies (recognizes) without error the entire training sample (situational objects) from which the classification scheme is constructed. It has a minimal structure (structural complexity) and consists of components (modules): autonomous classification and recognition algorithms that serve as the vertices of the structure (attributes of the tree). The developed method of building algorithm tree models (classification schemes) works with large training samples containing different types of discrete information. It provides high accuracy, speed and economy of hardware resources when generating the final classification scheme, and builds classification trees (models) with a predetermined accuracy. An approach is offered for synthesizing new recognition (classification) algorithms from a library (set) of already known algorithms (schemes) and methods. Based on the proposed concept of algorithmic classification trees, a set of models was built that provided effective classification and prediction of the technical condition of samples. The paper proposes a set of general indicators (parameters) that effectively summarizes the general characteristics of a classification tree model and can be used to select the best tree of algorithms from a set based on methods of random classification trees. Practical tests have confirmed the efficiency of the mathematical software and the algorithm tree models.


Author(s): Elena Ballante, Marta Galvani, Pierpaolo Uberti, Silvia Figini

Abstract In this paper, a new approach to classification models, called the Polarized Classification Tree model, is introduced. From a methodological perspective, a new index of polarization is proposed to measure the goodness of splits in the growth of a classification tree. The newly introduced measure tackles weaknesses of the classical ones used in classification trees (Gini and Information Gain), because it not only measures impurity but also reflects the distribution of each covariate in the node, i.e., it employs more discriminating covariates to split the data at each node. From a computational perspective, a new algorithm is proposed and implemented that employs the proposed measure in the growth of a tree. In order to show how our proposal works, a simulation exercise has been carried out. The results obtained in the simulation framework suggest that our proposal significantly outperforms the impurity measures commonly adopted in classification tree modeling. Moreover, the empirical evidence on real data shows that Polarized Classification Tree models are competitive with, and sometimes better than, classical classification tree models.
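For reference, the two classical split measures named in this abstract can be sketched as follows. The polarization index itself is not defined in the abstract, so it is not reproduced here; the function names and the toy labels are illustrative assumptions.

```python
import math
from collections import Counter

def gini(labels):
    # Gini impurity: 1 minus the sum of squared class proportions.
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def entropy(labels):
    # Shannon entropy in bits of the class distribution in a node.
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def split_gain(parent, left, right, impurity):
    # Impurity of the parent minus the size-weighted impurity of the children.
    # With impurity=entropy this is the information gain of the split.
    n = len(parent)
    weighted = (len(left) / n) * impurity(left) + (len(right) / n) * impurity(right)
    return impurity(parent) - weighted

parent = ["a"] * 4 + ["b"] * 4
left, right = ["a"] * 4, ["b"] * 4          # a perfectly separating split
gini_decrease = split_gain(parent, left, right, gini)
information_gain = split_gain(parent, left, right, entropy)
```

Both measures depend only on the class proportions inside each node, which is the limitation the abstract points to when it says they measure impurity alone.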


Author(s): Oldřich Beneš, David Hampel

Due to expanding demand for the level of testing on one side and the reduction of costs on the other, the question of how to replace expensive destructive testing of medical devices without compromising the quality of the final product arises urgently. This situation is common in all highly regulated industries; this article addresses the problem in the medical device manufacturing industry. Based on real data comprising testing and validation datasets, a logit model and a classification tree model are estimated to establish the relationship between the result of the destructive test and measurements of the explored device. The results point to the possibility of replacing the destructive test with a non-destructive one in our case.


Author(s): I. F. Povkhan

The paper offers an estimate of the complexity of the logical tree structure constructed for classifying an arbitrary case under conditions of strong class separation in the initial training sample. The solution to this question is of fundamental importance for assessing the structural complexity of classification models (tree-like LCT/ACT structures) of discrete objects across a wide range of applied classification and recognition problems, with a view to developing promising schemes and methods for their final optimization (minimization) by post-pruning of the structure. The presented research is relevant not only for logical classification tree structures but also allows the complexity estimation scheme to be extended to the general case of algorithmic classification tree structures (ACT models), that is, the concept of algorithm trees and trees of generalized features (TGF). The paper investigates a topical question in the concept of decision trees (tree recognition): estimating the maximum complexity of the general scheme for constructing a logical tree based on a classification procedure of stepwise selection of sets of elementary features (which can be diverse sets and combinations). For a given initial training sample (an array of discrete information), this scheme builds a tree structure (classification model) from a set of elementary features (basic attributes) that are evaluated at each stage of model construction on this sample, for the case of strong separation of classes.
Modern information systems and technologies based on mathematical approaches (models) of pattern recognition (structures of logical and algorithmic classification trees) are widely used in socio-economic, environmental and other systems for the primary analysis and processing of large amounts of information. This is because the approach eliminates a number of disadvantages of well-known classical methods and schemes and achieves a fundamentally new result. The research is devoted to the problems of classification tree models (decision trees) and offers an assessment of the complexity of logical tree structures (classification tree models), which consist of selected and ranked sets of elementary features (individual features and their combinations) built on the basis of the general concept of branched feature selection. When forming the current vertex (node) of the logical tree, this method selects the most informative (highest-quality) elementary features from the source set. This approach significantly reduces the size and complexity of the tree (the total number of branches and tiers of the structure) and improves the quality of its subsequent instrumental analysis (the final decomposition of the model).


Author(s): Jou-An Chen, Chi-Chuan Shih, Pay-Fan Lin, Jin-Jong Chen, Kuan-Chia Lin

Abstract Health-related physical fitness has decreased with age; this is of immense concern for adolescents. School-based health intervention programs can be classified as either population-wide or high-risk approaches. Although the population-wide and risk-based approaches adopt different healthcare angles, they both need to focus resources on risk evaluation. In this paper, we describe an exploratory application of cluster analysis and the tree model to the collaborative evaluation of students' health-related physical fitness in a high school sample in Taiwan (n=742). Cluster analysis shows that physical fitness can be divided into relatively good, moderate and poor subgroups. There are significant differences in biochemical measurements among these three groups. For the tree model, we used 2004 school-year students as an experimental group and 2005 school-year students as a validation group. The results indicate that if sit-and-reach is shorter than 33 cm, BMI is >25.46 kg/m², and the 1600 m run/walk takes >534 s, the predicted probability of having ≥2 metabolic risk factors is 100% and the population of this group is 41; both values are the highest. From the risk-based healthcare viewpoint, cluster analysis can sort students' physical fitness data in a short time and then narrow down the scope to recognize the subgroups. A classification tree model specifically shows the discrimination paths between the measurements of physical fitness for metabolic risk and would be helpful for self-management or proper healthcare education targeting different groups. Applying both methods to specific adolescent health issues could provide different angles for planning health promotion projects.
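The single discrimination path reported in this abstract can be written as a rule. The sketch below encodes only that one path with the thresholds quoted above; the rest of the tree is not given in the abstract, and the function and parameter names are hypothetical.

```python
def on_reported_high_risk_path(sit_and_reach_cm, bmi_kg_m2, run_walk_1600m_s):
    # True when a student falls on the one path reported in the abstract:
    # sit-and-reach < 33 cm, BMI > 25.46 kg/m^2, 1600 m run/walk > 534 s.
    return (sit_and_reach_cm < 33
            and bmi_kg_m2 > 25.46
            and run_walk_1600m_s > 534)

# Illustrative (made-up) measurements, not data from the study.
flagged = on_reported_high_risk_path(30.0, 27.0, 600)
not_flagged = on_reported_high_risk_path(35.0, 27.0, 600)
```

A return value of False only means the student is not on this particular path; it says nothing about the predictions of the other, unreported branches.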


2021
Author(s): Li Lu Wei, Yu jian

Abstract Background: Hypertension is a common chronic disease worldwide and a common underlying cause of cardiovascular and cerebral complications. Overweight and obesity are high risk factors for hypertension. In this study, three statistical methods (a classification tree model, a logistic regression model and a BP neural network) were used to screen the risk factors of hypertension in the overweight and obese population, and the interaction of risk factors was analyzed. This has a certain clinical significance for the early detection, diagnosis and treatment of hypertension and for reducing the risk of its complications. Methods: The classification tree model, logistic regression model and BP neural network model were used to screen the risk factors of hypertension in overweight and obese people. The specificity, sensitivity and accuracy of the three models were evaluated using the receiver operating characteristic (ROC) curve. Finally, the classification tree CRT model was used to screen the related risk factors of hypertension in overweight and obese people, and the unconditional logistic regression multiplication model was used to quantitatively analyze the interaction. Results: The Youden indices of the ROC curves of the classification tree model, logistic regression model and BP neural network model were 39.20%, 37.02% and 34.85%, the sensitivities were 61.63%, 76.59% and 82.85%, the specificities were 77.58%, 60.44% and 52.00%, and the areas under the curve (AUC) were 0.721, 0.734 and 0.733, respectively. There was no significant difference in AUC among the three models (P>0.05). The classification tree CRT model and the logistic regression multiplication model suggested that the interaction between NAFLD and FPG was closely related to the prevalence of hypertension in overweight and obese people. Conclusion: NAFLD, FPG, age, TG, UA and LDL-C were risk factors for hypertension in overweight and obese people. The interaction between NAFLD and FPG increased the risk of hypertension.
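The Youden index is defined as sensitivity + specificity − 1 (equivalently, − 100 when both are given in percent), so the reported values can be checked against the reported sensitivities and specificities. The short sketch below does this; the function name and dictionary layout are illustrative.

```python
def youden_index(sensitivity_pct, specificity_pct):
    # Youden's J in percentage points: sensitivity + specificity - 100.
    return sensitivity_pct + specificity_pct - 100.0

# (sensitivity %, specificity %) as reported in the abstract.
reported = {
    "classification tree": (61.63, 77.58),
    "logistic regression": (76.59, 60.44),
    "BP neural network": (82.85, 52.00),
}

recomputed = {name: round(youden_index(se, sp), 2)
              for name, (se, sp) in reported.items()}
```

The recomputed values agree with the reported 39.20%, 37.02% and 34.85% to within 0.01 percentage points, a difference consistent with rounding of the published sensitivities and specificities.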


2021, Vol 0 (0), pp. 0-0
Author(s): Xiaonan Cui, Marjolein A. Heuvelmans, Grigory Sidorenkov, Yingru Zhao, Shuxuan Fan, ...
