Robust optimal classification trees under noisy labels

Author(s):  
Victor Blanco ◽  
Alberto Japón ◽  
Justo Puerto

Abstract: In this paper we propose a novel methodology to construct Optimal Classification Trees that takes into account that noisy labels may occur in the training sample. The motivation for this new methodology is the superadditive effect of combining margin-based classifiers with outlier detection techniques. Our approach rests on two main elements: (1) the splitting rules for the classification trees are designed to maximize the separation margin between classes, following the SVM paradigm; and (2) some of the labels of the training sample are allowed to be changed during the construction of the tree in an attempt to detect label noise. Both features are integrated to design the resulting Optimal Classification Tree. We present a Mixed Integer Nonlinear Programming formulation for the problem, suitable to be solved using any of the available off-the-shelf solvers. The model is analyzed and tested on a battery of standard datasets taken from the UCI Machine Learning Repository, showing the effectiveness of our approach. Our computational results show that in most cases the new methodology outperforms the OCT and OCT-H benchmarks in both accuracy and AUC.
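The interplay of the two elements, a margin-maximizing split and a limited number of label changes, can be illustrated on a toy one-dimensional problem. The sketch below is not the authors' mixed integer formulation: it is a brute-force illustration under the assumption that at most `budget` labels may be flipped, and the function name `best_split_with_flips` and the budget parameter are hypothetical.

```python
from itertools import combinations

def best_split_with_flips(xs, ys, budget):
    """Toy 1-D illustration (brute force, not the paper's model).

    Returns (margin, threshold, flipped_indices) for the widest separating
    gap that can be reached by flipping at most `budget` labels in ys.
    Labels are -1 / +1. If no separating split exists, returns (0.0, None, ()).
    """
    best = (0.0, None, ())
    n = len(xs)
    for r in range(budget + 1):
        for flipped in combinations(range(n), r):
            # Relabel the chosen points (the assumed "noise correction").
            labels = [-y if i in flipped else y for i, y in enumerate(ys)]
            neg = [x for x, y in zip(xs, labels) if y == -1]
            pos = [x for x, y in zip(xs, labels) if y == +1]
            if not neg or not pos:
                continue  # a split needs both classes
            # Try both orientations of the split.
            for lo, hi in ((neg, pos), (pos, neg)):
                gap = min(hi) - max(lo)
                if gap > 0 and gap / 2 > best[0]:
                    best = (gap / 2, (max(lo) + min(hi)) / 2, flipped)
    return best

xs = [1.0, 2.0, 3.0, 8.0, 9.0, 10.0]
ys = [-1, +1, -1, +1, +1, +1]   # the point at 2.0 looks mislabeled
print(best_split_with_flips(xs, ys, budget=0))
print(best_split_with_flips(xs, ys, budget=1))
```

With no flips allowed the two classes are not separable by a threshold; allowing a single flip relabels the suspicious point and yields a wide margin. A budget is only one way to limit relabeling, chosen here for simplicity.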

Author(s):  
V. Dudnyk ◽  
O. Grishchyn ◽  
V. Netrebko ◽  
R. Prus ◽  
M. Voloshcuk

An effective mechanism is presented for synthesizing classification trees from fixed initial information (in the form of a training sample) for the task of recognizing the technical condition of samples of weapons and military equipment. The constructed algorithmic classification tree (model) classifies (recognizes) without error the entire training sample (situational objects) from which the classification scheme is built. It has a minimal structure (structural complexity) and consists of components (modules): autonomous classification and recognition algorithms that serve as the vertices of the structure (attributes of the tree). The developed method of building models of algorithm trees (classification schemes) works with large training samples of different types of discrete information. It provides high accuracy, speed, and economical use of hardware resources when generating the final classification scheme, and builds classification trees (models) with a predetermined accuracy. An approach is proposed for synthesizing new recognition (classification) algorithms from a library (set) of already known algorithms (schemes) and methods. Based on the proposed concept of algorithmic classification trees, a set of models was built that provided effective classification and prediction of the technical condition of samples. The paper proposes a set of general indicators (parameters) that effectively summarize the general characteristics of a classification tree model and can be used to select the best tree of algorithms from a set generated by methods of random classification trees. Practical tests have confirmed the efficiency of the mathematical software and the models of algorithm trees.


2020 ◽  
Vol 10 (2) ◽  
pp. 12-15
Author(s):  
Igor Povhan

The paper is dedicated to algorithms for constructing a logical classification tree. Many algorithms for constructing logical classification trees exist today. However, as a rule, all of them reduce to the construction of a single classification tree based on the data of a fixed training sample. Very few algorithms for constructing recognition trees are designed for large data sets. Such sets raise objective difficulties associated with the generation of such complex structures, the methods of working with them, and their storage. In this paper, we describe an algorithm for constructing classification trees for a large training set and show a way toward a uniform description of a fixed class of recognition trees. A simple, effective, and economical method of constructing a logical classification tree from the training sample provides the necessary speed and level of complexity of the recognition scheme, which guarantees simple and complete recognition of discrete objects.


Author(s):  
Igor Povkhan ◽  

Urgency of the research. Currently there are several independent approaches (concepts) to solving the classification problem in the general setting, alongside the development of various concepts, approaches, methods, and models that cover general issues of the theory of artificial intelligence and information systems. All of these approaches in recognition theory have their advantages and disadvantages, and together they form a single toolkit for solving applied problems of the theory of artificial intelligence. This study focuses on the current concept of decision trees (classification trees). The general problem of software (algorithmic) construction of logical recognition (classification) trees is considered. The object of this research is logical classification trees (LCT structures). The subject of the research is current methods and algorithmic schemes for constructing logical classification trees.
Target setting. The main existing methods and algorithms for working with arrays of discrete information when constructing recognition functions (classifiers) do not make it possible to achieve a predetermined level of accuracy (efficiency) of the classification system or to regulate its complexity during construction. This disadvantage is absent, however, in methods and schemes for building recognition systems based on the concept of logical classification trees (decision trees). That is, covering the training sample with a set of elementary features in the case of an LCT generates a fixed tree data structure (the LCT model), which compresses and transforms the initial TS data and therefore allows significant optimization and savings of the system's hardware resources. It is based on a single methodology: the optimal approximation of the test sample by a set of elementary features (attributes) that are included in some scheme (operator) constructed in the learning process.
Actual scientific researches and issues analysis. The possibility of an effective and economical software (algorithmic) scheme for constructing a logical classification tree (LCT structure or model) based on source arrays of training samples (arrays of discrete information) of large volume.
The research objective. Development of a simple and high-quality software method (algorithm and software system) for building LCT models (structures) for large arrays of initial samples by synthesizing minimal forms of classification and recognition trees that provide an effective approximation of the training information by a set of ranked elementary features (attributes), based on a scheme of branched feature selection, for a wide range of applied problems.
The statement of basic materials. We propose a general program scheme for constructing structures of logical classification trees, which for a given initial training sample builds a tree structure (classification model) consisting of a set of elementary features evaluated at each step of building the model for this sample. A method and a ready-made software system for building logical trees are presented; the main idea is to approximate an initial sample of arbitrary volume by a set of elementary features. This method selects the most informative (highest-quality) elementary features from the source set when forming the current vertex (node) of the logical tree. This approach significantly reduces the size and complexity of the tree (the total number of branches and tiers of the structure) and improves the quality of its subsequent analysis.
Conclusions. The developed mathematical support for constructing LCT structures (classification tree models) can be used to solve a wide range of practical recognition and classification problems. Prospects for further research may include creating a limited method of logical classification trees (LCT structures), which maintains a criterion for stopping the tree-construction procedure by the depth of the structure, optimizing its software implementations, and experimentally studying this method on a wider range of practical problems.
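The idea of selecting the most informative elementary feature at each vertex can be sketched as follows. This is only a minimal illustration, assuming binary (0/1) elementary features and using entropy as the informativeness measure; the abstract does not state which measure the author uses, and the function names are hypothetical.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

def best_feature(rows, labels, features):
    """Pick the feature whose 0/1 split leaves the lowest weighted entropy."""
    def split_entropy(f):
        total = 0.0
        for v in (0, 1):
            part = [y for r, y in zip(rows, labels) if r[f] == v]
            if part:
                total += len(part) / len(labels) * entropy(part)
        return total
    return min(features, key=split_entropy)

def build_tree(rows, labels, features):
    """Greedy construction: one selected feature per vertex, leaves are classes."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not features:
        return majority
    f = best_feature(rows, labels, features)
    rest = [g for g in features if g != f]
    node = {"feature": f}
    for v in (0, 1):
        idx = [i for i, r in enumerate(rows) if r[f] == v]
        if idx:
            node[v] = build_tree([rows[i] for i in idx],
                                 [labels[i] for i in idx], rest)
        else:
            node[v] = majority  # empty branch: fall back to the parent's majority
    return node

def classify(tree, row):
    """Follow the selected features from the root down to a leaf."""
    while isinstance(tree, dict):
        tree = tree[row[tree["feature"]]]
    return tree

rows = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = ["a", "a", "b", "b"]
tree = build_tree(rows, labels, [0, 1])
print(tree)
```

In this example feature 0 alone separates the classes, so the tree has a single vertex and feature 1 is never used, which is the kind of reduction in size the abstract refers to.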


Author(s):  
I. F. Povkhan ◽  

The paper offers an estimate of the complexity of the constructed logical tree structure for classifying an arbitrary case under conditions of strong class separation in the initial training sample. The solution to this question is of fundamental importance for assessing the structural complexity of classification models (in the form of tree-like LCT/ACT structures) of discrete objects for a wide range of applied classification and recognition problems, and for developing promising schemes and methods for the final optimization (minimization) of the structure by post-pruning. The presented research is relevant not only for constructions (structures) of logical classification trees; it also allows the complexity estimation scheme to be extended to the general case of algorithmic structures (ACT models) of classification trees (the concept of algorithm trees and trees of generalized features, TGF). The paper investigates a topical question within the concept of decision trees (recognition trees): estimating the maximum complexity of the general scheme for constructing a logical classification tree based on the stepwise selection of sets of elementary features (which may be diverse sets and combinations). For a given initial training sample (an array of discrete information), this scheme builds a tree structure (classification model) from a set of elementary features (basic attributes) that are evaluated at each stage of model construction for this sample, in the case of strong separation of classes.
Modern information systems and technologies based on mathematical approaches (models) of pattern recognition (structures of logical and algorithmic classification trees) are widely used in socio-economic, environmental and other systems of primary analysis and processing of large amounts of information. This is because the approach eliminates a set of existing disadvantages of well-known classical methods and schemes and achieves a fundamentally new result. The research is devoted to the problems of classification tree models (decision trees) and offers an assessment of the complexity of logical tree structures (classification tree models), which consist of selected and ranked sets of elementary features (individual features and their combinations) built on the basis of the general concept of branched feature selection. When forming the current vertex (node) of the logical tree, this method selects the most informative (highest-quality) elementary features from the source set. This approach significantly reduces the size and complexity of the tree (the total number of branches and tiers of the structure) and improves the quality of its subsequent instrumental analysis (the final decomposition of the model).


2017 ◽  
Vol 27 (1) ◽  
pp. 125-132
Author(s):  
Milena Bogdanovic ◽  
Zoran Maksimovic ◽  
Ana Simic ◽  
Jelisavka Milosevic

In this paper, the low discrepancy consecutive k-sums permutation problem is considered. A mixed integer linear programming (MILP) formulation with a moderate number of variables and constraints is proposed. The correctness proof shows that the proposed formulation is equivalent to the basic definition of the low discrepancy consecutive k-sums permutation problem. Computational results, obtained with the standard CPLEX solver, give 88 new exact values, which clearly show the usefulness of the proposed MILP formulation.
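For readers unfamiliar with the problem, the quantity being minimized can be sketched in a few lines. The abstract does not restate the definition, so the sketch assumes the following one: a permutation of 1..n is placed on a circle, the n sums of k consecutive entries are formed, and the discrepancy is the largest deviation of any such sum from the average k(n+1)/2. The brute-force search is only feasible for very small n and is not the MILP formulation of the paper; both function names are hypothetical.

```python
from fractions import Fraction
from itertools import permutations

def discrepancy(perm, k):
    """Largest deviation of a cyclic consecutive k-sum from the average k-sum.

    Assumed definition: entries lie on a circle, indices are taken modulo n.
    """
    n = len(perm)
    average = Fraction(k * (n + 1), 2)
    sums = [sum(perm[(i + j) % n] for j in range(k)) for i in range(n)]
    return max(abs(s - average) for s in sums)

def min_discrepancy(n, k):
    """Exhaustive search over circular arrangements (tiny n only)."""
    # Rotating a circular arrangement does not change its k-sums,
    # so the first entry can be fixed to 1 without loss of generality.
    return min(discrepancy((1,) + rest, k)
               for rest in permutations(range(2, n + 1)))

print(discrepancy((1, 4, 2, 3), 2))
print(min_discrepancy(4, 2))
```

Exact arithmetic with `Fraction` avoids rounding issues when k(n+1) is odd and the average is a half-integer.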


Author(s):  
I. F. Povkhan ◽  

We propose an upper estimate of the complexity of the binary logical tree synthesis procedure for classifying an arbitrary case (for conditions of weak and strong separation of classes in the training sample). The solution to this question is of fundamental importance for assessing the structural complexity of classification models (in the form of tree structures) of discrete objects for a wide range of applied classification and recognition problems, and for developing promising schemes and methods for the final optimization (minimization) of the structure. This research is relevant not only for the constructions of logical classification trees; it also allows the complexity estimation scheme itself to be extended to the general case of algorithmic structures of classification trees (concepts of algorithm trees and generalized feature trees). The paper addresses the complexity of the general procedure for constructing a logical classification tree based on the concept of step-by-step selection of sets of elementary features (possibly heterogeneous sets and combinations). For a given initial training sample (an array of discrete information), this procedure builds a tree structure (classification model) from a set of elementary features (basic attributes) evaluated at each stage of the model construction scheme for this sample. Modern information technologies based on mathematical models of pattern recognition (logical and algorithmic classification trees) are widely used in socio-economic, environmental and other systems of primary analysis and processing of large amounts of information. This is because the approach eliminates a set of existing disadvantages of well-known classical methods and schemes and achieves a fundamentally new result.
The work is devoted to the problems of classification tree models (decision trees) and offers an assessment of the complexity of logical tree structures (classification tree models), which consist of selected and ranked sets of elementary features built on the basis of the general concept of branched feature selection. When forming the current vertex (node) of the logical tree, this method selects the most informative (highest-quality) elementary features from the source set. This approach significantly reduces the size and complexity of the tree (the total number of branches and tiers of the structure) and improves the quality of its subsequent analysis.


2021 ◽  
Vol 3 (1) ◽  
pp. 22-29
Author(s):  
I. F. Povkhan ◽  

The problem of constructing a model of logical classification trees based on a limited method of selecting elementary features for geological data arrays is considered. A method is proposed for approximating an array of real data by a set of elementary features, with a fixed criterion for stopping the branching procedure at the stage of constructing the classification tree. This approach makes it possible to ensure the necessary accuracy of the model, reduce its structural complexity, and achieve the necessary performance indicators. A limited method for constructing classification trees has been developed, which completes only those paths (tiers) of the classification tree structure that contain the greatest number of classification errors (of all types). This approach to synthesizing the recognition model makes it possible to regulate effectively the complexity (accuracy) of the classification tree model being built. It is advisable to use it in situations with restrictions on the hardware resources of the information system, on the accuracy and structural complexity of the model, and on the structure, sequence and depth of recognition of the training sample data array. The limited scheme of classification tree synthesis builds models almost 20% faster. The constructed logical classification tree accurately classifies (recognizes) the entire training sample on which the model is based, has a minimal structure (structural complexity), and consists of components: sets of elementary features that serve as the vertices of the structure (tree attributes). Based on the proposed modification of the elementary feature selection method, software has been developed that works with a range of different types of applied problems. An approach to synthesizing new recognition models based on a limited logical tree scheme and the selection of pre-pruning parameters is proposed.
In other words, an effective scheme for recognizing discrete objects has been developed, based on step-by-step evaluation and selection of sets of attributes (generalized features) along selected paths in the classification tree structure at each stage of scheme synthesis.
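The notion of completing only the paths with the most classification errors, under a fixed stopping criterion, can be sketched as a best-first growth loop. This is a simplified illustration and not the author's software: it assumes binary (0/1) elementary features, counts errors against the majority class of each leaf, and uses two hypothetical stopping parameters, `max_errors` (tolerated training errors) and `max_splits` (a budget on the number of branchings).

```python
from collections import Counter

def leaf_errors(labels):
    """Training errors of a leaf that predicts its majority class."""
    if not labels:
        return 0
    return len(labels) - Counter(labels).most_common(1)[0][1]

def grow_limited(rows, labels, max_errors=0, max_splits=10):
    """Split only the currently worst leaf until a stopping criterion holds.

    Returns (leaves, splits_used); a leaf is (row_indices, unused_features).
    """
    leaves = [(list(range(len(rows))), list(range(len(rows[0]))))]
    splits = 0

    def err(idx):
        return leaf_errors([labels[i] for i in idx])

    while splits < max_splits:
        if sum(err(idx) for idx, _ in leaves) <= max_errors:
            break  # stopping criterion: the model is accurate enough
        open_leaves = [leaf for leaf in leaves if leaf[1] and err(leaf[0]) > 0]
        if not open_leaves:
            break  # no leaf can be refined any further
        worst = max(open_leaves, key=lambda leaf: err(leaf[0]))
        idx, feats = worst

        def parts(f):
            return [[i for i in idx if rows[i][f] == v] for v in (0, 1)]

        # Choose the feature that leaves the fewest errors below this leaf.
        f = min(feats, key=lambda g: sum(err(p) for p in parts(g)))
        rest = [g for g in feats if g != f]
        leaves.remove(worst)
        leaves.extend((p, rest) for p in parts(f) if p)
        splits += 1
    return leaves, splits

rows = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = ["a", "a", "b", "a"]
print(grow_limited(rows, labels, max_errors=0))
print(grow_limited(rows, labels, max_errors=1))
```

In the example, the leaf that is already pure after the first branching is never expanded again, and relaxing `max_errors` to 1 stops the procedure before any branching takes place, which illustrates how such a criterion trades accuracy for structural complexity.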

