classification quality
Recently Published Documents


TOTAL DOCUMENTS

65
(FIVE YEARS 29)

H-INDEX

7
(FIVE YEARS 2)

Entropy ◽  
2021 ◽  
Vol 23 (12) ◽  
pp. 1682
Author(s):  
Wojciech Wieczorek ◽  
Jan Kozak ◽  
Łukasz Strąk ◽  
Arkadiusz Nowakowski

A new two-stage method for the construction of a decision tree is developed. The first stage is based on the definition of a minimum query set: the smallest set of attribute-value pairs for which any two objects can be distinguished. To obtain this set, an appropriate linear programming model is proposed. The queries from this set are the building blocks of the second stage, in which we try to find an optimal decision tree using a genetic algorithm. In a series of experiments, we show that for some databases our approach should be considered an alternative to classical methods (CART, C4.5) and other heuristic approaches in terms of classification quality.
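The minimum query set described in the abstract can be viewed as a covering problem: choose the fewest attribute-value queries so that every pair of objects is separated by at least one of them. The paper solves this with a linear programming model; the sketch below uses a greedy set-cover approximation instead, purely to illustrate the idea (the function name and data layout are assumptions, not the authors' formulation).

```python
from itertools import combinations

def greedy_query_set(objects):
    """Greedily pick attribute-value queries until every pair of objects
    is distinguished by at least one chosen query.
    objects: list of dicts mapping attribute -> value."""
    # A query (a, v) distinguishes objects x, y when exactly one of them
    # satisfies x[a] == v.
    pairs = set(combinations(range(len(objects)), 2))
    queries = {(a, v) for o in objects for a, v in o.items()}
    chosen = []
    while pairs:
        # pick the query separating the most still-undistinguished pairs
        best, sep = max(
            ((q, {p for p in pairs
                  if (objects[p[0]].get(q[0]) == q[1])
                     != (objects[p[1]].get(q[0]) == q[1])})
             for q in queries),
            key=lambda t: len(t[1]))
        if not sep:
            break  # identical objects cannot be distinguished
        chosen.append(best)
        pairs -= sep
    return chosen

data = [{"color": "red", "size": "S"},
        {"color": "red", "size": "L"},
        {"color": "blue", "size": "S"}]
qs = greedy_query_set(data)
```

Each chosen query then becomes a candidate split for the genetic search of the second stage; the greedy rule here gives a logarithmic approximation of the optimum rather than the exact LP solution.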


2021 ◽  
Author(s):  
Raúl Aceñero Eixarch ◽  
Raúl Díaz-Usechi Laplaza ◽  
Rafael Berlanga

In this paper, we propose a method for building alternative training datasets for lung nodule detection from plain chest X-ray images. Our aim is to improve the classification quality of a state-of-the-art CNN simply by selecting appropriate samples from existing datasets. The hypothesis of this research is that high-quality models need to learn by contrasting very clean images with those containing nodules, especially nodules that are difficult for non-expert clinicians to identify. Current chest X-ray datasets mostly include images in which more than one pathology exists and/or which contain devices such as catheters. This is because most samples come from elderly people, who are the usual subjects of X-ray examinations. In this paper, we evaluate several combinations of samples from existing datasets in the literature. Results show a great gain in performance for some of the evaluated combinations, confirming our hypothesis. The achieved performance of these models allows a considerable speed-up in the screening of patients by radiologists.


2021 ◽  
Vol 26 (1) ◽  
pp. 1-21
Author(s):  
Sebastian Schlag ◽  
Matthias Schmitt ◽  
Christian Schulz

The time complexity of support vector machines (SVMs) prohibits training on huge datasets with millions of data points. Recently, multilevel approaches to train SVMs have been developed to allow for time-efficient training on huge datasets. While regular SVMs perform the entire training in one—time-consuming—optimization step, multilevel SVMs first build a hierarchy of problems decreasing in size that resemble the original problem and then train an SVM model for each hierarchy level, benefiting from the solved models of previous levels. We present a faster multilevel support vector machine that uses a label propagation algorithm to construct the problem hierarchy. Extensive experiments indicate that our approach is up to orders of magnitude faster than the previous fastest algorithm while having comparable classification quality. For example, already one of our sequential solvers is on average a factor of 15 faster than the parallel ThunderSVM algorithm, while having similar classification quality.
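The label propagation step used to build the problem hierarchy can be sketched as follows: each node repeatedly adopts the most frequent label among its neighbours, so densely connected groups converge to a single label, and those groups become the vertices of the next, coarser training problem. This is an illustrative pure-Python version on adjacency lists; the paper's implementation, data structures, and tie-breaking rules will differ (the tie-break toward the larger label here is an assumption made only to keep the toy deterministic).

```python
from collections import Counter

def label_propagation(adj, rounds=10):
    """adj: node -> list of neighbours. Repeatedly assign each node the
    most frequent neighbour label (ties -> larger label). Dense groups
    converge to one shared label, i.e. one coarse-level vertex."""
    labels = {v: v for v in adj}  # start: every node is its own cluster
    for _ in range(rounds):
        changed = False
        for v in adj:
            counts = Counter(labels[u] for u in adj[v])
            new = max(counts, key=lambda l: (counts[l], l))
            if new != labels[v]:
                labels[v], changed = new, True
        if not changed:
            break
    return labels

# two 4-cliques joined by the single edge 3-4: two clusters expected
clique = lambda vs: {v: [u for u in vs if u != v] for v in vs}
adj = {**clique([0, 1, 2, 3]), **clique([4, 5, 6, 7])}
adj[3].append(4); adj[4].append(3)
labels = label_propagation(adj)
```

Contracting each label class into a single vertex yields the next-smaller problem, and the SVM trained on the coarse level warm-starts the finer one.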


2021 ◽  
Vol 21 (2) ◽  
pp. 3-9
Author(s):  
Nguyen Long Giang ◽  
Demetrovics Janos ◽  
Vu Duc Thi ◽  
Phan Dang Khoa

Abstract Reducts of decision systems have been attracting the interest of many researchers in data mining and machine learning for more than two decades. So far, many algorithms for finding reducts of decision systems by rough set theory have been proposed. However, most of the proposed algorithms are heuristics that find one reduct with the best classification quality; a complete study of the properties of reducts of decision systems is still lacking. In this paper, we discover equivalence properties of reducts of consistent decision systems related to Sperner systems. As a result, the study of the family of reducts of a consistent decision system becomes the study of Sperner systems.
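In rough set terms, a subset R of the condition attributes is a reduct of a consistent decision table when R still determines the decision (objects equal on R share a decision) and no proper subset of R does; the family of all such minimal sets is what forms a Sperner system (no reduct contains another). A brute-force check of this definition, as an illustrative sketch only (function names and the table layout are assumptions):

```python
from itertools import combinations

def discerns(rows, attrs):
    """True if objects with equal values on attrs always share a decision."""
    seen = {}
    for cond, dec in rows:
        key = tuple(cond[a] for a in attrs)
        if seen.setdefault(key, dec) != dec:
            return False
    return True

def is_reduct(rows, attrs):
    """attrs is a reduct: it discerns, and no proper subset discerns."""
    return discerns(rows, attrs) and not any(
        discerns(rows, sub)
        for r in range(len(attrs))
        for sub in combinations(attrs, r))

# toy consistent decision table: (condition-attribute dict, decision)
rows = [({"a": 0, "b": 0}, "no"),
        ({"a": 0, "b": 1}, "no"),
        ({"a": 1, "b": 0}, "yes"),
        ({"a": 1, "b": 1}, "yes")]
```

Here {"a"} is a reduct while the full set {"a", "b"} is not, since it properly contains one; by the same minimality, no reduct can contain another, which is exactly the Sperner property.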


Author(s):  
Gleb Danilov ◽  
Timur Ishankulov ◽  
Konstantin Kotik ◽  
Yuriy Orlov ◽  
Mikhail Shifrin ◽  
...  

Automated text classification is a natural language processing (NLP) technology that could significantly facilitate scientific literature selection. A topical dataset of 630 article abstracts was obtained from the PubMed database. We proposed 27 parametrized variants of the PubMedBERT model and 4 ensemble models to solve a binary classification task on that dataset. Three hundred tests with resampling were performed for each classification approach. The best PubMedBERT model demonstrated an F1-score of 0.857, while the best ensemble model reached an F1-score of 0.853. We conclude that the classification quality of short scientific texts can be improved using the latest state-of-the-art approaches.


Algorithms ◽  
2021 ◽  
Vol 14 (5) ◽  
pp. 127
Author(s):  
Vladimir Stanovov ◽  
Shakhnaz Akhmedova ◽  
Eugene Semenkin

In this paper, a novel search operation is proposed for the neuroevolution of augmented topologies, namely the difference-based mutation. This operator uses the differences between individuals in the population to search more efficiently for the optimal weights and structure of the model. The difference is determined according to the innovation numbers assigned to each node and connection, allowing the changes to be tracked. The implemented neuroevolution algorithm allows backward connections and loops in the topology, and uses a set of mutation operators, including connection merging and deletion. The algorithm is tested on a set of classification problems and on the rotary inverted pendulum control problem; the basic approach is compared against the modified versions, and the sensitivity to parameter values is examined. The experimental results show that the newly developed operator delivers significant improvements in classification quality in several cases and allows finding better control algorithms.
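The exact operator is defined in the paper; a plausible sketch of the core idea is a differential-evolution-style update applied gene by gene to connection weights that are matched across genomes by their innovation numbers (the function name, genome encoding, and scale factor F below are illustrative assumptions, not the authors' definitions).

```python
def difference_mutation(target, donor_a, donor_b, F=0.5):
    """Genomes are dicts: innovation number -> connection weight.
    For every connection that all three genomes share, shift the
    target's weight by F times the difference between the two donors,
    mirroring the DE mutation v = x + F * (a - b) per matched gene.
    Connections missing from a donor are left untouched."""
    child = dict(target)
    for innov in target:
        if innov in donor_a and innov in donor_b:
            child[innov] = target[innov] + F * (donor_a[innov] - donor_b[innov])
    return child

g1 = {1: 0.2, 2: -0.5, 7: 1.0}   # innovation numbers 1, 2, 7
g2 = {1: 0.4, 2: -0.1}           # lacks connection 7
g3 = {1: 0.0, 2: -0.3, 7: 0.8}
child = difference_mutation(g1, g2, g3)
```

Matching by innovation number is what lets the difference be taken between structurally different networks, which is the point the abstract makes about tracking changes.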


Symmetry ◽  
2021 ◽  
Vol 13 (4) ◽  
pp. 615
Author(s):  
Liliya A. Demidova

The paper considers a solution to the problem of developing two-stage hybrid SVM-kNN classifiers, with the aim of increasing data classification quality by refining classification decisions near the class boundary defined by the SVM classifier. In the first stage, an SVM classifier with default parameter values is developed; the training dataset is designed on the basis of the initial dataset, and either a binary SVM algorithm or a one-class SVM algorithm is used. Based on the results of training the SVM classifier, two variants of the training dataset are formed for the development of the kNN classifier: one that uses all objects from the original training dataset located inside the strip dividing the classes, and one that uses only those objects from the initial training dataset located inside the area containing all misclassified objects from the class-dividing strip. In the second stage, the kNN classifier is developed using the above-mentioned new training dataset, with its parameter values determined during training so as to maximize classification quality. The classification quality of the two-stage hybrid SVM-kNN classifier was assessed using various indicators on the test dataset. If the kNN classifier improves the quality of classification near the class boundary defined by the SVM classifier, the two-stage hybrid SVM-kNN classifier is recommended for further use. The experimental results obtained on various datasets confirm the feasibility of using two-stage hybrid SVM-kNN classifiers for data classification.
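The two-stage idea can be illustrated with a toy one-dimensional example, using a fixed linear decision score with a margin strip as a stand-in for the trained SVM (sketch only; the paper trains a real SVM, forms the kNN training set from the strip as described above, and tunes k on the training data, whereas every name and constant below is an assumption).

```python
def hybrid_predict(x, train, boundary=0.0, margin=1.0, k=3):
    """Stage 1: a linear decision score (stand-in for the SVM) decides
    confidently outside the margin strip. Stage 2: points whose score
    falls inside the strip are re-classified by kNN over the training
    points lying in that same strip."""
    score = x - boundary
    if abs(score) >= margin:                  # confident SVM region
        return 1 if score > 0 else 0
    strip = [(abs(x - xi), yi) for xi, yi in train
             if abs(xi - boundary) < margin]  # stage-2 training data
    strip.sort(key=lambda d: d[0])
    votes = [yi for _, yi in strip[:k]]
    return 1 if sum(votes) * 2 > len(votes) else 0

# labels overlap inside the strip around the boundary at 0
train = [(-2.0, 0), (-0.8, 0), (-0.3, 1), (0.2, 1), (0.6, 1), (2.0, 1)]
```

For x = -0.2 the stage-1 score alone would predict class 0, but the strip neighbours vote 1, showing how the second stage refines decisions near the boundary.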


Author(s):  
E.E. Smirnov ◽  
A.A. Pozdniakov ◽  
M.S. Parshin

Currently, one of the topical issues in the operation of radar stations for various purposes is the tracking of complex targets, namely the case of crossing trajectories of several observed objects. When object trajectories intersect, uncertainty arises among the numerous returns caused by reflections from many reflecting surfaces or regions of space, which leads to entanglement of trajectories: the detected object is tracked by the radar along the trajectory of another object, and the second object may likewise be tracked along the trajectory of the first. This case is especially difficult, as it leads to tracking disruptions and the loss or omission of objects. At the classification stage, an object may then be assigned to a class to which it does not belong. Achieving reliable classification of objects therefore requires methods for assessing its performance. To this end, a scientific and methodological apparatus for checking the quality of radar operation was developed (in the first stage only trajectory information is analyzed; in the second stage, trajectory and polarization information are analyzed jointly), implemented as a simulation model in the MathCad 15.0 software environment. The simulation results show that as the number of tracked objects increases and the distance between them decreases, the value of the classification quality indicator decreases. This indicates a contradiction between existing processing methods and classification quality requirements, and points to the need for new methods that provide a given quality indicator.
A possible tool for resolving this contradiction is the use of polarization information to ensure the required probability of correct classification of objects, namely in identifying returns and extrapolating trajectories at the tracking stage. To solve the problem, polarization scattering matrices served as the initial data for the object classification model; from them, polarization parameters were calculated and object features were formed. The simulation results show that using polarization information when tracking a large number of objects (from 10 trajectories, including their intersection) provides the required level of classification quality for existing algorithms. The increase in the probability of correct classification ranged from 8% (at the edges of the radar viewing area) to 12% (in the center of the directional pattern).


2021 ◽  
Vol 111 (07-08) ◽  
pp. 475-480
Author(s):  
Tobias Schlagenhauf ◽  
Nicholas Ammann ◽  
Jürgen Fleischer

Industrial condition monitoring using machine learning (ML) techniques is becoming increasingly important for manufacturers' competitiveness [1]. This paper presents a method for retraining ML models for preventive wear detection of ball screw drives in process (online) in response to environmental changes. Domain knowledge can thus be implemented gradually in the model, keeping the classification quality stable even for novel wear patterns.

