classification quality
Recently Published Documents


TOTAL DOCUMENTS

65
(FIVE YEARS 29)

H-INDEX

7
(FIVE YEARS 2)

Entropy ◽  
2021 ◽  
Vol 23 (12) ◽  
pp. 1682
Author(s):  
Wojciech Wieczorek ◽  
Jan Kozak ◽  
Łukasz Strąk ◽  
Arkadiusz Nowakowski

A new two-stage method for the construction of a decision tree is developed. The first stage is based on the definition of a minimum query set: the smallest set of attribute-value pairs for which any two objects can be distinguished. To obtain this set, an appropriate linear programming model is proposed. The queries from this set are the building blocks of the second stage, in which we try to find an optimal decision tree using a genetic algorithm. In a series of experiments, we show that for some databases our approach should be considered an alternative to classical methods (CART, C4.5) and other heuristic approaches in terms of classification quality.
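The minimum query set described in the abstract can be viewed as a covering problem: choose the fewest attribute-value queries so that every pair of objects is separated by at least one of them. The paper solves this with a linear programming model; the sketch below uses a greedy set-cover approximation instead, purely to illustrate the idea (the function name and data layout are assumptions, not the authors' formulation).

```python
from itertools import combinations

def greedy_query_set(objects):
    """Greedily pick attribute-value queries until every pair of objects
    is distinguished by at least one chosen query.
    objects: list of dicts mapping attribute -> value."""
    # A query (a, v) distinguishes objects x, y when exactly one of them
    # satisfies x[a] == v.
    pairs = set(combinations(range(len(objects)), 2))
    queries = {(a, v) for o in objects for a, v in o.items()}
    chosen = []
    while pairs:
        # pick the query separating the most still-undistinguished pairs
        best, sep = max(
            ((q, {p for p in pairs
                  if (objects[p[0]].get(q[0]) == q[1])
                     != (objects[p[1]].get(q[0]) == q[1])})
             for q in queries),
            key=lambda t: len(t[1]))
        if not sep:
            break  # identical objects cannot be distinguished
        chosen.append(best)
        pairs -= sep
    return chosen

data = [{"color": "red", "size": "S"},
        {"color": "red", "size": "L"},
        {"color": "blue", "size": "S"}]
qs = greedy_query_set(data)
```

Each chosen query then becomes a candidate split for the genetic search of the second stage; the greedy rule here gives a logarithmic approximation of the optimum rather than the exact LP solution.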


2021 ◽  
Author(s):  
Raúl Aceñero Eixarch ◽  
Raúl Díaz-Usechi Laplaza ◽  
Rafael Berlanga

In this paper, we propose a method for building alternative training datasets for lung nodule detection from plain chest X-ray images. Our aim is to improve the classification quality of a state-of-the-art CNN simply by selecting appropriate samples from existing datasets. The hypothesis of this research is that high-quality models need to learn by contrasting very clean images with those containing nodules, especially nodules that are difficult for non-expert clinicians to identify. Current chest X-ray datasets mostly include images in which more than one pathology exists and/or which contain devices such as catheters. This is because most samples come from elderly people, who are the usual subjects of X-ray examinations. In this paper, we evaluate several combinations of samples from existing datasets in the literature. Results show a great gain in performance for some of the evaluated combinations, confirming our hypothesis. The achieved performance of these models allows a considerable speed-up in the screening of patients by radiologists.


2021 ◽  
Vol 26 (1) ◽  
pp. 1-21
Author(s):  
Sebastian Schlag ◽  
Matthias Schmitt ◽  
Christian Schulz

The time complexity of support vector machines (SVMs) prohibits training on huge datasets with millions of data points. Recently, multilevel approaches to train SVMs have been developed to allow for time-efficient training on huge datasets. While regular SVMs perform the entire training in one—time-consuming—optimization step, multilevel SVMs first build a hierarchy of problems decreasing in size that resemble the original problem and then train an SVM model for each hierarchy level, benefiting from the solved models of previous levels. We present a faster multilevel support vector machine that uses a label propagation algorithm to construct the problem hierarchy. Extensive experiments indicate that our approach is up to orders of magnitude faster than the previous fastest algorithm while having comparable classification quality. For example, already one of our sequential solvers is on average a factor of 15 faster than the parallel ThunderSVM algorithm, while having similar classification quality.
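The label propagation step used to build the problem hierarchy can be sketched as follows: each node repeatedly adopts the most frequent label among its neighbours, so densely connected groups converge to a single label, and those groups become the vertices of the next, coarser training problem. This is an illustrative pure-Python version on adjacency lists; the paper's implementation, data structures, and tie-breaking rules will differ (the tie-break toward the larger label here is an assumption made only to keep the toy deterministic).

```python
from collections import Counter

def label_propagation(adj, rounds=10):
    """adj: node -> list of neighbours. Repeatedly assign each node the
    most frequent neighbour label (ties -> larger label). Dense groups
    converge to one shared label, i.e. one coarse-level vertex."""
    labels = {v: v for v in adj}  # start: every node is its own cluster
    for _ in range(rounds):
        changed = False
        for v in adj:
            counts = Counter(labels[u] for u in adj[v])
            new = max(counts, key=lambda l: (counts[l], l))
            if new != labels[v]:
                labels[v], changed = new, True
        if not changed:
            break
    return labels

# two 4-cliques joined by the single edge 3-4: two clusters expected
clique = lambda vs: {v: [u for u in vs if u != v] for v in vs}
adj = {**clique([0, 1, 2, 3]), **clique([4, 5, 6, 7])}
adj[3].append(4); adj[4].append(3)
labels = label_propagation(adj)
```

Contracting each label class into a single vertex yields the next-smaller problem, and the SVM trained on the coarse level warm-starts the finer one.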


2021 ◽  
Vol 21 (2) ◽  
pp. 3-9
Author(s):  
Nguyen Long Giang ◽  
Demetrovics Janos ◽  
Vu Duc Thi ◽  
Phan Dang Khoa

Abstract Reducts of decision systems have been attracting the interest of many researchers in data mining and machine learning for more than two decades. So far, many algorithms for finding reducts of decision systems by rough set theory have been proposed. However, most of the proposed algorithms are heuristics that find one reduct with the best classification quality; a complete study of the properties of reducts of decision systems is still lacking. In this paper, we discover equivalence properties of reducts of consistent decision systems related to Sperner systems. As a result, the study of the family of reducts of a consistent decision system becomes the study of Sperner systems.
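In rough set terms, a subset R of the condition attributes is a reduct of a consistent decision table when R still determines the decision (objects equal on R share a decision) and no proper subset of R does; the family of all such minimal sets is what forms a Sperner system (no reduct contains another). A brute-force check of this definition, as an illustrative sketch only (function names and the table layout are assumptions):

```python
from itertools import combinations

def discerns(rows, attrs):
    """True if objects with equal values on attrs always share a decision."""
    seen = {}
    for cond, dec in rows:
        key = tuple(cond[a] for a in attrs)
        if seen.setdefault(key, dec) != dec:
            return False
    return True

def is_reduct(rows, attrs):
    """attrs is a reduct: it discerns, and no proper subset discerns."""
    return discerns(rows, attrs) and not any(
        discerns(rows, sub)
        for r in range(len(attrs))
        for sub in combinations(attrs, r))

# toy consistent decision table: (condition-attribute dict, decision)
rows = [({"a": 0, "b": 0}, "no"),
        ({"a": 0, "b": 1}, "no"),
        ({"a": 1, "b": 0}, "yes"),
        ({"a": 1, "b": 1}, "yes")]
```

Here {"a"} is a reduct while the full set {"a", "b"} is not, since it properly contains one; by the same minimality, no reduct can contain another, which is exactly the Sperner property.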


Author(s):  
Gleb Danilov ◽  
Timur Ishankulov ◽  
Konstantin Kotik ◽  
Yuriy Orlov ◽  
Mikhail Shifrin ◽  
...  

Automated text classification is a natural language processing (NLP) technology that could significantly facilitate scientific literature selection. A topical dataset of 630 article abstracts was obtained from the PubMed database. We proposed 27 parametrized variants of the PubMedBERT model and 4 ensemble models to solve a binary classification task on that dataset. Three hundred tests with resampling were performed for each classification approach. The best PubMedBERT model demonstrated an F1-score of 0.857, while the best ensemble model reached an F1-score of 0.853. We conclude that the classification quality of short scientific texts can be improved using the latest state-of-the-art approaches.


Algorithms ◽  
2021 ◽  
Vol 14 (5) ◽  
pp. 127
Author(s):  
Vladimir Stanovov ◽  
Shakhnaz Akhmedova ◽  
Eugene Semenkin

In this paper, a novel search operation is proposed for the neuroevolution of augmented topologies, namely the difference-based mutation. This operator uses the differences between individuals in the population to search more efficiently for the optimal weights and structure of the model. The difference is determined according to the innovation numbers assigned to each node and connection, allowing the changes to be tracked. The implemented neuroevolution algorithm allows backward connections and loops in the topology, and uses a set of mutation operators, including connection merging and deletion. The algorithm is tested on a set of classification problems and on the rotary inverted pendulum control problem; the basic approach is compared against the modified versions, and the sensitivity to parameter values is examined. The experimental results show that the newly developed operator delivers significant improvements in classification quality in several cases and allows finding better control algorithms.
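The exact operator is defined in the paper; a plausible sketch of the core idea is a differential-evolution-style update applied gene by gene to connection weights that are matched across genomes by their innovation numbers (the function name, genome encoding, and scale factor F below are illustrative assumptions, not the authors' definitions).

```python
def difference_mutation(target, donor_a, donor_b, F=0.5):
    """Genomes are dicts: innovation number -> connection weight.
    For every connection that all three genomes share, shift the
    target's weight by F times the difference between the two donors,
    mirroring the DE mutation v = x + F * (a - b) per matched gene.
    Connections missing from a donor are left untouched."""
    child = dict(target)
    for innov in target:
        if innov in donor_a and innov in donor_b:
            child[innov] = target[innov] + F * (donor_a[innov] - donor_b[innov])
    return child

g1 = {1: 0.2, 2: -0.5, 7: 1.0}   # innovation numbers 1, 2, 7
g2 = {1: 0.4, 2: -0.1}           # lacks connection 7
g3 = {1: 0.0, 2: -0.3, 7: 0.8}
child = difference_mutation(g1, g2, g3)
```

Matching by innovation number is what lets the difference be taken between structurally different networks, which is the point the abstract makes about tracking changes.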


Symmetry ◽  
2021 ◽  
Vol 13 (4) ◽  
pp. 615
Author(s):  
Liliya A. Demidova

The paper considers a solution to the problem of developing two-stage hybrid SVM-kNN classifiers, with the aim of increasing data classification quality by refining classification decisions near the class boundary defined by the SVM classifier. In the first stage, an SVM classifier with default parameter values is developed; the training dataset is designed on the basis of the initial dataset, and either a binary SVM algorithm or a one-class SVM algorithm is used. Based on the results of training the SVM classifier, two variants of the training dataset are formed for the development of the kNN classifier: one that uses all objects from the original training dataset located inside the strip dividing the classes, and one that uses only those objects from the initial training dataset located inside the area containing all misclassified objects from the class-dividing strip. In the second stage, the kNN classifier is developed using the above-mentioned new training dataset, with its parameter values determined during training so as to maximize classification quality. The classification quality of the two-stage hybrid SVM-kNN classifier was assessed using various indicators on the test dataset. If the kNN classifier improves the quality of classification near the class boundary defined by the SVM classifier, the two-stage hybrid SVM-kNN classifier is recommended for further use. The experimental results obtained on various datasets confirm the feasibility of using two-stage hybrid SVM-kNN classifiers for data classification.
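The two-stage idea can be illustrated with a toy one-dimensional example, using a fixed linear decision score with a margin strip as a stand-in for the trained SVM (sketch only; the paper trains a real SVM, forms the kNN training set from the strip as described above, and tunes k on the training data, whereas every name and constant below is an assumption).

```python
def hybrid_predict(x, train, boundary=0.0, margin=1.0, k=3):
    """Stage 1: a linear decision score (stand-in for the SVM) decides
    confidently outside the margin strip. Stage 2: points whose score
    falls inside the strip are re-classified by kNN over the training
    points lying in that same strip."""
    score = x - boundary
    if abs(score) >= margin:                  # confident SVM region
        return 1 if score > 0 else 0
    strip = [(abs(x - xi), yi) for xi, yi in train
             if abs(xi - boundary) < margin]  # stage-2 training data
    strip.sort(key=lambda d: d[0])
    votes = [yi for _, yi in strip[:k]]
    return 1 if sum(votes) * 2 > len(votes) else 0

# labels overlap inside the strip around the boundary at 0
train = [(-2.0, 0), (-0.8, 0), (-0.3, 1), (0.2, 1), (0.6, 1), (2.0, 1)]
```

For x = -0.2 the stage-1 score alone would predict class 0, but the strip neighbours vote 1, showing how the second stage refines decisions near the boundary.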


Author(s):  
E.E. Smirnov ◽  
A.A. Pozdniakov ◽  
M.S. Parshin

Currently, one of the topical issues in the operation of radar stations for various purposes is the tracking of complex targets, namely the case of crossing trajectories of several observed objects. When object trajectories intersect, uncertainty arises among the numerous returns caused by reflections from many reflecting surfaces or regions of space, which leads to entanglement of trajectories: the detected object is tracked by the radar along the trajectory of another object, and the second object may likewise be tracked along the trajectory of the first. This case is especially difficult, as it leads to tracking disruptions and the loss or omission of objects. At the classification stage, an object may then be assigned to a class to which it does not belong. Achieving reliable classification of objects therefore requires methods for assessing its performance. To this end, a scientific and methodological apparatus for checking the quality of radar operation was developed (in the first stage only trajectory information is analyzed; in the second stage, trajectory and polarization information are analyzed jointly), implemented as a simulation model in the MathCad 15.0 software environment. The simulation results show that as the number of tracked objects increases and the distance between them decreases, the value of the classification quality indicator decreases. This indicates a contradiction between existing processing methods and classification quality requirements, and points to the need for new methods that provide a given quality indicator.
A possible tool for resolving this contradiction is the use of polarization information to ensure the required probability of correct classification of objects, namely in identifying returns and extrapolating trajectories at the tracking stage. To solve the problem, polarization scattering matrices served as the initial data for the object classification model; from them, polarization parameters were calculated and object features were formed. The simulation results show that using polarization information when tracking a large number of objects (from 10 trajectories, including their intersection) provides the required level of classification quality for existing algorithms. The increase in the probability of correct classification ranged from 8% (at the edges of the radar viewing area) to 12% (in the center of the directional pattern).


2021 ◽  
Vol 111 (07-08) ◽  
pp. 475-480
Author(s):  
Tobias Schlagenhauf ◽  
Nicholas Ammann ◽  
Jürgen Fleischer

Industrial condition monitoring using machine learning (ML) techniques is becoming increasingly important for manufacturers' competitiveness [1]. This paper presents a method for retraining ML models for preventive wear detection of ball screw drives in process (online) in response to environmental changes. Domain knowledge can thus be implemented gradually in the model, keeping the classification quality stable even for novel wear patterns.

