Development of a method for identification of the state of computer systems based on bagging classifiers

The subject of the research is methods and means of identifying the state of a computer system. The purpose of the article is to improve the quality of computer system state identification by developing a method based on ensemble classifiers. Tasks: to investigate methods for constructing bagging classifiers based on decision trees, to configure them, and to develop a method for identifying the state of a computer system. Methods used: artificial intelligence methods, machine learning, and ensemble methods. The following results were obtained. Bagging classifiers based on the Pasting Ensemble, Bootstrap Ensemble, Random Subspace Ensemble, Random Patches Ensemble, and Random Forest meta-algorithms were investigated, and their accuracy in identifying the state of a computer system was assessed. The tuning parameters of individual decision trees were studied and their optimal values were found, including the maximum number of features used in constructing a tree, the minimum number of branches when building a tree, the minimum number of leaves, and the maximum tree depth. The optimal number of trees in the ensemble was determined. A method for identifying the state of a computer system is proposed that differs from known methods in its choice of classification meta-algorithm and its selection of optimal tuning parameters. The accuracy of the developed method was assessed. The method was implemented in software and evaluated on the problem of identifying abnormal states of computer system functioning. Conclusions: the scientific novelty of the results lies in the development of a method for identifying the state of a computer system by choosing a classification meta-algorithm and determining the optimal parameters for its configuration.
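
As a minimal sketch of how the four sampling meta-algorithms named above differ, the following shows which rows and features a single base tree would see under each scheme. The function name `draw_view`, the scheme labels, and the choice to sample rows without replacement in the random-patches case are illustrative assumptions, not details taken from the article; the definitions follow the common textbook usage of these terms.

```python
import random

# Scheme -> (sample rows with replacement?, subsample rows?, subsample features?)
# Assumption: random patches is shown without replacement; it can also bootstrap.
SCHEMES = {
    "pasting": (False, True, False),
    "bootstrap": (True, True, False),
    "random_subspace": (False, False, True),
    "random_patches": (False, True, True),
}


def draw_view(n_samples, n_features, scheme, rng, max_samples, max_features):
    """Return (row_indices, feature_indices) given to one base tree."""
    replace, sub_rows, sub_feats = SCHEMES[scheme]

    if not sub_rows:
        rows = list(range(n_samples))  # every row is used
    elif replace:
        # Bootstrap: the same row may be drawn more than once.
        rows = [rng.randrange(n_samples) for _ in range(max_samples)]
    else:
        # Pasting / patches: distinct rows only.
        rows = rng.sample(range(n_samples), max_samples)

    if sub_feats:
        cols = sorted(rng.sample(range(n_features), max_features))
    else:
        cols = list(range(n_features))  # every feature is used
    return rows, cols


if __name__ == "__main__":
    rng = random.Random(0)
    for name in SCHEMES:
        print(name, draw_view(10, 4, name, rng, max_samples=7, max_features=2))
```

A random forest differs from these schemes in that it combines bootstrap row sampling with a fresh random feature subset at every split rather than one subset per tree.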

A new correlation-based approach for ensemble selection in random forests

International Journal of Intelligent Computing and Cybernetics ◽

10.1108/ijicc-10-2020-0147 ◽

2021 ◽

Vol ahead-of-print (ahead-of-print) ◽

Author(s):

Mostafa El Habib Daho ◽

Nesma Settouti ◽

Mohammed El Amine Bechar ◽

Amina Boublenza ◽

Mohammed Amine Chikh

Keyword(s):

State Of The Art ◽

Ensemble Methods ◽

The State ◽

Ensemble Classifiers ◽

Content Type ◽

Pruning Method ◽

Ensemble Selection ◽

Small Ensemble ◽

Short Time ◽

Pruning Techniques

Purpose: Ensemble methods have been widely used in the field of pattern recognition due to the difficulty of finding a single classifier that performs well on a wide variety of problems. Despite the effectiveness of these techniques, studies have shown that ensemble methods generate a large number of hypotheses that, in most cases, contain redundant classifiers. Several state-of-the-art works attempt to reduce the set of hypotheses without affecting performance. Design/methodology/approach: In this work, the authors propose a pruning method that takes into consideration the correlation between each classifier and the classes, and between each classifier and the rest of the set. The authors used the random forest algorithm as the tree-based ensemble classifier, and the pruning was performed by a technique inspired by the CFS (correlation feature selection) algorithm. Findings: The proposed method, CES (correlation-based Ensemble Selection), was evaluated on ten datasets from the UCI machine learning repository, and its performance was compared to six ensemble pruning techniques. The results showed that the proposed pruning method selects a small ensemble in less time while improving classification rates compared to the state-of-the-art methods. Originality/value: CES is a new ordering-based method that uses the CFS algorithm. CES selects, in a short time, a small sub-ensemble that outperforms the results obtained from the whole forest and from the other state-of-the-art techniques used in this study.
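
For orientation, the standard CFS merit heuristic that the pruning is said to be inspired by can be sketched as below. How CES actually adapts it is not stated in the abstract; treating ensemble members as the "features" and the name `cfs_merit` are assumptions made only for illustration.

```python
import math


def cfs_merit(mean_member_class_corr, mean_member_member_corr, k):
    """Standard CFS merit of a subset of k members.

    merit = k * r_cf / sqrt(k + k * (k - 1) * r_ff)
    r_cf: mean correlation between each member and the class,
    r_ff: mean pairwise correlation among the members.
    """
    if k < 1:
        raise ValueError("subset must contain at least one member")
    denom = math.sqrt(k + k * (k - 1) * mean_member_member_corr)
    return k * mean_member_class_corr / denom


if __name__ == "__main__":
    # Hypothetical values: uncorrelated members score higher than redundant ones.
    print(cfs_merit(0.5, 0.0, 4))
    print(cfs_merit(0.5, 1.0, 4))
```

The merit rewards subsets whose members agree with the class labels but not with each other, which is the intuition behind removing redundant trees.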

A New Optimal Ensemble Algorithm Based on SVDD Sampling for Imbalanced Data Classification

International Journal of Pattern Recognition and Artificial Intelligence ◽

10.1142/s0218001421500208 ◽

2020 ◽

pp. 2150020

Author(s):

Jamshid Pirgazi ◽

Abbas Pirmohammadi ◽

Reza Shams

Keyword(s):

Imbalanced Data ◽

Ensemble Methods ◽

Data Classification ◽

Admissible Solution ◽

Optimal Number ◽

Support Vector ◽

Ensemble Classifiers ◽

Algorithm Optimization ◽

Optimal Ensemble ◽

Imbalanced Data Classification

Imbalanced data classification is currently a hot topic in data mining, and several valuable studies have recently been conducted to overcome certain difficulties in the field. Approaches based on ensemble classifiers, in particular, have achieved reasonable results. Despite the success of these works, many issues remain unsolved, such as disregarding the importance of samples in balancing, determining the proper number of classifiers, and optimizing the weights of base classifiers in the voting stage of ensemble methods. This paper intends to find an admissible solution to these challenges. The suggested solution applies the support vector data descriptor (SVDD) to sample both the minority and majority classes. After the optimal number of base classifiers is determined, the selected samples are used to adjust the base classifiers. Finally, genetic algorithm optimization is used to find the optimum weight of each base classifier in the voting stage. The proposed method is compared with some existing algorithms, and the experimental results confirm its effectiveness.

Landslide Susceptibility Mapping Using Rotation Forest Ensemble Technique with Different Decision Trees in the Three Gorges Reservoir Area, China

Remote Sensing ◽

10.3390/rs13020238 ◽

2021 ◽

Vol 13 (2) ◽

pp. 238

Author(s):

Zhice Fang ◽

Yi Wang ◽

Gonghao Duan ◽

Ling Peng

Keyword(s):

Decision Trees ◽

Landslide Susceptibility ◽

Ensemble Methods ◽

Landslide Susceptibility Mapping ◽

Three Gorges Reservoir Area ◽

Ratio Method ◽

Susceptibility Map ◽

Rotation Forest ◽

Predictive Values ◽

Ensemble Technique

This study presents a new ensemble framework to predict landslide susceptibility by integrating decision trees (DTs) with the rotation forest (RF) ensemble technique. The proposed framework consists of four main steps. First, training and validation sets are randomly selected according to historical landslide locations. Then, landslide conditioning factors are selected and screened by the gain ratio method. Next, several training subsets are produced from the training set, and a series of trained DTs is obtained by using a DT as the base classifier coupled with the different training subsets. Finally, the resulting landslide susceptibility map is produced by combining all the DT classification results using the RF ensemble technique. Experimental results demonstrate that the performance of all the DTs can be effectively improved by integrating them with the RF ensemble technique. Specifically, the proposed ensemble methods achieved predictive values 0.012–0.121 higher than the DTs in terms of area under the curve (AUC). Furthermore, the proposed ensemble methods outperform the most popular ensemble methods by 0.005–0.083 in terms of AUC. The proposed ensemble framework is therefore effective in further improving the spatial prediction of landslides.

Evolutionary Algorithm for Improving Decision Tree with Global Discretization in Manufacturing

Sensors ◽

10.3390/s21082849 ◽

2021 ◽

Vol 21 (8) ◽

pp. 2849

Author(s):

Sungbum Jun

Keyword(s):

Decision Tree ◽

Evolutionary Algorithm ◽

Decision Trees ◽

Manufacturing Systems ◽

Ensemble Methods ◽

Machine Learning Techniques ◽

Learning Techniques ◽

Industrial Internet ◽

Tree Models ◽

Real World Datasets

Due to recent advances in the industrial Internet of Things (IoT) in manufacturing, the vast amount of data from sensors has triggered the need to leverage such big data for fault detection. In particular, interpretable machine learning techniques, such as tree-based algorithms, have drawn attention to the need to implement reliable manufacturing systems and identify the root causes of faults. However, despite the high interpretability of decision trees, tree-based models make a trade-off between accuracy and interpretability. In order to improve the tree’s performance while maintaining its interpretability, an evolutionary algorithm for discretization of multiple attributes, called Decision tree Improved by Multiple sPLits with Evolutionary algorithm for Discretization (DIMPLED), is proposed. The experimental results with two real-world datasets from sensors showed that the decision tree improved by DIMPLED outperformed the single-decision-tree models (C4.5 and CART) that are widely used in practice, and it proved competitive with the ensemble methods, which use multiple decision trees. Even though the ensemble methods could produce slightly better performance, the proposed DIMPLED has a more interpretable structure while maintaining an appropriate performance level.

A Model of Secure Functioning of Computer Systems

PROGRAMMNAYA INGENERIA ◽

10.17587/prin.12.150-156 ◽

2021 ◽

Vol 12 (3) ◽

pp. 150-156

Author(s):

A. V. Galatenko ◽

V. A. Kuzovikhina

Keyword(s):

Lower Bound ◽

Computer System ◽

Finite Automaton ◽

Nonnegative Integer ◽

The Other ◽

Shannon Function ◽

Security Breaches ◽

System Functioning ◽

The Cost ◽

The Given

We propose an automata model of computer system security. A system is represented by a finite automaton with states partitioned into two subsets: "secure" and "insecure". System functioning is secure if the number of consecutive insecure states is not greater than some nonnegative integer k. This definition allows one to formally reflect responsiveness to security breaches. The set of all input sequences that preserve security for the given value of k is referred to as a k-secure language. We prove that if a language is k-secure for some natural k and automaton V, then it is also k'-secure for any 0 < k' < k and some automaton V' = V'(k'). Reduction of the value of k is performed at the cost of an increase in the number of states. On the other hand, for any non-negative integer k there exists a k-secure language that is not k"-secure for any natural k" > k. The problem of reconstruction of a k-secure language using a conditional experiment is split into two subcases. If the cardinality of the input alphabet is bounded by some constant, then the order of the Shannon function of experiment complexity is the same for all k; otherwise there emerges a lower bound of the order nk.
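
The security condition described above can be illustrated with a small sketch: run a deterministic finite automaton on an input word and check that no run of consecutive insecure states is longer than k. The toy automaton, the function names, and the decision to count the start state as a visited state are assumptions for illustration only, not the authors' formal construction.

```python
def run_automaton(transitions, start, inputs):
    """Return the list of states visited, including the start state."""
    state = start
    visited = [state]
    for symbol in inputs:
        state = transitions[(state, symbol)]
        visited.append(state)
    return visited


def longest_insecure_run(states, insecure):
    """Length of the longest block of consecutive insecure states."""
    longest = current = 0
    for state in states:
        current = current + 1 if state in insecure else 0
        longest = max(longest, current)
    return longest


def preserves_k_security(transitions, start, insecure, inputs, k):
    """True if the input word never causes more than k insecure states in a row."""
    visited = run_automaton(transitions, start, inputs)
    return longest_insecure_run(visited, insecure) <= k


if __name__ == "__main__":
    # Hypothetical two-state system: "a" is an attack, "r" is a repair.
    transitions = {
        ("ok", "a"): "bad", ("ok", "r"): "ok",
        ("bad", "a"): "bad", ("bad", "r"): "ok",
    }
    print(preserves_k_security(transitions, "ok", {"bad"}, "aar", k=1))
    print(preserves_k_security(transitions, "ok", {"bad"}, "aar", k=2))
```

In this reading, the k-secure language of the automaton is the set of all words for which `preserves_k_security` holds.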

DIAGNOSIS OF MULTICLASS TACHYCARDIA BEATS USING RECURRENCE QUANTIFICATION ANALYSIS AND ENSEMBLE CLASSIFIERS

Journal of Mechanics in Medicine and Biology ◽

10.1142/s0219519416400054 ◽

2016 ◽

Vol 16 (01) ◽

pp. 1640005 ◽

Cited By ~ 30

Author(s):

USHA DESAI ◽

ROSHAN JOY MARTIS ◽

U. RAJENDRA ACHARYA ◽

C. GURUDAS NAYAK ◽

G. SESHIKALA ◽

...

Keyword(s):

Normal Sinus Rhythm ◽

Ensemble Methods ◽

Kappa Statistic ◽

Automatic Monitoring ◽

Recurrence Quantification Analysis ◽

Ecg Signal ◽

Ensemble Classifiers ◽

Clinical Tool ◽

Recurrence Quantification ◽

Quantification Analysis

Atrial Fibrillation (A-Fib), Atrial Flutter (AFL) and Ventricular Fibrillation (V-Fib) are fatal cardiac abnormalities that commonly affect people of advanced age and indicate a life-threatening condition. To detect these abnormal rhythms, the Electrocardiogram (ECG) signal is most commonly used as a significant clinical tool. Concealed non-linearities in the ECG signal can be clearly unraveled using the Recurrence Quantification Analysis (RQA) technique. In this paper, RQA features are applied for classifying four classes of ECG beats, namely Normal Sinus Rhythm (NSR), A-Fib, AFL and V-Fib, using ensemble classifiers. The clinically significant ([Formula: see text]) features are ranked and fed independently to three classifiers, viz. Decision Tree (DT), Random Forest (RAF) and Rotation Forest (ROF) ensemble methods, to select the best classifier. Training and testing on the feature set are carried out using a 10-fold cross-validation strategy. The RQA coefficients using ROF provided an overall accuracy of 98.37%, against 96.29% and 94.14% for RAF and DT, respectively. The results confirm the superiority of the ROF ensemble classifier in the diagnosis of A-Fib, AFL and V-Fib. The precision of the four classes is measured using class-specific accuracy (%), and the reliability of the performance is assessed using Cohen’s kappa statistic ([Formula: see text]). The developed approach can be used in therapeutic devices and can help physicians in the automatic monitoring of fatal tachycardia rhythms.
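
As a brief illustration of the reliability measure mentioned above, Cohen's kappa can be computed from a confusion matrix as (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e the agreement expected by chance. The function name and the example counts are hypothetical and are not taken from the paper.

```python
def cohen_kappa(confusion):
    """Cohen's kappa for a square confusion matrix (rows: true, columns: predicted)."""
    size = len(confusion)
    total = sum(sum(row) for row in confusion)
    # Observed agreement: fraction of samples on the diagonal.
    observed = sum(confusion[i][i] for i in range(size)) / total
    # Chance agreement: sum over classes of (row total * column total) / total^2.
    expected = sum(
        sum(confusion[i]) * sum(row[i] for row in confusion)
        for i in range(size)
    ) / (total * total)
    # Note: undefined (division by zero) if every sample falls in a single class.
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    # Hypothetical two-class counts, for illustration only.
    print(cohen_kappa([[45, 5], [5, 45]]))
```

Unlike raw accuracy, kappa discounts the agreement a classifier would reach by guessing according to the class frequencies.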

Classification of Human Daily Activities Using Ensemble Methods Based on Smartphone Inertial Sensors

Sensors ◽

10.3390/s18124132 ◽

2018 ◽

Vol 18 (12) ◽

pp. 4132 ◽

Cited By ~ 10

Author(s):

Ku Ku Abd. Rahim ◽

I. Elamvazuthi ◽

Lila Izhar ◽

Genci Capi

Keyword(s):

Activity Recognition ◽

Inertial Sensors ◽

Wearable Sensors ◽

Ensemble Methods ◽

Daily Activities ◽

Support Vector ◽

Random Subspace ◽

Ensemble Classifiers ◽

Accuracy Rate

Increasing interest in analyzing human gait using various wearable sensors, which is known as Human Activity Recognition (HAR), can be found in recent research. Sensors such as accelerometers and gyroscopes are widely used in HAR. Recently, high interest has been shown in the use of wearable sensors in numerous applications such as rehabilitation, computer games, animation, filmmaking, and biomechanics. In this paper, classification of human daily activities using Ensemble Methods based on data acquired from smartphone inertial sensors involving about 30 subjects with six different activities is discussed. The six daily activities are walking, walking upstairs, walking downstairs, sitting, standing and lying. It involved three stages of activity recognition; namely, data signal processing (filtering and segmentation), feature extraction and classification. Five types of ensemble classifiers utilized are Bagging, Adaboost, Rotation forest, Ensembles of nested dichotomies (END) and Random subspace. These ensemble classifiers employed Support vector machine (SVM) and Random forest (RF) as the base learners of the ensemble classifiers. The data classification is evaluated with the holdout and 10-fold cross-validation evaluation methods. The performance of each human daily activity was measured in terms of precision, recall, F-measure, and receiver operating characteristic (ROC) curve. In addition, the performance is also measured based on the comparison of overall accuracy rate of classification between different ensemble classifiers and base learners. It was observed that overall, SVM produced better accuracy rate with 99.22% compared to RF with 97.91% based on a random subspace ensemble classifier.

Decision Trees and Ensemble Methods

Data Science and Machine Learning ◽

10.1201/9780367816971-8 ◽

2019 ◽

pp. 287-322

Author(s):

Dirk P. Kroese ◽

Zdravko I. Botev ◽

Thomas Taimre ◽

Radislav Vaisman

Keyword(s):

Decision Trees ◽

Ensemble Methods

A comparison of the bagging and the boosting methods using the decision trees classifiers

Computer Science and Information Systems ◽

10.2298/csis0602057m ◽

2006 ◽

Vol 3 (2) ◽

pp. 57-72 ◽

Cited By ~ 9

Author(s):

Kristina Machova ◽

Miroslav Puszta ◽

Frantisek Barcak ◽

Peter Bednar

Keyword(s):

Decision Trees ◽

Classification Algorithm ◽

Classification Algorithms ◽

Performance Tests ◽

Binary Decision ◽

Internet Portal ◽

Minimum Number ◽

Boosting Algorithms ◽

Binary Decision Trees ◽

Tv Broadcasting

In this paper we present an improvement in the precision of classification algorithm results. Two different approaches are known: bagging and boosting. This paper describes a set of experiments with bagging and boosting methods. Our use of these methods is aimed at classification algorithms that generate decision trees. Results of performance tests focused on the use of the bagging and boosting methods in connection with binary decision trees are presented. The minimum number of decision trees that enables an improvement of the classification performed by the bagging and boosting methods was found. The tests were carried out using the Reuters-21578 collection of documents as well as documents from an Internet portal of the TV broadcasting company Markíza. A comparison of our results from testing the bagging and boosting algorithms is presented.

Optimalisasi Kapasitas Dan Peningkatan Efisiensi Biaya Pengelolaan Kelas Karyawan [Capacity Optimization and Cost-Efficiency Improvement in Employee Class Management]

INOVATOR ◽

10.32832/inovator.v7i1.1457 ◽

2018 ◽

Vol 7 (1) ◽

pp. 1

Author(s):

Immas Nurhayati ◽

Titing Suharti

Keyword(s):

Cost Efficiency ◽

Research Method ◽

Optimal Number ◽

Capacity Optimization ◽

Point Analysis ◽

Class Management ◽

Minimum Number ◽

Break Even Point

This research analyzes the optimization of employee class management in terms of both capacity and cost efficiency. Capacity optimization determines the minimum number of students needed to cover all class costs. The research method is break-even point analysis. The results of the study show that the optimal class capacity is 21 (twenty-one) students. Taking into account efficiency and fairness for all, excess teaching fees per SKS (semester credit unit) for employee classes vary from Rp. 12,500 to Rp. 50,000 with the presence of Rp. 50,000.
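
The break-even calculation behind such a capacity figure can be sketched as fixed cost divided by the contribution margin per student, rounded up. The function name and all monetary values below are hypothetical; the study's actual cost data are not given in the abstract.

```python
import math


def break_even_students(fixed_cost, fee_per_student, variable_cost_per_student):
    """Smallest whole number of students whose fees cover all class costs."""
    margin = fee_per_student - variable_cost_per_student
    if margin <= 0:
        raise ValueError("fee must exceed variable cost per student")
    return math.ceil(fixed_cost / margin)


if __name__ == "__main__":
    # Hypothetical figures in rupiah, for illustration only.
    print(break_even_students(9_000_000, 650_000, 250_000))
```

Rounding up matters because enrolment is a whole number: a class one student short of the threshold still runs at a loss.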
