Sieve: An Ensemble Algorithm Using Global Consensus for Binary Classification

In the field of machine learning, an ensemble approach is often used as an effective means of improving on the accuracy of multiple weak base classifiers. A concern associated with these ensemble algorithms is that they can suffer from the Curse of Conflict, where one classifier’s true prediction is negated by another classifier’s false prediction during the consensus period. Another concern is that the ensemble technique cannot effectively mitigate the problem of Imbalanced Classification: an ensemble classifier usually shows a bias of similar magnitude, toward the same class, as its imbalanced base classifiers. We propose an improved ensemble algorithm called “Sieve” that overcomes these shortcomings by establishing the novel concept of Global Consensus. The proposed Sieve ensemble approach was benchmarked against ensemble classifiers trained using different ensemble algorithms with the same base classifiers. The results demonstrate that better accuracy and stability were achieved.

Experimental Study and Comparison of Imbalance Ensemble Classifiers with Dynamic Selection Strategy

Entropy ◽

10.3390/e23070822 ◽

2021 ◽

Vol 23 (7) ◽

pp. 822

Author(s):

Dongxue Zhao ◽

Xin Wang ◽

Yashuang Mu ◽

Lidong Wang

Keyword(s):

Classification Performance ◽

Ensemble Classification ◽

Selection Strategy ◽

Ensemble Classifiers ◽

Imbalanced Datasets ◽

Dynamic Selection ◽

Imbalanced Classification ◽

Ensemble Techniques ◽

Imbalance Learning ◽

Ensemble Algorithms

Imbalance ensemble classification is one of the most essential and practical strategies for improving decision performance in data analysis. A growing body of literature on ensemble techniques for imbalance learning has appeared in recent years, and various extensions of imbalanced classification methods have been established from different points of view. The present study reviews the state-of-the-art ensemble classification algorithms for dealing with imbalanced datasets and offers a comprehensive analysis of incorporating the dynamic selection of base classifiers in classification. Experiments with 14 existing ensemble algorithms incorporating dynamic selection on 56 datasets reveal that the classical algorithm with a dynamic selection strategy delivers a practical way to improve classification performance for both binary and multi-class imbalanced datasets. In addition, by combining patch learning with dynamic selection ensemble classification, a patch-ensemble classification method is designed, which uses the misclassified samples to train patch classifiers and so increases the diversity of the base classifiers. The experimental results indicate that the designed method shows some potential for improving multi-class imbalanced classification.

Botnet detection using ensemble classifiers of network flow

International Journal of Electrical and Computer Engineering (IJECE) ◽

10.11591/ijece.v10i3.pp2543-2550 ◽

2020 ◽

Vol 10 (3) ◽

pp. 2543

Author(s):

Zahraa M. Algelal ◽

Eman Abdulaziz Ghani Aldhaher ◽

Dalia N. Abdul-Wadood ◽

Radhwan Hussein Abdulzhraa Al-Sagheer

Keyword(s):

Network Traffic ◽

Network Flow ◽

Ensemble Classifier ◽

Ensemble Classifiers ◽

Botnet Detection ◽

Click Fraud ◽

Normal Network ◽

Ensemble Algorithms ◽

Distinguishing Features ◽

Modern Technologies

Recently, botnets have become a common tool for implementing and transferring various malicious codes over the Internet. These codes can be used to carry out many malicious activities, including DDoS attacks, sending spam, click fraud, and data theft. Therefore, it is necessary to use modern technologies to reduce this phenomenon and to differentiate botnet traffic from normal network traffic in advance. In this work, ensemble classifier algorithms are used to identify such damaging botnet traffic. We experimented with different ensemble algorithms to compare and analyze their ability to separate botnet traffic from normal traffic by selecting distinguishing features of the network traffic. Botnet detection offers a reliable and inexpensive way of ensuring transfer integrity and warning of risks before they occur.

An Agent-Ensemble for Thresholded Multi-Target Classification

Applied Sciences ◽

10.3390/app10041376 ◽

2020 ◽

Vol 10 (4) ◽

pp. 1376

Author(s):

Nathan H. Parrish ◽

Ashley J. Llorens ◽

Alex E. Driskell

Keyword(s):

Binary Classification ◽

Target Type ◽

Vehicle Classification ◽

Weighted Likelihood ◽

Target Classification ◽

Classification Problems ◽

Target Class ◽

Combination Strategy ◽

Individual Agent ◽

Ensemble Approach

We propose an ensemble approach for multi-target binary classification, where the target class breaks down into a disparate set of pre-defined target-types. The system goal is to maximize the probability of alerting on targets from any type while excluding background clutter. The agent-classifiers that make up the ensemble are binary classifiers trained to classify between one of the target-types vs. clutter. The agent ensemble approach offers several benefits for multi-target classification including straightforward in-situ tuning of the ensemble to drift in the target population and the ability to give an indication to a human operator of which target-type causes an alert. We propose a combination strategy that sums weighted likelihood ratios of the individual agent-classifiers, where the likelihood ratio is between the target-type for the agent vs. clutter. We show that this combination strategy is optimal under a conditionally non-discriminative assumption. We compare this combiner to the common strategy of selecting the maximum of the normalized agent-scores as the combiner score. We show experimentally that the proposed combiner gives excellent performance on the multi-target binary classification problems of pin-less verification of human faces and vehicle classification using acoustic signatures.
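
The two combination strategies contrasted in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the example weights, and the alert threshold are all hypothetical, and the abstract does not say how the weights are chosen.

```python
from typing import Sequence


def weighted_lr_sum(likelihood_ratios: Sequence[float],
                    weights: Sequence[float]) -> float:
    """Sum the weighted likelihood ratios of the individual agent-classifiers.

    likelihood_ratios[i] is agent i's ratio p(x | target-type i) / p(x | clutter).
    weights[i] is a hypothetical weight for target-type i (an assumption here).
    """
    if len(likelihood_ratios) != len(weights):
        raise ValueError("need exactly one weight per agent")
    return sum(w * lr for w, lr in zip(weights, likelihood_ratios))


def max_normalized_score(scores: Sequence[float]) -> float:
    """Baseline combiner: take the maximum of the normalized agent scores."""
    return max(scores)


def alert(combined_score: float, threshold: float) -> bool:
    """Raise an alert when the combined score reaches the threshold."""
    return combined_score >= threshold


# Three agents, each tuned to one target-type (values are made up).
ratios = [2.0, 0.5, 4.0]
weights = [0.5, 0.25, 0.25]
combined = weighted_lr_sum(ratios, weights)
```

With the made-up values above, the summed combiner pools evidence from every agent, whereas the maximum-score baseline looks only at the single most confident agent.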

A Novel Approach to Ensemble Classifiers: FsBoost-Based Subspace Method

Mathematical Problems in Engineering ◽

10.1155/2020/8571712 ◽

2020 ◽

Vol 2020 ◽

pp. 1-11

Author(s):

Adeeb Noor ◽

Muhammed Kürşad Uçar ◽

Kemal Polat ◽

Abdullah Assiri ◽

Redhwan Nour

Keyword(s):

Ensemble Classifier ◽

Ensemble Classifiers ◽

Subspace Method ◽

Accuracy Rate ◽

Novel Approach

In this article, an algorithm is proposed for creating an ensemble classifier. The algorithm is named the F-score subspace method (FsBoost). In this method, the features are selected with the F-score and classified with different or the same classifiers. In the next step, the ensemble classifier is created. Two versions, named FsBoost.V1 and FsBoost.V2, have been developed based on classification by the same or different classifiers. The results obtained are consistent with the literature. Besides, a higher accuracy rate is obtained compared with many algorithms in the literature. The algorithm is fast because it has few steps. It is thought that the algorithm will be successful due to these advantages.
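
The feature-selection step can be illustrated with a small sketch. It assumes the Fisher-style F-score commonly used for feature ranking in binary problems; the abstract does not give the exact definition used by FsBoost, and the function names and toy data below are hypothetical.

```python
from statistics import mean, variance
from typing import List, Sequence


def f_score(pos: Sequence[float], neg: Sequence[float]) -> float:
    """Fisher-style F-score of one feature (assumed definition).

    Numerator: squared distances of the class means from the overall mean.
    Denominator: sum of the within-class sample variances.
    Each class needs at least two values for the sample variance.
    """
    overall = mean(list(pos) + list(neg))
    numerator = (mean(pos) - overall) ** 2 + (mean(neg) - overall) ** 2
    denominator = variance(pos) + variance(neg)
    if denominator == 0:
        return float("inf") if numerator > 0 else 0.0
    return numerator / denominator


def select_top_features(rows: List[List[float]], labels: List[int], k: int) -> List[int]:
    """Return the indices of the k features with the highest F-score."""
    scores = []
    for j in range(len(rows[0])):
        pos = [r[j] for r, y in zip(rows, labels) if y == 1]
        neg = [r[j] for r, y in zip(rows, labels) if y == 0]
        scores.append((f_score(pos, neg), j))
    scores.sort(key=lambda item: item[0], reverse=True)
    return [j for _, j in scores[:k]]


# Toy data: feature 0 separates the classes, feature 1 barely does.
X = [[1.0, 5.0], [2.0, 5.5], [9.0, 5.2], [10.0, 5.4]]
y = [0, 0, 1, 1]
best = select_top_features(X, y, k=1)
```

The selected feature subset would then be passed to the base classifiers; that later stage is not shown because the abstract does not describe it in detail.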

Classifier Ensemble Algorithm for Data Stream with Attribute Uncertainty

Journal of Computational and Theoretical Nanoscience ◽

10.1166/jctn.2016.5747 ◽

2016 ◽

Vol 13 (10) ◽

pp. 7519-7525 ◽

Cited By ~ 1

Author(s):

Zhang Xing ◽

Wang MeiLi ◽

Zhang Yang ◽

Ning Jifeng

Keyword(s):

Decision Tree ◽

Data Stream ◽

High Speed ◽

Information Gain ◽

Uncertain Data ◽

Classifier Ensemble ◽

Ensemble Classifiers ◽

Decision Tree Algorithm ◽

Tree Algorithm ◽

Ensemble Algorithm

To build a classifier for uncertain data streams, an Ensemble of Uncertain Decision Tree Algorithm (EDTU) is proposed. First, the decision tree algorithm for uncertain data (DTU) was improved by changing the calculation method of its information gain and improving the efficiency of the algorithm so that it can process high-speed data streams; then, based on this basic classifier, a dynamic classifier ensemble algorithm was used, and the classifiers that classified effectively were selected to constitute the ensemble. Experimental results on the SEA and Forest Covertype datasets demonstrate that the proposed EDTU algorithm is efficient in classifying data streams with uncertain attributes, and that its performance is stable under different parameters.

Automatic catalog of RR Lyrae from ∼14 million VVV light curves: How far can we go with traditional machine-learning?

Astronomy and Astrophysics ◽

10.1051/0004-6361/202038314 ◽

2020 ◽

Vol 642 ◽

pp. A58

Author(s):

J. B. Cabral ◽

F. Ramos ◽

S. Gurovich ◽

P. M. Granitto

Keyword(s):

Machine Learning ◽

Model Selection ◽

Broad Band ◽

Ensemble Classifier ◽

Light Curves ◽

Ensemble Classifiers ◽

Data Set ◽

Rr Lyrae ◽

Selection Step ◽

Sampling Procedures

Context. The creation of a 3D map of the bulge using RR Lyrae (RRL) is one of the main goals of the VISTA Variables in the Via Lactea Survey (VVV) and VVV(X) surveys. The overwhelming number of sources undergoing analysis undoubtedly requires the use of automatic procedures. In this context, previous studies have introduced the use of machine learning (ML) methods for the task of variable star classification. Aims. Our goal is to develop and test an entirely automatic ML-based procedure for the identification of RRLs in the VVV Survey. This automatic procedure is meant to be used to generate reliable catalogs integrated over several tiles in the survey. Methods. Following the reconstruction of light curves, we extracted a set of period- and intensity-based features, which were already defined in previous works. Also, for the first time, we put a new subset of useful color features to use. We discuss in considerable detail all the appropriate steps needed to define our fully automatic pipeline, namely: the selection of quality measurements; sampling procedures; classifier setup, and model selection. Results. As a result, we were able to construct an ensemble classifier with an average recall of 0.48 and average precision of 0.86 over 15 tiles. We also made all our processed datasets available and we published a catalog of candidate RRLs. Conclusions. Perhaps most interestingly, from a classification perspective based on photometric broad-band data, our results indicate that color is an informative feature type of the RRL objective class that should always be considered in automatic classification methods via ML. We also argue that recall and precision in both tables and curves are high-quality metrics with regard to this highly imbalanced problem. Furthermore, we show for our VVV data-set that to have good estimates, it is important to use the original distribution more abundantly than reduced samples with an artificial balance. Finally, we show that the use of ensemble classifiers helps resolve the crucial model selection step and that most errors in the identification of RRLs are related to low-quality observations of some sources or to the increased difficulty in resolving the RRL-C type given the data.
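
The point about recall and precision on a highly imbalanced problem can be illustrated with a small sketch. The labels below are made up for illustration and are not VVV data; the function name is hypothetical.

```python
from typing import Sequence, Tuple


def precision_recall_accuracy(y_true: Sequence[int],
                              y_pred: Sequence[int]) -> Tuple[float, float, float]:
    """Return (precision, recall, accuracy) for binary labels, positive class = 1.

    Precision and recall are reported as 0.0 when their denominator is zero.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    correct = sum(1 for t, p in pairs if t == p)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    accuracy = correct / len(pairs)
    return precision, recall, accuracy


# 2 positives among 20 objects; a classifier that never predicts the rare class.
y_true = [1, 1] + [0] * 18
always_negative = [0] * 20
p, r, acc = precision_recall_accuracy(y_true, always_negative)
```

In this toy case the accuracy is 0.9 even though no positive object is found, while recall is 0.0, which is why precision and recall are the more informative pair when the class of interest is rare.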

CROP TYPE MAPPING FROM A SEQUENCE OF TERRASAR-X IMAGES WITH DYNAMIC CONDITIONAL RANDOM FIELDS

ISPRS Annals of Photogrammetry Remote Sensing and Spatial Information Sciences ◽

10.5194/isprsannals-iii-7-59-2016 ◽

2016 ◽

Vol III-7 ◽

pp. 59-66 ◽

Cited By ~ 10

Author(s):

B. K. Kenduiywo ◽

D. Bargiel ◽

U. Soergel

Keyword(s):

Random Fields ◽

Conditional Random Fields ◽

Feature Space ◽

Ensemble Classifier ◽

Ensemble Technique ◽

Single Feature ◽

The Matrix ◽

Markov Assumption ◽

Repeated Structure ◽

Crop Discrimination

Crop phenology is dynamic as it changes with times of the year. Such biophysical processes also look spectrally different to remote sensing satellites. Some crops may show similar spectral properties while their phenology coincides, but differ later when their phenology diverges. Thus, conventional approaches that select only images from phenological stages where crops are distinguishable for classification have low discrimination. In contrast, stacking images within a cropping season limits discrimination to a single feature space that can suffer from overlapping classes. Since crop backscatter varies with time, it can aid discrimination. Therefore, our main objective is to develop a crop sequence classification method using multitemporal TerraSAR-X images. We adopt a first-order Markov assumption in an undirected temporal graph sequence. This property is exploited to implement Dynamic Conditional Random Fields (DCRFs). Our DCRF model has a repeated structure of temporally connected Conditional Random Fields (CRFs). Each node in the sequence is connected to its predecessor via a conditional probability matrix. The matrix is computed using posterior class probabilities from the association potential. This way, there is a mutual temporal exchange of phenological information observed in TerraSAR-X images. When compared to independent epoch classification, the designed DCRF model improved crop discrimination at each epoch in the sequence. However, government, insurers, agricultural market traders and other stakeholders are interested in the quantity of a certain crop in a season. Therefore, we further develop a DCRF ensemble classifier. The ensemble produces an optimal crop map by maximizing over posterior class probabilities selected from the sequence based on maximum F1-score and weighted by correctness. Our ensemble technique is compared to the standard approach of stacking all images as bands for classification using the Maximum Likelihood Classifier (MLC) and standard CRFs. It outperforms MLC and CRFs by 7.70% and 6.42% in overall accuracy, respectively.

Evaluation of One-Class Classifiers for Fault Detection: Mahalanobis Classifiers and the Mahalanobis–Taguchi System

Processes ◽

10.3390/pr9081450 ◽

2021 ◽

Vol 9 (8) ◽

pp. 1450

Author(s):

Seul-Gi Kim ◽

Donghyun Park ◽

Jae-Yoon Jung

Keyword(s):

Fault Detection ◽

Binary Classification ◽

Rotating Machinery ◽

Industrial Robots ◽

Sensor Data ◽

Support Vector ◽

Imbalanced Classification ◽

Vibration Data ◽

Binary Classifiers ◽

One Class Classification

Today, real-time fault detection and predictive maintenance based on sensor data are actively being introduced in various areas such as manufacturing, aircraft, and power system monitoring. Many faults in motors or rotating machinery like industrial robots, aircraft engines, and wind turbines can be diagnosed by analyzing signal data such as vibration and noise. In this study, to detect failures based on vibration data, preprocessing was performed using signal processing techniques such as the Hamming window and the cepstrum transform. After that, 10 statistical condition indicators were extracted to train the machine learning models. Specifically, two types of Mahalanobis distance (MD)-based one-class classification methods, the MD classifier and the Mahalanobis–Taguchi system, were evaluated in detecting the faults of rotating machinery. Their fault detection performance was evaluated at different imbalance ratios of the data by comparison with binary classification models, which included classical versions and imbalanced classification versions of support vector machine and random forest algorithms. The experimental results showed that the MD-based classifiers became more effective than binary classifiers in cases with far fewer defect data than normal data, which is common in the real-world industrial field.
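
The idea behind an MD-based one-class classifier can be sketched as follows: fit a mean and covariance on normal data only, then flag a sample as a fault when its Mahalanobis distance is large. This is a minimal sketch under stated assumptions, not the study's pipeline; the function names, the toy data, and the threshold are hypothetical, and the additional steps of the Mahalanobis–Taguchi system are not shown.

```python
from statistics import mean
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def fit_normal(rows: Sequence[Sequence[float]]) -> Tuple[List[float], Matrix]:
    """Estimate the mean vector and sample covariance from normal-condition data."""
    n, d = len(rows), len(rows[0])
    mu = [mean(r[j] for r in rows) for j in range(d)]
    cov = [[sum((r[i] - mu[i]) * (r[j] - mu[j]) for r in rows) / (n - 1)
            for j in range(d)] for i in range(d)]
    return mu, cov


def solve(a: Matrix, b: Sequence[float]) -> List[float]:
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def mahalanobis(x: Sequence[float], mu: Sequence[float], cov: Matrix) -> float:
    """Mahalanobis distance sqrt((x - mu)^T cov^{-1} (x - mu))."""
    diff = [xi - mi for xi, mi in zip(x, mu)]
    z = solve(cov, diff)  # z = cov^{-1} (x - mu), without forming the inverse
    return sum(d * zi for d, zi in zip(diff, z)) ** 0.5


def is_fault(x: Sequence[float], mu: Sequence[float], cov: Matrix,
             threshold: float) -> bool:
    """Flag a sample whose distance from the normal data exceeds the threshold."""
    return mahalanobis(x, mu, cov) > threshold


# Toy "normal" data with two condition indicators (values are made up).
normal = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]]
mu, cov = fit_normal(normal)
```

Because only normal data are needed to fit `mu` and `cov`, a classifier of this kind does not depend on how many defect samples are available, which matches the setting of scarce defect data described in the abstract.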

Automated color detection in orchids using color labels and deep learning

PLoS ONE ◽

10.1371/journal.pone.0259036 ◽

2021 ◽

Vol 16 (10) ◽

pp. e0259036

Author(s):

Diah Harnoni Apriyanti ◽

Luuk J. Spreeuwers ◽

Peter J. F. Lucas ◽

Raymond N. J. Veldhuis

Keyword(s):

Deep Learning ◽

Transfer Learning ◽

Recognition System ◽

Ensemble Classifier ◽

Ensemble Classifiers ◽

Detection Model ◽

Color Detection ◽

Multi Class Classification ◽

The One ◽

Color Scheme

The color of particular parts of a flower is often employed as one of the features to differentiate between flower types. Thus, color is also used in flower-image classification. Color labels, such as ‘green’, ‘red’, and ‘yellow’, are used by taxonomists and lay people alike to describe the color of plants. Flower image datasets usually only consist of images and do not contain flower descriptions. In this research, we have built a flower-image dataset, focused on orchid species, which consists of human-friendly textual descriptions of features of specific flowers, on the one hand, and digital photographs indicating what a flower looks like, on the other hand. Using this dataset, a new automated color detection model was developed. It is the first research of its kind using color labels and deep learning for color detection in flower recognition. As deep learning often excels in pattern recognition in digital images, we applied transfer learning with various amounts of unfreezing of layers with five different neural network architectures (VGG16, Inception, Resnet50, Xception, Nasnet) to determine which architecture and which scheme of transfer learning performs best. In addition, various color scheme scenarios were tested, including the use of primary and secondary color together, and the effectiveness of dealing with multi-class classification using multi-class, combined binary, and, finally, ensemble classifiers was studied. The best overall performance was achieved by the ensemble classifier. The results show that the proposed method can detect the color of the flower and labellum very well without having to perform image segmentation. The result of this study can act as a foundation for the development of an image-based plant recognition system that is able to offer an explanation of a provided classification.

Creating Ensemble Classifiers with Information Entropy Diversity Measure

Security and Communication Networks ◽

10.1155/2021/9953509 ◽

2021 ◽

Vol 2021 ◽

pp. 1-11

Author(s):

Jiangbo Zou ◽

Xiaokang Fu ◽

Lingling Guo ◽

Chunhua Ju ◽

Jingjing Chen

Keyword(s):

Information Entropy ◽

Classification Accuracy ◽

Ensemble Classifier ◽

The Other ◽

Ensemble Classification ◽

Ensemble Classifiers ◽

System Cost ◽

Iterative Optimization ◽

Diversity Measure ◽

Maximum Accuracy

Ensemble classifiers improve classification accuracy by incorporating the decisions made by their component classifiers. Basically, there are two steps to create an ensemble classifier: one is to generate base classifiers and the other is to align the base classifiers to achieve maximum overall accuracy. One of the major problems in creating ensemble classifiers is balancing the classification accuracy and the diversity of the component classifiers. In this paper, we propose an ensemble classifier generating algorithm to improve the accuracy of an ensemble classification and to maximize the diversity of its component classifiers. In this algorithm, information entropy is introduced to measure the diversity of component classifiers, and a cyclic iterative optimization selection tactic is applied to select component classifiers from base classifiers, in which the number of component classifiers is dynamically adjusted to minimize system cost. It is demonstrated that our method has a noticeably lower memory cost with higher classification accuracy compared with existing classifier methods.
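
One plausible way to measure classifier diversity with information entropy is sketched below: for each sample, take the Shannon entropy of the labels voted by the component classifiers, then average over samples. This is an assumption made for illustration; the abstract does not give the paper's exact measure, and the function names and votes are hypothetical.

```python
from collections import Counter
from math import log2
from typing import Hashable, List, Sequence


def vote_entropy(votes: Sequence[Hashable]) -> float:
    """Shannon entropy (in bits) of the labels voted for a single sample."""
    total = len(votes)
    counts = Counter(votes)
    return -sum((c / total) * log2(c / total) for c in counts.values())


def ensemble_diversity(predictions: List[List[Hashable]]) -> float:
    """Average per-sample vote entropy across an ensemble.

    predictions[i][j] is the label predicted by classifier i for sample j.
    A value of 0.0 means all classifiers always agree; higher means more disagreement.
    """
    per_sample = list(zip(*predictions))  # regroup the votes by sample
    return sum(vote_entropy(v) for v in per_sample) / len(per_sample)


# Two classifiers, two samples: they agree on the first and disagree on the second.
diversity = ensemble_diversity([[0, 1], [0, 0]])
```

A selection loop could then add or drop component classifiers while tracking both this diversity value and validation accuracy; that loop is omitted because the abstract does not describe the optimization procedure in detail.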
