Bee for mining (B4M) – A novel rule discovery method using the Bees algorithm with quality-weight and coverage-weight

Author(s):  
Michael S Packianather ◽  
Ammar K Al-Musawi ◽  
Fatih Anayi

This paper proposes a novel tool known as Bee for Mining (B4M) for classification tasks, which enables the Bees Algorithm (BA) to discover rules automatically. In the proposed B4M, two parameters, namely quality-weight and coverage-weight, have been added to the BA to avoid ambiguous situations during the prediction phase. The contributions of the proposed B4M algorithm are two-fold: the first novel contribution is in the field of swarm intelligence, using a new version of the BA for automatic rule discovery, and the second is the formulation of a weight metric, based on the quality and coverage of the rules discovered from the dataset, to carry out Meta-Pruning, making it suitable for any real-world classification problem. The proposed algorithm was implemented and tested on five different datasets from the University of California, Irvine (UCI) Machine Learning Repository and compared with other well-known classification algorithms. The results show that the proposed B4M achieved better classification accuracy while reducing the number of rules in four out of five UCI datasets. Furthermore, the results show that it was not only effective and more robust, but also more efficient, making it at least as good as other methods such as C5.0, C4.5, JRip and other evolutionary algorithms, and in some cases even better.
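The abstract does not give the exact form of the quality/coverage weight metric; as a purely illustrative sketch (the linear combination, the weight values, and the rule representation below are assumptions, not the authors' formulation), a combined weight could resolve the ambiguity that arises when several discovered rules match the same instance:

```python
def rule_weight(quality, coverage, quality_weight=0.7, coverage_weight=0.3):
    """Combine rule quality and coverage into one score (weights are illustrative)."""
    return quality_weight * quality + coverage_weight * coverage

def predict(instance, rules, default_class):
    """Among rules whose antecedent matches the instance, fire the highest-weighted one."""
    matching = [r for r in rules if r["antecedent"].items() <= instance.items()]
    if not matching:
        return default_class
    best = max(matching, key=lambda r: rule_weight(r["quality"], r["coverage"]))
    return best["consequent"]

# Two invented rules that both match a sunny, windy instance; the weight breaks the tie.
rules = [
    {"antecedent": {"outlook": "sunny"}, "consequent": "play",
     "quality": 0.9, "coverage": 0.2},
    {"antecedent": {"outlook": "sunny", "wind": "strong"}, "consequent": "no-play",
     "quality": 0.8, "coverage": 0.5},
]
print(predict({"outlook": "sunny", "wind": "strong"}, rules, "play"))  # -> no-play
```

Here the second rule wins (0.7·0.8 + 0.3·0.5 = 0.71 versus 0.69) despite its lower raw quality, because its higher coverage makes it the more trustworthy generalisation.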

2021 ◽  
Vol 13 (4) ◽  
pp. 547
Author(s):  
Wenning Wang ◽  
Xuebin Liu ◽  
Xuanqin Mou

For both traditional classification and currently popular deep learning methods, the limited-sample classification problem is very challenging, and the lack of samples is an important factor affecting classification performance. Our work includes two aspects. First, unsupervised data augmentation for all hyperspectral samples not only greatly improves classification accuracy through the newly added training samples, but also further improves the accuracy of the classifier by optimizing the augmented test samples. Second, an effective spectral structure extraction method is designed, and the extracted spectral structure features yield better classification accuracy than the original spectral features.
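The paper's unsupervised augmentation procedure is not detailed in the abstract; the following minimal sketch is entirely an assumption, showing one generic way to enlarge a small spectral training set by generating perturbed copies of each spectrum (random global scaling plus per-band noise):

```python
import random

def augment_spectrum(spectrum, n_copies=3, scale_range=0.05, noise_level=0.01, seed=0):
    """Generate perturbed copies of one spectral vector (illustrative augmentation only)."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    copies = []
    for _ in range(n_copies):
        scale = 1.0 + rng.uniform(-scale_range, scale_range)  # global brightness change
        copies.append([scale * band + rng.gauss(0.0, noise_level) for band in spectrum])
    return copies

# A toy 4-band spectrum expanded into three slightly perturbed training samples.
augmented = augment_spectrum([0.12, 0.15, 0.33, 0.48], n_copies=3)
```

Real hyperspectral augmentation schemes are more elaborate (e.g., spectral mixing of same-class neighbours), but the principle of multiplying scarce labelled samples is the same.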


2021 ◽  
Author(s):  
Ahmet Batuhan Polat ◽  
Ozgun Akcay ◽  
Fusun Balik Sanli

Obtaining high accuracy in land cover classification is a non-trivial problem in geosciences for monitoring urban and rural areas. In this study, different classification algorithms were tested with different types of data, and the effects of seasonal changes on these algorithms, as well as the suitability of the data used, were investigated. In addition, the effect of increasing the number of training samples on classification accuracy is revealed. Sentinel-1 Synthetic Aperture Radar (SAR) images and Sentinel-2 multispectral optical images were used as datasets. An object-based approach was used for the classification of various fused image combinations, with Support Vector Machines (SVM), Random Forest (RF) and k-Nearest Neighbours (kNN) as the classification algorithms. In addition, the Normalized Difference Vegetation Index (NDVI) was examined separately to quantify its exact contribution to classification accuracy. The overall accuracies were then compared by classifying the fused data generated by combining optical and SAR images. It was determined that increasing the number of training samples improves classification accuracy. Moreover, object-based classification of single SAR imagery produced the lowest classification accuracy among the dataset combinations used in this study. Finally, it was shown that NDVI data do not increase classification accuracy in the winter season, as the trees shed their leaves due to climate conditions.
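The NDVI examined above has a standard definition from the red and near-infrared reflectances (for Sentinel-2, bands B4 and B8 respectively); the winter drop the authors observe follows directly from leaf loss lowering NIR reflectance. A minimal sketch:

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index; dense green vegetation approaches +1."""
    denom = nir + red
    return 0.0 if denom == 0 else (nir - red) / denom

# Typical summer canopy: strong NIR reflectance (B8), low red reflectance (B4).
print(round(ndvi(0.45, 0.05), 3))  # -> 0.8
```

The reflectance values are illustrative; real Sentinel-2 inputs would be atmospherically corrected surface reflectances.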


2014 ◽  
Vol 5 (3) ◽  
pp. 82-96 ◽  
Author(s):  
Marijana Zekić-Sušac ◽  
Sanja Pfeifer ◽  
Nataša Šarlija

Abstract Background: Large-dimensional data modelling often relies on variable reduction methods in the pre-processing and post-processing stages. However, such reduction usually discards information and yields a lower model accuracy. Objectives: The aim of this paper is to assess the high-dimensional classification problem of recognizing entrepreneurial intentions of students using machine learning methods. Methods/Approach: Four methods were tested on the same dataset in order to compare their classification accuracy: artificial neural networks, CART classification trees, support vector machines, and k-nearest neighbour. The performance of each method was compared on ten subsamples in a 10-fold cross-validation procedure, and the sensitivity and specificity of each model were computed. Results: The artificial neural network model based on a multilayer perceptron yielded a higher classification rate than the models produced by the other methods. A pairwise t-test showed a statistically significant difference between the artificial neural network and the k-nearest neighbour model, while the differences among the other methods were not statistically significant. Conclusions: The tested machine learning methods are able to learn fast and achieve high classification accuracy. However, further advancement can be achieved by testing a few additional methodological refinements.
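The 10-fold cross-validation and sensitivity/specificity measures used above are standard; a minimal self-contained sketch (the contiguous fold-splitting scheme and the toy labels are illustrative, not the paper's setup):

```python
def k_fold_indices(n, k=10):
    """Split indices 0..n-1 into k roughly equal, contiguous folds."""
    folds, start = [], 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)  # spread the remainder over early folds
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def sensitivity_specificity(y_true, y_pred, positive=1):
    """Sensitivity = TP/(TP+FN); specificity = TN/(TN+FP)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p != positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    sens = tp / (tp + fn) if tp + fn else 0.0
    spec = tn / (tn + fp) if tn + fp else 0.0
    return sens, spec

folds = k_fold_indices(25, k=10)  # each fold serves once as the held-out test set
sens, spec = sensitivity_specificity([1, 1, 0, 0], [1, 0, 0, 1])
```

In practice one would shuffle (or stratify) before splitting; contiguous folds are used here only to keep the sketch deterministic.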


2022 ◽  
Vol 13 (1) ◽  
pp. 0-0

Associative Classification (AC), or Class Association Rule (CAR) mining, is a very efficient method for the classification problem. It can build comprehensible classification models in the form of a list of simple IF-THEN classification rules from the available data. In this paper, we present a new and improved discrete version of the Crow Search Algorithm (CSA), called NDCSA-CAR, to mine Class Association Rules. The goal of this article is to improve data classification accuracy and the simplicity of classifiers. The authors applied the proposed NDCSA-CAR algorithm to eleven benchmark datasets and compared its results with traditional algorithms and recent well-known rule-based classification algorithms. The experimental results show that the proposed algorithm outperformed the other rule-based approaches on all evaluated criteria.
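The abstract describes the classifier as a list of simple IF-THEN class association rules; a generic first-match sketch of such a rule list (the rules and attribute names are invented for illustration and are not taken from NDCSA-CAR):

```python
def classify(instance, rule_list, default_class):
    """Fire the first rule in the ordered list whose antecedent is fully satisfied."""
    for antecedent, consequent in rule_list:
        if all(instance.get(attr) == value for attr, value in antecedent.items()):
            return consequent
    return default_class  # fallback when no rule covers the instance

# An ordered rule list: earlier (typically stronger) rules take precedence.
rule_list = [
    ({"humidity": "high", "outlook": "rain"}, "stay-in"),
    ({"outlook": "sunny"}, "go-out"),
]
print(classify({"outlook": "sunny", "humidity": "low"}, rule_list, "stay-in"))  # -> go-out
```

Rule ordering is what makes such models comprehensible: the prediction for any instance can be justified by pointing at the single rule that fired.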


Electronics ◽  
2020 ◽  
Vol 9 (3) ◽  
pp. 425
Author(s):  
Krzysztof Gajowniczek ◽  
Iga Grzegorczyk ◽  
Michał Gostkowski ◽  
Tomasz Ząbkowski

In this work, we present an application of the blind source separation (BSS) algorithm to reduce false arrhythmia alarms and to improve the classification accuracy of artificial neural networks (ANNs). The research focused on a new approach to model aggregation to deal with arrhythmia types that are difficult to predict. The data for analysis consisted of five-minute-long physiological signals (ECG, BP, and PLETH) registered for patients with cardiac arrhythmias. For each patient, the arrhythmia alarm occurred at the end of the signal. The data present a classification problem: whether the alarm is a true one requiring attention, or a false one that should not have been generated. It was confirmed that BSS ANNs are able to detect four arrhythmias—asystole, ventricular tachycardia, ventricular fibrillation, and tachycardia—with higher classification accuracy than the benchmarking models, including the ANN, random forest, and recursive partitioning and regression trees. The overall challenge scores were between 63.2 and 90.7.


1997 ◽  
Vol 12 (01) ◽  
pp. 1-40 ◽  
Author(s):  
LEONARD A. BRESLOW ◽  
DAVID W. AHA

Induced decision trees are an extensively researched solution to classification tasks. For many practical tasks, the trees produced by tree-generation algorithms are not comprehensible to users due to their size and complexity. Although many tree induction algorithms have been shown to produce simpler, more comprehensible trees (or data structures derived from trees) with good classification accuracy, tree simplification has usually been of secondary concern relative to accuracy, and no attempt has been made to survey the literature from the perspective of simplification. We present a framework that organizes the approaches to tree simplification, and we summarize and critique the approaches within this framework. The purpose of this survey is to provide researchers and practitioners with a concise overview of tree-simplification approaches and insight into their relative capabilities. In our final discussion, we briefly describe some empirical findings and discuss the application of tree induction algorithms to case retrieval in case-based reasoning systems.


2016 ◽  
Vol 78 (8-2) ◽  
Author(s):  
Siti Sakira Kamaruddin ◽  
Yuhanis Yusof ◽  
Husniza Husni ◽  
Mohammad Hayel Al Refai

This paper presents text classification using a modified Multi-Class Association Rule method. The method is based on Associative Classification, which combines classification with association rule discovery. Although previous work proved that Associative Classification produces better classification accuracy than typical classifiers, studies applying Associative Classification to text classification problems are limited due to the high dimensionality of text data, which results in an exponential number of generated classification rules. To overcome this problem, the modified Multi-Class Association Rule method was enhanced in two stages. In stage one, frequent patterns are represented using a proposed vertical data format to reduce the text dimensionality problem, and in stage two, the generated rules are pruned using a proposed Partial Rule Match to reduce the number of generated rules. The proposed method was tested on a text classification problem, and the results show that it performed better than the existing method in terms of classification accuracy and the number of generated rules.
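The proposed vertical data format is not specified in the abstract, but vertical representations of frequent patterns conventionally map each item (here, each term) to the set of document IDs containing it, so the support of a term set reduces to a set intersection; a generic sketch under that assumption:

```python
def vertical_format(documents):
    """Map each term to the set of document IDs containing it (its 'tidset')."""
    index = {}
    for doc_id, terms in enumerate(documents):
        for term in set(terms):  # set() so repeated terms in one doc count once
            index.setdefault(term, set()).add(doc_id)
    return index

def support(index, itemset, n_docs):
    """Support of a term set = |intersection of its tidsets| / corpus size."""
    tidsets = [index.get(term, set()) for term in itemset]
    common = set.intersection(*tidsets) if tidsets else set()
    return len(common) / n_docs

docs = [["price", "rise"], ["price", "fall"], ["rain", "fall"]]
idx = vertical_format(docs)
```

The appeal for high-dimensional text is that support counting needs no repeated scans of the corpus: longer patterns are evaluated by intersecting the already-stored tidsets of their terms.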


2019 ◽  
Vol 2019 ◽  
pp. 1-17
Author(s):  
Pelin Yıldırım ◽  
Ulaş K. Birant ◽  
Derya Birant

Learning the latent patterns of historical data in an efficient way to model the behaviour of a system is a major need for making the right decisions. For this purpose, machine learning has already made promising inroads in transportation, as well as in many other areas such as marketing, finance, education, and health. However, many classification algorithms in the literature assume that the target attribute values in the datasets are unordered, so they lose the inherent order between the class values. To overcome this problem, this study proposes a novel ensemble-based ordinal classification (EBOC) approach which applies bagging and boosting (the AdaBoost algorithm) to the ordinal classification problem in the transportation sector. This article also compares the proposed EBOC approach with the ordinal class classifier and traditional tree-based classification algorithms (i.e., C4.5 decision tree, RandomTree, and REPTree) in terms of accuracy. The results indicate that the proposed EBOC approach achieves better classification performance than the conventional solutions.
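The ordinal class classifier used as a baseline above is conventionally built by decomposing k ordered classes into k-1 binary "is y greater than c_i?" problems and then recombining the binary estimates; a sketch of that recombination step (the probability values below are illustrative):

```python
def ordinal_probabilities(p_greater):
    """Recover per-class probabilities for ordered classes c0 < c1 < ... < ck
    from the k binary estimates P(y > c_i) (Frank & Hall-style decomposition)."""
    probs = [1.0 - p_greater[0]]                 # P(c0) = 1 - P(y > c0)
    for i in range(1, len(p_greater)):
        probs.append(p_greater[i - 1] - p_greater[i])  # P(ci) = P(y > c_{i-1}) - P(y > ci)
    probs.append(p_greater[-1])                  # P(ck) = P(y > c_{k-1})
    return probs

# Three ordered classes (low < medium < high) and two binary models:
# P(y > low) = 0.9, P(y > medium) = 0.2
print([round(p, 3) for p in ordinal_probabilities([0.9, 0.2])])  # -> [0.1, 0.7, 0.2]
```

This is why discarding class order hurts: a plain multi-class learner would treat "low" and "high" confusions as no worse than "low" and "medium" ones.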


2018 ◽  
Vol 11 (1) ◽  
pp. 2 ◽  
Author(s):  
Tao Zhang ◽  
Hong Tang

Detailed information about built-up areas is valuable for mapping complex urban environments. Although a large number of classification algorithms for such areas have been developed, they are rarely tested from the perspective of feature engineering and feature learning. Therefore, we carried out a comprehensive test of Operational Land Imager (OLI) imagery for 15-m resolution built-up area classification in 2015, in Beijing, China. Training a classifier requires many sample points, and we proposed a method based on the European Space Agency's (ESA) 38-m global built-up area data of 2014, OpenStreetMap, and MOD13Q1-NDVI to achieve the rapid and automatic generation of a large number of sample points. Our aim was to examine the influence of a single pixel versus an image patch under traditional feature engineering and modern feature learning strategies. In feature engineering, we consider spectra, shape, and texture as the input features, and support vector machine (SVM), random forest (RF), and AdaBoost as the classification algorithms. In feature learning, the convolutional neural network (CNN) is used as the classification algorithm. In total, 26 built-up land cover maps were produced. The experimental results show the following: (1) The approaches based on feature learning are generally better than those based on feature engineering in terms of classification accuracy, and the performance of ensemble classifiers (e.g., RF) is comparable to that of the CNN. The two-dimensional CNN and the 7-neighborhood RF have the highest classification accuracies at nearly 91%; (2) Overall, the classification effect and accuracy based on image patches are better than those based on single pixels. Features that highlight the information of the target category (e.g., PanTex (texture-derived built-up presence index) and the enhanced morphological building index (EMBI)) can help improve classification accuracy.
The code and experimental results are available at https://github.com/zhangtao151820/CompareMethod.
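The "7-neighborhood" patch-based setting can be read as classifying each pixel from its 7×7 window rather than from its single spectrum; a minimal patch-extraction sketch under that assumption (pure Python, illustrative only):

```python
def extract_patch(image, row, col, radius=3, pad_value=0):
    """Return the (2*radius+1) x (2*radius+1) neighbourhood around a pixel,
    padding beyond the image border. radius=3 gives a 7x7 patch."""
    h, w = len(image), len(image[0])
    patch = []
    for r in range(row - radius, row + radius + 1):
        patch_row = []
        for c in range(col - radius, col + radius + 1):
            patch_row.append(image[r][c] if 0 <= r < h and 0 <= c < w else pad_value)
        patch.append(patch_row)
    return patch

# A toy 10x10 single-band image; the 7x7 patch around pixel (5, 5).
patch = extract_patch([[r * 10 + c for c in range(10)] for r in range(10)], 5, 5)
```

The patch, rather than the lone pixel value, then becomes the classifier input, which is what gives the patch-based RF and the 2D CNN their spatial context.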


Sensors ◽  
2020 ◽  
Vol 20 (4) ◽  
pp. 1188 ◽  
Author(s):  
Jianming Zhang ◽  
Chaoquan Lu ◽  
Jin Wang ◽  
Xiao-Guang Yue ◽  
Se-Jung Lim ◽  
...  

Many remote sensing scene classification algorithms improve their classification accuracy through additional modules, which increase the parameters and computing overhead of the model at the inference stage. In this paper, we explore how to improve the classification accuracy of a model without adding modules at the inference stage. First, we propose a network training strategy of training with multi-size images. Then, we introduce more supervision information through a triplet loss and design a dedicated branch for it. In addition, dropout is introduced between the feature extractor and the classifier to avoid over-fitting. These modules only work at the training stage and do not increase model parameters at the inference stage. We use ResNet18 as the baseline and add the three modules to it. We perform experiments on three datasets: AID, NWPU-RESISC45, and OPTIMAL. Experimental results show that our model combined with the three modules is more competitive than many existing classification algorithms. In addition, ablation experiments on OPTIMAL show that dropout, triplet loss, and training with multi-size images improve the overall accuracy of the model on the test set by 0.53%, 0.38%, and 0.7%, respectively; the combination of the three modules improves the overall accuracy by 1.61%. Thus the three modules improve classification accuracy without increasing model parameters at the inference stage, and training with multi-size images brings a greater gain than the other two modules, although the combination of all three is better still.
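The triplet loss used for the extra supervision has a standard form: pull an anchor's same-class (positive) embedding closer than its different-class (negative) embedding by at least a margin. A minimal sketch of that loss, not the paper's exact branch design:

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two embedding vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Zero when the negative is already at least `margin` farther than the positive;
    otherwise proportional to how badly the constraint is violated."""
    return max(0.0, squared_distance(anchor, positive)
                    - squared_distance(anchor, negative) + margin)

# Satisfied triplet: positive at distance^2 = 1, negative at distance^2 = 9.
print(triplet_loss([0, 0], [0, 1], [3, 0]))  # -> 0.0
```

Because the loss (and its branch) is only computed during training, it shapes the feature space without adding any parameters or compute at inference, which is the point the abstract makes.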

