Bee for mining (B4M) – A novel rule discovery method using the Bees algorithm with quality-weight and coverage-weight

Author(s):  
Michael S Packianather ◽  
Ammar K Al-Musawi ◽  
Fatih Anayi

This paper proposes a novel tool known as Bee for Mining (B4M) for classification tasks, which enables the Bees Algorithm (BA) to discover rules automatically. In the proposed B4M, two parameters, namely quality-weight and coverage-weight, have been added to the BA to avoid ambiguous situations during the prediction phase. The contributions of the proposed B4M algorithm are two-fold: the first novel contribution is in the field of swarm intelligence, using a new version of the BA for automatic rule discovery, and the second is the formulation of a weight metric, based on the quality and coverage of the rules discovered from the dataset, to carry out Meta-Pruning, making it suitable for any real-world classification problem. The proposed algorithm was implemented and tested on five different datasets from the University of California, Irvine (UCI) Machine Learning Repository and compared with other well-known classification algorithms. The results show that the proposed B4M achieved better classification accuracy while reducing the number of rules in four out of five UCI datasets. Furthermore, the results show that it was not only effective and more robust, but also more efficient, making it at least as good as other methods such as C5.0, C4.5, JRip and other evolutionary algorithms, and in some cases even better.
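The abstract does not give the exact form of the quality/coverage weight metric; as a purely illustrative sketch (the linear combination, the weight values, and the rule representation below are assumptions, not the authors' formulation), a combined weight could resolve the ambiguity that arises when several discovered rules match the same instance:

```python
def rule_weight(quality, coverage, quality_weight=0.7, coverage_weight=0.3):
    """Combine rule quality and coverage into one score (weights are illustrative)."""
    return quality_weight * quality + coverage_weight * coverage

def predict(instance, rules, default_class):
    """Among rules whose antecedent matches the instance, fire the highest-weighted one."""
    matching = [r for r in rules if r["antecedent"].items() <= instance.items()]
    if not matching:
        return default_class
    best = max(matching, key=lambda r: rule_weight(r["quality"], r["coverage"]))
    return best["consequent"]

# Two invented rules that both match a sunny, windy instance; the weight breaks the tie.
rules = [
    {"antecedent": {"outlook": "sunny"}, "consequent": "play",
     "quality": 0.9, "coverage": 0.2},
    {"antecedent": {"outlook": "sunny", "wind": "strong"}, "consequent": "no-play",
     "quality": 0.8, "coverage": 0.5},
]
print(predict({"outlook": "sunny", "wind": "strong"}, rules, "play"))  # -> no-play
```

Here the second rule wins (0.7·0.8 + 0.3·0.5 = 0.71 versus 0.69) despite its lower raw quality, because its higher coverage makes it the more trustworthy generalisation.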

2021 ◽  
Vol 13 (4) ◽  
pp. 547
Author(s):  
Wenning Wang ◽  
Xuebin Liu ◽  
Xuanqin Mou

For both traditional classification and currently popular deep learning methods, the limited-sample classification problem is very challenging, and the lack of samples is an important factor affecting classification performance. Our work includes two aspects. First, unsupervised data augmentation for all hyperspectral samples not only greatly improves classification accuracy through the newly added training samples, but also further improves the accuracy of the classifier by optimizing the augmented test samples. Second, an effective spectral structure extraction method is designed, and the extracted spectral structure features yield better classification accuracy than the original spectral features.
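The paper's unsupervised augmentation procedure is not detailed in the abstract; the following minimal sketch is entirely an assumption, showing one generic way to enlarge a small spectral training set by generating perturbed copies of each spectrum (random global scaling plus per-band noise):

```python
import random

def augment_spectrum(spectrum, n_copies=3, scale_range=0.05, noise_level=0.01, seed=0):
    """Generate perturbed copies of one spectral vector (illustrative augmentation only)."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    copies = []
    for _ in range(n_copies):
        scale = 1.0 + rng.uniform(-scale_range, scale_range)  # global brightness change
        copies.append([scale * band + rng.gauss(0.0, noise_level) for band in spectrum])
    return copies

# A toy 4-band spectrum expanded into three slightly perturbed training samples.
augmented = augment_spectrum([0.12, 0.15, 0.33, 0.48], n_copies=3)
```

Real hyperspectral augmentation schemes are more elaborate (e.g., spectral mixing of same-class neighbours), but the principle of multiplying scarce labelled samples is the same.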


2021 ◽  
Author(s):  
Ahmet Batuhan Polat ◽  
Ozgun Akcay ◽  
Fusun Balik Sanli

Obtaining high accuracy in land cover classification is a non-trivial problem in geosciences for monitoring urban and rural areas. In this study, different classification algorithms were tested with different types of data, and the effects of seasonal changes on these algorithms, as well as the suitability of the data used, were investigated. In addition, the effect of increasing the number of training samples on classification accuracy is revealed. Sentinel-1 Synthetic Aperture Radar (SAR) images and Sentinel-2 multispectral optical images were used as datasets. An object-based approach was used for the classification of various fused image combinations, with Support Vector Machines (SVM), Random Forest (RF) and k-Nearest Neighbours (kNN) as the classification algorithms. In addition, the Normalized Difference Vegetation Index (NDVI) was examined separately to quantify its exact contribution to classification accuracy. The overall accuracies were then compared by classifying the fused data generated by combining optical and SAR images. It was determined that increasing the number of training samples improves classification accuracy. Moreover, object-based classification of single SAR imagery produced the lowest classification accuracy among the dataset combinations used in this study. Finally, it was shown that NDVI data do not increase classification accuracy in the winter season, as the trees shed their leaves due to climate conditions.
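The NDVI examined above has a standard definition from the red and near-infrared reflectances (for Sentinel-2, bands B4 and B8 respectively); the winter drop the authors observe follows directly from leaf loss lowering NIR reflectance. A minimal sketch:

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index; dense green vegetation approaches +1."""
    denom = nir + red
    return 0.0 if denom == 0 else (nir - red) / denom

# Typical summer canopy: strong NIR reflectance (B8), low red reflectance (B4).
print(round(ndvi(0.45, 0.05), 3))  # -> 0.8
```

The reflectance values are illustrative; real Sentinel-2 inputs would be atmospherically corrected surface reflectances.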


2014 ◽  
Vol 5 (3) ◽  
pp. 82-96 ◽  
Author(s):  
Marijana Zekić-Sušac ◽  
Sanja Pfeifer ◽  
Nataša Šarlija

Abstract Background: Large-dimensional data modelling often relies on variable reduction methods in the pre-processing and post-processing stages. However, such reduction usually discards information and yields a lower model accuracy. Objectives: The aim of this paper is to assess the high-dimensional classification problem of recognizing entrepreneurial intentions of students using machine learning methods. Methods/Approach: Four methods were tested on the same dataset in order to compare their classification accuracy: artificial neural networks, CART classification trees, support vector machines, and k-nearest neighbour. The performance of each method was compared on ten subsamples in a 10-fold cross-validation procedure, and the sensitivity and specificity of each model were computed. Results: The artificial neural network model based on a multilayer perceptron yielded a higher classification rate than the models produced by the other methods. A pairwise t-test showed a statistically significant difference between the artificial neural network and the k-nearest neighbour model, while the differences among the other methods were not statistically significant. Conclusions: The tested machine learning methods are able to learn fast and achieve high classification accuracy. However, further advancement can be achieved by testing a few additional methodological refinements.
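The 10-fold cross-validation and sensitivity/specificity measures used above are standard; a minimal self-contained sketch (the contiguous fold-splitting scheme and the toy labels are illustrative, not the paper's setup):

```python
def k_fold_indices(n, k=10):
    """Split indices 0..n-1 into k roughly equal, contiguous folds."""
    folds, start = [], 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)  # spread the remainder over early folds
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def sensitivity_specificity(y_true, y_pred, positive=1):
    """Sensitivity = TP/(TP+FN); specificity = TN/(TN+FP)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p != positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    sens = tp / (tp + fn) if tp + fn else 0.0
    spec = tn / (tn + fp) if tn + fp else 0.0
    return sens, spec

folds = k_fold_indices(25, k=10)  # each fold serves once as the held-out test set
sens, spec = sensitivity_specificity([1, 1, 0, 0], [1, 0, 0, 1])
```

In practice one would shuffle (or stratify) before splitting; contiguous folds are used here only to keep the sketch deterministic.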


2022 ◽  
Vol 13 (1) ◽  
pp. 0-0

Associative Classification (AC), or Class Association Rule (CAR) mining, is a very efficient method for the classification problem. It can build comprehensible classification models in the form of a list of simple IF-THEN classification rules from the available data. In this paper, we present a new and improved discrete version of the Crow Search Algorithm (CSA), called NDCSA-CAR, to mine Class Association Rules. The goal of this article is to improve data classification accuracy and the simplicity of classifiers. The authors applied the proposed NDCSA-CAR algorithm to eleven benchmark datasets and compared its results with traditional algorithms and recent well-known rule-based classification algorithms. The experimental results show that the proposed algorithm outperformed the other rule-based approaches on all evaluated criteria.
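The abstract describes the classifier as a list of simple IF-THEN class association rules; a generic first-match sketch of such a rule list (the rules and attribute names are invented for illustration and are not taken from NDCSA-CAR):

```python
def classify(instance, rule_list, default_class):
    """Fire the first rule in the ordered list whose antecedent is fully satisfied."""
    for antecedent, consequent in rule_list:
        if all(instance.get(attr) == value for attr, value in antecedent.items()):
            return consequent
    return default_class  # fallback when no rule covers the instance

# An ordered rule list: earlier (typically stronger) rules take precedence.
rule_list = [
    ({"humidity": "high", "outlook": "rain"}, "stay-in"),
    ({"outlook": "sunny"}, "go-out"),
]
print(classify({"outlook": "sunny", "humidity": "low"}, rule_list, "stay-in"))  # -> go-out
```

Rule ordering is what makes such models comprehensible: the prediction for any instance can be justified by pointing at the single rule that fired.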


Electronics ◽  
2020 ◽  
Vol 9 (3) ◽  
pp. 425
Author(s):  
Krzysztof Gajowniczek ◽  
Iga Grzegorczyk ◽  
Michał Gostkowski ◽  
Tomasz Ząbkowski

In this work, we present an application of the blind source separation (BSS) algorithm to reduce false arrhythmia alarms and to improve the classification accuracy of artificial neural networks (ANNs). The research focused on a new approach to model aggregation to deal with arrhythmia types that are difficult to predict. The data for analysis consisted of five-minute-long physiological signals (ECG, BP, and PLETH) registered for patients with cardiac arrhythmias. For each patient, the arrhythmia alarm occurred at the end of the signal. The data present a classification problem: whether the alarm is a true one requiring attention, or a false one that should not have been generated. It was confirmed that BSS ANNs are able to detect four arrhythmias—asystole, ventricular tachycardia, ventricular fibrillation, and tachycardia—with higher classification accuracy than the benchmarking models, including the ANN, random forest, and recursive partitioning and regression trees. The overall challenge scores were between 63.2 and 90.7.


1997 ◽  
Vol 12 (01) ◽  
pp. 1-40 ◽  
Author(s):  
LEONARD A. BRESLOW ◽  
DAVID W. AHA

Induced decision trees are an extensively researched solution to classification tasks. For many practical tasks, the trees produced by tree-generation algorithms are not comprehensible to users due to their size and complexity. Although many tree induction algorithms have been shown to produce simpler, more comprehensible trees (or data structures derived from trees) with good classification accuracy, tree simplification has usually been of secondary concern relative to accuracy, and no attempt has been made to survey the literature from the perspective of simplification. We present a framework that organizes the approaches to tree simplification, and we summarize and critique the approaches within this framework. The purpose of this survey is to provide researchers and practitioners with a concise overview of tree-simplification approaches and insight into their relative capabilities. In our final discussion, we briefly describe some empirical findings and discuss the application of tree induction algorithms to case retrieval in case-based reasoning systems.


2016 ◽  
Vol 78 (8-2) ◽  
Author(s):  
Siti Sakira Kamaruddin ◽  
Yuhanis Yusof ◽  
Husniza Husni ◽  
Mohammad Hayel Al Refai

This paper presents text classification using a modified Multi-Class Association Rule method. The method is based on Associative Classification, which combines classification with association rule discovery. Although previous work proved that Associative Classification produces better classification accuracy than typical classifiers, studies applying Associative Classification to text classification problems are limited due to the high dimensionality of text data, which results in an exponential number of generated classification rules. To overcome this problem, the modified Multi-Class Association Rule method was enhanced in two stages. In stage one, frequent patterns are represented using a proposed vertical data format to reduce the text dimensionality problem, and in stage two, the generated rules are pruned using a proposed Partial Rule Match to reduce the number of generated rules. The proposed method was tested on a text classification problem, and the results show that it performed better than the existing method in terms of classification accuracy and the number of generated rules.
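The proposed vertical data format is not specified in the abstract, but vertical representations of frequent patterns conventionally map each item (here, each term) to the set of document IDs containing it, so the support of a term set reduces to a set intersection; a generic sketch under that assumption:

```python
def vertical_format(documents):
    """Map each term to the set of document IDs containing it (its 'tidset')."""
    index = {}
    for doc_id, terms in enumerate(documents):
        for term in set(terms):  # set() so repeated terms in one doc count once
            index.setdefault(term, set()).add(doc_id)
    return index

def support(index, itemset, n_docs):
    """Support of a term set = |intersection of its tidsets| / corpus size."""
    tidsets = [index.get(term, set()) for term in itemset]
    common = set.intersection(*tidsets) if tidsets else set()
    return len(common) / n_docs

docs = [["price", "rise"], ["price", "fall"], ["rain", "fall"]]
idx = vertical_format(docs)
```

The appeal for high-dimensional text is that support counting needs no repeated scans of the corpus: longer patterns are evaluated by intersecting the already-stored tidsets of their terms.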


2019 ◽  
Vol 2019 ◽  
pp. 1-17
Author(s):  
Pelin Yıldırım ◽  
Ulaş K. Birant ◽  
Derya Birant

Learning the latent patterns of historical data in an efficient way to model the behaviour of a system is a major need for making the right decisions. For this purpose, machine learning has already made promising inroads in transportation, as well as in many other areas such as marketing, finance, education, and health. However, many classification algorithms in the literature assume that the target attribute values in the datasets are unordered, so they lose the inherent order between the class values. To overcome this problem, this study proposes a novel ensemble-based ordinal classification (EBOC) approach which applies bagging and boosting (the AdaBoost algorithm) to the ordinal classification problem in the transportation sector. This article also compares the proposed EBOC approach with the ordinal class classifier and traditional tree-based classification algorithms (i.e., C4.5 decision tree, RandomTree, and REPTree) in terms of accuracy. The results indicate that the proposed EBOC approach achieves better classification performance than the conventional solutions.
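The ordinal class classifier used as a baseline above is conventionally built by decomposing k ordered classes into k-1 binary "is y greater than c_i?" problems and then recombining the binary estimates; a sketch of that recombination step (the probability values below are illustrative):

```python
def ordinal_probabilities(p_greater):
    """Recover per-class probabilities for ordered classes c0 < c1 < ... < ck
    from the k binary estimates P(y > c_i) (Frank & Hall-style decomposition)."""
    probs = [1.0 - p_greater[0]]                 # P(c0) = 1 - P(y > c0)
    for i in range(1, len(p_greater)):
        probs.append(p_greater[i - 1] - p_greater[i])  # P(ci) = P(y > c_{i-1}) - P(y > ci)
    probs.append(p_greater[-1])                  # P(ck) = P(y > c_{k-1})
    return probs

# Three ordered classes (low < medium < high) and two binary models:
# P(y > low) = 0.9, P(y > medium) = 0.2
print([round(p, 3) for p in ordinal_probabilities([0.9, 0.2])])  # -> [0.1, 0.7, 0.2]
```

This is why discarding class order hurts: a plain multi-class learner would treat "low" and "high" confusions as no worse than "low" and "medium" ones.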


2018 ◽  
Vol 11 (1) ◽  
pp. 2 ◽  
Author(s):  
Tao Zhang ◽  
Hong Tang

Detailed information about built-up areas is valuable for mapping complex urban environments. Although a large number of classification algorithms for such areas have been developed, they are rarely tested from the perspective of feature engineering and feature learning. Therefore, we carried out a comprehensive test of Operational Land Imager (OLI) imagery for 15-m resolution built-up area classification in 2015, in Beijing, China. Training a classifier requires many sample points, and we proposed a method based on the European Space Agency's (ESA) 38-m global built-up area data of 2014, OpenStreetMap, and MOD13Q1-NDVI to achieve the rapid and automatic generation of a large number of sample points. Our aim was to examine the influence of a single pixel versus an image patch under traditional feature engineering and modern feature learning strategies. In feature engineering, we consider spectra, shape, and texture as the input features, and support vector machine (SVM), random forest (RF), and AdaBoost as the classification algorithms. In feature learning, the convolutional neural network (CNN) is used as the classification algorithm. In total, 26 built-up land cover maps were produced. The experimental results show the following: (1) The approaches based on feature learning are generally better than those based on feature engineering in terms of classification accuracy, and the performance of ensemble classifiers (e.g., RF) is comparable to that of the CNN. The two-dimensional CNN and the 7-neighborhood RF have the highest classification accuracies at nearly 91%; (2) Overall, the classification effect and accuracy based on image patches are better than those based on single pixels. Features that highlight the information of the target category (e.g., PanTex (texture-derived built-up presence index) and the enhanced morphological building index (EMBI)) can help improve classification accuracy.
The code and experimental results are available at https://github.com/zhangtao151820/CompareMethod.
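The "7-neighborhood" patch-based setting can be read as classifying each pixel from its 7×7 window rather than from its single spectrum; a minimal patch-extraction sketch under that assumption (pure Python, illustrative only):

```python
def extract_patch(image, row, col, radius=3, pad_value=0):
    """Return the (2*radius+1) x (2*radius+1) neighbourhood around a pixel,
    padding beyond the image border. radius=3 gives a 7x7 patch."""
    h, w = len(image), len(image[0])
    patch = []
    for r in range(row - radius, row + radius + 1):
        patch_row = []
        for c in range(col - radius, col + radius + 1):
            patch_row.append(image[r][c] if 0 <= r < h and 0 <= c < w else pad_value)
        patch.append(patch_row)
    return patch

# A toy 10x10 single-band image; the 7x7 patch around pixel (5, 5).
patch = extract_patch([[r * 10 + c for c in range(10)] for r in range(10)], 5, 5)
```

The patch, rather than the lone pixel value, then becomes the classifier input, which is what gives the patch-based RF and the 2D CNN their spatial context.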


Sensors ◽  
2020 ◽  
Vol 20 (4) ◽  
pp. 1188 ◽  
Author(s):  
Jianming Zhang ◽  
Chaoquan Lu ◽  
Jin Wang ◽  
Xiao-Guang Yue ◽  
Se-Jung Lim ◽  
...  

Many remote sensing scene classification algorithms improve their classification accuracy through additional modules, which increase the parameters and computing overhead of the model at the inference stage. In this paper, we explore how to improve the classification accuracy of a model without adding modules at the inference stage. First, we propose a network training strategy of training with multi-size images. Then, we introduce more supervision information through a triplet loss and design a dedicated branch for it. In addition, dropout is introduced between the feature extractor and the classifier to avoid over-fitting. These modules only work at the training stage and do not increase model parameters at the inference stage. We use ResNet18 as the baseline and add the three modules to it. We perform experiments on three datasets: AID, NWPU-RESISC45, and OPTIMAL. Experimental results show that our model combined with the three modules is more competitive than many existing classification algorithms. In addition, ablation experiments on OPTIMAL show that dropout, triplet loss, and training with multi-size images improve the overall accuracy of the model on the test set by 0.53%, 0.38%, and 0.7%, respectively; the combination of the three modules improves the overall accuracy by 1.61%. Thus the three modules improve classification accuracy without increasing model parameters at the inference stage, and training with multi-size images brings a greater gain than the other two modules, although the combination of all three is better still.
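The triplet loss used for the extra supervision has a standard form: pull an anchor's same-class (positive) embedding closer than its different-class (negative) embedding by at least a margin. A minimal sketch of that loss, not the paper's exact branch design:

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two embedding vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Zero when the negative is already at least `margin` farther than the positive;
    otherwise proportional to how badly the constraint is violated."""
    return max(0.0, squared_distance(anchor, positive)
                    - squared_distance(anchor, negative) + margin)

# Satisfied triplet: positive at distance^2 = 1, negative at distance^2 = 9.
print(triplet_loss([0, 0], [0, 1], [3, 0]))  # -> 0.0
```

Because the loss (and its branch) is only computed during training, it shapes the feature space without adding any parameters or compute at inference, which is the point the abstract makes.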

