base classifier
Recently Published Documents


TOTAL DOCUMENTS: 50 (FIVE YEARS: 23)

H-INDEX: 6 (FIVE YEARS: 3)

Author(s): Yange Sun, Han Shao, Bencai Zhang

Ensemble classification is an actively researched paradigm that has received much attention due to its increasing number of real-world applications. The crucial issue in ensemble learning is to construct a pool of base classifiers that are both accurate and diverse. In this paper, unlike conventional data-stream-oriented ensemble methods, we propose a novel Measure via both Accuracy and Diversity (MAD), rather than either criterion alone, to supervise ensemble learning. Based on MAD, a novel online ensemble method called Accuracy and Diversity weighted Ensemble (ADE) effectively handles concept drift in data streams. ADE uses the following three steps to construct a concept-drift-oriented ensemble for the current data window: 1) a new base classifier is constructed from the current concept when drift is detected, 2) MAD is used to measure the performance of the ensemble members, and 3) the newly built classifier replaces the worst base classifier. If the newly constructed classifier is itself the worst, no replacement occurs. Compared with state-of-the-art algorithms, ADE exceeds the best-performing competitor by 2.38% in average classification accuracy. Experimental results show that the proposed method can effectively adapt to different types of drift.
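The replacement step described in the abstract can be sketched as follows. This is an illustrative reconstruction, not the authors' implementation: the weighting `alpha`, the helper names `mad_score` and `replace_worst`, and the use of precomputed accuracy/diversity scores are all assumptions.

```python
# Hypothetical sketch of a MAD-style measure and the worst-member replacement
# rule: a new classifier replaces the lowest-scoring ensemble member unless
# the new classifier would itself be the worst.

def mad_score(accuracy, diversity, alpha=0.5):
    """Combine accuracy and diversity into one measure (assumed linear weighting)."""
    return alpha * accuracy + (1 - alpha) * diversity

def replace_worst(ensemble, new_member):
    """Replace the lowest-scoring member, unless the new member scores lower still."""
    scores = [mad_score(m["acc"], m["div"]) for m in ensemble]
    worst = min(range(len(ensemble)), key=scores.__getitem__)
    if mad_score(new_member["acc"], new_member["div"]) > scores[worst]:
        ensemble[worst] = new_member
    return ensemble
```

The key design point mirrored from the abstract is the guard clause: a drift-triggered classifier is only admitted if it beats the current worst member under the combined measure.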


2021, Vol 14 (12), pp. 612
Author(s): Jianan Zhu, Yang Feng

We propose a new ensemble classification algorithm, named super random subspace ensemble (Super RaSE), to tackle the sparse classification problem. The proposed algorithm is motivated by the random subspace ensemble algorithm (RaSE). The RaSE method was shown to be a flexible framework that can be coupled with any existing base classifier. However, the success of RaSE largely depends on a proper choice of base classifier, which is unfortunately unknown in advance. In this work, we show that Super RaSE avoids the need to choose a base classifier by randomly sampling a collection of classifiers together with the subspace. As a result, Super RaSE is more flexible and robust than RaSE. In addition to the vanilla Super RaSE, we also develop the iterative Super RaSE, which adaptively changes the base classifier distribution as well as the subspace distribution. We show that the Super RaSE algorithm and its iterative version perform competitively on a wide range of simulated data sets and two real data examples. The new Super RaSE algorithm and its iterative version are implemented in a new version of the R package RaSEn.
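The central idea — each ensemble member draws a random feature subspace *and* a random base learner type — can be sketched in a few lines. This is a simplified illustration, not the RaSEn implementation: the learner pool, subspace size, and majority-vote aggregation below are assumed choices.

```python
# Sketch of the Super RaSE idea: sample (learner type, feature subspace) pairs.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

def super_rase_fit(X, y, n_members=20, subspace_size=3, seed=0):
    rng = np.random.default_rng(seed)
    pool = [lambda: LogisticRegression(max_iter=200),       # assumed learner pool
            lambda: DecisionTreeClassifier(max_depth=3),
            lambda: KNeighborsClassifier(n_neighbors=5)]
    members = []
    for _ in range(n_members):
        feats = rng.choice(X.shape[1], size=subspace_size, replace=False)
        clf = pool[rng.integers(len(pool))]().fit(X[:, feats], y)
        members.append((feats, clf))
    return members

def super_rase_predict(members, X):
    votes = np.stack([clf.predict(X[:, feats]) for feats, clf in members])
    return (votes.mean(axis=0) > 0.5).astype(int)  # majority vote, 0/1 labels
```

Because each member may use a different learner, no single base classifier has to be chosen up front — the point the abstract emphasizes.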


Chemosensors, 2021, Vol 9 (12), pp. 353
Author(s): Xiaorui Dong, Shijing Han, Ancheng Wang, Kai Shang

Sensor drift is an objective and inevitable problem, and drift compensation has essential research significance. For long-term drift, we propose a data preprocessing method that differs from conventional research approaches, together with a machine learning framework that supports online self-training and data analysis without additional sensor production costs. The proposed preprocessing method effectively solves the problems of sign errors, decimal-point errors, and outliers in data samples. The framework, which we call inertial machine learning, takes advantage of the recent inertia of high classification accuracy to extend the reliability of sensors. We establish a reasonable memory and forgetting mechanism for the framework, and the choice of base classifier is not limited. In this paper, we use a support vector machine as the base classifier and use the gas sensor array drift dataset from the UCI machine learning repository for experiments. The experimental results show that classification accuracy is greatly improved, the effective lifetime of the sensor array is extended by 4–10 months, and the time for a single response and model adjustment is less than 300 ms, which fits practical application scenarios well. The research ideas and results in this paper offer a useful reference for related fields.
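A preprocessing pass of the kind described might look like the sketch below. The detection rules are assumptions for illustration only — here a sample's sign is repaired by taking absolute values (assuming responses should be positive), decimal-point errors are detected as values roughly 10x off the feature's median scale, and remaining outliers are clipped at a z-score bound; the paper's actual rules may differ.

```python
# Illustrative cleaning pass for sign errors, decimal-point errors, and outliers.
import numpy as np

def clean_feature(values, outlier_z=3.0):
    v = np.asarray(values, dtype=float)
    v = np.abs(v)                               # assumed sign-error repair
    med = np.median(v[v > 0])
    v = np.where(v > 8 * med, v / 10.0, v)      # assumed decimal shift: too large
    v = np.where((v > 0) & (v < med / 8), v * 10.0, v)  # too small
    mu, sd = v.mean(), v.std()                  # clip residual outliers
    return np.clip(v, mu - outlier_z * sd, mu + outlier_z * sd)
```

The ordering matters: scale repairs run before outlier clipping, so that a decimal-shifted reading is corrected rather than discarded.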



Algorithms, 2021, Vol 14 (9), pp. 260
Author(s): Naomi Simumba, Suguru Okami, Akira Kodaka, Naohiko Kohtake

Feature selection is crucial to the credit-scoring process, allowing for the removal of irrelevant variables with low predictive power. Conventional credit-scoring techniques treat this as a separate process in which features are selected to improve a single statistical measure, such as accuracy; however, recent research has focused on meaningful business parameters such as profit. More than one factor may be important to the selection process, making multi-objective optimization methods a necessity. However, the comparative performance of multi-objective methods is known to vary with the test problem and the specific implementation. This research employed a recent hybrid non-dominated sorting binary Grasshopper Optimization Algorithm (NSBGOA) and compared its performance on multi-objective feature selection for credit scoring to that of two popular benchmark algorithms in this space. A further comparison was made to determine the impact of changing the profit-maximizing base classifier on algorithm performance. Experiments demonstrate that, of the base classifiers used, the neural network classifier most improved the profit-based measure and minimized the mean number of features in the population. Additionally, the NSBGOA algorithm gave relatively smaller hypervolumes and increased computational time across all base classifiers, while giving the highest mean objective values for the solutions. It is clear that the base classifier has a significant impact on the results of multi-objective optimization; therefore, the choice of base classifier in such scenarios deserves careful consideration.
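The non-dominated sorting step underlying NSBGOA-style algorithms reduces to a Pareto-dominance test over the objectives. The sketch below shows that test for the two objectives named above — maximize profit, minimize the number of selected features; the solution encoding and function names are illustrative, not the paper's.

```python
# Minimal Pareto-front extraction for (maximize profit, minimize feature count).

def dominates(a, b):
    """a dominates b if it is no worse on both objectives and better on one."""
    no_worse = a["profit"] >= b["profit"] and a["n_features"] <= b["n_features"]
    strictly = a["profit"] > b["profit"] or a["n_features"] < b["n_features"]
    return no_worse and strictly

def pareto_front(solutions):
    return [s for s in solutions
            if not any(dominates(o, s) for o in solutions if o is not s)]
```

A multi-objective selector returns this whole front rather than one "best" feature subset, leaving the profit-versus-parsimony trade-off to the decision maker.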


Author(s): Nofriani, Novianto Budi Kurniawan

One way to report a country’s economic state is to compile economic phenomena from several sources. The collected data may then be explored based on their sentiments and economic categories. This research performed and analyzed multiple approaches to multi-label text classification, in addition to providing sentiment analysis of the economic phenomena. Sentiment and single-label category classification were performed using a logistic regression model. Meanwhile, multi-label category classification was carried out using a combination of logistic regression, support vector machines, k-nearest neighbors, naïve Bayes, and decision trees as base classifiers, with binary relevance, classifier chain, and label powerset as the implementation approaches. The results showed that logistic regression works well for sentiment and single-label classification, with classification accuracies of 80.08% and 92.71%, respectively. However, it works poorly as a base classifier in multi-label classification, with classification accuracy dropping to 13.35%, 15.40%, and 30.65% for binary relevance, classifier chain, and label powerset, respectively. Alternatively, naïve Bayes works best as a base classifier in the label powerset approach for multi-label classification, with a classification accuracy of 63.22%, followed by decision trees and support vector machines.
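Of the three multi-label strategies named, binary relevance is the simplest: fit one independent binary classifier per label. A from-scratch sketch (scikit-learn also provides `OneVsRestClassifier` and `ClassifierChain` for the first two strategies); the class name and synthetic data below are illustrative, not from the study.

```python
# Binary relevance: one independent binary classifier per label column.
import numpy as np
from sklearn.linear_model import LogisticRegression

class BinaryRelevance:
    def fit(self, X, Y):                     # Y: (n_samples, n_labels) 0/1 matrix
        self.models_ = [LogisticRegression(max_iter=500).fit(X, Y[:, j])
                        for j in range(Y.shape[1])]
        return self

    def predict(self, X):
        return np.column_stack([m.predict(X) for m in self.models_])
```

The limitation that classifier chains and label powerset address is visible here: each per-label model is fit in isolation, so correlations between labels are ignored.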


Author(s): Scott Wares, John Isaacs, Eyad Elyan

Concept drift detection algorithms have historically followed the convention of forcibly resetting the base classifiers for each detected drift. This approach prevents the underlying classifiers from becoming outdated as the distribution of a data stream shifts from one concept to another. In situations where both concept drift and temporal dependence are present within a data stream, however, forced resetting can complicate classifier evaluation: resetting the base classifier too frequently when temporal dependence is present can make classifier performance appear deceptively strong. In this research, a novel architectural method for determining base classifier resets, Burst Detection-based Selective Classifier Resetting (BD-SCR), is presented. BD-SCR statistically monitors changes in the temporal dependence of a data stream to determine whether a base classifier should be reset for a detected drift. The experimental process compares the predictive performance of state-of-the-art drift detectors against the “No-Change” detector, using BD-SCR to inform and control the resetting decision. Results show that BD-SCR effectively reduces the negative impact of temporal dependence during concept drift detection, clearly negating the deceptive performance of the “No-Change” detector while maintaining the predictive performance of state-of-the-art drift detection methods.
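The decision BD-SCR gates can be caricatured as follows. This is an assumed simplification, not the paper's statistic: temporal dependence is estimated here as the rate at which consecutive labels repeat, and the reset threshold is arbitrary. The point is only the control flow — a detected drift triggers a reset unless temporal dependence is high, in which case a "No-Change" predictor would already look deceptively accurate.

```python
# Illustrative reset gate driven by a temporal-dependence estimate.

def temporal_dependence(labels):
    """Fraction of consecutive label pairs that are identical."""
    pairs = list(zip(labels, labels[1:]))
    return sum(a == b for a, b in pairs) / len(pairs)

def should_reset(labels, drift_detected, threshold=0.8):
    """Reset only for drifts seen under low temporal dependence (assumed rule)."""
    return drift_detected and temporal_dependence(labels) < threshold
```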


2021, Vol 2021, pp. 1-16
Author(s): Gang Li, Mengdi Shen, Meixuan Li, Jingyi Cheng

Assessing customer default is an essential basis for personal credit issuance. This paper develops a personal credit default discrimination model based on a Super Learner heterogeneous ensemble to improve the accuracy and robustness of default discrimination. First, we select six single classifiers, such as logistic regression and SVM, and three homogeneous ensemble classifiers, such as random forest, to build a base classifier candidate library for the Super Learner. Then, we use ten-fold cross-validation to train the base classifiers and improve their robustness. We compute each base classifier’s total loss from the difference between its predicted and actual values and establish a weighted optimization model that solves for the optimal base classifier weights by minimizing the weighted total loss of all base classifiers. Thus, we obtain the heterogeneous ensembled Super Learner classifier. Finally, we use three real credit datasets from the UCI database (Australian, Japanese, and German) and the large credit dataset GMSC published on the Kaggle platform to test the ensembled Super Learner’s effectiveness, employing four commonly used evaluation indicators: accuracy, type I error rate, type II error rate, and AUC. Compared with the base classifiers’ results and with heterogeneous models such as Stacking and Bstacking, the results show that the ensembled Super Learner model achieves higher discrimination accuracy and robustness.
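The weighting step described above — find the combination of base learners minimizing total loss on cross-validated predictions — can be sketched for two base learners with a grid search over the single free weight. This is a toy illustration under assumed squared loss; real Super Learner implementations solve the weight problem over the full simplex as a convex program.

```python
# Toy Super Learner weighting: pick w minimising ||w*a + (1-w)*b - y||^2,
# where a and b are cross-validated predictions from two base learners.
import numpy as np

def super_learner_weight(cv_pred_a, cv_pred_b, y, grid=101):
    best_w, best_loss = 0.0, float("inf")
    for w in np.linspace(0.0, 1.0, grid):
        loss = np.mean((w * cv_pred_a + (1 - w) * cv_pred_b - y) ** 2)
        if loss < best_loss:
            best_w, best_loss = w, loss
    return best_w
```

Because the loss is computed on out-of-fold predictions, an overfit base learner does not automatically dominate the weights — the motivation for the cross-validation step in the abstract.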

