Experimental Study and Comparison of Imbalance Ensemble Classifiers with Dynamic Selection Strategy

Entropy ◽  
2021 ◽  
Vol 23 (7) ◽  
pp. 822
Author(s):  
Dongxue Zhao ◽  
Xin Wang ◽  
Yashuang Mu ◽  
Lidong Wang

Imbalance ensemble classification is one of the most essential and practical strategies for improving decision performance in data analysis. A growing body of literature on ensemble techniques for imbalance learning has appeared in recent years, and various extensions of imbalanced classification methods have been established from different points of view. The present study reviews the state-of-the-art ensemble classification algorithms for dealing with imbalanced datasets and offers a comprehensive analysis of incorporating the dynamic selection of base classifiers into classification. Experiments with 14 existing ensemble algorithms incorporating dynamic selection on 56 datasets reveal that classical algorithms with a dynamic selection strategy deliver a practical way to improve classification performance for both binary and multi-class imbalanced datasets. In addition, by combining patch learning with dynamic selection ensemble classification, a patch-ensemble classification method is designed, which uses the misclassified samples to train patch classifiers and thereby increase the diversity of base classifiers. The experimental results indicate that the designed method has a certain potential for improving the performance of multi-class imbalanced classification.
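To make the idea of dynamically selecting base classifiers concrete, the following is a minimal sketch in the spirit of K-Nearest Oracles Eliminate, not the procedure evaluated in the study. The toy pool of threshold rules, the validation points and the function names are all hypothetical and chosen only for illustration.

```python
from collections import Counter

def knn_indices(x, X_val, k):
    """Indices of the k validation points closest to x (squared Euclidean distance)."""
    dists = [sum((a - b) ** 2 for a, b in zip(x, v)) for v in X_val]
    return sorted(range(len(X_val)), key=lambda i: dists[i])[:k]

def dynamic_select_predict(x, classifiers, X_val, y_val, k=3):
    """Predict the label of x using only locally competent classifiers.

    A classifier is kept if it is correct on every neighbour of x in the
    validation set. If no classifier qualifies, the farthest neighbour is
    dropped and the check is repeated; if the neighbourhood becomes empty,
    the whole pool votes.
    """
    neigh = knn_indices(x, X_val, k)
    while neigh:
        competent = [clf for clf in classifiers
                     if all(clf(X_val[i]) == y_val[i] for i in neigh)]
        if competent:
            break
        neigh = neigh[:-1]  # shrink the region of competence
    else:
        competent = classifiers  # fallback: use the full pool
    votes = Counter(clf(x) for clf in competent)
    return votes.most_common(1)[0][0]

# Hypothetical toy pool: three rules on a single feature.
pool = [
    lambda p: int(p[0] > 0.5),  # boundary in the right place
    lambda p: 0,                # always predicts class 0
    lambda p: int(p[0] > 2.0),  # misplaced boundary
]
X_val = [(0.0,), (0.2,), (0.4,), (0.6,), (0.8,), (1.0,)]
y_val = [0, 0, 0, 1, 1, 1]

static_vote = Counter(clf((0.9,)) for clf in pool).most_common(1)[0][0]
dynamic_vote = dynamic_select_predict((0.9,), pool, X_val, y_val, k=3)
```

In this toy setting the static majority vote for the query `(0.9,)` is class 0, because two of the three rules never predict class 1 there, while the dynamic selection keeps only the rule that is correct in the neighbourhood and predicts class 1.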

AI ◽  
2020 ◽  
Vol 1 (2) ◽  
pp. 242-262 ◽  
Author(s):  
Chongya Song ◽  
Alexander Pons ◽  
Kang Yen

In the field of machine learning, an ensemble approach is often utilized as an effective means of improving on the accuracy of multiple weak base classifiers. A concern associated with these ensemble algorithms is that they can suffer from the Curse of Conflict, where a classifier’s true prediction is negated by another classifier’s false prediction during the consensus period. Another concern of the ensemble technique is that it cannot effectively mitigate the problem of Imbalanced Classification, where an ensemble classifier usually presents a similar magnitude of bias to the same class as its imbalanced base classifiers. We propose an improved ensemble algorithm called “Sieve” that overcomes the aforementioned shortcomings through the establishment of the novel concept of Global Consensus. The proposed Sieve ensemble approach was benchmarked against various ensemble classifiers, and was trained using different ensemble algorithms with the same base classifiers. The results demonstrate that better accuracy and stability were achieved.
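The conflict described above can be illustrated with plain majority voting. The sketch below does not implement Sieve or Global Consensus; it only counts, on hypothetical predictions, how often a correct base prediction is outvoted.

```python
from collections import Counter

def majority_vote(predictions):
    """Most common label; ties go to the label encountered first."""
    return Counter(predictions).most_common(1)[0][0]

def negated_correct_votes(per_classifier_preds, y_true):
    """Count samples where some base classifier was right but the vote was wrong."""
    count = 0
    # zip(*...) regroups the predictions sample by sample.
    for sample_preds, truth in zip(zip(*per_classifier_preds), y_true):
        if truth in sample_preds and majority_vote(sample_preds) != truth:
            count += 1
    return count

# Hypothetical predictions of three base classifiers on four samples.
preds = [
    [1, 0, 1, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
]
y_true = [1, 0, 1, 1]
conflicts = negated_correct_votes(preds, y_true)
```

Here the first and last samples each have one correct base prediction that the other two classifiers outvote, so `conflicts` is 2.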


Author(s):  
Sajad Emamipour ◽  
Rasoul Sali ◽  
Zahra Yousefi

Class imbalance learning has attracted great attention in recent years, as many real-world applications suffer from this problem. An imbalanced class distribution occurs when the number of training examples for one class far surpasses that of the other class, which is often the one of more interest. This problem may significantly degrade classifier performance, in particular for patterns belonging to the less represented classes. To address it, the authors developed a hybrid model for class imbalance learning with a focus on binary class problems. The model combines the benefits of ensemble classifiers with a multi-objective feature selection technique to achieve higher classification performance, and it also proposes non-dominated sets of features. The authors evaluate the proposed model by comparing its results with those of notable algorithms for imbalanced data problems. Finally, they apply the model in the medical domain to predict the post-operative life expectancy of thoracic surgery patients.
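A non-dominated set of feature subsets can be sketched as a Pareto filter. The example assumes two objectives to minimise (classification error and number of features); the subsets, their scores and the choice of objectives are hypothetical and are not taken from the article.

```python
def dominates(a, b):
    """True if a is no worse than b on every objective and strictly better on one."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))

def non_dominated(candidates):
    """Keep the feature subsets whose objective tuple no other subset dominates."""
    return {subset: obj for subset, obj in candidates.items()
            if not any(dominates(other, obj) for other in candidates.values())}

# Hypothetical feature subsets mapped to (error rate, number of features).
candidates = {
    ("f1",): (0.30, 1),
    ("f1", "f2"): (0.22, 2),
    ("f1", "f2", "f3"): (0.22, 3),
    ("f2",): (0.35, 1),
}
front = non_dominated(candidates)
```

In this toy example `("f1", "f2", "f3")` is dominated by `("f1", "f2")` (same error, fewer features) and `("f2",)` by `("f1",)`, so only two subsets remain on the front.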


Symmetry ◽  
2021 ◽  
Vol 13 (5) ◽  
pp. 818
Author(s):  
Eustace M. Dogo ◽  
Nnamdi I. Nwulu ◽  
Bhekisipho Twala ◽  
Clinton Aigbavboa

Automatic anomaly detection monitoring plays a vital role in water utilities’ distribution systems to reduce the risk posed by unclean water to consumers. One of the major problems with anomaly detection is imbalanced datasets. Dynamic selection techniques combined with ensemble models have proven to be effective for imbalanced dataset classification tasks. In this paper, water quality anomaly detection is formulated as a classification problem in the presence of class imbalance. To tackle this problem, considering the asymmetric dataset distribution between the majority and minority classes, the performance of sixteen previously proposed single and static ensemble classification methods embedded with resampling strategies is first optimised and compared. After that, six dynamic selection techniques, namely Modified Class Rank (Rank), Local Class Accuracy (LCA), Overall-Local Accuracy (OLA), K-Nearest Oracles Eliminate (KNORA-E), K-Nearest Oracles Union (KNORA-U) and Meta-Learning for Dynamic Ensemble Selection (META-DES), in combination with homogeneous and heterogeneous ensemble models, three SMOTE-based resampling algorithms (SMOTE, SMOTE+ENN and SMOTE+Tomek Links) and one missing data method (missForest), are proposed and evaluated. A binary real-world drinking-water quality anomaly detection dataset is utilised to evaluate the models. The experimental results reveal that all the models benefit from the combined optimisation of both the classifiers and resampling methods. Considering the three performance measures (balanced accuracy, F-score and G-mean), the results also show that the dynamic classifier selection (DCS) techniques, in particular the missForest+SMOTE+RANK and missForest+SMOTE+OLA models based on homogeneous ensemble-bagging with decision tree as the base classifier, exhibited better performance in terms of balanced accuracy and G-mean, while the Bg+mF+SMENN+LCA model based on homogeneous ensemble-bagging with random forest has a better overall F1-measure in comparison to the other models.
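The three performance measures named above can be computed from the binary confusion counts. The sketch below uses the standard definitions; the label vectors are hypothetical and only show how the measures differ from plain accuracy on imbalanced data.

```python
import math

def imbalance_metrics(y_true, y_pred, positive=1):
    """Balanced accuracy, F1 and G-mean for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)

    recall = tp / (tp + fn) if tp + fn else 0.0        # sensitivity
    specificity = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "balanced_accuracy": (recall + specificity) / 2,
        "f1": f1,
        "g_mean": math.sqrt(recall * specificity),
    }

# Hypothetical labels: 2 anomalies (class 1) among 10 samples.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]
scores = imbalance_metrics(y_true, y_pred)
```

On these toy labels plain accuracy is 0.8 although only one of the two anomalies is found; balanced accuracy (0.6875), F1 (0.5) and G-mean (about 0.66) expose the weak minority-class performance.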


Author(s):  
ANA CERNEA ◽  
JUAN LUIS FERNÁNDEZ-MARTÍNEZ

In this paper, we propose different ensemble learning algorithms and their application to the face recognition problem. Three types of attributes are used for image representation: statistical, spectral, and segmentation features and regional descriptors. Classification is performed by nearest neighbor using different p-norms defined in the corresponding spaces of attributes. In this approach, each attribute, together with its corresponding type of analysis (local or global) and the distance criterion (norm or cosine), defines a different classifier. The classification is unsupervised since no class information is used to improve the design of the different classifiers. Three different versions of ensemble classifiers are proposed in this paper: CAV1, CAV2, and CBAG. They differ mainly in how the image candidates that perform the consensus are selected. The main results shown in this paper are the following: (1) The statistical attributes (local histogram and percentiles) are the individual classifiers that provided the highest accuracies, followed by the spectral methods (DWT) and the regional features (texture analysis). (2) No single attribute is able to systematically provide 100% accuracy over the ORL database. (3) The accuracy and stability of the classification are increased by consensus classification (ensemble learning techniques). (4) Optimum results are obtained by reducing the number of classifiers taking into account their diversity, and by optimizing the parameters of these classifiers using a member of the Particle Swarm Optimization (PSO) family. These results are in accord with the conclusions presented in the literature on ensemble learning methodologies, that is, it is possible to build strong classifiers by assembling different weak (or simple) classifiers based on different and diverse image attributes. Due to these encouraging results, future research will be devoted to the use of supervised ensemble techniques in face recognition and in other important biometric problems.
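The idea that each distance criterion defines its own nearest-neighbor classifier, with a consensus taken over them, can be sketched as follows. This is not CAV1, CAV2 or CBAG; the two-dimensional attribute vectors, the labels and the plain majority consensus are assumptions made for illustration.

```python
import math
from collections import Counter

def p_norm_distance(u, v, p):
    """Minkowski distance of order p between two attribute vectors."""
    return sum(abs(a - b) ** p for a, b in zip(u, v)) ** (1.0 / p)

def cosine_distance(u, v):
    """One minus the cosine similarity; 1.0 if either vector is zero."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        return 1.0
    return 1.0 - dot / (nu * nv)

def nearest_label(query, gallery, labels, dist):
    """Label of the gallery vector closest to the query under dist."""
    best = min(range(len(gallery)), key=lambda i: dist(query, gallery[i]))
    return labels[best]

def consensus_label(query, gallery, labels, dists):
    """Majority vote over one nearest-neighbor classifier per distance."""
    votes = Counter(nearest_label(query, gallery, labels, d) for d in dists)
    return votes.most_common(1)[0][0]

# Hypothetical attribute vectors for three identities.
gallery = [(1.0, 0.0), (0.0, 1.0), (4.0, 4.0)]
labels = ["A", "B", "C"]
dists = [
    lambda u, v: p_norm_distance(u, v, 1),
    lambda u, v: p_norm_distance(u, v, 2),
    lambda u, v: p_norm_distance(u, v, 3),
    cosine_distance,
]
identity = consensus_label((0.9, 0.2), gallery, labels, dists)
```

In this toy case all four distance criteria agree on identity `"A"`; in practice the classifiers would be built on different attributes and would disagree more often, which is where the consensus matters.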


2014 ◽  
Vol 123 ◽  
pp. 424-435 ◽  
Author(s):  
Chen Lin ◽  
Wenqiang Chen ◽  
Cheng Qiu ◽  
Yunfeng Wu ◽  
Sridhar Krishnan ◽  
...  

2019 ◽  
Vol 11 (16) ◽  
pp. 1933 ◽  
Author(s):  
Yangyang Li ◽  
Ruoting Xing ◽  
Licheng Jiao ◽  
Yanqiao Chen ◽  
Yingte Chai ◽  
...  

Polarimetric synthetic aperture radar (PolSAR) image classification is a recent technology with great practical value in the field of remote sensing. However, because data collection is time-consuming and labor-intensive, few labeled datasets are available. Furthermore, most available state-of-the-art classification methods suffer heavily from speckle noise. To solve these problems, a novel semi-supervised algorithm based on self-training and superpixels is proposed in this paper. First, the Pauli-RGB image is over-segmented into superpixels to obtain a large number of homogeneous areas. Then, features that can mitigate the effects of speckle noise are obtained using spatial weighting within the same superpixel. Next, the training set is expanded iteratively using a semi-supervised unlabeled sample selection strategy that makes careful use of the spatial relations provided by superpixels. In addition, a stacked sparse auto-encoder is self-trained using the expanded training set to obtain classification results. Experiments on two typical PolSAR datasets verified its capability of suppressing speckle noise and showed excellent classification performance with limited labeled data.


Author(s):  
Antonio Giovannetti ◽  
Gianluca Susi ◽  
Paola Casti ◽  
Arianna Mencattini ◽  
Sandra Pusil ◽  
...  

In this paper, we present the novel Deep-MEG approach in which image-based representations of magnetoencephalography (MEG) data are combined with ensemble classifiers based on deep convolutional neural networks. To predict the early signs of Alzheimer’s disease (AD), functional connectivity (FC) measures between the brain bio-magnetic signals originating from spatially separated brain regions are used as MEG data representations for the analysis. After stacking the FC indicators relative to different frequency bands into multiple images, a deep transfer learning model is used to extract different sets of deep features and to derive improved classification ensembles. The proposed Deep-MEG architectures were tested on a set of resting-state MEG recordings and their corresponding magnetic resonance imaging scans, from a longitudinal study involving 87 subjects. Accuracy values of 89% and 87% were obtained, respectively, for the early prediction of AD conversion in a sample of 54 mild cognitive impairment subjects and in a sample of 87 subjects, including 33 healthy controls. These results indicate that the proposed Deep-MEG approach is a powerful tool for detecting early alterations in the spectral–temporal connectivity profiles and in their spatial relationships.


2020 ◽  
Author(s):  
Aristidis G. Vrahatis ◽  
Sotiris Tasoulis ◽  
Spiros Georgakopoulos ◽  
Vassilis Plagianakos

Nowadays, biomedical data are generated at an exponential rate, creating datasets of ultra-high dimensionality and complexity. This revolution, driven by recent advances in biotechnology, has led to big-data and data-driven computational approaches. An indicative example is the emerging single-cell RNA-sequencing (scRNA-seq) technology, which isolates and measures individual cells. Although scRNA-seq has revolutionized the biotechnology domain, the computational analysis of such data is a major challenge because of its ultra-high dimensionality and complexity. Following this direction, in this work we study the properties, effectiveness and generalization of the recently proposed MRPV algorithm for single-cell RNA-seq data. MRPV is an ensemble classification technique utilizing multiple ultra-low dimensional Random Projected spaces. A given classifier determines the class of each sample in each independent space, while a majority voting scheme defines the predominant class. We show that Random Projection ensembles offer a platform not only for low computational time analysis but also for enhancing classification performance. The developed methodologies were applied to four real high-dimensional biomedical datasets from single-cell RNA-seq studies and compared against well-known and similar classification tools. Experimental results showed that, based on simple tools, we can create a computationally fast, simple, yet effective approach for single-cell RNA-seq data with ultra-high dimensionality.
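The combination of several random low-dimensional projections with majority voting can be sketched as follows. This is not the authors' MRPV implementation: the Gaussian projection matrices, the nearest-centroid base classifier, the parameter values and the toy data are all assumptions made for illustration.

```python
import random
from collections import Counter, defaultdict

def random_projection(n_features, n_components, rng):
    """Random matrix with independent standard Gaussian entries."""
    return [[rng.gauss(0.0, 1.0) for _ in range(n_features)]
            for _ in range(n_components)]

def project(x, R):
    """Map x into the low-dimensional space defined by R."""
    return [sum(r * v for r, v in zip(row, x)) for row in R]

def class_centroids(Z, y):
    """Mean projected vector of each class."""
    groups = defaultdict(list)
    for z, label in zip(Z, y):
        groups[label].append(z)
    return {label: [sum(col) / len(col) for col in zip(*zs)]
            for label, zs in groups.items()}

def rp_ensemble_predict(X_train, y_train, x, n_spaces=15, n_components=2, seed=0):
    """Classify x in several random projected spaces and take a majority vote."""
    rng = random.Random(seed)
    votes = []
    for _ in range(n_spaces):
        R = random_projection(len(x), n_components, rng)
        centroids = class_centroids([project(v, R) for v in X_train], y_train)
        z = project(x, R)
        # Nearest centroid in the projected space casts this space's vote.
        votes.append(min(
            centroids,
            key=lambda c: sum((a - b) ** 2 for a, b in zip(z, centroids[c]))))
    return Counter(votes).most_common(1)[0][0]

# Hypothetical 6-dimensional toy data with two well-separated classes.
X_train = [
    [0, 0, 0, 0, 0, 0], [1, 0, 1, 0, 1, 0], [0, 1, 0, 1, 0, 1],
    [10, 10, 10, 10, 10, 10], [9, 10, 9, 10, 9, 10], [10, 9, 10, 9, 10, 9],
]
y_train = [0, 0, 0, 1, 1, 1]
high = rp_ensemble_predict(X_train, y_train, [9.5] * 6)
low = rp_ensemble_predict(X_train, y_train, [0.5] * 6)
```

Each projected space is built and classified independently, so the loop could run in parallel, and the work per space depends on the small projected dimension rather than on the original one.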


2021 ◽  
Vol 2021 ◽  
pp. 1-10
Author(s):  
Ziting Zhao ◽  
Tong Liu ◽  
Xudong Zhao

Machine learning plays an important role in computational intelligence and has been widely used in many engineering fields. Surface voids, or bugholes, frequently appear on concrete surfaces after the casting process, and their manual inspection is time-consuming, costly, labor-intensive, and inconsistent. To improve the inspection of concrete surfaces, automatic classification of concrete bugholes is needed. In this paper, a variable selection strategy is proposed for pursuing feature interpretability, together with an automatic ensemble classification designed to improve the accuracy of bughole classification. A texture feature derived from the Gabor filter and gray-level run lengths is extracted from concrete surface images. Interpretable variables, which are also the components of the feature, are selected according to a cumulative voting strategy. An ensemble classifier whose base classifier is assigned automatically is provided to detect whether or not a surface void exists in an image. Experimental results on 1000 image samples indicate the effectiveness of our method, with comparable prediction accuracy and an explicable model.

