Handling imbalanced data with concept drift by applying dynamic sampling and ensemble classification model

2020, Vol 153, pp. 553-560
Author(s): S. Ancy, D. Paulraj

2016, Vol 78 (12-2)
Author(s): Abbas Jalilvand, Naomie Salim

Document-level sentiment classification aims to automate the task of classifying a textual review on a single topic as expressing a positive or negative sentiment. In general, people express their opinions about an entity based on its characteristics, which may change over time; as target entities evolve, users' opinions change with them. However, existing sentiment classification approaches do not consider this evolution of users' opinions. They assume that instances are independent and identically distributed, drawn from a stationary distribution, whereas reviews are actually generated by a stream whose distribution changes over time. They use a static classification model that builds a classifier from a training set without considering the time at which reviews are posted. Yet posting time may be a useful feature for the classification task. In this paper, a stream sentiment classification framework is proposed to deal with concept drift and imbalanced data distribution using ensemble learning and instance selection methods. The experimental results show the effectiveness of the proposed method compared with static sentiment classification.
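The abstract does not give the details of its instance selection or ensemble scheme, so the following is only a minimal sketch of the general idea under stated assumptions: reviews are assumed to be already converted to numeric feature vectors and to arrive in labeled chunks; random undersampling stands in for instance selection; a toy nearest-centroid learner stands in for the base classifier; and the names `ChunkEnsemble`, `undersample`, and `NearestCentroid` are hypothetical. Each chunk is balanced, a new member is trained on it, and the oldest member is dropped once the ensemble is full, so the majority vote follows the most recent concept.

```python
import random
from collections import deque


def undersample(chunk, rng):
    """Randomly undersample the majority class so both classes are equal-sized.
    (Assumed stand-in for the paper's instance selection step.)"""
    pos = [s for s in chunk if s[1] == 1]
    neg = [s for s in chunk if s[1] == 0]
    if not pos or not neg:
        return chunk  # a single-class chunk cannot be balanced
    small, large = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    return small + rng.sample(large, len(small))


class NearestCentroid:
    """Toy base learner (hypothetical): predicts the label of the closer class centroid."""

    def fit(self, data):
        self.centroids = {}
        for label in (0, 1):
            pts = [x for x, y in data if y == label]
            if pts:
                self.centroids[label] = [sum(col) / len(pts) for col in zip(*pts)]
        return self

    def predict(self, x):
        def sq_dist(c):
            return sum((a - b) ** 2 for a, b in zip(x, c))
        return min(self.centroids, key=lambda lbl: sq_dist(self.centroids[lbl]))


class ChunkEnsemble:
    """Keeps classifiers trained on the most recent chunks; deque(maxlen) evicts the oldest."""

    def __init__(self, max_members=5, seed=0):
        self.members = deque(maxlen=max_members)
        self.rng = random.Random(seed)

    def update(self, chunk):
        balanced = undersample(chunk, self.rng)
        self.members.append(NearestCentroid().fit(balanced))

    def predict(self, x):
        votes = sum(m.predict(x) for m in self.members)
        return 1 if votes * 2 > len(self.members) else 0  # strict majority for label 1
```

Keeping only the newest members is the simplest response to drift; weighting members by their accuracy on the latest chunk is a common alternative.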


2021, Vol 11 (1)
Author(s): Cheng-Chung Li, Meng-Yun Wu, Ying-Chou Sun, Hung-Hsun Chen, Hsiu-Mei Wu, ...

The extraction of brain tumor tissues in 3D brain Magnetic Resonance Imaging (MRI) plays an important role in diagnosis before gamma knife radiosurgery (GKRS). In this article, post-contrast T1 whole-brain MRI images were collected by Taipei Veterans General Hospital (TVGH) and stored in DICOM format (dated from 1999 to 2018). The proposed method starts with an active contour model that automatically obtains the region of interest (ROI) and enhances the image contrast. The segmentation models are trained on MRI images containing tumors to avoid an imbalanced data problem during model construction. To achieve this objective, a two-step ensemble approach is used: first, classify whether the image contains any tumor, and second, segment the intracranial metastatic tumors with ensemble neural networks based on the 2D U-Net architecture. Combining the classification and segmentation ensembles also improves segmentation accuracy. The classification achieves an F1-measure of 75.64%, while the segmentation achieves an IoU of 84.83% and a Dice score of 86.21%. The method significantly reduces the time for manual labeling from 30 min to 18 s per patient.
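To make the two-step idea and the reported metrics concrete, here is a minimal sketch on flattened binary masks. It assumes the per-model tumor probabilities and per-pixel probability maps have already been produced (for example by the 2D U-Net ensemble, which is not reproduced here); averaging the ensemble outputs and using a 0.5 threshold are assumptions rather than details given in the abstract, and all function names are hypothetical.

```python
def ensemble_mask(prob_maps, threshold=0.5):
    """Average per-pixel probabilities across models, then threshold to a binary mask."""
    n = len(prob_maps)
    return [1 if sum(ps) / n >= threshold else 0 for ps in zip(*prob_maps)]


def two_step_predict(tumor_probs, prob_maps, threshold=0.5):
    """Step 1: the classifier ensemble decides whether the slice contains any tumor.
    Step 2: only if it does, return the segmentation ensemble's mask; else an empty mask."""
    if sum(tumor_probs) / len(tumor_probs) < threshold:
        return [0] * len(prob_maps[0])
    return ensemble_mask(prob_maps, threshold)


def iou(pred, truth):
    """Intersection over union of two binary masks (1.0 if both are empty)."""
    inter = sum(p & t for p, t in zip(pred, truth))
    union = sum(p | t for p, t in zip(pred, truth))
    return inter / union if union else 1.0


def dice(pred, truth):
    """Dice score: 2|A∩B| / (|A| + |B|) (1.0 if both masks are empty)."""
    inter = sum(p & t for p, t in zip(pred, truth))
    total = sum(pred) + sum(truth)
    return 2 * inter / total if total else 1.0
```

For a single pair of masks, Dice = 2·IoU / (1 + IoU), so Dice is never smaller than IoU; this is consistent with the Dice score above exceeding the IoU.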


Author(s): Snehlata Sewakdas Dongre, Latesh G. Malik

A data stream is a massive volume of data generated at a rapid, uncontrolled rate by many applications, such as call detail records, log records, and sensor applications. Data stream mining has attracted the attention of many researchers. A growing problem in data streams is handling concept drift: a good algorithm should adapt to changes in the data and handle concept drift properly. An ensemble classification method uses a group of classifiers that work collaboratively; instead of a single classifier, the group is used to improve classification accuracy. Overall, this chapter covers all aspects of data stream classification. Its aim is to discuss various techniques that use collaborative filtering for data stream mining, and its main concern is to familiarize the reader with the data stream domain and data stream mining. Collaborative filtering plays an important role here, in how the different classifiers work together within the ensemble to achieve a common goal.
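As one concrete example of classifiers collaborating inside an ensemble while the concept drifts, the sketch below implements the classic Weighted Majority algorithm of Littlestone and Warmuth: every member votes with a weight, and each member that mislabels an incoming example has its weight multiplied by a factor beta. This is an illustrative technique chosen here, not necessarily one the chapter describes; the class name, the fixed callable members, and the test-then-train `learn` loop are assumptions.

```python
class WeightedMajority:
    """Weighted Majority vote over fixed members; members that err are discounted by beta."""

    def __init__(self, experts, beta=0.5):
        self.experts = experts              # callables mapping an input x to a label
        self.weights = [1.0] * len(experts)
        self.beta = beta                    # penalty factor in (0, 1)

    def predict(self, x):
        score = {}
        for w, expert in zip(self.weights, self.experts):
            label = expert(x)
            score[label] = score.get(label, 0.0) + w
        return max(score, key=score.get)    # label with the largest total weight

    def learn(self, x, y):
        """Test-then-train step: predict first, then penalise every member that was wrong."""
        y_hat = self.predict(x)
        for i, expert in enumerate(self.experts):
            if expert(x) != y:
                self.weights[i] *= self.beta
        return y_hat
```

Because wrong members are discounted geometrically, members that fit the old concept lose influence after a few errors, so the ensemble can track drift without retraining its members.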


2020, Vol 34 (04), pp. 6680-6687
Author(s): Jian Yin, Chunjing Gan, Kaiqi Zhao, Xuan Lin, Zhe Quan, ...

Recently, imbalanced data classification has received much attention due to its wide applications. In the literature, existing research has attempted to improve classification performance by considering various factors, such as the imbalanced distribution, cost-sensitive learning, data space improvement, and ensemble learning. Nevertheless, most existing methods focus on only some of these main aspects. In this work, we propose a novel imbalanced data classification model that considers all of them. To evaluate the performance of the proposed model, we conducted experiments on 14 public datasets. The results show that our model outperforms state-of-the-art methods in terms of recall, G-mean, F-measure, and AUC.
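Recall, G-mean, and F-measure are used instead of plain accuracy because accuracy can look excellent when a classifier simply ignores the minority class. The sketch below computes these measures from confusion-matrix counts using their standard definitions (G-mean as the square root of sensitivity times specificity, F-measure as the harmonic mean of precision and recall), treating the minority class as positive. It is not the authors' evaluation code; the function name is hypothetical, and AUC is left out because it needs ranked scores rather than counts.

```python
import math


def imbalanced_metrics(tp, fn, fp, tn):
    """Standard imbalance-aware metrics from confusion counts (minority class = positive)."""
    recall = tp / (tp + fn) if tp + fn else 0.0          # sensitivity on the minority class
    specificity = tn / (tn + fp) if tn + fp else 0.0     # recall on the majority class
    precision = tp / (tp + fp) if tp + fp else 0.0
    g_mean = math.sqrt(recall * specificity)
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    return {"recall": recall, "g_mean": g_mean,
            "f_measure": f_measure, "accuracy": accuracy}
```

With 10 minority and 990 majority samples, predicting every sample as majority (`imbalanced_metrics(0, 10, 0, 990)`) gives an accuracy of 0.99 but a recall, G-mean, and F-measure of 0.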


2019, Vol 9 (20), pp. 4216
Author(s): Zhen Chen, Xiaoyan Han, Chengwei Fan, Zirun He, Xueneng Su, ...

In recent years, machine learning methods have shown great potential for real-time transient stability status prediction (TSSP). However, most existing studies overlook the imbalanced data problem in TSSP. To address this issue, this paper proposes a novel data segmentation-based ensemble classification (DSEC) method for TSSP. First, the effects of the imbalanced data problem on the decision boundary and classification performance of TSSP are investigated in detail. Then, a three-step DSEC method is presented. In the first step, a data segmentation strategy divides the stable samples into multiple non-overlapping stable subsets, ensuring that each stable subset contains no more samples than the unstable set; each stable subset is then combined with the unstable set to form a training subset. In the second step, an AdaBoost classifier is built on each training subset. In the final step, the decision values from all AdaBoost classifiers are aggregated to determine the transient stability status. Experiments are conducted on the Northeast Power Coordinating Council 140-bus system, and the simulation results indicate that the proposed approach can significantly improve the classification performance of TSSP with imbalanced data.
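Steps 1 and 3 of DSEC can be sketched as follows. This is a minimal illustration under assumptions: samples are (features, label) pairs with 0 = stable and 1 = unstable, the stable samples are shuffled before splitting, and the decision values are aggregated by a plain average where a positive mean means unstable (the abstract does not state the aggregation rule). Step 2, training one AdaBoost model per subset, is omitted; a library classifier that exposes decision values, such as scikit-learn's `AdaBoostClassifier` with its `decision_function` method, could fill that role. All function names are hypothetical.

```python
import random


def segment_stable(stable, n_unstable, seed=0):
    """Step 1a: split the stable (majority) samples into non-overlapping subsets,
    each holding at most n_unstable samples."""
    if n_unstable <= 0:
        raise ValueError("need at least one unstable sample")
    shuffled = list(stable)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i:i + n_unstable] for i in range(0, len(shuffled), n_unstable)]


def build_training_subsets(stable, unstable, seed=0):
    """Step 1b: pair every stable subset with the full unstable set."""
    return [subset + list(unstable)
            for subset in segment_stable(stable, len(unstable), seed)]


def aggregate(decision_values):
    """Step 3 (assumed rule): average the per-model decision values; positive -> unstable."""
    mean = sum(decision_values) / len(decision_values)
    return "unstable" if mean > 0 else "stable"
```

The last stable subset may be smaller than the unstable set, which still satisfies the constraint that no stable subset contains more samples than the unstable one.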

