Improving Online Aggregation Performance for Skewed Data Distribution

Author(s): Yuxiang Wang, Junzhou Luo, Aibo Song, Jiahui Jin, Fang Dong
2020, Vol 8 (5), pp. 3436-3440

Imbalanced data classification aims to predict a dependent variable when the class distribution is skewed. Such problems arise in many application areas, including medical disease diagnosis, risk management, and fault detection, and they remain challenging in machine learning and data mining. This paper proposes a K-Means cluster-based oversampling algorithm to address imbalanced data classification. The experimental results show that the proposed algorithm outperforms the oversampling algorithms from previous studies.
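The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea behind cluster-based oversampling, not the authors' method. It assumes a binary problem, clusters the minority class with plain Lloyd's k-means, and creates synthetic points by interpolating between two members of the same cluster. The names `kmeans`, `cluster_oversample`, `k`, and `seed` are hypothetical.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, n_iter=20, seed=0):
    """Plain Lloyd's k-means; returns one cluster label per point."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assign every point to its nearest center.
        labels = [min(range(k), key=lambda j: sq_dist(p, centers[j]))
                  for p in points]
        # Move each center to the mean of its members (skip empty clusters).
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def cluster_oversample(X, y, minority, k=2, seed=0):
    """Balance a binary dataset by adding synthetic minority points.

    Assumption (not from the source): each synthetic point is a random
    interpolation between two minority points from the same cluster.
    """
    rng = random.Random(seed)
    X_min = [x for x, label in zip(X, y) if label == minority]
    n_needed = (len(X) - len(X_min)) - len(X_min)
    if n_needed <= 0:
        return list(X), list(y)
    labels = kmeans(X_min, k, seed=seed)
    clusters = {}
    for p, l in zip(X_min, labels):
        clusters.setdefault(l, []).append(p)
    X_new, y_new = list(X), list(y)
    for _ in range(n_needed):
        # Picking a label from `labels` selects clusters in proportion to size.
        members = clusters[rng.choice(labels)]
        a, b = rng.choice(members), rng.choice(members)
        t = rng.random()
        X_new.append([ai + t * (bi - ai) for ai, bi in zip(a, b)])
        y_new.append(minority)
    return X_new, y_new
```

Interpolating only within a cluster keeps synthetic points inside regions the minority class already occupies, which is the usual motivation for clustering before oversampling.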


Author(s): Anjali S. More, Dipti P. Rana

Many data mining applications today face the challenge of imbalanced data classification and its impact on performance metrics. Skewed data distributions appear in a wide range of real-time applications and have drawn the attention of researchers. Fraud detection in finance, disease diagnosis in medicine, oil spill detection, electricity theft, anomaly and intrusion detection in security, and other real-time applications all involve uneven data distributions. Data imbalance degrades classification performance metrics and raises the error rate. These challenges have prompted researchers to investigate imbalanced data applications and related machine learning approaches. This work reviews a wide variety of imbalanced data applications with skewed distributions, covering both binary-class and multiclass imbalance, the problems encountered, the approaches used to resolve data imbalance, and possible open research areas.
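To make the effect on performance metrics concrete, here is a small illustrative sketch. The 990-to-10 class split and the always-negative classifier are assumptions chosen for illustration, not data from the review. It shows how accuracy can look strong while recall on the minority class is zero.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def recall(y_true, y_pred, positive=1):
    """Fraction of true positives that the classifier actually found."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    return tp / (tp + fn) if tp + fn else 0.0

# Hypothetical imbalanced dataset: 990 negatives, 10 positives.
y_true = [0] * 990 + [1] * 10
# A trivial classifier that always predicts the majority class.
y_pred = [0] * 1000

print(accuracy(y_true, y_pred))  # 0.99
print(recall(y_true, y_pred))    # 0.0
```

The classifier misses every minority example, yet its accuracy is 99 percent, which is why work on imbalanced data tends to report per-class metrics rather than accuracy alone.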


Author(s): Santha Subbulaxmi S, Arumugam G

Skewed data distributions prevail in many real-world applications. The skewness stems from imbalance in the class distribution, and it degrades the performance of traditional classification algorithms. In this paper, we present a Grey Wolf optimized K-Means cluster-based oversampling algorithm to handle the skewness and solve the imbalanced data classification problem. We run experiments on the proposed algorithm and compare it with popular benchmark algorithms. The results show that the proposed algorithm outperforms the other benchmark algorithms.


Data sharing among organizations is common in areas such as business promotion and marketing. Some of the sensitive rules that should be kept private may be disclosed, and such disclosure of sensitive patterns may harm the interests of the organization that owns the data. The sensitive rules must therefore be protected before the data is shared. To provide secure data sharing, this paper first perturbs the sensitive rules, which are found using a frequent pattern tree. The sensitive set of rules is perturbed by substitution. This kind of substitution reduces the risk and increases the utility of the dataset compared with other methods. Experiments are conducted on a real-world dataset. The results show that the proposed work performs better than various previous approaches on the evaluation parameters.

