Improving Online Aggregation Performance for Skewed Data Distribution

Imbalanced data classification problems seek to predict a dependent variable from a skewed data distribution. Such problems arise in many application areas, such as medical disease diagnosis, risk management, and fault detection, and remain challenging in machine learning and data mining. In this paper, a K-Means cluster based oversampling algorithm is proposed to solve the imbalanced data classification problem. The experimental results show that the proposed algorithm outperforms the existing oversampling algorithms of previous studies.
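As a rough illustration of what cluster-based oversampling can look like, the sketch below clusters the minority class with a plain K-Means loop and then creates synthetic samples by interpolating between two points of the same cluster. This is a generic sketch under stated assumptions, not the algorithm proposed in the paper: the function names (`kmeans`, `oversample`), the interpolation rule, and the toy data are all hypothetical.

```python
import random


def kmeans(points, k, iters=20, rng=None):
    """Plain Lloyd-style K-Means on a list of equal-length tuples (illustrative only)."""
    rng = rng or random.Random(0)
    centroids = rng.sample(points, k)  # pick k distinct starting points
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Assign each point to its nearest centroid (squared Euclidean distance).
            nearest = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
            clusters[nearest].append(p)
        for c, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return clusters


def oversample(minority, n_new, k=2, seed=0):
    """Create n_new synthetic minority samples inside K-Means clusters.

    Assumption: each synthetic sample is a random point on the segment
    between two members of the same cluster. The paper may use a different rule.
    """
    rng = random.Random(seed)
    clusters = [c for c in kmeans(minority, k, rng=rng) if c]
    synthetic = []
    while len(synthetic) < n_new:
        cluster = rng.choice(clusters)
        a, b = rng.choice(cluster), rng.choice(cluster)
        t = rng.random()
        synthetic.append(tuple(x + t * (y - x) for x, y in zip(a, b)))
    return synthetic


# Hypothetical minority-class samples forming two small groups.
minority = [(1.0, 1.0), (1.2, 0.9), (5.0, 5.0), (5.1, 4.8)]
new_samples = oversample(minority, n_new=6)
print(len(new_samples))
```

Interpolating only within a cluster keeps synthetic points close to existing minority regions instead of spreading them across the gap between groups.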

Review of Imbalanced Data Classification and Approaches Relating to Real-Time Applications

Data Preprocessing, Active Learning, and Cost Perceptive Approaches for Resolving Data Imbalance - Advances in Data Mining and Database Management ◽ 10.4018/978-1-7998-7371-6.ch001 ◽ 2021 ◽ pp. 1-22

Author(s): Anjali S. More ◽ Dipti P. Rana

Keyword(s): Real Time ◽ Performance Metrics ◽ Data Distribution ◽ Imbalanced Data ◽ Data Classification ◽ Disease Diagnosis ◽ Skewed Data ◽ Data Imbalance ◽ Imbalanced Data Classification ◽ Real Time Applications

Many data mining applications today face the challenge of imbalanced data classification and its impact on performance metrics. Skewed data distributions occur in a wide range of real-time applications, which has drawn the attention of researchers. Fraud detection in finance, disease diagnosis in medicine, oil spill detection, electricity theft detection, anomaly and intrusion detection in security, and other real-time applications all involve uneven data distributions. Data imbalance degrades classification performance metrics and increases the error rate. These challenges have prompted researchers to investigate imbalanced data applications and related machine learning approaches. This work reviews a wide variety of imbalanced data applications with skewed data distributions, covering both binary-class and multiclass imbalance, the problems encountered, the approaches used to resolve data imbalance, and possible open research areas.
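The effect of imbalance on performance metrics can be seen in a small worked example. The sketch below assumes a hypothetical data set with 95 negative and 5 positive samples and a trivial classifier that always predicts the majority class; the helper names (`accuracy`, `recall`, `balanced_accuracy`) are illustrative and not taken from the chapter.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


def recall(y_true, y_pred, cls):
    """Fraction of samples of class `cls` that were predicted as `cls`."""
    actual = sum(1 for t in y_true if t == cls)
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
    return hits / actual if actual else 0.0


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so each class counts equally."""
    classes = sorted(set(y_true))
    return sum(recall(y_true, y_pred, c) for c in classes) / len(classes)


# Hypothetical skewed data: 95 negatives (0) and 5 positives (1).
y_true = [0] * 95 + [1] * 5
# A trivial classifier that always predicts the majority class.
y_pred = [0] * 100

print(accuracy(y_true, y_pred))           # 0.95
print(recall(y_true, y_pred, cls=1))      # 0.0
print(balanced_accuracy(y_true, y_pred))  # 0.5
```

Plain accuracy looks high even though every minority sample is misclassified, which is why per-class metrics are usually reported for skewed data.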

GWO Optimized K-Means Cluster based Oversampling Algorithm

INTERNATIONAL JOURNAL OF NEXT-GENERATION COMPUTING ◽ 10.47164/ijngc.v12i3.694 ◽ 2021 ◽ Vol 12 (3)

Author(s): Santha Subbulaxmi S ◽ Arumugam G

Keyword(s): Data Distribution ◽ Imbalanced Data ◽ Classification Problem ◽ The Other ◽ Classification Algorithms ◽ Skewed Data ◽ Grey Wolf ◽ Imbalanced Data Classification ◽ Traditional Classification ◽ Real World Applications

Skewed data distributions prevail in many real-world applications. The skewness is due to imbalance in the class distribution, and it degrades the performance of traditional classification algorithms. In this paper, we provide a Grey Wolf optimized K-Means cluster based oversampling algorithm to handle the skewness and solve the imbalanced data classification problem. Experiments are conducted on the proposed algorithm, and it is compared with popular benchmark algorithms. The results reveal that the proposed algorithm outperforms the other benchmark algorithms.

A Synergistic Approach to Enhance the Accuracy-interpretability Trade-off of the NECLASS Classifier for Skewed Data Distribution

Proceedings of the 11th International Joint Conference on Computational Intelligence ◽ 10.5220/0008072503250334 ◽ 2019

Author(s): Jamileh Yousefi ◽ Andrew Hamilton-Wright ◽ Charlie Obimbo

Keyword(s): Data Distribution ◽ Skewed Data ◽ Trade Off ◽ Synergistic Approach

An Adaptive Data Distribution Through Tree Rules in Frequent Pattern Mining

International Journal of Scientific Research in Computer Science Engineering and Information Technology ◽ 10.32628/cseit183894 ◽ 2018 ◽ pp. 300-305

Keyword(s): Information Sharing ◽ Pattern Mining ◽ Data Distribution ◽ Frequent Pattern Mining ◽ Frequent Pattern ◽ General Development ◽ Secure Information ◽ Evaluation Parameters ◽ Secure Information Sharing

Information sharing among organizations is a common practice in areas such as business development and marketing. Some of the sensitive rules that should be kept private may be disclosed, and such disclosure of sensitive patterns may harm the interests of the organization that owns the data. The sensitive rules must therefore be protected before the data is shared. In this paper, to provide secure information sharing, the sensitive rules found by a frequent pattern tree are first perturbed. Here the sensitive set of rules is perturbed by substitution. This kind of substitution reduces the risk and increases the utility of the dataset compared with other techniques. Experiments are carried out on a real-world dataset. The results show that the proposed work performs better than various previous approaches on the basis of the evaluation parameters.
