Credit Card Fraud Detection: An Exploration of Different Sampling Methods to Solve the Class Imbalance Problem

This paper proposes an intelligent credit card fraud detection model for detecting fraud from highly imbalanced and anonymous credit card transaction datasets. The class imbalance problem is handled by finding legal as well as fraud transaction patterns for each customer by usingfrequent itemset mining.A matching algorithm is also proposed to find to which pattern (legal or fraud) the incoming transaction of a particular customer is closer and a decision is made accordingly. In order to handle the anonymous nature of the data, no preference is given to any of the attributes and each attribute is considered equally for finding the patterns. The performance evaluation of the proposed model is done on UCSD Data Mining Contest 2009 Dataset (anonymous and imbalanced) and it is found that the proposed model has very high fraud detection rate, balanced classification rate, Matthews correlation coefficient, and very less false alarm rate than other state-of-the-art classifiers.

Download Full-text

An approach to class imbalance problem based on stacking and inverse random under sampling methods

2018 IEEE 15th International Conference on Networking, Sensing and Control (ICNSC) ◽

10.1109/icnsc.2018.8361344 ◽

2018 ◽

Cited By ~ 4

Author(s):

Yuwei Zhang ◽

Guanjun Liu ◽

Wenjing Luan ◽

Chungang Yan ◽

Changjun Jiang

Keyword(s):

Sampling Methods ◽

Class Imbalance ◽

Class Imbalance Problem ◽

Imbalance Problem ◽

Under Sampling

Download Full-text

Data Sampling Methods to Deal With the Big Data Multi-Class Imbalance Problem

Applied Sciences ◽

10.3390/app10041276 ◽

2020 ◽

Vol 10 (4) ◽

pp. 1276 ◽

Cited By ~ 4

Author(s):

Eréndira Rendón ◽

Roberto Alejo ◽

Carlos Castorena ◽

Frank J. Isidro-Ortega ◽

Everardo E. Granda-Gutiérrez

Keyword(s):

Neural Network ◽

Big Data ◽

Deep Learning ◽

Sampling Methods ◽

Hybrid Approach ◽

Class Imbalance ◽

Data Sampling ◽

Class Imbalance Problem ◽

Imbalance Problem ◽

Output Noise

The class imbalance problem has been a hot topic in the machine learning community in recent years. Nowadays, in the time of big data and deep learning, this problem remains in force. Much work has been performed to deal to the class imbalance problem, the random sampling methods (over and under sampling) being the most widely employed approaches. Moreover, sophisticated sampling methods have been developed, including the Synthetic Minority Over-sampling Technique (SMOTE), and also they have been combined with cleaning techniques such as Editing Nearest Neighbor or Tomek’s Links (SMOTE+ENN and SMOTE+TL, respectively). In the big data context, it is noticeable that the class imbalance problem has been addressed by adaptation of traditional techniques, relatively ignoring intelligent approaches. Thus, the capabilities and possibilities of heuristic sampling methods on deep learning neural networks in big data domain are analyzed in this work, and the cleaning strategies are particularly analyzed. This study is developed on big data, multi-class imbalanced datasets obtained from hyper-spectral remote sensing images. The effectiveness of a hybrid approach on these datasets is analyzed, in which the dataset is cleaned by SMOTE followed by the training of an Artificial Neural Network (ANN) with those data, while the neural network output noise is processed with ENN to eliminate output noise; after that, the ANN is trained again with the resultant dataset. Obtained results suggest that best classification outcome is achieved when the cleaning strategies are applied on an ANN output instead of input feature space only. Consequently, the need to consider the classifier’s nature when the classical class imbalance approaches are adapted in deep learning and big data scenarios is clear.

Download Full-text

A Hybrid Method with Dynamic Weighted Entropy for Handling the Problem of Class Imbalance with Overlap in Credit Card Fraud Detection

Expert Systems with Applications ◽

10.1016/j.eswa.2021.114750 ◽

2021 ◽

pp. 114750

Author(s):

Zhenchuan Li ◽

Mian Huang ◽

Guanjun Liu ◽

Changjun Jiang

Keyword(s):

Hybrid Method ◽

Credit Card ◽

Class Imbalance ◽

Fraud Detection ◽

Credit Card Fraud ◽

Weighted Entropy

Download Full-text

Improvising Balancing Methods for Classifying Imbalanced Data

International Journal for Research in Applied Science and Engineering Technology ◽

10.22214/ijraset.2021.38225 ◽

2021 ◽

Vol 9 (9) ◽

pp. 1535-1543

Author(s):

Himani Tiwari

Keyword(s):

Learning Community ◽

Sampling Methods ◽

Evaluation Criteria ◽

Class Imbalance ◽

Imbalanced Data ◽

Simulated Data ◽

Data Sets ◽

Class Imbalance Problem ◽

Imbalance Problem ◽

Under Sampling

Abstract: Class Imbalance problem is one of the most challenging problems faced by the machine learning community. As we refer the imbalance to various instances in class of being relatively low as compare to other data. A number of over - sampling and under-sampling approaches have been applied in an attempt to balance the classes. This study provides an overview of the issue of class imbalance and attempts to examine various balancing methods for dealing with this problem. In order to illustrate the differences, an experiment is conducted using multiple simulated data sets for comparing the performance of these oversampling methods on different classifiers based on various evaluation criteria. In addition, the effect of different parameters, such as number of features and imbalance ratio, on the classifier performance is also evaluated. Keywords: Imbalanced learning, Over-sampling methods, Under-sampling methods, Classifier performances, Evaluationmetrices

Download Full-text

Probabilistic Support Vector Regression Classification Model for Credit Card Fraud Detection

International Journal of Computer Sciences and Engineering ◽

10.26438/ijcse/v6i9.840843 ◽

2018 ◽

Vol 6 (9) ◽

pp. 840-843

Author(s):

Nikita Sawhney ◽

B. Kaur ◽

H. Kaur

Keyword(s):

Support Vector Regression ◽

Credit Card ◽

Fraud Detection ◽

Classification Model ◽

Support Vector ◽

Credit Card Fraud ◽

Probabilistic Support

Download Full-text

Application of GA Feature Selection on Naive Bayes, Random Forest and SVM for Credit Card Fraud Detection

2020 International Conference on Decision Aid Sciences and Application (DASA) ◽

10.1109/dasa51403.2020.9317228 ◽

2020 ◽

Author(s):

Yakub K. Saheed ◽

Moshood A. Hambali ◽

Micheal O. Arowolo ◽

Yinusa A. Olasupo

Keyword(s):

Feature Selection ◽

Random Forest ◽

Credit Card ◽

Naive Bayes ◽

Fraud Detection ◽

Naïve Bayes ◽

Credit Card Fraud

Download Full-text

Detection of Myocardial Infarction Using ECG and Multi-Scale Feature Concatenate

Sensors ◽

10.3390/s21051906 ◽

2021 ◽

Vol 21 (5) ◽

pp. 1906

Author(s):

Jia-Zheng Jian ◽

Tzong-Rong Ger ◽

Han-Hua Lai ◽

Chi-Ming Ku ◽

Chiung-An Chen ◽

...

Keyword(s):

Myocardial Infarction ◽

Network Structure ◽

Class Imbalance ◽

Class Imbalance Problem ◽

Multi Scale ◽

Imbalance Problem ◽

Average Accuracy ◽

Significant Difference ◽

Electrocardiogram Ecg

Diverse computer-aided diagnosis systems based on convolutional neural networks were applied to automate the detection of myocardial infarction (MI) found in electrocardiogram (ECG) for early diagnosis and prevention. However, issues, particularly overfitting and underfitting, were not being taken into account. In other words, it is unclear whether the network structure is too simple or complex. Toward this end, the proposed models were developed by starting with the simplest structure: a multi-lead features-concatenate narrow network (N-Net) in which only two convolutional layers were included in each lead branch. Additionally, multi-scale features-concatenate networks (MSN-Net) were also implemented where larger features were being extracted through pooling the signals. The best structure was obtained via tuning both the number of filters in the convolutional layers and the number of inputting signal scales. As a result, the N-Net reached a 95.76% accuracy in the MI detection task, whereas the MSN-Net reached an accuracy of 61.82% in the MI locating task. Both networks give a higher average accuracy and a significant difference of p < 0.001 evaluated by the U test compared with the state-of-the-art. The models are also smaller in size thus are suitable to fit in wearable devices for offline monitoring. In conclusion, testing throughout the simple and complex network structure is indispensable. However, the way of dealing with the class imbalance problem and the quality of the extracted features are yet to be discussed.

Download Full-text

A Novel Focal Phi Loss for Power Line Segmentation with Auxiliary Classifier U-Net

Sensors ◽

10.3390/s21082803 ◽

2021 ◽

Vol 21 (8) ◽

pp. 2803

Author(s):

Rabeea Jaffari ◽

Manzoor Ahmed Hashmani ◽

Constantino Carlos Reyes-Aldasoro

Keyword(s):

Loss Function ◽

Class Imbalance ◽

Power Line ◽

Aerial Images ◽

Class Imbalance Problem ◽

Trade Off ◽

Urban Scenes ◽

Imbalance Problem ◽

A Minor ◽

Evaluation Parameters

The segmentation of power lines (PLs) from aerial images is a crucial task for the safe navigation of unmanned aerial vehicles (UAVs) operating at low altitudes. Despite the advances in deep learning-based approaches for PL segmentation, these models are still vulnerable to the class imbalance present in the data. The PLs occupy only a minimal portion (1–5%) of the aerial images as compared to the background region (95–99%). Generally, this class imbalance problem is addressed via the use of PL-specific detectors in conjunction with the popular class balanced cross entropy (BBCE) loss function. However, these PL-specific detectors do not work outside their application areas and a BBCE loss requires hyperparameter tuning for class-wise weights, which is not trivial. Moreover, the BBCE loss results in low dice scores and precision values and thus, fails to achieve an optimal trade-off between dice scores, model accuracy, and precision–recall values. In this work, we propose a generalized focal loss function based on the Matthews correlation coefficient (MCC) or the Phi coefficient to address the class imbalance problem in PL segmentation while utilizing a generic deep segmentation architecture. We evaluate our loss function by improving the vanilla U-Net model with an additional convolutional auxiliary classifier head (ACU-Net) for better learning and faster model convergence. The evaluation of two PL datasets, namely the Mendeley Power Line Dataset and the Power Line Dataset of Urban Scenes (PLDU), where PLs occupy around 1% and 2% of the aerial images area, respectively, reveal that our proposed loss function outperforms the popular BBCE loss by 16% in PL dice scores on both the datasets, 19% in precision and false detection rate (FDR) values for the Mendeley PL dataset and 15% in precision and FDR values for the PLDU with a minor degradation in the accuracy and recall values. Moreover, our proposed ACU-Net outperforms the baseline vanilla U-Net for the characteristic evaluation parameters in the range of 1–10% for both the PL datasets. Thus, our proposed loss function with ACU-Net achieves an optimal trade-off for the characteristic evaluation parameters without any bells and whistles. Our code is available at Github.

Download Full-text