Applying and comparing multiple machine learning techniques to detect fraudulent credit card transactions

Author(s):  
D.V. Berezkin ◽  
Shi Runfang ◽  
Li Tengjiao

This experiment compared the performance of four machine learning algorithms in detecting bank card fraud. The strong class imbalance in the training sample was taken into account, as well as differences in transaction amounts, and the ability of the different machine learning methods to recognize fraudulent behavior was assessed with these features in mind. It was found that a method that scores well on classification metrics is not necessarily the best in terms of economic losses; logistic regression is a clear example of this. The results of this work show that bank card fraud detection cannot be treated as a simple classification problem, and AUC is not the most appropriate metric for fraud detection tasks. The final choice of model depends on the needs of the bank, that is, on which of the two types of errors (FN, FP) leads to larger economic losses. If the bank considers the loss from identifying fraudulent transactions as regular ones to be the main one, it should choose the algorithm with the lowest FN count, which in this experiment is AdaBoost. If the bank also considers the negative impact of identifying regular transactions as fraudulent to be important, it should choose an algorithm with relatively small FN and FP counts; in this experiment, the random forest had the best overall performance. Further, by estimating the economic losses caused by false positives (identifying an ordinary transaction as fraudulent), a quantitative analysis of the losses incurred by each algorithm can be used to select the optimal model.
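The cost-based selection described above can be sketched as follows. This is a minimal illustration, not the paper's actual procedure: the confusion-matrix counts and the unit costs per FN and FP are hypothetical stand-ins.

```python
# Illustrative sketch: choosing a fraud model by expected economic loss
# rather than by classification metrics alone. All counts and costs
# below are hypothetical, not the paper's actual figures.

def economic_loss(fn, fp, cost_fn, cost_fp):
    """Total loss: missed fraud (FN) plus wrongly blocked transactions (FP)."""
    return fn * cost_fn + fp * cost_fp

# Hypothetical confusion-matrix counts per model on the same test set.
models = {
    "logistic_regression": {"fn": 40, "fp": 15},
    "adaboost":            {"fn": 22, "fp": 60},
    "random_forest":       {"fn": 28, "fp": 20},
}

COST_FN = 500.0   # assumed average loss when fraud slips through
COST_FP = 20.0    # assumed handling cost when a normal transaction is flagged

losses = {name: economic_loss(m["fn"], m["fp"], COST_FN, COST_FP)
          for name, m in models.items()}
best = min(losses, key=losses.get)
```

With these (made-up) costs the lowest-FN model wins; shrinking `COST_FN` relative to `COST_FP` shifts the choice toward the low-FP model, which is exactly the bank-dependent trade-off the abstract describes.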

2022 ◽  
Author(s):  
Kingsley Austin

Abstract— Credit card fraud is a serious problem for e-commerce retailers, with UK merchants reporting losses of $574.2M in 2020. As a result, effective fraud detection systems must be in place to ensure that payments are processed securely in an online environment. According to the literature, detecting credit card fraud is challenging due to dataset imbalance (genuine versus fraudulent transactions), real-time processing requirements, and the dynamic behavior of fraudsters and customers. This paper proposes that machine learning could be an effective solution for combating credit card fraud. Research shows that machine learning techniques can help overcome these challenges while ensuring a high detection rate of fraudulent transactions, both directly and indirectly. Although both supervised and unsupervised machine learning algorithms have been suggested, the flaws in both methods point to the need for hybrid approaches.


2020 ◽  
Vol 11 (3) ◽  
pp. 80-105 ◽  
Author(s):  
Vijay M. Khadse ◽  
Parikshit Narendra Mahalle ◽  
Gitanjali R. Shinde

The emerging area of the internet of things (IoT) generates a large amount of data from IoT applications such as health care, smart cities, etc. This data needs to be analyzed in order to derive useful inferences, and machine learning (ML) plays a significant role in such analysis. It is difficult to select the optimal algorithm from the available set of algorithms/classifiers, because the performance of algorithms differs when applied to datasets from different application domains. It is also difficult to tell whether a difference in performance is real or due to random variation in the test data, the training data, or the internal randomness of the learning algorithms. This study takes these issues into account during a comparison of ML algorithms for binary and multivariate classification, and provides guidelines for the statistical validation of results. The results obtained show that the accuracy of one algorithm differs from the others by more than the critical difference (CD) over binary and multivariate datasets from different application domains.
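The critical-difference comparison mentioned above follows the standard Friedman/Nemenyi procedure: rank the classifiers on each dataset, average the ranks, and declare two classifiers significantly different if their average ranks differ by more than CD. A minimal sketch, with hypothetical accuracies and `q_alpha = 2.343` (the Nemenyi critical value for k = 3 classifiers at alpha = 0.05):

```python
import math

# Sketch of a critical-difference (CD) comparison. Accuracy values are
# hypothetical; only the ranking/CD mechanics are illustrated.

# accuracies[dataset][algorithm]
accuracies = {
    "ds1": {"svm": 0.91, "rf": 0.93, "knn": 0.88},
    "ds2": {"svm": 0.85, "rf": 0.89, "knn": 0.84},
    "ds3": {"svm": 0.78, "rf": 0.80, "knn": 0.82},
    "ds4": {"svm": 0.90, "rf": 0.94, "knn": 0.87},
}

algos = ["svm", "rf", "knn"]
rank_sums = {a: 0.0 for a in algos}
for scores in accuracies.values():
    ordered = sorted(algos, key=lambda a: -scores[a])  # rank 1 = best
    for r, a in enumerate(ordered, start=1):
        rank_sums[a] += r
avg_rank = {a: rank_sums[a] / len(accuracies) for a in algos}

k, n = len(algos), len(accuracies)
cd = 2.343 * math.sqrt(k * (k + 1) / (6.0 * n))  # Nemenyi CD

# Two algorithms differ significantly only if their average ranks
# differ by more than CD.
significant = abs(avg_rank["rf"] - avg_rank["knn"]) > cd
```

With only four datasets the CD is large, so even a clear ranking gap may not be significant; this is the point of validating apparent performance differences statistically.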


Fraudulent credit card transactions cause significant financial losses. As online transactions have grown rapidly, digitalized payment processes account for an enormous share of such transactions, so financial institutions, including banks, attach great value to fraud detection applications. Fraudulent transactions can occur in different ways and in various categories. Our work focuses on detecting illegal transactions effectively. These transactions are addressed by employing several machine learning models, and the most efficient method is chosen through an evaluation using suitable performance metrics. This work thus helps to select an optimal algorithm from among the machine learning algorithms considered. In the existing system, the algorithms provide low efficiency and make model training slow. Hence, in the proposed system we use a Multilayer Perceptron and a Random Forest to achieve high efficiency, and the more efficient of the two is chosen through evaluation.
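The metric-based selection step can be sketched in a few lines. The predictions below are hypothetical stand-ins for the Multilayer Perceptron and Random Forest outputs (1 = fraudulent, 0 = legitimate); only the evaluation mechanics are illustrated.

```python
# Sketch: pick between two fraud models by precision/recall/F1 on the
# positive (fraud) class. Labels and predictions are hypothetical.

def prf(y_true, y_pred):
    """Precision, recall and F1 for the fraud (positive) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

y_true = [1, 0, 0, 1, 1, 0, 0, 0, 1, 0]
preds = {
    "mlp":           [1, 0, 1, 1, 0, 0, 0, 0, 1, 0],
    "random_forest": [1, 0, 0, 1, 1, 0, 1, 0, 1, 0],
}
scores = {name: prf(y_true, p) for name, p in preds.items()}
best = max(scores, key=lambda name: scores[name][2])  # select by F1
```

F1 on the minority (fraud) class is a more informative selection criterion here than plain accuracy, which the severe class imbalance would inflate.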


2021 ◽  
Vol 11 (4) ◽  
pp. 286-290
Author(s):  
Md. Golam Kibria ◽  
Mehmet Sevkli

The increase in credit card defaulters has forced companies to think carefully before approving credit applications. Credit card companies usually use their own judgment to determine whether a credit card should be issued to a customer satisfying certain criteria, and some machine learning algorithms have also been used to support the decision. The main objective of this paper is to build a deep learning model, based on UCI (University of California, Irvine) data sets, that can support the credit card approval decision. Secondly, the performance of the built model is compared with two traditional machine learning algorithms: logistic regression (LR) and support vector machine (SVM). Our results show that the overall performance of our deep learning model is slightly better than that of the other two models.


Data ◽  
2019 ◽  
Vol 4 (2) ◽  
pp. 65 ◽  
Author(s):  
Kanadpriya Basu ◽  
Treena Basu ◽  
Ron Buckmire ◽  
Nishu Lal

Every year, academic institutions invest considerable effort and substantial resources to influence, predict and understand the decision-making choices of applicants who have been offered admission. In this study, we applied several supervised machine learning techniques to four years of data on 11,001 students, each with 35 associated features, admitted to a small liberal arts college in California to predict student college commitment decisions. By treating the question of whether a student offered admission will accept it as a binary classification problem, we implemented a number of different classifiers and then evaluated the performance of these algorithms using the metrics of accuracy, precision, recall, F-measure and area under the receiver operating characteristic (ROC) curve. The results from this study indicate that the logistic regression classifier performed best in modeling the student college commitment decision problem, i.e., predicting whether a student will accept an admission offer, with an AUC score of 79.6%. The significance of this research is that it demonstrates that many institutions could use machine learning algorithms to improve the accuracy of their estimates of entering class sizes, thus allowing more optimal allocation of resources and better control over net tuition revenue.
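The AUC figure quoted above can be computed directly from its probabilistic definition: the probability that a randomly chosen positive example is scored higher than a randomly chosen negative one (ties count half). A minimal sketch with hypothetical commitment labels and classifier scores:

```python
# Sketch of AUC via its rank/probability definition. Labels and scores
# are hypothetical, not the study's data.

def auc(y_true, y_score):
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# 1 = accepted the admission offer, 0 = declined (hypothetical).
y_true  = [1, 1, 0, 1, 0, 0, 1, 0]
y_score = [0.9, 0.7, 0.6, 0.8, 0.3, 0.4, 0.2, 0.1]
score = auc(y_true, y_score)
```

Unlike accuracy, this measure is threshold-free and insensitive to class imbalance, which is why it is a common headline metric for binary classifiers.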


2018 ◽  
Vol 1 (26) ◽  
pp. 461-474
Author(s):  
Hussein Altabrawee

Banks process their financial data with machine learning techniques to extract knowledge from the data and use that knowledge in decision making and risk management. In this research, fourteen classification models were built and trained using real financial data from a bank in Taiwan. The models forecast the credit card default of a customer, i.e., the repayment delay of the credit granted to the customer. The main idea of the research is to evaluate and compare the models based on their predictive average class accuracy.
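The "average class accuracy" used above as the comparison criterion is per-class recall averaged over the classes (often called balanced accuracy), which is robust to the heavy repaid/defaulted imbalance typical of such data. A minimal sketch with hypothetical labels:

```python
# Sketch of average class accuracy (balanced accuracy).
# Labels and predictions are hypothetical.

def average_class_accuracy(y_true, y_pred):
    classes = sorted(set(y_true))
    per_class = []
    for c in classes:
        idx = [i for i, t in enumerate(y_true) if t == c]
        correct = sum(1 for i in idx if y_pred[i] == c)
        per_class.append(correct / len(idx))
    return sum(per_class) / len(per_class)

# 1 = defaulted, 0 = repaid; heavily imbalanced toward repayment.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 1, 1, 0]
score = average_class_accuracy(y_true, y_pred)
```

Plain accuracy on this toy data would be 0.8, flattering a model that misses half the defaults; averaging per-class accuracies weights the rare default class equally.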


2020 ◽  
Vol 17 (1) ◽  
pp. 201-205
Author(s):  
Gina George ◽  
Anisha M. Lal ◽  
P. Gayathri ◽  
Niveditha Mahendran

Diabetes mellitus occurs when the body does not properly produce insulin, which is needed for regulation of glucose in the body. This health disorder leads to the degradation of several organs, including the heart, kidneys, eyes, and nerves. Hence, diabetes diagnosis by means of accurate prediction is vital, and when such disease-related data is given as input to machine learning techniques, it becomes an important classification problem. The purpose of the work done in this paper is to compare several classic machine learning algorithms, including decision trees, logistic regression, and ensemble methods, to identify the most accurate classification algorithm for better prediction of diabetes mellitus. This, in turn, would help provide better and more effective treatment.


2020 ◽  
Vol 12 (2) ◽  
pp. 84-99
Author(s):  
Li-Pang Chen

In this paper, we investigate analysis and prediction of time-dependent data. We focus on four different stocks selected from the Yahoo Finance historical database. To build models and predict future stock prices, we consider three machine learning techniques: Long Short-Term Memory (LSTM), Convolutional Neural Networks (CNN), and Support Vector Regression (SVR). By treating the close price, open price, daily low, daily high, adjusted close price, and volume of trades as predictors in the machine learning methods, it can be shown that prediction accuracy is improved.
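Before any of these models can be trained, the daily series must be framed as a supervised problem: a sliding window of past days' features predicts the next close. A minimal sketch of that framing, with synthetic prices and a hypothetical `make_windows` helper (the paper's actual preprocessing is not specified here):

```python
# Sketch: turn per-day OHLCV feature rows into (window -> next close)
# training pairs for LSTM/CNN/SVR-style models. Prices are synthetic.

def make_windows(rows, window, target_key="close"):
    """rows: list of per-day feature dicts, oldest first."""
    X, y = [], []
    for i in range(len(rows) - window):
        past = rows[i:i + window]
        # flatten the window's features into one input vector
        X.append([v for day in past for v in day.values()])
        y.append(rows[i + window][target_key])
    return X, y

rows = [{"open": 10 + i, "high": 11 + i, "low": 9 + i,
         "close": 10.5 + i, "volume": 1000 + 10 * i} for i in range(6)]
X, y = make_windows(rows, window=3)
```

For SVR the flattened vectors are used directly; for LSTM/CNN the same windows would instead be kept as (window, features) arrays, but the target construction is identical.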


Author(s):  
Anantvir Singh Romana

Accurate diagnostic detection of disease in a patient is critical, as it may alter the subsequent treatment and increase the chances of survival. Machine learning techniques have been instrumental in disease detection and are currently used in various classification problems due to their accurate prediction performance. Different techniques provide different accuracies, so it is imperative to use the most suitable method, i.e., the one that provides the best results. This research provides a comparative analysis of Support Vector Machine, Naïve Bayes, J48 Decision Tree, and neural network classifiers on breast cancer and diabetes datasets.


2021 ◽  
pp. 1-17
Author(s):  
Ahmed Al-Tarawneh ◽  
Ja’afer Al-Saraireh

Twitter is one of the most popular platforms used to share and post ideas. Hackers and anonymous attackers use these platforms maliciously, and their behavior can be used to predict the risk of future attacks by gathering and classifying hackers' tweets using machine-learning techniques. Previous approaches for detecting infected tweets are based on human effort or text analysis, and are thus limited in capturing the hidden meaning between tweet lines. The main aim of this research paper is to enhance the efficiency of hacker detection on the Twitter platform by using the complex-networks technique with adapted machine learning algorithms. This work presents a methodology that collects a list of users, along with their followers, who share posts with similar interests from a hackers' community on Twitter. The list is built from a set of suggested keywords that are terms commonly used by hackers in their tweets. A complex network is then generated for all users to find relations among them in terms of network centrality, closeness, and betweenness. After extracting these values, a dataset of the most influential users in the hacker community is assembled. Subsequently, tweets belonging to users in the extracted dataset are gathered and classified into positive and negative classes, and the output of this process is used in a machine learning process applying different algorithms. This research builds and investigates an accurate dataset containing real users who belong to a hackers' community. Correctly classified instances were measured for accuracy using the average values of K-nearest neighbor, Naive Bayes, Random Tree, and support vector machine techniques, demonstrating about 90% and 88% accuracy for cross-validation and percentage split, respectively. Consequently, the proposed network cyber Twitter model is able to detect hackers and determine whether tweets pose a risk to institutions and individuals, providing early warning of possible attacks.
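The centrality step described above can be sketched on a toy graph. This is an illustrative closeness-centrality computation via breadth-first search on a small hypothetical undirected follower graph, not the paper's actual network or tooling:

```python
from collections import deque

# Sketch: closeness centrality by BFS on a small undirected graph that
# stands in for the Twitter follower network. Graph is hypothetical.

def closeness(graph, node):
    """Closeness = (n - 1) / sum of shortest-path distances from node."""
    dist = {node: 0}
    queue = deque([node])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    total = sum(d for n2, d in dist.items() if n2 != node)
    return (len(dist) - 1) / total if total else 0.0

graph = {
    "a": ["b", "c"],
    "b": ["a", "c", "d"],
    "c": ["a", "b"],
    "d": ["b", "e"],
    "e": ["d"],
}
central = max(graph, key=lambda n: closeness(graph, n))
```

Users ranking highest on such measures form the "most influential" subset whose tweets are then collected and classified; betweenness and degree centrality would be computed analogously over the same graph.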

