sampled data
Recently Published Documents





2022, Vol 2161 (1), pp. 012072
Konduri Praveen Mahesh, Shaik Ashar Afrouz, Anu Shaju Areeckal

Abstract: Every year, fraudulent credit card transactions cause growing financial losses. Recent work has focused on using machine learning algorithms to identify fraudulent transactions. The ratio of fraud cases to non-fraud transactions is very low, which produces skewed, or unbalanced, data and makes the machine learning models harder to train. Public datasets for this research problem are scarce; the dataset used for this work was obtained from Kaggle. In this paper, we explore different sampling techniques, such as under-sampling, Synthetic Minority Oversampling Technique (SMOTE) and SMOTE-Tomek, to handle the unbalanced data. Classification models, such as k-Nearest Neighbour (KNN), logistic regression, random forest and Support Vector Machine (SVM), are trained on the sampled data to detect fraudulent credit card transactions. The performance of the various machine learning approaches is evaluated in terms of precision, recall and F1-score. The classification results obtained are promising, and the approach can be used for credit card fraud detection.
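
As a rough illustration of two ideas named in the abstract above, the sketch below shows random under-sampling of the majority class and the computation of precision, recall and F1-score from predicted labels. It is not the authors' code: the function names `random_undersample` and `precision_recall_f1` are hypothetical, the toy data is made up for the example, and SMOTE, SMOTE-Tomek and the classifiers themselves are not shown.

```python
import random
from collections import Counter


def random_undersample(rows, labels, seed=0):
    """Randomly drop rows so every class keeps as many rows as the rarest class.

    Hypothetical helper for illustration; not taken from the paper.
    """
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    # Size of the smallest (minority) class sets the target for all classes.
    target = min(len(members) for members in by_class.values())
    sampled = []
    for label, members in by_class.items():
        for row in rng.sample(members, target):
            sampled.append((row, label))
    rng.shuffle(sampled)
    return [row for row, _ in sampled], [label for _, label in sampled]


def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall and F1 for the class treated as positive (fraud)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Toy data (assumed): 5 fraud rows (label 1) among 100 transactions.
    rows = list(range(100))
    labels = [1] * 5 + [0] * 95
    _, balanced_labels = random_undersample(rows, labels)
    print(Counter(balanced_labels))
    print(precision_recall_f1([1, 1, 0, 0], [1, 0, 1, 0]))
```

Under-sampling discards most of the non-fraud rows, which is why the abstract also considers oversampling approaches such as SMOTE; the metrics are computed on the fraud class because overall accuracy is uninformative when that class is rare.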

Jing Wang, Jinglin Zhou, Xiaolu Chen

Abstract: Most of the sampled data in complex industrial processes are sequential in time. Traditional Bayesian network (BN) learning mechanisms therefore have limitations on the value of probability and cannot be applied to time series. The model established in Chap. 10.1007/978-981-16-8044-1_13 is a graphical model similar to a Bayesian network, but its parameter learning method can only handle discrete variables. This chapter aims to build a probabilistic graphical model directly for continuous process variables, which avoids assuming discrete or Gaussian distributions.
