scholarly journals Credit Card Fraud Detection System

Author(s):  
Kartik Madkaikar ◽  
◽  
Manthan Nagvekar ◽  
Preity Parab ◽  
Riya Raika ◽  
...  

Credit card fraud is a serious criminal offense. It costs individuals and financial institutions billions of dollars annually. According to the reports of the Federal Trade Commission (FTC), a consumer protection agency, the number of theft reports doubled in the last two years. It makes the detection and prevention of fraudulent activities critically important to financial institutions. Machine learning algorithms provide a proactive mechanism to prevent credit card fraud with acceptable accuracy. In this paper Machine Learning algorithms such as Logistic Regression, Naïve Bayes, Random Forest, K- Nearest Neighbor, Gradient Boosting, Support Vector Machine, and Neural Network algorithms are implemented for detection of fraudulent transactions. A comparative analysis of these algorithms is performed to identify an optimal solution.

InterConf ◽  
2021 ◽  
pp. 393-403
Author(s):  
Olexander Shmatko ◽  
Volodimir Fedorchenko ◽  
Dmytro Prochukhan

Today the banking sector offers its clients many different financial services such as ATM cards, Internet banking, Debit card, and Credit card, which allows attracting a large number of new customers. This article proposes an information system for detecting credit card fraud using a machine learning algorithm. Usually, credit cards are used by the customer around the clock, so the bank's server can track all transactions using machine learning algorithms. It must find or predict fraud detection. The dataset contains characteristics for each transaction and fraudulent transactions need to be classified and detected. For these purposes, the work proposes the use of the Random Forest algorithm.


Author(s):  
Harsha A K

Abstract: Since the advent of encryption, there has been a steady increase in malware being transmitted over encrypted networks. Traditional approaches to detect malware like packet content analysis are inefficient in dealing with encrypted data. In the absence of actual packet contents, we can make use of other features like packet size, arrival time, source and destination addresses and other such metadata to detect malware. Such information can be used to train machine learning classifiers in order to classify malicious and benign packets. In this paper, we offer an efficient malware detection approach using classification algorithms in machine learning such as support vector machine, random forest and extreme gradient boosting. We employ an extensive feature selection process to reduce the dimensionality of the chosen dataset. The dataset is then split into training and testing sets. Machine learning algorithms are trained using the training set. These models are then evaluated against the testing set in order to assess their respective performances. We further attempt to tune the hyper parameters of the algorithms, in order to achieve better results. Random forest and extreme gradient boosting algorithms performed exceptionally well in our experiments, resulting in area under the curve values of 0.9928 and 0.9998 respectively. Our work demonstrates that malware traffic can be effectively classified using conventional machine learning algorithms and also shows the importance of dimensionality reduction in such classification problems. Keywords: Malware Detection, Extreme Gradient Boosting, Random Forest, Feature Selection.


2020 ◽  
Vol 9 (9) ◽  
pp. 507
Author(s):  
Sanjiwana Arjasakusuma ◽  
Sandiaga Swahyu Kusuma ◽  
Stuart Phinn

Machine learning has been employed for various mapping and modeling tasks using input variables from different sources of remote sensing data. For feature selection involving high- spatial and spectral dimensionality data, various methods have been developed and incorporated into the machine learning framework to ensure an efficient and optimal computational process. This research aims to assess the accuracy of various feature selection and machine learning methods for estimating forest height using AISA (airborne imaging spectrometer for applications) hyperspectral bands (479 bands) and airborne light detection and ranging (lidar) height metrics (36 metrics), alone and combined. Feature selection and dimensionality reduction using Boruta (BO), principal component analysis (PCA), simulated annealing (SA), and genetic algorithm (GA) in combination with machine learning algorithms such as multivariate adaptive regression spline (MARS), extra trees (ET), support vector regression (SVR) with radial basis function, and extreme gradient boosting (XGB) with trees (XGbtree and XGBdart) and linear (XGBlin) classifiers were evaluated. The results demonstrated that the combinations of BO-XGBdart and BO-SVR delivered the best model performance for estimating tropical forest height by combining lidar and hyperspectral data, with R2 = 0.53 and RMSE = 1.7 m (18.4% of nRMSE and 0.046 m of bias) for BO-XGBdart and R2 = 0.51 and RMSE = 1.8 m (15.8% of nRMSE and −0.244 m of bias) for BO-SVR. Our study also demonstrated the effectiveness of BO for variables selection; it could reduce 95% of the data to select the 29 most important variables from the initial 516 variables from lidar metrics and hyperspectral data.


2021 ◽  
Author(s):  
Floe Foxon

Ammonoid identification is crucial to biostratigraphy, systematic palaeontology, and evolutionary biology, but may prove difficult when shell features and sutures are poorly preserved. This necessitates novel approaches to ammonoid taxonomy. This study aimed to taxonomize ammonoids by their conch geometry using supervised and unsupervised machine learning algorithms. Ammonoid measurement data (conch diameter, whorl height, whorl width, and umbilical width) were taken from the Paleobiology Database (PBDB). 11 species with ≥50 specimens each were identified providing N=781 total unique specimens. Naive Bayes, Decision Tree, Random Forest, Gradient Boosting, K-Nearest Neighbours, and Support Vector Machine classifiers were applied to the PBDB data with a 5x5 nested cross-validation approach to obtain unbiased generalization performance estimates across a grid search of algorithm parameters. All supervised classifiers achieved ≥70% accuracy in identifying ammonoid species, with Naive Bayes demonstrating the least over-fitting. The unsupervised clustering algorithms K-Means, DBSCAN, OPTICS, Mean Shift, and Affinity Propagation achieved Normalized Mutual Information scores of ≥0.6, with the centroid-based methods having most success. This presents a reasonably-accurate proof-of-concept approach to ammonoid classification which may assist identification in cases where more traditional methods are not feasible.


2021 ◽  
Author(s):  
Mandana Modabbernia ◽  
Heather C Whalley ◽  
David Glahn ◽  
Paul M. Thompson ◽  
Rene S. Kahn ◽  
...  

Application of machine learning algorithms to structural magnetic resonance imaging (sMRI) data has yielded behaviorally meaningful estimates of the biological age of the brain (brain-age). The choice of the machine learning approach in estimating brain-age in children and adolescents is important because age-related brain changes in these age-groups are dynamic. However, the comparative performance of the multiple machine learning algorithms available has not been systematically appraised. To address this gap, the present study evaluated the accuracy (Mean Absolute Error; MAE) and computational efficiency of 21 machine learning algorithms using sMRI data from 2,105 typically developing individuals aged 5 to 22 years from five cohorts. The trained models were then tested in an independent holdout datasets, comprising 4,078 pre-adolescents (aged 9-10 years). The algorithms encompassed parametric and nonparametric, Bayesian, linear and nonlinear, tree-based, and kernel-based models. Sensitivity analyses were performed for parcellation scheme, number of neuroimaging input features, number of cross-validation folds, and sample size. The best performing algorithms were Extreme Gradient Boosting (MAE of 1.25 years for females and 1.57 years for males), Random Forest Regression (MAE of 1.23 years for females and 1.65 years for males) and Support Vector Regression with Radial Basis Function Kernel (MAE of 1.47 years for females and 1.72 years for males) which had acceptable and comparable computational efficiency. Findings of the present study could be used as a guide for optimizing methodology when quantifying age-related changes during development.


Sign in / Sign up

Export Citation Format

Share Document