Detection of fraudulent credit card transactions: A comparative analysis of data sampling and classification techniques

2022 ◽  
Vol 2161 (1) ◽  
pp. 012072
Author(s):  
Konduri Praveen Mahesh ◽  
Shaik Ashar Afrouz ◽  
Anu Shaju Areeckal

Abstract: Losses due to fraudulent credit card transactions grow every year, and recent work has focused on using machine learning algorithms to identify fraudulent transactions. The ratio of fraud cases to non-fraud transactions is very low. This creates a skewed, or unbalanced, dataset, which poses a challenge for training machine learning models. Public datasets for this research problem are scarce; the dataset used for this work is obtained from Kaggle. In this paper, we explore different sampling techniques, namely under-sampling, Synthetic Minority Oversampling Technique (SMOTE) and SMOTE-Tomek, to handle the unbalanced data. Classification models, such as k-Nearest Neighbour (KNN), logistic regression, random forest and Support Vector Machine (SVM), are trained on the sampled data to detect fraudulent credit card transactions. The performance of the various machine learning approaches is evaluated in terms of precision, recall and F1-score. The classification results obtained are promising, and the approach can be used for credit card fraud detection.
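
As a rough illustration of two ideas named in this abstract, the sketch below shows random under-sampling of the majority class and the computation of precision, recall and F1-score. It is not the authors' code: the function names `undersample` and `precision_recall_f1`, the 0/1 label encoding (1 = fraud) and the fixed seed are assumptions made for this example, and SMOTE and SMOTE-Tomek are not reproduced here.

```python
import random


def undersample(X, y, seed=0):
    """Randomly drop majority-class rows until both classes have equal size.

    Assumes binary labels encoded as 0 (non-fraud) and 1 (fraud).
    """
    rng = random.Random(seed)
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    # Keep every minority row and an equally sized random subset of the majority.
    kept = sorted(minority + rng.sample(majority, len(minority)))
    return [X[i] for i in kept], [y[i] for i in kept]


def precision_recall_f1(y_true, y_pred):
    """Compute precision, recall and F1 for the positive (fraud) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy data: 5 fraud rows among 100 transactions.
X = list(range(100))
y = [1] * 5 + [0] * 95
X_bal, y_bal = undersample(X, y)
```

Under-sampling discards information from the majority class, which is one reason oversampling methods such as SMOTE are usually compared alongside it.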

Author(s):  
Sheela Rani P ◽  
Dhivya S ◽  
Dharshini Priya M ◽  
Dharmila Chowdary A

Machine learning is a new research discipline that uses data to improve learning, optimizing the learning process and the environment in which learning takes place. There are two types of machine learning approaches, supervised and unsupervised, which are used to extract knowledge that helps decision-makers take the correct intervention in the future. This paper introduces a prediction model for students' academic performance and the factors that influence it, using supervised machine learning algorithms such as support vector machine, KNN (k-nearest neighbors), Naïve Bayes and logistic regression. The results obtained by the various algorithms are compared, and it is shown that the support vector machine and Naïve Bayes perform well, achieving improved accuracy compared to the other algorithms. The final prediction model in this paper shows fairly high prediction accuracy. The objective is not just to predict the future performance of students but also to provide the best technique for finding the most impactful features that influence students while studying.


2021 ◽  
Author(s):  
El houssaine Bouras ◽  
Lionel Jarlan ◽  
Salah Er-Raki ◽  
Riad Balaghi ◽  
Abdelhakim Amazirh ◽  
...  

Cereals are the main crop in Morocco. Their production exhibits high inter-annual variability due to uncertain rainfall and recurrent drought periods. Considering the importance of this resource to the country's economy, it is important for decision makers to have reliable forecasts of the annual cereal production in order to pre-empt importation needs. In this study, we assessed the joint use of satellite-based drought indices, weather data (precipitation and temperature) and climate data (pseudo-oscillation indices including NAO and the leading modes of sea surface temperature (SST) in the mid-latitude and in the tropical area) to predict cereal yields at the level of the agricultural province using machine learning algorithms (Support Vector Machine (SVM), Random Forest (RF) and eXtreme Gradient Boost (XGBoost)) in addition to Multiple Linear Regression (MLR). We also evaluate the models for different lead times along the growing season, from January (about 5 months before harvest) to March (2 months before harvest). The results show that combining data from the different sources outperformed the use of a single dataset, with the highest accuracy obtained when all three data sources were considered in the model development. In addition, the results show that the models can accurately predict yields in January (5 months before harvesting) with an R² = 0.90 and an RMSE of about 3.4 Qt.ha⁻¹. When comparing the models' performance, XGBoost is the best for predicting yields. Also, fitting a specific model for each province separately improves the statistical metrics by approximately 10-50%, depending on the province, compared with one global model applied to all the provinces. The results of this study indicate that machine learning is a promising tool for cereal yield forecasting. The proposed methodology can also be extended to different crops and different regions for crop yield forecasting.
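
The two metrics quoted in this abstract can be sketched as follows. This is a minimal illustration, assuming R² means the coefficient of determination (the abstract does not say whether it is that or a squared correlation); the function names and the toy yield values are hypothetical.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error, in the same unit as the target (e.g. Qt/ha)."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Toy observed and predicted yields (made-up numbers, not from the study).
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [2.0, 3.0, 4.0, 5.0]
error = rmse(observed, predicted)
fit = r_squared(observed, predicted)
```

Because RMSE is in the unit of the target, it is directly comparable with yields, whereas R² is unitless and depends on the spread of the observed values.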


2020 ◽  
Vol 187 ◽  
pp. 04001
Author(s):  
Ravipat Lapcharoensuk ◽  
Kitticheat Danupattanin ◽  
Chaowarin Kanjanapornprapa ◽  
Tawin Inkawee

This research aimed to study the combination of NIR spectroscopy and machine learning for monitoring chilli sauce adulterated with papaya smoothie. The chilli sauce was produced by a well-known community enterprise for chilli sauce processing in Thailand. The ingredients of the chilli sauce consisted of 45% chilli, 25% sugar, 20% garlic, 5% vinegar, and 5% salt. The chilli sauce sample was mixed with ripened papaya (Khaek Dam variety) smoothie at 9 levels from 10 to 90% w/w. The NIR spectra of pure chilli sauce, papaya smoothie and the 9 adulterated chilli sauce samples were recorded using an FT-NIR spectrometer in the wavenumber range between 12500 and 4000 cm⁻¹. Three machine learning algorithms were applied to develop a model for monitoring adulterated chilli sauce: partial least squares regression (PLS), support vector machine (SVM), and backpropagation neural network (BPNN). All models showed prediction performance in the validation set with R² = 0.99, while the RMSEP values of PLS, SVM and BPNN were 1.71, 2.18 and 3.27% w/w, respectively. This finding indicates that NIR spectroscopy coupled with machine learning approaches is an alternative technique for monitoring chilli sauce adulterated with papaya smoothie in the global food industry.


2021 ◽  
Vol 11 ◽  
Author(s):  
Qi Wan ◽  
Jiaxuan Zhou ◽  
Xiaoying Xia ◽  
Jianfeng Hu ◽  
Peng Wang ◽  
...  

Objective: To evaluate the performance of 2D and 3D radiomics features with different machine learning approaches to classify SPLs based on magnetic resonance (MR) T2 weighted imaging (T2WI). Material and Methods: A total of 132 patients with pathologically confirmed SPLs were examined and randomly divided into training (n = 92) and test datasets (n = 40). A total of 1692 3D and 1231 2D radiomics features per patient were extracted. Both radiomics features and clinical data were evaluated. A total of 1260 classification models, comprising 3 normalization methods, 2 dimension reduction algorithms, 3 feature selection methods, and 10 classifiers with 7 different feature numbers (confined to 3–9), were compared. Ten-fold cross-validation on the training dataset was applied to choose the candidate final model. The area under the receiver operating characteristic curve (AUC), precision-recall plot, and Matthews Correlation Coefficient (MCC) were used to evaluate the performance of the machine learning approaches. Results: The 3D features were significantly superior to 2D features, showing many more machine learning combinations with AUC greater than 0.7 in both validation and test groups (129 vs. 11). The feature selection methods Analysis of Variance (ANOVA) and Recursive Feature Elimination (RFE) and the classifiers Logistic Regression (LR), Linear Discriminant Analysis (LDA), Support Vector Machine (SVM) and Gaussian Process (GP) had relatively better performance. The best performance of 3D radiomics features in the test dataset (AUC = 0.824, AUC-PR = 0.927, MCC = 0.514) was higher than that of 2D features (AUC = 0.740, AUC-PR = 0.846, MCC = 0.404). The joint 3D and 2D features (AUC = 0.813, AUC-PR = 0.926, MCC = 0.563) showed similar results to the 3D features. Incorporating clinical features with 3D and 2D radiomics features slightly improved the AUC to 0.836 (AUC-PR = 0.918, MCC = 0.620) and 0.780 (AUC-PR = 0.900, MCC = 0.574), respectively. Conclusions: After algorithm optimization, 2D feature-based radiomics models yield favorable results in differentiating malignant and benign SPLs, but 3D features are still preferred because of the availability of more machine learning algorithmic combinations with better performance. The feature selection methods ANOVA and RFE and the classifiers LR, LDA, SVM and GP are more likely to demonstrate better diagnostic performance for 3D features in the current study.
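
The figure of 1260 models follows from taking every combination of the listed pipeline choices (3 × 2 × 3 × 10 × 7). The sketch below enumerates such a grid; the placeholder names (`norm_0`, `clf_3`, and so on) are hypothetical stand-ins, since the abstract names only some of the methods.

```python
from itertools import product

# Placeholder identifiers for each pipeline stage (hypothetical names).
normalizers = [f"norm_{i}" for i in range(3)]
reducers = [f"reduce_{i}" for i in range(2)]
selectors = [f"select_{i}" for i in range(3)]
classifiers = [f"clf_{i}" for i in range(10)]
feature_counts = range(3, 10)  # 3 to 9 features inclusive, i.e. 7 values

# Every combination of one choice per stage is one candidate model.
pipelines = list(
    product(normalizers, reducers, selectors, classifiers, feature_counts)
)
print(len(pipelines))  # 1260
```

In a study like this, each tuple would be turned into a fitted pipeline and scored by cross-validation on the training set before the test set is touched.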


Drones ◽  
2020 ◽  
Vol 4 (3) ◽  
pp. 45
Author(s):  
Maria Angela Musci ◽  
Luigi Mazzara ◽  
Andrea Maria Lingua

Aircraft ground de-icing operations play a critical role in flight safety. However, aircraft de-icing commonly requires a considerable quantity of de-icing fluids. Moreover, some pre-flight inspections are carried out with engines running; thus, a large amount of fuel is wasted, and CO2 is emitted. This implies substantial economic and environmental impacts. In this context, the European project SEI (Spectral Evidence of Ice) (reference call: MANUNET III 2018, project code: MNET18/ICT-3438) aims to provide innovative tools to identify ice on aircraft and improve the efficiency of the de-icing process. The project includes the design of a low-cost UAV (uncrewed aerial vehicle) platform and the development of a quasi-real-time ice detection methodology to ensure a faster and semi-automatic activity with a reduction of operating time and applied de-icing fluids. The purpose of this work, developed within the activities of the project, is to define and test the most suitable sensor using a radiometric approach and machine learning algorithms. The adopted methodology consists of classifying ice through spectral imagery collected by two different sensors: a multispectral and a hyperspectral camera. Since the UAV prototype is under construction, the experimental analysis was performed with a simulation dataset acquired on the ground. A comparison between the two approaches and their related image processing algorithms (random forest and support vector machine) is presented; practical results show that it is possible to identify the ice in both cases. Nonetheless, the hyperspectral camera provides a more reliable solution, reaching a higher level of accuracy in classifying iced surfaces.


2019 ◽  
Vol 27 (1) ◽  
pp. 13-21 ◽  
Author(s):  
Qiang Wei ◽  
Zongcheng Ji ◽  
Zhiheng Li ◽  
Jingcheng Du ◽  
Jingqi Wang ◽  
...  

Abstract. Objective: This article presents our approaches to extraction of medications and associated adverse drug events (ADEs) from clinical documents, which is the second track of the 2018 National NLP Clinical Challenges (n2c2) shared task. Materials and Methods: The clinical corpus used in this study was from the MIMIC-III database, and the organizers annotated 303 documents for training and 202 for testing. Our system consists of 2 components: a named entity recognition (NER) component and a relation classification (RC) component. For each component, we implemented deep learning-based approaches (eg, BI-LSTM-CRF) and compared them with traditional machine learning approaches, namely, conditional random fields for NER and support vector machines for RC. In addition, we developed a deep learning-based joint model that recognizes ADEs and their relations to medications in 1 step using a sequence labeling approach. To further improve the performance, we also investigated different ensemble approaches that combine outputs from multiple approaches. Results: Our best-performing systems achieved F1 scores of 93.45% for NER, 96.30% for RC, and 89.05% for end-to-end evaluation, which ranked #2, #1, and #1 among all participants, respectively. Additional evaluations show that the deep learning-based approaches did outperform traditional machine learning algorithms in both NER and RC. The joint model that simultaneously recognizes ADEs and their relations to medications also achieved the best performance on RC, indicating its promise for relation extraction. Conclusion: In this study, we developed deep learning approaches for extracting medications and their attributes such as ADEs, and demonstrated their superior performance compared with traditional machine learning algorithms, indicating their uses in broader NER and RC tasks in the medical domain.


2021 ◽  
Vol 11 (4) ◽  
pp. 286-290
Author(s):  
Md. Golam Kibria ◽  
Mehmet Sevkli

The increase in credit card defaulters has forced companies to think carefully before approving credit applications. Credit card companies usually use their judgment to determine whether a credit card should be issued to a customer satisfying certain criteria. Some machine learning algorithms have also been used to support the decision. The main objective of this paper is to build a deep learning model based on the UCI (University of California, Irvine) data sets, which can support the credit card approval decision. Secondly, the performance of the built model is compared with two traditional machine learning algorithms: logistic regression (LR) and support vector machine (SVM). Our results show that the overall performance of our deep learning model is slightly better than that of the other two models.


2020 ◽  
Vol 3 (2) ◽  
pp. 196-206
Author(s):  
Mausumi Das Nath ◽  
Tapalina Bhattasali

Due to the enormous usage of the Internet, users share resources and exchange voluminous amounts of data. This increases the risk of data theft and other types of attacks. Network security plays a vital role in protecting the electronic exchange of data and attempts to avoid financial or service disruption due to unknown proliferations in the network. Many Intrusion Detection Systems (IDS) are commonly used to detect such unknown attacks and unauthorized access in a network. Researchers have put forward many approaches, ranging from traditional approaches to Artificial Intelligence (AI) based approaches, that have shown satisfactory results in intrusion detection systems. AI based techniques have gained an edge over other statistical techniques in the research community due to their enormous benefits. Procedures can be designed to display behavior learned from previous experiences. Machine learning algorithms are used to analyze the abnormal instances in a particular network. Supervised learning is essential for training on and analyzing the abnormal behavior in a network. In this paper, we propose a model of Naïve Bayes and SVM (Support Vector Machine) to detect anomalies, and an ensemble approach to address their weaknesses and improve poor detection results.
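
The abstract does not say how the Naïve Bayes and SVM outputs are combined. One common option, shown here purely as an assumption, is soft voting: average each model's estimated attack probability per sample and apply a threshold. The function name `soft_vote`, the 0.5 threshold and the probability values are hypothetical.

```python
def soft_vote(prob_lists, threshold=0.5):
    """Average per-sample attack probabilities across models, then threshold.

    prob_lists holds one list of probabilities per model, all the same length.
    Returns 1 (attack) or 0 (normal) for each sample.
    """
    n_models = len(prob_lists)
    averaged = [sum(probs) / n_models for probs in zip(*prob_lists)]
    return [1 if p >= threshold else 0 for p in averaged]


# Made-up probabilities standing in for the outputs of two trained models.
nb_probs = [0.9, 0.2, 0.6, 0.1]
svm_probs = [0.7, 0.4, 0.3, 0.2]
labels = soft_vote([nb_probs, svm_probs])
```

Averaging lets a confident model outweigh an uncertain one, which a plain majority vote between only two models cannot do.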


2021 ◽  
Vol 5 (2) ◽  
pp. 20-25
Author(s):  
Azhi Abdalmohammed Faraj ◽  
Didam Ahmed Mahmud ◽  
Bilal Najmaddin Rashid

Credit card defaults pose a business-critical threat to banking systems; thus, prompt detection of defaulters is a crucial and challenging research problem. Machine learning algorithms must deal with a heavily skewed dataset since the ratio of defaulters to non-defaulters is very small. The purpose of this research is to apply different ensemble methods and compare their performance in predicting the probability of default on customers' credit card payments in Taiwan, using a dataset from the UCI Machine Learning Repository. This is done both on the original skewed dataset and on a balanced dataset. Although several studies have shown the superiority of neural networks compared to traditional machine learning algorithms, the results of our study show that ensemble methods consistently outperform neural networks and other machine learning algorithms in terms of F1 score and area under the receiver operating characteristic curve, regardless of whether the dataset is balanced or the imbalance is ignored.


2017 ◽  
Vol 56 (03) ◽  
pp. 209-216 ◽  
Author(s):  
Said Ouatik El Alaoui ◽  
Mourad Sarrouti

Summary. Background and Objective: Biomedical question type classification is one of the important components of an automatic biomedical question answering system. The performance of the latter depends directly on the performance of its biomedical question type classification system, which consists of assigning a category to each question in order to determine the appropriate answer extraction algorithm. This study aims to automatically classify biomedical questions into one of four categories: (1) yes/no, (2) factoid, (3) list, and (4) summary. Methods: In this paper, we propose a biomedical question type classification method based on machine learning approaches to automatically assign a category to a biomedical question. First, we extract features from biomedical questions using the proposed handcrafted lexico-syntactic patterns. Then, we feed these features to machine learning algorithms. Finally, the class label is predicted using the trained classifiers. Results: Experimental evaluations performed on large standard annotated datasets of biomedical questions, provided by the BioASQ challenge, demonstrated that our method exhibits significantly improved performance when compared to four baseline systems. The proposed method achieves a roughly 10-point increase over the best baseline in terms of accuracy. Moreover, the obtained results show that using handcrafted lexico-syntactic patterns as the feature provider for a support vector machine (SVM) leads to the highest accuracy of 89.40%. Conclusion: The proposed method can automatically classify BioASQ questions into one of the four categories: yes/no, factoid, list, and summary. Furthermore, the results demonstrated that our method produced the best classification performance compared to four baseline systems.
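
To make the feature extraction step more concrete, the sketch below turns a question into a binary feature vector using a few regular-expression patterns. The patterns, their names and the function `extract_features` are invented for illustration and are not the handcrafted lexico-syntactic patterns proposed in the paper; in the described method, vectors of this kind would then be fed to a classifier such as an SVM.

```python
import re

# Hypothetical surface patterns; the paper's actual patterns are not given
# in the abstract.
PATTERNS = {
    "starts_with_aux": re.compile(
        r"^(is|are|does|do|can|was|were|has|have)\b", re.I
    ),
    "starts_with_wh": re.compile(r"^(what|which|who|where|when)\b", re.I),
    "asks_for_list": re.compile(r"^list\b|\bwhat are\b|\bwhich are\b", re.I),
    "asks_how_why": re.compile(r"^(how|why)\b|\bdescribe\b|\bexplain\b", re.I),
}


def extract_features(question):
    """Return one 0/1 feature per pattern, in the order defined in PATTERNS."""
    text = question.strip()
    return [1 if pattern.search(text) else 0 for pattern in PATTERNS.values()]


yes_no = extract_features("Is aspirin an anticoagulant?")
list_like = extract_features("What are the symptoms of measles?")
summary_like = extract_features("How does metformin work?")
```

Keeping the patterns as features, instead of using them as hard rules, lets the classifier learn how much weight each cue deserves for each question type.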

