A Study on Various Applications of Data Mining and Supervised Learning Techniques in Business Fraud Detection

Author(s):  
Amit Majumder ◽  
Ira Nath

Data mining techniques help us extract useful information from large raw datasets. They are used to analyse and identify patterns in data and to find anomalies and correlations within datasets in order to predict outcomes. Using a broad range of techniques, we can use this information to improve customer relationships and reduce risks. Data mining and supervised learning have applications in multiple fields of science and research. Machine learning examines patterns in data and helps predict future behaviour by learning from those patterns. Data mining is normally used as a source of information to which machine learning can be applied to solve some of the problems in our daily life. Supervised learning is a type of machine learning that uses labelled data, consisting of inputs together with their labels, and generates a learned model (a classifier, for classification tasks) that can be used to label unknown data. Financial accounting fraud detection has become an emerging topic in academia, research and industry.
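The supervised-learning workflow described above (labelled inputs in, learned model out, model labels unseen data) can be sketched in a few lines. This is a minimal illustration, not the authors' method: it uses a nearest-centroid classifier chosen only for simplicity, and the toy features (transaction amount, hour of day) and function names are hypothetical.

```python
from collections import defaultdict
from math import dist
from typing import Dict, List, Sequence, Tuple

# A labelled example: (feature vector, class label).
Example = Tuple[Sequence[float], str]


def train_nearest_centroid(data: List[Example]) -> Dict[str, List[float]]:
    """Learn one centroid (mean feature vector) per class label."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = defaultdict(int)
    for features, label in data:
        if label not in sums:
            sums[label] = [0.0] * len(features)
        for i, value in enumerate(features):
            sums[label][i] += value
        counts[label] += 1
    return {label: [s / counts[label] for s in total]
            for label, total in sums.items()}


def predict(model: Dict[str, List[float]], features: Sequence[float]) -> str:
    """Label an unseen input with the class whose centroid is closest."""
    return min(model, key=lambda label: dist(model[label], features))


if __name__ == "__main__":
    # Hypothetical labelled data: (transaction amount, hour of day) -> label.
    training = [
        ([20.0, 14.0], "legit"),
        ([35.0, 10.0], "legit"),
        ([900.0, 3.0], "fraud"),
        ([750.0, 2.0], "fraud"),
    ]
    model = train_nearest_centroid(training)
    print(predict(model, [40.0, 12.0]))   # legit
    print(predict(model, [820.0, 4.0]))   # fraud
```

In practice the features would be scaled first; here the amount dominates the Euclidean distance.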

2017 ◽  
Vol 14 (1) ◽  
pp. 32-36
Author(s):  
Dan Han

Financial statement fraud has been one of the biggest challenges in the modern business world. Financial accounting fraud detection (FAFD) has become an emerging topic of great importance for academia, research and industry. This paper explores the effectiveness of data mining (DM) classification techniques in detecting firms that issue fraudulent financial statements (FFS) and identifies factors associated with FFS. Our study investigates the usefulness of data mining techniques, including decision trees, neural networks and Bayesian belief networks, in identifying fraudulent financial statements. Finally, we compare the three models in terms of their performance.
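The abstract compares three models by performance without naming the measures. As a hedged illustration, the sketch below computes accuracy, precision, recall and F1 from each model's predictions on the same labelled test set. The prediction lists are invented placeholders, not results from the paper, and label 1 is assumed to mark a fraudulent statement.

```python
from typing import Dict, Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, int]:
    """Count TP/FP/FN/TN, treating 1 as the positive (fraudulent) class."""
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            counts["tp"] += 1
        elif t == 0 and p == 1:
            counts["fp"] += 1
        elif t == 1 and p == 0:
            counts["fn"] += 1
        else:
            counts["tn"] += 1
    return counts


def metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 for one model's predictions."""
    c = confusion_counts(y_true, y_pred)
    total = sum(c.values())
    precision = c["tp"] / (c["tp"] + c["fp"]) if c["tp"] + c["fp"] else 0.0
    recall = c["tp"] / (c["tp"] + c["fn"]) if c["tp"] + c["fn"] else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": (c["tp"] + c["tn"]) / total,
            "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Placeholder ground truth and predictions (not the paper's data).
    y_true = [1, 0, 1, 1, 0, 0, 1, 0]
    predictions = {
        "decision_tree":  [1, 0, 1, 0, 0, 0, 1, 1],
        "neural_network": [1, 0, 1, 1, 0, 1, 1, 0],
        "bayesian_net":   [1, 0, 0, 0, 0, 0, 1, 0],
    }
    for name, y_pred in predictions.items():
        scores = metrics(y_true, y_pred)
        print(name, {k: round(v, 3) for k, v in scores.items()})
```

Reporting precision and recall alongside accuracy matters whenever fraudulent cases are a minority, since accuracy alone can then look deceptively high.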


2008 ◽  
pp. 3639-3644
Author(s):  
Bhavani Thuraisingham

Data mining is the process of posing queries to large quantities of data and extracting information often previously unknown using mathematical, statistical, and machine-learning techniques. Data mining has many applications in a number of areas, including marketing and sales, medicine, law, manufacturing, and, more recently, homeland security. Using data mining, one can uncover hidden dependencies between terrorist groups as well as possibly predict terrorist events based on past experience. One particular data-mining technique that is being investigated a great deal for homeland security is link analysis, where links are drawn between various nodes, possibly detecting some hidden links.
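Link analysis, as described above, treats entities as nodes and observed associations as links. A minimal sketch, assuming hypothetical records that each list entities seen together (for example, in the same communication or transaction): it builds the link graph and ranks unlinked pairs by how many neighbours they share. This common-neighbour score is one simple heuristic for surfacing possibly hidden links, not the specific method discussed in the work.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterable, List, Set, Tuple


def build_graph(records: Iterable[Iterable[str]]) -> Dict[str, Set[str]]:
    """Draw an undirected link between every pair of entities in the same record."""
    graph: Dict[str, Set[str]] = defaultdict(set)
    for record in records:
        for a, b in combinations(sorted(set(record)), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def candidate_hidden_links(graph: Dict[str, Set[str]],
                           min_shared: int = 2) -> List[Tuple[str, str, int]]:
    """Rank pairs with no direct link by their number of shared neighbours."""
    candidates = []
    for a, b in combinations(sorted(graph), 2):
        if b in graph[a]:
            continue  # already directly linked
        shared = len(graph[a] & graph[b])
        if shared >= min_shared:
            candidates.append((a, b, shared))
    # Highest shared-neighbour count first; ties broken alphabetically.
    return sorted(candidates, key=lambda c: (-c[2], c[0], c[1]))


if __name__ == "__main__":
    records = [["A", "X"], ["A", "Y"], ["B", "X"], ["B", "Y"], ["C", "X"]]
    graph = build_graph(records)
    print(candidate_hidden_links(graph))  # [('A', 'B', 2), ('X', 'Y', 2)]
```

A and B never appear together, yet both are linked to X and Y, so the pair is surfaced as a candidate hidden link.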


2021 ◽  
Vol 4 (3) ◽  
pp. 139-143
Author(s):  
Mariana Vlad ◽  
◽  
Sorin Vlad ◽  

Machine learning (ML) is a subset of artificial intelligence (AI) that aims to develop systems that can learn and continuously improve their abilities through generalization in an autonomous manner. ML is presently all around us; almost every facet of our digital and physical lives embeds some ML-related content. Customer recommendation systems, customer behavior prediction, fraud detection, speech recognition, image recognition, colorization of black-and-white movies, and accounting fraud detection are just some examples of the vast range of applications in which ML is involved. The techniques that this paper investigates are mainly focused on the use of neural networks in the accounting and finance research fields. An artificial neural network models the brain's ability to learn intricate patterns from the information presented at its inputs, using elementary interconnected units, named neurons, grouped in layers and trained by means of a learning algorithm. The performance of the network depends on many factors, such as the number of layers, the number of neurons in each layer, the learning algorithm, and the activation functions, to name just a few. Machine learning algorithms have already started to replace humans in jobs that require document processing and decision making.
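To illustrate the description of neurons grouped in layers, here is a minimal forward pass. Each neuron computes a weighted sum of its inputs plus a bias and applies an activation function. The weights are hand-picked (a classic XOR construction) rather than learned, so the training algorithm the abstract mentions is deliberately omitted; `forward` and `XOR_NET` are illustrative names only.

```python
import math
from typing import Callable, List, Sequence, Tuple

# A neuron is (input weights, bias); a layer is a list of neurons.
Neuron = Tuple[Sequence[float], float]
Layer = List[Neuron]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def relu(z: float) -> float:
    return max(0.0, z)


def forward(x: Sequence[float], layers: List[Layer],
            activation: Callable[[float], float] = relu) -> List[float]:
    """Feed input x through each layer; every neuron applies the activation."""
    values = list(x)
    for layer in layers:
        values = [activation(sum(w * v for w, v in zip(weights, values)) + bias)
                  for weights, bias in layer]
    return values


# Two hidden ReLU neurons and one output neuron computing XOR of two 0/1 inputs.
XOR_NET: List[Layer] = [
    [([1.0, 1.0], 0.0), ([1.0, 1.0], -1.0)],  # hidden layer
    [([1.0, -2.0], 0.0)],                      # output layer
]

if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            print(a, b, forward([a, b], XOR_NET))
```

Passing `activation=sigmoid` with the same weights no longer yields XOR, a small instance of the abstract's point that performance depends on the activation function, among other factors.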


2020 ◽  
Vol 24 (104) ◽  
pp. 58-66
Author(s):  
Fredy Humberto Troncoso Espinosa ◽  
Fuentes Figueroa Paulina Gisselot ◽  
Italo Ramiro Belmar Arriagada

Fraudulent behavior in drinking-water consumption is a major problem for water treatment companies because it generates significant economic losses. Characterizing fraudulent consumption is a complex task, based mainly on experience, and it faces the challenge of constantly incorporating new customers and of month-to-month variation in consumption. In this research, data mining techniques are used to characterize and predict fraudulent drinking-water consumption. To this end, historical consumption data were used. The applied techniques showed high predictive performance, and their application will make it possible to focus the resources aimed at preventing this type of fraud efficiently. Keywords: data mining, machine learning, drinking water, fraud detection.
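The abstract does not list the features derived from the historical consumption data. As a hedged illustration only, the sketch below turns one customer's monthly readings into summary features of the kind a classifier could consume. The feature choices (historical mean, coefficient of variation, and the ratio of the latest month to the historical mean, where a sharp drop might warrant inspection) are assumptions, not the authors' variables.

```python
from statistics import mean, pstdev
from typing import Dict, Sequence


def consumption_features(monthly_m3: Sequence[float]) -> Dict[str, float]:
    """Summarise a customer's monthly consumption history (cubic metres).

    The last value is treated as the current month; earlier values are the history.
    """
    if len(monthly_m3) < 2:
        raise ValueError("need at least one historical month plus the current month")
    history, current = monthly_m3[:-1], monthly_m3[-1]
    hist_mean = mean(history)
    return {
        "hist_mean": hist_mean,
        # Relative variability of past consumption (0 when perfectly stable).
        "hist_cv": pstdev(history) / hist_mean if hist_mean else 0.0,
        # Values well below 1 indicate a drop versus the customer's own baseline.
        "current_to_mean": current / hist_mean if hist_mean else 0.0,
    }


if __name__ == "__main__":
    # Hypothetical readings for one customer; the final month falls sharply.
    print(consumption_features([18.0, 20.0, 22.0, 20.0, 6.0]))
```

Normalizing against each customer's own baseline is one way to cope with the month-to-month variation the abstract mentions, though new customers with short histories would still need separate handling.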


Machine learning has revolutionized fraud detection in various domains, such as telecommunications and e-commerce. Global statistics show that billions of dollars are lost to card fraud every year and that millions of people fall victim to it. Fraud detection systems that were used for credit card fraud detection two decades ago are still in use because of the trust and stability they have provided for so long. Although a considerable amount of academic research has been done on fraud detection, its effect on the financial industry has been minimal. Even when they achieve high prediction accuracy with machine learning approaches such as deep learning and stacked ensembles, most of these studies are rejected outright by the industry. Our research objective is to highlight the reasons for this rejection, which are mostly ignored by researchers, and their adverse effect on the results.


Computers ◽  
2021 ◽  
Vol 10 (10) ◽  
pp. 121
Author(s):  
Marco Sánchez-Aguayo ◽  
Luis Urquiza-Aguiar ◽  
José Estrada-Jiménez

Fraud entails deception in order to obtain illegal gains; thus, it is mainly evidenced within financial institutions and is a matter of general interest. The problem is particularly complex, since perpetrators of fraud could hold any position, from top managers to payroll employees. Fraud detection has traditionally been performed by auditors, who mainly employ manual techniques, which can take too long to process fraud-related evidence. Data mining, machine learning, and, more recently, deep learning strategies are being used to automate this type of processing. Many related techniques have been developed to analyze, detect, and prevent fraud-related behavior; among the most important of these is the fraud triangle associated with the classic auditing model. This work aims to review current work on fraud detection that uses the fraud triangle in addition to machine learning and deep learning techniques. We used the Kitchenham methodology to analyze research works on fraud detection from the last decade. This review provides evidence that fraud is an area of active investigation. Several works on fraud detection using machine learning techniques were identified, but without evidence that they incorporated the fraud triangle as a method for more efficient analysis.


2020 ◽  
Vol 4 (2) ◽  
pp. 98-112
Author(s):  
Hossam Eldin M. Abd Elhamid ◽  
◽  
Wael Khalif ◽  
Mohamed Roushdy ◽  
Abdel-Badeeh M. Salem ◽  
...  

The term “fraud” usually brings credit card fraud to mind. With the significant increase in credit card transactions, credit card fraud has risen sharply in recent years. Fraud detection should therefore include monitoring each customer's spending behavior in order to determine, avoid, and detect unwanted behavior. Because the credit card is the predominant payment method for both online and regular purchases, credit card fraud is rising rapidly. Fraud detection is concerned not only with capturing fraudulent practices but also with discovering them as quickly as possible, because fraud causes millions of dollars in business losses, is rising over time, and greatly affects the worldwide economy. In this paper we introduce 14 different techniques showing how data mining techniques can be successfully combined to obtain high fraud coverage with a high or low false rate, together with the advantages and disadvantages of each technique and the datasets used in the studies.
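The abstract argues that fraud detection should monitor each customer's spending behavior. A minimal sketch, assuming a hypothetical rule that flags a transaction whose amount exceeds the customer's past mean by more than k sample standard deviations (k = 3 by default) once enough history exists. It is one simple profiling heuristic, not a technique taken from the paper; `SpendingProfile` is an illustrative name.

```python
from statistics import mean, stdev
from typing import Dict, List


class SpendingProfile:
    """Per-customer spending history used to flag unusually large amounts."""

    def __init__(self, k: float = 3.0, min_history: int = 5) -> None:
        self.k = k                      # threshold in standard deviations (assumed)
        self.min_history = min_history  # with fewer past transactions, never flag
        self.history: Dict[str, List[float]] = {}

    def check_and_update(self, customer: str, amount: float) -> bool:
        """Return True if `amount` looks anomalous for this customer, then record it."""
        past = self.history.setdefault(customer, [])
        flagged = False
        if len(past) >= self.min_history:
            mu, sigma = mean(past), stdev(past)
            flagged = amount > mu + self.k * sigma
        # Note: flagged amounts are also recorded, which a real system might avoid.
        past.append(amount)
        return flagged


if __name__ == "__main__":
    profile = SpendingProfile()
    for amount in [25.0, 30.0, 27.0, 22.0, 31.0, 28.0, 950.0]:
        print(amount, profile.check_and_update("cust-1", amount))  # only 950.0 is flagged
```

The trade-off the abstract mentions between fraud coverage and false rate shows up directly in `k`: a smaller value catches more fraud but raises more false alarms.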


2021 ◽  
Vol 11 (3) ◽  
pp. 1323
Author(s):  
Medard Edmund Mswahili ◽  
Min-Jeong Lee ◽  
Gati Lother Martin ◽  
Junghyun Kim ◽  
Paul Kim ◽  
...  

Cocrystals are of much interest in industrial applications as well as academic research, and screening suitable coformers for active pharmaceutical ingredients is the most crucial and challenging step in cocrystal development. Recently, machine learning techniques have been attracting researchers in many fields, including pharmaceutical research areas such as quantitative structure-activity/property relationships. In this paper, we develop machine learning models to predict cocrystal formation. We extract descriptor values from the simplified molecular-input line-entry system (SMILES) representations of compounds and compare the machine learning models in experiments on our collected dataset of 1476 instances. We found that the artificial neural network shows great potential, as it achieves the best accuracy, sensitivity, and F1 score. We also found that the model achieved comparable performance with about half of the descriptors chosen by feature selection algorithms. We believe that this will contribute to faster and more accurate cocrystal development.
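The abstract reports comparable performance using about half of the descriptors chosen by feature selection, without naming the algorithms. A minimal filter-style sketch, assumed purely for illustration: rank descriptor columns by the absolute Pearson correlation with the binary cocrystal label (the point-biserial correlation) and keep the top half. The descriptor names and values are made up.

```python
from math import sqrt
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def select_top_half(columns: Dict[str, Sequence[float]],
                    labels: Sequence[int]) -> List[str]:
    """Keep the half of the descriptors (rounded up) most correlated with the label."""
    ranked = sorted(columns,
                    key=lambda name: abs(pearson(columns[name], labels)),
                    reverse=True)
    return ranked[: (len(ranked) + 1) // 2]


if __name__ == "__main__":
    labels = [1, 0, 1, 0, 1, 0]  # 1 = cocrystal formed (hypothetical)
    descriptors = {
        "desc_a": [5.0, 1.0, 4.5, 1.2, 5.1, 0.9],  # tracks the label closely
        "desc_b": [2.0, 2.1, 1.9, 2.0, 2.2, 2.0],  # nearly constant
        "desc_c": [0.1, 0.9, 0.2, 0.8, 0.1, 0.7],  # inversely related
        "desc_d": [3.0, 3.0, 1.0, 1.0, 2.0, 2.0],  # unrelated
    }
    print(select_top_half(descriptors, labels))
```

A univariate filter like this ignores interactions between descriptors; wrapper or embedded selection methods account for them at higher computational cost.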

