Medicare Fraud Detection Using Random Forest with Class Imbalanced Big Data

Author(s): Richard Bauder, Taghi Khoshgoftaar
Entropy, 2021, Vol 23 (7), pp. 859
Author(s): Abdulaziz O. AlQabbany, Aqil M. Azmi

We are living in the age of big data, the majority of which is stream data. Real-time processing of this data requires careful consideration from several perspectives. Concept drift, a change in the data’s underlying distribution, is a significant issue, especially when learning from data streams; it requires learners to adapt to dynamic changes. Random forest is an ensemble approach that is widely used in classical, non-streaming machine learning applications. The Adaptive Random Forest (ARF), in turn, is a stream learning algorithm that has shown promising results in terms of accuracy and its ability to deal with various types of drift. The continuity of the incoming instances allows their binomial distribution to be approximated by a Poisson(1) distribution. In this study, we propose a mechanism to increase such streaming algorithms’ efficiency by focusing on resampling. Our measure, resampling effectiveness (ρ), fuses the two most essential aspects of online learning: accuracy and execution time. We use six synthetic data sets, each exhibiting a different type of drift, to empirically select the parameter λ of the Poisson distribution that yields the best value of ρ. By comparing the standard ARF with its tuned variations, we show that ARF performance can be enhanced by tackling this important aspect. Finally, we present three case studies from different contexts to test our proposed enhancement method and demonstrate its effectiveness in processing large data sets: (a) Amazon customer reviews (written in English), (b) hotel reviews (in Arabic), and (c) real-time aspect-based sentiment analysis of COVID-19-related tweets in the United States during April 2020. Results indicate that our proposed enhancement method exhibited considerable improvement in most situations.
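The resampling step the abstract refers to is the Oza–Russell style of online bagging: each incoming instance is presented to each base learner with a weight drawn from a Poisson(λ) distribution, approximating bootstrap resampling on a stream. The sketch below illustrates only that mechanism; the class name, the simplified count-based "learners" (stand-ins for the Hoeffding trees a real ARF trains), and the default parameters are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch of Poisson-based online bagging, the resampling step that
# lambda-tuning targets. Assumed/illustrative: OnlineBagger and its
# count-table learners; a real ARF trains Hoeffding trees with drift detectors.
import numpy as np


class OnlineBagger:
    """Present each incoming instance k ~ Poisson(lam) times to each base
    learner, approximating bootstrap resampling on a data stream."""

    def __init__(self, n_learners=5, lam=1.0, seed=0):
        self.lam = lam
        self.rng = np.random.default_rng(seed)
        # Each "learner" is a running class-count table: a toy stand-in
        # for the incrementally trained trees of an Adaptive Random Forest.
        self.learners = [dict() for _ in range(n_learners)]

    def partial_fit(self, x, y):
        for counts in self.learners:
            k = self.rng.poisson(self.lam)  # resampling weight for this learner
            if k > 0:
                counts[y] = counts.get(y, 0) + k

    def predict(self):
        # Majority vote over each learner's most frequent class so far.
        votes = [max(c, key=c.get) for c in self.learners if c]
        return max(set(votes), key=votes.count)
```

Raising λ (ARF itself uses Poisson(6) rather than Poisson(1)) makes each learner see more copies of each instance, trading execution time for accuracy; tuning λ against a measure like ρ, which weighs both, is the idea the study explores.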


2019, Vol 34 (3), pp. 324-337
Author(s): Jiali Tang, Khondkar E. Karim

Purpose – This paper aims to discuss the application of Big Data analytics to the brainstorming session in the current auditing standards.

Design/methodology/approach – The authors review the literature related to fraud, brainstorming sessions and Big Data, and propose a model that auditors can follow during the brainstorming sessions by applying Big Data analytics at different steps.

Findings – The existing audit practice aimed at identifying the fraud risk factors needs enhancement, due to the inefficient use of unstructured data. The brainstorming session provides a useful setting for such concern as it draws on collective wisdom and encourages idea generation. The integration of Big Data analytics into brainstorming can broaden the information size, strengthen the results from analytical procedures and facilitate auditors’ communication. In the model proposed, an audit team can use Big Data tools at every step of the brainstorming process, including initial data collection, data integration, fraud indicator identification, group meetings, conclusions and documentation.

Originality/value – The proposed model can both address the current issues contained in brainstorming (e.g. low-quality discussions and production blocking) and improve the overall effectiveness of fraud detection.

