Application of selected data mining techniques in unintentional accounting error detection

Equilibrium ◽  
2021 ◽  
Vol 16 (1) ◽  
pp. 185-201
Author(s):  
Mário Papík ◽  
Lenka Papíková

Research background: Although unintentional accounting errors leading to financial restatements may appear to be a less serious distortion of publicly available information, it has been shown that the impact of financial restatements on financial markets is similar to that of intentional fraudulent activities. Such errors affect the value of company shares in the short run, which negatively impacts all shareholders. Purpose of the article: The aim of this manuscript is to predict unintentional accounting errors leading to financial restatements based on information from companies' financial statements. The manuscript analyzes whether financial statements contain sufficient information to allow detection of unintentional accounting errors. Methods: Classification and regression trees (decision tree) and random forest were used to fulfill this aim. The data sample consisted of 400 items from the financial statements of 80 selected international companies. The results of the developed prediction models were compared and explained based on their accuracy, sensitivity, specificity, precision and F1 score. Statistical relationships among variables were tested by correlation analysis. Differences between the groups of companies with and without unintentional accounting errors were tested by means of the Kruskal-Wallis test. Differences among the models were tested by Levene's test and t-tests. Findings & value added: The results of the analysis provide evidence that unintentional accounting errors can be detected with high accuracy based on financial ratios (rather than the Beneish variables) and by applying the random forest method (rather than the classification and regression tree method).
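The five evaluation measures named in the abstract can all be derived from the four cells of a binary confusion matrix. The following is a minimal sketch of those standard definitions, not the authors' code; the class name `ConfusionCounts`, the function name `classification_metrics`, and the counts in the usage example are hypothetical and are not taken from the study.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ConfusionCounts:
    """Cells of a binary confusion matrix (hypothetical container)."""
    tp: int  # error companies correctly flagged
    fp: int  # non-error companies wrongly flagged
    tn: int  # non-error companies correctly passed
    fn: int  # error companies missed


def _ratio(numerator: float, denominator: float) -> float:
    # Return NaN instead of raising when a measure is undefined.
    return numerator / denominator if denominator else float("nan")


def classification_metrics(c: ConfusionCounts) -> Dict[str, float]:
    """Compute the standard measures from confusion-matrix counts."""
    total = c.tp + c.fp + c.tn + c.fn
    return {
        "accuracy": _ratio(c.tp + c.tn, total),
        "sensitivity": _ratio(c.tp, c.tp + c.fn),  # recall, true positive rate
        "specificity": _ratio(c.tn, c.tn + c.fp),  # true negative rate
        "precision": _ratio(c.tp, c.tp + c.fp),
        "f1": _ratio(2 * c.tp, 2 * c.tp + c.fp + c.fn),
    }


if __name__ == "__main__":
    # Illustrative counts only; they do not come from the paper.
    example = ConfusionCounts(tp=30, fp=10, tn=50, fn=10)
    for name, value in classification_metrics(example).items():
        print(f"{name}: {value:.4f}")
```

Reporting sensitivity and specificity alongside accuracy matters when the two classes are unevenly sized, because accuracy alone can look high for a model that rarely identifies the smaller class.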

2019 ◽  
Vol 21 (1) ◽  
pp. 64-86
Author(s):  
Mário Papík ◽  
Lenka Papíková

The aim of this manuscript is to analyze and identify determinants of honest accounting errors leading to financial restatements, based on data from the SEC database and from annual reports. The motivation for this study is that accounting errors are expensive for companies, which must change already published financial statements, and that they affect company reputation and stock price. Most authors focus on predicting accounting fraud, while financial restatements remain in the background of research. This study first tests the existing accounting fraud detection model of Beneish on a sample of 40 financial restatement companies over 10 years and then develops two new pioneering prediction models, one based on linear discriminant analysis (LDA) and another based on logistic regression. On the testing dataset, the LDA model achieved an accuracy of 70.96%, a specificity of 25.00% and a sensitivity of 79.83%, and the logistic regression model achieved an accuracy of 62.22%, a specificity of 41.66% and a sensitivity of 66.67%. The performance of both models is better than that of the existing Beneish model or other studies in this field. Thanks to easily interpretable results in equation form, the developed models can be widely used by both internal and external users of financial statements who would like to determine whether the financial statements of an analyzed company include accounting errors.


Information ◽  
2020 ◽  
Vol 11 (5) ◽  
pp. 270 ◽  
Author(s):  
Mu-Ming Chen ◽  
Mu-Chen Chen

To reduce the damage caused by road accidents, researchers have applied different techniques to explore correlated factors and develop efficient prediction models. The main purpose of this study is to use one statistical and two nonparametric data mining techniques, namely logistic regression (LR), classification and regression tree (CART), and random forest (RF), to compare their prediction capability, to identify the significant variables (identified by LR) and important variables (identified by CART or RF) that are strongly correlated with road accident severity, and to distinguish the variables that have a significant positive influence on prediction performance. Three prediction performance evaluation measures (accuracy, sensitivity and specificity) are used to find the best integrated method, which consists of the most effective prediction model and the input variables that have a higher positive influence on accuracy, sensitivity and specificity.


2021 ◽  
Vol 11 (5) ◽  
pp. 2235
Author(s):  
Haewon Byeon

It is essential to understand voice characteristics in the normal aging process in order to accurately distinguish presbyphonia from neurological voice disorders. This study developed ensemble-based machine learning classifiers to distinguish hypokinetic dysarthria from presbyphonia using classification and regression tree (CART), random forest, gradient boosting algorithm (GBM), and XGBoost, and compared the accuracy, sensitivity, and specificity of the developed models to identify the one with the best prediction performance. The subjects of this study were 76 elderly patients diagnosed with hypokinetic dysarthria and 174 patients with presbyphonia. The results showed that random forest had the best prediction performance when evaluated on the test dataset (accuracy = 0.83, sensitivity = 0.90, specificity = 0.80, and area under the curve (AUC) = 0.85). The main predictors for detecting hypokinetic dysarthria were, in order of magnitude, cepstral peak prominence (CPP), jitter, shimmer, L/H ratio, L/H ratio_SD, CPP max (dB), CPP min (dB), and CPPF0. Among them, CPP was the most important predictor for identifying hypokinetic dysarthria.
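The AUC reported alongside accuracy, sensitivity and specificity can be read as the probability that a randomly chosen positive case receives a higher classifier score than a randomly chosen negative case, with ties counted as one half. The abstract does not say how the AUC was computed, so the following is only a minimal sketch of that pairwise definition; the function name `roc_auc` and the labels and scores in the usage example are hypothetical.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison of scores.

    labels: 1 for the positive class, 0 for the negative class.
    scores: classifier scores, higher meaning "more likely positive".
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Illustrative values only; they do not come from the study.
    labels = [1, 0, 1, 0]
    scores = [0.9, 0.8, 0.3, 0.1]
    print(roc_auc(labels, scores))  # 0.75
```

The pairwise loop is quadratic in the number of cases, which is acceptable for a sample of a few hundred subjects; a rank-based formulation would be the usual choice for larger datasets.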

