Identifying Qualified Auditors' Opinions: A Data Mining Approach

2007 ◽  
Vol 4 (1) ◽  
pp. 183-197 ◽  
Author(s):  
Efstathios Kirkos ◽  
Charalambos Spathis ◽  
Alexandros Nanopoulos ◽  
Yannis Manolopoulos

Data Mining methods can help auditors form their opinions. Many of these methods have not yet been tested for the purpose of discriminating cases of qualified opinions. In this study, we employ three Data Mining classification techniques to develop models capable of identifying qualified auditors' reports. The techniques used are C4.5 Decision Tree, Multilayer Perceptron Neural Network, and Bayesian Belief Network. The sample contains 450 publicly listed, nonfinancial U.K. and Irish firms. The input vector is composed of one qualitative and several quantitative variables. The three developed models are compared in terms of their performance. Variables that are associated with qualified reports and can be used as indicators are also revealed. The results of this study can be useful to internal and external auditors and company decision-makers.

Author(s):  
Mohammad Vahid Sebt ◽  
Elahe Komijani ◽  
Shiva S. Ghasemi

Nowadays, the banking system is known as one of the inherent sectors of customer relationship management systems. Its main advantage is to redesign a more responsive organization to satisfy the customers. The banking system aims to improve the structure of organizations to provide better customer service through a set of automated and integrated processes. The final goal is to collect and reprocess the personal information of customers. To handle this dilemma, a number of new data mining techniques provide a powerful tool for exploring customers’ information for customer relationship management. Accordingly, customer classification and the coordination of the banking system are among the main challenges today. This motivates the present study, which applies a composition of a neural network with the C4.5 decision tree and the k-nearest neighbor method, as a variant of the core boosting methodology with a maximal strategy. To validate the proposed approach, a case study of Ansar Bank in Iran is used. The results show that the proposed method is competitive, with a rate of 95% for customer classification. It also outperforms the existing C4.5 decision tree, neural network, Naive Bayes and KNN methods, at a rate of 1.04%. The main contribution of this research is an algorithm with an error rate of 1.9% and a squared error of 0.72%, the best performance among the methods from the literature.


2020 ◽  
Author(s):  
Vahid Farrahi ◽  
Maisa Niemelä ◽  
Mikko Kärmeniemi ◽  
Soile Puhakka ◽  
Maarit Kangas ◽  
...  

Abstract Purpose: A data mining approach was applied to establish a multilevel hierarchy predicting physical activity (PA) behavior, and to methodologically identify the correlates of PA behavior. Methods: Cross-sectional data from the population-based Northern Finland Birth Cohort 1966 study, collected in the most recent follow-up at age 46, were used to create a hierarchy using the chi-square automatic interaction detection (CHAID) decision tree technique for predicting PA behavior. PA behavior is defined as active or inactive depending on participants’ activity profiles, which were previously created through a multidimensional (clustering) approach on continuous accelerometer-measured activity intensities in one week. The input variables (predictors) used for decision tree fitting consisted of individual, demographical, psychological, behavioral, environmental, and physical factors. Using generalized linear mixed models, we also analyzed how factors emerging from the model were associated with three PA metrics, including daily time (minutes per day) in sedentary (SED), light PA (LPA), and moderate-to-vigorous PA (MVPA), to assure the relative importance of methodologically identified factors. Results: Of the 4,582 participants with valid accelerometer data at the latest follow-up, 2,701 and 1,881 had active and inactive profiles, respectively. We used a total of 168 factors as input variables to classify these two PA behaviors. Out of these 168 factors, the decision tree selected 36 factors of different domains from which 54 subgroups of participants were formed. The emerging factors from the model explained minutes per day in SED, LPA, and/or MVPA, including body fat percentage (SED: B=26.5, LPA: B=-16.1, and MVPA: B=-11.7), normalized heart rate recovery 60 seconds after exercise (SED: B=-16.1, LPA: B=9.9, and MVPA: B=9.6), average weekday total sitting time (SED: B=34.1, LPA: B=-25.3, and MVPA: B=-5.8), and extravagance score (SED: B=6.3 and LPA: B=-3.7). 
Conclusions: Using data mining, we established a data-driven model composed of 36 different factors of relative importance from empirical data. This model may be used to identify subgroups for multilevel intervention allocation and design. Additionally, this study methodologically discovered an extensive set of factors that can be a basis for additional hypothesis testing in PA correlates research.
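
As a rough illustration of the kind of test CHAID relies on when it evaluates a candidate split, the sketch below computes the Pearson chi-square statistic for a contingency table of predictor category against outcome class. The function name `chi_square` and the example counts are hypothetical and are not taken from the study; full CHAID also merges similar categories and compares adjusted p-values, which is not shown here.

```python
def chi_square(table):
    """Pearson chi-square statistic for a contingency table.

    `table` is a list of rows of observed counts, e.g. one row per
    predictor category and one column per outcome class.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:  # skip empty rows or columns
                stat += (observed - expected) ** 2 / expected
    return stat


# Hypothetical counts (not from the study): rows are two categories of a
# predictor, columns are the active and inactive profiles.
made_up_counts = [[30, 10], [10, 30]]
statistic = chi_square(made_up_counts)
```

A larger statistic indicates a stronger association between the predictor and the outcome, which is what makes a predictor a candidate for splitting a node.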


2008 ◽  
Vol 07 (03) ◽  
pp. 209-217 ◽  
Author(s):  
S. Appavu Alias Balamurugan ◽  
G. Athiappan ◽  
M. Muthu Pandian ◽  
R. Rajaram

Email has become one of the fastest and most economical forms of communication. However, the increase in email users has resulted in a dramatic increase in suspicious emails during the past few years. This paper proposes to apply classification data mining to the task of suspicious email detection based on deception theory. Email data was classified using four different classifiers (Neural Network, SVM, Naïve Bayesian and Decision Tree). The experiments were performed using Weka on data sets of different sizes, from which the suspicious emails were detected in the email corpus. Experimental results show that a simple ID3 classifier, which builds a binary tree, gives promising detection rates.


2018 ◽  
Vol 22 (3) ◽  
pp. 225-242 ◽  
Author(s):  
K. Mathan ◽  
Priyan Malarvizhi Kumar ◽  
Parthasarathy Panchatcharam ◽  
Gunasekaran Manogaran ◽  
R. Varadharajan

2021 ◽  
Vol 7 ◽  
pp. e424
Author(s):  
G Sekhar Reddy ◽  
Suneetha Chittineni

Information efficiency is gaining more importance in the development as well as application sectors of information technology. Data mining is a computer-assisted process of massive data investigation that extracts meaningful information from the datasets. The mined information is used in decision-making to understand the behavior of each attribute. Therefore, a new classification algorithm is introduced in this paper to improve information management. The classical C4.5 decision tree approach is combined with the Selfish Herd Optimization (SHO) algorithm to tune the gain of given datasets. The optimal weights for the information gain will be updated based on SHO. Further, the dataset is partitioned into two classes based on quadratic entropy calculation and information gain. Decision tree gain optimization is the main aim of our proposed C4.5-SHO method. The robustness of the proposed method is evaluated on various datasets and compared with classifiers, such as ID3 and CART. The accuracy and area under the receiver operating characteristic curve parameters are estimated and compared with existing algorithms like ant colony optimization, particle swarm optimization and cuckoo search.
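
For context, the split criterion of the classical C4.5 algorithm can be sketched as below: information gain based on Shannon entropy, normalized by the split information (the gain ratio). This is a minimal sketch for categorical attributes only; it does not reproduce the quadratic entropy or the SHO-based weighting that the abstract describes, and the function names are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of hashable labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def gain_ratio(values, labels):
    """Gain ratio of splitting `labels` on a categorical attribute.

    `values` holds the attribute value of each example and `labels`
    the class of each example, in the same order.
    """
    total = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    # Weighted entropy of the classes after the split.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - remainder
    # Split information penalizes attributes with many distinct values.
    split_info = entropy(values)
    return info_gain / split_info if split_info > 0 else 0.0


# Toy example with made-up data: the attribute separates the classes perfectly.
perfect = gain_ratio(["a", "a", "b", "b"], ["yes", "yes", "no", "no"])
```

An attribute that carries no information about the class yields a gain ratio of zero, so a tree builder following this criterion would prefer the attribute with the highest ratio at each node.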


Author(s):  
Moloud Abdar ◽  
Sharareh R. Niakan Kalhori ◽  
Tole Sutikno ◽  
Imam Much Ibnu Subroto ◽  
Goli Arji

Heart diseases are among the nation’s leading causes of mortality and morbidity. Data mining techniques can predict the likelihood of patients developing heart disease. The purpose of this study is to compare different data mining algorithms for the prediction of heart disease. This work applied and compared data mining techniques to predict the risk of heart disease. After feature analysis, models based on five algorithms, namely decision tree (C5.0), neural network, support vector machine (SVM), logistic regression and k-nearest neighbors (KNN), were developed and validated. The C5.0 decision tree built the model with the greatest accuracy, 93.02%; KNN, SVM and the neural network achieved 88.37%, 86.05% and 80.23%, respectively. The results of the decision tree are easy to interpret and apply; its rules can be readily understood by different clinical practitioners.

