Audit Opinion Prediction: A Comparison of Data Mining Techniques

Journal of Emerging Technologies in Accounting ◽

10.2308/jeta-19-10-02-40 ◽

2020 ◽

Author(s):

Ali Saeedi

Keyword(s):

Data Mining ◽

New York ◽

Prediction Models ◽

Stock Exchange ◽

Support Vector ◽

Type I ◽

K Nearest Neighbors ◽

Data Mining Techniques ◽

Financial Variables ◽

Audit Opinions

This study compares the ability of four data mining techniques to predict audit opinions on companies' financial statements. The research data consist of 37,325 firm-year observations for companies listed on the New York Stock Exchange (NYSE), the American Stock Exchange (AMEX), and the NASDAQ from 2001 to 2017. The dataset includes various financial and non-financial variables of U.S. companies. This study uses Decision Trees (DT), Support Vector Machines (SVM), K-Nearest Neighbors (k-NN), and Rough Sets (RS) to develop the prediction models. While the models developed with all four techniques predict audit opinions with relatively high accuracy, the SVM models using an RBF kernel perform best in terms of overall prediction accuracy and Type I and Type II errors. The results also indicate that all models, whichever algorithm is used, perform best when predicting going-concern modifications, with rates ranging from 84.2 to 100 percent.
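A minimal sketch of how overall accuracy and Type I/Type II error rates can be computed for a binary audit-opinion classifier. The label encoding (1 = modified opinion, 0 = unmodified) and the choice of which misclassification counts as Type I are assumptions for illustration; studies differ on this convention, and the abstract does not say which one is used.

```python
from typing import Sequence, Tuple


def opinion_error_rates(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float, float]:
    """Return (type_i_rate, type_ii_rate, accuracy) for binary labels.

    Assumed (hypothetical) convention: 1 = modified opinion, 0 = unmodified.
    Type I  = an actual modified opinion predicted as unmodified.
    Type II = an actual unmodified opinion predicted as modified.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    pairs = list(zip(y_true, y_pred))
    n_modified = sum(1 for t, _ in pairs if t == 1)
    n_unmodified = len(pairs) - n_modified
    missed_modified = sum(1 for t, p in pairs if t == 1 and p == 0)
    false_modified = sum(1 for t, p in pairs if t == 0 and p == 1)
    # Each error rate is normalised by the size of the actual class it refers to.
    type_i = missed_modified / n_modified if n_modified else 0.0
    type_ii = false_modified / n_unmodified if n_unmodified else 0.0
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    return type_i, type_ii, accuracy


# Toy labels made up for illustration, not data from the study.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 0, 0]
print(opinion_error_rates(y_true, y_pred))
```

Normalising each error by its own class size keeps the two rates comparable when modified opinions are much rarer than clean ones.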


Cardiovascular Disease Prediction System Using Extra Trees Classifier

10.21203/rs.2.14454/v1 ◽

2019 ◽

Author(s):

Rahman Shafique ◽

Arif Mehmood ◽

Saleem ullah ◽

Gyu Sang Choi

Keyword(s):

Data Mining ◽

Cardiovascular Disease ◽

Health Care ◽

Support Vector Machine ◽

Prediction Models ◽

Support Vector ◽

Prediction System ◽

Classification Techniques ◽

Data Mining Techniques ◽

Tree Classifier

Heart disease, a form of cardiovascular disease, is the leading cause of death for both men and women and a major cause of morbidity and mortality in present-day society. Researchers are therefore working to help health care professionals in the diagnostic process by using data mining techniques. Although the health care industry holds rich databases, this data is not properly mined to discover hidden patterns and to support decisions based on those patterns. The main goal of this work is to extract such hidden patterns by applying numerous data mining techniques that may give remarkable results in detecting the presence of cardiovascular disease. Data mining classification techniques are used to discover these patterns for research in the medical industry. A dataset containing 13 attributes was analyzed for the prediction system; it contains commonly used medical measures such as blood pressure, cholesterol level, and chest pain, together with 11 other attributes used to predict cardiovascular disease. The most common and effective classification techniques used in the mining process, namely Decision Tree, Extra Trees Classifier, Random Forest, Support Vector Machine, Naive Bayes, and Logistic Regression, are analyzed in this paper. For diagnosing cardiovascular disease and reducing its death rate, the Extra Trees classifier is considered the best approach. We evaluate these prediction models using Accuracy, Precision, Recall, and F1-score. Our experimental results show accuracies of 90%, 88%, 87%, and 86% for the Extra Trees classifier, the Logistic Model Tree classifier, the support vector machine, and the naive Bayes classifier, respectively. According to our analysis, the Extra Trees classifier, having the highest accuracy, is the best approach for predicting cardiovascular disease.
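The four evaluation measures named in this abstract (Accuracy, Precision, Recall, F1-score) all derive from a binary confusion matrix. The sketch assumes 1 marks presence of cardiovascular disease; the toy labels are invented, and the authors' exact evaluation setup (e.g., averaging or cross-validation) is not described in the abstract.

```python
from typing import Dict, Sequence


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int],
                           positive: int = 1) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 for one positive class."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1  # disease predicted and present
        elif p == positive:
            fp += 1  # disease predicted but absent
        elif t == positive:
            fn += 1  # disease missed
        else:
            tn += 1  # correctly predicted absent
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / total, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical labels: 1 = disease present, 0 = absent.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
print(classification_metrics(y_true, y_pred))
```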


Data Mining Techniques for Identification and Classification of Various Diseases in Plants

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.b1110.1292s19 ◽

2019 ◽

Vol 9 (2S) ◽

pp. 676-680

Keyword(s):

Neural Network ◽

Data Mining ◽

Nearest Neighbors ◽

Crop Productivity ◽

Vital Role ◽

Support Vector ◽

Data Sets ◽

K Nearest Neighbors ◽

Data Mining Techniques

Data mining is currently used in various applications and plays a vital role in the research community. This paper describes data mining techniques for the preprocessing and classification of various diseases in plants. Since different plants have different diseases, each has different data sets and different objectives for knowledge discovery. Data mining techniques applied to plants help in the segmentation and classification of diseased plants, avoid the need for manual inspection, and help to increase crop productivity. This paper presents various classification techniques, such as K-Nearest Neighbors, Support Vector Machine, Principal Component Analysis, and Neural Network. Among these techniques, the neural network is found to be effective for disease detection in plants.


SMS SPAM CLASSIFICATION USING SUPPORT VECTOR MACHINE

Jurnal Pilar Nusa Mandiri ◽

10.33480/pilar.v15i2.693 ◽

2019 ◽

Vol 15 (2) ◽

pp. 275-280

Author(s):

Agus Setiyono ◽

Hilman F Pardede

Keyword(s):

Data Mining ◽

Support Vector Machine ◽

Decision Tree ◽

Naive Bayes ◽

Naïve Bayes ◽

Support Vector ◽

Spam Detection ◽

Support Vector Machine Algorithm ◽

Data Mining Techniques ◽

To Receive

It is now common for a cellphone to receive spam messages. The large number of received messages makes it difficult for people to classify them as spam or not spam. One way to overcome this problem is to use data mining for automatic classification. In this paper, we investigate various data mining techniques, namely Support Vector Machine, Multinomial Naïve Bayes, and Decision Tree, for automatic spam detection. Our experimental results show that Support Vector Machine is the best of the three evaluated algorithms: Support Vector Machine achieves 98.33% accuracy, while Multinomial Naïve Bayes achieves 98.13% and Decision Tree 97.10%.
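To make the Multinomial Naïve Bayes baseline concrete, here is a minimal from-scratch sketch with Laplace (add-alpha) smoothing and log-probabilities. Whitespace tokenisation, ignoring unseen words, and the tiny training set are assumptions for illustration; the authors' preprocessing and feature extraction are not described in the abstract.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


class MultinomialNB:
    """Multinomial Naive Bayes over word counts with add-alpha smoothing."""

    def __init__(self, alpha: float = 1.0) -> None:
        self.alpha = alpha

    def fit(self, docs: Sequence[str], labels: Sequence[str]) -> "MultinomialNB":
        self.classes: List[str] = sorted(set(labels))
        self.class_counts = Counter(labels)
        self.word_counts: Dict[str, Counter] = {c: Counter() for c in self.classes}
        for doc, c in zip(docs, labels):
            self.word_counts[c].update(doc.lower().split())
        self.vocab = set()
        for c in self.classes:
            self.vocab.update(self.word_counts[c])
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def predict(self, doc: str) -> str:
        tokens = doc.lower().split()
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best, best_score = self.classes[0], -math.inf
        for c in self.classes:
            score = math.log(self.class_counts[c] / n_docs)  # log prior
            for tok in tokens:
                if tok not in self.vocab:
                    continue  # words never seen in training are ignored
                # smoothed log-likelihood of this token under class c
                score += math.log((self.word_counts[c][tok] + self.alpha)
                                  / (self.totals[c] + self.alpha * v))
            if score > best_score:
                best, best_score = c, score
        return best


# Hypothetical toy messages, not the paper's dataset.
docs = ["win a free prize now", "free cash prize",
        "are we meeting today", "see you at lunch today"]
labels = ["spam", "spam", "ham", "ham"]
model = MultinomialNB().fit(docs, labels)
print(model.predict("free prize"), model.predict("lunch today"))
```

Working in log space avoids numerical underflow when long messages multiply many small probabilities.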


The Spinning Quality Control Management Based on Decision Making by Data Mining Techniques

International Journal of Emerging Research in Management and Technology ◽

10.23956/ijermt.v7i1.25 ◽

2018 ◽

Vol 7 (1) ◽

pp. 72

Author(s):

Khalid AA Abakar ◽

Chongwen Yu

Keyword(s):

Data Mining ◽

Kernel Functions ◽

Support Vector ◽

Ann Model ◽

Data Mining Techniques ◽

Yarn Quality ◽

Yarn Properties ◽

Svm Model ◽

Rbf Kernel

This work demonstrated the possibility of using data mining techniques, namely artificial neural networks (ANN) and support vector machine (SVM) based models, to predict spinning yarn quality parameters. Three different kernel functions were used for the SVM: Polynomial, Radial Basis Function (RBF), and Pearson VII function-based Universal Kernel (PUK); together with an ANN model, these were used as data mining techniques to predict yarn properties. The SVM model based on the Pearson VII kernel function (PUK) was found to perform the same as the SVM based on the RBF kernel in predicting spinning yarn quality. A comparison with the ANN model showed that these two SVM models give better prediction performance than the ANN model.
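For readers comparing the two best-performing kernels, this sketch evaluates the RBF kernel, K(x, y) = exp(-γ‖x − y‖²), and the Pearson VII universal kernel in its commonly cited form, K(x, y) = 1 / [1 + (2‖x − y‖·√(2^(1/ω) − 1) / σ)²]^ω. The parameter values are placeholders; the abstract does not report the γ, σ, or ω used.

```python
import math
from typing import Sequence


def rbf_kernel(x: Sequence[float], y: Sequence[float], gamma: float = 0.5) -> float:
    """Gaussian RBF kernel exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def puk_kernel(x: Sequence[float], y: Sequence[float],
               sigma: float = 1.0, omega: float = 1.0) -> float:
    """Pearson VII universal kernel (PUK) in its commonly cited form.

    sigma controls the peak width (the kernel equals 0.5 at distance sigma / 2);
    omega controls the tailing, moving the shape between Lorentzian-like
    (omega = 1) and Gaussian-like (large omega).
    """
    dist = math.dist(x, y)
    inner = 2.0 * dist * math.sqrt(2.0 ** (1.0 / omega) - 1.0) / sigma
    return 1.0 / (1.0 + inner ** 2) ** omega


x, y = [0.0, 0.0], [1.0, 1.0]
print(rbf_kernel(x, y), puk_kernel(x, y, sigma=1.0, omega=3.0))
```

Both kernels equal 1 for identical inputs and decay with distance; PUK's extra ω parameter lets it approximate the RBF shape closely, which is consistent with the two SVMs performing alike.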


A Quantitative Assessment of Pre-Operative MRI Reports in Glioma Patients: Report Metrics and IDH Prediction Ability

Frontiers in Oncology ◽

10.3389/fonc.2020.600327 ◽

2021 ◽

Vol 10 ◽

Author(s):

Hang Cao ◽

E. Zeynep Erson-Omay ◽

Murat Günel ◽

Jennifer Moliterno ◽

Robert K. Fulbright

Keyword(s):

High Performance ◽

Prediction Models ◽

Rank Correlation ◽

Support Vector ◽

Wild Type ◽

K Nearest Neighbors ◽

Prediction Ability ◽

Spearman’S Rank Correlation ◽

Negative Findings ◽

T1 Contrast

Objectives: To measure the metrics of glioma pre-operative MRI reports and build IDH prediction models.

Methods: Pre-operative MRI reports of 144 glioma patients in a single institution were collected retrospectively. Words were transformed to lowercase letters. White spaces, punctuation, and stop words were removed. Stemming was performed. A word cloud method applied to the processed text matrix visualized language behavior. Spearman's rank correlation assessed the correlation between the subjective descriptions of the enhancement pattern. The T1-contrast images associated with enhancement descriptions were selected. The keywords associated with IDH status were evaluated by χ2 value ranking. Random forest, k-nearest neighbors, and Support Vector Machine algorithms were used to train models based on report features and age. All statistical analyses used two-tailed tests with significance at p < .05.

Results: Longer word counts occurred in reports of older patients, higher-grade gliomas, and IDH wild-type gliomas. We identified 30 glioma enhancement descriptions, eight of which were commonly used: peripheral, heterogeneous, irregular, nodular, thick, rim, large, and ring. Five of the eight patterns were correlated. IDH mutant tumors were characterized by words related to normal, symmetric, or negative findings. IDH wild-type tumors were characterized by words related to pathological MR findings such as enhancement, necrosis, and FLAIR foci. An integrated KNN model based on report features and age demonstrated high performance (AUC: 0.89, 95% CI: 0.88–0.90).

Conclusion: Report length depended on age, glioma grade, and IDH status. Descriptions of glioma enhancement varied. Report descriptions differed between IDH wild-type and mutant gliomas. Report features can be used to predict glioma IDH status.
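Spearman's rank correlation, used in this study to relate enhancement descriptions, can be computed as the Pearson correlation of the two rank vectors, with tied values given their average rank. This is a minimal sketch; the abstract does not state how ties were handled, and the 0/1 indicators below (whether a report uses "rim" or "ring") are hypothetical.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of tied values starting at position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho as the Pearson correlation of average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long sequences of length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("rho is undefined when one variable is constant")
    return cov / math.sqrt(vx * vy)


# Hypothetical per-report indicators: 1 if the word appears, else 0.
uses_rim = [1, 0, 1, 1, 0, 0, 1, 0]
uses_ring = [1, 0, 1, 0, 0, 0, 1, 1]
print(spearman_rho(uses_rim, uses_ring))
```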


THE EFFICIENCY OF ENSEMBLE CLASSIFIERS IN PREDICTING THE JOHANNESBURG STOCK EXCHANGE ALL-SHARE INDEX DIRECTION

Journal of Financial Management Markets and Institutions ◽

10.1142/s2282717x19500014 ◽

2019 ◽

Vol 07 (02) ◽

pp. 1950001

Author(s):

THABANG MOKOALELI-MOKOTELI ◽

SHAUN RAMSUMAR ◽

HIMA VADAPALLI

Keyword(s):

Logistic Regression ◽

Stock Market ◽

Prediction Models ◽

Stock Exchange ◽

Machine Learning Techniques ◽

Ensemble Prediction ◽

Support Vector ◽

Ensemble Classifiers ◽

Ensemble Models ◽

Johannesburg Stock Exchange

The success of investors in obtaining large financial rewards from the stock market depends on their ability to predict the direction of the stock market index. The purpose of this study is to evaluate the efficacy of several ensemble prediction models (Boosted, RUS-Boosted, Subspace Disc, Bagged, and Subspace KNN) in predicting the daily direction of the Johannesburg Stock Exchange (JSE) All-Share index, compared with other commonly used machine learning techniques, including support vector machines (SVM), logistic regression, and k-nearest neighbor (KNN). The findings show that, among all ensemble models, the Boosted algorithm performs best, followed by RUS-Boosted. Compared with the other techniques, the ensemble technique (represented by Boosted) performed best, followed by KNN, logistic regression, and SVM, respectively. These findings suggest that investors should include ensemble models among their index prediction models if they want to make large profits in the stock markets. However, not all investors can benefit from this, as models may suffer from alpha decay as more investors use them, implying that successful algorithms have a limited shelf life.
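The KNN baseline in this comparison classifies a day by majority vote among its k most similar past days. A minimal sketch, assuming hypothetical features (two lagged daily returns) and Euclidean distance; the study's actual inputs and k are not given in the abstract.

```python
import math
from collections import Counter
from typing import Sequence


def knn_predict(train_x: Sequence[Sequence[float]], train_y: Sequence[str],
                query: Sequence[float], k: int = 3) -> str:
    """Majority vote among the k training points nearest to `query`."""
    if k < 1 or k > len(train_x):
        raise ValueError("k must be between 1 and the number of training points")
    # (distance, label) pairs sorted by Euclidean distance to the query
    dists = sorted((math.dist(x, query), label) for x, label in zip(train_x, train_y))
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


# Hypothetical features: (return at t-1, return at t-2); label: next-day direction.
train_x = [[0.01, 0.02], [0.015, 0.01], [-0.02, -0.01], [-0.01, -0.03], [0.02, 0.0]]
train_y = ["up", "up", "down", "down", "up"]
print(knn_predict(train_x, train_y, [0.012, 0.015]))
print(knn_predict(train_x, train_y, [-0.015, -0.02]))
```

An odd k avoids tied votes in a two-class (up/down) problem.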


Application of Data Mining Techniques in Weather Forecasting

Advances in Business Information Systems and Analytics - Sentiment Analysis and Knowledge Discovery in Contemporary Business ◽

10.4018/978-1-5225-4999-4.ch010 ◽

2019 ◽

pp. 162-174 ◽

Cited By ~ 1

Author(s):

ThippaReddy Gadekallu ◽

Bushra Kidwai ◽

Saksham Sharma ◽

Rishabh Pareek ◽

Sudheer Karnam

Keyword(s):

Data Mining ◽

Performance Metrics ◽

Weather Forecasting ◽

Meteorological Data ◽

Maximum Temperature ◽

K Nearest Neighbors ◽

Data Mining Techniques ◽

Use Of Data ◽

The City ◽

Case Data

Weather forecasting is a vital application in meteorology and has been one of the most scientifically and technologically challenging problems around the world in the last century. In this chapter, the authors investigate the use of data mining techniques in forecasting maximum temperature, rainfall, evaporation, and wind speed. This was carried out using artificial decision tree, naive Bayes, random forest, and K-nearest neighbors (IBk) algorithms, and meteorological data collected between 2013 and 2014 from the city of Delhi. The performance of these algorithms was compared using standard performance metrics, and the algorithm that gave the best results was used to generate classification rules for the mean weather variables. The results show that, given enough case data, data mining techniques can be used for weather forecasting and climate change studies.


Predicting the Insolvency of SMEs Using Technological Feasibility Assessment Information and Data Mining Techniques

Sustainability ◽

10.3390/su12239790 ◽

2020 ◽

Vol 12 (23) ◽

pp. 9790

Author(s):

Sanghoon Lee ◽

Keunho Choi ◽

Donghee Yoo

Keyword(s):

Data Mining ◽

Decision Tree ◽

Prediction Model ◽

Internal Control ◽

Prediction Models ◽

Influential Factors ◽

Financial Information ◽

Data Mining Techniques ◽

Feasibility Assessment ◽

Using Data

The government makes great efforts to maintain the soundness of policy funds raised through the national budget and lent to corporations. In general, previous research on predicting company insolvency has dealt with large and listed companies, using financial information and conventional statistical techniques. However, small- and medium-sized enterprises (SMEs) are not required to undergo external audits, and the quality of their accounting information is low due to weak internal control. To overcome this problem, we developed an insolvency prediction model for SMEs using data mining techniques and technological feasibility assessment information as non-financial information. We divided the dataset into two groups based on corporate age, using three years as the threshold. The synthetic minority over-sampling technique (SMOTE) was used to address the resulting data imbalance. Six insolvency prediction models were created using logistic regression, a decision tree, an artificial neural network, and an ensemble (i.e., boosting) of each algorithm. A boosted decision tree achieved the best accuracies, 69.1% and 82.7%, and a decision tree identified nine and seven influential factors affecting the insolvency of SMEs established for fewer than three years and more than three years, respectively. In addition, we derived several insolvency rules for the two types of SMEs from the decision tree-based prediction model and proposed ways to improve the health of loans given to potentially insolvent companies using these rules. The results of this study show that it is possible to predict SMEs' insolvency using data mining techniques with technological feasibility assessment information and to find meaningful rules related to insolvency.
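SMOTE balances classes by creating synthetic minority samples on the line segments between a minority sample and one of its k nearest minority-class neighbours. The sketch below shows that core interpolation step only; the feature vectors, k, and random seed are hypothetical, and real implementations add details (e.g., per-sample generation counts) not shown here.

```python
import math
import random
from typing import List, Sequence


def smote(minority: Sequence[Sequence[float]], n_new: int, k: int = 5,
          seed: int = 0) -> List[List[float]]:
    """Generate n_new synthetic samples by SMOTE-style interpolation."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority-class neighbours of the base sample (itself excluded)
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1): position along the segment
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical two-feature vectors for insolvent (minority-class) firms.
minority = [[0.0, 0.0], [1.0, 0.5], [0.5, 1.0], [0.2, 0.8]]
new_samples = smote(minority, 6, k=2)
print(new_samples)
```

Because every new point lies between two real minority samples, oversampling stays inside the region the minority class already occupies instead of duplicating rows exactly.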


Analysis of flight delays in aviation system using different classification algorithms and feature selection methods

The Aeronautical Journal ◽

10.1017/aer.2019.72 ◽

2019 ◽

Vol 123 (1267) ◽

pp. 1415-1436 ◽

Cited By ~ 1

Author(s):

A. B. A. Anderson ◽

A. J. Sanjeev Kumar ◽

A. B. Arockia Christopher

Keyword(s):

Data Mining ◽

Feature Selection ◽

Classification Model ◽

System Level ◽

Support Vector ◽

Flight Delays ◽

Data Mining Techniques ◽

Mining Methods ◽

Artificial Neural Network Ann ◽

Aircraft System

Data mining is the process of collecting and analysing large amounts of data in a database to find correlations and discover patterns or relationships. Flight delays create significant problems in the present aviation system. Data mining techniques are needed to analyse how micro-level causes propagate to form system-level patterns of delay. Analysing flight delays is very difficult, both from a historical view and when estimating delays with forecast demand. This paper proposes using Decision Tree (DT), Support Vector Machine (SVM), Naive Bayesian (NB), K-nearest neighbour (KNN), and Artificial Neural Network (ANN) classifiers to study and analyse delays among aircraft. The performance of these classifiers is evaluated on different regions of the updated datasets. The results show a significant variation in the performance of the different data mining methods and feature selection methods for this problem. This paper addresses how data mining techniques can be used to understand complex aircraft system delays in aviation. Our aim is to develop a classification model for studying and reducing delay using different data mining methods and, in doing so, to show that DT has greater classification accuracy. Different feature selection methods are used in this study to reduce the number of initial attributes. Our results clearly demonstrate the value of DT for analysing and visualising how system-level effects arise from subsystem-level causes.
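One common way to rank attributes before training, and the criterion behind ID3-style decision tree splits, is information gain: the entropy of the class labels minus the weighted entropy left after grouping by an attribute. This is a sketch of that one technique; the abstract does not name the feature selectors used, and the attribute values below are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy (bits) of a label sequence."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """Entropy reduction obtained by splitting `labels` on a categorical feature."""
    n = len(labels)
    groups: Dict[Hashable, List[Hashable]] = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical flights: rank candidate attributes by information gain.
delay = ["delayed", "delayed", "on_time", "on_time"]
features = {
    "weather": ["rain", "rain", "clear", "clear"],
    "weekday": ["mon", "tue", "mon", "tue"],
}
ranking = sorted(features, key=lambda f: information_gain(features[f], delay), reverse=True)
print(ranking)
```

Keeping only the top-ranked attributes is one simple way to reduce the number of initial attributes before fitting a classifier.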


Artificial Counselor System for Stock Investment

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.33019558 ◽

2019 ◽

Vol 33 ◽

pp. 9558-9564 ◽

Cited By ~ 1

Author(s):

Hadi NekoeiQachkanloo ◽

Benyamin Ghojogh ◽

Ali Saheb Pasand ◽

Mark Crowley

Keyword(s):

New York ◽

Stock Exchange ◽

Risk Tolerance ◽

Support Vector ◽

New York Stock Exchange ◽

Stock Investment ◽

York Stock Exchange ◽

Markowitz Portfolio ◽

Technical Features

This paper proposes a novel trading system that plays the role of an artificial counselor for stock investment. In this paper, future stock prices (technical features) are predicted using Support Vector Regression. The predicted prices are then used to recommend what portion of the budget an investor should invest in each of the existing stocks to obtain an optimal expected profit given their level of risk tolerance. Two different methods are used to suggest the best portions: Markowitz portfolio theory and a fuzzy investment counselor. The first approach is an optimization-based method that considers only technical features, while the second is based on fuzzy logic and takes into account both technical and fundamental features of the stock market. Experimental results on the New York Stock Exchange (NYSE) show the effectiveness of the proposed system.
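The Markowitz approach mentioned here trades expected return against portfolio variance, wᵀΣw. As a minimal sketch of that ingredient only, the code computes a portfolio's expected return and variance and the closed-form minimum-variance weights for two assets (fully invested, short sales allowed). The returns and covariances are made-up numbers standing in for model predictions; the paper's actual optimization and risk-tolerance handling are not reproduced.

```python
from typing import List, Sequence, Tuple


def portfolio_stats(weights: Sequence[float], mu: Sequence[float],
                    cov: Sequence[Sequence[float]]) -> Tuple[float, float]:
    """Expected return w . mu and variance w^T Sigma w of a portfolio."""
    n = len(weights)
    expected = sum(w * m for w, m in zip(weights, mu))
    variance = sum(weights[i] * weights[j] * cov[i][j]
                   for i in range(n) for j in range(n))
    return expected, variance


def min_variance_two_assets(cov: Sequence[Sequence[float]]) -> List[float]:
    """Closed-form minimum-variance weights for two assets summing to 1."""
    s11, s22, s12 = cov[0][0], cov[1][1], cov[0][1]
    denom = s11 + s22 - 2 * s12
    if denom == 0:
        raise ValueError("assets are perfectly correlated with equal variance")
    w1 = (s22 - s12) / denom
    return [w1, 1.0 - w1]


# Hypothetical predicted returns and covariance matrix for two stocks.
mu = [0.10, 0.05]
cov = [[0.04, 0.0], [0.0, 0.01]]
w = min_variance_two_assets(cov)
print(w, portfolio_stats(w, mu, cov))
```

With uncorrelated assets the combined variance falls below that of either stock alone, which is the diversification effect mean-variance optimization exploits.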
