Applying data mining algorithms to real estate appraisals: a comparative study

2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Thiago Cesar de Oliveira ◽  
Lúcio de Medeiros ◽  
Daniel Henrique Marco Detzel

Purpose: Real estate appraisals are an increasingly important means of backing financial operations that are based on the value of these assets. However, in very large databases, traditional methods such as multiple linear regression (MLR) lose predictive capacity. This paper aims to determine whether, in these cases, data mining algorithms can achieve superior statistical results. First, real estate appraisal databases from five towns and cities in the State of Paraná, Brazil, were obtained from the Caixa Econômica Federal bank. Design/methodology/approach: After initial validation, additional databases were generated with real, transformed and nominal values, in both clean and raw form. Each database was analysed with a wide range of data mining algorithms (multilayer perceptron, support vector regression, K-star, M5Rules and random forest), applied either alone or in combination (regression by discretization with logistic, bagging and stacking), using 10-fold cross-validation in Weka. Findings: The algorithms yielded statistical improvements over MLR of varying size, especially when combined algorithms were used. The largest improvements occurred in databases with a large amount of data and in those that received only minor initial data cleaning. The paper also presents a further analysis, including a ranking of the algorithms by the number of significant results obtained. Originality/value: The authors found no similar studies conducted in Brazil.
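The comparison above rests on 10-fold cross-validation: each record is held out exactly once, and the model is scored on the fold it did not see. The following is a minimal stdlib sketch of that procedure, not Weka's own implementation. The `fit`/`predict` callables and the mean-price placeholder model are hypothetical stand-ins for MLR or random forest.

```python
import random
import statistics


def kfold_indices(n, k=10, seed=0):
    """Shuffle indices 0..n-1 and deal them into k folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validate(fit, predict, X, y, k=10, seed=0):
    """Return the RMSE on each held-out fold; every row is tested exactly once."""
    scores = []
    for test_idx in kfold_indices(len(X), k, seed):
        if not test_idx:  # only happens when there are fewer rows than folds
            continue
        held_out = set(test_idx)
        train = [i for i in range(len(X)) if i not in held_out]
        model = fit([X[i] for i in train], [y[i] for i in train])
        sq_err = [(predict(model, X[i]) - y[i]) ** 2 for i in test_idx]
        scores.append((sum(sq_err) / len(sq_err)) ** 0.5)
    return scores


# Hypothetical placeholder model: always predicts the training-set mean value.
def fit_mean(X, y):
    return statistics.fmean(y)


def predict_mean(model, x):
    return model
```

Swapping `fit_mean`/`predict_mean` for an MLR or random-forest implementation and comparing the mean of the fold scores gives the shape of the comparison, though not Weka's exact numbers.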

Author(s):  
Efat Jabarpour ◽  
Amin Abedini ◽  
Abbasali Keshtkar

Introduction: Osteoporosis is a disease that reduces bone density and degrades bone microstructure, increasing the risk of fractures. It is one of the major causes of disability and death in elderly people. This study aims to determine the factors influencing the incidence of osteoporosis and to provide a predictive model for diagnosing the disease, in order to increase diagnostic speed and reduce diagnostic costs. Methods: Individuals' data, including personal information, lifestyle and disease information, were reviewed. A new model is presented based on the Cross-Industry Standard Process for Data Mining (CRISP-DM) methodology. Support Vector Machine (SVM) and a Bayesian method (Tree Augmented Naïve Bayes, TAN) were applied, with Clementine 12 as the data mining tool. Results: Several features were found to affect the disease, and rules were extracted that can serve as a pattern for predicting patients' status. Classification precision was 88.39% for SVM and 91.29% for TAN, the highest among the methods compared. Conclusion: The most influential factors for osteoporosis were identified; they can be applied to a new individual with known characteristics to predict that person's likelihood of osteoporosis.
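TAN extends naive Bayes by letting each attribute also depend on one other attribute through a tree. The sketch below shows only the plain naive Bayes baseline that TAN builds on, for categorical attributes with Laplace smoothing. It is not the authors' TAN model or Clementine's implementation, and the feature values in the usage are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_nb(rows, labels, alpha=1.0):
    """Count class frequencies and per-class attribute-value frequencies."""
    class_counts = Counter(labels)
    # feature_counts[c][j][v]: how often attribute j takes value v in class c
    feature_counts = defaultdict(lambda: defaultdict(Counter))
    values = defaultdict(set)  # distinct values seen for each attribute
    for row, c in zip(rows, labels):
        for j, v in enumerate(row):
            feature_counts[c][j][v] += 1
            values[j].add(v)
    return class_counts, feature_counts, values, alpha


def predict_nb(model, row):
    """Return the class with the highest log posterior (attributes assumed independent)."""
    class_counts, feature_counts, values, alpha = model
    n = sum(class_counts.values())
    best, best_score = None, -math.inf
    for c, nc in class_counts.items():
        score = math.log(nc / n)  # log prior
        for j, v in enumerate(row):
            count = feature_counts[c][j][v]
            # Laplace-smoothed likelihood P(attribute j = v | class c)
            score += math.log((count + alpha) / (nc + alpha * len(values[j])))
        if score > best_score:
            best, best_score = c, score
    return best
```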


2015 ◽  
Vol 813-814 ◽  
pp. 1104-1113 ◽  
Author(s):  
A. Sumesh ◽  
Dinu Thomas Thekkuden ◽  
Binoy B. Nair ◽  
K. Rameshkumar ◽  
K. Mohandas

The quality of a weld depends on the welding parameters and the environmental conditions to which it is exposed. Improper selection of welding process parameters is one of the main causes of weld defects. In this work, arc sound signals were captured during the welding of carbon steel plates, and statistical features of the sound signals were extracted during the welding process. Data mining algorithms, namely Naive Bayes, Support Vector Machines and a neural network, were used to classify weld conditions from these features. Two weld conditions were considered: good welds and welds with defects (lack of fusion and burn-through). The classification efficiencies of the machine learning algorithms were compared, and the neural network achieved better classification efficiency than the other algorithms considered in this study.
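The abstract does not list which statistical features were extracted from the arc sound. The following minimal sketch assumes common time-domain features (mean, standard deviation, RMS, skewness, kurtosis, peak amplitude) computed over fixed-size windows. The window size and hop are hypothetical parameters. Each resulting feature dictionary would form one input row for the classifiers.

```python
import math
import statistics


def sound_features(window):
    """Assumed time-domain statistical features of one window of audio samples."""
    n = len(window)
    mean = statistics.fmean(window)
    std = statistics.pstdev(window)
    rms = math.sqrt(sum(x * x for x in window) / n)
    # Standardised third and fourth central moments (population definitions)
    m3 = sum((x - mean) ** 3 for x in window) / n
    m4 = sum((x - mean) ** 4 for x in window) / n
    skew = m3 / std ** 3 if std else 0.0
    kurt = m4 / std ** 4 if std else 0.0
    return {"mean": mean, "std": std, "rms": rms, "skewness": skew,
            "kurtosis": kurt, "peak": max(abs(x) for x in window)}


def windows(signal, size, hop):
    """Yield successive (possibly overlapping) windows of the signal."""
    for start in range(0, len(signal) - size + 1, hop):
        yield signal[start:start + size]
```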


Data mining can be considered an important aspect of the information industry, and it has found wide applicability in almost every field that deals with data. Among the various techniques employed for data mining, classification is a very commonly used tool for knowledge discovery. Various methods are available for building a classification model, of which the most common and comprehensible is k-nearest neighbors (kNN). Although kNN has a number of shortcomings and limitations, these can be overcome by modifying the basic algorithm. Owing to its wide applicability, kNN has been the focus of extensive research, and many variants have been proposed, with varying degrees of success in improving performance. A major difficulty faced by data mining applications is high dimensionality, which renders most data mining algorithms inefficient. This problem can be mitigated to some extent with dimensionality reduction methods such as principal component analysis (PCA). Further gains in the efficiency of classification-based mining algorithms can be achieved with optimization methods; meta-heuristic algorithms inspired by natural phenomena, such as particle swarm optimization, can be used very effectively for this purpose.
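The basic kNN classifier that the paragraph's variants modify can be sketched in a few lines. This version uses Euclidean distance and an unweighted majority vote. Feature scaling, PCA and PSO-tuned parameters, which the text proposes as improvements, are deliberately left out.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    dists = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    # most_common orders equal counts by first insertion, so ties go to the
    # label of the nearer neighbour.
    return votes.most_common(1)[0][0]
```

Because every prediction scans the whole training set, cost grows with both the number of points and the number of dimensions. This is why the text pairs kNN with dimensionality reduction.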


Author(s):  
Moloud Abdar ◽  
Sharareh R. Niakan Kalhori ◽  
Tole Sutikno ◽  
Imam Much Ibnu Subroto ◽  
Goli Arji

Heart diseases are among the nation's leading causes of mortality and morbidity. Data mining techniques can predict the likelihood of patients developing heart disease. The purpose of this study is to compare different data mining algorithms for predicting the risk of heart disease. After feature analysis, models using five algorithms, namely decision tree (C5.0), neural network, support vector machine (SVM), logistic regression and k-nearest neighbors (KNN), were developed and validated. The C5.0 decision tree built the most accurate model (93.02%), while KNN, SVM and the neural network reached 88.37%, 86.05% and 80.23%, respectively. The decision tree's results are easy to interpret and apply, and its rules can be readily understood by clinical practitioners.
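The interpretability of the C5.0 model comes from how it chooses splits. C4.5 selects the attribute with the highest gain ratio, and C5.0 is generally described as using the same criterion. The following sketch covers categorical attributes only. It leaves out continuous thresholds, pruning and boosting, so it is not a full C5.0 implementation.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(values, labels):
    """Gain ratio of splitting `labels` on the categorical attribute `values`."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(labels) - remainder
    split_info = entropy(values)  # entropy of the partition itself
    return gain / split_info if split_info > 0 else 0.0
```

Dividing by the split information penalises attributes with many distinct values. A perfect two-way split scores 1.0, but an equally pure four-way split only scores 0.5.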


Author(s):  
M. Jupri ◽  
Riyanarto Sarno

Achieving optimal tax revenue requires effective and efficient tax supervision, which can be supported by classifying taxpayers according to their compliance with tax regulations. To this end, this paper classifies taxpayer compliance using data mining algorithms, i.e. C4.5, Support Vector Machine, K-Nearest Neighbor, Naive Bayes and Multilayer Perceptron, applied to taxpayer compliance data. Taxpayer compliance is classified into four classes: (1) formally and materially compliant taxpayers, (2) formally compliant taxpayers, (3) materially compliant taxpayers, and (4) formally and materially non-compliant taxpayers. The results of the data mining algorithms are then compared using Fuzzy AHP and TOPSIS to determine the best-performing classifier on the criteria of Accuracy, F-Score and Time required. Taxpayers at each compliance level are also ranked with Fuzzy AHP and TOPSIS, using criteria based on the dataset variables, to prioritise them for more detailed supervision. The results show that C4.5 is the best-performing classifier, with a preference value of 0.998, whereas the MLP algorithm has the lowest preference value, 0.131. Alternative taxpayer A233 is the top-priority taxpayer, with a preference value of 0.433, whereas alternative taxpayer A051 is the lowest-priority taxpayer, with a preference value of 0.036.
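The ranking step can be illustrated with crisp TOPSIS. The steps are to vector-normalise each criterion, weight it, locate the ideal best and worst values, and score each alternative by its relative closeness to the ideal. In the paper, the weights come from Fuzzy AHP. Here they are hypothetical crisp inputs, and the fuzzy variant is not shown.

```python
import math


def topsis(matrix, weights, benefit):
    """Closeness coefficient (higher is better) for each alternative (row).

    benefit[j] is True when larger is better for criterion j (e.g. accuracy)
    and False when smaller is better (e.g. time required).
    """
    m = len(matrix[0])
    # 1. Vector-normalise each column, then apply the criterion weights.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(m)]
    v = [[weights[j] * row[j] / norms[j] for j in range(m)] for row in matrix]
    # 2. Ideal best and ideal worst value of each criterion.
    cols = list(zip(*v))
    best = [max(c) if benefit[j] else min(c) for j, c in enumerate(cols)]
    worst = [min(c) if benefit[j] else max(c) for j, c in enumerate(cols)]
    # 3. Closeness = distance to worst / (distance to best + distance to worst).
    scores = []
    for row in v:
        d_best = math.dist(row, best)
        d_worst = math.dist(row, worst)
        scores.append(d_worst / (d_best + d_worst))
    return scores
```

With rows for classifiers and columns for accuracy, F-score and time, an algorithm that dominates on every criterion scores 1.0. This makes a preference value close to 1, such as C4.5's 0.998, read as "nearly ideal".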


2019 ◽  
Vol 16 (9) ◽  
pp. 3849-3853
Author(s):  
Dar Masroof Amin ◽  
Atul Garg

The global spread of the Internet is creating an enormous amount of data on servers. The data created during the last two years alone is equivalent to the data created in all the preceding years. This exponential growth is driven by easy access to Internet of Things devices. This information has become a source of predictive analysis of future events. The versatile use of computing devices creates data of diverse nature, and analysts predict future trends using data from their respective domains. The technology used to analyse the data has become a bottleneck over time, mainly because data is being created faster than that technology can access it. Various mining techniques are used to extract useful information. This research provides a detailed analysis of how data is used and perceived by various data mining algorithms: Naïve Bayes, Support Vector Machines, Linear Discriminant Analysis, Artificial Neural Networks, C4.5, C5.0 and K-Nearest Neighbour. The input data for these algorithms are big data files, and the research focuses mainly on how existing algorithms interact with such files. The research was carried out on Twitter comments.


2019 ◽  
Vol 36 (4) ◽  
pp. 299-313 ◽  
Author(s):  
Armelle Brun ◽  
Geoffray Bonnin ◽  
Sylvain Castagnos ◽  
Azim Roussanaly ◽  
Anne Boyer

Purpose: This paper presents the METAL project, a French open learning analytics (LA) project for secondary schools that aims to improve the quality of teaching. The originality of METAL is that it relies on research through exploratory activities and addresses all aspects of a learning analytics environment. Design/methodology/approach: This work introduces the different concerns of the project: collection and storage of multi-source data owned by a variety of stakeholders; selection and promotion of standards; design of an open-source learning record store (LRS); conception of dashboards with their end users; trust; usability; and design of explainable multi-source data mining algorithms. Findings: All the dimensions of METAL are presented, together with how each is approached: data sources; data storage, through the implementation of an LRS; design of dashboards for secondary schools, based on co-design sessions; and data mining algorithms and experiments, in line with privacy and ethics concerns. Originality/value: The wide-scale dissemination of LA at the institutional level, or at a broader level such as a territory or a level of study, is still a hot topic in the literature; it is one of the focuses and sources of originality of this paper, together with the broad spectrum of concerns addressed.


Water ◽  
2019 ◽  
Vol 11 (11) ◽  
pp. 2292 ◽  
Author(s):  
Vali Vakhshoori ◽  
Hamid Reza Pourghasemi ◽  
Mohammad Zare ◽  
Thomas Blaschke

The aim of this study was to apply data mining algorithms to produce a landslide susceptibility map of the national-scale catchment called Bandar Torkaman in northern Iran. Because the volume of data at this scale made it impossible to apply advanced data mining methods directly, an intermediate approach called normalized frequency-ratio unique condition units (NFUC) was devised to reduce the data volume. With this technique, several data mining algorithms were employed: fuzzy gamma (FG), binary logistic regression (BLR), backpropagation artificial neural network (BPANN), support vector machine (SVM) and C5 decision tree (C5DT). The success and prediction rates of the models, calculated from the receiver operating characteristic (ROC) curve, were 0.859 and 0.842 for FG, 0.887 and 0.855 for BLR, 0.893 and 0.856 for C5DT, 0.891 and 0.875 for SVM, and 0.896 and 0.872 for BPANN, which showed the highest validation rates compared with the other methods. The proposed NFUC approach proved highly efficient at reducing data volume, making it feasible to apply computationally demanding algorithms to large areas with voluminous data.
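The abstract names NFUC but does not give its formulas. As a rough sketch, the code below assumes the standard frequency ratio per factor class, FR = (share of landslide cells in the class) / (share of all cells in the class). It then applies a min-max normalisation, which is an assumption and may differ from the authors' choice. Finally, it collapses cells that share the same class on every factor into unique condition units. That collapsing step is what shrinks the data volume.

```python
from collections import Counter, defaultdict


def frequency_ratio(classes, landslide):
    """FR per class: share of landslide cells / share of all cells.

    classes: class label of each cell for one factor; landslide: 0/1 per cell.
    """
    total = len(classes)
    total_slides = sum(landslide)
    cells = Counter(classes)
    slides = Counter(c for c, s in zip(classes, landslide) if s)
    return {c: (slides[c] / total_slides) / (cells[c] / total) for c in cells}


def normalize(fr):
    """Assumed normalisation: min-max scale the FR values of one factor to [0, 1]."""
    lo, hi = min(fr.values()), max(fr.values())
    return {c: (v - lo) / (hi - lo) if hi > lo else 0.0 for c, v in fr.items()}


def unique_condition_units(factor_layers, landslide):
    """Collapse cells with the same class on every factor into one unit.

    factor_layers: one list of cell classes per factor, aligned cell by cell.
    Returns {class combination: (cell_count, landslide_count)}.
    """
    units = defaultdict(lambda: [0, 0])
    for i, combo in enumerate(zip(*factor_layers)):
        units[combo][0] += 1
        units[combo][1] += landslide[i]
    return {k: tuple(v) for k, v in units.items()}
```

Each unit can then be given its factors' normalised FR values and passed to the classifiers as a single weighted sample, instead of one sample per raster cell.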


2009 ◽  
Vol 131 (3) ◽  
Author(s):  
Haiyang Zheng ◽  
Andrew Kusiak

In this paper, multivariate time series models were built to predict the power ramp rates of a wind farm, with power changes predicted at 10 min intervals. The models were built with data-mining algorithms, and five different algorithms were tested using data collected at a wind farm of 100 turbines. The support vector machine regression algorithm performed best of the five, providing predictions of the power ramp rate for time horizons of 10–60 min. The boosting tree algorithm was used to select parameters that enhance the prediction accuracy of the power ramp rate. Test results of the multivariate time series models are presented, and suggestions for future research are provided.
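A multivariate time series model of this kind needs a target series and lagged inputs. The following sketch assumes the ramp rate is the change in power between consecutive 10 min samples divided by the interval. The abstract does not state the exact definition. The sketch then builds (features, target) pairs in which the horizon counts 10 min steps, so horizons 1 to 6 cover 10–60 min. The input series and lag count are hypothetical choices.

```python
def ramp_rates(power, interval_min=10):
    """Assumed ramp rate: power change between consecutive samples per minute."""
    return [(b - a) / interval_min for a, b in zip(power, power[1:])]


def lagged_dataset(inputs, target, n_lags, horizon):
    """Build rows of lagged inputs paired with the target `horizon` steps ahead.

    inputs: list of aligned series (e.g. wind speed, power); target: aligned
    series to predict (e.g. ramp rate). Each row concatenates the `n_lags`
    most recent values of every input series at time t.
    """
    X, y = [], []
    for t in range(n_lags - 1, len(target) - horizon):
        row = []
        for series in inputs:
            row.extend(series[t - n_lags + 1:t + 1])
        X.append(row)
        y.append(target[t + horizon])
    return X, y
```

Training one regressor per horizon, such as support vector regression, on these rows gives the 10–60 min forecasts that the abstract describes.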


2021 ◽  
Author(s):  
Mustafa Yağcı ◽  
Yusuf Ziya Olpak

Abstract: This study proposes a new model that analyses students' grade point averages (GPAs) in the previous semester using data mining algorithms and predicts the final GPAs they may receive in the following semesters, in three progressively broader categories (department, faculty and university). The performance of four data mining algorithms, Random Forest, Linear Regression, Support Vector Machines and k-Nearest Neighbors, in estimating students' end-of-semester GPAs was calculated and compared. The study had three focuses: first, predicting academic performance from a single independent variable; second, comparing the performance indicators of the four algorithms; and third, comparing the predictions made in the three categories. All the algorithms correctly classified the samples at rates between 92% and 94%. Using a single variable, the proposed model estimated students' end-of-semester GPAs with an average deviation of 0.28 points on a 4-point scale. Students at high risk of failure can therefore be identified in advance by estimating their final GPAs.
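Linear Regression is one of the four algorithms compared, and the model uses a single independent variable. The following minimal sketch fits final GPA from previous-semester GPA by ordinary least squares and measures error as mean absolute error. Treating the reported "average deviation" as MAE, and clipping predictions to the 0–4 scale, are assumptions. The data would be the institution's own records; none are reproduced here.

```python
import statistics


def fit_line(x, y):
    """Ordinary least squares for y = a + b * x with a single predictor."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def predict_gpa(a, b, prev_gpa):
    """Predicted final GPA, clipped to the assumed 0-4 grading scale."""
    return min(4.0, max(0.0, a + b * prev_gpa))


def mean_absolute_error(actual, predicted):
    """Average absolute deviation between actual and predicted GPAs."""
    return statistics.fmean(abs(t - p) for t, p in zip(actual, predicted))
```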

