A Machine Learning Web Application to Estimate Listing Prices of South African Homes

2020 ◽  
Vol 4 (2) ◽  
pp. 1-23
Author(s):  
Dane Bax ◽  
Temesgen Zewotir ◽  
Delia North ◽  

Due to the heterogeneous nature of residential properties, determining selling prices which will reconcile supply and demand is difficult. Establishing realistic listing prices is vitally important for sellers to prevent prolonged time on market. Sellers have several resources available to assist in this endeavour, all of which involve understanding current market dynamics through analysing recent sales and listing data. Property portals which aggregate real estate agencies’ data, hosting it on online platforms, are one such resource, along with individual real estate agencies. Leveraging this data to develop solutions that could aid sellers in listing price decision making is a potential business objective that could not only add value to sellers but create a competitive advantage by increasing traffic to an online real estate platform. Using data provided by a South African online property portal, this paper creates a web application using machine learning to estimate listing prices for different types of homes throughout South Africa. This study compared log linear and gradient boosted models, estimating residential listing prices over a four-year period. The results indicate that although log linear models are suitable to account for spatial dependency in the data through the inclusion of a fixed location effect, the assumption of linear functional form was not satisfied. The gradient boosted models do not impose explicit functional form requirements, making them flexible candidates. Similarly, these models were able to handle the spatial dependency adequately. The gradient boosted models also achieved a lower out-of-sample error compared to the log linear models. The findings show that, over the observation period, larger properties consistently experience a diminishing return at some point over the marginal distribution of physical characteristics.
The web application details how sellers are easily able to obtain mean listing price estimates and gauge the growth thereof, by simply inputting their property interest criteria.
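
For orientation, a log linear listing price model with a fixed location effect can be written generically as below. This is a sketch under assumed notation, not the specification used in the paper: $p_i$ is the listing price of property $i$, $\ell(i)$ its location, $\alpha_{\ell(i)}$ the location effect, $\mathbf{x}_i$ a vector of physical characteristics, and $\varepsilon_i$ an error term.

```latex
\log p_i = \alpha_{\ell(i)} + \mathbf{x}_i^{\top}\boldsymbol{\beta} + \varepsilon_i
```

In this form, the linear functional form assumption mentioned in the abstract is the requirement that the characteristics enter only through the linear term $\mathbf{x}_i^{\top}\boldsymbol{\beta}$.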

2018 ◽  
Vol 13 (2) ◽  
pp. 235-250 ◽  
Author(s):  
Yixuan Ma ◽  
Zhenji Zhang ◽  
Alexander Ihler ◽  
Baoxiang Pan

Boosted by the growing logistics industry and digital transformation, the sharing warehouse market is undergoing rapid development. Both supply and demand sides in the warehouse rental business are faced with market perturbations brought by unprecedented peer competition and information transparency. A key question faced by the participants is how to price warehouses in the open market. To understand the pricing mechanism, we built a real-world warehouse dataset using data collected from classified advertisement websites. Based on the dataset, we applied machine learning techniques to relate warehouse price to its relevant features, such as warehouse size, location and nearby real estate price. Four candidate models are used here: Linear Regression, Regression Tree, Random Forest Regression and Gradient Boosting Regression Trees. The case study in the Beijing area shows that warehouse rent is closely related to its location and land price. Models considering multiple factors have better skill in estimating warehouse rent, compared to single-factor estimation. Additionally, tree models have better performance than the linear model, with the best model (Random Forest) achieving a correlation coefficient of 0.57 in the test set. Deeper investigation of feature importance illustrates that distance from the city center plays the most important role in determining warehouse price in Beijing, followed by nearby real estate price and warehouse size.


Author(s):  
Kellyn F Arnold ◽  
Vinny Davies ◽  
Marc de Kamps ◽  
Peter W G Tennant ◽  
John Mbotwa ◽  
...  

Abstract Prediction and causal explanation are fundamentally distinct tasks of data analysis. In health applications, this difference can be understood in terms of the difference between prognosis (prediction) and prevention/treatment (causal explanation). Nevertheless, these two concepts are often conflated in practice. We use the framework of generalized linear models (GLMs) to illustrate that predictive and causal queries require distinct processes for their application and subsequent interpretation of results. In particular, we identify five primary ways in which GLMs for prediction differ from GLMs for causal inference: (i) the covariates that should be considered for inclusion in (and possibly exclusion from) the model; (ii) how a suitable set of covariates to include in the model is determined; (iii) which covariates are ultimately selected and what functional form (i.e. parameterization) they take; (iv) how the model is evaluated; and (v) how the model is interpreted. We outline some of the potential consequences of failing to acknowledge and respect these differences, and additionally consider the implications for machine learning (ML) methods. We then conclude with three recommendations that we hope will help ensure that both prediction and causal modelling are used appropriately and to greatest effect in health research.


1986 ◽  
Vol 16 (S1) ◽  
pp. S31-S43 ◽  
Author(s):  
Scott E. Harrington

Abstract Estimation of pure premiums for alternative rate classes using regression methods requires the choice of a functional form for the statistical model. Common choices include linear and log-linear models. This paper considers maximum likelihood estimation and testing for functional form using the power transformation suggested by Box and Cox. The linear and log-linear models are special cases of this transformation. Application of the procedure is illustrated using auto insurance claims data from the state of Massachusetts and from the United Kingdom. The predictive accuracy of the method compares favorably to that for the linear and log-linear models for both data sets.
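
For reference, the standard Box-Cox power transformation of a positive variable $y$ with parameter $\lambda$ is shown below. The abstract does not say which variables are transformed, so applying it to the response here is an assumption made for illustration.

```latex
y^{(\lambda)} =
\begin{cases}
\dfrac{y^{\lambda} - 1}{\lambda}, & \lambda \neq 0, \\[6pt]
\log y, & \lambda = 0.
\end{cases}
```

Setting $\lambda = 1$ gives $y - 1$, which is the linear model up to a shift of the intercept, and $\lambda = 0$ gives the log-linear model. This is why both appear as special cases, and why a test on $\lambda$ can serve as a test of functional form.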


2018 ◽  
Author(s):  
S. Nurick ◽  
L. Boyle ◽  
O. Allen ◽  
G. Morris ◽  
J. Potgieter

2015 ◽  
Author(s):  
Jacob Andreas ◽  
Dan Klein

Author(s):  
Navid Asadizanjani ◽  
Sachin Gattigowda ◽  
Mark Tehranipoor ◽  
Domenic Forte ◽  
Nathan Dunn

Abstract Counterfeiting is an increasing concern for businesses and governments as greater numbers of counterfeit integrated circuits (IC) infiltrate the global market. There is an ongoing effort in experimental and national labs inside the United States to detect and prevent such counterfeits in the most efficient time period. However, there is still a missing piece to automatically detect and properly keep record of detected counterfeit ICs. Here, we introduce a web application database that allows users to share previous examples of counterfeits through an online database and to obtain statistics regarding the prevalence of known defects. We also investigate automated techniques based on image processing and machine learning to detect different physical defects and to determine whether or not an IC is counterfeit.


2011 ◽  
Vol 368-373 ◽  
pp. 3078-3082
Author(s):  
Zhou Ji Meng ◽  
Tao Zhou ◽  
Shu Hua Gao

In this paper, indicators of supply and demand in the Xi'an real estate market are established and combined into a class of synthetic indicators using principal component analysis. Spectral analysis of the synthetic indicators is then used to determine, through the spectral density, the periodic change in real estate supply and demand. The analysis shows considerable randomness in real estate supply and demand in Xi'an. Furthermore, in the medium term, a secondary cycle of 3.3 years is still present in the synthetic indicators of demand, while the synthetic indicators of supply remain random. These findings suggest a declining trend in Xi'an real estate prices over the medium term.
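
As a rough illustration of how a cycle length can be read off a spectral density, the sketch below computes a raw periodogram of a single indicator series and reports the period of its strongest frequency. It covers only the spectral step, not the principal component step. The quarterly sampling, the synthetic sine series, and the function names `periodogram` and `dominant_period` are all assumptions made for this example and do not come from the paper or its data.

```python
import cmath
import math


def periodogram(series):
    """Raw periodogram at the Fourier frequencies k/n, for k = 1 .. n//2."""
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]  # remove the mean (frequency zero)
    power = {}
    for k in range(1, n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(centered)
        )
        power[k] = abs(coeff) ** 2 / n
    return power


def dominant_period(series):
    """Period, in samples, of the frequency with the largest periodogram value."""
    power = periodogram(series)
    k_star = max(power, key=power.get)
    return len(series) / k_star


# Synthetic, hypothetical indicator: 40 quarterly observations (10 years)
# containing exactly 3 full cycles, i.e. one cycle every 10/3 years.
n_quarters = 40
cycles = 3
series = [math.sin(2 * math.pi * cycles * t / n_quarters) for t in range(n_quarters)]

period_years = dominant_period(series) / 4  # 4 quarters per year
print(round(period_years, 2))
```

With real indicators the periodogram is noisy, so a smoothed spectral density estimate would normally be used before a peak is interpreted as a cycle.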

