CAN MACHINE LEARNING ALGORITHMS ASSOCIATED WITH TEXT MINING FROM INTERNET DATA IMPROVE HOUSING PRICE PREDICTION PERFORMANCE?

2020, Vol 24 (5), pp. 300-312
Author(s): Jian-qiang Guo, Shu-hen Chiang, Min Liu, Chi-Chun Yang, Kai-yi Guo

Housing frenzies in China have attracted widespread global attention over the past few years, but the key question is how to forecast housing prices more accurately in order to establish an effective real estate policy. Exploiting the ubiquity and immediacy of Internet data, this research adopts a broader version of text mining to search for keywords related to housing prices and then evaluates their predictive ability using machine learning algorithms. Our findings indicate that this new method, especially random forest, not only detects turning points but also offers prediction ability that clearly outperforms traditional regression analysis. Overall, prediction based on online search data through a machine learning mechanism helps us better understand the trends of house prices in China.

This paper demonstrates the use of machine learning algorithms to predict housing selling prices on a real dataset collected from the Petaling Jaya area, Selangor, Malaysia. To date, the literature on machine learning prediction of housing selling prices in Malaysia is scarce. This paper provides a brief review of the existing machine learning algorithms for the prediction problem and presents the characteristics of the collected datasets with different groups of selected features. The findings indicate that using irrelevant features from the dataset can decrease the accuracy of the prediction models.


2021
Author(s): Miguel Angel Correa Manrique, Omar Becerra Sierra, Daniel Otero Gomez, Henry Laniado, Rafael Mateus C, ...

It is common practice to price a house without performing proper evaluation studies for assurance. The purpose of this study is therefore to provide an explanatory model by establishing parameters for accuracy in the interpretation and projection of housing prices. In addition, it aims to establish proper data preprocessing practices in order to increase the accuracy of machine learning algorithms. According to our literature review, there are few articles and reports on the use of machine learning tools for the prediction of property prices in Colombia. The dataset on which the research is built was provided by an existing real estate company. It contains nearly 940,000 items (housing advertisements) posted on the platform from 2018 to 2020. The database was enriched using statistical imputation techniques. Housing price prediction was performed using Decision Tree Regressors and LightGBM methods, yielding better alternatives for house price prediction in Colombia. To measure the accuracy of the proposed models, the Root Mean Squared Logarithmic Error (RMSLE) was used. The best cross-validation results obtained were 0.25354 ± 0.00699 for LightGBM, 0.25296 ± 0.00511 for the Bagging Regressor, and 0.25312 ± 0.00559 for the ExtraTree Regressor with Bagging Regressor, and no statistically significant difference was found between their performances.
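For readers unfamiliar with the metric, the following is a minimal sketch of how RMSLE and a "mean ± standard deviation" cross-validation summary could be computed. It assumes the common definition of RMSLE based on log(1 + x); the function names rmsle and summarize_folds, the sample prices, and the choice of the sample standard deviation are illustrative assumptions, not details taken from the study.

```python
import math
from statistics import mean, stdev


def rmsle(y_true, y_pred):
    """Root Mean Squared Logarithmic Error for non-negative targets.

    Uses log(1 + x) so that zero values are handled (assumed definition).
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_log_errors = [
        (math.log1p(p) - math.log1p(t)) ** 2 for t, p in zip(y_true, y_pred)
    ]
    return math.sqrt(sum(squared_log_errors) / len(squared_log_errors))


def summarize_folds(fold_scores):
    """Return (mean, sample standard deviation) of per-fold scores.

    Whether a study reports the sample or population deviation is an
    assumption here; this sketch uses the sample version.
    """
    return mean(fold_scores), stdev(fold_scores)


# Hypothetical prices and predictions, for illustration only.
actual = [200_000.0, 350_000.0, 120_000.0]
predicted = [210_000.0, 330_000.0, 150_000.0]
print(f"RMSLE: {rmsle(actual, predicted):.5f}")
```

Because the error is taken on a logarithmic scale, RMSLE penalizes relative rather than absolute deviations, which suits price data spanning several orders of magnitude.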


2021, Vol 1916 (1), pp. 012042
Author(s): Ranjani Dhanapal, A AjanRaj, S Balavinayagapragathish, J Balaji

2021, pp. 1-29
Author(s): Fikrewold H. Bitew, Corey S. Sparks, Samuel H. Nyarko

Abstract Objective: Child undernutrition is a global public health problem with serious implications. This study estimates predictive algorithms for the determinants of childhood stunting by using various machine learning (ML) algorithms. Design: This study draws on data from the Ethiopian Demographic and Health Survey of 2016. Five machine learning algorithms, namely eXtreme gradient boosting (xgbTree), k-nearest neighbors (K-NN), random forest (RF), neural network (NNet), and generalized linear models (GLM), were considered to predict the socio-demographic risk factors for undernutrition in Ethiopia. Setting: Households in Ethiopia. Participants: A total of 9,471 children below five years of age. Results: The descriptive results show substantial regional variations in child stunting, wasting, and underweight in Ethiopia. Among the five ML algorithms, the xgbTree algorithm shows better prediction ability than the generalized linear mixed algorithm. The best predicting algorithm (xgbTree) identifies diverse important predictors of undernutrition across the three outcomes, including time to water source, anemia history, child age greater than 30 months, small birth size, and maternal underweight, among others. Conclusions: The xgbTree algorithm was a reasonably superior ML algorithm for predicting childhood undernutrition in Ethiopia compared to the other ML algorithms considered in this study. The findings support improving access to water supply, food security, and fertility regulation, among others, in the quest to considerably improve childhood nutrition in Ethiopia.

