Clustering Analysis with Embedding Vectors: An Application to Real Estate Market Delineation

Changro Lee

doi:10.46604/aiti.2021.8492

Advances in Technology Innovation, 2021, Vol 7 (1), pp. 30-40. doi:10.46604/aiti.2021.8492

Author(s): Changro Lee

Keyword(s): Real Estate, Clustering Analysis, Clustering Algorithm, Hedonic Pricing, Real Estate Market, Categorical Variables, The Real Estate, The Real Estate Market, Gyeonggi Province, Vector Representations

Although clustering analysis is a popular tool in unsupervised learning, it is inefficient for datasets dominated by categorical variables, such as real estate datasets. To apply clustering analysis to real estate datasets, this study proposes an entity embedding approach that transforms categorical variables into vector representations. Three variants of a clustering algorithm, based respectively on the traditional Euclidean distance, the Gower distance, and the embedding vectors, are applied to land sales records to delineate the real estate market in Gwacheon-si, Gyeonggi province, South Korea. The relevance of the resulting submarkets is then evaluated using the root mean squared error (RMSE) obtained from a hedonic pricing model. The results show that the RMSE falls from 0.076-0.077 to 0.069 with the embedding vector-based algorithm. The clustering algorithm empowered by embedding vectors thus outperforms the conventional algorithms, enhancing the relevance of the delineated submarkets.
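As an illustration of one of the distance measures named in the abstract, the following is a minimal sketch of the Gower distance for records that mix numeric and categorical fields. It is not the study's implementation: the parcel records, the field layout, and the helper names `feature_ranges` and `gower_distance` are hypothetical, and the unweighted form of the distance is assumed.

```python
from typing import List, Optional, Sequence, Union

Value = Union[float, str]


def feature_ranges(rows: Sequence[Sequence[Value]]) -> List[Optional[float]]:
    """Return max - min for each numeric column and None for categorical ones."""
    ranges: List[Optional[float]] = []
    for column in zip(*rows):
        if all(isinstance(v, (int, float)) for v in column):
            ranges.append(float(max(column) - min(column)))
        else:
            ranges.append(None)
    return ranges


def gower_distance(a: Sequence[Value], b: Sequence[Value],
                   ranges: Sequence[Optional[float]]) -> float:
    """Unweighted Gower distance: the mean of per-feature dissimilarities."""
    total = 0.0
    for x, y, r in zip(a, b, ranges):
        if r is None:
            # Categorical feature: simple mismatch (0 if equal, 1 otherwise).
            total += 0.0 if x == y else 1.0
        elif r > 0:
            # Numeric feature: absolute difference scaled by the column range.
            total += abs(x - y) / r
    return total / len(ranges)


# Hypothetical land parcels: (area in m2, zoning, land use).
parcels = [
    (100.0, "residential", "dwelling"),
    (300.0, "residential", "field"),
    (200.0, "commercial", "dwelling"),
]
ranges = feature_ranges(parcels)
d01 = gower_distance(parcels[0], parcels[1], ranges)
d02 = gower_distance(parcels[0], parcels[2], ranges)
print(round(d01, 4), round(d02, 4))
```

A pairwise matrix of such distances can be passed to any clustering routine that accepts precomputed dissimilarities. The embedding-based variant described in the abstract instead replaces each category with a learned vector, so that ordinary Euclidean distance becomes applicable.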


A Machine Learning Approach to Delineating Neighborhoods from Geocoded Appraisal Data

ISPRS International Journal of Geo-Information, 2020, Vol 9 (7), pp. 451. doi:10.3390/ijgi9070451

Author(s): Rao Hamza Ali, Josh Graves, Stanley Wu, Jenny Lee, Erik Linstead

Keyword(s): Real Estate, Clustering Algorithm, Spatial Clustering, Real Estate Market, Spatial Filters, Census Tracts, The Real, The Real Estate, The Real Estate Market, Machine Learning Approach

Identifying neighborhoods is an important, financially driven topic in real estate. The real estate industry uses ZIP (postal) codes and Census tracts as land demarcations to categorize properties by price. These boundaries are static: they do not adjust to shifts in the real estate market and fail to represent its dynamics, as in the case of an up-and-coming residential project. Delineated neighborhoods are also used in socioeconomic and demographic analyses where statistics are computed at the neighborhood level. Current practices for delineating neighborhoods have mostly ignored the information that can be extracted from property appraisals. This paper demonstrates the potential of using only the distance between subjects and their comparable properties, as identified in an appraisal, to delineate neighborhoods composed of properties with similar prices and features. Using spatial filters, we first identify the regions with the most appraisal activity and then apply a spatial clustering algorithm to generate neighborhoods composed of properties sharing similar characteristics. Through an application of bootstrapped linear regression, we find that neighborhoods delineated from the geolocation of subjects and comparable properties explain more variation in a property's features, such as valuation, square footage, and price per square foot, than ZIP codes or Census tracts. We also discuss the ability of the neighborhoods to grow and shrink over the years in response to shifts in each housing submarket.
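The comparison of "variation explained" by different boundary schemes can be sketched as follows. This is a simplified stand-in, not the paper's bootstrapped procedure: it relies on the fact that the R² of an ordinary least squares regression on group indicator variables equals one minus the ratio of the within-group to the total sum of squares. The prices, labels, and the function name `grouping_r2` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence


def grouping_r2(values: Sequence[float], labels: Sequence[Hashable]) -> float:
    """Share of variance in `values` explained by the grouping in `labels`."""
    overall_mean = sum(values) / len(values)
    total_ss = sum((v - overall_mean) ** 2 for v in values)

    groups: Dict[Hashable, List[float]] = defaultdict(list)
    for value, label in zip(values, labels):
        groups[label].append(value)

    # Within-group sum of squares: squared deviations from each group mean.
    within_ss = 0.0
    for members in groups.values():
        group_mean = sum(members) / len(members)
        within_ss += sum((v - group_mean) ** 2 for v in members)

    return 1.0 - within_ss / total_ss


# Hypothetical price-per-square-foot values for four properties.
prices = [100.0, 110.0, 200.0, 210.0]
by_zip = ["z1", "z2", "z1", "z2"]      # assumed static boundary labels
by_cluster = ["c1", "c1", "c2", "c2"]  # assumed cluster-derived labels

r2_zip = grouping_r2(prices, by_zip)
r2_cluster = grouping_r2(prices, by_cluster)
print(round(r2_zip, 4), round(r2_cluster, 4))
```

In this toy data the cluster labels separate cheap from expensive properties and therefore explain nearly all of the price variation, while the ZIP-like labels explain almost none. A bootstrapped version would repeat the calculation on resampled properties to obtain a distribution of R² values for each scheme.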
