scholarly journals Machine-Learning-Based Prediction of Land Prices in Seoul, South Korea

2021 ◽  
Vol 13 (23) ◽  
pp. 13088
Author(s):  
Jungsun Kim ◽  
Jaewoong Won ◽  
Hyeongsoon Kim ◽  
Joonghyeok Heo

The accurate estimation of real estate value helps the development of real estate policies that can respond to the complexities and instability of the real estate market. Previously, statistical methods were used to estimate real estate value, but machine learning methods have gained popularity because their predictions are more accurate. In contrast to existing studies that use various machine learning methods to estimate the transactions or list prices of real estate properties without separating the building and land prices, this study estimates land price using a large amount of land-use information obtained from various land- and building-related datasets. The random forest and XGBoost methods were used to estimate 52,900 land prices in Seoul, South Korea, from January 2017 to December 2020. The models were also separately trained for different land uses and different time periods. Overall, the results revealed that XGBoost yields a higher prediction accuracy. Whereas the XGBoost models were more accurate on the 2020 data than on the 2017–2020 data when analyzing residential areas, the random forest models were more accurate on the 2017–2020 data than on the 2020 data. Further analysis will extend the prediction model to consider submarkets determined by price volatility and locality.

Energies ◽  
2021 ◽  
Vol 14 (15) ◽  
pp. 4595
Author(s):  
Parisa Asadi ◽  
Lauren E. Beckingham

X-ray CT imaging provides a 3D view of a sample and is a powerful tool for investigating the internal features of porous rock. Reliable phase segmentation in these images is highly necessary but, like any other digital rock imaging technique, is time-consuming, labor-intensive, and subjective. Combining 3D X-ray CT imaging with machine learning methods that can simultaneously consider several extracted features in addition to color attenuation, is a promising and powerful method for reliable phase segmentation. Machine learning-based phase segmentation of X-ray CT images enables faster data collection and interpretation than traditional methods. This study investigates the performance of several filtering techniques with three machine learning methods and a deep learning method to assess the potential for reliable feature extraction and pixel-level phase segmentation of X-ray CT images. Features were first extracted from images using well-known filters and from the second convolutional layer of the pre-trained VGG16 architecture. Then, K-means clustering, Random Forest, and Feed Forward Artificial Neural Network methods, as well as the modified U-Net model, were applied to the extracted input features. The models’ performances were then compared and contrasted to determine the influence of the machine learning method and input features on reliable phase segmentation. The results showed considering more dimensionality has promising results and all classification algorithms result in high accuracy ranging from 0.87 to 0.94. Feature-based Random Forest demonstrated the best performance among the machine learning models, with an accuracy of 0.88 for Mancos and 0.94 for Marcellus. The U-Net model with the linear combination of focal and dice loss also performed well with an accuracy of 0.91 and 0.93 for Mancos and Marcellus, respectively. In general, considering more features provided promising and reliable segmentation results that are valuable for analyzing the composition of dense samples, such as shales, which are significant unconventional reservoirs in oil recovery.


2020 ◽  
Vol 4 (Supplement_1) ◽  
pp. 268-269
Author(s):  
Jaime Speiser ◽  
Kathryn Callahan ◽  
Jason Fanning ◽  
Thomas Gill ◽  
Anne Newman ◽  
...  

Abstract Advances in computational algorithms and the availability of large datasets with clinically relevant characteristics provide an opportunity to develop machine learning prediction models to aid in diagnosis, prognosis, and treatment of older adults. Some studies have employed machine learning methods for prediction modeling, but skepticism of these methods remains due to lack of reproducibility and difficulty understanding the complex algorithms behind models. We aim to provide an overview of two common machine learning methods: decision tree and random forest. We focus on these methods because they provide a high degree of interpretability. We discuss the underlying algorithms of decision tree and random forest methods and present a tutorial for developing prediction models for serious fall injury using data from the Lifestyle Interventions and Independence for Elders (LIFE) study. Decision tree is a machine learning method that produces a model resembling a flow chart. Random forest consists of a collection of many decision trees whose results are aggregated. In the tutorial example, we discuss evaluation metrics and interpretation for these models. Illustrated in data from the LIFE study, prediction models for serious fall injury were moderate at best (area under the receiver operating curve of 0.54 for decision tree and 0.66 for random forest). Machine learning methods may offer improved performance compared to traditional models for modeling outcomes in aging, but their use should be justified and output should be carefully described. Models should be assessed by clinical experts to ensure compatibility with clinical practice.


Animals ◽  
2020 ◽  
Vol 10 (5) ◽  
pp. 771
Author(s):  
Toshiya Arakawa

Mammalian behavior is typically monitored by observation. However, direct observation requires a substantial amount of effort and time, if the number of mammals to be observed is sufficiently large or if the observation is conducted for a prolonged period. In this study, machine learning methods as hidden Markov models (HMMs), random forests, support vector machines (SVMs), and neural networks, were applied to detect and estimate whether a goat is in estrus based on the goat’s behavior; thus, the adequacy of the method was verified. Goat’s tracking data was obtained using a video tracking system and used to estimate whether they, which are in “estrus” or “non-estrus”, were in either states: “approaching the male”, or “standing near the male”. Totally, the PC of random forest seems to be the highest. However, The percentage concordance (PC) value besides the goats whose data were used for training data sets is relatively low. It is suggested that random forest tend to over-fit to training data. Besides random forest, the PC of HMMs and SVMs is high. However, considering the calculation time and HMM’s advantage in that it is a time series model, HMM is better method. The PC of neural network is totally low, however, if the more goat’s data were acquired, neural network would be an adequate method for estimation.


2020 ◽  
Author(s):  
Ki-Jin Ryu ◽  
Kyong Wook Yi ◽  
Yong Jin Kim ◽  
Jung Ho Shin ◽  
Jun Young Hur ◽  
...  

Abstract Background To analyze the determinants of women’s vasomotor symptoms (VMS) using machine learning. Methods Data came from Korea University Anam Hospital in Seoul, Korea, with 3298 women, aged 40–80 years, who attended their general health check from January 2010 to December 2012. Five machine learning methods were applied and compared for the prediction of VMS, measured by a Menopause Rating Scale. Variable importance, the effect of a variable on model performance, was used for identifying major determinants of VMS. Results In terms of the mean squared error, the random forest (0.9326) was much better than linear regression (12.4856) and artificial neural networks with one, two and three hidden layers (1.5576, 1.5184 and 1.5833, respectively). Based on variable importance from the random forest, the most important determinants of VMS were age, menopause age, thyroid stimulating hormone, monocyte and triglyceride, as well as gamma glutamyl transferase, blood urea nitrogen, cancer antigen 19 − 9, C-reactive protein and low-density-lipoprotein cholesterol. Indeed, the following determinants ranked within the top 20 in terms of variable importance: cancer antigen 125, total cholesterol, insulin, free thyroxine, forced vital capacity, alanine aminotransferase, forced expired volume in one second, height, homeostatic model assessment for insulin resistance and carcinoembryonic antigen. Conclusions Machine learning provides an invaluable decision support system for the prediction of VMS. For preventing VMS, preventive measures would be needed regarding the thyroid function, the lipid profile, the liver function, inflammation markers, insulin resistance, the monocyte, cancer antigens and the lung function.


2020 ◽  
Vol 18 (1) ◽  
Author(s):  
Kerry E. Poppenberg ◽  
Vincent M. Tutino ◽  
Lu Li ◽  
Muhammad Waqas ◽  
Armond June ◽  
...  

Abstract Background Intracranial aneurysms (IAs) are dangerous because of their potential to rupture. We previously found significant RNA expression differences in circulating neutrophils between patients with and without unruptured IAs and trained machine learning models to predict presence of IA using 40 neutrophil transcriptomes. Here, we aim to develop a predictive model for unruptured IA using neutrophil transcriptomes from a larger population and more robust machine learning methods. Methods Neutrophil RNA extracted from the blood of 134 patients (55 with IA, 79 IA-free controls) was subjected to next-generation RNA sequencing. In a randomly-selected training cohort (n = 94), the Least Absolute Shrinkage and Selection Operator (LASSO) selected transcripts, from which we constructed prediction models via 4 well-established supervised machine-learning algorithms (K-Nearest Neighbors, Random Forest, and Support Vector Machines with Gaussian and cubic kernels). We tested the models in the remaining samples (n = 40) and assessed model performance by receiver-operating-characteristic (ROC) curves. Real-time quantitative polymerase chain reaction (RT-qPCR) of 9 IA-associated genes was used to verify gene expression in a subset of 49 neutrophil RNA samples. We also examined the potential influence of demographics and comorbidities on model prediction. Results Feature selection using LASSO in the training cohort identified 37 IA-associated transcripts. Models trained using these transcripts had a maximum accuracy of 90% in the testing cohort. The testing performance across all methods had an average area under ROC curve (AUC) = 0.97, an improvement over our previous models. The Random Forest model performed best across both training and testing cohorts. RT-qPCR confirmed expression differences in 7 of 9 genes tested. Gene ontology and IPA network analyses performed on the 37 model genes reflected dysregulated inflammation, cell signaling, and apoptosis processes. In our data, demographics and comorbidities did not affect model performance. Conclusions We improved upon our previous IA prediction models based on circulating neutrophil transcriptomes by increasing sample size and by implementing LASSO and more robust machine learning methods. Future studies are needed to validate these models in larger cohorts and further investigate effect of covariates.


Energies ◽  
2019 ◽  
Vol 12 (1) ◽  
pp. 150 ◽  
Author(s):  
Feiyan Chen ◽  
Zhigao Zhou ◽  
Aiwen Lin ◽  
Jiqiang Niu ◽  
Wenmin Qin ◽  
...  

Accurate estimation of direct horizontal irradiance (DHI) is a prerequisite for the design and location of concentrated solar power thermal systems. Previous studies have shown that DHI observation stations are too sparsely distributed to meet requirements, as a result of the high construction and maintenance costs of observation platforms. Satellite retrieval and reanalysis have been widely used for estimating DHI, but their accuracy needs to be further improved. In addition, numerous modelling techniques have been used for this purpose worldwide. In this study, we apply five machine learning methods: back propagation neural networks (BP), general regression neural networks (GRNN), genetic algorithm (Genetic), M5 model tree (M5Tree), multivariate adaptive regression splines (MARS); and a physically based model, Yang’s hybrid model (YHM). Daily meteorological variables, including air temperature (T), relative humidity (RH), surface pressure (SP), and sunshine duration (SD) were obtained from 839 China Meteorological Administration (CMA) stations in different climatic zones across China and were used as data inputs for the six models. DHI observations at 16 CMA radiation stations were used to validate their accuracy. The results indicate that the capability of M5Tree was superior to BP, GRNN, Genetic, MARS and YHM, with the lowest values of daily root mean square (RMSE) of 1.989 MJ m−2day−1, and the highest correlation coefficient (R = 0.956), respectively. Then, monthly and annual mean DHI during 1960–2016 were calculated to reveal the spatiotemporal variation of DHI across China, using daily meteorological data based on the M5tree model. The results indicated a significantly decreasing trend with a rate of −0.019 MJ m−2during 1960–2016, and the monthly and annual DHI values of the Tibetan Plateau are the highest, while whereas the lowest values occur in the southeastern part of the Yunnan−Guizhou Plateau, the Sichuan Basin and most of the southern Yangtze River Basin. The possible causes for spatiotemporal variation of DHI across China were investigated by discussing cloud and aerosol loading.


Sign in / Sign up

Export Citation Format

Share Document