House-price Prediction Based on OLS Linear Regression and Random Forest

Author(s):  
YIGE WANG
Author(s):  
Akash Dagar and Shreya Kapoor

Machine learning has played a major role in recent years in image detection, spam recognition, speech commands, product recommendation and medical diagnosis. Current machine learning algorithms help enhance security alerts, ensure public safety and improve medical care. With increasing urbanization, demand for renting and purchasing houses has grown, so an effective way to estimate house prices accurately is needed. This work compares three machine learning algorithms for house-price prediction: Multivariable Linear Regression, Decision Tree Regression and Random Forest Regression. Of the three, Multivariable Linear Regression showed the highest accuracy and the lowest error.
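As a rough illustration of the multivariable linear regression named in this abstract, the sketch below fits ordinary least squares by solving the normal equations in plain Python. The toy data, the two features (floor area and room count) and all helper names are assumptions made for illustration; none of them come from the study.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, so row operations act on both sides at once.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ols(features, targets):
    """Return [intercept, coef_1, ..., coef_k] minimizing squared error."""
    rows = [[1.0] + list(f) for f in features]  # prepend intercept column
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict(coefs, feature_row):
    return coefs[0] + sum(c * v for c, v in zip(coefs[1:], feature_row))


# Hypothetical toy data: (floor area, room count) -> price.
# Prices are generated as 50 + 2*area + 10*rooms, so the fit should
# recover approximately those coefficients.
houses = [(50, 2), (80, 3), (120, 4), (65, 3), (100, 2)]
prices = [50 + 2 * a + 10 * r for a, r in houses]
coefs = fit_ols(houses, prices)
```

Real housing data is noisy, so the fitted coefficients would not match any generating rule exactly, and a fair comparison with tree-based models would use held-out data.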


The New York City Taxi & Limousine Commission’s (NYC TLC) yellow cabs face increasing competition from app-based car services such as Ola, Uber, Didi, Lyft and Grab, which are rapidly eroding their revenue and market share. Research work: In response, the study proposes three things: profitability profiling of taxi trips to identify the key aspects that generate more revenue; visualization of trip departure and arrival counts by location and time of day, to keep demand and supply in balance; and a dynamic price prediction model that balances margins and conversion rates. Methodology/Techniques used: The NYC TLC yellow taxi trip data is analysed following the cross-industry standard process for data mining (CRISP-DM). First, the taxi trips are grouped into two profitability segments according to fare amount, trip duration and trip distance using K-means clustering. Second, spatiotemporal analysis is carried out to assess the demand for taxi trips at various locations and times of day. Third, multiple linear regression, decision tree and random forest models are applied to dynamic price prediction. The findings of the study are as follows: highly profitable segments are characterized by airport pickup and drop-off trips; trip arrivals at airports outnumber departures from airports at any time of day; and drivers making only a few airport trips can earn more revenue than drivers making a larger number of local trips. Compared with multiple linear regression and decision tree, the random forest regression model is the most reliable for dynamic price prediction, with an accuracy of 91%.
Application of research work: The practical implication of the study is the deployment of a dynamic pricing model that can increase the revenue of NYC TLC cabs while balancing margins and conversion rates.
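The segmentation step described above (two profitability segments from fare amount, trip duration and trip distance) can be sketched with a small K-means routine. This is a minimal pure-Python version of Lloyd's algorithm under stated assumptions: the trip values are invented, the initial centroids are picked by hand, and the features are left unscaled, whereas a real analysis would normally standardize them first.

```python
def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_point(points):
    # Component-wise mean of a list of equal-length tuples.
    return tuple(sum(vals) / len(points) for vals in zip(*points))


def k_means(points, centroids, max_iter=100):
    """Lloyd's algorithm: alternate assignment and centroid update."""
    centroids = list(centroids)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(len(centroids)),
                key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so we have converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = mean_point(members)
    return labels, centroids


# Hypothetical trips: (fare amount, duration in minutes, distance in miles).
trips = [
    (8.0, 10.0, 1.5), (9.5, 12.0, 2.0), (7.0, 8.0, 1.2),        # short local trips
    (52.0, 45.0, 17.0), (60.0, 50.0, 19.5), (55.0, 40.0, 18.0),  # long trips
]
labels, centroids = k_means(trips, centroids=[trips[0], trips[3]])
```

With these well-separated toy values, the three short trips fall in one segment and the three long trips in the other.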


Sensors ◽  
2021 ◽  
Vol 21 (1) ◽  
pp. 256
Author(s):  
Pengfei Han ◽  
Han Mei ◽  
Di Liu ◽  
Ning Zeng ◽  
Xiao Tang ◽  
...  

Pollutant gases such as CO, NO2, O3, and SO2 affect human health, and low-cost sensors are an important complement to regulatory-grade instruments in pollutant monitoring. Previous studies focused on one or several species, while comprehensive assessments of multiple sensors remain limited. We conducted a 12-month field evaluation of four Alphasense sensors in Beijing and used single linear regression (SLR), multiple linear regression (MLR), random forest regressor (RFR), and long short-term memory (LSTM) neural network methods to calibrate and validate the measurements against nearby reference measurements from national monitoring stations. In terms of the coefficient of determination (R2) and root mean square error (RMSE), sensor performance ranked CO > O3 > NO2 > SO2. Accounting for temperature and relative humidity in the MLR did not increase the R2 compared with the SLR (R2 remained at approximately 0.6 for O3 and 0.4 for NO2). However, the RFR and LSTM models significantly improved the O3, NO2, and SO2 performances, with the R2 increasing from 0.3–0.5 to >0.7 for O3 and NO2, and the RMSE decreasing from 20.4 to 13.2 ppb for NO2. The SLR showed relatively larger biases, while the LSTMs maintained a mean relative bias close to zero (e.g., <5% for O3 and NO2), indicating that these sensors combined with the LSTMs are suitable for hot spot detection. We highlight that the LSTM performs better than the random forest and linear methods. This study assessed four electrochemical air quality sensors and different calibration models, and the methodology and results can benefit assessments of other low-cost sensors.
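To make the two evaluation metrics concrete, the sketch below calibrates raw sensor readings against reference values with a single linear regression and then computes R2 and RMSE. The readings are invented and the function names are hypothetical. For brevity the metrics are computed on the same data used for fitting, whereas a validation like the one in the abstract would use separate data.

```python
import math


def fit_simple_linear(x, y):
    """Least-squares slope and intercept for y ~ slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def rmse(y_true, y_pred):
    """Root mean square error, in the units of the measurements."""
    return math.sqrt(
        sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    )


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical paired readings in ppb: low-cost sensor vs. reference station.
raw = [10.0, 20.0, 30.0, 40.0, 50.0]
reference = [12.0, 19.0, 33.0, 41.0, 50.0]

slope, intercept = fit_simple_linear(raw, reference)
calibrated = [slope * r + intercept for r in raw]

rmse_raw = rmse(reference, raw)
rmse_calibrated = rmse(reference, calibrated)
r2_calibrated = r_squared(reference, calibrated)
```

On the fitting data, the calibrated RMSE cannot exceed the raw RMSE, because the uncalibrated reading is itself one of the linear mappings that least squares chooses among.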


2022 ◽  
Vol 14 (2) ◽  
pp. 279
Author(s):  
Qiong Wu ◽  
Zhaoyi Li ◽  
Changbao Yang ◽  
Hongqing Li ◽  
Liwei Gong ◽  
...  

Urbanization processes greatly change urban landscape patterns and the urban thermal environment. Significant multi-scale correlation exists between the land surface temperature (LST) and landscape pattern. Compared with traditional linear regression methods, the regression model based on random forest has the advantages of higher accuracy and better learning ability, and can remove the linear correlation between regression features. Taking Beijing’s metropolitan area as an example, this paper conducted multi-scale relationship analysis between 3D landscape patterns and LST using the Pearson Correlation Coefficient (PCC), Multiple Linear Regression and Random Forest Regression (RFR). The results indicated that LST was relatively high in the central area of Beijing and decreased from the center to the surrounding areas. The interpretation effect of 3D landscape metrics on LST was more obvious than that of the 2D landscape metrics, and 3D landscape diversity and evenness played more important roles than the other metrics in the change of LST. A multi-scale relationship between LST and the landscape pattern was found within the fourth ring road of Beijing: the effect of extent change on the landscape pattern is greater than that of grain size change, and the interpretation effect and correlation of landscape metrics on LST increase with the rectangle size. Impervious surfaces significantly increased the LST, and impervious surfaces located in low building areas were more likely to increase LST than those located in tall building areas. It seems that increasing the distance between buildings to improve the rate of energy exchange between urban and rural areas can effectively decrease LST. Vegetation and water can effectively reduce LST, and large, clustered and irregularly shaped patches have a better effect on land surface cooling than small and discrete patches. 
The Coefficients of Rectangle Variation (CORV) power function fitting results of landscape metrics showed that the optimal rectangle size for studying the relationship between the 3D landscape pattern and LST is about 700 m. Our study is useful for future urban planning and provides references to mitigate the daytime urban heat island (UHI) effect.
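The Pearson Correlation Coefficient used in this analysis can be computed directly from its definition. The sketch below is a minimal version; the impervious-surface fractions and LST values are invented for illustration and are not from the study.

```python
import math


def pearson(x, y):
    """Pearson correlation: covariance divided by the product of spreads."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    spread_x = math.sqrt(sum((xi - mean_x) ** 2 for xi in x))
    spread_y = math.sqrt(sum((yi - mean_y) ** 2 for yi in y))
    return cov / (spread_x * spread_y)


# Hypothetical grid cells: impervious-surface fraction and LST in deg C.
impervious_fraction = [0.1, 0.3, 0.5, 0.7, 0.9]
lst_celsius = [28.0, 30.5, 31.0, 33.5, 35.0]

r = pearson(impervious_fraction, lst_celsius)
```

A value of r near +1 would indicate that LST rises almost linearly with the metric, and a value near -1 would indicate the opposite trend, as might be expected for vegetation or water cover.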

