Machine learning methods for short-term probability of default: A comparison of classification, regression and ranking methods

This study examines pricing factors in the short-term rental market. Airbnb, a platform for listing, finding, and renting accommodation around the world, was chosen as the object of study. As of early 2021, the company offered 7 million homes in more than 220 countries. Data science methods play a significant role in the company's success, and one of its key algorithms is the pricing algorithm. With the "Price Recommendations" feature, a homeowner can see which dates are likely to be booked at the current price and which are not, which helps in forming a competitive offer. The system calculates the recommended price from hundreds of parameters. Some of them are easy to recognize, but less obvious factors can also affect demand. The paper proposes an algorithm for identifying implicit pricing factors in the short-term rental market using machine learning methods. It includes: 1) data mining and data preparation; 2) building and analysis of linear regression models; 3) building and analysis of nonlinear regression models. The study was based on listings from the Airbnb site in Washington and New York and used scripts developed in Python. The following models were built and analyzed: simple linear regression, multiple linear regression, polynomial regression, decision trees, random forest, and boosting. The results showed that the most important factors are accommodates, cleaning_fee, room_type, and bedrooms. However, according to the model evaluation criteria, the models are not suitable for deployment: the linear models are of low quality, while the decision trees, random forest, and boosting models are overfitted. The results can still be used in business analysis.
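The abstract's conclusion rests on comparing model quality on training data against held-out data. The following is a minimal sketch of that kind of check, not the authors' scripts: it fits a simple linear regression of price on `accommodates` and reports the gap between training and test R². The data is synthetic, and the helper names (`r2_score`, `fit_simple_linear`, `generalization_gap`) as well as the price formula are assumptions made purely for illustration.

```python
import random


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def fit_simple_linear(x, y):
    """Ordinary least squares for one feature; returns (slope, intercept)."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def generalization_gap(train_r2, test_r2):
    """A large positive gap suggests overfitting; low values of both suggest underfitting."""
    return train_r2 - test_r2


# Synthetic stand-in for listing data (hypothetical, not the Airbnb dataset).
rng = random.Random(0)
accommodates = [rng.randint(1, 8) for _ in range(200)]
price = [40.0 + 25.0 * a + rng.gauss(0, 20) for a in accommodates]

# Simple hold-out split: first 150 rows for training, last 50 for testing.
x_train, x_test = accommodates[:150], accommodates[150:]
y_train, y_test = price[:150], price[150:]

slope, intercept = fit_simple_linear(x_train, y_train)
train_r2 = r2_score(y_train, [slope * x + intercept for x in x_train])
test_r2 = r2_score(y_test, [slope * x + intercept for x in x_test])
gap = generalization_gap(train_r2, test_r2)

print(f"train R2={train_r2:.3f} test R2={test_r2:.3f} gap={gap:.3f}")
```

The same comparison applies to the tree-based models: a training score near 1 paired with a much lower test score is the usual sign of the overfitting the abstract reports.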

Bootstrap Aggregating Approach to Short-Term Load Forecasting Using Meteorological Parameters for Demand Side Management in The North-Eastern Region of India

DOI: 10.21203/rs.3.rs-610295/v1

Year: 2021

Author(s): Dipu Sarkar, Taliakum AO, Sravan Kumar Gunturi

Keyword(s): Machine Learning, Power Plants, Meteorological Parameters, Load Forecasting, Data Availability, Production Costs, Eastern Region, Short Term, Learning Methods, Machine Learning Methods

Abstract: Electricity is an essential commodity that must be generated in response to demand. Hydroelectric power plants, fossil fuels, nuclear energy, and wind energy are just a few examples of energy sources that significantly impact production costs. Accurate load forecasting for a specific region would allow more efficient management, planning, and scheduling of low-cost generation units and would help ensure on-time energy delivery for full monetary benefit. Machine learning methods are becoming more effective on power grids as data availability increases. Ensemble learning models are hybrid algorithms that combine several machine learning methods into a single predictive model to reduce uncertainty and bias. In this study, several ensemble methods were implemented and tested for short-term electric load forecasting. The suggested method is trained on the past load and on the influential meteorological variables identified through correlation analysis. We used real-time load data from Nagaland's load dispatch centre in India and meteorological parameters of the Nagaland region for data analysis. The synthetic minority over-sampling technique for regression (SMOTE-R) is also employed to avoid data imbalance issues. The experimental results show that the Bagging methods outperform the other models with respect to mean squared error and mean absolute percentage error.
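Bootstrap aggregating and the two error measures named in the abstract can be illustrated with a short sketch. This is not the authors' implementation: the base learner here is a one-feature least-squares line chosen only for brevity, the temperature and load values are synthetic, and all function names (`fit_line`, `bagging_fit`, `bagging_predict`, `mse`, `mape`) are hypothetical.

```python
import random


def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mape(y_true, y_pred):
    """Mean absolute percentage error in percent; assumes no zero in y_true."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


def fit_line(x, y):
    """Least-squares line for one feature; returns (slope, intercept)."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:  # degenerate resample: fall back to predicting the mean
        return 0.0, mean_y
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def bagging_fit(x, y, n_estimators, rng):
    """Fit one base learner per bootstrap resample (drawn with replacement)."""
    n = len(x)
    models = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_line([x[i] for i in idx], [y[i] for i in idx]))
    return models


def bagging_predict(models, x_new):
    """Average the predictions of all base learners."""
    return [sum(s * xi + b for s, b in models) / len(models) for xi in x_new]


# Synthetic temperature (deg C) and load values; not the Nagaland data.
rng = random.Random(42)
temperature = [rng.gauss(25, 5) for _ in range(120)]
load = [100.0 + 2.0 * t + rng.gauss(0, 5) for t in temperature]

models = bagging_fit(temperature[:90], load[:90], n_estimators=25, rng=rng)
forecast = bagging_predict(models, temperature[90:])

print(f"MSE={mse(load[90:], forecast):.2f} MAPE={mape(load[90:], forecast):.2f}%")
```

Bagging pays off most with high-variance base learners such as decision trees. A linear base learner is used here only to keep the sketch dependency-free.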
