Effects of numbers of observations and predictors for various model types on the performance of forest inventory with airborne laser scanning
Nonparametric models are popular in the area-based approach (ABA) to forest inventory using airborne laser scanning. It is unclear, however, how many predictors and how many observations different modeling approaches need to provide accurate predictions without overfitting. This work aims to determine these limits for six approaches: ordinary least squares regression (OLS), generalized additive models (GAM), least absolute shrinkage and selection operator (LASSO), random forest (RF), support vector machine (SVM), and Gaussian process regression (GPR). We modeled timber volume (m³ ha⁻¹) using ABA with 2–39 predictors and 20–500 training plots. OLS, GAM, LASSO, and SVM overfitted as the number of predictors approached the number of training plots. They required ≥15 plots per predictor to provide accurate predictions (RMSE ≤30%). GAM required ≥250 plots regardless of the number of predictors. The number of predictors hardly affected RF and GPR, but they required ≥200 and ≥250 training plots, respectively, to ensure accurate predictions. RF did not overfit under any circumstances, whereas GPR overfitted even with 500 training plots. Overall, increasing the number of predictors up to 39 did not necessarily result in overfitting and, for most models, improved accuracy as long as the training dataset was sufficiently large (≥250 plots).
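
The overfitting pattern reported for OLS can be illustrated with a minimal sketch on synthetic data. This is not the study's data or code: the sample sizes, the simulated predictors, the noise level, and the helper names `relative_rmse` and `ols_fit_predict` are all assumptions made for illustration. The sketch fits OLS to 20 simulated training plots with an increasing number of predictors and reports RMSE as a percentage of the mean observed value on the training plots and on independent test plots.

```python
import numpy as np

def relative_rmse(y_true, y_pred):
    """RMSE expressed as a percentage of the mean observed value."""
    return 100.0 * np.sqrt(np.mean((y_true - y_pred) ** 2)) / np.mean(y_true)

def ols_fit_predict(X_train, y_train, X_test):
    """Fit OLS with an intercept by least squares; return train and test predictions."""
    A_train = np.column_stack([np.ones(len(X_train)), X_train])
    A_test = np.column_stack([np.ones(len(X_test)), X_test])
    coef, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)
    return A_train @ coef, A_test @ coef

# Synthetic stand-in for plot-level data (assumption, not real ALS metrics):
# only the first two predictors carry signal, the rest are pure noise.
rng = np.random.default_rng(0)
n_train, n_test, n_pred_max = 20, 1000, 18
X = rng.normal(size=(n_train + n_test, n_pred_max))
y = 200 + 40 * X[:, 0] + 20 * X[:, 1] + rng.normal(scale=30, size=n_train + n_test)

X_tr, y_tr = X[:n_train], y[:n_train]
X_te, y_te = X[n_train:], y[n_train:]

results = {}  # number of predictors -> (train RMSE %, test RMSE %)
for p in (2, 6, 10, 14, 18):
    pred_tr, pred_te = ols_fit_predict(X_tr[:, :p], y_tr, X_te[:, :p])
    results[p] = (relative_rmse(y_tr, pred_tr), relative_rmse(y_te, pred_te))
    print(f"p={p:2d}  plots/predictor={n_train / p:5.1f}  "
          f"train RMSE={results[p][0]:6.1f}%  test RMSE={results[p][1]:6.1f}%")
```

Under these assumptions, the training error keeps shrinking as predictors are added, while the error on independent plots grows sharply once the number of predictors approaches the number of training plots. That widening gap is the signature of overfitting.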