Parsimonious statistical learning models for low-flow estimation

2022 ◽  
Vol 26 (1) ◽  
pp. 129-148
Author(s):  
Johannes Laimighofer ◽  
Michael Melcher ◽  
Gregor Laaha

Abstract. Statistical learning methods offer a promising approach for low-flow regionalization. We examine seven statistical learning models (Lasso, linear and nonlinear model-based boosting, sparse partial least squares, principal component regression, random forest, and support vector regression) for the prediction of winter and summer low flow based on a hydrologically diverse dataset of 260 catchments in Austria. In order to produce sparse models, we adapt the recursive feature elimination for variable preselection and propose using three different variable ranking methods (conditional forest, Lasso, and linear model-based boosting) for each of the prediction models. Results are evaluated for the low-flow characteristic Q95 (Pr(Q>Q95)=0.95) standardized by catchment area using a repeated nested cross-validation scheme. We found a generally high prediction accuracy for winter (cross-validated R² of 0.66 to 0.7) and summer (cross-validated R² of 0.83 to 0.86). The models perform similarly to or slightly better than a top-kriging model that constitutes the current benchmark for the study area. The best-performing models are support vector regression (winter) and nonlinear model-based boosting (summer), but linear models exhibit similar prediction accuracy. The use of variable preselection can significantly reduce the complexity of all the models with only a small loss of performance. The so-obtained learning models are more parsimonious and thus easier to interpret and more robust when predicting at ungauged sites. A direct comparison of linear and nonlinear models reveals that nonlinear processes can be sufficiently captured by linear learning models, so there is no need to use more complex models or to add nonlinear effects. 
When performing low-flow regionalization in a seasonal climate, the temporal stratification into summer and winter low flows was shown to increase the predictive performance of all learning models, offering an alternative to catchment grouping that is recommended otherwise.
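
The definition Pr(Q>Q95)=0.95 used in the abstract above means that Q95 is the flow exceeded 95% of the time, that is, the 5th percentile of the flow record. The following is a minimal sketch of that calculation, not the authors' code. The function names, the linear-interpolation quantile convention, and the units (discharge in m³/s, area in km², result in l/s per km²) are all assumptions made for illustration.

```python
def q95(flows):
    """Flow exceeded 95% of the time, i.e. the 5th percentile of the record."""
    ordered = sorted(flows)
    if not ordered:
        raise ValueError("flow record is empty")
    # Linear interpolation between order statistics (one common convention;
    # the abstract does not say which quantile estimator was used).
    pos = 0.05 * (len(ordered) - 1)
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    frac = pos - lower
    return ordered[lower] + frac * (ordered[upper] - ordered[lower])


def specific_q95(flows_m3s, area_km2):
    """Q95 standardized by catchment area (assumed units: l/s per km^2)."""
    if area_km2 <= 0:
        raise ValueError("catchment area must be positive")
    return q95(flows_m3s) * 1000.0 / area_km2
```

Splitting the record into winter and summer subsets before calling `specific_q95` would give the seasonal low-flow values that the abstract treats as separate prediction targets.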

2021 ◽  
Author(s):  
Johannes Laimighofer ◽  
Michael Melcher ◽  
Gregor Laaha

Abstract. Statistical learning methods offer a promising approach for low flow regionalization. We examine seven statistical learning models (lasso, linear and non-linear model based boosting, sparse partial least squares, principal component regression, random forest, and support vector machine regression) for the prediction of winter and summer low flow based on a hydrologically diverse dataset of 260 catchments in Austria. In order to produce sparse models we adapt the recursive feature elimination for variable preselection and propose to use three different variable ranking methods (conditional forest, lasso and linear model based boosting) for each of the prediction models. Results are evaluated for the low flow characteristic Q95 (Pr(Q>Q95) = 0.95) standardized by catchment area using a repeated nested cross validation scheme. We found a generally high prediction accuracy for winter (cross-validated R² of 0.66 to 0.7) and summer (cross-validated R² of 0.83 to 0.86). The models perform similarly to or slightly better than a Top-kriging model that constitutes the current benchmark for the study area. The best performing models are support vector machine regression (winter) and non-linear model based boosting (summer), but linear models exhibit similar prediction accuracy. The use of variable preselection can significantly reduce the complexity of all models with only a small loss of performance. The so obtained learning models are more parsimonious, thus easier to interpret and more robust when predicting at ungauged sites. A direct comparison of linear and non-linear models reveals that non-linear relationships can be sufficiently captured by linear learning models, so there is no need to use more complex models or to add non-linear effects. 
When performing low flow regionalization in a seasonal climate, the temporal stratification into summer and winter low flows was shown to increase the predictive performance of all learning models, offering an alternative to catchment grouping that is recommended otherwise.


2020 ◽  
Vol 2020 ◽  
pp. 1-12
Author(s):  
Fu-Qing Cui ◽  
Wei Zhang ◽  
Zhi-Yun Liu ◽  
Wei Wang ◽  
Jian-bing Chen ◽  
...  

A comprehensive understanding of how soil thermal conductivity varies is a prerequisite for the design and construction of engineering works in permafrost regions. Compared with unfrozen soil, specimen preparation and experimental procedures for frozen soil thermal conductivity testing are more complex and challenging. In this work, because the thermal conductivity of unfrozen soil essentially reflects the soil's multiphase and porous structural characteristics, prediction models of frozen soil thermal conductivity using nonlinear regression and Support Vector Regression (SVR) methods have been developed. The thermal conductivity of multiple types of soil samples from the Qinghai-Tibet Engineering Corridor (QTEC) was tested by the transient plane source (TPS) method. Correlations between the thermal conductivity of unfrozen and frozen soil have been analyzed. Based on the measured unfrozen soil thermal conductivity, prediction models of frozen soil thermal conductivity for 7 typical soils in the QTEC are proposed. To further facilitate engineering applications, prediction models for two soil categories (coarse-grained and fine-grained soil) have also been proposed. The results demonstrate that, compared with the unsatisfactory prediction accuracy obtained using water content and dry density as the fitting parameters, the ternary fitting model has a higher thermal conductivity prediction accuracy for the 7 types of frozen soils (more than 98% of the soil specimens have a relative error within 20%). The SVR model further improves the prediction accuracy, with more than 98% of the soil specimens having a relative error within 15%. For the coarse-grained and fine-grained soil categories, the two models still have reliable prediction accuracy, with the coefficient of determination (R2) ranging from 0.8 to 0.91, which validates their applicability to small-sample soils. 
This study provides feasible prediction models for frozen soil thermal conductivity and guidelines for the thermal design and freeze-thaw damage prevention of engineering structures in cold regions.
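
The accuracy figures quoted in the abstract above (the share of specimens whose relative error falls within 20% or 15%) can be computed as in the following sketch. It is an illustration only: the function name and the definition of relative error as |predicted − measured| / |measured| are assumptions, since the abstract does not spell out the formula.

```python
def share_within_tolerance(measured, predicted, tolerance=0.20):
    """Fraction of specimens whose relative error is at most `tolerance`."""
    if len(measured) != len(predicted) or not measured:
        raise ValueError("inputs must be non-empty and of equal length")
    within = 0
    for m, p in zip(measured, predicted):
        if m == 0:
            raise ValueError("relative error is undefined for a zero measurement")
        # Assumed definition: absolute deviation relative to the measured value.
        if abs(p - m) / abs(m) <= tolerance:
            within += 1
    return within / len(measured)
```

Calling it with `tolerance=0.15` corresponds to the stricter threshold reported for the SVR model.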


2021 ◽  
Author(s):  
Dehe Xu ◽  
Qi Zhang ◽  
Yan Ding ◽  
De Zhang

Abstract. Drought forecasting can effectively reduce the risk of drought. We propose a hybrid model based on deep learning methods that integrates an autoregressive integrated moving average (ARIMA) model and a long short-term memory (LSTM) model to improve the accuracy of short-term drought prediction. Taking China as an example, this paper compares the prediction accuracy of six drought prediction models (ARIMA, support vector regression (SVR), LSTM, ARIMA-SVR, least square-SVR (LS-SVR) and ARIMA-LSTM) for the SPEI. The performance of all the models was compared using measures of persistence, such as the Nash-Sutcliffe efficiency (NSE). The results show that all three hybrid models (ARIMA-SVR, LS-SVR and ARIMA-LSTM) had higher prediction accuracy than the single models (ARIMA, SVR and LSTM) for a given lead time at different scales. The NSEs of the hybrid ARIMA-SVR, LS-SVR and ARIMA-LSTM models for the predicted SPEI1 are 0.043, 0.168 and 0.368, respectively, and the NSEs for SPEI24 are 0.781, 0.543 and 0.93, respectively. This finding indicates that when the lead time remains unchanged, the prediction accuracy of the hybrid ARIMA-SVR, LS-SVR and ARIMA-LSTM models for the SPEI at various scales gradually improves with increasing time scale, and the prediction accuracy of the model with a one-month lead time is higher than that of the model with a two-month lead time. In addition, the ARIMA-LSTM model has the highest prediction accuracy at the 6-, 12-, and 24-month scales, indicating that the model is more suitable for the forecasting of long-term drought in China.
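
For readers unfamiliar with the Nash-Sutcliffe efficiency reported above, the sketch below shows the standard definition, NSE = 1 − Σ(obs − pred)² / Σ(obs − mean(obs))². It is a generic illustration rather than the authors' evaluation code, and the function name is hypothetical.

```python
def nash_sutcliffe(observed, predicted):
    """Nash-Sutcliffe efficiency of predictions against observations."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    # Squared error of the model versus squared error of the mean-only baseline.
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("NSE is undefined when observations are constant")
    return 1.0 - ss_res / ss_tot
```

A value of 1 indicates a perfect fit, while 0 means the model does no better than always predicting the observed mean, which puts the low SPEI1 scores quoted above in context.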


2019 ◽  
Vol 9 (15) ◽  
pp. 2983 ◽  
Author(s):  
Jiao Liu ◽  
Guoyou Shi ◽  
Kaige Zhu

There are difficulties in obtaining accurate modeling of ship trajectories with traditional prediction methods. For example, neural networks are prone to falling into local optima and there are a small number of Automatic Identification System (AIS) information samples regarding target ships acquired in real time at sea. In order to improve the accuracy of ship trajectory predictions and solve these problems, a trajectory prediction model based on support vector regression (SVR) is proposed. Ship speed, course, time stamp, longitude and latitude from AIS data were selected as sample features and the wavelet threshold de-noising method was used to process the ship position data. The adaptive chaos differential evolution (ACDE) algorithm was used to optimize the internal model parameters to improve convergence speed and prediction accuracy. AIS sensor data corresponding to a certain section of the Tianjin Port ships were selected, on which SVR, Recurrent Neural Network (RNN) and Back Propagation (BP) neural network model trajectory prediction simulations were carried out. A comparison of the results shows that the trajectory prediction model based on ACDE-SVR has higher and more stable prediction accuracy, requires less time and is simple, feasible and efficient.


Processes ◽  
2021 ◽  
Vol 9 (7) ◽  
pp. 1166
Author(s):  
Bashir Musa ◽  
Nasser Yimen ◽  
Sani Isah Abba ◽  
Humphrey Hugh Adun ◽  
Mustafa Dagbasi

The prediction accuracy of support vector regression (SVR) is highly influenced by the kernel function. However, its performance suffers on large datasets, which could be attributed to the computational limitations of kernel learning. To tackle this problem, this paper combines SVR with the emerging Harris hawks optimization (HHO) and particle swarm optimization (PSO) algorithms to form two hybrid SVR algorithms, SVR-HHO and SVR-PSO. Both proposed algorithms and traditional SVR were applied to load forecasting in four different states of Nigeria. The correlation coefficient (R), coefficient of determination (R2), mean square error (MSE), root mean square error (RMSE), and mean absolute percentage error (MAPE) were used as indicators to evaluate the prediction accuracy of the algorithms. The results reveal an increase in performance for both SVR-HHO and SVR-PSO over traditional SVR. SVR-HHO has the highest R2 values of 0.9951, 0.8963, 0.9951, and 0.9313, the lowest MSE values of 0.0002, 0.0070, 0.0002, and 0.0080, and the lowest MAPE values of 0.1311, 0.1452, 0.0599, and 0.1817, respectively, for Kano, Abuja, Niger, and Lagos State. SVR-HHO also outperforms SVR-PSO in load forecasting skill in all the states. This paper also designed a hybrid renewable energy system (HRES) that consists of solar photovoltaic (PV) panels, wind turbines, and batteries. As inputs, the system used solar radiation, temperature, wind speed, and the load demands predicted by SVR-HHO in all the states. The system was optimized by using the PSO algorithm to obtain the optimal configuration of the HRES that will satisfy all constraints at the minimum cost.
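
The five indicators named in the abstract above can be computed as in the following sketch. It uses textbook definitions and is not taken from the paper: the function name is hypothetical, R2 is assumed to be 1 − SS_res/SS_tot (some studies report the square of R instead), and MAPE is assumed to be expressed in percent.

```python
import math


def regression_metrics(actual, predicted):
    """Return R, R2, MSE, RMSE and MAPE for paired actual/predicted values."""
    n = len(actual)
    if n == 0 or n != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    ss_pred = sum((p - mean_p) ** 2 for p in predicted)
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    if ss_tot == 0 or ss_pred == 0:
        raise ValueError("R and R2 are undefined for constant inputs")
    mse = ss_res / n
    return {
        "R": cov / math.sqrt(ss_tot * ss_pred),  # Pearson correlation
        "R2": 1.0 - ss_res / ss_tot,             # assumed definition
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        # Assumed to be in percent; requires non-zero actual values.
        "MAPE": 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / n,
    }
```

Note that R and R2 can disagree sharply: predictions that are perfectly correlated with the observations but badly scaled give R = 1 with a much lower R2, which is one reason studies report both.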


2018 ◽  
Vol 23 (2) ◽  
pp. 923-934 ◽  
Author(s):  
Bibhuti Bhusan Sahoo ◽  
Ramakar Jha ◽  
Anshuman Singh ◽  
Deepak Kumar

2018 ◽  
Vol 11 (1) ◽  
pp. 64 ◽  
Author(s):  
Kyoung-jae Kim ◽  
Kichun Lee ◽  
Hyunchul Ahn

Measuring and managing the financial sustainability of borrowers is crucial to financial institutions for their risk management. As a result, building an effective corporate financial distress prediction model has long been an important research topic. Recently, researchers have been working to improve the accuracy of financial distress prediction models by applying various business analytics approaches, including statistical and artificial intelligence methods. Among them, support vector machines (SVMs) are becoming popular. SVMs require only small training samples and have little possibility of overfitting if model parameters are properly tuned. Moreover, SVMs generally show high prediction accuracy since they can deal with complex nonlinear patterns. Despite these advantages, SVMs are often criticized because their architectural factors, such as the parameters of a kernel function and the subsets of appropriate features and instances, are determined by heuristics. In this study, we propose globally optimized SVMs, denoted by GOSVM, a novel hybrid SVM model designed to optimize feature selection, instance selection, and kernel parameters altogether. This study introduces a genetic algorithm (GA) in order to simultaneously optimize multiple heterogeneous design factors of SVMs. Our study applies the proposed model to a real-world case of predicting financial distress. Experiments show that the proposed model significantly improves the prediction accuracy of conventional SVMs.


2013 ◽  
Vol 25 (5) ◽  
pp. 445-455 ◽  
Author(s):  
Fang Zong ◽  
Jia Hongfei ◽  
Pan Xiang ◽  
Wu Yang

This paper presents a model system to predict the time allocation in commuters’ daily activity-travel patterns. The departure time and the arrival time are estimated with an Ordered Probit model, and Support Vector Regression is introduced for travel time and activity duration prediction. Applied in a real-world time allocation prediction experiment, the model system shows a satisfactory level of prediction accuracy. This study provides useful insights into commuters’ activity-travel time allocation decisions by identifying the important influences, and the results can readily be applied to a wide range of transportation practice, such as travel information systems, by providing reliable forecasts of variations in travel demand over time. By introducing Support Vector Regression, it also makes a methodological contribution by enhancing the prediction accuracy for travel time and activity duration.

