Comparison of Stochastic and Machine Learning Methods for Multi-Step Ahead Forecasting of Hydrological Processes

Author(s):  
Georgia A. Papacharalampous ◽  
Hristos Tyralis ◽  
Demetris Koutsoyiannis

We perform an extensive comparison between 11 stochastic and 9 machine learning methods regarding their multi-step ahead forecasting properties by conducting 12 large-scale computational experiments. Each of these experiments uses 2 000 time series generated by linear stationary stochastic processes. We conduct each simulation experiment twice; the first time using time series of 110 values and the second time using time series of 310 values. Additionally, we conduct 92 real-world case studies using mean monthly time series of streamflow and particularly focus on one of them to reinforce the findings and highlight important facts. We quantify the performance of the methods using 18 metrics. The results indicate that the machine learning methods do not differ dramatically from the stochastic ones, while none of the methods under comparison is uniformly better or worse than the rest. However, there are methods that are regularly better or worse than others according to specific metrics.

Author(s):  
Georgia Papacharalampous ◽  
Hristos Tyralis ◽  
Demetris Koutsoyiannis

Research within the field of hydrology often focuses on comparing stochastic to machine learning (ML) forecasting methods. The comparisons performed are all based on case studies, while an extensive study aiming to provide generalized results on the subject is missing. Herein, we compare 11 stochastic and 9 ML methods regarding their multi-step ahead forecasting properties by conducting 12 large-scale computational experiments based on simulations. Each of these experiments uses 2 000 time series generated by linear stationary stochastic processes. We conduct each simulation experiment twice; the first time using time series of 100 values and the second time using time series of 300 values. Additionally, we conduct a real-world experiment using 405 mean annual river discharge time series of 100 values. We quantify the performance of the methods using 18 metrics. The results indicate that stochastic and ML methods perform equally well.
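The kind of simulation experiment described above can be illustrated with a minimal Python sketch. It is not the authors' code: the AR(1) generating process, the two forecasting methods (a naive benchmark and a fitted AR(1) model), the 10-step holdout, the number of series, and the use of RMSE as the single metric are all assumptions chosen for brevity. Only the series length of 100 values is taken from the abstract.

```python
import math
import random


def simulate_ar1(n, phi, rng):
    """Simulate n values of a zero-mean stationary AR(1) process (|phi| < 1)."""
    # Draw the first value from the stationary distribution N(0, 1 / (1 - phi^2)).
    x = [rng.gauss(0.0, 1.0 / math.sqrt(1.0 - phi * phi))]
    for _ in range(n - 1):
        x.append(phi * x[-1] + rng.gauss(0.0, 1.0))
    return x


def fit_ar1(train):
    """Estimate the mean and the lag-1 autocorrelation from the fitting set."""
    mean = sum(train) / len(train)
    dev = [v - mean for v in train]
    phi_hat = sum(a * b for a, b in zip(dev[:-1], dev[1:])) / sum(d * d for d in dev)
    return mean, phi_hat


def forecast_ar1(train, horizon):
    """Multi-step ahead AR(1) forecast: mean + phi^h * (last value - mean)."""
    mean, phi_hat = fit_ar1(train)
    return [mean + phi_hat ** h * (train[-1] - mean) for h in range(1, horizon + 1)]


def forecast_naive(train, horizon):
    """Naive benchmark: repeat the last observed value over the whole horizon."""
    return [train[-1]] * horizon


def rmse(forecast, actual):
    """Root mean square error between a forecast and the held-out values."""
    return math.sqrt(sum((f - a) ** 2 for f, a in zip(forecast, actual)) / len(actual))


def run_experiment(n_series=200, length=100, horizon=10, phi=0.7, seed=1):
    """Average RMSE of each method over n_series simulated series (assumed setup)."""
    rng = random.Random(seed)
    totals = {"naive": 0.0, "ar1": 0.0}
    for _ in range(n_series):
        series = simulate_ar1(length, phi, rng)
        # Hold out the last `horizon` values as the test set.
        train, test = series[:-horizon], series[-horizon:]
        totals["naive"] += rmse(forecast_naive(train, horizon), test)
        totals["ar1"] += rmse(forecast_ar1(train, horizon), test)
    return {name: total / n_series for name, total in totals.items()}


if __name__ == "__main__":
    print(run_experiment())
```

Calling `run_experiment(length=300)` repeats the same comparison on the longer series length mentioned in the abstract; further methods or metrics would be added as extra entries alongside `forecast_naive` and `rmse`.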


2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Alan Brnabic ◽  
Lisa M. Hess

Background: Machine learning is a broad term encompassing a number of methods that allow the investigator to learn from the data. These methods may permit large real-world databases to be more rapidly translated to applications to inform patient-provider decision making. Methods: This systematic literature review was conducted to identify published observational research that employed machine learning to inform decision making at the patient-provider level. The search strategy was implemented and studies meeting eligibility criteria were evaluated by two independent reviewers. Relevant data related to study design, statistical methods, and strengths and limitations were identified; study quality was assessed using a modified version of the Luo checklist. Results: A total of 34 publications from January 2014 to September 2020 were identified and evaluated for this review. There were diverse methods, statistical packages and approaches used across identified studies. The most common methods included decision tree and random forest approaches. Most studies applied internal validation but only two conducted external validation. Most studies utilized one algorithm, and only eight studies applied multiple machine learning algorithms to the data. Seven items on the Luo checklist failed to be met by more than 50% of published studies. Conclusions: A wide variety of approaches, algorithms, statistical software, and validation strategies were employed in the application of machine learning methods to inform patient-provider decision making. There is a need to ensure that multiple machine learning approaches are used, that the model selection strategy is clearly defined, and that both internal and external validation are conducted, so that decisions for patient care are made with the highest quality evidence. Future work should routinely employ ensemble methods incorporating multiple machine learning algorithms.


2021 ◽  
Vol 13 (5) ◽  
pp. 974
Author(s):  
Lorena Alves Santos ◽  
Karine Ferreira ◽  
Michelle Picoli ◽  
Gilberto Camara ◽  
Raul Zurita-Milla ◽  
...  

The use of satellite image time series analysis and machine learning methods brings new opportunities and challenges for land use and cover change (LUCC) mapping over large areas. One of these challenges is the need for samples that properly represent the high variability of land use and cover classes over large areas to train supervised machine learning methods and to produce accurate LUCC maps. This paper addresses this challenge and presents a method to identify spatiotemporal patterns in land use and cover samples to infer subclasses through the phenological and spectral information provided by satellite image time series. The proposed method uses self-organizing maps (SOMs) to reduce the data dimensionality, creating primary clusters. From these primary clusters, it uses hierarchical clustering to create subclusters that recognize intra-class variability intrinsic to different regions and periods, mainly in large areas and multiple years. To show how the method works, we use MODIS image time series associated with samples of cropland and pasture classes over the Cerrado biome in Brazil. The results show that the proposed method is suitable for identifying spatiotemporal patterns in land use and cover samples that can be used to infer subclasses, mainly for crop types.
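The two-stage idea in the abstract above (a SOM producing primary clusters, followed by hierarchical clustering into subclusters) can be sketched in plain Python. This is one possible reading under stated assumptions, not the authors' implementation: the rectangular grid, the Gaussian neighbourhood, the linear decay schedules, the average-linkage criterion on squared Euclidean distance, and all function names (`train_som`, `agglomerate`, `infer_subclasses`) are hypothetical choices made for illustration.

```python
import math
import random


def sqdist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_som(samples, rows, cols, epochs=30, seed=0):
    """Train a small rectangular SOM; returns {(row, col): weight vector}."""
    rng = random.Random(seed)
    grid = [(r, c) for r in range(rows) for c in range(cols)]
    # Initialise each neuron with a copy of a randomly chosen sample.
    weights = {node: list(rng.choice(samples)) for node in grid}
    steps = epochs * len(samples)
    for step in range(steps):
        frac = 1.0 - step / steps                      # decays from 1 towards 0
        rate = 0.5 * frac + 0.01                       # learning rate (assumed schedule)
        radius = max(rows, cols) / 2.0 * frac + 0.5    # neighbourhood width
        x = rng.choice(samples)
        # Best matching unit: the neuron whose weights are closest to the sample.
        bmu = min(grid, key=lambda node: sqdist(weights[node], x))
        for node in grid:
            grid_d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
            pull = rate * math.exp(-grid_d2 / (2.0 * radius ** 2))
            w = weights[node]
            for i in range(len(w)):
                w[i] += pull * (x[i] - w[i])
    return weights


def agglomerate(vectors, n_clusters):
    """Average-linkage agglomerative clustering; returns lists of vector indices."""
    clusters = [[i] for i in range(len(vectors))]

    def linkage(a, b):
        # Mean squared distance over all cross-cluster pairs.
        return sum(sqdist(vectors[i], vectors[j]) for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > n_clusters:
        pairs = [(p, q) for p in range(len(clusters)) for q in range(p + 1, len(clusters))]
        p, q = min(pairs, key=lambda pq: linkage(clusters[pq[0]], clusters[pq[1]]))
        merged = clusters.pop(q)       # q > p, so index p is unaffected
        clusters[p].extend(merged)
    return clusters


def infer_subclasses(samples, rows=3, cols=3, n_sub=2, seed=0):
    """Label each sample with the subcluster of its best matching SOM neuron."""
    weights = train_som(samples, rows, cols, seed=seed)
    nodes = list(weights)
    groups = agglomerate([weights[n] for n in nodes], n_sub)
    node_label = {nodes[i]: g for g, members in enumerate(groups) for i in members}
    return [node_label[min(nodes, key=lambda n: sqdist(weights[n], s))] for s in samples]
```

Here each sample would be one time series (for example, a vector of vegetation index values over a season) belonging to a single land use class, and `infer_subclasses` returns one subcluster label per sample. Applying the hierarchical step to the neuron weight vectors rather than to the raw samples keeps the pairwise distance computation small, which is the main reason for the two-stage design in this sketch.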


Water ◽  
2020 ◽  
Vol 12 (5) ◽  
pp. 1342 ◽  
Author(s):  
Yong Fan ◽  
Litang Hu ◽  
Hongliang Wang ◽  
Xin Liu

Pumping tests are a very important means of investigating aquifer properties; however, interpreting the data using common analytical solutions becomes invalid in complex aquifer systems. The paper aims to explore the potential of machine learning methods in retrieving pumping test information at a field site in the Democratic Republic of Congo. A newly planned mining site with a pumping test of three pumping wells and 28 observation wells over one month was chosen to analyze the significance of machine learning methods in pumping test analysis. Widely used machine learning methods, including correlation, cluster, and time-series analysis, artificial neural network (ANN), support vector regression (SVR), random forest (RF), and linear regression, are all used in this study. Correlation and cluster analyses among wells provide visual pictures of possible hydraulic connections. The pathway with the best permeability ranges from a depth of 250 m to 350 m. Time-series analysis perfectly captured changes of drawdowns within the three pumping wells. The RF method is found to have higher accuracy and lower sensitivity to model parameters than the ANN and SVR methods. The coupling of the linear regression model and analytical solutions is applied to estimate hydraulic conductivities. The results show that ML methods can significantly and effectively improve our understanding of pumping tests by revealing inherent information hidden in those tests.
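The correlation analysis among wells mentioned above can be sketched as follows. This is only an illustration under assumptions: the abstract does not say which correlation measure was used, so Pearson correlation is assumed, the 0.9 threshold is arbitrary, and the well names and drawdown values in the usage note are made-up placeholders rather than data from the study.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def correlation_pairs(drawdowns):
    """Return {(well_a, well_b): r} for every unordered pair of wells."""
    names = sorted(drawdowns)
    return {
        (a, b): pearson(drawdowns[a], drawdowns[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }


def possibly_connected(drawdowns, threshold=0.9):
    """Pairs whose drawdown series correlate at or above an assumed threshold."""
    return [pair for pair, r in correlation_pairs(drawdowns).items() if r >= threshold]
```

With placeholder input such as `{"OW1": [0.0, 0.5, 1.0, 1.5, 2.0], "OW2": [0.1, 1.1, 2.1, 3.1, 4.1], "OW3": [0.3, 0.2, 0.4, 0.3, 0.2]}`, `possibly_connected` flags only the pair whose drawdowns rise together. A high correlation is at most a hint of a hydraulic connection and would still need to be checked against the site geology.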


2021 ◽  
Author(s):  
Dhairya Vyas

In machine learning, most data can be grouped into four categories: numerical data, categorical data, time-series data, and text. We use different classifiers for different data properties, drawn from supervised, unsupervised, and reinforcement learning. Each category has its own classifiers; we have tested almost all machine learning methods and compare them.

