Suitability of satellite-based hydro-climate variables and machine learning for streamflow modeling at various scale watersheds

Oceanic and coastal ecosystems have undergone complex environmental changes in recent years, amid a context of climate change. These changes are also reflected in the dynamics of water-borne diseases as some of the causative agents of these illnesses are ubiquitous in the aquatic environment and their survival rates are impacted by changes in climatic conditions. Previous studies have established strong relationships between essential climate variables and the coastal distribution and seasonal dynamics of the bacteria Vibrio cholerae, pathogenic types of which are responsible for human cholera disease. In this study we provide a novel exploration of the potential of a machine learning approach to forecast environmental cholera risk in coastal India, home to more than 200 million inhabitants, utilising atmospheric, terrestrial and oceanic satellite-derived essential climate variables. A Random Forest classifier model is developed, trained and tested on a cholera outbreak dataset over the period 2010–2018 for districts along coastal India. The random forest classifier model has an Accuracy of 0.99, an F1 Score of 0.942 and a Sensitivity score of 0.895, meaning that 89.5% of outbreaks are correctly identified. Spatio-temporal patterns emerged in terms of the model’s performance based on seasons and coastal locations. Further analysis of the specific contribution of each Essential Climate Variable to the model outputs shows that chlorophyll-a concentration, sea surface salinity and land surface temperature are the strongest predictors of the cholera outbreaks in the dataset used. The study reveals promising potential of the use of random forest classifiers and remotely-sensed essential climate variables for the development of environmental cholera-risk applications. 
Further exploration of the present random forest model and associated essential climate variables is encouraged on cholera surveillance datasets in other coastal areas affected by the disease to determine the model’s transferability potential and applicative value for cholera forecasting systems.
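
The three scores quoted in the abstract above all derive from a confusion matrix. The following is a minimal sketch of how they relate, not the authors' code; the function name and the counts are hypothetical and chosen only for illustration. It shows why accuracy can stay high on a dataset where outbreaks are rare while sensitivity is noticeably lower.

```python
def classification_scores(tp, fp, fn, tn):
    """Return accuracy, sensitivity, precision and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total       # share of all predictions that were right
    sensitivity = tp / (tp + fn)       # share of real outbreaks that were flagged
    precision = tp / (tp + fp)         # share of flagged outbreaks that were real
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "precision": precision,
        "f1": f1,
    }


# Hypothetical counts (NOT taken from the study): 19 outbreaks among 200 cases.
scores = classification_scores(tp=17, fp=1, fn=2, tn=180)
for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```

Because non-outbreak cases dominate the hypothetical sample, the two missed outbreaks barely move accuracy but lower sensitivity to 17/19.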

Random forest and extreme gradient boosting algorithms for streamflow modeling using vessel features, and tree-rings

DOI: 10.21203/rs.3.rs-303081/v1, 2021

Author(s): Hossein Sahour, Vahid Gholami, Javad Torkman, Mehdi Vazifedan, Sirwe Saeedi

Keyword(s): Machine Learning, Random Forest, Tree Rings, Test Site, Machine Learning Algorithms, Coefficient Of Determination, Gradient Boosting, Growing Seasons, Extreme Gradient Boosting, Streamflow Modeling

Abstract: Monitoring temporal variation of streamflow is necessary for many water resources management plans, yet such practices are constrained by the absence or paucity of data in many rivers around the world. Using a permanent river in the north of Iran as a test site, a machine learning framework was proposed to model the streamflow data in the three periods of growing seasons based on tree-rings and vessel features of the Zelkova carpinifolia species. First, full-disc samples were taken from 30 trees near the river, and the samples went through preprocessing, cross-dating, standardization, and time series analysis. Two machine learning algorithms, namely random forest (RF) and extreme gradient boosting (XGB), were used to model the relationships between dendrochronology variables (tree-rings and vessel features in the three periods of growing seasons) and the corresponding streamflow rates. The performance of each model was evaluated using three statistical coefficients: the coefficient of determination (R-squared), the Nash-Sutcliffe efficiency (NSE), and the normalized root-mean-square error (NRMSE). Findings demonstrate that consideration should be given to the XGB model in streamflow modeling given its apparent enhanced performance (R-squared: 0.87; NSE: 0.81; and NRMSE: 0.43) over the RF model (R-squared: 0.82; NSE: 0.71; and NRMSE: 0.52). Further, the results showed that the models perform better in modeling the normal and low flows compared to extremely high flows. Finally, the tested models were used to reconstruct the temporal streamflow during the past decades (1970–1981).
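
The three evaluation coefficients named in this abstract can be computed as in the sketch below. This is an illustration under stated assumptions, not the authors' implementation: R-squared is taken as the squared Pearson correlation, and because the abstract does not say how the RMSE is normalized, dividing by the observed range is assumed here as one common convention. The function names and flow values are hypothetical.

```python
import math


def r_squared(obs, sim):
    """Squared Pearson correlation between observed and simulated values."""
    n = len(obs)
    mean_o, mean_s = sum(obs) / n, sum(sim) / n
    cov = sum((o - mean_o) * (s - mean_s) for o, s in zip(obs, sim))
    var_o = sum((o - mean_o) ** 2 for o in obs)
    var_s = sum((s - mean_s) ** 2 for s in sim)
    return cov ** 2 / (var_o * var_s)


def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 is no better than the observed mean."""
    mean_o = sum(obs) / len(obs)
    sse = sum((o - s) ** 2 for o, s in zip(obs, sim))
    return 1 - sse / sum((o - mean_o) ** 2 for o in obs)


def nrmse(obs, sim):
    """RMSE divided by the observed range (assumed normalization, see lead-in)."""
    rmse = math.sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)) / len(obs))
    return rmse / (max(obs) - min(obs))


# Hypothetical streamflow values, for illustration only.
observed = [2.0, 3.5, 5.0, 4.0, 8.0]
simulated = [2.5, 3.0, 4.5, 4.5, 6.5]
print(r_squared(observed, simulated), nse(observed, simulated), nrmse(observed, simulated))
```

In this made-up series the largest residual falls on the highest flow, so the peak contributes most of the squared error that lowers NSE and raises NRMSE.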

Modeling the Relationship between Rice Yield and Climate Variables Using Statistical and Machine Learning Techniques

Journal of Mathematics, DOI: 10.1155/2021/6646126, 2021, Vol 2021, pp. 1-9

Author(s): Lasini Wickramasinghe, Rukmal Weliwatta, Piyal Ekanayake, Jeevani Jayasinghe

Keyword(s): Machine Learning, Mean Squared Error, Rice Yield, Machine Learning Techniques, Support Vector, Climatic Data, Climate Variables, Squared Error, Learning Techniques, The Relationship

This paper presents the application of multiple statistical methods and machine learning techniques to model the relationship between rice yield and climate variables in a major region of Sri Lanka that contributes significantly to the country’s paddy harvest. Rainfall, temperature (minimum and maximum), evaporation, average wind speed (morning and evening), and sunshine hours are the climatic factors considered for modeling. Rice harvest and yield data over the last three decades and monthly climatic data were used to develop the prediction model by applying artificial neural networks (ANNs), support vector machine regression (SVMR), multiple linear regression (MLR), Gaussian process regression (GPR), power regression (PR), and robust regression (RR). The performance of each model was assessed in terms of the mean squared error (MSE), correlation coefficient (R), mean absolute percentage error (MAPE), root mean squared error ratio (RSR), BIAS value, and the Nash number, and the GPR-based model was found to be the most accurate among them. Climate data collected until early 2019 (Maha season of year 2018) were used to develop the model, and an independent validation was performed by applying data from the Yala season of year 2019. The developed model can be used to forecast future rice yield with very high accuracy.
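
Three of the error measures listed in this abstract are sketched below as a rough guide to what they capture. The definitions are the commonly used ones and are an assumption here, since the abstract does not give formulas: RSR is taken as the RMSE divided by the standard deviation of the observations. The function names and yield figures are hypothetical.

```python
import math


def mse(obs, pred):
    """Mean squared error."""
    return sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs)


def mape(obs, pred):
    """Mean absolute percentage error, in percent; observations must be non-zero."""
    return 100 * sum(abs((o - p) / o) for o, p in zip(obs, pred)) / len(obs)


def rsr(obs, pred):
    """RMSE divided by the standard deviation of the observations (assumed definition)."""
    mean_o = sum(obs) / len(obs)
    sse = sum((o - p) ** 2 for o, p in zip(obs, pred))
    return math.sqrt(sse) / math.sqrt(sum((o - mean_o) ** 2 for o in obs))


# Hypothetical seasonal yields, for illustration only.
observed_yield = [4.0, 5.0, 6.0]
predicted_yield = [4.4, 5.0, 5.4]
print(mse(observed_yield, predicted_yield))
print(mape(observed_yield, predicted_yield))
print(rsr(observed_yield, predicted_yield))
```

MSE is in squared yield units, MAPE is scale-free, and RSR expresses the error relative to the natural variability of the observations, which is why studies often report several of these together.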

Application of ensemble machine learning model in downscaling and projecting climate variables over different climate regions in Iran

Environmental Science and Pollution Research, DOI: 10.1007/s11356-021-16964-y, 2021

Author(s): Seyed Babak Haji Seyed Asadollah, Ahmad Sharafati, Shamsuddin Shahid

Keyword(s): Machine Learning, Learning Model, Climate Variables, Ensemble Machine Learning, Machine Learning Model

Machine-learning based reconstructions of primary and secondary climate variables from North American and European fossil pollen data

Scientific Reports, DOI: 10.1038/s41598-019-52293-4, 2019, Vol 9 (1), Cited By ~ 4

Author(s): J. Sakari Salonen, Mikko Korpela, John W. Williams, Miska Luoto

Keyword(s): Machine Learning, North American, Cross Validation, Climate Models, Boosted Regression Trees, Fossil Pollen, Calibration Model, Learning Approaches, Climate Variables, Pollen Data

Abstract: We test several quantitative algorithms as palaeoclimate reconstruction tools for North American and European fossil pollen data, using both classical methods and newer machine-learning approaches based on regression tree ensembles and artificial neural networks. We focus on the reconstruction of secondary climate variables (here, January temperature and annual water balance), as their ecological influence is small compared to that of the primary variable (July temperature), which presents special challenges to palaeo-reconstructions. We test the pollen–climate models using a novel and comprehensive cross-validation approach, running a series of h-block cross-validations using h values of 100–1500 km. Our study illustrates major benefits of this variable h-block cross-validation scheme, as the effect of spatial autocorrelation is minimized, while the cross-validations with increasing h values can reveal instabilities in the calibration model and approximate challenges faced in palaeo-reconstructions with poor modern analogues. We achieve well-performing calibration models for both primary and secondary climate variables, with boosted regression trees providing the overall most robust performance, while the palaeoclimate reconstructions from fossil datasets show major independent features for the primary and secondary variables. Our results suggest that with careful variable selection and consideration of ecological processes, robust reconstruction of both primary and secondary climate variables is possible.
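
The h-block cross-validation described in this abstract can be sketched as follows. This is a minimal illustration that assumes the leave-one-out form of the scheme, in which every calibration site within h km of the held-out site is dropped from training; it is not the authors' code. The function names and site coordinates are hypothetical, and distances use the haversine formula with a mean Earth radius of 6371 km.

```python
import math


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def h_block_splits(sites, h_km):
    """Yield (test_index, train_indices), excluding training sites within h_km of the test site."""
    for i, test_site in enumerate(sites):
        train = [j for j, site in enumerate(sites)
                 if j != i and haversine_km(test_site, site) > h_km]
        yield i, train


# Hypothetical (lat, lon) sampling sites, for illustration only.
sites = [(60.0, 25.0), (60.5, 25.0), (65.0, 25.0), (45.0, -90.0)]
for h in (100, 1000):
    print(h, dict(h_block_splits(sites, h)))
```

Raising h removes progressively more nearby sites from training, so each prediction has to rely on more distant and less similar calibration samples, which is how the scheme probes sensitivity to spatial autocorrelation.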

A Machine Learning Approach to Non-uniform Spatial Downscaling of Climate Variables

2017 IEEE International Conference on Data Mining Workshops (ICDMW), DOI: 10.1109/icdmw.2017.49, 2017, Cited By ~ 2

Author(s): Soukayna Mouatadid, Steve Easterbrook, Andre R. Erler

Keyword(s): Machine Learning, Learning Approach, Climate Variables, Machine Learning Approach, Spatial Downscaling

Analyzing the Applicability of Random Forest-Based Models for the Forecast of Run-of-River Hydropower Generation

Clean Technologies, DOI: 10.3390/cleantechnol3040050, 2021, Vol 3 (4), pp. 858-880

Author(s): Valentina Sessa, Edi Assoumou, Mireille Bossy, Sofia G. Simões

Keyword(s): Machine Learning, Time Series, Random Forest, Power Systems, Machine Learning Techniques, Climate Variables, Climate Data, Hydropower Generation, The Impact, Run Of River

Analyzing the impact of climate variables on operational planning processes is essential for the robust implementation of a sustainable power system. This paper deals with the modeling of run-of-river hydropower production based on climate variables on the European scale. A better understanding of future run-of-river generation patterns has important implications for power systems with increasing shares of solar and wind power. Run-of-river plants are less intermittent than solar or wind but also less dispatchable than dams with storage capacity. However, translating time series of climate data (precipitation and air temperature) into time series of run-of-river-based hydropower generation is not an easy task, as it is necessary to capture the complex relationship between the availability of water and the generation of electricity. This task is also more complex when performed for a large interconnected area. In this work, a model is built for several European countries by using machine learning techniques. In particular, we compare the accuracy of models based on the Random Forest algorithm and show that a more accurate model is obtained when a finer spatial resolution of climate data is introduced. We then discuss the practical applicability of a machine learning model for medium-term forecasts and show that some highly context-specific but influential events are hard to capture.

Mind wandering as data augmentation: How mental travel supports abstraction

Behavioral and Brain Sciences, DOI: 10.1017/s0140525x1900311x, 2020, Vol 43

Author(s): Myrthe Faber

Keyword(s): Machine Learning, Data Augmentation, Mental Content, Mind Wandering, Theoretical Framework, Important Addition

Abstract: Gilead et al. state that abstraction supports mental travel, and that mental travel critically relies on abstraction. I propose an important addition to this theoretical framework, namely that mental travel might also support abstraction. Specifically, I argue that spontaneous mental travel (mind wandering), much like data augmentation in machine learning, provides variability in mental content and context necessary for abstraction.

Machine Learning for Speaker Recognition

DOI: 10.1017/9781108552332, 2020, Cited By ~ 2

Author(s): Man-Wai Mak, Jen-Tzung Chien

Keyword(s): Machine Learning, Speaker Recognition
