Hydrologically Informed Machine Learning for Rainfall-Runoff Modelling: Towards Distributed Modelling

Abstract. Despite their great success in many commercial fields, machine learning and data science models in general see limited use in scientific fields, including hydrology. The approach is often criticized for its lack of interpretability and physical consistency. This has led to the emergence of new paradigms, such as Theory Guided Data Science (TGDS) and physics-informed machine learning. The motivation behind such approaches is to improve the physical meaningfulness of machine learning models by blending existing scientific knowledge with learning algorithms. Following the same principles, in our prior work (Chadalawada et al., 2020), a new model induction framework was founded on Genetic Programming (GP), namely the Machine Learning Rainfall-Runoff Model Induction Toolkit (ML-RR-MI). ML-RR-MI is capable of developing fully fledged lumped conceptual rainfall-runoff models for a watershed of interest using the building blocks of two flexible rainfall-runoff modelling frameworks (FUSE and SUPERFLEX). In this study, we extend ML-RR-MI towards inducing semi-distributed rainfall-runoff models. This effort is motivated by the desire to address the limited meaningfulness of lumped models, which tend to deteriorate particularly within large catchments where the spatial heterogeneity of forcing variables and watershed properties is significant. To this end, our machine learning approach for rainfall-runoff modelling, titled Machine Induction Knowledge-Augmented System Hydrologique Asiatique (MIKA-SHA), captures spatial variabilities and automatically induces rainfall-runoff models for the catchment of interest without any subjectivity in model selection. Currently, MIKA-SHA learns models utilizing the model building components of FUSE and SUPERFLEX. However, the proposed framework can be coupled with any internally coherent collection of building blocks.
MIKA-SHA’s model induction capabilities have been tested on the Red Creek catchment near Vestry, Mississippi, United States. The model architectures induced by MIKA-SHA are compatible with previously reported research findings and fieldwork insights of the watershed and are readily interpretable by hydrologists.


Hydrologically informed machine learning for rainfall–runoff modelling: towards distributed modelling

Hydrology and Earth System Sciences ◽

10.5194/hess-25-4373-2021 ◽

2021 ◽

Vol 25 (8) ◽

pp. 4373-4401

Author(s):

Herath Mudiyanselage Viraj Vidura Herath ◽

Jayashree Chadalawada ◽

Vladan Babovic

Keyword(s):

Machine Learning ◽

River Basin ◽

Data Science ◽

Model Building ◽

Building Blocks ◽

Great Success ◽

Rainfall Runoff ◽

Lumped Models ◽

Building Components ◽

Runoff Dynamics

Abstract. Despite showing great success of applications in many commercial fields, machine learning and data science models generally show limited success in many scientific fields, including hydrology (Karpatne et al., 2017). The approach is often criticized for its lack of interpretability and physical consistency. This has led to the emergence of new modelling paradigms, such as theory-guided data science (TGDS) and physics-informed machine learning. The motivation behind such approaches is to improve the physical meaningfulness of machine learning models by blending existing scientific knowledge with learning algorithms. Following the same principles in our prior work (Chadalawada et al., 2020), a new model induction framework was founded on genetic programming (GP), namely the Machine Learning Rainfall–Runoff Model Induction (ML-RR-MI) toolkit. ML-RR-MI is capable of developing fully fledged lumped conceptual rainfall–runoff models for a watershed of interest using the building blocks of two flexible rainfall–runoff modelling frameworks. In this study, we extend ML-RR-MI towards inducing semi-distributed rainfall–runoff models. The meaningfulness and reliability of hydrological inferences gained from lumped models may tend to deteriorate within large catchments where the spatial heterogeneity of forcing variables and watershed properties is significant. This was the motivation behind developing our machine learning approach for distributed rainfall–runoff modelling titled Machine Induction Knowledge Augmented – System Hydrologique Asiatique (MIKA-SHA). MIKA-SHA captures spatial variabilities and automatically induces rainfall–runoff models for the catchment of interest without any explicit user selections. Currently, MIKA-SHA learns models utilizing the model building components of two flexible modelling frameworks. However, the proposed framework can be coupled with any internally coherent collection of building blocks. 
MIKA-SHA's model induction capabilities have been tested on the Rappahannock River basin near Fredericksburg, Virginia, USA. MIKA-SHA builds and tests many model configurations using the model building components of the two flexible modelling frameworks and quantitatively identifies the optimal model for the watershed of concern. In this study, MIKA-SHA is utilized to identify two optimal models (one from each flexible modelling framework) to capture the runoff dynamics of the Rappahannock River basin. Both optimal models achieve high-efficiency values in hydrograph predictions (both at catchment and subcatchment outlets) and good visual matches with the observed runoff response of the catchment. Furthermore, the resulting model architectures are compatible with previously reported research findings and fieldwork insights of the watershed and are readily interpretable by hydrologists. MIKA-SHA-induced semi-distributed model performances were compared against existing lumped model performances for the same basin. MIKA-SHA-induced optimal models outperform the lumped models used in this study in terms of efficiency values while benefitting hydrologists with more meaningful hydrological inferences about the runoff dynamics of the Rappahannock River basin.
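
The abstract above compares models by their "efficiency values" in hydrograph prediction but does not name the metric in this listing. A widely used efficiency measure in hydrology is the Nash-Sutcliffe efficiency (NSE), so the sketch below assumes that metric purely for illustration. The function name and the sample numbers are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def nash_sutcliffe_efficiency(observed: Sequence[float],
                              simulated: Sequence[float]) -> float:
    """Return the Nash-Sutcliffe efficiency of a simulated hydrograph.

    NSE = 1 - sum((obs - sim)^2) / sum((obs - mean(obs))^2)
    A value of 1 is a perfect match; 0 means the simulation is no better
    than always predicting the mean of the observations.
    """
    if len(observed) == 0 or len(observed) != len(simulated):
        raise ValueError("series must be non-empty and of equal length")

    mean_obs = sum(observed) / len(observed)
    squared_error = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    obs_variance = sum((o - mean_obs) ** 2 for o in observed)

    if obs_variance == 0:
        raise ValueError("observed series is constant; NSE is undefined")
    return 1.0 - squared_error / obs_variance


if __name__ == "__main__":
    # Hypothetical daily runoff values (arbitrary units), for illustration only.
    obs = [1.0, 2.0, 3.0]
    print(nash_sutcliffe_efficiency(obs, obs))              # perfect match
    print(nash_sutcliffe_efficiency(obs, [2.0, 2.0, 2.0]))  # mean predictor
```

Under this assumed metric, the same function could be evaluated at the catchment outlet and at each subcatchment outlet, which is one way to read "both at catchment and subcatchment outlets" in the abstract.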


Genetic programming for hydrological applications: to model or forecast that is the question

Journal of Hydroinformatics ◽

10.2166/hydro.2021.179 ◽

2021 ◽

Author(s):

Herath Mudiyanselage Viraj Vidura Herath ◽

Jayashree Chadalawada ◽

Vladan Babovic

Keyword(s):

Water Resources ◽

Genetic Programming ◽

Data Science ◽

Model Building ◽

Building Blocks ◽

Rainfall Runoff ◽

New Paradigm ◽

Distributed Modelling ◽

Rainfall Runoff Model ◽

Runoff Model

Abstract. Genetic programming (GP) is a widely used machine learning (ML) algorithm that has been applied in water resources science and engineering since its conception in the early 1990s. However, similar to other ML applications, the GP algorithm is often used as a data fitting tool rather than as a model building instrument. We find this a gross underutilization of the GP capabilities. The feature that most clearly distinguishes GP from other ML techniques is its capability to produce explicit mathematical relationships between input and output variables. Theory-guided data science (TGDS) recently emerged as a new paradigm in ML with the main goal of blending the existing body of knowledge with ML techniques to induce physically sound models. TGDS has since evolved into a popular data science paradigm, especially in scientific disciplines including water resources. Following these ideas, in our prior work, we developed two hydrologically informed rainfall-runoff model induction toolkits for lumped modelling and distributed modelling based on GP. In the current work, the two toolkits are applied using a different hydrological model building library. Here, the model building blocks are derived from the Sugawara TANK model template, which represents the elements of hydrological knowledge. Results are compared against the traditional GP approach and suggest that GP as a rainfall-runoff model induction toolkit preserves the prediction power of the traditional GP short-term forecasting approach while helping to better understand the catchment runoff dynamics through the readily interpretable induced models.
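
The abstract above derives its building blocks from the Sugawara TANK model template. As a rough illustration of what a single block of that kind might compute, the sketch below simulates one tank with a side outlet (runoff) and a bottom outlet (percolation). The function name `run_single_tank`, the parameter names and the default values are assumptions made for this example; they are not the toolkit's actual library or its calibrated parameters.

```python
from typing import List, Sequence, Tuple


def run_single_tank(rainfall: Sequence[float],
                    side_coeff: float = 0.3,
                    bottom_coeff: float = 0.1,
                    outlet_height: float = 5.0,
                    initial_storage: float = 0.0
                    ) -> Tuple[List[float], List[float], float]:
    """Simulate one tank over a rainfall series.

    Returns (runoff per step, percolation per step, final storage).
    All quantities share the same depth unit (for example mm per step).
    """
    if side_coeff < 0 or bottom_coeff < 0 or side_coeff + bottom_coeff > 1:
        raise ValueError("outlet coefficients must be >= 0 and sum to <= 1")

    storage = initial_storage
    runoff: List[float] = []
    percolation: List[float] = []
    for rain in rainfall:
        storage += rain
        # The side outlet only discharges water standing above its height.
        side_flow = side_coeff * max(storage - outlet_height, 0.0)
        # The bottom outlet drains in proportion to the total storage.
        bottom_flow = bottom_coeff * storage
        storage -= side_flow + bottom_flow
        runoff.append(side_flow)
        percolation.append(bottom_flow)
    return runoff, percolation, storage


if __name__ == "__main__":
    # Hypothetical rainfall series, for illustration only.
    q, perc, final_storage = run_single_tank([10.0, 0.0, 4.0, 0.0])
    print(q, perc, final_storage)
```

In a full TANK configuration several such tanks are usually stacked, with the percolation of one tank feeding the tank below it; the single tank here is kept deliberately small.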


Physics Informed Machine Learning of Rainfall-Runoff Processes

10.5194/egusphere-egu2020-12303 ◽

2020 ◽

Author(s):

Vladan Babovic ◽

Jayashree Chadalawada ◽

Herath Mudiyanselage Viraj Vidura Herath

Keyword(s):

Machine Learning ◽

Model Building ◽

Hydrological Model ◽

Building Blocks ◽

Data Driven ◽

Machine Learning Techniques ◽

Rainfall Runoff ◽

Learning Techniques ◽

Highly Nonlinear ◽

Wide Range

Modelling of the rainfall-runoff phenomenon continues to be a challenging task for hydrologists, as the underlying processes are highly nonlinear, dynamic and interdependent. Numerous modelling strategies (empirical, conceptual, physically based, data driven) are used to develop rainfall-runoff models, as no model type can be considered universally pertinent for a wide range of problems. Recent literature emphasizes that the crucial step of hydrological model selection is often subjective and based on legacy. As the research outcome depends on model choice, there is a need to automate the process of model evolution, evaluation and selection based on research objectives, temporal and spatial characteristics of available data and catchment properties. Therefore, this study proposes a novel automated model building algorithm relying on the machine learning technique Genetic Programming (GP). State-of-the-art GP applications in rainfall-runoff modelling have so far used the algorithm as a short-term forecasting tool that produces an expected future time series, much like neural network applications. Such simplistic applications of data-driven black-box machine learning techniques may lead to the development of accurate yet meaningless models which do not satisfy basic hydrological insights and may be very difficult to interpret. At the same time, it should be admitted that there is a vast amount of knowledge and understanding of physical processes that should not just be thrown away. Thus, we strongly believe that the most suitable way forward is to couple the already existing body of knowledge with machine learning techniques in a guided manner to enhance the meaningfulness and interpretability of the induced models. In the suggested algorithm, domain knowledge is introduced through the incorporation of process knowledge by adding model building blocks from prevailing rainfall-runoff modelling frameworks into the GP function set.
Presently, the function set library consists of Sugawara TANK model functions, generic components of two flexible rainfall-runoff modelling frameworks (FUSE and SUPERFLEX) and model equations of 46 existing hydrological models (MARRMoT). Perhaps more importantly, the algorithm can readily be integrated with any other internally coherent building blocks. This approach contrasts with the rest of the machine learning applications in rainfall-runoff modelling, as it not only produces runoff predictions but also develops a physically meaningful hydrological model that helps the hydrologist to better understand the catchment dynamics. The proposed algorithm considers the model space and automatically identifies the appropriate model configurations for a catchment of interest by optimizing user-defined learning objectives in a multi-objective optimization framework. The model induction capabilities of the proposed algorithm have been evaluated on the Blackwater River basin, Alabama, United States. The model configurations evolved through the model-building algorithm are compatible with the fieldwork investigations and previously reported research findings.
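
The abstract above selects model configurations by optimizing user-defined objectives in a multi-objective framework, without stating the objectives or the optimizer in this listing. One generic ingredient of such frameworks is keeping only the non-dominated (Pareto-optimal) candidates. The sketch below shows that filtering step under the assumption that every objective is to be minimized; the configuration names and objective values are hypothetical.

```python
from typing import Dict, List, Tuple

Objectives = Tuple[float, ...]


def dominates(a: Objectives, b: Objectives) -> bool:
    """True if a is no worse than b on every objective and better on one.

    Assumes all objectives are minimized.
    """
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(candidates: Dict[str, Objectives]) -> List[str]:
    """Return the names of candidates that no other candidate dominates."""
    front = []
    for name, objectives in candidates.items():
        dominated = any(
            dominates(other, objectives)
            for other_name, other in candidates.items()
            if other_name != name
        )
        if not dominated:
            front.append(name)
    return front


if __name__ == "__main__":
    # Hypothetical (prediction error, number of model components) pairs.
    configs = {
        "config_a": (0.20, 5.0),
        "config_b": (0.30, 3.0),
        "config_c": (0.40, 6.0),  # worse than config_a on both objectives
    }
    print(pareto_front(configs))
```

A trade-off between error and model size is used here only because it is easy to read; any set of objectives expressed as values to minimize would pass through the same filter.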


Review: Hydrologically Informed Machine Learning for Rainfall-Runoff Modelling: Towards Distributed Modelling

10.5194/hess-2020-487-rc2 ◽

2020 ◽

Author(s):

Anonymous

Keyword(s):

Machine Learning ◽

Rainfall Runoff ◽

Distributed Modelling


HESS Opinions: Incubating deep-learning-powered hydrologic science advances as a community

Hydrology and Earth System Sciences ◽

10.5194/hess-22-5639-2018 ◽

2018 ◽

Vol 22 (11) ◽

pp. 5639-5656 ◽

Cited By ~ 45

Author(s):

Chaopeng Shen ◽

Eric Laloy ◽

Amin Elshorbagy ◽

Adrian Albert ◽

Jerad Bales ◽

...

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Data Science ◽

Model Building ◽

Scientific Discovery ◽

Machine Learning Algorithms ◽

Data Limitations ◽

Industry Applications ◽

Versatile Tool ◽

Process Based Models

Abstract. Recently, deep learning (DL) has emerged as a revolutionary and versatile tool transforming industry applications and generating new and improved capabilities for scientific discovery and model building. The adoption of DL in hydrology has so far been gradual, but the field is now ripe for breakthroughs. This paper suggests that DL-based methods can open up a complementary avenue toward knowledge discovery in hydrologic sciences. In the new avenue, machine-learning algorithms present competing hypotheses that are consistent with data. Interrogative methods are then invoked to interpret DL models for scientists to further evaluate. However, hydrology presents many challenges for DL methods, such as data limitations, heterogeneity and co-evolution, and the general inexperience of the hydrologic field with DL. The roadmap toward DL-powered scientific advances will require the coordinated effort from a large community involving scientists and citizens. Integrating process-based models with DL models will help alleviate data limitations. The sharing of data and baseline models will improve the efficiency of the community as a whole. Open competitions could serve as the organizing events to greatly propel growth and nurture data science education in hydrology, which demands a grassroots collaboration. The area of hydrologic DL presents numerous research opportunities that could, in turn, stimulate advances in machine learning as well.


Identifying and Harnessing the Building Blocks of Machine Learning Pipelines for Sensible Initialization of a Data Science Automation Tool

Genetic and Evolutionary Computation - Genetic Programming Theory and Practice XIV ◽

10.1007/978-3-319-97088-2_14 ◽

2018 ◽

pp. 211-223

Author(s):

Randal S. Olson ◽

Jason H. Moore

Keyword(s):

Machine Learning ◽

Data Science ◽

Building Blocks


Automated Machine Learning Tool: The First Stop for Data Science and Statistical Model Building

International Journal of Advanced Computer Science and Applications ◽

10.14569/ijacsa.2020.0110253 ◽

2020 ◽

Vol 11 (2) ◽

Author(s):

DeepaRani Gopagoni ◽

P V

Keyword(s):

Machine Learning ◽

Statistical Model ◽

Data Science ◽

Model Building ◽

Learning Tool ◽

Automated Machine Learning ◽

Machine Learning Tool


Probabilistic Machine Learning for Healthcare

Annual Review of Biomedical Data Science ◽

10.1146/annurev-biodatasci-092820-033938 ◽

2021 ◽

Vol 4 (1) ◽

Author(s):

Irene Y. Chen ◽

Shalmali Joshi ◽

Marzyeh Ghassemi ◽

Rajesh Ranganath

Keyword(s):

Machine Learning ◽

Data Science ◽

Model Building ◽

Generative Models ◽

Annual Review ◽

Publication Date ◽

Biomedical Data ◽

Learning Models ◽

Probabilistic Machine Learning ◽

Machine Learning Models

Machine learning can be used to make sense of healthcare data. Probabilistic machine learning models help provide a complete picture of observed data in healthcare. In this review, we examine how probabilistic machine learning can advance healthcare. We consider challenges in the predictive model building pipeline where probabilistic models can be beneficial, including calibration and missing data. Beyond predictive models, we also investigate the utility of probabilistic machine learning models in phenotyping, in generative models for clinical use cases, and in reinforcement learning. Expected final online publication date for the Annual Review of Biomedical Data Science, Volume 4 is July 2021. Please see http://www.annualreviews.org/page/journal/pubdates for revised estimates.


How to Land Modern Data Science in Petroleum Engineering

10.2118/205689-ms ◽

2021 ◽

Author(s):

Hongbao Zhang ◽

Yijin Zeng ◽

Lulu Liao ◽

Ruiyao Wang ◽

Xutian Hou ◽

...

Keyword(s):

Machine Learning ◽

Prior Knowledge ◽

Language Processing ◽

Data Science ◽

Parameters Optimization ◽

Data Driven ◽

Petroleum Engineering ◽

Great Success ◽

Well Production ◽

Data Product

Abstract. Digitalization and intelligence are attracting increasing attention in petroleum engineering. A large body of published research indicates that modern data science has been applied in almost every corner of petroleum engineering where data are generated; however, mature products are few, or their performance does not meet people's expectations. Despite the great success in other industries (internet, transportation, finance, etc.), the "amazing" data science algorithms seem to be challenged when "landing" in petroleum engineering. It is time to calmly analyze the current situation and discuss the methodology for applying modern data science in petroleum engineering to ensure safety, improve efficiency and save costs. Based on experience with several data products in petroleum engineering and a wide investigation of the literature, the methodology is summarized by answering some important questions: What is the difference between petroleum engineering and other industries, and what are the greatest challenges for algorithms "landing"? How could we build a data product development team? Why do machine learning models derived by typical textbook procedures not work well in the real world? Are current artificial intelligence algorithms perfect, and is there any limit? How could we deal with the relationship between prior knowledge and data-driven methods? What is the key point in keeping a data product competitive? Several specific scenarios are introduced as examples, such as ROP modelling, drilling parameter optimization, text mining of drilling reports and well production prediction, in which deep learning, traditional machine learning, incremental learning and natural language processing methods are used.
Besides the detailed discussions in the paper, the conclusions are summarized as follows: 1) the strengths and weaknesses of current artificial intelligence should be viewed objectively, and practical suggestions to compensate for the weaknesses are provided; 2) the combination of prior knowledge (from lab tests or expert experience) and data-driven methods is always necessary, and methods for the combination are summarized; 3) data volume and solution portability are the key points for improving data product competitiveness; 4) suggestions on how to build a multi-disciplinary R&D team and how to plan a product are provided. This paper conducts an objective analysis of the challenges of applying modern data science in petroleum engineering and provides a clear methodology and specific suggestions on how to improve the success rate of R&D projects that apply data science to solve problems in petroleum engineering.


Efficient Prediction of Structural and Electronic Properties of Hybrid 2D Materials Using DFT and Machine Learning

10.26434/chemrxiv.6254756.v1 ◽

2018 ◽

Author(s):

Sherif Tawfik ◽

Olexandr Isayev ◽

Catherine Stampfl ◽

Joseph Shapter ◽

David Winkler ◽

...

Keyword(s):

Machine Learning ◽

Band Gap ◽

Density Functional ◽

2D Materials ◽

Van Der Waals ◽

Building Blocks ◽

Machine Learning Techniques ◽

Interlayer Distance ◽

Computational Screening ◽

Wide Range

Materials constructed from different van der Waals two-dimensional (2D) heterostructures offer a wide range of benefits, but these systems have been little studied because of their experimental and computational complexity, and because of the very large number of possible combinations of 2D building blocks. The simulation of the interface between two different 2D materials is computationally challenging due to the lattice mismatch problem, which sometimes necessitates the creation of very large simulation cells for performing density-functional theory (DFT) calculations. Here we use a combination of DFT, linear regression and machine learning techniques in order to rapidly determine the interlayer distance between two different 2D heterostructures that are stacked in a bilayer heterostructure, as well as the band gap of the bilayer. Our work provides an excellent proof of concept by quickly and accurately predicting a structural property (the interlayer distance) and an electronic property (the band gap) for a large number of hybrid 2D materials. This work paves the way for rapid computational screening of the vast parameter space of van der Waals heterostructures to identify new hybrid materials with useful and interesting properties.
