Data-driven parametrizations in numerical models using data assimilation and machine learning.

Can we build a machine learning parametrization in a numerical model using sparse and noisy observations? In recent years, machine learning (ML) has been proposed to devise data-driven parametrizations of unresolved processes in dynamical numerical models. In most cases, ML is trained by coarse-graining high-resolution simulations to provide a dense, noise-free target state (or even the tendency of the model). Our goal is to go beyond the use of high-resolution simulations and train ML-based parametrizations using direct data. Furthermore, we intentionally place ourselves in the realistic scenario of noisy and sparse observations. The algorithm proposed in this work derives from the algorithm presented by the same authors in https://arxiv.org/abs/2001.01520. The principle is to first apply data assimilation (DA) techniques to estimate the full state of the system from a non-parametrized model, referred to hereafter as the physical model. The parametrization term to be estimated is viewed as a model error in the DA system. In a second step, ML is used to define the parametrization, i.e., a predictor of the model error given the state of the system. Finally, the ML system is incorporated within the physical model to produce a hybrid model, combining a physical core with an ML-based parametrization. The approach is applied to dynamical systems from low to intermediate complexity. The DA component of the proposed approach relies on an ensemble Kalman filter/smoother while the parametrization is represented by a convolutional neural network. We show that the hybrid model yields better performance than the physical model in terms of both short-term (forecast skill) and long-term (power spectrum, Lyapunov exponents) properties. Sensitivity to the noise and density of observations is also assessed.
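
As a rough illustration of the DA step described in this abstract, the following sketch applies one stochastic ensemble Kalman filter analysis to a forecast ensemble with sparse, noisy observations. It is not the authors' implementation: the function name `enkf_analysis`, the 8-variable sine-wave state, the ensemble size, and the observation layout are all assumptions chosen for brevity, and localization and inflation are omitted.

```python
import numpy as np

def enkf_analysis(ensemble, y, obs_idx, obs_std, rng):
    """One stochastic EnKF analysis step (illustrative sketch).

    ensemble: (n_members, n_state) forecast ensemble
    y:        (n_obs,) observed values
    obs_idx:  indices of the observed state components (sparse network)
    obs_std:  observation error standard deviation
    """
    n_members = ensemble.shape[0]
    n_obs = len(obs_idx)
    anomalies = ensemble - ensemble.mean(axis=0)        # (N, n)
    obs_anom = anomalies[:, obs_idx]                    # (N, p)
    # Sample estimates of P H^T and H P H^T from the ensemble
    pht = anomalies.T @ obs_anom / (n_members - 1)      # (n, p)
    hpht = obs_anom.T @ obs_anom / (n_members - 1)      # (p, p)
    r = obs_std**2 * np.eye(n_obs)
    gain = pht @ np.linalg.inv(hpht + r)                # Kalman gain (n, p)
    # Each member assimilates its own perturbed copy of the observations
    y_pert = y + obs_std * rng.standard_normal((n_members, n_obs))
    innovations = y_pert - ensemble[:, obs_idx]         # (N, p)
    return ensemble + innovations @ gain.T

# Hypothetical setup: every other component is observed, forecast is biased
rng = np.random.default_rng(0)
truth = np.sin(np.linspace(0.0, 2.0 * np.pi, 8))
obs_idx = np.array([0, 2, 4, 6])
obs_std = 0.1
y = truth[obs_idx] + obs_std * rng.standard_normal(obs_idx.size)
forecast = truth + 1.0 + rng.standard_normal((50, truth.size))

analysis = enkf_analysis(forecast, y, obs_idx, obs_std, rng)
err_f = np.abs(forecast.mean(axis=0) - truth)[obs_idx].mean()
err_a = np.abs(analysis.mean(axis=0) - truth)[obs_idx].mean()
print(f"mean abs error at observed points: forecast {err_f:.3f}, analysis {err_a:.3f}")
```

In this toy setup the unobserved components are only corrected through sampled ensemble correlations, which is why operational systems add localization.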

Combining data assimilation and machine learning to infer unresolved scale parametrization

Philosophical Transactions of The Royal Society A Mathematical Physical and Engineering Sciences ◽

10.1098/rsta.2020.0086 ◽

2021 ◽

Vol 379 (2194) ◽

pp. 20200086

Author(s):

Julien Brajard ◽

Alberto Carrassi ◽

Marc Bocquet ◽

Laurent Bertino

Keyword(s):

Machine Learning ◽

High Resolution ◽

Data Assimilation ◽

Hybrid Model ◽

Numerical Models ◽

Model Error ◽

Climate Modelling ◽

Combining Data ◽

Full State ◽

Direct Data

In recent years, machine learning (ML) has been proposed to devise data-driven parametrizations of unresolved processes in dynamical numerical models. In most cases, the ML training leverages high-resolution simulations to provide a dense, noiseless target state. Our goal is to go beyond the use of high-resolution simulations and train ML-based parametrization using direct data, in the realistic scenario of noisy and sparse observations. The algorithm proposed in this work is a two-step process. First, data assimilation (DA) techniques are applied to estimate the full state of the system from a truncated model. The unresolved part of the truncated model is viewed as a model error in the DA system. In a second step, ML is used to emulate the unresolved part, a predictor of model error given the state of the system. Finally, the ML-based parametrization model is added to the physical core truncated model to produce a hybrid model. The DA component of the proposed method relies on an ensemble Kalman filter while the ML parametrization is represented by a neural network. The approach is applied to the two-scale Lorenz model and to MAOOAM, a reduced-order coupled ocean-atmosphere model. We show that in both cases, the hybrid model yields forecasts with better skill than the truncated model. Moreover, the attractor of the system is significantly better represented by the hybrid model than by the truncated model. This article is part of the theme issue ‘Machine learning for weather and climate modelling’.
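
The second step of the two-step process above (learning a predictor of model error and adding it to the truncated model) can be sketched on a toy problem. Everything here is an assumption made for illustration: the van der Pol oscillator stands in for the full system, a harmonic oscillator for the truncated model, polynomial least squares replaces the neural network, and the "analysis" is simply the truth plus small noise rather than the output of an ensemble Kalman filter.

```python
import numpy as np

DT, MU = 0.01, 1.0

def truncated_step(s):
    """Truncated (physical) model: harmonic oscillator, explicit Euler."""
    x, y = s[..., 0], s[..., 1]
    return s + DT * np.stack([y, -x], axis=-1)

def true_step(s):
    """Reference dynamics: truncated model plus an unresolved damping term."""
    x, y = s[..., 0], s[..., 1]
    unresolved = np.stack([np.zeros_like(x), MU * (1.0 - x**2) * y], axis=-1)
    return truncated_step(s) + DT * unresolved

def features(s):
    """All monomials x^i y^j with i + j <= 3 (10 features)."""
    x, y = s[..., 0], s[..., 1]
    cols = [x**i * y**j for i in range(4) for j in range(4 - i)]
    return np.stack(cols, axis=-1)

# Reference trajectory and a stand-in for the DA analysis (truth + noise)
rng = np.random.default_rng(0)
truth = np.empty((6001, 2))
truth[0] = [0.5, 0.0]
for k in range(6000):
    truth[k + 1] = true_step(truth[k])
analysis = truth + 0.002 * rng.standard_normal(truth.shape)

# Model-error estimate: what the truncated model misses over one step
train = analysis[:5001]
residual = train[1:] - truncated_step(train[:-1])
coef, *_ = np.linalg.lstsq(features(train[:-1]), residual, rcond=None)

def hybrid_step(s):
    """Hybrid model: truncated model plus the learned correction."""
    return truncated_step(s) + features(s) @ coef

# One-step forecast error on held-out, noise-free states
test = truth[5000:]
err_trunc = np.abs(truncated_step(test[:-1]) - test[1:]).mean()
err_hybrid = np.abs(hybrid_step(test[:-1]) - test[1:]).mean()
print(f"one-step error: truncated {err_trunc:.2e}, hybrid {err_hybrid:.2e}")
```

The residual is computed from the analysis alone, so the regression never sees the true unresolved term; this mirrors, in a very reduced form, the idea of treating the unresolved part as model error.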

Data-driven components in a model of inner-shelf sorted bedforms: a new hybrid model

Earth Surface Dynamics ◽

10.5194/esurf-2-67-2014 ◽

2014 ◽

Vol 2 (1) ◽

pp. 67-82 ◽

Cited By ~ 8

Author(s):

E. B. Goldstein ◽

G. Coco ◽

A. B. Murray ◽

M. O. Green

Keyword(s):

Machine Learning ◽

Numerical Model ◽

Hybrid Model ◽

Learning Algorithm ◽

Numerical Models ◽

Data Driven ◽

Data Sets ◽

Inner Shelf ◽

Reference Concentration ◽

Deterministic Description

Abstract. Numerical models rely on the parameterization of processes that often lack a deterministic description. In this contribution we demonstrate the applicability of using machine learning, a class of optimization tools from the discipline of computer science, to develop parameterizations when extensive data sets exist. We develop a new predictor for near-bed suspended sediment reference concentration under unbroken waves using genetic programming, a machine learning technique. We demonstrate that this newly developed parameterization performs as well or better than existing empirical predictors, depending on the chosen error metric. We add this new predictor into an established model for inner-shelf sorted bedforms. Additionally we incorporate a previously reported machine-learning-derived predictor for oscillatory flow ripples into the sorted bedform model. This new "hybrid" sorted bedform model, whereby machine learning components are integrated into a numerical model, demonstrates a method of incorporating observational data (filtered through a machine learning algorithm) directly into a numerical model. Results suggest that the new hybrid model is able to capture dynamics previously absent from the model – specifically, two observed pattern modes of sorted bedforms. Lastly we discuss the challenge of integrating data-driven components into morphodynamic models and the future of hybrid modeling.

Data driven components in a model of inner shelf sorted bedforms: a new hybrid model

Earth Surface Dynamics Discussions ◽

10.5194/esurfd-1-531-2013 ◽

2013 ◽

Vol 1 (1) ◽

pp. 531-569 ◽

Cited By ~ 2

Author(s):

E. B. Goldstein ◽

G. Coco ◽

A. B. Murray ◽

M. O. Green

Keyword(s):

Machine Learning ◽

Numerical Model ◽

Hybrid Model ◽

Learning Algorithm ◽

Numerical Models ◽

Data Driven ◽

Data Sets ◽

Inner Shelf ◽

Reference Concentration ◽

Deterministic Description

Abstract. Numerical models rely on the parameterization of processes that often lack a deterministic description. In this contribution we demonstrate the applicability of using machine learning, optimization tools from the discipline of computer science, to develop parameterizations when extensive data sets exist. We develop a new predictor for near bed suspended sediment reference concentration under unbroken waves using genetic programming, a machine learning technique. This newly developed parameterization performs better than existing empirical predictors. We add this new predictor into an established model for inner shelf sorted bedforms. Additionally we incorporate a previously reported machine learning derived predictor for oscillatory flow ripples into the sorted bedform model. This new "hybrid" sorted bedform model, whereby machine learning components are integrated into a numerical model, demonstrates a method of incorporating observational data (filtered through a machine learning algorithm) directly into a numerical model. Results suggest that the new hybrid model is able to capture dynamics previously absent from the model, specifically, the two observed pattern modes of sorted bedforms. However, caveats exist when data driven components do not have parity with traditional theoretical components of morphodynamic models, and we discuss the challenges of integrating these disparate pieces and the future of this type of modeling.

Using machine learning to correct model error in data assimilation and forecast applications

Quarterly Journal of the Royal Meteorological Society ◽

10.1002/qj.4116 ◽

2021 ◽

Author(s):

Alban Farchi ◽

Patrick Laloyaux ◽

Massimo Bonavita ◽

Marc Bocquet

Keyword(s):

Machine Learning ◽

Data Assimilation ◽

Model Error ◽

Correct Model

Combining Data Assimilation and Machine Learning to build data‐driven models for unknown long time dynamics – Applications in cardiovascular modeling

International Journal for Numerical Methods in Biomedical Engineering ◽

10.1002/cnm.3471 ◽

2021 ◽

Author(s):

Francesco Regazzoni ◽

Dominique Chapelle ◽

Philippe Moireau

Keyword(s):

Machine Learning ◽

Data Assimilation ◽

Data Driven ◽

Time Dynamics ◽

Cardiovascular Modeling ◽

Combining Data ◽

Long Time ◽

Long Time Dynamics

Solar farm voltage anomaly detection using high-resolution μPMU data-driven unsupervised machine learning

Applied Energy ◽

10.1016/j.apenergy.2021.117656 ◽

2021 ◽

Vol 303 ◽

pp. 117656

Author(s):

Maitreyee Dey ◽

Soumya Prakash Rana ◽

Clarke V. Simmons ◽

Sandra Dudley

Keyword(s):

Machine Learning ◽

High Resolution ◽

Anomaly Detection ◽

Data Driven ◽

Unsupervised Machine Learning

Combining Physics-Based and Data-Driven Models for Estimation of WOB During Ultra-Deep Ocean Drilling

Volume 8: Polar and Arctic Sciences and Technology; Petroleum Technology ◽

10.1115/omae2018-78229 ◽

2018 ◽

Author(s):

Tatsuya Kaneko ◽

Ryota Wada ◽

Masahiko Ozaki ◽

Tomoya Inoue

Keyword(s):

Hybrid Model ◽

Numerical Models ◽

Drill String ◽

Operating Conditions ◽

Data Driven ◽

High Frequency Oscillation ◽

Lumped Mass ◽

Unknown Parameters ◽

Mass Model ◽

Time Operation

Offshore drilling with a drill string over 10,000 m long poses many technical challenges. Among them, keeping the weight on bit (WOB) within a certain range is essential for the integrity of the drill pipes and the efficiency of the drilling operation. Since WOB cannot be monitored directly during drilling, the tension at the top of the drill string is used as an indicator of the WOB. However, WOB and the surface-measured tension are known to show different features. The deviation between the two is due to the dynamic longitudinal behavior of the drill string, which becomes stronger as the drill string gets longer and more elastic. One feature of the difference is related to the occurrence of high-frequency oscillation. We have analyzed the longitudinal behavior of the drill string with a lumped-mass model and captured the descriptive behavior of such phenomena. However, such physics-based models are not sufficient for real-time operation. There are many unknown parameters that need to be tuned to fit the actual operating conditions. In addition, the huge and complex drilling system will have non-linear behavior, especially near the drilling annulus. These features will only be captured in the data obtained during operation. The proposed hybrid model is a combination of physics-based models and data-driven models. The basic idea is to utilize data-driven techniques to integrate the data obtained during operation into the physics-based model. There are many options for how far we integrate the data-driven techniques into the physics-based model. For example, we have been successful in estimating the WOB from the surface-measured tension and the displacement of the drill string top with only recurrent neural networks (RNNs), provided we have enough WOB data. A lack of WOB measurements cannot be avoided, so the amount of data needs to be increased by utilizing results from physics-based numerical models. The aim of the research is to find a good combination of the two models. In this paper, we discuss several hybrid model configurations and their performance.

Data-driven and interpretable machine-learning modeling to explore the fine-scale environmental determinants of malaria vectors biting rates in rural Burkina Faso

10.1101/2021.04.13.439583 ◽

2021 ◽

Author(s):

Paul Taconet ◽

Angélique Porciani ◽

Dieudonné Diloma Soma ◽

Karine Mouline ◽

Frédéric Simard ◽

...

Keyword(s):

Machine Learning ◽

High Resolution ◽

Burkina Faso ◽

Environmental Variables ◽

Data Driven ◽

Malaria Vectors ◽

Environmental Determinants ◽

Breeding Sites ◽

Landscape Variables ◽

Interpretable Machine Learning

Abstract. Background: Improving the knowledge and understanding of the environmental determinants of malaria vector abundance at fine spatiotemporal scales is essential to design locally tailored vector control interventions. This work aimed at exploring the environmental determinants of human-biting activity in the main malaria vectors (Anopheles gambiae s.s., Anopheles coluzzii and Anopheles funestus) in the health district of Diébougou, rural Burkina Faso. Methods: Anopheles human-biting activity was monitored in 27 villages over 15 months (in 2017-2018), and environmental variables (meteorological and landscape) were extracted from high-resolution satellite imagery. A two-step data-driven modeling study was then carried out. Correlation coefficients between the biting rates of each vector species and the environmental variables taken at various temporal lags and spatial distances from the biting events were first calculated. Then, multivariate machine-learning models were generated and interpreted to i) pinpoint primary and secondary environmental drivers of variation in the biting rates of each species and ii) identify complex associations between the environmental conditions and the biting rates. Results: Meteorological and landscape variables were often significantly correlated with the vectors’ biting rates. Many nonlinear associations and thresholds were unveiled by the multivariate models, both for meteorological and landscape variables. From these results, several aspects of the bio-ecology of the main malaria vectors were clarified or hypothesized for the Diébougou area, including breeding site typologies, development and survival rates in relation to weather, flight ranges from breeding sites, and dispersal related to landscape openness. Conclusions: Using high-resolution data in an interpretable machine-learning modeling framework proved to be an efficient way to enhance the knowledge of the complex links between the environment and the malaria vectors at a local scale. More broadly, the emerging field of interpretable machine learning has significant potential to help improve our understanding of the complex processes leading to malaria transmission.

Using machine learning to correct model error in data assimilation and forecast applications

10.5194/egusphere-egu21-4007 ◽

2021 ◽

Cited By ~ 2

Author(s):

Alban Farchi ◽

Patrick Laloyaux ◽

Massimo Bonavita ◽

Marc Bocquet

Keyword(s):

Machine Learning ◽

Data Assimilation ◽

Data Science ◽

Weather Prediction ◽

Realistic Model ◽

Model Error ◽

Underlying Assumption ◽

Correct Model ◽

Recent Developments ◽

Spatiotemporal Processes

Recent developments in machine learning (ML) have demonstrated impressive skills in reproducing complex spatiotemporal processes. However, contrary to data assimilation (DA), the underlying assumption behind ML methods is that the system is fully observed and without noise, which is rarely the case in numerical weather prediction. In order to circumvent this issue, it is possible to embed the ML problem into a DA formalism characterised by a cost function similar to that of the weak-constraint 4D-Var (Bocquet et al., 2019; Bocquet et al., 2020). In practice ML and DA are combined to solve the problem: DA is used to estimate the state of the system while ML is used to estimate the full model. In realistic systems, the model dynamics can be very complex and it may not be possible to reconstruct it from scratch. An alternative could be to learn the model error of an already existing model using the same approach combining DA and ML. In this presentation, we test the feasibility of this method using a quasi-geostrophic (QG) model. After a brief description of the QG model, we introduce a realistic model error to be learnt. We then assess the potential of ML methods to reconstruct this model error, first with perfect (full and noiseless) observations and then with sparse and noisy observations. We show in either case to what extent the trained ML models correct the mid-term forecasts. Finally, we show how the trained ML models can be used in a DA system and to what extent they correct the analysis.

Bocquet, M., Brajard, J., Carrassi, A., and Bertino, L.: Data assimilation as a learning tool to infer ordinary differential equation representations of dynamical models, Nonlin. Processes Geophys., 26, 143–162, 2019.
Bocquet, M., Brajard, J., Carrassi, A., and Bertino, L.: Bayesian inference of chaotic dynamics by merging data assimilation, machine learning and expectation-maximization, Foundations of Data Science, 2 (1), 55–80, 2020.
Farchi, A., Laloyaux, P., Bonavita, M., and Bocquet, M.: Using machine learning to correct model error in data assimilation and forecast applications, arxiv:2010.12605, submitted.
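
For readers unfamiliar with the weak-constraint 4D-Var cost function mentioned in this abstract, a generic textbook form is given below. The notation is an assumption, not taken from the abstract: $\mathbf{x}_k$ is the state at time $t_k$, $\mathbf{x}^b$ the background state, $\mathbf{y}_k$ the observations, $\mathcal{H}_k$ the observation operator, $\mathcal{M}_k$ the model propagator, and $\mathbf{B}$, $\mathbf{R}_k$, $\mathbf{Q}_k$ the background, observation, and model error covariances.

```latex
\begin{aligned}
J(\mathbf{x}_0, \dots, \mathbf{x}_K)
  ={}& \tfrac{1}{2} \left\| \mathbf{x}_0 - \mathbf{x}^b \right\|^2_{\mathbf{B}^{-1}}
   + \tfrac{1}{2} \sum_{k=0}^{K}
     \left\| \mathbf{y}_k - \mathcal{H}_k(\mathbf{x}_k) \right\|^2_{\mathbf{R}_k^{-1}} \\
  &+ \tfrac{1}{2} \sum_{k=1}^{K}
     \left\| \mathbf{x}_k - \mathcal{M}_k(\mathbf{x}_{k-1}) \right\|^2_{\mathbf{Q}_k^{-1}},
\qquad
\left\| \mathbf{v} \right\|^2_{\mathbf{A}^{-1}} = \mathbf{v}^{\mathsf{T}} \mathbf{A}^{-1} \mathbf{v}.
\end{aligned}
```

The last term is what makes the constraint weak: the trajectory may depart from the model at a cost set by $\mathbf{Q}_k$. In a combined DA and ML setting, one plausible reading is that $\mathcal{M}_k$ also depends on trainable parameters and that $J$ is minimised over both the states and those parameters.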

A Comparison of Model Error Representations in Mesoscale Ensemble Data Assimilation

Monthly Weather Review ◽

10.1175/mwr-d-14-00395.1 ◽

2015 ◽

Vol 143 (10) ◽

pp. 3893-3911 ◽

Cited By ~ 11

Author(s):

Soyoung Ha ◽

Judith Berner ◽

Chris Snyder

Keyword(s):

Data Assimilation ◽

Numerical Models ◽

Wrf Model ◽

Forecast Model ◽

Model Error ◽

Ensemble Forecasts ◽

Ensemble Data Assimilation ◽

Ensemble Data ◽

Independent Observations ◽

Almost All

Abstract. Mesoscale forecasts are strongly influenced by physical processes that are either poorly resolved or must be parameterized in numerical models. In part because of errors in these parameterizations, mesoscale ensemble data assimilation systems generally suffer from underdispersiveness, which can limit the quality of analyses. Two explicit representations of model error for mesoscale ensemble data assimilation are explored: a multiphysics ensemble in which each member’s forecast is based on a distinct suite of physical parameterizations, and stochastic kinetic energy backscatter in which small noise terms are included in the forecast model equations. These two model error techniques are compared with a baseline experiment that includes spatially and temporally adaptive covariance inflation, in a domain over the continental United States using the Weather Research and Forecasting (WRF) Model for mesoscale ensemble forecasts and the Data Assimilation Research Testbed (DART) for the ensemble Kalman filter. Verification against independent observations and Rapid Update Cycle (RUC) 13-km analyses for the month of June 2008 showed that including the model error representation improved not only the analysis ensemble, but also short-range forecasts initialized from these analyses. Explicitly accounting for model uncertainty led to a better-tuned ensemble spread, a more skillful ensemble mean, and higher probabilistic scores, as well as significantly reducing the need for inflation. In particular, the stochastic backscatter scheme consistently outperformed both the multiphysics approach and the control run with adaptive inflation over almost all levels of the atmosphere both deterministically and probabilistically.
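
To make the notion of covariance inflation in this abstract concrete, here is a minimal sketch of constant multiplicative inflation. It is a deliberate simplification: the study's baseline uses spatially and temporally adaptive inflation, whereas the fixed `factor` below, the function name `inflate`, and the random test ensemble are assumptions for illustration only.

```python
import numpy as np

def inflate(ensemble, factor):
    """Multiplicative inflation: scale member anomalies about the ensemble mean.

    ensemble: (n_members, n_state) array
    factor:   inflation factor (> 1 increases the spread)
    """
    mean = ensemble.mean(axis=0)
    return mean + factor * (ensemble - mean)

# Hypothetical underdispersive ensemble of 20 members and 5 state variables
rng = np.random.default_rng(1)
ensemble = 0.1 * rng.standard_normal((20, 5))
inflated = inflate(ensemble, factor=1.2)

# The mean is preserved while the variance grows by factor**2
var_ratio = inflated.var(axis=0, ddof=1) / ensemble.var(axis=0, ddof=1)
print("variance ratio per variable:", np.round(var_ratio, 3))
```

Inflation compensates for underdispersion without describing its source, which is the contrast the abstract draws with explicit model error representations.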
