Combining data assimilation and machine learning to infer unresolved scale parametrization

In recent years, machine learning (ML) has been proposed to devise data-driven parametrizations of unresolved processes in dynamical numerical models. In most cases, the ML training leverages high-resolution simulations to provide a dense, noiseless target state. Our goal is to go beyond the use of high-resolution simulations and train ML-based parametrization using direct data, in the realistic scenario of noisy and sparse observations. The algorithm proposed in this work is a two-step process. First, data assimilation (DA) techniques are applied to estimate the full state of the system from a truncated model. The unresolved part of the truncated model is viewed as a model error in the DA system. In a second step, ML is used to emulate the unresolved part, a predictor of model error given the state of the system. Finally, the ML-based parametrization model is added to the physical core truncated model to produce a hybrid model. The DA component of the proposed method relies on an ensemble Kalman filter while the ML parametrization is represented by a neural network. The approach is applied to the two-scale Lorenz model and to MAOOAM, a reduced-order coupled ocean-atmosphere model. We show that in both cases, the hybrid model yields forecasts with better skill than the truncated model. Moreover, the attractor of the system is significantly better represented by the hybrid model than by the truncated model. This article is part of the theme issue ‘Machine learning for weather and climate modelling’.
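The second step described above, learning a predictor of model error from the DA analyses and adding it to the truncated model, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the truncated model is taken to be a one-scale Lorenz-96 system, the neural network is replaced by a linear least-squares fit to keep the example short, and all function names (`l96_tendency`, `truncated_step`, `model_error_targets`, `fit_linear_correction`, `hybrid_step`) are hypothetical.

```python
import numpy as np

def l96_tendency(x, forcing=8.0):
    """Tendency of the one-scale Lorenz-96 model (assumed truncated model)."""
    return (np.roll(x, -1) - np.roll(x, 2)) * np.roll(x, 1) - x + forcing

def truncated_step(x, dt=0.01):
    """One fourth-order Runge-Kutta step of the truncated model."""
    k1 = l96_tendency(x)
    k2 = l96_tendency(x + 0.5 * dt * k1)
    k3 = l96_tendency(x + 0.5 * dt * k2)
    k4 = l96_tendency(x + dt * k3)
    return x + dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0

def model_error_targets(analysis):
    """Residual between each analysis state and the truncated-model forecast
    started from the previous analysis state. `analysis` has shape (n_times, n_state)."""
    forecasts = np.array([truncated_step(x) for x in analysis[:-1]])
    return analysis[1:] - forecasts

def fit_linear_correction(states, targets):
    """Least-squares fit targets ~ [states, 1] @ weights.
    A linear map stands in for the neural network (assumption made for brevity)."""
    design = np.hstack([states, np.ones((states.shape[0], 1))])
    weights, *_ = np.linalg.lstsq(design, targets, rcond=None)
    return weights

def hybrid_step(x, weights):
    """Truncated-model step plus the learned model-error correction."""
    correction = np.append(x, 1.0) @ weights
    return truncated_step(x) + correction
```

In use, `analysis` would hold the sequence of state estimates produced by the DA step, `model_error_targets(analysis)` would provide the training targets, and `hybrid_step` would replace `truncated_step` in forecasts.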

Data-driven parametrizations in numerical models using data assimilation and machine learning.

10.5194/egusphere-egu2020-13794 ◽

2020 ◽

Author(s):

Julien Brajard ◽

Alberto Carrassi ◽

Marc Bocquet ◽

Laurent Bertino

Keyword(s):

Machine Learning ◽

High Resolution ◽

Data Assimilation ◽

Physical Model ◽

Hybrid Model ◽

Numerical Models ◽

Coarse Graining ◽

Model Error ◽

Data Driven ◽

Model Combining

Can we build a machine learning parametrization in a numerical model using sparse and noisy observations? In recent years, machine learning (ML) has been proposed to devise data-driven parametrizations of unresolved processes in dynamical numerical models. In most cases, ML is trained by coarse-graining high-resolution simulations to provide a dense, noiseless target state (or even the tendency of the model). Our goal is to go beyond the use of high-resolution simulations and train ML-based parametrizations using direct data. Furthermore, we intentionally place ourselves in the realistic scenario of noisy and sparse observations. The algorithm proposed in this work derives from the algorithm presented by the same authors in https://arxiv.org/abs/2001.01520. The principle is to first apply data assimilation (DA) techniques to estimate the full state of the system from a non-parametrized model, referred to hereafter as the physical model. The parametrization term to be estimated is viewed as a model error in the DA system. In a second step, ML is used to define the parametrization, i.e., a predictor of the model error given the state of the system. Finally, the ML system is incorporated within the physical model to produce a hybrid model, combining a physical core with an ML-based parametrization. The approach is applied to dynamical systems of low to intermediate complexity. The DA component of the proposed approach relies on an ensemble Kalman filter/smoother, while the parametrization is represented by a convolutional neural network. We show that the hybrid model yields better performance than the physical model in terms of both short-term (forecast skill) and long-term (power spectrum, Lyapunov exponents) properties. Sensitivity to the noise and density of observations is also assessed.
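The first step, estimating the full state from sparse and noisy observations with an ensemble Kalman filter, can be illustrated by a single analysis update. The sketch below is a generic stochastic (perturbed-observation) EnKF with a linear observation operator. It is an assumption made for illustration, not necessarily the filter/smoother variant used by the authors, and the function name `enkf_analysis` is hypothetical.

```python
import numpy as np

def enkf_analysis(ensemble, obs, obs_operator, obs_cov, rng):
    """Stochastic EnKF analysis step.

    ensemble:     (n_members, n_state) forecast ensemble
    obs:          (n_obs,) observation vector
    obs_operator: (n_obs, n_state) linear operator; rows pick observed variables
    obs_cov:      (n_obs, n_obs) observation-error covariance
    rng:          numpy.random.Generator used to perturb the observations
    """
    n_members = ensemble.shape[0]
    anomalies = ensemble - ensemble.mean(axis=0)
    obs_anomalies = anomalies @ obs_operator.T
    # Sample covariances estimated from the ensemble.
    cov_xy = anomalies.T @ obs_anomalies / (n_members - 1)
    cov_yy = obs_anomalies.T @ obs_anomalies / (n_members - 1) + obs_cov
    # Kalman gain K = cov_xy @ inv(cov_yy), computed with a linear solve.
    gain = np.linalg.solve(cov_yy, cov_xy.T).T
    # Each member assimilates its own noisy copy of the observations.
    perturbed_obs = obs + rng.multivariate_normal(
        np.zeros(obs.size), obs_cov, size=n_members
    )
    innovations = perturbed_obs - ensemble @ obs_operator.T
    return ensemble + innovations @ gain.T
```

Sparse observations correspond to an `obs_operator` with fewer rows than state variables. The unobserved variables are then updated only through the ensemble cross-covariances.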

Using machine learning to correct model error in data assimilation and forecast applications

Quarterly Journal of the Royal Meteorological Society ◽

10.1002/qj.4116 ◽

2021 ◽

Author(s):

Alban Farchi ◽

Patrick Laloyaux ◽

Massimo Bonavita ◽

Marc Bocquet

Keyword(s):

Machine Learning ◽

Data Assimilation ◽

Model Error ◽

Correct Model

Combining Data Assimilation and Machine Learning to build data‐driven models for unknown long time dynamics –Applications in cardiovascular modeling

International Journal for Numerical Methods in Biomedical Engineering ◽

10.1002/cnm.3471 ◽

2021 ◽

Author(s):

Francesco Regazzoni ◽

Dominique Chapelle ◽

Philippe Moireau

Keyword(s):

Machine Learning ◽

Data Assimilation ◽

Data Driven ◽

Time Dynamics ◽

Cardiovascular Modeling ◽

Combining Data ◽

Long Time ◽

Long Time Dynamics

Using machine learning to correct model error in data assimilation and forecast applications

10.5194/egusphere-egu21-4007 ◽

2021 ◽

Cited By ~ 2

Author(s):

Alban Farchi ◽

Patrick Laloyaux ◽

Massimo Bonavita ◽

Marc Bocquet

Keyword(s):

Machine Learning ◽

Data Assimilation ◽

Data Science ◽

Weather Prediction ◽

Realistic Model ◽

Model Error ◽

Underlying Assumption ◽

Correct Model ◽

Recent Developments ◽

Spatiotemporal Processes

Recent developments in machine learning (ML) have demonstrated impressive skills in reproducing complex spatiotemporal processes. However, contrary to data assimilation (DA), the underlying assumption behind ML methods is that the system is fully observed and without noise, which is rarely the case in numerical weather prediction. In order to circumvent this issue, it is possible to embed the ML problem into a DA formalism characterised by a cost function similar to that of the weak-constraint 4D-Var (Bocquet et al., 2019; Bocquet et al., 2020). In practice, ML and DA are combined to solve the problem: DA is used to estimate the state of the system while ML is used to estimate the full model. In realistic systems, the model dynamics can be very complex and it may not be possible to reconstruct them from scratch. An alternative could be to learn the model error of an existing model using the same approach combining DA and ML. In this presentation, we test the feasibility of this method using a quasi-geostrophic (QG) model. After a brief description of the QG model, we introduce a realistic model error to be learnt. We then assess the potential of ML methods to reconstruct this model error, first with perfect (full and noiseless) observations and then with sparse and noisy observations. We show in either case to what extent the trained ML models correct the mid-term forecasts. Finally, we show how the trained ML models can be used in a DA system and to what extent they correct the analysis.

Bocquet, M., Brajard, J., Carrassi, A., and Bertino, L.: Data assimilation as a learning tool to infer ordinary differential equation representations of dynamical models, Nonlin. Processes Geophys., 26, 143–162, 2019.

Bocquet, M., Brajard, J., Carrassi, A., and Bertino, L.: Bayesian inference of chaotic dynamics by merging data assimilation, machine learning and expectation-maximization, Foundations of Data Science, 2 (1), 55–80, 2020.

Farchi, A., Laloyaux, P., Bonavita, M., and Bocquet, M.: Using machine learning to correct model error in data assimilation and forecast applications, arXiv:2010.12605, submitted.
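The abstract above refers to a cost function similar to that of weak-constraint 4D-Var without writing it out. A generic form is sketched below as an illustration. The notation is an assumption and is not taken from the abstract: \mathbf{x}_k is the state at time k, \mathbf{y}_k the observations, \mathcal{H}_k the observation operator, \mathcal{M}_{\mathbf{p}} the model with trainable parameters \mathbf{p}, and \mathbf{R}_k and \mathbf{Q}_k the observation-error and model-error covariances. A background term on \mathbf{x}_0 is omitted for brevity.

```latex
\mathcal{J}(\mathbf{p}, \mathbf{x}_{0:K})
  = \frac{1}{2} \sum_{k=0}^{K}
      \left\| \mathbf{y}_k - \mathcal{H}_k(\mathbf{x}_k) \right\|^2_{\mathbf{R}_k^{-1}}
  + \frac{1}{2} \sum_{k=1}^{K}
      \left\| \mathbf{x}_k - \mathcal{M}_{\mathbf{p}}(\mathbf{x}_{k-1}) \right\|^2_{\mathbf{Q}_k^{-1}},
\qquad
\left\| \mathbf{v} \right\|^2_{\mathbf{A}} = \mathbf{v}^{\top} \mathbf{A} \mathbf{v}.
```

In this reading, minimising over the states \mathbf{x}_{0:K} with \mathbf{p} fixed is the DA problem, and minimising over \mathbf{p} with the states fixed is the ML training problem, which matches the division of roles between DA and ML described in the abstract.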

A Comparison of Model Error Representations in Mesoscale Ensemble Data Assimilation

Monthly Weather Review ◽

10.1175/mwr-d-14-00395.1 ◽

2015 ◽

Vol 143 (10) ◽

pp. 3893-3911 ◽

Cited By ~ 11

Author(s):

Soyoung Ha ◽

Judith Berner ◽

Chris Snyder

Keyword(s):

Data Assimilation ◽

Numerical Models ◽

Wrf Model ◽

Forecast Model ◽

Model Error ◽

Ensemble Forecasts ◽

Ensemble Data Assimilation ◽

Ensemble Data ◽

Independent Observations ◽

Almost All

Mesoscale forecasts are strongly influenced by physical processes that are either poorly resolved or must be parameterized in numerical models. In part because of errors in these parameterizations, mesoscale ensemble data assimilation systems generally suffer from underdispersiveness, which can limit the quality of analyses. Two explicit representations of model error for mesoscale ensemble data assimilation are explored: a multiphysics ensemble in which each member’s forecast is based on a distinct suite of physical parameterizations, and stochastic kinetic energy backscatter in which small noise terms are included in the forecast model equations. These two model error techniques are compared with a baseline experiment that includes spatially and temporally adaptive covariance inflation, in a domain over the continental United States using the Weather Research and Forecasting (WRF) Model for mesoscale ensemble forecasts and the Data Assimilation Research Testbed (DART) for the ensemble Kalman filter. Verification against independent observations and Rapid Update Cycle (RUC) 13-km analyses for the month of June 2008 showed that including the model error representation improved not only the analysis ensemble, but also short-range forecasts initialized from these analyses. Explicitly accounting for model uncertainty led to a better-tuned ensemble spread, a more skillful ensemble mean, and higher probabilistic scores, as well as significantly reducing the need for inflation. In particular, the stochastic backscatter scheme consistently outperformed both the multiphysics approach and the control run with adaptive inflation over almost all levels of the atmosphere, both deterministically and probabilistically.
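Covariance inflation, used in the baseline experiment above to counter underdispersiveness, can be illustrated in its simplest form. The sketch below applies a single constant multiplicative factor, which is a simplification: the study uses spatially and temporally adaptive inflation, and the function name `inflate_ensemble` is hypothetical.

```python
import numpy as np

def inflate_ensemble(ensemble, factor):
    """Multiplicative inflation: scale each member's deviation from the
    ensemble mean by `factor`, leaving the mean unchanged.

    ensemble: (n_members, n_state) array
    factor:   scalar, greater than 1 to increase the spread
    """
    mean = ensemble.mean(axis=0)
    return mean + factor * (ensemble - mean)
```

With this scheme the ensemble variance of every state variable is multiplied by `factor**2`, so a factor of 1.2 increases the variance by 44 percent.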

Air Quality Forecasts Improved by Combining Data Assimilation and Machine Learning with Satellite AOD

Geophysical Research Letters ◽

10.1029/2021gl096066 ◽

2021 ◽

Author(s):

Seunghee Lee ◽

Seohui Park ◽

Myong‐In Lee ◽

Ganghan Kim ◽

Jungho Im ◽

...

Keyword(s):

Machine Learning ◽

Air Quality ◽

Data Assimilation ◽

Combining Data

Treatment of Observation Error due to Unresolved Scales in Atmospheric Data Assimilation

Monthly Weather Review ◽

10.1175/mwr3229.1 ◽

2006 ◽

Vol 134 (10) ◽

pp. 2900-2915 ◽

Cited By ~ 63

Author(s):

Tijana Janjić ◽

Stephen E. Cohn

Keyword(s):

Kalman Filter ◽

Data Assimilation ◽

Covariance Matrix ◽

Covariance Function ◽

Model Problem ◽

Numerical Models ◽

Observation Error ◽

State Dependent ◽

Full State ◽

Assimilation Process

Observations of the atmospheric state include scales of motion that are not resolved by numerical models into which the observed data are assimilated. The resulting observation error due to unresolved scales, part of the “representativeness error,” is state dependent and correlated in time. A mathematical formalism and algorithmic approach have been developed for treating this error in the data assimilation process, under an assumption that there is no model error. The approach is based on approximating the continuum Kalman filter in such a way as to maintain terms that account for the observation error due to unresolved scales. The two resulting approximate filters resemble the Schmidt–Kalman filter and the traditional discrete Kalman filter. The approach is tested for the model problem of a passive tracer undergoing advection in a shear flow on the sphere. The state contains infinitely many spherical harmonics, with a nonstationary spectrum, and the problem is to estimate the projection of this state onto a finite spherical harmonic expansion, using observations of the full state. Numerical experiments demonstrate that the approximate filters work well for the model problem provided that the exact covariance function of the unresolved scales is known. The traditional filter is more convenient in practice since it requires only the covariance matrix obtained by evaluating this covariance function at the observation points. A method for modeling this covariance matrix in the traditional filter is successful for the model problem.
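The idea behind the traditional filter described above, adding to the observation-error covariance a matrix obtained by evaluating the unresolved-scale covariance function at the observation points, can be sketched with a standard discrete Kalman update. This is a simplified illustration, not the filter derived in the paper: it ignores the state dependence and time correlation of the error, and the function names and the example covariance function are hypothetical.

```python
import numpy as np

def unresolved_cov_matrix(obs_points, cov_function):
    """Evaluate a covariance function at every pair of observation points."""
    return np.array([[cov_function(p, q) for q in obs_points] for p in obs_points])

def kalman_update(mean, cov, obs, obs_operator, obs_error_cov):
    """Standard discrete Kalman filter analysis step."""
    innovation = obs - obs_operator @ mean
    innovation_cov = obs_operator @ cov @ obs_operator.T + obs_error_cov
    # Gain K = cov @ H.T @ inv(S), computed with a linear solve (cov and S symmetric).
    gain = np.linalg.solve(innovation_cov, obs_operator @ cov).T
    new_mean = mean + gain @ innovation
    new_cov = (np.eye(mean.size) - gain @ obs_operator) @ cov
    return new_mean, new_cov

# Hypothetical example: three observation points on a line and a
# Gaussian-shaped covariance function for the unresolved scales.
obs_points = np.array([0.0, 0.5, 1.0])
unresolved = unresolved_cov_matrix(
    obs_points, lambda p, q: 0.2 * np.exp(-((p - q) ** 2) / 0.1)
)
instrument = 0.05 * np.eye(3)
total_obs_error_cov = instrument + unresolved
```

Passing `total_obs_error_cov` instead of `instrument` to `kalman_update` reduces the weight given to the observations, reflecting that part of the observed signal cannot be represented by the resolved state.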

A novel methodological approach for land subsidence prediction through data assimilation techniques

Computational Geosciences ◽

10.1007/s10596-021-10062-1 ◽

2021 ◽

Author(s):

Laura Gazzola ◽

Massimiliano Ferronato ◽

Matteo Frigo ◽

Carlo Janna ◽

Pietro Teatini ◽

...

Keyword(s):

Data Assimilation ◽

Land Subsidence ◽

Numerical Models ◽

Methodological Approach ◽

Constitutive Behavior ◽

Test Case ◽

Combining Data ◽

Physically Based ◽

Synthetic Test ◽

Adriatic Basin

Anthropogenic land subsidence can be evaluated and predicted by numerical models, which are often built over deterministic analyses. However, uncertainties and approximations are present, as in any other modeling activity of real-world phenomena. This study aims at combining data assimilation techniques with a physically based numerical model of anthropogenic land subsidence in a novel and comprehensive workflow, to overcome the main limitations concerning the way traditional deterministic analyses use the available measurements. The proposed methodology makes it possible to reduce the uncertainties affecting the model, identify the most appropriate rock constitutive behavior, and characterize the most significant governing geomechanical parameters. The proposed methodological approach has been applied in a synthetic test case representative of the Upper Adriatic basin, Italy. The integration of data assimilation techniques into geomechanical modeling appears to be a useful and effective tool for a more reliable study of anthropogenic land subsidence.
