Combining data assimilation and machine learning to infer unresolved scale parametrization

Author(s):  
Julien Brajard ◽  
Alberto Carrassi ◽  
Marc Bocquet ◽  
Laurent Bertino

In recent years, machine learning (ML) has been proposed to devise data-driven parametrizations of unresolved processes in dynamical numerical models. In most cases, the ML training leverages high-resolution simulations to provide a dense, noiseless target state. Our goal is to go beyond the use of high-resolution simulations and train ML-based parametrization using direct data, in the realistic scenario of noisy and sparse observations. The algorithm proposed in this work is a two-step process. First, data assimilation (DA) techniques are applied to estimate the full state of the system from a truncated model. The unresolved part of the truncated model is viewed as a model error in the DA system. In a second step, ML is used to emulate the unresolved part, a predictor of model error given the state of the system. Finally, the ML-based parametrization model is added to the physical core truncated model to produce a hybrid model. The DA component of the proposed method relies on an ensemble Kalman filter while the ML parametrization is represented by a neural network. The approach is applied to the two-scale Lorenz model and to MAOOAM, a reduced-order coupled ocean-atmosphere model. We show that in both cases, the hybrid model yields forecasts with better skill than the truncated model. Moreover, the attractor of the system is significantly better represented by the hybrid model than by the truncated model. This article is part of the theme issue ‘Machine learning for weather and climate modelling’.
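The two-step procedure described above can be sketched on a toy problem. This is a minimal illustration under loud assumptions, not the authors' implementation: the "truncated model" is a single-scale Lorenz-96 system, the "unresolved part" is a hypothetical linear damping term, the DA analysis is replaced by the noiseless true trajectory (the paper uses an ensemble Kalman filter on sparse, noisy observations), and a linear least-squares fit stands in for the neural network. All names (`truncated_step`, `hybrid_step`, `DAMPING`, and so on) are invented for this sketch.

```python
import numpy as np

DT = 0.01        # time step (assumed)
DAMPING = 0.1    # strength of the hypothetical unresolved term

def l96_tendency(x, forcing=8.0):
    """Single-scale Lorenz-96 tendency, used here as the truncated physical core."""
    return ((np.roll(x, -1, axis=-1) - np.roll(x, 2, axis=-1))
            * np.roll(x, 1, axis=-1) - x + forcing)

def truncated_step(x):
    """One RK4 step of the truncated model (works on a state or a batch of states)."""
    k1 = l96_tendency(x)
    k2 = l96_tendency(x + 0.5 * DT * k1)
    k3 = l96_tendency(x + 0.5 * DT * k2)
    k4 = l96_tendency(x + DT * k3)
    return x + DT * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0

def true_step(x):
    """Hypothetical 'truth': truncated step plus an unresolved linear damping."""
    return truncated_step(x) - DAMPING * DT * x

# Step 1 (stand-in): a sequence of analysed states. Here it is simply the true
# trajectory; in the paper it would come from an EnKF.
rng = np.random.default_rng(0)
states = [8.0 + rng.standard_normal(40)]
for _ in range(2000):
    states.append(true_step(states[-1]))
analysis = np.array(states)

# Step 2: fit a predictor of the model error given the state.
model_error = analysis[1:] - truncated_step(analysis[:-1])
design = np.column_stack([analysis[:-1].ravel(), np.ones(model_error.size)])
coef, *_ = np.linalg.lstsq(design, model_error.ravel(), rcond=None)

def hybrid_step(x):
    """Physical core plus the learned correction."""
    return truncated_step(x) + coef[0] * x + coef[1]

def forecast_rmse(step, start, n_steps):
    """RMSE against the truth after n_steps, starting from analysis[start]."""
    x = analysis[start].copy()
    for _ in range(n_steps):
        x = step(x)
    return np.sqrt(np.mean((x - analysis[start + n_steps]) ** 2))

err_trunc = forecast_rmse(truncated_step, 1000, 100)
err_hybrid = forecast_rmse(hybrid_step, 1000, 100)
```

In this idealised setting the fitted slope recovers the damping term, so `err_hybrid` is far smaller than `err_trunc`. With noisy, sparse observations the regression targets would be contaminated by analysis error, which is the situation the paper addresses.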

2020 ◽  
Author(s):  
Julien Brajard ◽  
Alberto Carrassi ◽  
Marc Bocquet ◽  
Laurent Bertino

<p>Can we build a machine learning parametrization in a numerical model using sparse and noisy observations?</p><p>In recent years, machine learning (ML) has been proposed to devise data-driven parametrizations of unresolved processes in dynamical numerical models. In most cases, ML is trained by coarse-graining high-resolution simulations to provide a dense, noiseless target state (or even the tendency of the model).</p><p>Our goal is to go beyond the use of high-resolution simulations and train ML-based parametrizations using direct data. Furthermore, we intentionally place ourselves in the realistic scenario of noisy and sparse observations.</p><p>The algorithm proposed in this work derives from the algorithm presented by the same authors in https://arxiv.org/abs/2001.01520. The principle is to first apply data assimilation (DA) techniques to estimate the full state of the system from a non-parametrized model, referred to hereafter as the physical model. The parametrization term to be estimated is viewed as a model error in the DA system. In a second step, ML is used to define the parametrization, i.e., a predictor of the model error given the state of the system. Finally, the ML system is incorporated within the physical model to produce a hybrid model, combining a physical core with an ML-based parametrization.</p><p>The approach is applied to dynamical systems from low to intermediate complexity. The DA component of the proposed approach relies on an ensemble Kalman filter/smoother while the parametrization is represented by a convolutional neural network.</p><p>We show that the hybrid model yields better performance than the physical model in terms of both short-term (forecast skill) and long-term (power spectrum, Lyapunov exponents) properties. Sensitivity to the noise and density of observations is also assessed.</p>


Author(s):  
Alban Farchi ◽  
Patrick Laloyaux ◽  
Massimo Bonavita ◽  
Marc Bocquet

<p>Recent developments in machine learning (ML) have demonstrated impressive skill in reproducing complex spatiotemporal processes. However, contrary to data assimilation (DA), the underlying assumption behind ML methods is that the system is fully observed and without noise, which is rarely the case in numerical weather prediction. In order to circumvent this issue, it is possible to embed the ML problem into a DA formalism characterised by a cost function similar to that of the weak-constraint 4D-Var (Bocquet et al., 2019; Bocquet et al., 2020). In practice, ML and DA are combined to solve the problem: DA is used to estimate the state of the system while ML is used to estimate the full model.</p><p>In realistic systems, the model dynamics can be very complex and it may not be possible to reconstruct them from scratch. An alternative could be to learn the model error of an existing model using the same approach combining DA and ML. In this presentation, we test the feasibility of this method using a quasi-geostrophic (QG) model. After a brief description of the QG model, we introduce a realistic model error to be learnt. We then assess the potential of ML methods to reconstruct this model error, first with perfect (full and noiseless) observations and then with sparse and noisy observations. We show in both cases to what extent the trained ML models correct the mid-term forecasts. Finally, we show how the trained ML models can be used in a DA system and to what extent they correct the analysis.</p><p>Bocquet, M., Brajard, J., Carrassi, A., and Bertino, L.: Data assimilation as a learning tool to infer ordinary differential equation representations of dynamical models, Nonlin. Processes Geophys., 26, 143–162, 2019.</p><p>Bocquet, M., Brajard, J., Carrassi, A., and Bertino, L.: Bayesian inference of chaotic dynamics by merging data assimilation, machine learning and expectation-maximization, Foundations of Data Science, 2 (1), 55–80, 2020.</p><p>Farchi, A., Laloyaux, P., Bonavita, M., and Bocquet, M.: Using machine learning to correct model error in data assimilation and forecast applications, arXiv:2010.12605, submitted.</p>
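To make the phrase "a cost function similar to that of the weak-constraint 4D-Var" concrete, the sketch below evaluates a generic weak-constraint cost over a trajectory: an observation misfit term plus a model misfit term. It is an assumption-laden illustration, not the formulation of the cited papers: the observation and model error covariances are taken as scalar variances, the observation operator is a fixed matrix, no background term is included, and the function name and arguments are invented here.

```python
import numpy as np

def weak_constraint_cost(states, observations, obs_operator, model,
                         obs_var, model_var):
    """Generic weak-constraint cost for a trajectory.

    states:       array (K+1, n), candidate trajectory x_0..x_K
    observations: array (K+1, m), one observation vector per time
    obs_operator: array (m, n), linear observation operator (assumed fixed)
    model:        callable mapping a state x_k to the model forecast of x_{k+1}
    obs_var, model_var: scalar error variances (diagonal covariances assumed)
    """
    # Misfit between observations and the observed part of the trajectory.
    obs_misfit = observations - states @ obs_operator.T
    # Misfit between each state and the model forecast from the previous one;
    # in the learning setting, `model` would carry the trainable parameters.
    forecasts = np.array([model(x) for x in states[:-1]])
    model_misfit = states[1:] - forecasts
    return (0.5 * np.sum(obs_misfit ** 2) / obs_var
            + 0.5 * np.sum(model_misfit ** 2) / model_var)

# Tiny usage example with a persistence model and one observed component.
example_states = np.array([[0.0, 0.0], [1.0, 1.0]])
example_obs = np.array([[0.0], [2.0]])
example_h = np.array([[1.0, 0.0]])
example_cost = weak_constraint_cost(example_states, example_obs, example_h,
                                    lambda x: x, obs_var=1.0, model_var=2.0)
```

Minimising such a cost jointly over the trajectory and the parameters hidden inside `model` is one way to read "DA is used to estimate the state of the system while ML is used to estimate the full model".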


2015 ◽  
Vol 143 (10) ◽  
pp. 3893-3911 ◽  
Author(s):  
Soyoung Ha ◽  
Judith Berner ◽  
Chris Snyder

Abstract Mesoscale forecasts are strongly influenced by physical processes that are either poorly resolved or must be parameterized in numerical models. In part because of errors in these parameterizations, mesoscale ensemble data assimilation systems generally suffer from underdispersiveness, which can limit the quality of analyses. Two explicit representations of model error for mesoscale ensemble data assimilation are explored: a multiphysics ensemble in which each member’s forecast is based on a distinct suite of physical parameterizations, and stochastic kinetic energy backscatter in which small noise terms are included in the forecast model equations. These two model error techniques are compared with a baseline experiment that includes spatially and temporally adaptive covariance inflation, in a domain over the continental United States using the Weather Research and Forecasting (WRF) Model for mesoscale ensemble forecasts and the Data Assimilation Research Testbed (DART) for the ensemble Kalman filter. Verification against independent observations and Rapid Update Cycle (RUC) 13-km analyses for the month of June 2008 showed that including the model error representation improved not only the analysis ensemble, but also short-range forecasts initialized from these analyses. Explicitly accounting for model uncertainty led to a better-tuned ensemble spread, a more skillful ensemble mean, and higher probabilistic scores, as well as significantly reducing the need for inflation. In particular, the stochastic backscatter scheme consistently outperformed both the multiphysics approach and the control run with adaptive inflation over almost all levels of the atmosphere both deterministically and probabilistically.
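The baseline remedy for underdispersiveness mentioned above, covariance inflation, can be illustrated in a few lines. The sketch assumes the simplest variant, a constant multiplicative factor applied to ensemble anomalies; the study itself uses spatially and temporally adaptive inflation, which is not reproduced here. The function name `inflate` and the ensemble layout (members along the first axis) are choices made for this example.

```python
import numpy as np

def inflate(ensemble, factor):
    """Multiplicative inflation: scale each member's deviation from the
    ensemble mean by `factor`, leaving the mean unchanged.

    ensemble: array (n_members, n_state)
    factor:   scalar >= 1 (constant here; adaptive schemes vary it in space and time)
    """
    mean = ensemble.mean(axis=0)
    return mean + factor * (ensemble - mean)

rng = np.random.default_rng(1)
prior = rng.standard_normal((20, 3))   # 20 members, 3 state variables
inflated = inflate(prior, 1.1)
```

The ensemble standard deviation grows by exactly `factor` while the mean is preserved, which is why inflation widens the spread without shifting the forecast.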


Author(s):  
Seunghee Lee ◽  
Seohui Park ◽  
Myong‐In Lee ◽  
Ganghan Kim ◽  
Jungho Im ◽  
...  

2006 ◽  
Vol 134 (10) ◽  
pp. 2900-2915 ◽  
Author(s):  
Tijana Janjić ◽  
Stephen E. Cohn

Abstract Observations of the atmospheric state include scales of motion that are not resolved by numerical models into which the observed data are assimilated. The resulting observation error due to unresolved scales, part of the “representativeness error,” is state dependent and correlated in time. A mathematical formalism and algorithmic approach has been developed for treating this error in the data assimilation process, under an assumption that there is no model error. The approach is based on approximating the continuum Kalman filter in such a way as to maintain terms that account for the observation error due to unresolved scales. The two resulting approximate filters resemble the Schmidt–Kalman filter and the traditional discrete Kalman filter. The approach is tested for the model problem of a passive tracer undergoing advection in a shear flow on the sphere. The state contains infinitely many spherical harmonics, with a nonstationary spectrum, and the problem is to estimate the projection of this state onto a finite spherical harmonic expansion, using observations of the full state. Numerical experiments demonstrate that approximate filters work well for the model problem provided that the exact covariance function of the unresolved scales is known. The traditional filter is more convenient in practice since it requires only the covariance matrix obtained by evaluating this covariance function at the observation points. A method for modeling this covariance matrix in the traditional filter is successful for the model problem.
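As a rough illustration of the "traditional" filter mentioned above, the sketch below performs a standard discrete Kalman analysis step in which the observation error covariance is the sum of an instrument term and a covariance for the unresolved scales evaluated at the observation points. This is the textbook update with an augmented observation error covariance, not the authors' derivation: it ignores the state dependence and time correlation that the abstract attributes to this error, and the names `r_instr` and `r_repr` are invented for the example.

```python
import numpy as np

def kalman_update(x_f, p_f, y, h, r_instr, r_repr):
    """Discrete Kalman analysis with an added unresolved-scale error covariance.

    x_f: forecast state (n,)        p_f: forecast covariance (n, n)
    y:   observations (m,)          h:   observation operator (m, n)
    r_instr: instrument error covariance (m, m)
    r_repr:  unresolved-scale covariance at the observation points (m, m)
    """
    r_total = r_instr + r_repr
    innovation_cov = h @ p_f @ h.T + r_total
    # Gain K = P_f H^T S^{-1}, computed with a solve (S and P_f are symmetric).
    gain = np.linalg.solve(innovation_cov, h @ p_f).T
    x_a = x_f + gain @ (y - h @ x_f)
    p_a = (np.eye(len(x_f)) - gain @ h) @ p_f
    return x_a, p_a

# Scalar usage example: unit forecast variance, total observation variance 1.
x_a, p_a = kalman_update(np.array([0.0]), np.array([[1.0]]), np.array([2.0]),
                         np.array([[1.0]]), np.array([[0.5]]), np.array([[0.5]]))
```

With equal forecast and total observation variances the gain is 0.5, so the analysis sits halfway between forecast and observation. Omitting `r_repr` would pull the analysis closer to observations that contain scales the model cannot represent.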


Author(s):  
Laura Gazzola ◽  
Massimo Ferronato ◽  
Matteo Frigo ◽  
Carlo Janna ◽  
Pietro Teatini ◽  
...  

Abstract Anthropogenic land subsidence can be evaluated and predicted by numerical models, which are often built on deterministic analyses. However, uncertainties and approximations are present, as in any other modeling activity of real-world phenomena. This study aims at combining data assimilation techniques with a physically based numerical model of anthropogenic land subsidence in a novel and comprehensive workflow, to overcome the main limitations in the way traditional deterministic analyses use the available measurements. The proposed methodology makes it possible to reduce the uncertainties affecting the model, identify the most appropriate rock constitutive behavior and characterize the most significant governing geomechanical parameters. The approach has been applied to a synthetic test case representative of the Upper Adriatic basin, Italy. The integration of data assimilation techniques into geomechanical modeling appears to be a useful and effective tool for a more reliable study of anthropogenic land subsidence.

