A New Scheme of Adaptive Covariance Inflation for Ensemble Filtering Data Assimilation

2021, Vol. 9 (10), pp. 1054
Author(s): Ang Su, Liang Zhang, Xuefeng Zhang, Shaoqing Zhang, Zhao Liu, ...

Because of model error and the sampling error of a finite ensemble, the background ensemble spread becomes too small and the error covariance is underestimated during filtering for data assimilation. Constraints on computational resources make it difficult to reduce sampling error by using a large ensemble in high-dimensional, realistic atmospheric and ocean models. Here, based on Bayesian theory, we explore a new spatially and temporally varying adaptive covariance inflation algorithm. To improve the statistical representation of a finite background ensemble, the prior probability of the inflation factor is taken to follow an inverse chi-square distribution and the likelihood function a t distribution, which are used to obtain prior or posterior covariance inflation schemes. Different ensemble sizes are used to compare the assimilation quality against other inflation schemes within both perfect-model and biased-model frameworks. We examined the performance of the new scheme with two simple coupled models. The results show that the new inflation scheme performed better than existing schemes in some cases, with greater stability and smaller assimilation errors, especially when a small ensemble was used in the biased model. Owing to its good computational performance and modest demand for resources, the new scheme has potential applications in more comprehensive models for prediction initialization and reanalysis. In short, the new inflation scheme performs well for small ensemble sizes and may be well suited to large-scale models.
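
For orientation, the sketch below shows the mechanics that any such scheme plugs into: multiplicative inflation of the background perturbations, with the inflation factor estimated from innovation statistics by simple moment matching. This is only an illustrative baseline under standard EnKF assumptions, with function names of our choosing; the paper's Bayesian scheme with an inverse chi-square prior and a t likelihood is not reproduced here.

```python
import numpy as np

def inflate_ensemble(ens, lam):
    """Multiplicative covariance inflation: scale the perturbations about the
    ensemble mean by sqrt(lam), which multiplies the sample covariance by lam."""
    mean = ens.mean(axis=1, keepdims=True)
    return mean + np.sqrt(lam) * (ens - mean)

def estimate_inflation(ens, y, H, R):
    """Moment-matching estimate of the inflation factor from the innovation
    d = y - H x_mean, using E[d d^T] ~ lam * H P_f H^T + R, hence
    lam ~ (d^T d - tr(R)) / tr(H P_f H^T)."""
    n_ens = ens.shape[1]
    mean = ens.mean(axis=1, keepdims=True)
    Xp = ens - mean                           # background perturbations
    HXp = H @ Xp
    HPfHt = HXp @ HXp.T / (n_ens - 1)         # background covariance in obs space
    d = y - (H @ mean).ravel()
    lam = (d @ d - np.trace(R)) / np.trace(HPfHt)
    return max(lam, 1.0)                      # never deflate below 1
```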

2017, Vol. 146 (1), pp. 49-62
Author(s): Sam Hatfield, Aneesh Subramanian, Tim Palmer, Peter Düben

Abstract A new approach for improving the accuracy of data assimilation, by trading numerical precision for ensemble size, is introduced. Data assimilation is inherently uncertain because of the use of noisy observations and imperfect models. Thus, the larger rounding errors incurred from reducing precision may be within the tolerance of the system. Lower-precision arithmetic is cheaper, and so by reducing precision in ensemble data assimilation, computational resources can be redistributed toward, for example, a larger ensemble size. Because larger ensembles provide a better estimate of the underlying distribution and are less reliant on covariance inflation and localization, lowering precision could actually permit an improvement in the accuracy of weather forecasts. Here, this idea is tested on an ensemble data assimilation system comprising the Lorenz '96 toy atmospheric model and the ensemble square root filter. The system is run in double, single, and half precision (the latter using an emulation tool), and the performance of each precision is measured through mean error statistics and rank histograms. The sensitivity of these results to the observation error and the length of the observation window is assessed. Then, by reinvesting the computational resources saved from reducing precision into the ensemble size, assimilation error can be reduced for (hypothetically) no extra cost. This results in increased forecasting skill relative to double-precision assimilation.
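
To make the precision-versus-ensemble-size trade concrete, here is a minimal sketch of the Lorenz '96 model stepped with RK4 while rounding every stage to a chosen floating-point type. Casting to np.float16 is only a crude stand-in for the dedicated half-precision emulation tool used in the study, and the function names are ours.

```python
import numpy as np

def l96_tendency(x, F=8.0):
    """Lorenz '96 tendency: dx_i/dt = (x_{i+1} - x_{i-2}) * x_{i-1} - x_i + F."""
    return (np.roll(x, -1) - np.roll(x, 2)) * np.roll(x, 1) - x + F

def rk4_step(x, dt, dtype=np.float64):
    """One RK4 step with every stage rounded to `dtype` (e.g. np.float16 to
    mimic half precision, np.float32 for single, np.float64 for double)."""
    x = x.astype(dtype)
    k1 = l96_tendency(x).astype(dtype)
    k2 = l96_tendency(x + 0.5 * dt * k1).astype(dtype)
    k3 = l96_tendency(x + 0.5 * dt * k2).astype(dtype)
    k4 = l96_tendency(x + dt * k3).astype(dtype)
    return (x + dt / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)).astype(dtype)
```

The cost saved by running the forecast members at reduced precision is what is then reinvested in additional members for the ensemble square root filter.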


2015, Vol. 143 (8), pp. 3192-3203
Author(s): Arun Kumar, Mingyue Chen

Abstract Faced with low prediction skill, particularly for long-range predictions, a commonly proposed remedy is to increase the ensemble size. It is well known that a larger ensemble does increase prediction skill. The broader supposition, however, is that low prediction skill is not a consequence of inherent predictability limits but an artifact of small ensemble sizes, and, further, that ensemble sizes (often limited by computational resources) are the major bottleneck for improving long-range predictions. What is not well appreciated in proposing larger ensembles as a remedy is that for scenarios with high inherent predictability a small ensemble is sufficient to realize that predictability, whereas for scenarios with low inherent predictability much larger ensembles are needed to realize what little predictability exists. In other words, the ensemble size required to realize the inherent predictability and the inherent predictability itself are complementary variables. A perceived need for larger ensembles, therefore, may also imply the presence of low predictability.
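
A simple signal-plus-noise model makes this trade-off concrete (an illustrative assumption on our part, not necessarily the authors' framework). If each member is written as x_i = s + e_i with signal variance \sigma_s^2, noise variance \sigma_e^2, and signal-to-noise ratio S = \sigma_s^2/\sigma_e^2, the expected correlation between an N-member ensemble mean and a verifying realization is

```latex
\rho_N \;=\; \frac{\sigma_s^2}{\sqrt{\left(\sigma_s^2 + \sigma_e^2/N\right)\left(\sigma_s^2 + \sigma_e^2\right)}}
       \;=\; \sqrt{\frac{S}{1+S}} \Big/ \sqrt{1 + \frac{1}{NS}} .
```

For S = 4 (high predictability), N = 5 already gives \rho_5 of about 0.87 against an asymptote of about 0.89; for S = 0.25, N = 5 gives only about 0.33, and roughly 50 members are needed to approach the (low) asymptote of about 0.45.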


2016, Vol. 144 (1), pp. 409-427
Author(s): Julian Tödter, Paul Kirchgessner, Lars Nerger, Bodo Ahrens

Abstract This work assesses the large-scale applicability of the recently proposed nonlinear ensemble transform filter (NETF) in data assimilation experiments with the NEMO ocean general circulation model. The new filter constitutes a second-order exact approximation to fully nonlinear particle filtering and thus relaxes the Gaussian assumption contained in ensemble Kalman filters. The NETF applies an update step similar to that of the local ensemble transform Kalman filter (LETKF), which allows for an efficient and simple implementation. Here, simulated observations are assimilated into a simplified ocean configuration that exhibits globally high-dimensional dynamics with a chaotic mesoscale flow. The model climatology is used to initialize an ensemble of 120 members. With a realistic oceanic observation scenario, the number of observations in each local filter update is of the same order as the ensemble size. Here, an importance sampling particle filter (PF) would require at least 10^6 members. Despite the relatively small ensemble size, the NETF remains stable and converges to the truth. In this setup, the NETF achieves at least the performance of the LETKF. However, it requires a longer spinup period because the algorithm relies only on the particle weights at the analysis time. These findings show that the NETF can successfully deal with a large-scale assimilation problem in which the local observation dimension is of the same order as the ensemble size. Thus, the second-order exact NETF does not suffer from the PF's curse of dimensionality, even in a deterministic system.
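
The structure of a (global, unlocalized) NETF-style analysis step can be sketched as follows, based on the description above: particle-filter weights set the analysis mean, and a square-root transform built from those weights generates the analysis perturbations. Normalization conventions and the mean-preserving random rotation used in practice are glossed over, so treat this as a sketch rather than the paper's implementation.

```python
import numpy as np

def netf_analysis(X, y, H, R_inv):
    """One global NETF-style analysis step.
    X: (n_state, N) forecast ensemble; y: (n_obs,) observations;
    H: (n_obs, n_state) observation operator; R_inv: inverse obs-error covariance."""
    n_state, N = X.shape

    # Particle-filter weights from the Gaussian observation likelihood.
    innov = y[:, None] - H @ X                         # per-member innovations
    quad = np.sum(innov * (R_inv @ innov), axis=0)     # d_i^T R^{-1} d_i
    w = np.exp(-0.5 * (quad - quad.min()))
    w /= w.sum()

    # Analysis mean is the weighted ensemble mean (as in a particle filter).
    x_mean = X @ w

    # Second-order exact perturbation update: symmetric square root of
    # diag(w) - w w^T, scaled by sqrt(N) (a common normalization convention).
    W_tilde = np.diag(w) - np.outer(w, w)
    vals, vecs = np.linalg.eigh(W_tilde)
    sqrt_W = (vecs * np.sqrt(np.clip(vals, 0.0, None))) @ vecs.T
    T = np.sqrt(N) * sqrt_W

    return x_mean[:, None] + X @ T                     # analysis ensemble (n_state, N)
```

In the study this update is applied locally, LETKF-style, which is what keeps the required ensemble size far below the 10^6 members an importance-sampling PF would need.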


2018, Vol. 146 (2), pp. 543-560
Author(s): Yue Ying, Fuqing Zhang, Jeffrey L. Anderson

Covariance localization remedies sampling errors due to limited ensemble size in ensemble data assimilation. Previous studies suggest that the optimal localization radius depends on ensemble size, observation density and accuracy, as well as the correlation length scale determined by model dynamics. A comprehensive localization theory for multiscale dynamical systems with varying observation density remains an active area of research. Using a two-layer quasigeostrophic (QG) model, this study systematically evaluates the sensitivity of the best Gaspari–Cohn localization radius to changes in model resolution, ensemble size, and observing networks. Numerical experiment results show that the best localization radius is smaller for smaller-scale components of a QG flow, indicating its scale dependency. The best localization radius is rather insensitive to changes in model resolution, as long as the key dynamical processes are reasonably well represented by the low-resolution model with inflation methods that account for representation errors. As ensemble size decreases, the best localization radius shifts to smaller values. However, for nonlocal correlations between an observation and state variables that peak at a certain distance, decreasing localization radii further within this distance does not reduce analysis errors. Increasing the density of an observing network has two effects that both reduce the best localization radius. First, the reduced observation error spectral variance further constrains prior ensembles at large scales. Less large-scale contribution results in a shorter overall correlation length, which favors a smaller localization radius. Second, a denser network provides more independent pieces of information, thus a smaller localization radius still allows the same number of observations to constrain each state variable.
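
For reference, the Gaspari–Cohn taper whose radius is being tuned is the fifth-order piecewise rational function below; the sample covariance is localized by taking its Schur (element-wise) product with this taper. Conventions differ on whether the quoted "radius" is the length scale c or the cutoff 2c, so the sketch simply exposes c.

```python
import numpy as np

def gaspari_cohn(dist, c):
    """Gaspari-Cohn (1999) fifth-order taper: 1 at zero separation, exactly
    zero beyond 2*c, where c is the localization length scale."""
    z = np.abs(np.asarray(dist, dtype=float)) / c
    taper = np.zeros_like(z)
    inner = z <= 1.0
    outer = (z > 1.0) & (z < 2.0)
    zi, zo = z[inner], z[outer]
    taper[inner] = (-0.25 * zi**5 + 0.5 * zi**4 + 0.625 * zi**3
                    - 5.0 / 3.0 * zi**2 + 1.0)
    taper[outer] = (zo**5 / 12.0 - 0.5 * zo**4 + 0.625 * zo**3
                    + 5.0 / 3.0 * zo**2 - 5.0 * zo + 4.0 - 2.0 / (3.0 * zo))
    return taper
```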


2017
Author(s): Ross Noel Bannister, Stefano Migliorini, Alison Clare Rudd, Laura Hart Baker

Abstract. Ensemble-based predictions are increasingly used as an aid to weather forecasting and to data assimilation, where the aim is to capture the range of possible outcomes consistent with the underlying uncertainties. Constraints on computing resources mean that ensembles have a relatively small size, which can lead to an incomplete range of possible outcomes and to inherent sampling errors. This paper discusses how an existing ensemble can be relatively easily increased in size, develops a range of standard and extended diagnostics to help determine whether a given ensemble is large enough to be useful for forecasting and data assimilation purposes, and applies the diagnostics to a convective-scale case study for illustration. The diagnostics include the effect of ensemble size on various aspects of rainfall forecasts, kinetic energy spectra, and (co)variance statistics in the spatial and spectral domains. The work here extends the Met Office's 24-member ensemble to 93 members. It is found that the extra members do develop a significant degree of linear independence, increase the ensemble spread (with caveats related to non-Gaussianity), reduce sampling error in many statistical quantities (namely variances, correlations, and length scales), and improve the effective spatial resolution of the ensemble. The extra members do not, however, improve the probabilistic rain-rate forecasts. The 93-member ensemble is assumed to approximate the error-free statistics, which is a practical assumption, but the data suggest that this number of members is ultimately not enough to justify it, and more members are therefore likely required for such convective-scale systems to further reduce sampling errors, especially for ensemble data assimilation purposes.
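
One of the simpler diagnostics described above, sampling error in correlations as a function of ensemble size, can be estimated by repeatedly subsampling the large ensemble. The sketch below is an illustrative version of that idea with names of our choosing, not the paper's diagnostic code.

```python
import numpy as np

def correlation_sampling_spread(ens_a, ens_b, sizes, n_rep=200, seed=None):
    """Std dev of the sample correlation between two ensemble-sampled quantities
    (one value per member, e.g. a field at two grid points) as a function of the
    number of members used, estimated by random subsampling without replacement."""
    rng = np.random.default_rng(seed)
    spread = {}
    for n in sizes:
        corrs = np.empty(n_rep)
        for k in range(n_rep):
            idx = rng.choice(len(ens_a), size=n, replace=False)
            corrs[k] = np.corrcoef(ens_a[idx], ens_b[idx])[0, 1]
        spread[n] = corrs.std()
    return spread
```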


2015, Vol. 30 (4), pp. 855-872
Author(s): Qingyun Zhao, Qin Xu, Yi Jin, Justin McLay, Carolyn Reynolds

Abstract The time-expanded sampling (TES) method, designed to improve the effectiveness and efficiency of ensemble-based data assimilation and subsequent forecast with reduced ensemble size, is tested with conventional and satellite data for operational applications constrained by computational resources. The test uses the recently developed ensemble Kalman filter (EnKF) at the Naval Research Laboratory (NRL) for mesoscale data assimilation with the U.S. Navy’s mesoscale numerical weather prediction model. Experiments are performed for a period of 6 days with a continuous update cycle of 12 h. Results from the experiments show remarkable improvements in both the ensemble analyses and forecasts with TES compared to those without. The improvements in the EnKF analyses by TES are very similar across the model’s three nested grids of 45-, 15-, and 5-km grid spacing, respectively. This study demonstrates the usefulness of the TES method for ensemble-based data assimilation when the ensemble size cannot be sufficiently large because of operational constraints in situations where a time-critical environment assessment is needed or the computational resources are limited.
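
The core idea of time-expanded sampling is to enlarge the ensemble by also sampling each member's forecast at times shifted around the analysis time, so that a small ensemble yields more state vectors for the EnKF update. A minimal sketch of that construction is below; the time offsets, their number, and any weighting are choices made in the paper and are only placeholders here.

```python
import numpy as np

def time_expanded_ensemble(trajectories, t_idx, shifts=(-1, 0, 1)):
    """Build a time-expanded ensemble from stored member trajectories.
    trajectories: (N, T, n_state) model states for N members at T output times;
    t_idx: index of the analysis time; shifts: output-time offsets to sample.
    Returns an (n_state, len(shifts)*N) ensemble matrix."""
    snapshots = [trajectories[:, t_idx + s, :] for s in shifts]   # each (N, n_state)
    expanded = np.concatenate(snapshots, axis=0)                  # (len(shifts)*N, n_state)
    return expanded.T
```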


2016, Vol. 9 (11), pp. 3933-3959
Author(s): Emanuele Emili, Selime Gürol, Daniel Cariolle

Abstract. Model errors play a significant role in air quality forecasts. Accounting for them in the data assimilation (DA) procedures is decisive to obtain improved forecasts. We address this issue using a reduced-order coupled chemistry–meteorology model based on quasi-geostrophic dynamics and a detailed tropospheric chemistry mechanism, which we name QG-Chem. This model has been coupled to the software library for the data assimilation Object Oriented Prediction System (OOPS) and used to assess the potential of the 4DEnVar algorithm for air quality analyses and forecasts. The assets of 4DEnVar include the possibility to deal with multivariate aspects of atmospheric chemistry and to account for model errors of a generic type. A simple diagnostic procedure for detecting model errors is proposed, based on the 4DEnVar analysis and one additional model forecast. A large number of idealized data assimilation experiments are shown for several chemical species of relevance for air quality forecasts (O3, NOx, CO and CO2) with very different atmospheric lifetimes and chemical couplings. Experiments are done both under a perfect model hypothesis and including model error through perturbation of surface chemical emissions. Some key elements of the 4DEnVar algorithm such as the ensemble size and localization are also discussed. A comparison with results of 3D-Var, widely used in operational centers, shows that, for some species, analysis and next-day forecast errors can be halved when model error is taken into account. This result was obtained using a small ensemble size, which remains affordable for most operational centers. We conclude that 4DEnVar has a promising potential for operational air quality models. We finally highlight areas that deserve further research for applying 4DEnVar to large-scale chemistry models, i.e., localization techniques, propagation of analysis covariance between DA cycles and treatment for chemical nonlinearities. QG-Chem can provide a useful tool in this regard.
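
For context, one common way to write the ensemble-space (incremental) 4DEnVar cost function that such a system minimizes is

```latex
J(\mathbf{w}) \;=\; \tfrac{1}{2}\,\mathbf{w}^{\mathrm T}\mathbf{w}
\;+\; \tfrac{1}{2}\sum_{k}\bigl(\mathbf{d}_k - \mathbf{Y}'_k\mathbf{w}\bigr)^{\mathrm T}
\mathbf{R}_k^{-1}\bigl(\mathbf{d}_k - \mathbf{Y}'_k\mathbf{w}\bigr),
\qquad
\delta\mathbf{x}_0 = \mathbf{X}'_b\,\mathbf{w},
\quad
\mathbf{d}_k = \mathbf{y}_k - \mathcal{H}_k\!\bigl(\mathcal{M}_{0\to k}(\mathbf{x}_b)\bigr),
```

where X'_b holds the ensemble perturbations of the initial state and Y'_k their (nonlinearly propagated) counterparts in observation space at time k. This is the generic formulation only; the localization and the model-error treatment used in the OOPS/QG-Chem experiments are additional ingredients not written out here.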


2019, Vol. 24 (1), pp. 217-239
Author(s): Kristian Fossum, Trond Mannseth, Andreas S. Stordal

Abstract Multilevel ensemble-based data assimilation (DA) is considered as an alternative to standard (single-level) ensemble-based DA for reservoir history-matching problems. Restricted computational resources currently limit the ensemble size to about 100 for field-scale cases, resulting in large sampling errors if no measures are taken to prevent them. With multilevel methods, the computational resources are spread over models with different accuracy and computational cost, enabling a substantially increased total ensemble size. Hence, reduced numerical accuracy is partially traded for increased statistical accuracy. A novel multilevel DA method, the multilevel hybrid ensemble Kalman filter (MLHEnKF), is proposed. Both the expected and the true efficiency of a previously published multilevel method, the multilevel ensemble Kalman filter (MLEnKF), and of the MLHEnKF are assessed for a toy model and two reservoir models. A multilevel sequence of approximations is introduced for all models: via spatial grid coarsening and simple upscaling for the reservoir models, and via a designed synthetic sequence for the toy model. For all models, the finest discretization level is assumed to correspond to the exact model. The results show that, despite its good theoretical properties, MLEnKF does not perform well for the reservoir history-matching problems considered. We also show that this is probably because the assumptions underlying its theoretical properties are not fulfilled for the multilevel reservoir models considered. The performance of MLHEnKF, which is designed to handle restricted computational resources well, is quite good. Furthermore, the toy model is used to set up a case in which the assumptions underlying the theoretical properties of MLEnKF are fulfilled. In that case, MLEnKF performs very well and clearly better than MLHEnKF.
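
How members might be spread across a multilevel hierarchy can be illustrated with the classical multilevel Monte Carlo allocation rule, N_l proportional to sqrt(V_l / C_l) for per-member cost C_l and level-correction variance V_l under a fixed cost budget. This is the generic MLMC rule, shown only for intuition; the allocations actually used by MLEnKF and MLHEnKF in the paper may differ.

```python
import numpy as np

def mlmc_member_allocation(costs, variances, budget):
    """Classical MLMC allocation: members per level proportional to
    sqrt(V_l / C_l), scaled so the total cost sum(N_l * C_l) meets the budget."""
    costs = np.asarray(costs, dtype=float)
    variances = np.asarray(variances, dtype=float)
    weights = np.sqrt(variances / costs)
    scale = budget / np.sum(weights * costs)    # sum(weights*costs) = sum(sqrt(V_l*C_l))
    return np.maximum(1, np.floor(scale * weights)).astype(int)

# Example: three levels, coarse to fine, with per-member costs 1, 8, 64 and
# decreasing correction variances 1.0, 0.1, 0.01 -> roughly [371, 41, 4] members.
print(mlmc_member_allocation([1, 8, 64], [1.0, 0.1, 0.01], budget=1000))
```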


2006, Vol. 7 (3), pp. 511-533
Author(s): Steven A. Margulis, Dara Entekhabi, Dennis McLaughlin

Abstract Historically, estimates of precipitation for hydrologic applications have largely been obtained using ground-based rain gauges despite the fact that they can contain significant measurement and sampling errors. Remotely sensed precipitation products provide the ability to overcome spatial coverage limitations, but the direct use of these products generally suffers from their relatively coarse spatial and temporal resolution and inherent retrieval errors. A simple ensemble-based disaggregation scheme is proposed as a general framework for using remotely sensed precipitation data in hydrologic applications. The scheme generates fine-scale precipitation realizations that are conditioned on large-scale precipitation measurements. The ensemble approach allows for uncertainty related to the complex error characteristics of the remotely sensed precipitation (undetected events, nonzero false alarm rate, etc.) to be taken into account. The methodology is applied through several synthetic experiments over the southern Great Plains using the Global Precipitation Climatology Project 1° daily (GPCP-1DD) product. The scheme is shown to reasonably capture the land-surface-forcing variability and propagate this uncertainty to the estimation of soil moisture and land surface flux fields at fine scales. The ensemble results outperform a case using sparse ground-based forcing. Additionally, the ensemble nature of the framework allows for simply merging the open-loop soil moisture estimation scheme with modern data assimilation techniques like the ensemble Kalman filter. Results show that estimation of the soil moisture and surface flux fields are further improved through the assimilation of coarse-scale microwave radiobrightness observations.
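
A toy version of such a conditional disaggregation, shown only to illustrate the conditioning idea, draws lognormal fine-scale noise and rescales each realization so that its areal mean reproduces the coarse-scale (e.g. GPCP-1DD) value. The actual scheme additionally models the product's error characteristics (missed events, false alarms), which are ignored here, and all names are ours.

```python
import numpy as np

def disaggregate_precip(coarse_value, n_fine, n_real, cv=1.0, seed=None):
    """Generate n_real fine-scale precipitation realizations over n_fine pixels
    whose areal mean equals coarse_value. Spatial variability is lognormal with
    coefficient of variation cv (a purely illustrative choice)."""
    rng = np.random.default_rng(seed)
    sigma = np.sqrt(np.log(1.0 + cv**2))                 # lognormal shape parameter
    fields = rng.lognormal(mean=-0.5 * sigma**2, sigma=sigma, size=(n_real, n_fine))
    fields *= coarse_value / fields.mean(axis=1, keepdims=True)   # enforce the coarse mean
    return fields
```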


Author(s): T. V. Oblakova

The paper studies the justification of Pearson's criterion for testing the hypothesis that a population is uniformly distributed. If the distribution parameters are unknown, estimates of the theoretical frequencies are used [1, 2, 3]. In that case, the quantile of the chi-square distribution, with the number of degrees of freedom reduced by the number of estimated parameters, determines the upper threshold for accepting the main hypothesis [7]. For the uniform law, however, this standard treatment of composite hypotheses does not apply, because the likelihood function cannot be differentiated with respect to the parameters, which the proof of the cited theorem requires [7, 10, 11].

A statistical experiment is proposed in order to study the distribution of the Pearson statistic for samples from a uniform law. First, a statistically significant number of samples of the same type is simulated from a given uniform distribution; the Pearson statistic is then computed for each sample, and the empirical distribution of these statistics is examined. Simulation and processing of the samples were performed in Mathcad 15 using the built-in random number generator and array-processing facilities.

In all experiments carried out, the hypothesis that the Pearson statistics follow the chi-square law was accepted unambiguously (confidence level 0.95). It is also shown statistically that the number of degrees of freedom need not be corrected in the composite-hypothesis case: the maximum likelihood estimates of the uniform-law parameters, used implicitly in computing the Pearson statistic, do not affect the number of degrees of freedom, which is therefore determined only by the number of grouping intervals.
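
The experiment described above can be reproduced almost verbatim in a few lines; the sketch below uses NumPy/SciPy in place of Mathcad 15, with bin counts and sample sizes chosen arbitrarily by us.

```python
import numpy as np
from scipy import stats

def pearson_statistics(n_samples=2000, sample_size=500, n_bins=10, seed=0):
    """Simulate many samples from a uniform law, fit the endpoints by maximum
    likelihood (sample min and max), bin into equiprobable intervals, and return
    the Pearson chi-square statistic of each sample."""
    rng = np.random.default_rng(seed)
    chi2_vals = np.empty(n_samples)
    for i in range(n_samples):
        x = rng.uniform(0.0, 1.0, size=sample_size)
        edges = np.linspace(x.min(), x.max(), n_bins + 1)   # ML estimates of the endpoints
        observed, _ = np.histogram(x, bins=edges)
        expected = sample_size / n_bins                     # equiprobable bins under the fitted law
        chi2_vals[i] = np.sum((observed - expected) ** 2 / expected)
    return chi2_vals

# Test the article's conclusion: the statistics follow chi-square with
# n_bins - 1 degrees of freedom, with no correction for the estimated parameters.
vals = pearson_statistics()
print(stats.kstest(vals, stats.chi2(df=10 - 1).cdf))
```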

