Linking big models to big data: efficient ecosystem model calibration through Bayesian model emulation


2018, Vol 15 (19), pp. 5801-5830
Author(s): Istem Fer, Ryan Kelly, Paul R. Moorcroft, Andrew D. Richardson, Elizabeth M. Cowdery, ...

Abstract. Data-model integration plays a critical role in assessing and improving our capacity to predict ecosystem dynamics. Similarly, the ability to attach quantitative statements of uncertainty around model forecasts is crucial for model assessment and interpretation and for setting field research priorities. Bayesian methods provide a rigorous data assimilation framework for these applications, especially for problems with multiple data constraints. However, the Markov chain Monte Carlo (MCMC) techniques underlying most Bayesian calibration can be prohibitive for computationally demanding models and large datasets. We employ an alternative method, Bayesian model emulation of sufficient statistics, that can approximate the full joint posterior density, is more amenable to parallelization, and provides an estimate of parameter sensitivity. The analysis involved informative priors constructed from a meta-analysis of the primary literature and specification of both model and data uncertainties, and it introduced novel approaches to autocorrelation corrections on multiple data streams and to emulating the sufficient statistics surface. We report the integration of this method within the ecological workflow management software Predictive Ecosystem Analyzer (PEcAn) and its application and validation with two process-based terrestrial ecosystem models: SIPNET and ED2. In a test against a synthetic dataset, the emulator was able to retrieve the true parameter values. A comparison of the emulator approach to standard brute-force MCMC involving multiple data constraints showed that the emulator method was able to constrain the faster and simpler SIPNET model's parameters with comparable performance to the brute-force approach but reduced computation time by more than 2 orders of magnitude. The emulator was then applied to calibration of the ED2 model, whose complexity precludes standard (brute-force) Bayesian data assimilation techniques. Both models were constrained after assimilation of the observational data with the emulator method, reducing the uncertainty around their predictions. Performance metrics showed increased agreement between model predictions and data. Our study furthers efforts toward reducing model uncertainties, showing that the emulator method makes it possible to efficiently calibrate complex models.
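The emulator idea can be illustrated compactly: run the process model only at a small design of parameter values, fit a Gaussian process to the resulting sufficient statistic (here a sum of squared residuals), and run MCMC against the cheap surrogate instead of the full model. The following is a minimal sketch, not PEcAn's implementation; the one-parameter run_model, the synthetic observations, and the flat prior are illustrative assumptions.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

rng = np.random.default_rng(42)

# Hypothetical "expensive" process model: one parameter, a year of daily output.
def run_model(theta, t):
    return theta * np.sin(2 * np.pi * t / 365.0)

t = np.arange(365.0)
theta_true = 3.0
obs = run_model(theta_true, t) + rng.normal(0.0, 0.5, t.size)  # synthetic data

# 1. Evaluate the model at a small design of parameter values; these runs are
#    the only expensive step and can be executed in parallel.
design = np.linspace(0.0, 6.0, 15)
ss = np.array([np.sum((obs - run_model(th, t)) ** 2) for th in design])

# 2. Emulate the sufficient-statistics surface with a Gaussian process.
gp = GaussianProcessRegressor(kernel=ConstantKernel() * RBF(), normalize_y=True)
gp.fit(design.reshape(-1, 1), ss)

# 3. Sample the approximate posterior by running MCMC on the emulator.
def log_post(theta, sigma2=0.25):
    if not 0.0 <= theta <= 6.0:            # flat prior on [0, 6]
        return -np.inf
    ss_hat = gp.predict(np.array([[theta]]))[0]
    return -0.5 * ss_hat / sigma2          # Gaussian likelihood via the statistic

chain = []
cur = 1.0
lp_cur = log_post(cur)
for _ in range(5000):                       # simple Metropolis sampler
    prop = cur + rng.normal(0.0, 0.3)
    lp_prop = log_post(prop)
    if np.log(rng.uniform()) < lp_prop - lp_cur:
        cur, lp_cur = prop, lp_prop
    chain.append(cur)

print("posterior mean:", np.mean(chain[1000:]), "true value:", theta_true)
```

The only expensive step is the initial design of model runs, which is embarrassingly parallel; the Metropolis chain itself needs no further model evaluations, which is what makes the approach tractable for models such as ED2.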


Author(s): Kenta Kurosawa, Jonathan Poterjoy

Abstract. The ensemble Kalman filter (EnKF) and the 4D variational method (4DVar) are the most commonly used filters and smoothers in atmospheric science. These methods typically approximate prior densities using a Gaussian and solve a linear system of equations for the posterior mean and covariance. Therefore, strongly nonlinear model dynamics and measurement operators can lead to bias in posterior estimates. To improve performance in nonlinear regimes, minimization of the 4DVar cost function typically follows multiple sets of iterations, known as an "outer loop", which helps reduce bias caused by linear assumptions. Alternatively, "iterative ensemble methods" follow a similar strategy of periodically re-linearizing model and measurement operators. These methods come with different, possibly more appropriate, assumptions for drawing samples from the posterior density, but they have seen little attention in the numerical weather prediction (NWP) community. Lastly, particle filters (PFs) present a purely Bayesian filtering approach to state estimation, which avoids many of the assumptions made by the above methods. Several strategies for applying localized PFs to NWP have been proposed very recently. The current study investigates intrinsic limitations of current data assimilation methodology for applications that require nonlinear measurement operators. In doing so, it targets a specific problem relevant to the assimilation of remotely sensed measurements, such as radar reflectivity and all-sky radiances, which pose challenges for Gaussian-based data assimilation systems. The comparison includes multiple data assimilation approaches designed recently for nonlinear/non-Gaussian applications, as well as those currently used for NWP.
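The Gaussian, linear-solve character of the ensemble methods discussed above is easiest to see in the stochastic (perturbed-observation) EnKF analysis step. The sketch below is a generic textbook formulation on a toy state, not the configuration used in this study; the quadratic measurement operator simply stands in for a nonlinear observation such as radar reflectivity.

```python
import numpy as np

rng = np.random.default_rng(0)

def enkf_analysis(X, y, h, R):
    """Stochastic (perturbed-observation) EnKF analysis step.

    X : (n, N) prior ensemble of n state variables, N members
    y : (m,)   observation vector
    h : callable mapping one state column to observation space
    R : (m, m) observation-error covariance
    """
    n, N = X.shape
    HX = np.column_stack([h(X[:, i]) for i in range(N)])   # ensemble in obs space
    Xp = X - X.mean(axis=1, keepdims=True)                  # state anomalies
    Yp = HX - HX.mean(axis=1, keepdims=True)                # obs-space anomalies
    Pxy = Xp @ Yp.T / (N - 1)                               # cross covariance
    Pyy = Yp @ Yp.T / (N - 1) + R                           # innovation covariance
    K = Pxy @ np.linalg.inv(Pyy)                            # Kalman gain
    # Perturb the observations so the analysis ensemble keeps realistic spread.
    Y = y[:, None] + rng.multivariate_normal(np.zeros(len(y)), R, N).T
    return X + K @ (Y - HX)

# Toy example: 3-variable state, nonlinear (quadratic) measurement operator.
N = 50
X = rng.normal(0.0, 1.0, (3, N))
h = lambda x: np.array([x[0] ** 2 + x[1]])   # nonlinear operator, reflectivity-like
R = np.array([[0.1]])
y = np.array([1.2])
Xa = enkf_analysis(X, y, h, R)
print("prior mean:", X.mean(axis=1), "analysis mean:", Xa.mean(axis=1))
```

Because the update uses only ensemble covariances and a linear gain, a strongly nonlinear h (as above) biases the posterior estimate, which is the motivation for the outer loops, iterative ensemble methods, and particle filters compared in the study.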


2021, Vol 150, pp. 104722
Author(s): Thiago M.D. Silva, Sinesio Pesco, Abelardo Barreto Jr., Mustafa Onur

Energies, 2021, Vol 14 (11), pp. 3137
Author(s): Amine Tadjer, Reider B. Bratvold, Remus G. Hanea

Production forecasting is the basis for decision making in the oil and gas industry and can be quite challenging, especially in terms of complex geological modeling of the subsurface. To help solve this problem, assisted history matching built on ensemble-based analysis, such as the ensemble smoother and the ensemble Kalman filter, is useful for estimating models that preserve geological realism and have predictive capabilities. These methods tend, however, to be computationally demanding, as they require a large ensemble size for stable convergence. In this paper, we propose a novel method for uncertainty quantification and reservoir model calibration with much-reduced computation time. The approach is based on a sequential combination of nonlinear dimensionality reduction techniques (t-distributed stochastic neighbor embedding or the Gaussian process latent variable model) and K-means clustering, together with the ensemble smoother with multiple data assimilation. The cluster analysis with t-distributed stochastic neighbor embedding and the Gaussian process latent variable model is used to reduce the number of initial geostatistical realizations and to select a set of optimal reservoir models that have production performance similar to the reference model. We then apply the ensemble smoother with multiple data assimilation to provide reliable assimilation results. Experimental results based on the Brugge field case data verify the efficiency of the proposed approach.
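The data assimilation component named above, the ensemble smoother with multiple data assimilation (ES-MDA), repeats a Kalman-like parameter update several times with inflated observation noise. A minimal sketch of that update on a toy two-parameter problem follows; the forward model, ensemble size, and inflation coefficients are illustrative assumptions, and the dimensionality reduction and clustering steps of the proposed workflow are omitted.

```python
import numpy as np

rng = np.random.default_rng(1)

def es_mda(M, forward, d_obs, C_D, alphas):
    """Ensemble smoother with multiple data assimilation (ES-MDA).

    M       : (n_param, N) ensemble of model parameters
    forward : callable mapping a parameter column to predicted data (m,)
    d_obs   : (m,) observed data
    C_D     : (m, m) data-error covariance
    alphas  : inflation coefficients with sum(1/alpha) == 1
    """
    m = d_obs.size
    N = M.shape[1]
    for a in alphas:
        D = np.column_stack([forward(M[:, j]) for j in range(N)])  # predicted data
        Mp = M - M.mean(axis=1, keepdims=True)
        Dp = D - D.mean(axis=1, keepdims=True)
        C_md = Mp @ Dp.T / (N - 1)                       # parameter-data covariance
        C_dd = Dp @ Dp.T / (N - 1)                       # data covariance
        K = C_md @ np.linalg.inv(C_dd + a * C_D)
        # Perturb observations with inflated noise for this assimilation step.
        E = rng.multivariate_normal(np.zeros(m), a * C_D, N).T
        M = M + K @ (d_obs[:, None] + E - D)
    return M

# Toy example: two parameters, nonlinear forward model, four MDA steps.
forward = lambda p: np.array([p[0] * p[1], p[0] + p[1] ** 2])
d_obs = np.array([2.0, 3.0])
C_D = 0.05 * np.eye(2)
M0 = rng.normal(1.0, 0.5, (2, 100))
M_post = es_mda(M0, forward, d_obs, C_D, alphas=[4.0, 4.0, 4.0, 4.0])
print("posterior parameter means:", M_post.mean(axis=1))
```

In the proposed workflow, the reduced set of realizations selected by t-SNE/GPLVM plus K-means would take the place of the toy ensemble M0, shrinking the ensemble size that ES-MDA must carry.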


2014, Vol 79 (1), pp. 144-155
Author(s): J. Stephen Athens, Timothy M. Rieth, Thomas S. Dye

Abstract. Recent estimates of when Hawai’i was colonized by Polynesians display considerable variability, with dates ranging from about A.D. 800 to 1250. Using high-resolution paleoenvironmental coring data and a carefully defined set of archaeological radiocarbon dates, a Bayesian model for initial settlement was constructed. The pollen and charcoal assemblages of the core record made it possible to identify and date the prehuman period, as well as the start of human settlement, using a simple depositional model. The archaeological and paleoenvironmental estimates of the colonization date show a striking convergence, indicating that initial settlement occurred at A.D. 940–1130 using a 95 percent highest posterior density region (HPD), and most probably between A.D. 1000 and 1100 using a 67 percent HPD. This analysis highlights problems that may occur when paleoenvironmental core chronologies are based on bulk soil dates. Further research on the dating of the bones of Rattus exulans, a Polynesian introduction, may refine the dating model, as would archaeological investigations focused on potential early site locations.
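The highest posterior density (HPD) regions quoted above are simply the shortest intervals containing a given posterior probability. As an illustration only (the samples below are synthetic, not the study's posterior for the settlement date), an HPD interval can be computed from MCMC draws like this:

```python
import numpy as np

def hpd_interval(samples, prob=0.95):
    """Shortest interval containing `prob` of the posterior samples
    (a unimodal approximation to the highest posterior density region)."""
    x = np.sort(np.asarray(samples))
    n = len(x)
    k = int(np.ceil(prob * n))                 # number of samples in the interval
    widths = x[k - 1:] - x[: n - k + 1]        # width of every candidate interval
    i = int(np.argmin(widths))                 # index of the narrowest one
    return x[i], x[i + k - 1]

# Toy example: synthetic posterior samples for a settlement date (years A.D.).
rng = np.random.default_rng(7)
posterior_dates = rng.normal(1035, 50, 10000)
print("95% HPD:", hpd_interval(posterior_dates, 0.95))
print("67% HPD:", hpd_interval(posterior_dates, 0.67))
```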


The Internet of Things (IoT) provides a world in which smart, connected, embedded systems operate, generating large amounts of data from different sources that contain highly useful and valuable information. Data mining plays a critical role in creating a smarter IoT. Traditional care of an elderly person is a difficult and complex task: the need to have a caregiver with the elderly person almost all the time drains the human and financial resources of the health care system. The emergence of artificial intelligence has allowed the conception of technical assistance that helps reduce the time a caregiver must spend with the elderly person. This work focuses on analyzing techniques used to predict falls in the elderly. We examine the applicability of three classification algorithms to IoT data. These algorithms are analyzed, and a comparative study is undertaken to find the classifier that performs best on the dataset, using a set of predefined performance metrics to compare the results of each classifier.
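As a sketch of such a comparative study (the abstract does not name the three classifiers, so the algorithm choices, the synthetic stand-in data, and the metric set below are assumptions, not the authors' setup), a scikit-learn version might look like this:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_validate

# Stand-in for IoT sensor features (e.g., accelerometer statistics) with a
# binary fall / no-fall label; a real study would load recorded sensor data.
X, y = make_classification(n_samples=2000, n_features=12, n_informative=6,
                           weights=[0.8, 0.2], random_state=0)

# Three illustrative classifiers to compare on the same data.
classifiers = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "k_nearest_neighbors": KNeighborsClassifier(n_neighbors=5),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
}

# Evaluate every classifier on the same predefined metrics via cross-validation.
metrics = ["accuracy", "precision", "recall", "f1"]
for name, clf in classifiers.items():
    scores = cross_validate(clf, X, y, cv=5, scoring=metrics)
    summary = {m: round(scores[f"test_{m}"].mean(), 3) for m in metrics}
    print(name, summary)
```

Cross-validated scores on a common metric set make the comparison fair across classifiers; for an imbalanced fall/no-fall problem, recall and F1 are usually more informative than accuracy alone.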


Abstract.—Zooplankton communities perform a critical role as secondary producers in marine ecosystems. They are vulnerable to climate-induced changes in the marine environment, including changes in temperature, stratification, and circulation, but the effects of these changes are difficult to discern without sustained ocean monitoring. The physical, chemical, and biological environment of the Gulf of Maine, including Georges Bank, is strongly influenced by inflow from the Scotian Shelf and through the Northeast Channel, and thus observations both in the Gulf of Maine and in upstream regions are necessary to understand plankton variability and change in the Gulf of Maine. Large-scale, quasi-synoptic plankton surveys have been performed in the Gulf of Maine since Bigelow’s work at the beginning of the 20th century. More recently, ongoing plankton monitoring efforts include Continuous Plankton Recorder sampling in the Gulf of Maine and on the Scotian Shelf; the U.S. National Marine Fisheries Service’s MARMAP (Marine Resources Monitoring, Assessment, and Prediction) and EcoMon (Ecosystem Monitoring) programs sampling the northeast U.S. continental shelf, including the Gulf of Maine; and Fisheries and Oceans Canada’s Atlantic Zone Monitoring Program on the Scotian Shelf and in the eastern Gulf of Maine. Here, we review and compare past and ongoing zooplankton monitoring programs in the Gulf of Maine region, including Georges Bank and the western Scotian Shelf, to facilitate retrospective analysis and broad-scale synthesis of zooplankton dynamics in the Gulf of Maine. Additional sustained sampling at greater-than-monthly frequency at selected sites in the Gulf of Maine would be necessary to detect changes in phenology (i.e., the seasonal timing of biological events). Sustained zooplankton sampling in critical nearshore fish habitats and in key feeding areas for upper-trophic-level organisms, such as marine mammals and seabirds, would yield significant insights into their dynamics. The ecosystem dynamics of the Gulf of Maine are strongly influenced by large-scale forcing and variability in upstream inflow. Improved coordination of sampling and data analysis among monitoring programs, effective data management, and use of multiple modeling approaches will all enhance mechanistic understanding of the structure and function of the Gulf of Maine pelagic ecosystem.

