An objective prior error quantification for regional atmospheric inverse applications

Abstract. Assigning proper prior uncertainties for inverse modelling of CO2 is of high importance, both to regularise the otherwise ill-constrained inverse problem and to quantitatively characterise the magnitude and structure of the error between prior and "true" flux. We use surface fluxes derived from three biosphere models – VPRM, ORCHIDEE, and 5PM – and compare them against daily averaged fluxes from 53 eddy covariance sites across Europe for the year 2007 and against repeated aircraft flux measurements encompassing spatial transects. In addition, we create synthetic observations using modelled fluxes instead of the observed ones to explore the potential to infer prior uncertainties from model–model residuals. To ensure the realism of the synthetic data analysis, random measurement noise was added to the modelled tower fluxes which were used as reference. The temporal autocorrelation time for tower model–data residuals was found to be around 30 days for both VPRM and ORCHIDEE but significantly different for the 5PM model with 70 days. This difference is caused by a few sites with large biases between the data and the 5PM model. The spatial correlation of the model–data residuals for all models was found to be very short, up to a few tens of kilometres, but with uncertainties up to 100 % of this estimate. Propagating this error structure to annual continental scale yields an uncertainty of 0.06 Gt C and strongly underestimates uncertainties typically used in atmospheric inversion systems, revealing another potential source of errors. Long spatial e-folding correlation lengths up to several hundreds of kilometres were determined when synthetic data were used. Results from repeated aircraft transects in south-western France are consistent with those obtained from the tower sites in terms of spatial autocorrelation (35 km on average) while temporal autocorrelation is markedly lower (13 days). Our findings suggest that the different prior models have a common temporal error structure. Separating the analysis of the statistics for the model–data residuals by seasons did not result in any significant differences in the spatial e-folding correlation lengths.
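
The e-folding correlation times and lengths quoted above can be illustrated with a minimal sketch. It assumes the correlogram of the residuals follows a pure exponential decay, r(lag) = exp(-lag / tau); the abstract does not state the fitting procedure the authors used, so the function names `autocorrelation` and `efolding_scale` and the log-linear fit are illustrative assumptions only.

```python
import math


def autocorrelation(series, max_lag):
    """Sample autocorrelation of a 1-D series for lags 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    var = sum(d * d for d in dev)
    return [
        sum(dev[i] * dev[i + k] for i in range(n - k)) / var
        for k in range(max_lag + 1)
    ]


def efolding_scale(lags, correlations):
    """Fit r(lag) = exp(-lag / tau) and return tau (assumed model, see lead-in).

    Least squares on log(r) with the curve forced through r(0) = 1.
    Lag 0 and non-positive correlations are skipped.
    """
    pairs = [(lag, math.log(r)) for lag, r in zip(lags, correlations)
             if lag > 0 and r > 0]
    if not pairs:
        raise ValueError("no positive correlations at non-zero lag")
    slope = sum(lag * y for lag, y in pairs) / sum(lag * lag for lag, _ in pairs)
    if slope >= 0:
        raise ValueError("correlations do not decay with lag")
    return -1.0 / slope


# Usage: an idealised correlogram with a 30-day e-folding time (made-up input).
lags = list(range(0, 91))
ideal = [math.exp(-lag / 30.0) for lag in lags]
tau = efolding_scale(lags, ideal)
```

In practice `autocorrelation` would be applied to a daily model–data residual series first, and its output passed to `efolding_scale`; the same fit works for spatial lags in kilometres.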

An objective prior error quantification for regional atmospheric inverse applications

Biogeosciences Discussions ◽

10.5194/bgd-12-9393-2015 ◽

2015 ◽

Vol 12 (12) ◽

pp. 9393-9441

Author(s):

P. Kountouris ◽

C. Gerbig ◽

K.-U. Totsche ◽

A.-J. Dolman ◽

A.-G.-C.-A. Meesters ◽

...

Keyword(s):

Spatial Correlation ◽

Synthetic Data ◽

Surface Fluxes ◽

Model Data ◽

Correlation Lengths ◽

Temporal Autocorrelation ◽

Objective Prior ◽

Model Residuals ◽

Tower Model ◽

Aircraft Flux Measurements

Abstract. Assigning proper prior uncertainties for inverse modeling of CO2 is of high importance, both to regularize the otherwise ill-constrained inverse problem, and to quantitatively characterize the magnitude and structure of the error between prior and "true" flux. We use surface fluxes derived from three biosphere models (VPRM, ORCHIDEE, and 5PM) and compare them against daily averaged fluxes from 53 Eddy Covariance sites across Europe for the year 2007, and against repeated aircraft flux measurements encompassing spatial transects. In addition, we create synthetic observations by substituting modeled fluxes for the observed ones, to explore the potential to infer prior uncertainties from model-model residuals. To ensure the realism of the synthetic data analysis, random measurement noise was added to the tower fluxes which were used as reference. The temporal autocorrelation time for tower model-data residuals was found to be around 35 days for both VPRM and ORCHIDEE, but significantly different for the 5PM model with 76 days. This difference is caused by a few sites with large model-data bias. The spatial correlation of the model-data residuals for all models was found to be very short, up to a few tens of km. Long spatial correlation lengths up to several hundreds of km were determined when synthetic data were used. Results from repeated aircraft transects in south-western France are consistent with those obtained from the tower sites in terms of spatial autocorrelation (35 km on average) while temporal autocorrelation is markedly lower (13 days). Our findings suggest that the different prior models have a common temporal error structure. Separating the analysis of the statistics for the model-data residuals by seasons did not result in any significant differences in the spatial correlation lengths.

Fold-stratified cross-validation for unbiased and privacy-preserving federated learning

Journal of the American Medical Informatics Association ◽

10.1093/jamia/ocaa096 ◽

2020 ◽

Vol 27 (8) ◽

pp. 1244-1251

Author(s):

Romain Bey ◽

Romain Goussault ◽

François Grolleau ◽

Mehdi Benchoufi ◽

Raphaël Porcher

Keyword(s):

Medical Records ◽

Cross Validation ◽

Medical Information ◽

Synthetic Data ◽

Privacy Preserving ◽

Model Data ◽

Health Records ◽

Computational Overhead ◽

Preliminary Identification ◽

Mimic Iii

Abstract. Objective: We introduce fold-stratified cross-validation, a validation methodology that is compatible with privacy-preserving federated learning and that prevents data leakage caused by duplicates of electronic health records (EHRs). Materials and Methods: Fold-stratified cross-validation complements cross-validation with an initial stratification of EHRs in folds containing patients with similar characteristics, thus ensuring that duplicates of a record are jointly present either in training or in validation folds. Monte Carlo simulations are performed to investigate the properties of fold-stratified cross-validation in the case of a model data analysis using both synthetic data and MIMIC-III (Medical Information Mart for Intensive Care-III) medical records. Results: In situations in which duplicated EHRs could induce overoptimistic estimations of accuracy, applying fold-stratified cross-validation prevented this bias, while not requiring full deduplication. However, a pessimistic bias might appear if the covariate used for the stratification was strongly associated with the outcome. Discussion: Although fold-stratified cross-validation presents low computational overhead, to be efficient it requires the preliminary identification of a covariate that is both shared by duplicated records and weakly associated with the outcome. When available, the hash of a personal identifier or a patient’s date of birth provides such a covariate. On the contrary, pseudonymization interferes with fold-stratified cross-validation, as it may break the equality of the stratifying covariate among duplicates. Conclusion: Fold-stratified cross-validation is an easy-to-implement methodology that prevents data leakage when a model is trained on distributed EHRs that contain duplicates, while preserving privacy.
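
The core idea, that records sharing the stratifying covariate always land in the same fold, can be sketched as follows. This is a hedged illustration, not the authors' implementation: the SHA-256 hash, the modulo assignment, and the names `assign_fold` and `fold_stratified_splits` are assumptions, and the records are made up.

```python
import hashlib


def assign_fold(stratifying_value, n_folds):
    """Map a covariate value to a fold index deterministically.

    Equal values (e.g. the same date of birth) always get the same fold,
    so duplicates of a record cannot straddle training and validation.
    """
    digest = hashlib.sha256(str(stratifying_value).encode("utf-8")).hexdigest()
    return int(digest, 16) % n_folds


def fold_stratified_splits(records, key, n_folds):
    """Yield (train_indices, validation_indices) for each fold."""
    folds = [[] for _ in range(n_folds)]
    for idx, rec in enumerate(records):
        folds[assign_fold(rec[key], n_folds)].append(idx)
    for k in range(n_folds):
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train, folds[k]


# Usage with hypothetical records; "a" and "a-dup" are duplicates of one patient.
records = [
    {"id": "a", "dob": "1950-01-02"},
    {"id": "a-dup", "dob": "1950-01-02"},
    {"id": "b", "dob": "1972-11-30"},
    {"id": "c", "dob": "1988-06-15"},
]
splits = list(fold_stratified_splits(records, key="dob", n_folds=3))
```

Because the fold depends only on the covariate value, each site in a federated setting can compute it locally without sharing records. Fold sizes are not balanced by this simple scheme.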

Acoustic modeling and migration of stacked cross‐hole data

Geophysics ◽

10.1190/1.1442480 ◽

1988 ◽

Vol 53 (4) ◽

pp. 492-500 ◽

Cited By ~ 5

Author(s):

Xianhuai Zhu ◽

George A. McMechan

Keyword(s):

Line Source ◽

Synthetic Data ◽

Point Sources ◽

Scale Model ◽

Model Data ◽

Reverse Time ◽

Imaging Condition ◽

Surface Survey ◽

Excitation Time ◽

And Migration

Prestack computations for cross‐hole data are relatively expensive, as they are for prestack surface survey data. It is therefore of interest to develop methodologies for modeling and processing stacked cross‐hole data. In this context, stacking is over sources, not midpoints. Modeling with a line source produces data that are equivalent (by Huygens’ principle) to those obtained by stacking over a line of point sources. Reverse‐time finite‐difference migration may be applied to the resulting stacked section by generalizing the excitation‐time imaging condition for a point source to a line source. Illustrations include successful applications to both synthetic data and scale‐model data.

Bayesian inverse estimation of urban CO2 emissions: Results from a synthetic data simulation over Salt Lake City, UT

Elem Sci Anth ◽

10.1525/elementa.375 ◽

2019 ◽

Vol 7 ◽

Cited By ~ 3

Author(s):

Lewis Kunik ◽

Derek V. Mallia ◽

Kevin R. Gurney ◽

Daniel L. Mendoza ◽

Tomohiro Oda ◽

...

Keyword(s):

Transport Model ◽

Salt Lake ◽

Salt Lake City ◽

Synthetic Data ◽

Emission Inventories ◽

Model Data ◽

Mismatch Error ◽

Data Simulation ◽

Salt Lake Valley ◽

Lake City

Top-down, data-driven models possess ample power to improve the accuracy of bottom-up carbon dioxide (CO2) emission inventories, and more work is needed to explore the merger of top-down and bottom-up estimates to better inform the metrics used to monitor global CO2 fluxes. Here we present a Bayesian inverse modeling framework over Salt Lake City, Utah, which utilizes available CO2 emission inventories to establish a synthetic data simulation aimed at exploring model uncertainties. Prescribing a high-resolution, urban-scale data product (Hestia) as the “true” emissions in the model, we combine prior emissions with an atmospheric transport model to derive modeled afternoon CO2 enhancements at six monitoring sites within the Salt Lake Valley during the month of September 2015. A global high-resolution gridded emissions data product (ODIAC) is used as the prior, and objective uncertainty structures are defined for both the a priori estimates and the transport model-data relationship which consider non-negligible spatial and temporal covariances. Optimized (posterior) emissions over the Salt Lake Valley agree closely with the assumed “true” emissions during afternoon times, while results including unconstrained times (e.g. night-time) lack such agreement. Both spatial and temporal correlations of prior errors were found to be necessary for obtaining a robust posterior estimate. Model sensitivity analyses are performed, which examine correlation length and time scales, model-data mismatch error, and measurement site network variability. Through these analyses, one measurement site is identified as being particularly prone to introducing bias into posterior emissions due to influences from a nearby point source. Increasing model-data mismatch error at this site is shown to reduce bias in the posterior without significantly compromising agreement with monthly averaged true emissions.
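
The effect of inflating the model-data mismatch error at a biased site can be illustrated with a deliberately reduced sketch: a single unknown emission value constrained by two sites with independent Gaussian errors. The study itself solves a gridded, space- and time-correlated problem, so the scalar setting, the function name `scalar_posterior`, and all numbers below are assumptions chosen for illustration.

```python
def scalar_posterior(x_prior, prior_var, obs, sensitivities, obs_vars):
    """Posterior mean and variance of a single unknown x.

    Assumed model: y_i = h_i * x + e_i with independent errors
    e_i ~ N(0, r_i), and a Gaussian prior x ~ N(x_prior, prior_var).
    """
    precision = 1.0 / prior_var
    weighted = x_prior / prior_var
    for y, h, r in zip(obs, sensitivities, obs_vars):
        precision += h * h / r
        weighted += h * y / r
    post_var = 1.0 / precision
    return post_var * weighted, post_var


# Made-up example: the "true" value is 10, the prior is 8, and the second
# site reads 14 because of a nearby point source (a biased observation).
x_true = 10.0
obs = [10.0, 14.0]
sens = [1.0, 1.0]

mean_equal, var_equal = scalar_posterior(8.0, 4.0, obs, sens, [1.0, 1.0])
# Inflate the mismatch variance at the biased site only.
mean_inflated, var_inflated = scalar_posterior(8.0, 4.0, obs, sens, [1.0, 100.0])
```

With the inflated mismatch variance the biased site carries almost no weight, so the posterior mean moves closer to the assumed truth, at the cost of a larger posterior variance.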

Bayesian statistical modeling of spatially correlated error structure in atmospheric tracer inverse analysis

Atmospheric Chemistry and Physics ◽

10.5194/acp-11-5365-2011 ◽

2011 ◽

Vol 11 (11) ◽

pp. 5365-5382 ◽

Cited By ~ 5

Author(s):

C. Mukherjee ◽

P. S. Kasibhatla ◽

M. West

Keyword(s):

Atmospheric Chemistry ◽

Synthetic Data ◽

Spatial Models ◽

Real Data ◽

Global Scale ◽

Model Assessment ◽

Error Structure ◽

Spatially Correlated ◽

Statistical Framework ◽

Satellite Retrievals

Abstract. We present and discuss the use of Bayesian modeling and computational methods for atmospheric chemistry inverse analyses that incorporate evaluation of spatial structure in model-data residuals. Motivated by problems of refining bottom-up estimates of source/sink fluxes of trace gas and aerosols based on satellite retrievals of atmospheric chemical concentrations, we address the need for formal modeling of spatial residual error structure in global scale inversion models. We do this using analytically and computationally tractable conditional autoregressive (CAR) spatial models as components of a global inversion framework. We develop Markov chain Monte Carlo methods to explore and fit these spatial structures in an overall statistical framework that simultaneously estimates source fluxes. Additional aspects of the study extend the statistical framework to utilize priors on source fluxes in a physically realistic manner, and to formally address and deal with missing data in satellite retrievals. We demonstrate the analysis in the context of inferring carbon monoxide (CO) sources constrained by satellite retrievals of column CO from the Measurement of Pollution in the Troposphere (MOPITT) instrument on the TERRA satellite, paying special attention to evaluating performance of the inverse approach using various statistical diagnostic metrics. This is developed using synthetic data generated to resemble MOPITT data to define a proof-of-concept and model assessment, and then in analysis of real MOPITT data. These studies demonstrate the ability of these simple spatial models to substantially improve over standard non-spatial models in terms of statistical fit, ability to recover sources in synthetic examples, and predictive match with real data.
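
For readers unfamiliar with CAR models, the sketch below builds the precision matrix of the common "proper CAR" form, Q = tau * (D - rho * W), where W is a 0/1 neighbourhood matrix and D holds each cell's neighbour count. The abstract does not say which CAR parameterisation the authors use, so this form, the function names, and the three-cell example are assumptions.

```python
def car_precision(adjacency, tau, rho):
    """Precision matrix Q = tau * (D - rho * W) of a proper CAR model.

    adjacency: symmetric 0/1 matrix W (list of lists) with a zero diagonal.
    D is diagonal and holds the number of neighbours of each cell.
    """
    n = len(adjacency)
    q = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                q[i][j] = tau * sum(adjacency[i])
            else:
                q[i][j] = -tau * rho * adjacency[i][j]
    return q


def conditional_mean(adjacency, values, i, rho):
    """E[x_i | neighbours] for a zero-mean proper CAR: rho times the neighbour average."""
    neighbours = [values[j] for j, w in enumerate(adjacency[i]) if w]
    return rho * sum(neighbours) / len(neighbours)


# Usage: three grid cells in a row, each adjacent to the next.
W = [
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]
Q = car_precision(W, tau=2.0, rho=0.9)
middle = conditional_mean(W, [1.0, 0.0, 3.0], i=1, rho=0.9)
```

The sparsity of Q (non-zero only between neighbours) is what makes such models computationally tractable on large grids.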

Migration in orthorhombic media: A prestack time-migration approach

Geophysics ◽

10.1190/geo2018-0552.1 ◽

2019 ◽

Vol 84 (5) ◽

pp. C217-C227 ◽

Cited By ~ 1

Author(s):

Baoqing Tian ◽

Jiangjie Zhang

Keyword(s):

Synthetic Data ◽

Real Data ◽

High Resolution Imaging ◽

Model Data ◽

Data Set ◽

Realistic Case ◽

Time Migration ◽

Novel Approach ◽

Resolution Imaging ◽

Prestack Time Migration

High-resolution imaging has become more popular recently in exploration geophysics. Conventionally, geophysicists image the subsurface using the isotropy approximation. When anisotropy effects are taken into account, one can expect to obtain an imaging profile with higher accuracy than the isotropic approach allows. Orthorhombic anisotropy is considered an ideal approximation in the realistic case and has been used in the industry for several years. Although attractive, its broad application still faces many unsolved problems. We have developed a novel approach to prestack time migration in the orthorhombic case. The traveltime and amplitude of a wave propagating in orthorhombic media are calculated directly by introducing a new anisotropic velocity and new anisotropic parameters. We validate our methods with synthetic data. We also demonstrate them on a model data set and on real data. The results show that our methods work well for prestack time migration in orthorhombic media.

Bayesian statistical modeling of spatially correlated error structure in atmospheric tracer inverse analysis

Atmospheric Chemistry and Physics Discussions ◽

10.5194/acpd-11-1671-2011 ◽

2011 ◽

Vol 11 (1) ◽

pp. 1671-1713

Author(s):

C. Mukherjee ◽

P. S. Kasibhatla ◽

M. West

Keyword(s):

Atmospheric Chemistry ◽

Inverse Modeling ◽

Synthetic Data ◽

Spatial Models ◽

Global Scale ◽

Model Assessment ◽

Error Structure ◽

Spatially Correlated ◽

Statistical Framework ◽

Satellite Retrievals

Abstract. Inverse modeling applications in atmospheric chemistry are increasingly addressing the challenging statistical issues of data synthesis by adopting refined statistical analysis methods. This paper advances this line of research by addressing several central questions in inverse modeling, focusing specifically on Bayesian statistical computation. Motivated by problems of refining bottom-up estimates of source/sink fluxes of trace gas and aerosols based on increasingly high-resolution satellite retrievals of atmospheric chemical concentrations, we address head-on the need for integrating formal spatial statistical methods of residual error structure in global scale inversion models. We do this using analytically and computationally tractable spatial statistical models, known as conditional autoregressive spatial models, as components of a global inversion framework. We develop Markov chain Monte Carlo methods to explore and fit these spatial structures in an overall statistical framework that simultaneously estimates source fluxes. Additional aspects of the study extend the statistical framework to utilize priors in a more physically realistic manner, and to formally address and deal with missing data in satellite retrievals. We demonstrate the analysis in the context of inferring carbon monoxide (CO) sources constrained by satellite retrievals of column CO from the Measurement of Pollution in the Troposphere (MOPITT) instrument on the TERRA satellite, paying special attention to evaluating performance of the inverse approach using various statistical diagnostic metrics. This is developed using synthetic data generated to resemble MOPITT data to define a proof-of-concept and model assessment, and then in analysis of real MOPITT data.

Pooling Data Improves Multimodel IDF Estimates over Median-Based IDF Estimates: Analysis over the Susquehanna and Florida

Journal of Hydrometeorology ◽

10.1175/jhm-d-20-0180.1 ◽

2021 ◽

Vol 22 (4) ◽

pp. 971-995

Author(s):

Abhishekh Kumar Srivastava ◽

Richard Grotjahn ◽

Paul Aaron Ullrich ◽

Mojtaba Sadegh

Keyword(s):

Climate Models ◽

Climate Model ◽

Synthetic Data ◽

Extreme Value Distribution ◽

Precipitation Intensity ◽

Spatial And Temporal Variability ◽

Return Periods ◽

Model Data ◽

Estimation Uncertainty ◽

Almost All

Abstract. Traditional multimodel methods for estimating future changes in precipitation intensity, duration, and frequency (IDF) curves rely on the mean or median of models’ IDF estimates. Such multimodel estimates are impaired by large estimation uncertainty, shadowing their efficacy in planning efforts. Here, assuming that each climate model is one representation of the underlying data generating process, i.e., the Earth system, we propose a novel extension of current methods through pooling model data: (i) evaluate performance of climate models in simulating the spatial and temporal variability of the observed annual maximum precipitation (AMP), (ii) bias-correct and pool historical and future AMP data of reasonably performing models, and (iii) compute IDF estimates in a nonstationary framework from pooled historical and future model data. Pooling enhances fitting of the extreme value distribution to the data and assumes that data from reasonably performing models represent samples from the “true” underlying data generating distribution. Through Monte Carlo simulations with synthetic data, we show that return periods derived from pooled data have smaller biases and lesser uncertainty than those derived from ensembles of individual model data. We apply this method to NA-CORDEX models to estimate changes in 24-h precipitation intensity–frequency (PIF) estimates over the Susquehanna watershed and Florida peninsula. Our approach identifies significant future changes at more stations compared to median-based PIF estimates. The analysis suggests that almost all stations over the Susquehanna and at least two-thirds of the stations over the Florida peninsula will observe significant increases in 24-h precipitation for 2–100-yr return periods.

Regional-scale geostatistical inverse modeling of North American CO2 fluxes: a synthetic data study

Atmospheric Chemistry and Physics ◽

10.5194/acp-10-6151-2010 ◽

2010 ◽

Vol 10 (13) ◽

pp. 6151-6167 ◽

Cited By ~ 48

Author(s):

S. M. Gourdji ◽

A. I. Hirsch ◽

K. L. Mueller ◽

V. Yadav ◽

A. E. Andrews ◽

...

Keyword(s):

North America ◽

Flux Distribution ◽

Regional Scale ◽

Measurement Data ◽

Synthetic Data ◽

Surface Fluxes ◽

Co2 Fluxes ◽

Actual Measurement ◽

Diurnal Variability ◽

The Impact

Abstract. A series of synthetic data experiments is performed to investigate the ability of a regional atmospheric inversion to estimate grid-scale CO2 fluxes during the growing season over North America. The inversions are performed within a geostatistical framework without the use of any prior flux estimates or auxiliary variables, in order to focus on the atmospheric constraint provided by the nine towers collecting continuous, calibrated CO2 measurements in 2004. Using synthetic measurements and their associated concentration footprints, flux and model-data mismatch covariance parameters are first optimized, and then fluxes and their uncertainties are estimated at three different temporal resolutions. These temporal resolutions, which include a four-day average, a four-day-average diurnal cycle with 3-hourly increments, and 3-hourly fluxes, are chosen to help assess the impact of temporal aggregation errors on the estimated fluxes and covariance parameters. Estimating fluxes at a temporal resolution that can adjust the diurnal variability is found to be critical both for recovering covariance parameters directly from the atmospheric data, and for inferring accurate ecoregion-scale fluxes. Accounting for both spatial and temporal a priori covariance in the flux distribution is also found to be necessary for recovering accurate a posteriori uncertainty bounds on the estimated fluxes. Overall, the results suggest that even a fairly sparse network of 9 towers collecting continuous CO2 measurements across the continent, used with no auxiliary information or prior estimates of the flux distribution in time or space, can be used to infer relatively accurate monthly ecoregion scale CO2 surface fluxes over North America within estimated uncertainty bounds. Simulated random transport error is shown to decrease the quality of flux estimates in under-constrained areas at the ecoregion scale, although the uncertainty bounds remain realistic. While these synthetic data inversions do not consider all potential issues associated with using actual measurement data, e.g. systematic transport errors or problems with the boundary conditions, they help to highlight the impact of inversion setup choices, and help to provide a baseline set of CO2 fluxes for comparison with estimates from future real-data inversions.

Migration with the full acoustic wave equation

Geophysics ◽

10.1190/1.1441498 ◽

1983 ◽

Vol 48 (6) ◽

pp. 677-687 ◽

Cited By ~ 47

Author(s):

Dan D. Kosloff ◽

Edip Baysal

Keyword(s):

Wave Equation ◽

Finite Difference ◽

Acoustic Wave ◽

Physical Model ◽

Wave Equations ◽

Synthetic Data ◽

Acoustic Wave Equation ◽

Model Data ◽

Lateral Velocity ◽

Alternative Approach

Conventional finite‐difference migration has relied on one‐way wave equations which allow energy to propagate only downward. Although generally reliable, such equations may not give accurate migration when the structures have strong lateral velocity variations or steep dips. The present study examined an alternative approach based on the full acoustic wave equation. The migration algorithm developed from this equation was tested against synthetic data and against physical model data. The results indicated that such a scheme gives accurate migration for complicated structures.
