Methods for estimating uncertainty in factor analytic solutions

Atmospheric Measurement Techniques ◽

10.5194/amt-7-781-2014 ◽

2014 ◽

Vol 7 (3) ◽

pp. 781-797 ◽

Cited By ~ 174

Author(s):

P. Paatero ◽

S. Eberly ◽

S. G. Brown ◽

G. A. Norris

Keyword(s):

Environmental Protection Agency ◽

Synthetic Data ◽

Analytic Solutions ◽

Data Sets ◽

Random Errors ◽

Data Set ◽

Factor Analytic ◽

Uncertainty Estimates ◽

Multilinear Engine ◽

Analytic Models

Abstract. The EPA PMF (Environmental Protection Agency positive matrix factorization) version 5.0 and the underlying multilinear engine-executable ME-2 contain three methods for estimating uncertainty in factor analytic models: classical bootstrap (BS), displacement of factor elements (DISP), and bootstrap enhanced by displacement of factor elements (BS-DISP). The goal of these methods is to capture the uncertainty of PMF analyses due to random errors and rotational ambiguity. It is shown that the three methods complement each other: depending on characteristics of the data set, one method may provide better results than the other two. Results are presented using synthetic data sets, including interpretation of diagnostics, and recommendations are given for parameters to report when documenting uncertainty estimates from EPA PMF or ME-2 applications.
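
The classical bootstrap (BS) idea named in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the EPA PMF or ME-2 implementation. An unweighted nonnegative matrix factorization (Lee-Seung multiplicative updates) stands in for the uncertainty-weighted PMF solver. Individual rows (samples) are resampled with replacement. Bootstrap factors are mapped to base-run factors by greedy maximum correlation. DISP and BS-DISP are not shown, and the helper names `nmf` and `bootstrap_profiles` are hypothetical.

```python
import numpy as np


def nmf(X, p, n_iter=500, seed=0):
    """Unweighted NMF, X ~ G @ F, via Lee-Seung multiplicative updates.

    Stand-in only (assumption): PMF/ME-2 minimize an uncertainty-weighted objective.
    """
    rng = np.random.default_rng(seed)
    G = rng.random((X.shape[0], p)) + 0.1
    F = rng.random((p, X.shape[1])) + 0.1
    eps = 1e-12  # guards against division by zero
    for _ in range(n_iter):
        G *= (X @ F.T) / (G @ F @ F.T + eps)
        F *= (G.T @ X) / (G.T @ G @ F + eps)
    return G, F


def bootstrap_profiles(X, p, n_boot=20, seed=0):
    """Classical bootstrap: refit on row-resampled data, map factors to the base run."""
    rng = np.random.default_rng(seed)
    _, F_base = nmf(X, p, seed=seed)
    F_base = F_base / F_base.sum(axis=1, keepdims=True)  # unit-sum profiles
    runs = []
    for b in range(n_boot):
        rows = rng.integers(0, X.shape[0], size=X.shape[0])  # samples drawn with replacement
        _, F = nmf(X[rows], p, seed=seed + b + 1)
        F = F / F.sum(axis=1, keepdims=True)
        # Greedy mapping: each base factor takes its best-correlated bootstrap factor
        corr = np.corrcoef(F_base, F)[:p, p:]
        runs.append(F[corr.argmax(axis=1)])
    return F_base, np.array(runs)


# Hypothetical synthetic data set with two known source profiles
rng = np.random.default_rng(1)
G_true = rng.random((100, 2))
F_true = np.array([[5, 4, 3, 2, 1, 0, 0, 0], [0, 0, 0, 1, 2, 3, 4, 5]], dtype=float)
X = G_true @ F_true + 0.01 * rng.random((100, 8))
F_base, runs = bootstrap_profiles(X, p=2, n_boot=10)
spread = runs.std(axis=0)  # per-element bootstrap spread of each profile
```

The greedy mapping can assign two base factors to the same bootstrap factor; a fuller treatment would detect and report such unmapped runs rather than silently accept them.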

Bayesian Classifier for Sparsity-Promoting Feature Selection

International Journal of Pattern Recognition and Artificial Intelligence ◽

10.1142/s0218001415500226 ◽

2015 ◽

Vol 29 (06) ◽

pp. 1550022 ◽

Cited By ~ 1

Author(s):

Danlei Xu ◽

Lan Du ◽

Hongwei Liu ◽

Penghui Wang

Keyword(s):

Feature Selection ◽

Synthetic Data ◽

Original Data ◽

Radar Data ◽

Bayesian Classifier ◽

Classification Model ◽

Data Sets ◽

Data Set ◽

Classification Boundary ◽

Nonlinear Mappings

A Bayesian classifier for sparsity-promoting feature selection is developed in this paper, in which a set of nonlinear mappings of the original data is applied as a preprocessing step. With such mappings from the original input space to a nonlinear transformation space, the linear classification model can not only construct a nonlinear classification boundary but also perform feature selection on the original data. A zero-mean Gaussian prior with Gamma precision and a finite approximation of the Beta process prior are used to promote sparsity in the use of features and nonlinear mappings, respectively. We derive the variational Bayesian (VB) inference algorithm for the proposed linear classifier. Experimental results on a synthetic data set, a measured radar data set, a high-dimensional gene expression data set, and several benchmark data sets demonstrate the aggressive and robust feature selection capability of our method and a classification accuracy comparable to that of some other existing classifiers.
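
How a zero-mean Gaussian prior with Gamma-distributed precision promotes sparsity can be illustrated with variational Bayesian automatic relevance determination (ARD). The sketch below is a swapped-in regression example, not the authors' classifier. It uses a Gaussian likelihood with a known noise precision `beta` (an assumption) and omits both the nonlinear mappings and the Beta process prior. Irrelevant weights acquire large expected precisions and are shrunk toward zero.

```python
import numpy as np


def vb_ard_regression(X, y, beta=100.0, a0=1e-6, b0=1e-6, n_iter=100):
    """Mean-field VB for y = X @ w + noise with an ARD (sparsity) prior.

    Prior: w_i ~ N(0, 1/alpha_i), alpha_i ~ Gamma(a0, b0). The noise precision
    beta is treated as known (a simplifying assumption).
    Returns the posterior mean of w and E[alpha] under q(alpha).
    """
    d = X.shape[1]
    XtX, Xty = X.T @ X, X.T @ y
    E_alpha = np.ones(d)
    for _ in range(n_iter):
        # q(w) = N(m, S)
        S = np.linalg.inv(beta * XtX + np.diag(E_alpha))
        m = beta * S @ Xty
        # q(alpha_i) = Gamma(a0 + 1/2, b0 + E[w_i^2] / 2), with E[w_i^2] = m_i^2 + S_ii
        a = a0 + 0.5
        b = b0 + 0.5 * (m ** 2 + np.diag(S))
        E_alpha = a / b
    return m, E_alpha


# Hypothetical data: only features 0 and 3 are relevant
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
w_true = np.array([2.0, 0.0, 0.0, -1.5, 0.0])
y = X @ w_true + 0.1 * rng.standard_normal(200)  # noise std 0.1 matches beta = 100
m, E_alpha = vb_ard_regression(X, y)
```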

Sparse reflectivity inversion for nonstationary seismic data with surface-related multiples: Numerical and field-data experiments

Geophysics ◽

10.1190/geo2016-0520.1 ◽

2017 ◽

Vol 82 (3) ◽

pp. R199-R217 ◽

Cited By ~ 3

Author(s):

Xintao Chai ◽

Shangxu Wang ◽

Genyang Tang

Keyword(s):

Seismic Data ◽

Resolution Enhancement ◽

Synthetic Data ◽

Data Sets ◽

Data Set ◽

Anelastic Attenuation ◽

Seismic Resolution ◽

Text Filtering ◽

The Stability ◽

Reflectivity Inversion

Seismic data are nonstationary due to subsurface anelastic attenuation and dispersion effects. These effects, also referred to as the earth’s Q-filtering effects, can diminish seismic resolution. We previously developed a method of nonstationary sparse reflectivity inversion (NSRI) for resolution enhancement, which avoids the intrinsic instability associated with inverse Q filtering and generates superior Q compensation results. Applying NSRI to data sets that contain multiples (addressing surface-related multiples only) requires a demultiple preprocessing step because NSRI cannot distinguish primaries from multiples and will treat them as interference convolved with incorrect Q values. However, multiples contain information about subsurface properties. To use the information carried by multiples, we combine the feedback model with NSRI theory and adapt NSRI to nonstationary seismic data with surface-related multiples. Consequently, not only are the benefits of NSRI (e.g., circumventing the intrinsic instability associated with inverse Q filtering) extended, but multiples are also taken into account. Our method is limited to a 1D implementation. Theoretical and numerical analyses verify that, given a wavelet, the input Q values primarily affect the inverted reflectivities and have little effect on the estimated multiples; i.e., multiple estimation need not consider Q-filtering effects explicitly. Nevertheless, considering multiples benefits NSRI: the periodicity and amplitude of the multiples indicate the positions of the reflectivities and the amplitude of the wavelet, so multiples help overcome the scaling and shifting ambiguities of conventional formulations in which multiples are not considered.
Experiments using a 1D algorithm on a synthetic data set, the publicly available Pluto 1.5 data set, and a marine data set support the aforementioned findings and reveal the stability, capabilities, and limitations of the proposed method.
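
In its simplest form, sparse reflectivity inversion is an L1-regularized deconvolution. As a hedged sketch only, the code below assumes a stationary convolutional model with a known Ricker wavelet and solves it with the generic iterative soft-thresholding algorithm (ISTA). Because there is no Q attenuation and no multiple modeling, it is not NSRI. The wavelet parameters, spike positions, and `lam` are hypothetical.

```python
import numpy as np


def ricker(n, f=25.0, dt=0.002):
    """Ricker wavelet of n samples, peak frequency f (Hz), sampling interval dt (s)."""
    t = (np.arange(n) - n // 2) * dt
    a = (np.pi * f * t) ** 2
    return (1 - 2 * a) * np.exp(-a)


def conv_matrix(w, n):
    """n x n matrix W such that W @ r equals np.convolve(r, w, mode='same')."""
    W = np.zeros((n, n))
    for j in range(n):
        e = np.zeros(n)
        e[j] = 1.0
        W[:, j] = np.convolve(e, w, mode="same")  # response to a unit spike at j
    return W


def ista(W, d, lam, n_iter=5000):
    """Minimize 0.5 * ||W r - d||^2 + lam * ||r||_1 by iterative soft thresholding."""
    L = np.linalg.norm(W, 2) ** 2  # Lipschitz constant of the data-term gradient
    r = np.zeros(W.shape[1])
    for _ in range(n_iter):
        z = r - W.T @ (W @ r - d) / L  # gradient step on the data misfit
        r = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft threshold
    return r


# Hypothetical noise-free trace with three reflectors
n = 200
r_true = np.zeros(n)
r_true[[50, 90, 140]] = [1.0, -0.7, 0.5]
W = conv_matrix(ricker(41), n)
d = W @ r_true
r = ista(W, d, lam=0.01)
```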

Marlim R3D: A realistic model for controlled-source electromagnetic simulations — Phase 2: The controlled-source electromagnetic data set

Geophysics ◽

10.1190/geo2018-0452.1 ◽

2019 ◽

Vol 84 (5) ◽

pp. E293-E299

Author(s):

Jorlivan L. Correa ◽

Paulo T. L. Menezes

Keyword(s):

A Priori ◽

Synthetic Data ◽

Realistic Model ◽

Earth Model ◽

Data Sets ◽

Data Set ◽

Geoelectric Model ◽

Controlled Source ◽

The North ◽

Electromagnetic Simulations

Synthetic data provided by geoelectric earth models are a powerful tool for evaluating a priori the effectiveness of a controlled-source electromagnetic (CSEM) workflow. Marlim R3D (MR3D) is an open-source, complex, and realistic geoelectric model for CSEM simulations of the postsalt turbiditic reservoirs at the Brazilian offshore margin. We have developed a 3D CSEM finite-difference time-domain forward study to generate the full-azimuth CSEM data set for the MR3D earth model. To that end, we designed a full-azimuth survey with 45 towlines striking the north–south and east–west directions over a total of 500 receivers evenly spaced at 1 km intervals along the rugged seafloor of the MR3D model. To correctly represent the thin, disconnected, and complex geometries of the studied reservoirs, we have built a finely discretized mesh of [Formula: see text] cells, leading to a large mesh with a total of approximately 90 million cells. We computed the six electromagnetic field components (Ex, Ey, Ez, Hx, Hy, and Hz) at six frequencies in the range of 0.125–1.25 Hz. To mimic noise in real CSEM data, we added multiplicative noise with a 1% standard deviation to the data. Both CSEM data sets (noise free and noise added), with inline and broadside geometries, are distributed for research or commercial use, under the Creative Commons license, at the Zenodo platform.
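
The noise model described above can be sketched as follows. The abstract states only that the noise is multiplicative with a 1% standard deviation. The zero-mean Gaussian distribution, the elementwise application, and the use of one real factor per complex field sample are assumptions, and `add_multiplicative_noise` is a hypothetical helper.

```python
import numpy as np


def add_multiplicative_noise(data, rel_std=0.01, seed=None):
    """Return data * (1 + rel_std * n), with n ~ N(0, 1) drawn independently per element.

    Works for real or complex arrays; for complex fields the same real factor
    scales the real and imaginary parts together (an assumption).
    """
    rng = np.random.default_rng(seed)
    return data * (1.0 + rel_std * rng.standard_normal(np.shape(data)))


# Hypothetical constant complex field sample repeated many times
clean = np.full(100_000, 2.0 + 1.0j)
noisy = add_multiplicative_noise(clean, rel_std=0.01, seed=0)
ratio = noisy / clean  # should be 1 + 0.01 * n
```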

A super-Earth and a mini-Neptune around Kepler-59

Monthly Notices of the Royal Astronomical Society ◽

10.1093/mnras/stz3369 ◽

2019 ◽

Vol 491 (4) ◽

pp. 5238-5247 ◽

Cited By ~ 1

Author(s):

X Saad-Olivera ◽

C F Martinez ◽

A Costa de Souza ◽

F Roig ◽

D Nesvorný

Keyword(s):

Stability Analysis ◽

Bayesian Inference ◽

Spectroscopic Analysis ◽

The Other ◽

Inversion Method ◽

Outer Planet ◽

Data Sets ◽

Dynamical Study ◽

Data Set ◽

Orbital Parameters

ABSTRACT We characterize the radii and masses of the star and planets in the Kepler-59 system, as well as their orbital parameters. The stellar parameters are determined through a standard spectroscopic analysis, resulting in a mass of $1.359\pm 0.155\, \mathrm{M}_\odot$ and a radius of $1.367\pm 0.078\, \mathrm{R}_\odot$. The obtained planetary radii are $1.5\pm 0.1\, R_\oplus$ for the inner and $2.2\pm 0.1\, R_\oplus$ for the outer planet. The orbital parameters and the planetary masses are determined by the inversion of Transit Timing Variations (TTV) signals. We consider two different data sets: one provided by Holczer et al. (2016), with TTVs only for Kepler-59c, and the other provided by Rowe et al. (2015), with TTVs for both planets. The inversion method applies an algorithm of Bayesian inference (MultiNest) combined with an efficient N-body integrator (Swift). For each data set, we found two possible solutions, both equally probable according to their Bayesian evidences. All four solutions appear to be indistinguishable within their 2-σ uncertainties. However, statistical analyses show that the solutions from the Rowe et al. (2015) data set provide a better characterization. The first solution infers masses of $5.3_{-2.1}^{+4.0}~M_{\mathrm{\oplus }}$ and $4.6_{-2.0}^{+3.6}~M_{\mathrm{\oplus }}$ for the inner and outer planet, respectively, while the second solution gives masses of $3.0^{+0.8}_{-0.8}~M_{\mathrm{\oplus }}$ and $2.6^{+0.9}_{-0.8}~M_{\mathrm{\oplus }}$. These values point to a system with an inner super-Earth and an outer mini-Neptune. A dynamical study shows that the planets have almost co-planar orbits with small eccentricities (e < 0.1), close to the 3:2 mean motion resonance. A stability analysis indicates that this configuration is stable over millions of years of evolution.
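
Asymmetric estimates such as $5.3_{-2.1}^{+4.0}$ are typically summarized from posterior samples. One common convention, assumed here because the abstract does not say which one the authors used, reports the median together with the distances to the 16th and 84th percentiles. Nested samplers such as MultiNest return weighted samples, so the hypothetical helper below accepts optional weights.

```python
import numpy as np


def summarize_posterior(samples, weights=None):
    """Median and asymmetric errors from weighted 16th/50th/84th percentiles.

    Returns (median, minus, plus), read as median_{-minus}^{+plus}.
    """
    x = np.asarray(samples, dtype=float)
    w = np.ones_like(x) if weights is None else np.asarray(weights, dtype=float)
    order = np.argsort(x)
    x, w = x[order], w[order]
    cdf = np.cumsum(w) / w.sum()  # weighted empirical CDF at each sorted sample
    lo, med, hi = np.interp([0.16, 0.5, 0.84], cdf, x)
    return med, med - lo, hi - med
```

For example, equally weighted samples 1, 2, ..., 100 give a median of 50 with errors of 34 on either side.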

LOI and GONG Low-Degree Rotational Splittings

Symposium - International Astronomical Union ◽

10.1017/s0074180900238515 ◽

1998 ◽

Vol 185 ◽

pp. 167-168

Author(s):

T. Appourchaux ◽

M.C. Rabello-Soares ◽

L. Gizon

Keyword(s):

Time Series ◽

The Other ◽

Data Sets ◽

Data Set ◽

Low Degree ◽

Fourier Spectra

Two different data sets have been used to derive low-degree rotational splittings. One data set comes from the Luminosity Oscillations Imager (LOI) of VIRGO on board SOHO; the observations start on 27 March 1996, end on 26 March 1997, and consist of intensity time series of 12 pixels (Appourchaux et al., 1997, Sol. Phys., 170, 27). The other data set was kindly made available by the GONG project; the observations start on 26 August 1995, end on 21 August 1996, and consist of complex Fourier spectra of velocity time series for l = 0–9. For the GONG data, the contamination of l = 1 by the spatial aliases of l = 6 and l = 9 required some cleaning. To achieve this, we applied the inverse of the leakage matrix of l = 1, 6, and 9 to the original Fourier spectra of the same degrees; all 3 degrees were cleaned simultaneously (Appourchaux and Gizon, 1997, these proceedings).
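
The simultaneous cleaning step amounts to solving a small linear system at every frequency bin. A minimal sketch, assuming the spectra of degrees l = 1, 6, and 9 are stacked as rows and mixed by a single, frequency-independent 3 × 3 leakage matrix (a simplification). The helper name and the matrix values in the check are hypothetical.

```python
import numpy as np


def clean_spectra(observed, leakage):
    """Remove mode leakage by solving leakage @ clean = observed.

    observed: complex array (n_degrees, n_freq); row i is the spectrum observed for degree i
    leakage:  (n_degrees, n_degrees) matrix; entry (i, j) is how much of degree j
              leaks into the spectrum observed for degree i
    Solving for all degrees jointly cleans them simultaneously.
    """
    return np.linalg.solve(leakage, observed)


# Hypothetical leakage among l = 1, 6, 9 (rows and columns in that order)
C = np.array([[1.0, 0.2, 0.1],
              [0.15, 1.0, 0.05],
              [0.1, 0.05, 1.0]])
rng = np.random.default_rng(0)
true_spectra = rng.standard_normal((3, 64)) + 1j * rng.standard_normal((3, 64))
observed = C @ true_spectra
cleaned = clean_spectra(observed, C)
```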

Detection of sharp lateral discontinuities through the analysis of surface-wave propagation

Geophysics ◽

10.1190/geo2013-0314.1 ◽

2014 ◽

Vol 79 (4) ◽

pp. EN77-EN90 ◽

Cited By ~ 19

Author(s):

Paolo Bergamo ◽

Laura Valentina Socco

Keyword(s):

Surface Wave ◽

Fault Location ◽

Synthetic Data ◽

Finite Element Method Simulation ◽

Fault System ◽

Energy Concentration ◽

Data Sets ◽

Data Set ◽

Shallow Subsurface ◽

Velocity Models

Surface-wave (SW) techniques are mainly used to retrieve 1D velocity models and are therefore characterized by a 1D approach, which might prove unsatisfactory when relevant 2D effects are present in the investigated subsurface. In the case of sharp and sudden lateral heterogeneities in the subsurface, one strategy to tackle this limitation is to estimate the location of the discontinuities and to process separately the seismic traces belonging to quasi-1D subsurface portions. We have focused on methods that locate discontinuities by identifying anomalies in SW propagation and attenuation. The methods considered are the autospectrum computation and the attenuation analysis of Rayleigh waves (AARW). These methods were developed for purposes and/or scales of analysis that differ from those of this work, which aims at detecting and characterizing sharp subvertical discontinuities in the shallow subsurface. We applied both methods to two data sets, synthetic data from a finite-element method simulation and a field data set acquired over a fault system, both presenting an abrupt lateral variation that crosses the acquisition line perpendicularly. We also extended the AARW method to the detection of sharp discontinuities in large, multifold data sets and tested these novel procedures on the field case. Both methods proved effective for detecting the discontinuity, by portraying propagation phenomena linked to the presence of the heterogeneity, such as the interference between incident and reflected wavetrains, and energy concentration and subsequent decay at the fault location. The procedures we developed for processing multifold seismic data sets proved to be reliable tools for locating and characterizing sharp subvertical heterogeneities.
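
As a simplified illustration of reading energy anomalies along a receiver line, the sketch below computes each trace's autospectrum (squared Fourier amplitude) and sums it over a frequency band. A trace with anomalously high band energy can hint at energy concentration near a discontinuity. This is an assumption-laden stand-in, not the authors' autospectrum or AARW procedure; the function name, band limits, and synthetic traces are hypothetical.

```python
import numpy as np


def band_energy_along_line(traces, dt, fmin, fmax):
    """Sum each trace's autospectrum |FFT|^2 over the band [fmin, fmax] in Hz.

    traces: array of shape (n_traces, n_samples), one row per receiver position
    dt:     sampling interval in seconds
    """
    spec = np.fft.rfft(traces, axis=1)
    freqs = np.fft.rfftfreq(traces.shape[1], d=dt)
    auto = np.abs(spec) ** 2  # autospectrum of every trace
    band = (freqs >= fmin) & (freqs <= fmax)
    return auto[:, band].sum(axis=1)


# Hypothetical line: weak noise everywhere, strong 20 Hz energy at trace 30
dt = 0.001
t = np.arange(1000) * dt
rng = np.random.default_rng(0)
traces = 0.1 * rng.standard_normal((48, t.size))
traces[30] += np.sin(2 * np.pi * 20 * t)
energy = band_energy_along_line(traces, dt, fmin=10.0, fmax=30.0)
```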

A Bayesian approach to modeling 2D gravity data using polygons

Geophysics ◽

10.1190/geo2016-0153.1 ◽

2017 ◽

Vol 82 (1) ◽

pp. G1-G21 ◽

Cited By ~ 3

Author(s):

William J. Titus ◽

Sarah J. Titus ◽

Joshua R. Davis

Keyword(s):

Gravity Data ◽

Synthetic Data ◽

Gravity Inversion ◽

Limiting Factor ◽

Data Sets ◽

Data Set ◽

Local Optima ◽

Occupancy Probability ◽

Compute Model ◽

Parameter Values

We apply a Bayesian Markov chain Monte Carlo formalism to the gravity inversion of a single localized 2D subsurface object. The object is modeled as a polygon described by five parameters: the number of vertices, a density contrast, a shape-limiting factor, and the width and depth of an encompassing container. We first constrain these parameters with an interactive forward model and explicit geologic information. Then, we generate an approximate probability distribution of polygons for a given set of parameter values. From these, we determine statistical distributions such as the variance between the observed and model fields, the area, the center of area, and the occupancy probability (the probability that a spatial point lies within the subsurface object). We introduce replica exchange to mitigate trapping in local optima and to compute model probabilities and their uncertainties. We apply our techniques to synthetic data sets and a natural data set collected across the Rio Grande Gorge Bridge in New Mexico. On the basis of our examples, we find that the occupancy probability is useful in visualizing the results, giving a “hazy” cross section of the object. We also find that the role of the container is important in making predictions about the subsurface object.
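
Replica exchange (parallel tempering) runs several Markov chains at different temperatures and occasionally swaps states between neighboring temperatures, which lets the cold chain escape local optima. The sketch below applies it to a toy bimodal one-dimensional density rather than the polygon model. The target, temperatures, and proposal widths are hypothetical choices.

```python
import numpy as np


def log_target(x):
    """Toy bimodal log-density (unnormalized): equal mixture of N(-3, 0.5^2) and N(3, 0.5^2)."""
    return np.logaddexp(-0.5 * ((x + 3) / 0.5) ** 2, -0.5 * ((x - 3) / 0.5) ** 2)


def parallel_tempering(log_p, temps, n_steps=20000, seed=0):
    """Metropolis chains targeting p(x)^(1/T) plus neighbor swaps (replica exchange).

    Returns the samples of the chain at temps[0] (the cold chain, T = 1 here).
    """
    rng = np.random.default_rng(seed)
    temps = np.asarray(temps, dtype=float)
    x = np.zeros(len(temps))
    lp = np.array([log_p(v) for v in x])
    cold = np.empty(n_steps)
    for s in range(n_steps):
        # Random-walk Metropolis update within each replica
        for k, T in enumerate(temps):
            prop = x[k] + 0.5 * np.sqrt(T) * rng.standard_normal()
            lp_prop = log_p(prop)
            if np.log(rng.random()) < (lp_prop - lp[k]) / T:
                x[k], lp[k] = prop, lp_prop
        # Propose swapping the states of one random pair of neighboring temperatures
        k = rng.integers(len(temps) - 1)
        log_a = (lp[k + 1] - lp[k]) * (1 / temps[k] - 1 / temps[k + 1])
        if np.log(rng.random()) < log_a:
            x[[k, k + 1]] = x[[k + 1, k]]
            lp[[k, k + 1]] = lp[[k + 1, k]]
        cold[s] = x[0]
    return cold


samples = parallel_tempering(log_target, [1, 2, 4, 8, 16], n_steps=10000)
```

Without the swap step, a single T = 1 chain started at 0 would usually settle in one mode and rarely cross the low-density region between them.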

Acoustic and elastic numerical wave simulations by recursive spatial derivative operators

Geophysics ◽

10.1190/1.3485217 ◽

2010 ◽

Vol 75 (6) ◽

pp. T167-T174 ◽

Cited By ~ 45

Author(s):

Dan Kosloff ◽

Reynam C. Pestana ◽

Hillel Tal-Ezer

Keyword(s):

Synthetic Data ◽

Analytic Solutions ◽

Numerical Dispersion ◽

Reverse Time ◽

Reverse Time Migration ◽

Data Set ◽

Lamb’s Problem ◽

Dynamic Elasticity ◽

Time Migration ◽

Recursive Operators

A new scheme for the calculation of spatial derivatives has been developed. The technique is based on recursive derivative operators that are generated by an [Formula: see text] fit in the spectral domain. The use of recursive operators enables us to extend acoustic and elastic wave simulations to shorter wavelengths. The method is applied to the numerical solution of the 2D acoustic wave equation and to the solution of the equations of 2D dynamic elasticity in an isotropic medium. An example of reverse-time migration of a synthetic data set shows that the numerical dispersion can be significantly reduced with respect to schemes that are based on finite differences. The method is tested for the solutions of the equations of dynamic elasticity by comparing numerical and analytic solutions to Lamb’s problem.
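
The numerical dispersion that the abstract refers to can be seen by differentiating a short-wavelength sine wave. The sketch below does not implement the authors' recursive operators. Instead, it compares a second-order centered finite difference with a Fourier pseudospectral derivative (a swapped-in spectral-domain method) on a periodic grid where the wave has only about three grid points per wavelength.

```python
import numpy as np


def fd2_derivative(u, dx):
    """Second-order centered finite difference on a periodic grid."""
    return (np.roll(u, -1) - np.roll(u, 1)) / (2 * dx)


def spectral_derivative(u, dx):
    """Fourier pseudospectral derivative on a periodic grid."""
    k = 2 * np.pi * np.fft.rfftfreq(u.size, d=dx)  # angular wavenumbers
    return np.fft.irfft(1j * k * np.fft.rfft(u), n=u.size)


n = 64
dx = 2 * np.pi / n
x = np.arange(n) * dx
u = np.sin(20 * x)  # about 3.2 grid points per wavelength
exact = 20 * np.cos(20 * x)
err_fd = np.max(np.abs(fd2_derivative(u, dx) - exact))
err_sp = np.max(np.abs(spectral_derivative(u, dx) - exact))
```

Here the finite-difference stencil returns sin(20 dx)/dx ≈ 9.4 instead of the true amplitude 20, whereas the spectral derivative is exact to rounding error for this wave.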

A periodically varying code for improving deblending of simultaneous sources in marine acquisition

Geophysics ◽

10.1190/geo2015-0447.1 ◽

2016 ◽

Vol 81 (3) ◽

pp. V213-V225 ◽

Cited By ~ 80

Author(s):

Shaohuan Zu ◽

Hui Zhou ◽

Yangkang Chen ◽

Shan Qu ◽

Xiaofeng Zou ◽

...

Keyword(s):

Field Data ◽

Synthetic Data ◽

Data Sets ◽

Ocean Bottom ◽

Data Set ◽

Acceptable Model ◽

Model Subspace ◽

New Form ◽

Simultaneous Source ◽

Practical Field

We have designed a periodically varying code that avoids the problem of local coherency and distributes the interference uniformly within a given range; hence, compared with a random dithering code, it is better at suppressing incoherent interference (blending noise) and preserving coherent useful signals. We have also devised a new form of the iterative method to remove interference generated by simultaneous-source acquisition. In each iteration, we estimated the interference using the blending operator following the proposed formula and then subtracted it from the pseudodeblended data. To further eliminate the incoherent interference and constrain the inversion, the data were then transformed to an auxiliary sparse domain in which a thresholding operator was applied. During the iterations, the threshold was decreased from the largest value to zero following an exponential function. The exponentially decreasing threshold aimed to gradually move the deblended data into a more acceptable model subspace. Two numerically blended synthetic data sets and one numerically blended practical field data set from an ocean bottom cable were used to demonstrate the usefulness of the proposed method and the better performance of the periodically varying code compared with the traditional random dithering code.
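
The iteration described above can be sketched on a toy problem: estimate the interference, subtract it from the pseudodeblended data, and then threshold in a sparse domain with an exponentially decreasing threshold. All of the following are hypothetical assumptions. The sparse domain is the identity. A half-amplitude random permutation stands in for the blending-interference operator. The schedule `tau_k = tau_max * (exp(-c k / (N - 1)) - exp(-c)) / (1 - exp(-c))` is just one way to reach exactly zero at the last iteration. The paper's formula and its periodically varying code are not reproduced.

```python
import numpy as np


def threshold_schedule(tau_max, n_iter, c=5.0):
    """Exponentially decreasing thresholds from tau_max (first) to 0 (last); assumed form."""
    k = np.arange(n_iter)
    e = np.exp(-c * k / (n_iter - 1))
    return tau_max * (e - np.exp(-c)) / (1 - np.exp(-c))


def soft(x, tau):
    """Soft-thresholding operator."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)


def deblend(d_pseudo, interference, n_iter=50):
    """Iterate m <- soft(d_pseudo - interference(m), tau_k) with decreasing tau_k."""
    m = np.zeros_like(d_pseudo)
    for tau in threshold_schedule(np.abs(d_pseudo).max(), n_iter):
        m = soft(d_pseudo - interference(m), tau)  # subtract estimated interference, then threshold
    return m


# Hypothetical toy: sparse "signal" plus scrambled, half-amplitude crosstalk
rng = np.random.default_rng(0)
m_true = np.zeros(256)
idx = rng.choice(256, size=12, replace=False)
m_true[idx] = rng.uniform(0.5, 1.5, size=12) * rng.choice([-1.0, 1.0], size=12)
perm = rng.permutation(256)


def gamma(v):
    """Hypothetical interference operator: half-amplitude, permuted copy of v."""
    return 0.5 * v[perm]


d_pseudo = m_true + gamma(m_true)  # pseudodeblended data = signal + interference
m_est = deblend(d_pseudo, gamma)
```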
