Integration of diverse physical-property models: Subsurface zonation and petrophysical parameter estimation based on fuzzy  c -means cluster analyses

Inversions of an individual geophysical data set can be highly nonunique, and it is generally difficult to determine petrophysical parameters from geophysical data. We show that both issues can be addressed by adopting a statistical multiparameter approach that requires the acquisition, processing, and separate inversion of two or more types of geophysical data. To combine information contained in the physical-property models that result from inverting the individual data sets and to estimate the spatial distribution of petrophysical parameters in regions where they are known at only a few locations, we demonstrate the potential of the fuzzy [Formula: see text]-means (FCM) clustering technique. After testing this new approach on synthetic data, we apply it to limited crosshole georadar, crosshole seismic, gamma-log, and slug-test data acquired within a shallow alluvial aquifer. The derived multiparameter model effectively outlines the major sedimentary units observed in numerous boreholes and provides plausible estimates for the spatial distributions of gamma-ray emitters and hydraulic conductivity.

Download Full-text

Methods for estimating uncertainty in factor analytic solutions

Atmospheric Measurement Techniques ◽

10.5194/amt-7-781-2014 ◽

2014 ◽

Vol 7 (3) ◽

pp. 781-797 ◽

Cited By ~ 174

Author(s):

P. Paatero ◽

S. Eberly ◽

S. G. Brown ◽

G. A. Norris

Keyword(s):

Environmental Protection Agency ◽

Synthetic Data ◽

Analytic Solutions ◽

Data Sets ◽

Random Errors ◽

Data Set ◽

Factor Analytic ◽

Uncertainty Estimates ◽

Multilinear Engine ◽

Analytic Models

Abstract. The EPA PMF (Environmental Protection Agency positive matrix factorization) version 5.0 and the underlying multilinear engine-executable ME-2 contain three methods for estimating uncertainty in factor analytic models: classical bootstrap (BS), displacement of factor elements (DISP), and bootstrap enhanced by displacement of factor elements (BS-DISP). The goal of these methods is to capture the uncertainty of PMF analyses due to random errors and rotational ambiguity. It is shown that the three methods complement each other: depending on characteristics of the data set, one method may provide better results than the other two. Results are presented using synthetic data sets, including interpretation of diagnostics, and recommendations are given for parameters to report when documenting uncertainty estimates from EPA PMF or ME-2 applications.

Download Full-text

Bayesian Classifier for Sparsity-Promoting Feature Selection

International Journal of Pattern Recognition and Artificial Intelligence ◽

10.1142/s0218001415500226 ◽

2015 ◽

Vol 29 (06) ◽

pp. 1550022 ◽

Cited By ~ 1

Author(s):

Danlei Xu ◽

Lan Du ◽

Hongwei Liu ◽

Penghui Wang

Keyword(s):

Feature Selection ◽

Synthetic Data ◽

Original Data ◽

Radar Data ◽

Bayesian Classifier ◽

Classification Model ◽

Data Sets ◽

Data Set ◽

Classification Boundary ◽

Nonlinear Mappings

A Bayesian classifier for sparsity-promoting feature selection is developed in this paper, where a set of nonlinear mappings for the original data is performed as a pre-processing step. The linear classification model with such mappings from the original input space to a nonlinear transformation space can not only construct the nonlinear classification boundary, but also realize the feature selection for the original data. A zero-mean Gaussian prior with Gamma precision and a finite approximation of Beta process prior are used to promote sparsity in the utilization of features and nonlinear mappings in our model, respectively. We derive the Variational Bayesian (VB) inference algorithm for the proposed linear classifier. Experimental results based on the synthetic data set, measured radar data set, high-dimensional gene expression data set, and several benchmark data sets demonstrate the aggressive and robust feature selection capability and comparable classification accuracy of our method comparing with some other existing classifiers.

Download Full-text

Sparse reflectivity inversion for nonstationary seismic data with surface-related multiples: Numerical and field-data experiments

Geophysics ◽

10.1190/geo2016-0520.1 ◽

2017 ◽

Vol 82 (3) ◽

pp. R199-R217 ◽

Cited By ~ 3

Author(s):

Xintao Chai ◽

Shangxu Wang ◽

Genyang Tang

Keyword(s):

Seismic Data ◽

Resolution Enhancement ◽

Synthetic Data ◽

Data Sets ◽

Data Set ◽

Anelastic Attenuation ◽

Seismic Resolution ◽

Text Filtering ◽

The Stability ◽

Reflectivity Inversion

Seismic data are nonstationary due to subsurface anelastic attenuation and dispersion effects. These effects, also referred to as the earth’s [Formula: see text]-filtering effects, can diminish seismic resolution. We previously developed a method of nonstationary sparse reflectivity inversion (NSRI) for resolution enhancement, which avoids the intrinsic instability associated with inverse [Formula: see text] filtering and generates superior [Formula: see text] compensation results. Applying NSRI to data sets that contain multiples (addressing surface-related multiples only) requires a demultiple preprocessing step because NSRI cannot distinguish primaries from multiples and will treat them as interference convolved with incorrect [Formula: see text] values. However, multiples contain information about subsurface properties. To use information carried by multiples, with the feedback model and NSRI theory, we adapt NSRI to the context of nonstationary seismic data with surface-related multiples. Consequently, not only are the benefits of NSRI (e.g., circumventing the intrinsic instability associated with inverse [Formula: see text] filtering) extended, but also multiples are considered. Our method is limited to be a 1D implementation. Theoretical and numerical analyses verify that given a wavelet, the input [Formula: see text] values primarily affect the inverted reflectivities and exert little effect on the estimated multiples; i.e., multiple estimation need not consider [Formula: see text] filtering effects explicitly. However, there are benefits for NSRI considering multiples. The periodicity and amplitude of the multiples imply the position of the reflectivities and amplitude of the wavelet. Multiples assist in overcoming scaling and shifting ambiguities of conventional problems in which multiples are not considered. Experiments using a 1D algorithm on a synthetic data set, the publicly available Pluto 1.5 data set, and a marine data set support the aforementioned findings and reveal the stability, capabilities, and limitations of the proposed method.

Download Full-text

Global whole-rock geochemical database compilation

10.5194/essd-2019-50 ◽

2019 ◽

Cited By ~ 2

Author(s):

Matthew Gard ◽

Derrick Hasterok ◽

Jacqueline Halpin

Keyword(s):

Database Management ◽

Temporal Trends ◽

Physical Property ◽

Database Management System ◽

Geochemical Data ◽

Data Sets ◽

Unique Contribution ◽

Data Set ◽

Wide Range ◽

Geochemical Indices

Abstract. Dissemination and collation of geochemical data are critical to promote rapid, creative and accurate research and place new results in an appropriate global context. To this end, we have assembled a global whole-rock geochemical database, with other associated sample information and properties, sourced from various existing databases and supplemented with numerous individual publications and corrections. Currently the database stands at 1,023,490 samples with varying amounts of associated information including major and trace element concentrations, isotopic ratios, and location data. The distribution both spatially and temporally is quite heterogeneous, however temporal distributions are enhanced over some previous database compilations, particularly in terms of ages older than ~ 1000 Ma. Also included are a wide range of computed geochemical indices, physical property estimates and naming schema on a major element normalized version of the geochemical data for quick reference. This compilation will be useful for geochemical studies requiring extensive data sets, in particular those wishing to investigate secular temporal trends. The addition of physical properties, estimated by sample chemistry, represents a unique contribution to otherwise similar geochemical databases. The data is published in .csv format for the purposes of simple distribution but exists in a format acceptable for database management systems (e.g. SQL). One can either manipulate this data using conventional analysis tools such as MATLAB®, Microsoft® Excel, or R, or upload to a relational database management system for easy querying and management of the data as unique keys already exist. This data set will continue to grow, and we encourage readers to contact us or other database compilations contained within about any data that is yet to be included. The data files described in this paper are available at https://doi.org/10.5281/zenodo.2592823 (Gard et al., 2019).

Download Full-text

Marlim R3D: A realistic model for controlled-source electromagnetic simulations — Phase 2: The controlled-source electromagnetic data set

Geophysics ◽

10.1190/geo2018-0452.1 ◽

2019 ◽

Vol 84 (5) ◽

pp. E293-E299

Author(s):

Jorlivan L. Correa ◽

Paulo T. L. Menezes

Keyword(s):

A Priori ◽

Synthetic Data ◽

Realistic Model ◽

Earth Model ◽

Data Sets ◽

Data Set ◽

Geoelectric Model ◽

Controlled Source ◽

The North ◽

Electromagnetic Simulations

Synthetic data provided by geoelectric earth models are a powerful tool to evaluate a priori a controlled-source electromagnetic (CSEM) workflow effectiveness. Marlim R3D (MR3D) is an open-source complex and realistic geoelectric model for CSEM simulations of the postsalt turbiditic reservoirs at the Brazilian offshore margin. We have developed a 3D CSEM finite-difference time-domain forward study to generate the full-azimuth CSEM data set for the MR3D earth model. To that end, we fabricated a full-azimuth survey with 45 towlines striking the north–south and east–west directions over a total of 500 receivers evenly spaced at 1 km intervals along the rugged seafloor of the MR3D model. To correctly represent the thin, disconnected, and complex geometries of the studied reservoirs, we have built a finely discretized mesh of [Formula: see text] cells leading to a large mesh with a total of approximately 90 million cells. We computed the six electromagnetic field components (Ex, Ey, Ez, Hx, Hy, and Hz) at six frequencies in the range of 0.125–1.25 Hz. In our efforts to mimic noise in real CSEM data, we summed to the data a multiplicative noise with a 1% standard deviation. Both CSEM data sets (noise free and noise added), with inline and broadside geometries, are distributed for research or commercial use, under the Creative Common License, at the Zenodo platform.

Download Full-text

The GTC exoplanet transit spectroscopy survey

Astronomy and Astrophysics ◽

10.1051/0004-6361/201834063 ◽

2019 ◽

Vol 622 ◽

pp. A172 ◽

Cited By ~ 7

Author(s):

F. Murgas ◽

G. Chen ◽

E. Pallé ◽

L. Nortmann ◽

G. Nowak

Keyword(s):

Transmission Spectrum ◽

Rayleigh Scattering ◽

Extrasolar Planets ◽

Light Curves ◽

Data Sets ◽

Transmission Spectra ◽

Data Set ◽

Systematic Effects ◽

The Individual ◽

Spectrum Slope

Context. Rayleigh scattering in a hydrogen-dominated exoplanet atmosphere can be detected using ground- or space-based telescopes. However, stellar activity in the form of spots can mimic Rayleigh scattering in the observed transmission spectrum. Quantifying this phenomena is key to our correct interpretation of exoplanet atmospheric properties. Aims. We use the ten-meter Gran Telescopio Canarias (GTC) telescope to carry out a ground-based transmission spectra survey of extrasolar planets to characterize their atmospheres. In this paper we investigate the exoplanet HAT-P-11b, a Neptune-sized planet orbiting an active K-type star. Methods. We obtained long-slit optical spectroscopy of two transits of HAT-P-11b with the Optical System for Imaging and low-Intermediate-Resolution Integrated Spectroscopy (OSIRIS) on August 30, 2016 and September 25, 2017. We integrated the spectrum of HAT-P-11 and one reference star in several spectroscopic channels across the λ ~ 400–785 nm region, creating numerous light curves of the transits. We fit analytic transit curves to the data taking into account the systematic effects and red noise present in the time series in an effort to measure the change of the planet-to-star radius ratio (Rp∕Rs) across wavelength. Results. By fitting both transits together, we find a slope in the transmission spectrum showing an increase of the planetary radius towards blue wavelengths. Closer inspection of the transmission spectrum of the individual data sets reveals that the first transit presents this slope while the transmission spectrum of the second data set is flat. Additionally, we detect hints of Na absorption on the first night, but not on the second. We conclude that the transmission spectrum slope and Na absorption excess found in the first transit observation are caused by unocculted stellar spots. Modeling the contribution of unocculted spots to reproduce the results of the first night we find a spot filling factor of δ = 0.62−0.17+0.20 and a spot-to-photosphere temperature difference of ΔT = 429−299+184 K.

Download Full-text

A classification approach based on variable precision rough sets and cluster validity index function

Engineering Computations ◽

10.1108/ec-11-2012-0297 ◽

2014 ◽

Vol 31 (8) ◽

pp. 1778-1789

Author(s):

Hongkang Lin

Keyword(s):

Optimal Number ◽

Data Sets ◽

Cluster Validity ◽

Cluster Validity Index ◽

Index Method ◽

Data Set ◽

Content Type ◽

The Individual ◽

Variable Precision Rough Sets ◽

Optimal Number Of Clusters

Purpose – The clustering/classification method proposed in this study, designated as the PFV-index method, provides the means to solve the following problems for a data set characterized by imprecision and uncertainty: first, discretizing the continuous values of all the individual attributes within a data set; second, evaluating the optimality of the discretization results; third, determining the optimal number of clusters per attribute; and fourth, improving the classification accuracy (CA) of data sets characterized by uncertainty. The paper aims to discuss these issues. Design/methodology/approach – The proposed method for the solution of the clustering/classifying problem, designated as PFV-index method, combines a particle swarm optimization algorithm, fuzzy C-means method, variable precision rough sets theory, and a new cluster validity index function. Findings – This method could cluster the values of the individual attributes within the data set and achieves both the optimal number of clusters and the optimal CA. Originality/value – The validity of the proposed approach is investigated by comparing the classification results obtained for UCI data sets with those obtained by supervised classification BPNN, decision-tree methods.

Download Full-text

Detection of sharp lateral discontinuities through the analysis of surface-wave propagation

Geophysics ◽

10.1190/geo2013-0314.1 ◽

2014 ◽

Vol 79 (4) ◽

pp. EN77-EN90 ◽

Cited By ~ 19

Author(s):

Paolo Bergamo ◽

Laura Valentina Socco

Keyword(s):

Surface Wave ◽

Fault Location ◽

Synthetic Data ◽

Finite Element Method Simulation ◽

Fault System ◽

Energy Concentration ◽

Data Sets ◽

Data Set ◽

Shallow Subsurface ◽

Velocity Models

Surface-wave (SW) techniques are mainly used to retrieve 1D velocity models and are therefore characterized by a 1D approach, which might prove unsatisfactory when relevant 2D effects are present in the investigated subsurface. In the case of sharp and sudden lateral heterogeneities in the subsurface, a strategy to tackle this limitation is to estimate the location of the discontinuities and to separately process seismic traces belonging to quasi-1D subsurface portions. We have addressed our attention to methods aimed at locating discontinuities by identifying anomalies in SW propagation and attenuation. The considered methods are the autospectrum computation and the attenuation analysis of Rayleigh waves (AARW). These methods were developed for purposes and/or scales of analysis that are different from those of this work, which aims at detecting and characterizing sharp subvertical discontinuities in the shallow subsurface. We applied both methods to two data sets, synthetic data from a finite-element method simulation and a field data set acquired over a fault system, both presenting an abrupt lateral variation perpendicularly crossing the acquisition line. We also extended the AARW method to the detection of sharp discontinuities from large and multifold data sets and we tested these novel procedures on the field case. The two methods are proven to be effective for the detection of the discontinuity, by portraying propagation phenomena linked to the presence of the heterogeneity, such as the interference between incident and reflected wavetrains, and energy concentration as well as subsequent decay at the fault location. The procedures we developed for the processing of multifold seismic data set showed to be reliable tools in locating and characterizing subvertical sharp heterogeneities.

Download Full-text

A Bayesian approach to modeling 2D gravity data using polygons

Geophysics ◽

10.1190/geo2016-0153.1 ◽

2017 ◽

Vol 82 (1) ◽

pp. G1-G21 ◽

Cited By ~ 3

Author(s):

William J. Titus ◽

Sarah J. Titus ◽

Joshua R. Davis

Keyword(s):

Gravity Data ◽

Synthetic Data ◽

Gravity Inversion ◽

Limiting Factor ◽

Data Sets ◽

Data Set ◽

Local Optima ◽

Occupancy Probability ◽

Compute Model ◽

Parameter Values

We apply a Bayesian Markov chain Monte Carlo formalism to the gravity inversion of a single localized 2D subsurface object. The object is modeled as a polygon described by five parameters: the number of vertices, a density contrast, a shape-limiting factor, and the width and depth of an encompassing container. We first constrain these parameters with an interactive forward model and explicit geologic information. Then, we generate an approximate probability distribution of polygons for a given set of parameter values. From these, we determine statistical distributions such as the variance between the observed and model fields, the area, the center of area, and the occupancy probability (the probability that a spatial point lies within the subsurface object). We introduce replica exchange to mitigate trapping in local optima and to compute model probabilities and their uncertainties. We apply our techniques to synthetic data sets and a natural data set collected across the Rio Grande Gorge Bridge in New Mexico. On the basis of our examples, we find that the occupancy probability is useful in visualizing the results, giving a “hazy” cross section of the object. We also find that the role of the container is important in making predictions about the subsurface object.

Download Full-text

An Improved Classification Analysis on Utility Aware K-Anonymized Dataset

Journal of Computational and Theoretical Nanoscience ◽

10.1166/jctn.2019.7748 ◽

2019 ◽

Vol 16 (2) ◽

pp. 445-452

Author(s):

Kishore S. Verma ◽

A. Rajesh ◽

Adeline J. S. Johnsana

Keyword(s):

Data Mining ◽

Analytical Approach ◽

Value Added ◽

Data Sets ◽

Data Set ◽

Privacy Preserving Data Mining ◽

Privacy Leakage ◽

Anonymized Data ◽

Null Values ◽

The Individual

K anonymization is one of the worldwide used approaches to protect the individual records from the privacy leakage attack of Privacy Preserving Data Mining (PPDM) arena. Typically anonymized dataset will impact the effectiveness of data mining results. Anyhow, currently researchers of PPDM progress in driving their efforts in finding out the optimum trade-off between privacy and utility. This work tends in bringing out the optimum classifier from a set of best classifiers of data mining approaches that are capable enough in generating value-added classifying results on utility aware k-anonymized data set. We performed the analytical approach on the data set that are anonymized in sense of accompanying the anonymity utility factors like null values count and transformation pattern loss. The experimentation is done with three widely used classifiers HNB, PART and J48 and these classifiers are analysed with Accuracy, F-measure, and ROC-AUC which are literately proved to be the perfect measures of classification. Our experimental analysis reveals the best classifiers on the utility aware anonymized data sets of Cell oriented Anonymization (CoA), Attribute oriented Anonymization (AoA) and Record oriented Anonymization (RoA).

Download Full-text