Global whole-rock geochemical database compilation

Author(s):  
Matthew Gard ◽  
Derrick Hasterok ◽  
Jacqueline Halpin

Abstract. Dissemination and collation of geochemical data are critical to promote rapid, creative and accurate research and place new results in an appropriate global context. To this end, we have assembled a global whole-rock geochemical database, with other associated sample information and properties, sourced from various existing databases and supplemented with numerous individual publications and corrections. Currently the database stands at 1,023,490 samples with varying amounts of associated information including major and trace element concentrations, isotopic ratios, and location data. The spatial and temporal distribution is quite heterogeneous; however, temporal distributions are enhanced over some previous database compilations, particularly in terms of ages older than ~ 1000 Ma. Also included are a wide range of computed geochemical indices, physical property estimates and naming schema on a major element normalized version of the geochemical data for quick reference. This compilation will be useful for geochemical studies requiring extensive data sets, in particular those wishing to investigate secular temporal trends. The addition of physical properties, estimated by sample chemistry, represents a unique contribution to otherwise similar geochemical databases. The data are published in .csv format for the purposes of simple distribution but exist in a format acceptable for database management systems (e.g. SQL). One can either manipulate these data using conventional analysis tools such as MATLAB®, Microsoft® Excel, or R, or upload them to a relational database management system for easy querying and management, as unique keys already exist. This data set will continue to grow, and we encourage readers to contact us, or the database compilations contained within, about any data that are yet to be included. The data files described in this paper are available at https://doi.org/10.5281/zenodo.2592823 (Gard et al., 2019).

2019 ◽  
Vol 11 (4) ◽  
pp. 1553-1566 ◽  
Author(s):  
Matthew Gard ◽  
Derrick Hasterok ◽  
Jacqueline A. Halpin

Abstract. Collation and dissemination of geochemical data are critical to promote rapid, creative, and accurate research and place new results in an appropriate global context. To this end, we have compiled a global whole-rock geochemical database, sourced from various existing databases and supplemented with an extensive list of individual publications. Currently the database stands at 1 022 092 samples with varying amounts of associated sample data, including major and trace element concentrations, isotopic ratios, and location information. Spatial and temporal distribution is heterogeneous; however, temporal distributions are enhanced over some previous database compilations, particularly in ages older than ∼ 1000 Ma. Also included are a range of geochemical indices, various naming schema, and physical property estimates computed on a major element normalized version of the geochemical data for quick reference. This compilation will be useful for geochemical studies requiring extensive data sets, in particular those wishing to investigate secular temporal trends. The addition of physical properties, estimated from sample chemistry, represents a unique contribution to otherwise similar geochemical databases. The data are published in .csv format for the purposes of simple distribution, but exist in a structured format acceptable for database management systems (e.g. SQL). One can either manipulate these data using conventional analysis tools such as MATLAB®, Microsoft® Excel, or R, or upload them to a relational database management system for easy querying and management, as unique keys already exist. The data set will continue to grow and be improved, and we encourage readers to contact us, or the database compilations contained within, about any data that are yet to be included. The data files described in this paper are available at https://doi.org/10.5281/zenodo.2592822 (Gard et al., 2019a).
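The abstract notes that the .csv files can be uploaded to a relational database management system because unique keys already exist. As a minimal sketch of that workflow, the following loads a hypothetical two-row extract into an in-memory SQLite table keyed on a sample identifier; the column names are illustrative assumptions, not the compilation's actual schema:

```python
import csv
import io
import sqlite3

# Hypothetical miniature extract of the compilation; the real column
# names in the published .csv files may differ.
csv_text = """sample_id,sio2,mgo,age_ma,latitude,longitude
S0001,49.2,7.8,2705,-26.1,118.5
S0002,71.4,0.6,95,44.3,-110.2
"""

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE samples (
    sample_id TEXT PRIMARY KEY,  -- unique key, as described in the abstract
    sio2 REAL, mgo REAL, age_ma REAL,
    latitude REAL, longitude REAL)""")

reader = csv.DictReader(io.StringIO(csv_text))
rows = [(r["sample_id"], float(r["sio2"]), float(r["mgo"]),
         float(r["age_ma"]), float(r["latitude"]), float(r["longitude"]))
        for r in reader]
conn.executemany("INSERT INTO samples VALUES (?, ?, ?, ?, ?, ?)", rows)

# Example query: Archean samples (older than 2500 Ma)
archean = conn.execute(
    "SELECT sample_id FROM samples WHERE age_ma > 2500").fetchall()
print(archean)  # [('S0001',)]
```

The same table definition works unchanged in most SQL engines, which is what makes the flat .csv distribution convenient.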


Sensors ◽  
2021 ◽  
Vol 21 (10) ◽  
pp. 3406
Author(s):  
Jie Jiang ◽  
Yin Zou ◽  
Lidong Chen ◽  
Yujie Fang

Precise localization and pose estimation in indoor environments are commonly employed in a wide range of applications, including robotics, augmented reality, and navigation and positioning services. Such applications can be solved via visual-based localization using a pre-built 3D model. The increase in search space associated with large scenes can be overcome by retrieving images in advance and subsequently estimating the pose. The majority of current deep learning-based image retrieval methods require labeled data, which increases data annotation costs and complicates the acquisition of data. In this paper, we propose an unsupervised hierarchical indoor localization framework that integrates an unsupervised variational autoencoder (VAE) network with a visual-based Structure-from-Motion (SfM) approach in order to extract global and local features. During the localization process, global features are applied for image retrieval at the level of the scene map in order to obtain candidate images, and local features are subsequently used to estimate the pose from 2D-3D matches between query and candidate images. Only RGB images are used as the input of the proposed localization system, which is both convenient and challenging. Experimental results reveal that the proposed method can localize images within 0.16 m and 4° in the 7-Scenes data sets and 32.8% within 5 m and 20° in the Baidu data set. Furthermore, our proposed method achieves a higher precision compared to advanced methods.
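The retrieval step described above can be sketched as cosine-similarity ranking of global descriptors. This is only a toy illustration under the assumption that the VAE yields fixed-length global feature vectors; the array sizes and the descriptor values are invented for the example:

```python
import numpy as np

# Assumed setup: one 128-dim global descriptor per database image, and a
# query descriptor that (by construction) nearly matches image 42.
rng = np.random.default_rng(0)
db_descriptors = rng.normal(size=(100, 128))
query = db_descriptors[42] + 0.01 * rng.normal(size=128)

# L2-normalise, then rank database images by cosine similarity.
db_norm = db_descriptors / np.linalg.norm(db_descriptors, axis=1, keepdims=True)
q_norm = query / np.linalg.norm(query)
scores = db_norm @ q_norm

# Top-k candidates would then go to local-feature 2D-3D pose estimation.
top_k = np.argsort(scores)[::-1][:5]
print(top_k[0])  # 42
```

Restricting pose estimation to the retrieved candidates is what keeps the search space manageable in large scenes.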


2018 ◽  
Author(s):  
Brian Hie ◽  
Bryan Bryson ◽  
Bonnie Berger

Abstract. Researchers are generating single-cell RNA sequencing (scRNA-seq) profiles of diverse biological systems1–4 and every cell type in the human body.5 Leveraging these data to gain unprecedented insight into biology and disease will require assembling heterogeneous cell populations across multiple experiments, laboratories, and technologies. Although methods for scRNA-seq data integration exist6,7, they often naively merge data sets together even when the data sets have no cell types in common, leading to results that do not correspond to real biological patterns. Here we present Scanorama, an approach inspired by algorithms for panorama stitching that overcomes the limitations of existing methods to enable accurate, heterogeneous scRNA-seq data set integration. Our strategy identifies and merges the shared cell types among all pairs of data sets and is orders of magnitude faster than existing techniques. We use Scanorama to combine 105,476 cells from 26 diverse scRNA-seq experiments across 9 different technologies into a single comprehensive reference, demonstrating how Scanorama can be used to obtain a more complete picture of cellular function across a wide range of scRNA-seq experiments.
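The pairwise matching idea behind this style of integration can be illustrated with a brute-force mutual-nearest-neighbors search. This is not Scanorama's implementation (which operates on dimensionality-reduced profiles and uses approximate nearest-neighbor indexing for speed); it only shows why matching concentrates on cell populations shared between data sets:

```python
import numpy as np

def mutual_nearest_neighbors(a, b, k=3):
    """Pairs (i, j) where a[i] is among b[j]'s k nearest points and vice versa.

    Simplified illustration of the matching idea; real scRNA-seq
    integration works in a reduced-dimension gene-expression space.
    """
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)  # pairwise distances
    a_to_b = np.argsort(d, axis=1)[:, :k]    # each a-point's k nearest b-points
    b_to_a = np.argsort(d, axis=0)[:k, :].T  # each b-point's k nearest a-points
    return {(i, j) for i in range(len(a)) for j in a_to_b[i] if i in b_to_a[j]}

rng = np.random.default_rng(1)
shared = rng.normal(size=(20, 2))  # a "cell type" present in both data sets
a = np.vstack([shared + 0.05 * rng.normal(size=(20, 2)),
               rng.normal(loc=8.0, size=(10, 2))])  # plus an a-only population
b = shared + 0.05 * rng.normal(size=(20, 2))

matches = mutual_nearest_neighbors(a, b)
# Matches concentrate in the shared population (indices < 20 in a); the
# a-only population finds no mutual partner and is left unmerged.
print(all(i < 20 for i, _ in matches))
```

This is the property the abstract highlights: data sets with no cell types in common yield few mutual matches, so they are not forced together.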


2017 ◽  
Vol 44 (2) ◽  
pp. 203-229 ◽  
Author(s):  
Javier D Fernández ◽  
Miguel A Martínez-Prieto ◽  
Pablo de la Fuente Redondo ◽  
Claudio Gutiérrez

The publication of semantic web data, commonly represented in Resource Description Framework (RDF), has experienced outstanding growth over the last few years. Data from all fields of knowledge are shared publicly and interconnected in active initiatives such as Linked Open Data. However, despite the increasing availability of applications managing large-scale RDF information such as RDF stores and reasoning tools, little attention has been given to the structural features emerging in real-world RDF data. Our work addresses this issue by proposing specific metrics to characterise RDF data. We specifically focus on revealing the redundancy of each data set, as well as common structural patterns. We evaluate the proposed metrics on several data sets, which cover a wide range of designs and models. Our findings provide a basis for more efficient RDF data structures, indexes and compressors.
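As a rough illustration of what such structural metrics can look like, the snippet below computes subject out-degrees and a simple "predicate signature" redundancy ratio over a toy triple set. The paper's metric catalogue is richer than this, and the prefixes and values here are invented:

```python
from collections import defaultdict

# Toy RDF triple set (subject, predicate, object); prefixes are illustrative.
triples = [
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:alice", "foaf:knows", "ex:carol"),
    ("ex:alice", "foaf:name", '"Alice"'),
    ("ex:bob", "foaf:name", '"Bob"'),
    ("ex:carol", "foaf:name", '"Carol"'),
]

out_degree = defaultdict(int)       # triples per subject
predicate_lists = defaultdict(set)  # the set of predicates each subject uses

for s, p, o in triples:
    out_degree[s] += 1
    predicate_lists[s].add(p)

# Predicate-signature redundancy: how often subjects share the same set of
# predicates -- a structural pattern RDF compressors can exploit. A ratio
# well below 1 means many subjects are described with identical schemas.
signatures = [tuple(sorted(ps)) for ps in predicate_lists.values()]
ratio = len(set(signatures)) / len(signatures)
print(out_degree["ex:alice"], round(ratio, 2))  # 3 0.67
```

Here ex:bob and ex:carol share the signature (foaf:name,), so only two of three signatures are distinct.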


2017 ◽  
Author(s):  
João C. Marques ◽  
Michael B. Orger

Abstract. How to partition a data set into a set of distinct clusters is a ubiquitous and challenging problem. The fact that data vary widely in features such as cluster shape, cluster number, density distribution, background noise, outliers and degree of overlap makes it difficult to find a single algorithm that can be broadly applied. One recent method, clusterdp, based on search of density peaks, can be applied successfully to cluster many kinds of data, but it is not fully automatic, and fails on some simple data distributions. We propose an alternative approach, clusterdv, which estimates density dips between points, and allows robust determination of cluster number and distribution across a wide range of data, without any manual parameter adjustment. We show that this method is able to solve a range of synthetic and experimental data sets, where the underlying structure is known, and identifies consistent and meaningful clusters in new behavioral data.

Author summary. It is common that natural phenomena produce groupings, or clusters, in data, that can reveal the underlying processes. However, the form of these clusters can vary arbitrarily, making it challenging to find a single algorithm that identifies their structure correctly, without prior knowledge of the number of groupings or their distribution. We describe a simple clustering algorithm that is fully automatic and is able to correctly identify the number and shape of groupings in data of many types. We expect this algorithm to be useful in finding unknown natural phenomena present in data from a wide range of scientific fields.
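The density-dip intuition can be sketched in one dimension: estimate the density along the path between two points and compare the lowest density on that path with the density at the endpoints. This is only an illustration of the idea, not clusterdv's actual estimator:

```python
import numpy as np

def density(x, samples, h=0.3):
    """Simple 1-D Gaussian kernel density estimate at point x."""
    return np.mean(np.exp(-0.5 * ((x - samples) / h) ** 2)) / (h * np.sqrt(2 * np.pi))

def dip_between(x1, x2, samples, n=50):
    """Ratio of the lowest density on the path x1 -> x2 to the endpoint density.

    A ratio near 1 means no dip (same cluster); a ratio near 0 means the
    path crosses a low-density valley (different clusters).
    """
    path = np.linspace(x1, x2, n)
    dens = np.array([density(x, samples) for x in path])
    return dens.min() / min(dens[0], dens[-1])

# Two well-separated 1-D modes.
rng = np.random.default_rng(2)
samples = np.concatenate([rng.normal(0.0, 0.3, 200), rng.normal(4.0, 0.3, 200)])

print(dip_between(0.0, 4.0, samples) < 0.5)  # deep dip: different clusters
print(dip_between(0.0, 0.5, samples) > 0.9)  # no dip: same cluster
```

Because the dip is measured from the data itself, no cluster count or shape needs to be specified in advance, which is the property the abstract emphasizes.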


Geophysics ◽  
2006 ◽  
Vol 71 (3) ◽  
pp. H33-H44 ◽  
Author(s):  
Hendrik Paasche ◽  
Jens Tronicke ◽  
Klaus Holliger ◽  
Alan G. Green ◽  
Hansruedi Maurer

Inversions of an individual geophysical data set can be highly nonunique, and it is generally difficult to determine petrophysical parameters from geophysical data. We show that both issues can be addressed by adopting a statistical multiparameter approach that requires the acquisition, processing, and separate inversion of two or more types of geophysical data. To combine information contained in the physical-property models that result from inverting the individual data sets and to estimate the spatial distribution of petrophysical parameters in regions where they are known at only a few locations, we demonstrate the potential of the fuzzy c-means (FCM) clustering technique. After testing this new approach on synthetic data, we apply it to limited crosshole georadar, crosshole seismic, gamma-log, and slug-test data acquired within a shallow alluvial aquifer. The derived multiparameter model effectively outlines the major sedimentary units observed in numerous boreholes and provides plausible estimates for the spatial distributions of gamma-ray emitters and hydraulic conductivity.
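A bare-bones version of the FCM iteration can be sketched as follows. This is the textbook algorithm run on synthetic two-property samples, not the authors' implementation or their field data:

```python
import numpy as np

def fuzzy_c_means(x, c=2, m=2.0, iters=100, seed=0):
    """Minimal fuzzy c-means on an (n, d) array x.

    Returns (centers, memberships). Each sample gets a soft membership in
    every cluster, which is what lets FCM interpolate petrophysical
    properties between end-member units.
    """
    rng = np.random.default_rng(seed)
    u = rng.random((len(x), c))
    u /= u.sum(axis=1, keepdims=True)  # memberships sum to 1 per sample
    for _ in range(iters):
        w = u ** m                                        # fuzzified weights
        centers = (w.T @ x) / w.sum(axis=0)[:, None]      # weighted means
        d = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        # Standard membership update: u_ik proportional to d_ik^(-2/(m-1)).
        u = 1.0 / (d ** (2 / (m - 1)) *
                   np.sum(d ** (-2 / (m - 1)), axis=1, keepdims=True))
    return centers, u

# Two synthetic "units": each row pairs a seismic-like and a radar-like value
# (hypothetical numbers chosen only to form two separable clusters).
rng = np.random.default_rng(3)
x = np.vstack([rng.normal([1.0, 2.0], 0.1, size=(50, 2)),
               rng.normal([3.0, 0.5], 0.1, size=(50, 2))])
centers, u = fuzzy_c_means(x)
print(np.round(centers[np.argsort(centers[:, 0])], 1))
```

In the paper's setting, each sample would be a grid cell carrying the co-located values of the separately inverted physical-property models, and the cluster memberships guide the petrophysical interpolation.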


2018 ◽  
Vol 33 (4) ◽  
pp. 266-269 ◽  
Author(s):  
Marcus H. Mendenhall

This work provides a short summary of techniques for formally correct handling of statistical uncertainties in Poisson-statistics-dominated data, with emphasis on X-ray powder diffraction patterns. Correct assignment of uncertainties for low counts is documented. Further, we describe a technique for adaptively rebinning such data sets to provide more uniform statistics across a pattern with a wide range of count rates, from a few (or no) counts in a background bin to on-peak regions with many counts. This permits better plotting of data and analysis of a smaller number of points in a fitting package, without significant degradation of the information content of the data set. Examples of the effect of this on a diffraction data set are given.
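The adaptive-rebinning idea can be sketched as a greedy merge of adjacent bins until a minimum-count target is met. The paper's method also tracks bin boundaries and the formally correct low-count uncertainties, which this toy version omits:

```python
import numpy as np

def adaptive_rebin(counts, min_counts=25):
    """Merge adjacent bins until each output bin holds >= min_counts.

    Sparse background regions collapse into wide bins while on-peak
    regions keep their resolution; total counts are conserved.
    """
    bins, acc = [], 0
    for c in counts:
        acc += c
        if acc >= min_counts:
            bins.append(acc)
            acc = 0
    if acc and bins:
        bins[-1] += acc  # fold a trailing remainder into the last bin
    elif acc:
        bins.append(acc)
    return np.array(bins)

counts = np.array([0, 1, 0, 2, 1, 30, 45, 3, 0, 1])  # sparse background + a peak
rebinned = adaptive_rebin(counts)
print(rebinned, np.sqrt(rebinned))  # sigma ~ sqrt(N) is now sensible per bin
```

With enough counts per bin, the Gaussian approximation sigma ~ sqrt(N) becomes defensible, which is exactly why uniform statistics help fitting packages.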


2014 ◽  
Vol 7 (12) ◽  
pp. 4353-4365 ◽  
Author(s):  
A. Lyapustin ◽  
Y. Wang ◽  
X. Xiong ◽  
G. Meister ◽  
S. Platnick ◽  
...  

Abstract. The Collection 6 (C6) MODIS (Moderate Resolution Imaging Spectroradiometer) land and atmosphere data sets are scheduled for release in 2014. C6 contains significant revisions of the calibration approach to account for sensor aging. This analysis documents the presence of systematic temporal trends in the visible and near-infrared (500 m) bands of the Collection 5 (C5) MODIS Terra and, to a lesser extent, in MODIS Aqua geophysical data sets. Sensor degradation is largest in the blue band (B3) of the MODIS sensor on Terra and decreases with wavelength. Calibration degradation causes negative global trends in multiple MODIS C5 products including the dark target algorithm's aerosol optical depth over land and Ångström exponent over the ocean, global liquid water and ice cloud optical thickness, as well as surface reflectance and vegetation indices, including the normalized difference vegetation index (NDVI) and enhanced vegetation index (EVI). As the C5 production will be maintained for another year in parallel with C6, one objective of this paper is to raise awareness of the calibration-related trends for the broad MODIS user community. The new C6 calibration approach removes major calibration trends in the Level 1B (L1B) data. This paper also introduces an enhanced C6+ calibration of the MODIS data set which includes an additional polarization correction (PC) to compensate for the increased polarization sensitivity of MODIS Terra since about 2007, as well as detrending and Terra–Aqua cross-calibration over quasi-stable desert calibration sites. The PC algorithm, developed by the MODIS ocean biology processing group (OBPG), removes residual scan angle, mirror side and seasonal biases from aerosol and surface reflectance (SR) records along with spectral distortions of SR.
Using the multiangle implementation of atmospheric correction (MAIAC) algorithm over deserts, we have also developed a detrending and cross-calibration method which removes residual decadal trends on the order of several tenths of 1% of the top-of-atmosphere (TOA) reflectance in the visible and near-infrared MODIS bands B1–B4, and provides a good consistency between the two MODIS sensors. MAIAC analysis over the southern USA shows that the C6+ approach removed an additional negative decadal trend of Terra ΔNDVI ~ 0.01 as compared to Aqua data. This change is particularly important for analysis of vegetation dynamics and trends in the tropics, e.g., Amazon rainforest, where the morning orbit of Terra provides considerably more cloud-free observations compared to the afternoon Aqua measurements.


1999 ◽  
Vol 5 (S2) ◽  
pp. 74-75
Author(s):  
P.K. Carpenter

Both precision and accuracy are central to quantitative microanalysis. While precision may be evaluated from x-ray counting statistics and replicate measurement, the determination of analytical accuracy requires well-characterized standards, of which there are few that span a wide range of compositions in binary and ternary systems. The accuracy of silicate mineral analysis has been previously studied via measurement of α factors at multiple accelerating potentials and the subsequent evaluation of correction algorithms and mass absorption coefficient (mac) data sets. This approach has been extended in this study to the In2O3-Ga2O3 and HgTe-CdTe systems. Single crystals of In2O3, Ga2O3, and an InGa-oxide of unknown composition were used to evaluate accuracy in the In2O3-Ga2O3 binary, using the GaKα, GaLα, and InLα x-ray lines, with WDS measurements performed at 15, 20, and 25 kV relative to the In2O3 and Ga2O3 standards (see Table I). The Ga Kα line exhibits minimal absorption, has no fluorescence correction in this system, and is not critically dependent on the correction algorithm or mac data set used.


2015 ◽  
Vol 2015 ◽  
pp. 1-10 ◽  
Author(s):  
J. Zyprych-Walczak ◽  
A. Szabelska ◽  
L. Handschuh ◽  
K. Górczak ◽  
K. Klamecka ◽  
...  

High-throughput sequencing technologies, such as the Illumina HiSeq, are powerful new tools for investigating a wide range of biological and medical problems. Massive and complex data sets produced by the sequencers create a need for the development of statistical and computational methods that can tackle the analysis and management of data. Data normalization is one of the most crucial steps of data processing, and this process must be carefully considered as it has a profound effect on the results of the analysis. In this work, we focus on a comprehensive comparison of five normalization methods related to sequencing depth, widely used for transcriptome sequencing (RNA-seq) data, and their impact on the results of gene expression analysis. Based on this study, we suggest a universal workflow that can be applied for the selection of the optimal normalization procedure for any particular data set. The described workflow includes calculation of the bias and variance values for the control genes, sensitivity and specificity of the methods, and classification errors, as well as generation of the diagnostic plots. Combining the above information facilitates the selection of the most appropriate normalization method for the studied data sets and determines which methods can be used interchangeably.
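Two common depth-related normalizations, total-count scaling and median-of-ratios size factors, can be sketched as follows. The abstract does not name the five methods compared, so treat this purely as an illustration of what sequencing-depth normalization does:

```python
import numpy as np

# Toy counts matrix: rows = genes, columns = samples. Sample 2 was
# "sequenced" twice as deeply as sample 1, with no real expression change.
counts = np.array([[10, 20],
                   [100, 200],
                   [50, 100]], dtype=float)

# Total-count (CPM-style) scaling: divide each sample by its library size.
cpm = counts / counts.sum(axis=0) * 1e6

# Median-of-ratios scaling (the idea behind DESeq-style size factors):
# divide each gene by its geometric mean across samples, then take the
# per-sample median of those ratios as the size factor.
geo_mean = np.exp(np.log(counts).mean(axis=1))
size_factors = np.median(counts / geo_mean[:, None], axis=0)
normalized = counts / size_factors

print(np.allclose(cpm[:, 0], cpm[:, 1]))            # depth effect removed
print(np.allclose(normalized[:, 0], normalized[:, 1]))
```

On real data with differentially expressed genes the two methods disagree, which is why a diagnostic workflow like the one described above is needed to choose between them.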

