Application of a handheld X-ray fluorescence analyser to trace the provenance of Roman monuments of Neogene lithotypes to quarries in the Leitha Mountains, Hainburg Mountains and along the south-west border of the Vienna Basin

Author(s):  
Beatrix Moshammer

This study presents geochemical pXRF data from a closed data set of selected calcareous and mixed calcareous-siliciclastic lithotypes of ornamental and building stones, mainly attributed to the corallinacean Leitha Limestone, its overlying reworked and variegated deposits known as Detrital Leitha Limestone, and to younger or laterally interconnected oolites, coquinas and weakly calcitic sandstones. Together they represent shallow marine deposits of the Central Paratethys Sea in the Middle to Upper Miocene (16–5 Ma). For analytical reasons, the geochemical compositions can only be compared within the presented data set.

The stones in focus were prominent building and ornamental stones in past centuries and embody the stonemason culture of various historic periods, e.g. in Vienna (St. Stephen's Cathedral, Vienna State Opera). A still active quarry at Sankt Margarethen im Burgenland provides replacement material. The heritage value of these appreciated freestones is underlined by their use for cultural monuments, buildings and infrastructure as early as when this region was part of the Imperium Romanum. The interdisciplinary archaeological-geological project CarVin (Stone Monuments and Stone Quarrying in the Carnuntum - Vindobona Area, G. Kremer) provided the opportunity to relate archaeological stone objects to quarries at the nearest plausible locations using this non-destructive analytical technique. The aim was to compare fine-grained archaeological stone objects with samples of similar lithologies from investigated outcrops and to identify potential matches. The present data set comprises 300 archaeological objects and 155 geological samples, each measured at least twice. We used the NITON XL3t 900s GOLDD Air of AnalytiCON Instruments; its Mining Mode was used to measure major, minor and trace elements from magnesium upwards. The internal software converts the measurements into percentages; compositional data analysis therefore calls for a centred log-ratio (clr) transformation.

Scatterplots of selected element pairs reveal distinct distributions. A preceding hypothetical grouping of the measured geological samples draws on their lithology and their affinity to specifically defined quarry regions (see https://meetingorganizer.copernicus.org/EGU2018/EGU2018-18923.pdf). This grouping is well expressed in the Ca-Sr plot, and the Sr-Ti plot allows good differentiation as well. However, differentiating between two specific areas - the north-eastern and south-western Leitha Mountains - does not appear feasible. The majority of the archaeological objects show some similarities to the geological samples but also conspicuous differences: a clear depletion in Ba, Ca and Mg, and partly in Mn and Sr, linked with a striking enrichment in sulphur. In the absence of further analytical methods, we attribute this to environmental effects. Although more measurements per sample and object would have improved the study, the results of the pXRF method support the petrological examinations. Nonetheless, careful handling and chemical data evaluation are critical with this method (analytical influences, surface conditions).
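
As an illustrative sketch (not code from the study), the centred log-ratio transformation applied to a single composition can be written as follows; the element percentages below are hypothetical, not measured values:

```python
import numpy as np

def clr(x):
    """Centred log-ratio transform: log of each part over the geometric mean."""
    x = np.asarray(x, dtype=float)
    g = np.exp(np.mean(np.log(x)))  # geometric mean of the composition
    return np.log(x / g)

# Hypothetical pXRF composition (wt%) for Ca, Sr, Ti, Fe -- illustrative values only
sample = np.array([38.0, 0.12, 0.05, 1.3])
z = clr(sample)
print(np.round(z, 3))
print(np.isclose(z.sum(), 0.0))  # clr coordinates always sum to zero
```

The clr coordinates are unconstrained by the constant-sum closure, which is why scatterplots of clr-transformed element pairs are preferred over raw percentages.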

2020 ◽  
Vol 2 (2) ◽  
Author(s):  
Antoni Susin ◽  
Yiwen Wang ◽  
Kim-Anh Lê Cao ◽  
M Luz Calle

Abstract Though variable selection is one of the most relevant tasks in microbiome analysis, e.g. for the identification of microbial signatures, many studies still rely on methods that ignore the compositional nature of microbiome data. The applicability of compositional data analysis methods has been hampered by the limited availability of software and the difficulty of interpreting their results. This work is focused on three methods for variable selection that acknowledge the compositional structure of microbiome data: selbal, a forward selection approach for the identification of compositional balances, and clr-lasso and coda-lasso, two penalized regression models for compositional data analysis. This study highlights the link between these methods and brings out some limitations of the centered log-ratio transformation for variable selection. In particular, the fact that it is not subcompositionally consistent makes the microbial signatures obtained from clr-lasso not readily transferable. Coda-lasso is computationally efficient and suitable when the focus is the identification of the most associated microbial taxa. Selbal stands out when the goal is to obtain a parsimonious model with optimal prediction performance, but it is computationally greedy. We provide a reproducible vignette for the application of these methods that will enable researchers to fully leverage their potential in microbiome studies.
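
The subcompositional inconsistency of the clr transformation mentioned above can be demonstrated in a few lines; the abundances here are invented for illustration:

```python
import numpy as np

def clr(x):
    x = np.asarray(x, dtype=float)
    return np.log(x) - np.log(x).mean()

# A 4-part composition of hypothetical relative taxon abundances
full = np.array([0.5, 0.3, 0.15, 0.05])
sub = full[:3] / full[:3].sum()   # subcomposition of the first three taxa

clr_full = clr(full)[:3]
clr_sub = clr(sub)

# clr values of the shared taxa change when a part is dropped ...
print(np.allclose(clr_full, clr_sub))                               # False
# ... whereas pairwise log-ratios are subcompositionally consistent
print(np.isclose(np.log(full[0]/full[1]), np.log(sub[0]/sub[1])))   # True
```

Because the geometric-mean reference changes whenever taxa are added or removed, a clr-based signature learned on one panel of taxa does not transfer directly to a subcomposition of it.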


2020 ◽  
Vol 2 (2) ◽  
Author(s):  
David R Lovell ◽  
Xin-Yi Chua ◽  
Annette McGrath

Abstract Thanks to sequencing technology, modern molecular bioscience datasets are often compositions of counts, e.g. counts of amplicons, mRNAs, etc. While there is growing appreciation that compositional data need special analysis and interpretation, less well understood is the discrete nature of these count compositions (or, as we call them, lattice compositions) and the impact this has on statistical analysis, particularly log-ratio analysis (LRA) of pairwise association. While LRA methods are scale-invariant, count compositional data are not; consequently, the conclusions we draw from LRA of lattice compositions depend on the scale of counts involved. We know that additive variation affects the relative abundance of small counts more than large counts; here we show that additive (quantization) variation comes from the discrete nature of count data itself, as well as (biological) variation in the system under study and (technical) variation from measurement and analysis processes. Variation due to quantization is inevitable, but its impact on conclusions depends on the underlying scale and distribution of counts. We illustrate the different distributions of real molecular bioscience data from different experimental settings to show why it is vital to understand the distributional characteristics of count data before applying and drawing conclusions from compositional data analysis methods.
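
A minimal sketch (with invented counts, not data from the paper) of why additive quantization variation hits rare features hardest: a single extra count barely moves the log-ratio of an abundant feature but shifts a rare one dramatically:

```python
import numpy as np

# Same lattice composition at a fixed depth; one additive count (the smallest
# possible quantization step) perturbs the log-ratio of a rare feature far
# more than an abundant one. Illustrative numbers only.
counts = np.array([2, 2000, 7998])          # rare, abundant, remainder
bumped = counts + np.array([1, 1, 0])       # add one count to the first two

ref = counts[2]                             # use the third feature as reference
shift_rare = abs(np.log(bumped[0]/ref) - np.log(counts[0]/ref))
shift_abundant = abs(np.log(bumped[1]/ref) - np.log(counts[1]/ref))
print(round(shift_rare, 4), round(shift_abundant, 6))
print(shift_rare > 100 * shift_abundant)    # True
```

The rare feature's log-ratio moves by log(3/2) ≈ 0.405, the abundant one's by only log(2001/2000) ≈ 0.0005, which is why the scale and distribution of counts must be examined before log-ratio analysis.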


2016 ◽  
Vol 54 (1) ◽  
pp. 143-151 ◽  
Author(s):  
Enrique García Ordóñez ◽  
María del Carmen Iglesias Pérez ◽  
Carlos Touriño González

Abstract The aim of the present study was to identify groups of offensive performance indicators which best discriminated between a match score (favourable, balanced or unfavourable) in water polo. The sample comprised 88 regular season games (2011-2014) from the Spanish Professional Water Polo League. The offensive performance indicators were clustered in five groups: Attacks in relation to the different playing situations; Shots in relation to the different playing situations; Attacks outcome; Origin of shots; Technical execution of shots. The variables of each group had a constant sum which equalled 100%. The data were compositional data, therefore the variables were transformed by means of the additive log-ratio (alr) transformation. Multivariate discriminant analyses to compare the match scores were calculated using the transformed variables. With regard to the percentage of correct classification, the results showed the group that discriminated the most between the match scores was “Attacks outcome” (60.4% for the original sample and 52.2% for cross-validation). The performance indicators that discriminated the most between the match scores in games with penalties were goals (structure coefficient (SC) = .761), counterattack shots (SC = .541) and counterattacks (SC = .481). In matches without penalties, goals were the primary discriminating factor (SC = .576). This approach provides a new tool to compare the importance of the offensive performance groups and their effect on the match score discrimination.
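
The alr transformation used above maps a D-part constant-sum composition to D-1 unconstrained coordinates by taking logs against one reference part. A sketch with hypothetical (not the study's) percentages:

```python
import numpy as np

def alr(x, ref=-1):
    """Additive log-ratio: log of each part relative to a chosen reference part."""
    x = np.asarray(x, dtype=float)
    return np.log(np.delete(x, ref) / x[ref])

# Hypothetical 4-part performance composition summing to 100%
outcome = np.array([30.0, 25.0, 35.0, 10.0])
y = alr(outcome)            # D-1 = 3 coordinates, suitable for discriminant analysis
print(np.round(y, 3))
print(y.shape)              # (3,)
```

The resulting coordinates are no longer bound by the 100% closure, so standard multivariate methods such as discriminant analysis can be applied to them.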


2020 ◽  
Author(s):  
Luis P.V. Braga ◽  
Dina Feigenbaum

Abstract Background Covid-19 cases data pose an enormous challenge to any analysis. The evaluation of such a global pandemic requires matching reports that follow different procedures and even overcoming some countries’ censorship that restricts publications. Methods This work proposes a methodology that could assist future studies. Compositional Data Analysis (CoDa) is proposed as the proper approach, as Covid-19 cases data are compositional in nature. Under this methodology, three attributes were selected for each country: cumulative number of deaths (D); cumulative number of recovered patients (R); present number of patients (A). Results After the operation called closure, with c = 1, a ternary diagram and log-ratio plots, as well as compositional statistics, are presented. Cluster analysis is then applied, splitting the countries into discrete groups. Conclusions This methodology can also be applied to other data sets, such as those for cities, provinces or districts, in order to help authorities and governmental agencies improve their actions against a pandemic.
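
The closure operation named above simply rescales the three attributes to a constant sum c = 1. A sketch with invented case numbers:

```python
import numpy as np

def closure(x, c=1.0):
    """Scale a vector of parts so they sum to the constant c."""
    x = np.asarray(x, dtype=float)
    return c * x / x.sum()

# Hypothetical country record: deaths (D), recovered (R), present patients (A)
d_r_a = np.array([5_000, 80_000, 15_000])
comp = closure(d_r_a, c=1)
print(np.round(comp, 3))            # [0.05 0.8  0.15]
print(np.isclose(comp.sum(), 1.0))  # closed to the simplex
```

After closure, each country is a point on the 3-part simplex and can be placed in a ternary diagram or compared via log-ratios regardless of population size or testing volume.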


GigaScience ◽  
2019 ◽  
Vol 8 (9) ◽  
Author(s):  
Thomas P Quinn ◽  
Ionas Erb ◽  
Greg Gloor ◽  
Cedric Notredame ◽  
Mark F Richardson ◽  
...  

Abstract Background Next-generation sequencing (NGS) has made it possible to determine the sequence and relative abundance of all nucleotides in a biological or environmental sample. A cornerstone of NGS is the quantification of RNA or DNA presence as counts. However, these counts are not counts per se: their magnitude is determined arbitrarily by the sequencing depth, not by the input material. Consequently, counts must undergo normalization prior to use. Conventional normalization methods require a set of assumptions: they assume that the majority of features are unchanged and that all environments under study have the same carrying capacity for nucleotide synthesis. These assumptions are often untestable and may not hold when heterogeneous samples are compared. Results Methods developed within the field of compositional data analysis offer a general solution that is assumption-free and valid for all data. Herein, we synthesize the extant literature to provide a concise guide on how to apply compositional data analysis to NGS count data. Conclusions In highlighting the limitations of total library size, effective library size, and spike-in normalizations, we propose the log-ratio transformation as a general solution to answer the question, “Relative to some important activity of the cell, what is changing?”
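
The point that log-ratios sidestep the arbitrary sequencing depth can be sketched as follows (hypothetical counts, not from the paper): the same material sequenced at two depths yields identical log-ratios relative to a chosen reference feature.

```python
import numpy as np

# Two sequencing runs of the same sample at different depths (invented counts).
# The library size differs, but log-ratios against a reference feature are
# identical -- the depth factor cancels.
shallow = np.array([10, 40, 50])
deep = shallow * 137                         # same material, 137x the depth

lr_shallow = np.log(shallow / shallow[-1])   # reference = last feature
lr_deep = np.log(deep / deep[-1])
print(np.allclose(lr_shallow, lr_deep))      # True
```

No assumption about unchanged features or equal carrying capacity is needed; only the choice of reference encodes what "important activity of the cell" the changes are measured against.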


2020 ◽  
Vol 2 (4) ◽  
Author(s):  
Laura Sisk-Hackworth ◽  
Scott T Kelley

Abstract Compositional data analysis (CoDA) methods have increased in popularity as a new framework for analyzing next-generation sequencing (NGS) data. CoDA methods, such as the centered log-ratio (clr) transformation, adjust for the compositional nature of NGS counts, which is not addressed by traditional normalization methods. CoDA has only been sparsely applied to NGS data generated from microbial communities or to multiple ‘omics’ datasets. In this study, we applied CoDA methods to analyze NGS and untargeted metabolomic datasets obtained from bacterial and fungal communities. Specifically, we used clr transformation to reanalyze NGS amplicon and metabolomics data from a study investigating the effects of building material type, moisture and time on microbial and metabolomic diversity. Compared to analysis of untransformed data, analysis of clr-transformed data revealed novel relationships and stronger associations between sample conditions and microbial and metabolic community profiles.
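
One common way to compare clr-transformed community profiles, as a hedged sketch with invented taxon proportions, is the Aitchison distance, i.e. the Euclidean distance between clr vectors:

```python
import numpy as np

def clr(x):
    x = np.asarray(x, dtype=float)
    return np.log(x) - np.log(x).mean()

def aitchison_dist(a, b):
    """Euclidean distance between clr-transformed compositions."""
    return np.linalg.norm(clr(a) - clr(b))

# Hypothetical taxon profiles from two samples
s1 = np.array([0.6, 0.3, 0.1])
s2 = np.array([0.5, 0.3, 0.2])
print(round(aitchison_dist(s1, s2), 3))

# Scale invariance: unnormalized counts give the same distance
print(np.isclose(aitchison_dist(s1 * 1e4, s2), aitchison_dist(s1, s2)))  # True
```

Because the distance is invariant to the total, clustering or ordination on clr data reflects relative community structure rather than sequencing effort, which is one reason clr reanalysis can reveal associations masked in untransformed data.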


Molecules ◽  
2021 ◽  
Vol 26 (19) ◽  
pp. 5752
Author(s):  
Matthias Templ ◽  
Barbara Templ

In recent years, many analyses have been carried out to investigate the chemical components of food data. However, studies rarely consider the compositional pitfalls of such analyses. This is problematic as it may lead to arbitrary results when non-compositional statistical analysis is applied to compositional datasets. In this study, compositional data analysis (CoDa), which is widely used in other research fields, is compared with classical statistical analysis to demonstrate how the results vary depending on the approach and to identify the most appropriate one. For example, honey and saffron are highly susceptible to adulteration and imitation, so the determination of their chemical elements requires the best possible statistical analysis. Our study demonstrated how principal component analysis (PCA) and classification results are influenced by the pre-processing steps conducted on the raw data, and by the replacement strategies for missing values and non-detects. Furthermore, it demonstrated the differences in results when compositional and non-compositional methods were applied. Our results suggested that the outcome of the log-ratio analysis provided better separation between the pure and adulterated data and allowed for easier interpretability of the results and a higher accuracy of classification. Similarly, it showed that classification with artificial neural networks (ANNs) works poorly if the CoDa pre-processing steps are left out. From these results, we advise the application of CoDa methods for analyses of the chemical elements of food and for the characterization and authentication of food products.
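
One of the pre-processing steps mentioned above is the replacement of non-detects, since log-ratios are undefined at zero. A simple multiplicative replacement strategy can be sketched as follows (the composition and the delta are illustrative assumptions, not values from the study):

```python
import numpy as np

def multiplicative_replacement(x, delta=1e-5):
    """Replace zeros/non-detects by a small delta, shrinking the remaining
    parts so the composition still sums to 1 (multiplicative strategy)."""
    x = np.asarray(x, dtype=float)
    zeros = x == 0
    return np.where(zeros, delta, x * (1 - delta * zeros.sum()))

# Hypothetical element composition with one non-detect (already closed to 1)
sample = np.array([0.70, 0.25, 0.05, 0.0])
repl = multiplicative_replacement(sample)
print(repl)
print(np.isclose(repl.sum(), 1.0))   # closure is preserved
print(np.all(repl > 0))              # now safe to log-transform before PCA
```

Because the non-zero parts are shrunk multiplicatively rather than additively, their ratios to one another are preserved, which is what makes this strategy compatible with subsequent log-ratio analysis.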


Author(s):  
Thomas P. Quinn ◽  
Ionas Erb

Abstract In the health sciences, many data sets produced by next-generation sequencing (NGS) only contain relative information because of biological and technical factors that limit the total number of nucleotides observed for a given sample. As mutually dependent elements, it is not possible to interpret any component in isolation, at least without normalization. The field of compositional data analysis (CoDA) has emerged with alternative methods for relative data based on log-ratio transforms. However, NGS data often contain many more features than samples, and thus require creative new ways to reduce the dimensionality of the data without sacrificing interpretability. The summation of parts, called amalgamation, is a practical way of reducing dimensionality, but can introduce a non-linear distortion to the data. We exploit this non-linearity to propose a powerful yet interpretable dimension reduction method. In this report, we present data-driven amalgamation as a new method and conceptual framework for reducing the dimensionality of compositional data. Unlike expert-driven amalgamation, which requires prior domain knowledge, our data-driven amalgamation method uses a genetic algorithm to answer the question, “What is the best way to amalgamate the data to achieve the user-defined objective?”. We present a user-friendly R package, called amalgam, that can quickly find the optimal amalgamation to (a) preserve the distance between samples, or (b) classify samples as diseased or not. Our benchmark on 13 real data sets confirms that these amalgamations compete with the state-of-the-art unsupervised and supervised dimension reduction methods in terms of performance, but result in new variables that are much easier to understand: they are groups of features added together.
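
Amalgamation itself is just a grouped sum followed by re-closure; the non-linearity noted above shows up in log-ratio space. A sketch with an invented composition and grouping (not the amalgam package's algorithm):

```python
import numpy as np

def closure(x):
    x = np.asarray(x, dtype=float)
    return x / x.sum()

def amalgamate(x, groups):
    """Sum the parts within each group, then re-close (amalgamation)."""
    x = np.asarray(x, dtype=float)
    return closure(np.array([x[g].sum() for g in groups]))

# Six-part composition amalgamated into two groups (hypothetical grouping)
x = closure(np.array([4.0, 1.0, 2.0, 3.0, 5.0, 5.0]))
groups = [np.array([0, 1, 2]), np.array([3, 4, 5])]
amal = amalgamate(x, groups)
print(np.round(amal, 3))             # two interpretable summed variables

# Non-linearity: the log-ratio of the group sums is not an average of the
# member parts' log-ratios
lr_of_sums = np.log(amal[0] / amal[1])
mean_of_lrs = np.mean([np.log(x[i] / x[j]) for i, j in zip(groups[0], groups[1])])
print(np.isclose(lr_of_sums, mean_of_lrs))  # False
```

A data-driven method, as in the abstract, would search over possible `groups` assignments (e.g. with a genetic algorithm) to optimize a user-defined objective such as distance preservation or classification accuracy.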


2018 ◽  
Author(s):  
Thomas P. Quinn ◽  
Ionas Erb ◽  
Greg Gloor ◽  
Cedric Notredame ◽  
Mark F. Richardson ◽  
...  

Abstract Next-generation sequencing (NGS) has made it possible to determine the sequence and relative abundance of all nucleotides in a biological or environmental sample. Today, NGS is routinely used to understand many important topics in biology from human disease to microorganism diversity. A cornerstone of NGS is the quantification of RNA or DNA presence as counts. However, these counts are not counts per se: the magnitude of the counts is determined arbitrarily by the sequencing depth, not by the input material. Consequently, counts must undergo normalization prior to use. Conventional normalization methods require a set of assumptions: they assume that the majority of features are unchanged, and that all environments under study have the same carrying capacity for nucleotide synthesis. These assumptions are often untestable and may not hold when comparing heterogeneous samples (e.g., samples collected across distinct cancers or tissues). Instead, methods developed within the field of compositional data analysis offer a general solution that is assumption-free and valid for all data. In this manuscript, we synthesize the extant literature to provide a concise guide on how to apply compositional data analysis to NGS count data. In doing so, we review zero replacement, differential abundance analysis, and within-group and between-group coordination analysis. We then discuss how this pipeline can accommodate complex study design, facilitate the analysis of vertically and horizontally integrated data, including multiomics data, and further extend to single-cell sequencing data. In highlighting the limitations of total library size, effective library size, and spike-in normalizations, we propose the log-ratio transformation as a general solution to answer the question, “Relative to some important activity of the cell, what is changing?”.
Taken together, this manuscript establishes the first fully comprehensive analysis protocol that is suitable for any and all -omics data.
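
The coordination analysis mentioned above is often based on the variance of pairwise log-ratios (Aitchison's variation): two features are coordinated (proportional) across samples when that variance is near zero. A sketch with invented gene counts:

```python
import numpy as np

# Hypothetical counts for three genes across five samples
gene_a = np.array([10., 20., 40., 15., 30.])
gene_b = gene_a * 2.5                        # perfectly proportional to gene_a
gene_c = np.array([50., 12., 90., 33., 7.])  # unrelated to gene_a

# Log-ratio variance: ~0 signals proportionality, large values signal none
vlr_ab = np.var(np.log(gene_a / gene_b))
vlr_ac = np.var(np.log(gene_a / gene_c))
print(round(vlr_ab, 6), round(vlr_ac, 3))
print(vlr_ab < 1e-12 and vlr_ac > 0.1)       # True
```

Because the statistic uses only ratios, it is unaffected by per-sample sequencing depth, which is what makes it a valid association measure where correlation on relative counts is not.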

