Exploratory Analysis of Provenance Data Using R and the Provenance Package

Minerals ◽  
2019 ◽  
Vol 9 (3) ◽  
pp. 193 ◽  
Author(s):  
Pieter Vermeesch

The provenance of siliciclastic sediment may be traced using a wide variety of chemical, mineralogical and isotopic proxies. These define three distinct data types: (1) compositional data such as chemical concentrations; (2) point-counting data such as heavy mineral compositions; and (3) distributional data such as zircon U-Pb age spectra. Each of these three data types requires separate statistical treatment. Central to any such treatment is the ability to quantify the ‘dissimilarity’ between two samples. For compositional data, this is best done using a logratio distance. Point-counting data may be compared using the chi-square distance, which deals better with missing components (zero values) than the logratio distance does. Finally, distributional data can be compared using the Kolmogorov–Smirnov and related statistics. For small datasets using a single provenance proxy, data interpretation can sometimes be done by visual inspection of ternary diagrams or age spectra. However, this no longer works for larger and more complex datasets. This paper reviews a number of multivariate ordination techniques to aid the interpretation of such studies. Multidimensional Scaling (MDS) is a generally applicable method that displays the salient similarities and differences between multiple samples as a configuration of points in which similar samples plot close together and dissimilar samples plot far apart. For compositional data, classical MDS analysis of logratio data is shown to be equivalent to Principal Component Analysis (PCA). The resulting MDS configurations can be augmented with compositional information as biplots. For point-counting data, classical MDS analysis of chi-square distances is shown to be equivalent to Correspondence Analysis (CA). This technique also produces biplots. Thus, MDS provides a common platform to visualise and interpret all types of provenance data. Generalising the method to three-way dissimilarity tables provides an opportunity to combine several datasets together and thereby facilitate the interpretation of ‘Big Data’. This paper presents a set of tutorials using the statistical programming language R. It illustrates the theoretical underpinnings of compositional data analysis, PCA, MDS and other concepts using toy examples, before applying these methods to real datasets with the provenance package.
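
The three dissimilarity measures named in this abstract can be sketched in a few lines. The sketch below is written in Python rather than R and does not use the provenance package; the function names are illustrative assumptions, and the formulas are the standard textbook definitions (Aitchison distance on centred logratios, chi-square distance on row profiles, two-sample Kolmogorov–Smirnov statistic), which may differ in detail from the paper's own implementation.

```python
import bisect
import math


def clr(composition):
    """Centred logratio transform of a strictly positive composition."""
    logs = [math.log(v) for v in composition]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


def logratio_distance(x, y):
    """Aitchison distance: Euclidean distance between clr-transformed parts."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(clr(x), clr(y))))


def chi_square_distance(table, i, j):
    """Chi-square distance between rows i and j of a table of counts.

    Zero counts are allowed; columns whose total is zero are skipped.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    squared = 0.0
    for k, col_total in enumerate(col_totals):
        if col_total == 0:
            continue
        # Difference between the two row profiles, weighted by inverse column mass.
        diff = table[i][k] / row_totals[i] - table[j][k] / row_totals[j]
        squared += (grand_total / col_total) * diff ** 2
    return math.sqrt(squared)


def ks_statistic(sample_a, sample_b):
    """Largest vertical gap between the two empirical cumulative distributions."""
    a, b = sorted(sample_a), sorted(sample_b)
    largest_gap = 0.0
    for value in a + b:
        cdf_a = bisect.bisect_right(a, value) / len(a)
        cdf_b = bisect.bisect_right(b, value) / len(b)
        largest_gap = max(largest_gap, abs(cdf_a - cdf_b))
    return largest_gap
```

The logratio distance ignores the overall scale of each sample and fails on zero parts, because the logarithm of zero is undefined. The chi-square distance accepts zero counts, which is the contrast the abstract draws. A pairwise matrix of any of these distances is the kind of input that an MDS routine would take.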

2013 ◽  
Vol 11 (1) ◽  
pp. 85-99
Author(s):  
Polina Blagojevic ◽  
Niko Radulovic

It was recently confirmed that relative abundances of m/z values of the average mass scan of the total GC chromatograms (AMS) are suitable variables for multivariate statistical comparison (MVA) of essential oils. These are even more applicable, reliable and faster than the traditionally used variables, namely the percentages (peak areas) of individual oil constituents. Herein, we have explored whether AMS-derived variables are appropriate for MVA comparison of plant solvent extract compositional data. To achieve this, average mass scans of the total GC chromatograms and chemical compositions (relative percentages) of eight diethyl ether extracts (six different species; samples were analyzed using GC-FID and GC-MS; data from the literature) were separately compared using two MVA methods: agglomerative hierarchical clustering analysis and principal component analysis. The obtained results strongly suggest that MVA of complex volatile mixtures (GC-MS analyzable fractions of plant solvent extracts), using the corresponding AMS, could be considered a promising time-saving tool for easy and reliable comparison purposes. The AMS approach gives comparable or even better results than the traditional method.


2020 ◽  
Author(s):  
Kamila Fačevicová ◽  
Tomáš Matys Grygar ◽  
Karel Hron ◽  
Jitka Elznicová

Fluvial sediment datasets, like other types of concentration-based data, are relative in nature and therefore need preprocessing or normalization prior to the main statistical analysis. In geochemical practice, several normalization methods are used, such as simple normalization of the target element concentration by the concentration of a reference (conservative, lithogenic) element, double normalization, or conversion of concentrations to a local enrichment factor. As an alternative to these methods, an approach using the principles of compositional data analysis (CoDA) can be considered. Instead of applying standard statistical methods, such as ordinary least squares regression, correlation or principal component analysis (PCA), to the raw or target-element-normalized concentrations, CoDA methods consider the relative structure of the whole dataset. CoDA, together with robust statistical methods that down-weight the influence of outlying observations, has the potential to provide more accurate results. This property is demonstrated and discussed using a dataset from mapping of the sediments of the Skalka Reservoir on the Ohře River, Czech Republic, and its tributaries. In particular, the performance of robust versions of regression, correlation and principal component analysis that respect CoDA principles will be presented, and the path to them will be explained.


1992 ◽  
Vol 117 (2) ◽  
pp. 239-242 ◽  
Author(s):  
L.E. Parent ◽  
M. Dafir

The premises underlying univariate (CVA = critical value approach) and bivariate (DRIS = diagnosis and recommendation integrated system) diagnostic systems were reexamined with regard to compositional data analysis (CDA). CDA recognizes a structure of dependence among plant nutrients and the bounded sum constraint to one (the whole composition equals 100% or 1), and removes the curvature problem carried by crude components and by dual ratios or logratios when treated in isolation. Linearization by “row-centered logratioing” of nutrient fractions shows great potential for carrying out multivariate diagnosis and principal component analysis on nutrient data. Compositional nutrient diagnosis (CND) is supported by the theory of CDA. CND is the multivariate expansion of CVA and DRIS and is fully compatible with PCA. CND takes all possible nutrient interactions into account. CND nutrient indices are composed of two separate functions, one considering differences between nutrient levels, another examining differences between nutrient balances (as defined by nutrient geometric means), of individual and target specimens. These functions indicate that nutrient insufficiency can be corrected by either adding a single nutrient or taking advantage of multiple nutrient interactions to improve nutrient balance as a whole. A theoretical interpretative table is presented for CND.


2020 ◽  
Author(s):  
Leila Jahangiry ◽  
Robabeh Parviz ◽  
Mojgan Mirghafourvand ◽  
Maryam Khazaee-Pool ◽  
Koen Ponnet

Abstract Background: To measure the severity of menopausal complaints and determine the pattern of menopausal symptoms, a valid and reliable instrument is needed in women’s healthcare. The Menopause Rating Scale (MRS) is one of the best-known tools developed in response to the lack of standardized scales. The purpose of this study was to examine the psychometric properties of the MRS in an Iranian sample. Methods: Participants were randomly selected from women referred to healthcare centers in Miandoab, West Azerbaijan, Iran. A total of 330 questionnaires were completed (response rate of 96.9%). Two samples were considered for analysis in the validation process. An exploratory factor analysis (EFA) was conducted on the first sample (n1 = 165), and a confirmatory factor analysis (CFA) was done using a second study sample (n2 = 165). The psychometric evaluation was concluded with an assessment of internal consistency and test-retest reliability. Results: The EFA with Principal Component Analysis extracted three factors explaining 75.47% cumulative variance. The CFA confirmed a three-factor structure of the 11-item MRS. All fit indices proved to be satisfactory. The relative chi-square (χ2/df) was 3.686 (p < .001). The Root Mean Square Error of Approximation (RMSEA) of the model was .04 (90% CI = .105 – .150). All comparative indices of the model, including the Comparative Fit Index, Normed Fit Index, and Relative Fit Index, were at least .80 (.90, .87, and .80, respectively). For the overall scale, Cronbach’s alpha was .931, whereas the alpha for the subscales ranged from 0.705 to 0.950. The intraclass correlation was .91 (95% CI = .89-.93), p < 0.001. Conclusion: The results of the study indicate that the Persian model of the MRS is a valid and reliable scale. As a screening tool, the Persian MRS could be used to identify the pattern of symptoms among menopausal, premenopausal, and postmenopausal women to care for and educate them on how to identify and treat the symptoms.


2018 ◽  
Vol 28 (9) ◽  
pp. 2834-2847 ◽  
Author(s):  
M Solans ◽  
G Coenders ◽  
R Marcos-Gragera ◽  
A Castelló ◽  
E Gràcia-Lavedan ◽  
...  

Instead of looking at individual nutrients or foods, dietary pattern analysis has emerged as a promising approach to examine the relationship between diet and health outcomes. Despite dietary patterns being compositional (i.e. usually a higher intake of some foods implies that less of other foods are being consumed), compositional data analysis has not yet been applied in this setting. We describe three compositional data analysis approaches (compositional principal component analysis, balances and principal balances) that enable the extraction of dietary patterns by using control subjects from the Spanish multicase-control (MCC-Spain) study. In particular, principal balances overcome the limitations of purely data-driven or investigator-driven methods and present dietary patterns as trade-offs between eating more of some foods and less of others.


Author(s):  
Mohammad Alrwashdeh ◽  
Jian Kai Yu ◽  
Aniseh Abdalla ◽  
Kan Wang

Several methods have been developed to evaluate the best fit for nuclear data parameters. These methods rely on a sequence of logical steps that should be followed to obtain accurate and reliable results, and they can be subdivided according to: (1) the physical model used; (2) data types; (3) statistical methods; and (4) problems. This paper discusses the statistical methods used to evaluate the best fit for nuclear data. Finding a real and reasonable solution to a data-fitting problem can be made easier by choosing the right fitting method. Different methods converge differently depending on several factors, such as the correct choice of fitting function and the number of fitting parameters. Here we discuss two methods. The first is differential evolution for nonlinear data, which optimizes a problem by iteratively improving a candidate solution with regard to a measure of quality. The second is nonlinear regression, which uses a model function that is not linear in its parameters to estimate the relationships among variables, capturing the trend in the data with a single function that minimizes the sum of squared residuals (or chi-square) with respect to a set of parameters a = {A1, A2, …, An}.
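
As an illustration of the two ideas in this abstract, the following sketch minimises a chi-square cost for a model that is nonlinear in its parameters, using a basic differential evolution loop. It assumes the first method refers to the standard differential evolution algorithm (the rand/1/bin variant is used here). The exponential model, the synthetic noise-free data, the bounds and all control settings are assumptions made for the example and are not taken from the paper.

```python
import math
import random


def chi_square(params, xs, ys, sigmas, model):
    """Sum of squared residuals, each scaled by its measurement uncertainty."""
    return sum(((y - model(x, params)) / s) ** 2
               for x, y, s in zip(xs, ys, sigmas))


def differential_evolution(cost, bounds, pop_size=20, generations=300,
                           weight=0.8, crossover=0.9, seed=0):
    """Minimise cost(params) inside box bounds with a rand/1/bin scheme."""
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    costs = [cost(member) for member in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Three distinct population members, none of them the current one.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            forced = rng.randrange(dim)  # at least one mutated coordinate
            trial = []
            for d in range(dim):
                if d == forced or rng.random() < crossover:
                    value = pop[a][d] + weight * (pop[b][d] - pop[c][d])
                    lo, hi = bounds[d]
                    value = min(max(value, lo), hi)  # clip to the search box
                else:
                    value = pop[i][d]
                trial.append(value)
            trial_cost = cost(trial)
            if trial_cost <= costs[i]:  # greedy selection: keep improvements
                pop[i], costs[i] = trial, trial_cost
    best = min(range(pop_size), key=costs.__getitem__)
    return pop[best], costs[best]


def exponential(x, params):
    """Hypothetical model that is nonlinear in its second parameter."""
    amplitude, rate = params
    return amplitude * math.exp(-rate * x)


# Synthetic, noise-free data generated from known parameters (illustration only).
xs = [0.5 * i for i in range(11)]
ys = [exponential(x, (2.0, 0.5)) for x in xs]
sigmas = [0.1] * len(xs)

best_params, best_chi2 = differential_evolution(
    lambda p: chi_square(p, xs, ys, sigmas, exponential),
    bounds=[(0.0, 10.0), (0.0, 5.0)],
)
```

Differential evolution needs no derivatives and no starting guess beyond the bounds, at the price of many cost evaluations. A gradient-based nonlinear regression routine would typically reach the same minimum in far fewer evaluations when given a reasonable starting point.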


2020 ◽  
Vol 12 (10) ◽  
pp. 4293 ◽  
Author(s):  
Marco Cruz-Sandoval ◽  
Elisabet Roca ◽  
María Isabel Ortego

The location and context in which people live influences and conditions their opportunities in life. This becomes relevant in a world subject to rapid urban and demographic growth, in which different economic, social, and political forces generate and accentuate disparities in cities. The foregoing generates an unequal distribution of the different social groups in the territory known as socio-spatial segregation. The study of this phenomenon incorporates a large number of variables belonging to different dimensions. Nonetheless, few studies have addressed socio-spatial segregation with a multivariate analysis approach. In addition, the existing studies may have obtained misleading outcomes by not acknowledging the inherent compositional nature of their variables. The objective of the present study is twofold: (i) To assess whether the phenomenon of socio-spatial segregation in Guadalajara, Mexico exists; and (ii) to introduce and stress the use of compositional techniques for the study of socio-spatial segregation. The study applied principal component analysis and cluster analysis considering the compositional nature of census variables, particularly from economic and educative indicators. In addition, the study used geographical information tools to depict and interpret the results. The results are intended to serve in the fulfillment of the Sustainable Development Goals towards inclusive and sustainable cities.


Minerals ◽  
2020 ◽  
Vol 10 (6) ◽  
pp. 501
Author(s):  
Caterina Gozzi ◽  
Roberta Sauro Graziano ◽  
Antonella Buccianti

Nature is often characterized by systems that are far from thermodynamic equilibrium, and rivers are not an exception for the Earth’s critical zone. When the chemical composition of stream waters is investigated, it emerges that riverine systems behave as complex systems. This means that the compositions have properties that depend on the integrity of the whole (i.e., the composition with all the chemical constituents), properties that arise thanks to the innumerable nonlinear interactions between the elements of the composition. The presence of interconnections indicates that the properties of the whole cannot be fully understood by examining the parts of the system in isolation. In this work, we propose investigating the complexity of riverine chemistry by using the CoDA (Compositional Data Analysis) methodology and the performance of the perturbation operator in the simplex geometry. With riverine bicarbonate considered as a key component of regional and global biogeochemical cycles and Ca2+ considered as mostly related to the weathering of carbonatic rocks, perturbations were calculated for subsequent couples of compositions after ranking the data for increasing values of the log-ratio ln(Ca2+/HCO3−). Numerical values were analyzed by using robust principal component analysis and non-parametric correlations between compositional parts (heat map) associated with distributional and multifractal methods. The results indicate that HCO3−, Ca2+, Mg2+ and Sr2+ are more resilient, thus contributing to compositional changes for all the values of ln(Ca2+/HCO3−) to a lesser degree with respect to the other chemical elements/components. Moreover, the complementary cumulative distribution function of all the sequences tracing the compositional change and the nonlinear relationship between the Q-th moment versus the scaling exponents for each of them indicate the presence of multifractal variability, thus revealing scaling properties of the fluctuations.
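
The perturbation operator mentioned in this abstract is the simplex analogue of addition: parts are multiplied component by component and the result is rescaled to a constant sum. The sketch below uses the standard CoDA definitions to compute the perturbation that carries each composition to the next one after ranking by ln(Ca2+/HCO3−). The three-part compositions are made up for the example and the function names are assumptions, not the authors' code.

```python
import math


def closure(parts):
    """Rescale positive parts so that they sum to one."""
    total = sum(parts)
    return [value / total for value in parts]


def perturb(x, p):
    """Perturbation in the simplex: component-wise product, then closure."""
    return closure([a * b for a, b in zip(x, p)])


def perturbation_difference(x, y):
    """The perturbation p for which perturb(x, p) equals closure(y)."""
    return closure([b / a for a, b in zip(x, y)])


# Made-up compositions with parts ordered as (Ca, HCO3, other); not study data.
samples = [
    [0.30, 0.50, 0.20],
    [0.10, 0.60, 0.30],
    [0.20, 0.55, 0.25],
]

# Rank by the log-ratio ln(Ca/HCO3), then take perturbations between neighbours.
ranked = sorted(samples, key=lambda s: math.log(s[0] / s[1]))
steps = [perturbation_difference(a, b) for a, b in zip(ranked, ranked[1:])]
```

Each entry of `steps` is itself a composition. A part that stays close to 1/D (with D the number of parts) across all steps changes little relative to the others, which is one way to read the resilience described in the abstract.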


Author(s):  
Miquel Carreras Simó ◽  
Germà Coenders

Financial ratios are often used in principal component analysis and related techniques for the purposes of data reduction and visualization. Besides the dependence of results on ratio choice, ratios themselves pose a number of problems when subjected to a principal component analysis, such as skewed distributions. In this work, we put forward an alternative method drawn from compositional data analysis (CoDa), a standard statistical toolbox for use when data convey information about relative magnitudes, as financial ratios do. The method, referred to as the CoDa biplot, does not rely on any particular choice of financial ratio but allows researchers to visually order firms along the pairwise financial ratios for any two accounts. Non-financial magnitudes and time evolution can be added to the visualization as desired. We show an example of its application to the top chains in the Spanish grocery retail sector and show how the technique can be used to depict strategic management differences in financial structure or performance, and their evolution over time.

