logratio transformation

2021 ◽  
Vol 12 ◽  
Author(s):  
Michael Greenacre ◽  
Marina Martínez-Álvaro ◽  
Agustín Blasco

Microbiome and omics datasets are, by their intrinsic biological nature, of high dimensionality, characterized by counts of large numbers of components (microbial genes, operational taxonomic units, RNA transcripts, etc.). These data are generally regarded as compositional since the total number of counts identified within a sample is irrelevant. The central concept in compositional data analysis is the logratio transformation, the simplest being the additive logratios with respect to a fixed reference component. A full set of additive logratios is not isometric, that is, they do not reproduce the geometry of all pairwise logratios exactly, but their lack of isometry can be measured by the Procrustes correlation. The reference component can be chosen to maximize the Procrustes correlation between the additive logratio geometry and the exact logratio geometry, and for high-dimensional data there are many potential references. As a secondary criterion, minimizing the variance of the reference component's log-transformed relative abundance values makes the subsequent interpretation of the logratios even easier. On each of three high-dimensional omics datasets the additive logratio transformation was performed, using references that were identified according to the abovementioned criteria. For each dataset the compositional data structure was successfully reproduced, that is, the additive logratios were very close to being isometric. The Procrustes correlations achieved for these datasets were 0.9991, 0.9974, and 0.9902, respectively. We thus demonstrate, for high-dimensional compositional data, that additive logratios can provide a valid choice as transformed variables, which (a) are subcompositionally coherent, (b) explain 100% of the total logratio variance and (c) come measurably very close to being isometric.
The interpretation of additive logratios is much simpler than the complex isometric alternatives and, when the variance of the log-transformed reference is very low, it is even simpler since each additive logratio can be identified with a corresponding compositional component.
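
The reference-selection idea in this abstract can be sketched as follows. This is a minimal, unweighted illustration on simulated data, not the authors' implementation: the function names (`close`, `clr`, `alr`, `procrustes_correlation`, `best_reference`) and the simulated counts are assumptions made for the example. The centered logratios stand in for the exact logratio geometry, which they reproduce up to a constant scale factor, and the Procrustes correlation is computed as the sum of singular values of the cross-product of the two centered, unit-norm configurations.

```python
import numpy as np

def close(counts):
    """Convert counts to relative abundances (each row sums to 1)."""
    return counts / counts.sum(axis=1, keepdims=True)

def clr(X):
    """Centered logratios: log parts minus the row mean of the logs."""
    L = np.log(X)
    return L - L.mean(axis=1, keepdims=True)

def alr(X, ref):
    """Additive logratios log(x_j / x_ref) for every part j except ref."""
    L = np.log(X)
    return np.delete(L, ref, axis=1) - L[:, [ref]]

def procrustes_correlation(A, B):
    """Procrustes correlation between two sample configurations.

    Both matrices are column-centered and scaled to unit Frobenius norm;
    the correlation is then the sum of singular values of A.T @ B.
    """
    A = A - A.mean(axis=0)
    B = B - B.mean(axis=0)
    A = A / np.linalg.norm(A)
    B = B / np.linalg.norm(B)
    singular_values = np.linalg.svd(A.T @ B, compute_uv=False)
    return float(singular_values.sum())

def best_reference(X):
    """Try every part as ALR reference; return the best index and all correlations."""
    target = clr(X)
    corrs = np.array([procrustes_correlation(target, alr(X, j))
                      for j in range(X.shape[1])])
    return int(np.argmax(corrs)), corrs

# Simulated, strictly positive "counts" (hypothetical data, 40 samples x 12 parts).
rng = np.random.default_rng(0)
counts = rng.lognormal(mean=3.0, sigma=1.0, size=(40, 12))
X = close(counts)

best, corrs = best_reference(X)
print("best reference:", best, "Procrustes correlation:", round(corrs[best], 4))
```

For datasets with thousands of components the same loop applies, although each candidate reference then costs one singular value decomposition, so a real analysis would likely need a more economical search.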


2021 ◽  
Author(s):  
Michael Greenacre ◽  
Marina Martinez-Alvaro ◽  
Agustin Blasco

Background: Microbiome and omics datasets are, by their intrinsic biological nature, of high dimensionality, characterized by counts of large numbers of components (microbial genes, operational taxonomic units, RNA transcripts, etc.). These data are generally regarded as compositional since the total number of counts identified within a sample is irrelevant. The central concept in compositional data analysis is the logratio transformation, the simplest being the additive logratios with respect to a fixed reference component. A full set of additive logratios is not isometric in the sense of reproducing the geometry of all pairwise logratios exactly, but their lack of isometry can be measured by the Procrustes correlation. The reference component can be chosen to maximize the Procrustes correlation between the additive logratio geometry and the exact logratio geometry, and for high-dimensional data there are many potential references. As a secondary criterion, minimizing the variance of the reference component's log-transformed relative abundance values makes the subsequent interpretation of the logratios even easier. Finally, it is preferable that the reference component not be a rare component but well populated, and substantive biological reasons might also guide the choice if several reference candidates are identified. Results: On each of three high-dimensional datasets the additive logratio transformation was performed, using references that were identified according to the abovementioned criteria. For each dataset the compositional data structure was successfully reproduced, that is, the additive logratios were very close to being isometric. The Procrustes correlations achieved for these datasets were 0.9991, 0.9977 and 0.9997, respectively. In the third case, where the objective was to distinguish between three groups of samples, the approximation was made to the restricted logratio space of the between-group variance.
Conclusions: We show that for high-dimensional compositional data additive logratios can provide a valid choice as transformed variables that (1) are subcompositionally coherent, (2) explain 100% of the total logratio variance and (3) come measurably very close to being isometric, that is, approximate almost perfectly the exact logratio geometry. The interpretation of additive logratios is simple and, when the variance of the log-transformed reference is very low, it is made even simpler since each additive logratio can be identified with a corresponding compositional component.
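
The secondary criterion can be illustrated with a small simulation. In this sketch the data, the variable names and the choice of a nearly constant first component are all assumptions for the example. It picks the part whose log-transformed relative abundance has the lowest variance and checks that each additive logratio with respect to that reference is then almost perfectly correlated with the log of its own numerator component.

```python
import numpy as np

rng = np.random.default_rng(1)
n, J = 50, 6

# Hypothetical compositions: part 0 has an almost constant relative abundance
# (about 0.2); the remaining parts share the rest of each row.
ref_part = 0.2 * np.exp(rng.normal(0.0, 0.01, size=n))
others = rng.lognormal(0.0, 1.0, size=(n, J - 1))
others = others / others.sum(axis=1, keepdims=True) * (1.0 - ref_part)[:, None]
X = np.column_stack([ref_part, others])

log_X = np.log(X)

# Secondary criterion: variance of each part's log relative abundance.
log_var = log_X.var(axis=0, ddof=1)
ref = int(np.argmin(log_var))

# Additive logratios with respect to the low-variance reference.
alr = np.delete(log_X, ref, axis=1) - log_X[:, [ref]]
log_parts = np.delete(log_X, ref, axis=1)

# Correlation of each ALR with the log of its own numerator part.
corr = [float(np.corrcoef(alr[:, j], log_parts[:, j])[0, 1])
        for j in range(J - 1)]
print("chosen reference:", ref)
print("smallest ALR-vs-log-part correlation:", round(min(corr), 5))
```

With a reference this stable, log(x_j / x_ref) differs from log(x_j) by an almost constant shift, which is why each logratio can be read as if it were the component itself.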


Geochemistry ◽  
2015 ◽  
Vol 75 (1) ◽  
pp. 117-132 ◽  
Author(s):  
Ahad Nazarpour ◽  
Nematolah Rashidnejad Omran ◽  
Ghodratolal Rostami Paydar ◽  
Behnam Sadeghi ◽  
Fatemeh Matroud ◽  
...  

Author(s):  
Vera Pawlowsky-Glahn ◽  
Ricardo A. Olea

In Chapter 6 we introduce additional aspects of the geostatistical approach presented in the preceding chapters that were not necessary for its theoretical development, but that are essential for the practical application of the method to compositional data. We discuss how to treat zeros in compositional data sets; how to model the required cross-covariances; how to compute expected values and estimation variances for the original, constrained variables; and how to build and interpret confidence intervals for estimated values. As mentioned in Section 2.1, data sets with many zeros are as troublesome in compositional analysis as they are in standard multivariate analysis. In our approach, the additional restriction for compositional data is that zero values are not admissible for modeling. The justification for this restriction can be given using arithmetic arguments. A transformation that uses logarithms cannot be performed on zero values. This is the case for the logratio transformation that leads to the definition of an additive logistic normal distribution, as introduced by Aitchison (1986, p. 113). It is also the case for the additive logistic skew-normal distribution defined in Mateu-Figueras et al. (1998), following previous results by Azzalini and Dalla Valle (1996). The centered logratio transformation and the family of multivariate Box-Cox transformations discussed in Andrews et al. (1971), Rayens and Srinivasan (1991), and Barceló-Vidal (1996) also call for the restriction of zero values. This restriction is certainly a wellspring of discussion, albeit surprisingly so, as nobody would complain about eliminating zeros either by simple suppression of samples or by substitution with reasonable values when dealing with a sample from a lognormal distribution in the univariate case. Recall that the logarithm of zero is undefined and the sample space of the lognormal distribution is the positive real line, excluding the origin.
In order to present our position on how to deal with zeros as clearly as possible, let us assume that only one of our components has zeros in some of the samples. Those cases where more than one variable is affected can be analyzed by methods described below.
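
The arithmetic argument, and the two generic remedies mentioned in the abstract (suppressing samples or substituting reasonable values), can be shown in a few lines. This is only an illustrative sketch and not the procedure developed in the chapter: the helper names `alr` and `substitute_zeros`, the example compositions and the replacement value 0.005 are all hypothetical.

```python
import math

def alr(composition, ref=-1):
    """Additive logratios of one composition with respect to part `ref`.

    Raises ValueError when any part is zero or negative, because the
    logarithm is undefined there.
    """
    if any(part <= 0 for part in composition):
        raise ValueError("logratios are undefined for zero or negative parts")
    ref_index = ref % len(composition)
    ref_value = composition[ref_index]
    return [math.log(part / ref_value)
            for i, part in enumerate(composition) if i != ref_index]

def substitute_zeros(composition, small_value):
    """Replace zeros by a small positive value, then rescale to the original total."""
    total = sum(composition)
    replaced = [small_value if part == 0 else part for part in composition]
    scale = total / sum(replaced)
    return [part * scale for part in replaced]

# Hypothetical three-part compositions; the second has a zero in one component.
samples = [[0.60, 0.30, 0.10], [0.70, 0.00, 0.30]]

# Remedy 1: suppress the samples that contain zeros.
kept = [s for s in samples if all(part > 0 for part in s)]

# Remedy 2: substitute an assumed small value (0.005 is purely illustrative).
fixed = [substitute_zeros(s, 0.005) for s in samples]

print(len(kept), [round(v, 3) for v in alr(fixed[1])])
```

Rescaling every part by the same factor leaves all ratios between parts unchanged, so the only logratios affected by the substitution are those involving the replaced component.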


1998 ◽  
Vol 34 (1-2) ◽  
pp. 117-120 ◽  
Author(s):  
Michal Kucera ◽  
Björn A. Malmgren
