Practical aspects of compositional data analysis
In Chapter 6 we introduce additional aspects of the geostatistical approach presented in the preceding chapters that were not necessary for its theoretical development, but that are essential for the practical application of the method to compositional data. We discuss how to treat zeros in compositional data sets; how to model the required cross-covariances; how to compute expected values and estimation variances for the original, constrained variables; and how to build and interpret confidence intervals for estimated values.
As mentioned in Section 2.1, data sets with many zeros are as troublesome in compositional analysis as they are in standard multivariate analysis. In our approach, compositional data carry the additional restriction that zero values are not admissible for modeling. The justification is arithmetic: a transformation that uses logarithms cannot be applied to zero values. This is the case for the logratio transformation that leads to the definition of the additive logistic normal distribution, as introduced by Aitchison (1986, p. 113). It is also the case for the additive logistic skew-normal distribution defined in Mateu-Figueras et al. (1998), following previous results by Azzalini and Dalla Valle (1996). The centered logratio transformation and the family of multivariate Box-Cox transformations discussed in Andrews et al. (1971), Rayens and Srinivasan (1991), and Barceló-Vidal (1996) likewise require that zero values be excluded.
This restriction is certainly a wellspring of discussion, albeit surprisingly so: when dealing with a sample from a lognormal distribution in the univariate case, nobody would complain about eliminating zeros, either by simply suppressing samples or by substituting reasonable values. Recall that the logarithm of zero is undefined and that the sample space of the lognormal distribution is the positive real line, excluding the origin.
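
The arithmetic argument can be illustrated with a minimal sketch in Python. The function names `alr`, `clr`, and `is_transformable` and the two example compositions are illustrative assumptions, not part of the original text; the additive logratio is taken here with the last part as divisor, which is one common convention.

```python
import math

def alr(composition):
    """Additive logratio: log of each part divided by the last part."""
    *parts, divisor = composition
    return [math.log(p / divisor) for p in parts]

def clr(composition):
    """Centered logratio: log of each part divided by the geometric mean."""
    logs = [math.log(p) for p in composition]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]

def is_transformable(composition):
    """A composition admits a logratio transform only if every part is > 0."""
    return all(p > 0 for p in composition)

# Hypothetical three-part compositions, each summing to 1.
positive = [0.2, 0.3, 0.5]
with_zero = [0.0, 0.4, 0.6]

for name, comp in (("positive", positive), ("with_zero", with_zero)):
    if is_transformable(comp):
        print(name, "alr:", alr(comp), "clr:", clr(comp))
    else:
        # math.log(0.0) raises ValueError, so the transform is skipped.
        print(name, "contains a zero part; logratio transform undefined")
```

The check in `is_transformable` only detects the problem; it does not decide whether the affected samples should be suppressed or the zeros substituted.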
In order to present our position on how to deal with zeros as clearly as possible, let us assume that only one of our components has zeros in some of the samples. Those cases where more than one variable is affected can be analyzed by methods described below.