Geostatistical Analysis of Compositional Data

Published By Oxford University Press

9780195171662, 9780197565513

Author(s):  
Vera Pawlowsky-Glahn ◽  
Ricardo A. Olea

This chapter discusses the estimation of a coregionalization of size q using cokriging. Cokriging, a multivariate extension of kriging, is the usual procedure applied to multivariate regionalized problems within the framework of geostatistics. It is based on a distribution-free, linear, unbiased estimator with minimum estimation variance, although the absence of constraints on the estimator implicitly assumes that multidimensional real space is the sample space of the variables under consideration. If a multivariate normal distribution can be assumed for the vector random function, then the simple kriging estimator is identical with the conditional expectation, given a sample of size N. See Journel (1977, pp. 576-577), Journel (1980, pp. 288-290), Cressie (1991, p. 110), and Diggle, Tawn, and Moyeed (1998, p. 300) for further details. This estimator is in general the best possible linear estimator, as it is unbiased and has minimum estimation variance, but it is not very robust in the face of strong departures from normality. Therefore, for the estimation of regionalized compositions other distributions must also be taken into consideration. Recall that compositions cannot follow a multivariate normal distribution by definition, their sample space being the simplex. Consequently, regionalized compositions in general cannot be modeled under explicit or implicit assumptions of multivariate Gaussian processes. Here only the multivariate lognormal and additive logistic normal distributions will be addressed. Besides the logarithmic and additive logratio transformations, others can be applied, such as the multivariate Box-Cox transformation, as stated by Andrews et al. (1971), Rayens and Srinivasan (1991), and Barceló-Vidal (1996). Furthermore, distributions such as the multiplicative logistic normal distribution introduced by Aitchison (1986, p. 131) or the additive logistic skew-normal distribution defined by Azzalini and Dalla Valle (1996) can be investigated in a similar fashion. References to the literature for the fundamental principles of the theory discussed in this chapter were given in Chapter 2. Among those, special attention is drawn to the work of Myers (1982), where the matrix formulation of cokriging was first presented and the properties included in the first section of this chapter were stated.
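
As a rough illustration of the simple kriging estimator mentioned above, the following Python sketch treats a single variable observed at two locations on a line. It is not the cokriging system of the chapter. The exponential covariance model, its parameters, and the function names are assumptions made only for this example.

```python
import math

def exp_cov(h, sill=1.0, a=10.0):
    """Hypothetical exponential covariance model C(h) = sill * exp(-|h| / a)."""
    return sill * math.exp(-abs(h) / a)

def simple_kriging_2pts(x, z, x0, mean):
    """Simple kriging of Z(x0) from two data (x[0], z[0]) and (x[1], z[1]).

    The mean is assumed known, as simple kriging requires.
    Returns (estimate, estimation variance, weights).
    """
    # Data-to-data covariances (symmetric 2x2 matrix).
    c11 = exp_cov(0.0)
    c22 = exp_cov(0.0)
    c12 = exp_cov(x[0] - x[1])
    # Data-to-target covariances.
    c10 = exp_cov(x[0] - x0)
    c20 = exp_cov(x[1] - x0)
    # Solve the 2x2 system C * lam = c0 by Cramer's rule.
    det = c11 * c22 - c12 * c12
    lam1 = (c10 * c22 - c12 * c20) / det
    lam2 = (c11 * c20 - c12 * c10) / det
    # Linear estimator built on deviations from the known mean.
    estimate = mean + lam1 * (z[0] - mean) + lam2 * (z[1] - mean)
    # Simple kriging variance: C(0) minus the weighted target covariances.
    variance = exp_cov(0.0) - (lam1 * c10 + lam2 * c20)
    return estimate, variance, (lam1, lam2)
```

At a data location the sketch reproduces the observed value with zero estimation variance, and far from the data it falls back to the assumed mean. Nothing in the estimator keeps the result inside a constrained sample space such as the simplex.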


Author(s):  
Vera Pawlowsky-Glahn ◽  
Ricardo A. Olea

In Chapter 6 we introduce additional aspects of the geostatistical approach presented in the preceding chapters that were not necessary for its theoretical development, but that are essential for the practical application of the method to compositional data. We discuss how to treat zeros in compositional data sets; how to model the required cross-covariances; how to compute expected values and estimation variances for the original, constrained variables; and how to build and interpret confidence intervals for estimated values. As mentioned in Section 2.1, data sets with many zeros are as troublesome in compositional analysis as they are in standard multivariate analysis. In our approach, the additional restriction for compositional data is that zero values are not admissible for modeling. The justification for this restriction is arithmetic: a transformation that uses logarithms cannot be performed on zero values. This is the case for the logratio transformation that leads to the definition of an additive logistic normal distribution, as introduced by Aitchison (1986, p. 113). It is also the case for the additive logistic skew-normal distribution defined in Mateu-Figueras et al. (1998), following previous results by Azzalini and Dalla Valle (1996). The centered logratio transformation and the family of multivariate Box-Cox transformations discussed in Andrews et al. (1971), Rayens and Srinivasan (1991), and Barceló-Vidal (1996) also require the exclusion of zero values. This restriction is certainly a wellspring of discussion, albeit surprisingly so, as nobody would complain about eliminating zeros either by simple suppression of samples or by substitution with reasonable values when dealing with a sample from a lognormal distribution in the univariate case. Recall that the logarithm of zero is undefined and the sample space of the lognormal distribution is the positive real line, excluding the origin. In order to present our position on how to deal with zeros as clearly as possible, let us assume that only one of our components has zeros in some of the samples. Those cases where more than one variable is affected can be analyzed by methods described below.
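
The arithmetic argument can be made concrete with a small Python sketch. It shows that a logratio cannot be formed when a part is zero, and then applies one possible substitution rule. The rule used here (set zeros to a small value `delta` and shrink the other parts so the total is unchanged) and the value of `delta` are assumptions for illustration, not the procedure described in the chapter.

```python
import math

def log_ratio(a, b):
    """Return log(a / b); both parts must be strictly positive."""
    if a <= 0 or b <= 0:
        raise ValueError("logratio is undefined for zero or negative parts")
    return math.log(a / b)

def replace_zeros(x, delta):
    """Set zero parts to delta and shrink the others so the total is unchanged.

    Hypothetical rule for illustration; delta is an assumed small value.
    """
    total = sum(x)
    n_zero = sum(1 for v in x if v == 0)
    scale = (total - n_zero * delta) / total
    return [delta if v == 0 else v * scale for v in x]

sample = [0.0, 0.6, 0.4]                 # first component recorded as zero
adjusted = replace_zeros(sample, 0.01)   # strictly positive, same total
# Logratios of the first two parts against the last part.
ratios = [log_ratio(v, adjusted[-1]) for v in adjusted[:-1]]
```

The choice of `delta` affects the transformed values strongly, because the logarithm diverges as its argument approaches zero, so any substituted value should be justified by the data (for example, by a detection limit).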


Author(s):  
Vera Pawlowsky-Glahn ◽  
Ricardo A. Olea

In this chapter we set out the rationale for the analysis of regionalized compositions. Required definitions for nonregionalized compositions are extended to vector random functions and necessary concepts from the theory of regionalized variables are related to vector random functions that form a composition. In order to avoid continually repeating references to literature, the reader is referred especially to the works of Matheron (1971) and Aitchison (1986), on which the following developments are based. Here the exposition is very concise; its purpose is basically to introduce terminology and notation. Proofs analogous to those of the nonregionalized case are omitted, for the most part. In general they can be derived directly from the corresponding definitions. For concepts of matrix algebra required by this work, refer to Kemény (1984) and Golub and Van Loan (1989). There are many excellent textbooks that treat concepts of probability theory and multivariate statistics. We have used mainly the books by Fahrmeir and Hamerle (1984) and Krzanowski (1988), and others have served as complementary bibliography, e.g., Feller (1968), Kendall and Stuart (1979), Kendall et al. (1983), Kres (1983), Stuart and Ord (1987), Johnson et al. (1994), and Kotz et al. (2000). A similar situation holds for the foundations of univariate geostatistics; refer to David (1977), Journel and Huijbregts (1978), Rendu (1978), Clark (1979), Isaaks and Srivastava (1989), Samper-Calvete and Carrera-Ramírez (1990), Cressie (1991), Goovaerts (1997), Chilès and Delfiner (1999), and Olea (1999). Treatments of multivariate geostatistics are found in Matheron (1979), François-Bongarçon (1981), Carr et al. (1985), and Wackernagel (1998). We base our presentation mainly on Journel and Huijbregts (1978) and Deutsch and Journel (1998), but also on Myers (1982), in which the matrix formulation of cokriging is given. Geostatistical terminology conforms, as far as possible, to that found in the Geostatistical Glossary and Multilingual Dictionary, compiled by members of the 1984-1989 IAMG Committee on Geostatistics and edited by R. A. Olea (1991).


Author(s):  
Vera Pawlowsky-Glahn ◽  
Ricardo A. Olea

Geological data, notably geochemical data, often take the form of a regionalized composition. The concept of regionalized composition combines the concepts of composition and coregionalization. A composition, also known in the literature as a closed array (Chayes 1962), is a random vector whose components add up to a constant. A coregionalization is a set of two or more regionalized variables defined over the same spatial domain, which is modeled as a realization of a vector random function. Here the term regionalized composition is used both for the vector random function used to model a composition and for the realization that we can observe. A regionalized composition can be, for example, a heavy-mineral suite along a river valley. The minerals are quantitatively determined through frequency counts and represented as percent-proportions of the entire heavy-mineral occurrence. Another example is the set of grades in a lead-copper-zinc deposit. In this instance, not all components of each specimen are quantitatively recorded, and the grades are not expressed as proportions of the whole of the measured components: only a small fraction of the composition, in ppm, is accounted for in each specimen. The problem with the statistical analysis of compositions has been stated historically in terms of correlations: the covariances are subject to essential nonstochastic controls, i.e., distortions which are due to the constant-sum constraint. These numerically induced covariances and correlations arise also with regionalized compositions and are called spurious spatial correlations. They falsify the picture of the spatial covariance structure and can lead to misinterpretations. This problem arises not only when the whole regionalized composition is analyzed, but also when interest lies only in a subvector. A second problem, singularity of the covariance matrix of a composition, has generally been considered only from a numerical point of view. Singularity is a direct consequence of the constant-sum constraint and, as in other multivariate methods, it rules out the use of estimation techniques such as cokriging of all components. Numerically the problem can be tackled either by taking generalized inverses or, equivalently, by leaving one component out to avoid singularity of the matrices of coefficients.
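
Both effects of the constant-sum constraint can be reproduced numerically. The Python sketch below draws three independent positive variables, closes them to a unit sum, and computes the sample covariance matrix of the closed data. The lognormal basis, the sample size, and the seed are assumptions chosen only for the demonstration.

```python
import random

def closure(w, total=1.0):
    """Rescale a vector of positive parts so that they add up to `total`."""
    s = sum(w)
    return [total * v / s for v in w]

def covariance_matrix(rows):
    """Sample covariance matrix of a list of equally long observations."""
    n = len(rows)
    d = len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [
        [sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
         for j in range(d)]
        for i in range(d)
    ]

random.seed(0)
# Independent positive variables: no dependence before closure.
basis = [[random.lognormvariate(0.0, 0.5) for _ in range(3)] for _ in range(500)]
comps = [closure(w) for w in basis]

cov = covariance_matrix(comps)
# Each row of the covariance matrix sums to zero (up to rounding), because
# every component has zero covariance with the constant total.
row_sums = [sum(row) for row in cov]
```

Because every row sums to zero while the diagonal is positive, each component must have at least one negative covariance with another component even though the underlying variables were independent, and the matrix is singular, so it cannot be inverted directly in a system that uses all components.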


Author(s):  
Vera Pawlowsky-Glahn ◽  
Ricardo A. Olea

Methods for spatial correlation analysis and estimation of r-compositions introduced in the foregoing chapters are illustrated here by an example that draws upon real data taken from the Lyons West oil field located in west-central Kansas, USA. Data consist of core analyses of water saturation, saturated thickness and average reservoir porosity over the connate saturated interval at different locations in the Lyons West field. These data are used to compare different possible methods for predicting regionalized compositions. The methods we consider are: (1) a direct approach for estimating compositional variables derived from the original measurements; (2) the basis method, applicable only when there is a random function that can be regarded as the size or accumulation of the regionalized variable under study; and (3) the logratio approach, using the additive logratio (alr) transformation. Kriging and cokriging estimation methods will be considered for original compositions and for transformed data. Software used for statistical analyses includes GSLIB, programs written by Ma and Yao (2001), and ad hoc programs written by the authors. GSLIB is a public-domain library of geostatistical programs written in Fortran (Deutsch and Journel 1998); the other programs are available from their authors. The Lyons West oil field is located at 98° 15' west longitude and 38° 20' north latitude in west-central Kansas, near the center of the United States. The reservoir occurs in Mississippian (Lower Carboniferous) rocks that originated as sediments deposited in the shallow interior sea that covered much of North America in the late Paleozoic. The field was discovered somewhat accidentally in 1963, during the drilling of a deeper Ordovician prospect. Initial oil in place was estimated at 22 million stock-tank barrels of oil. The genesis of the reservoir, composed of carbonate-cemented sands, is interpreted as an offshore bar enclosed in marine shales. Regional uplift tilted the sand body, which was truncated along the western margins by the unconformity marking the base of the Pennsylvanian (Upper Carboniferous). The sandstones interfinger with marine shales to the east, but the eastern margin of the reservoir is defined by the intersection of the oil-water contact with the shale seal at the top of the reservoir interval (Ehm 1965).
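
The transformation behind the logratio approach can be sketched in a few lines of Python. The example shows the additive logratio (alr) transformation with the last part as divisor, together with its inverse, which maps any real vector back to a composition with unit sum. The function names and the choice of the last part as divisor are assumptions for this illustration; the sketch does not reproduce the GSLIB programs or the authors' own software.

```python
import math

def alr(x):
    """Additive logratio transform: log(x_i / x_D) for i = 1, ..., D-1."""
    if any(v <= 0 for v in x):
        raise ValueError("all parts must be strictly positive")
    return [math.log(v / x[-1]) for v in x[:-1]]

def alr_inverse(y):
    """Inverse alr: map a vector of D-1 reals to a D-part composition."""
    expd = [math.exp(v) for v in y] + [1.0]   # the divisor part maps to exp(0)
    s = sum(expd)
    return [v / s for v in expd]

composition = [0.2, 0.3, 0.5]
transformed = alr(composition)          # two unconstrained real values
recovered = alr_inverse(transformed)    # back on the simplex
```

Any values produced in the transformed space, for instance by kriging or cokriging, return to positive parts with a constant sum under `alr_inverse`. Note that back-transforming an estimate in this way does not by itself give the expected value of the original composition.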


Author(s):  
Vera Pawlowsky-Glahn ◽  
Ricardo A. Olea

Concepts of null correlation for r-compositions are discussed in this chapter, following the methodology developed by J. Aitchison for the statistical analysis of compositional data, combined with G. Matheron’s theory of regionalized variables. These concepts are to be understood in the sense of absence de corrélation différée (absence of deferred correlation), as defined by Matheron (1965). Concepts of null correlation are important not only for spatial-structure analysis of r-compositions, but also for simulation of phenomena that can be described by the use of r-compositions. The intrinsic analogue to the definitions of null correlation in the second-order stationary case is developed in parallel in this chapter because the relation between them is of special interest. All of the following concepts depend in general on the length of the vector h and also on its direction and sign; that is, they can be defined depending on the length, or set of lengths, and the direction, or set of directions, of h, or both. Therefore, as in Chapter 3, statements will be made for h ∈ H, where H stands for a set of vectors with specified range of directions and range of lengths. Here, for example, H may contain all possible directions and lengths for h, except h = 0; in this case, statements will be valid only in a spatial sense, but not in a standard nonspatial sense.


Author(s):  
Vera Pawlowsky-Glahn ◽  
Ricardo A. Olea

As in time series analysis (Natke 1983), the concept of covariance between components of a spatially distributed random vector Z(u) leads to: direct covariances, Cov[Z_i(u), Z_j(u)]; shifted covariances or spatial covariances, Cov[Z_i(u), Z_j(u + h)], also known as cross-covariance functions; and autocovariance functions, Cov[Z_i(u), Z_i(u + h)]. The direct covariances may be thought of as a special case of the cross-covariance functions (for h = 0), and the same holds for the autocovariance functions (for i = j), so there is no need for a separate discussion. To simplify the exposition, hereafter the term function is dropped, and only the terms cross-covariance and autocovariance are used. Pawlowsky (1984) stated that if the vector random function constitutes an r-composition, then the problem of spurious spatial correlations appears. This is evident from the fact that at each point of the domain Ω, as in the nonregionalized case, the natural sample space of an r-composition is the D-simplex. This aspect will be discussed in Section 3.1.1. Aitchison (1986) discussed the problematic nature of the covariance analysis of nonregionalized compositions. He circumvented the problem of spurious correlations by using the fact that the ratio of two arbitrary components of a basis is identical to the ratio of the corresponding components of the associated composition. To avoid working with ratios, which is always difficult, Aitchison took logarithms of the ratios. Dependencies among variables of a composition can then be examined in real space by analyzing the covariance structure of the log-quotients. The advantages of this approach are not only numerical or related to the ease of subsequent mathematical operations. Essentially they relate to the fact that the approach consists of a projection of the original sample space, the simplex S^D, onto a new sample space, namely real space ℝ^(D-1). Thus the door is open to many available methods and models based on the multivariate normal distribution. Recall that the multivariate normal distribution requires the sample space to be precisely the multidimensional, unconstrained real space. Strictly speaking, for this kind of model the components of the random vector to be analyzed must be unconstrained.
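
The three kinds of covariance distinguished above can be computed with one function. The Python sketch below estimates Cov[Z_i(u), Z_j(u + h)] from two series sampled at the same equally spaced locations along a line, with the lag h given as a whole number of sampling steps. The regular one-dimensional sampling, the function name, and the divisor (the number of pairs) are assumptions for the illustration.

```python
def cross_covariance(zi, zj, lag):
    """Empirical Cov[Z_i(u), Z_j(u + h)] for h = `lag` sampling steps, lag >= 0.

    zi and zj are sequences observed at the same equally spaced locations.
    """
    if lag < 0 or lag >= len(zi) or len(zi) != len(zj):
        raise ValueError("lag must satisfy 0 <= lag < len(zi) == len(zj)")
    # Pair the value of Z_i at u with the value of Z_j at u + h.
    pairs = [(zi[k], zj[k + lag]) for k in range(len(zi) - lag)]
    n = len(pairs)
    mean_i = sum(a for a, _ in pairs) / n
    mean_j = sum(b for _, b in pairs) / n
    return sum((a - mean_i) * (b - mean_j) for a, b in pairs) / n
```

Calling the function with `lag=0` gives the direct covariance, and calling it with the same series for both arguments gives the autocovariance, which mirrors the remark that both are special cases of the cross-covariance. Applied to logratios of the components rather than to the raw components, the same function works on unconstrained real values.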

