Correlations between microbial parameters from water samples: expectations and reality
Data collected to estimate the correlation between parameters must be analysed with caution. Classical correlation statistics are often inappropriate: the “r” statistic is very easily distorted by non-Normal data, and non-parametric statistics can be helpful. The interpretation and usefulness of correlation estimates depend on the study plan. If water samples come from disparate sources (e.g. upstream or downstream from sewage outlets), then parameters A and B may occur in their highest and lowest numbers according to how close the samples were to contamination sources, and so appear closely correlated. However, if all samples come from sources with similar pollution levels, then plots of A against B will show considerable scatter and apparently little correlation. So what is the relationship between A and B? An example of “perfect” correlation, replicate counts of a single parameter from split samples, gave an r value of only 0.63 (ρ = 0.62) because of random variation in the numbers of organisms between the two halves of each sample. Large amounts of data are therefore needed to study true correlation, because relationships between parameters are embedded in the natural variation. The same example illustrates that two halves of the same sample can respectively “pass” and “fail” a standard for a single parameter. Study design is clearly of fundamental importance, and consideration must be given to the appropriate way of asking questions about correlation between different parameters.
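The split-sample effect described above can be illustrated with a small simulation. The sketch below is not the authors' analysis and uses no data from the study: it assumes that each half of a sample yields an independent Poisson count around a shared underlying mean, that the means are spread uniformly over a chosen range, and that "r" and "ρ" denote Pearson's and Spearman's coefficients respectively. All function names and parameter values are hypothetical.

```python
import math
import random


def poisson(rng, mean):
    """Draw one Poisson count (Knuth's method; adequate for modest means)."""
    limit = math.exp(-mean)
    k = 0
    p = rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k


def simulate_split_samples(n_samples, low, high, seed=0):
    """Simulate counts from two halves of each sample.

    Assumption: both halves share one underlying mean, drawn uniformly
    from [low, high], and differ only by Poisson sampling variation.
    """
    rng = random.Random(seed)
    first, second = [], []
    for _ in range(n_samples):
        mean = rng.uniform(low, high)
        first.append(poisson(rng, mean))
        second.append(poisson(rng, mean))
    return first, second


def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """Ranks starting at 1, with tied values given their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result


def spearman_rho(xs, ys):
    """Spearman rank correlation: Pearson's r computed on the ranks."""
    return pearson_r(ranks(xs), ranks(ys))


if __name__ == "__main__":
    # Hypothetical ranges: similar pollution levels vs. disparate sources.
    for label, low, high in [("similar", 5.0, 15.0), ("disparate", 1.0, 200.0)]:
        a, b = simulate_split_samples(2000, low, high, seed=1)
        print(label, round(pearson_r(a, b), 2), round(spearman_rho(a, b), 2))
```

Under these assumptions the expected correlation between the two halves is v / (m + v), where m and v are the mean and variance of the underlying means, so it stays below 1 even though both halves measure the same parameter, and it falls as the sources become more alike. Real counts may be more variable than Poisson, so the sketch should be read as a qualitative illustration only.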