A comparison of generalised linear models and compositional models for ordered categorical data
Ordered categorical data occur in many applied fields, such as geochemistry, econometrics, sociology, demography and even transportation research, for example as the results of questionnaires. There are several ways to model the proportions of the individual categories. Generalised linear models (GLMs) are traditionally used for this purpose, but methods of compositional data analysis (CoDa) can also be considered. Here, both approaches are compared in depth. In particular, the different assumptions the models make about variability are highlighted, and the advantages and disadvantages of the individual models are pointed out. The CoDa model may be inappropriate when the variability of the compositional coordinates depends on the regressors, for example because the coordinates are based on different total counts, whereas the GLM may considerably underestimate the uncertainty of the predictions in the case of large-scale data.