A comparison of generalised linear models and compositional models for ordered categorical data

Ordered categorical data occur in many applied fields, such as geochemistry, econometrics, sociology, demography and even transportation research, for example as results from questionnaires. The proportions of the individual categories can be modelled in several ways. Generalised linear models (GLMs) are traditionally used for this purpose, but methods of compositional data analysis (CoDa) can also be considered. Here, both approaches are compared in depth. In particular, the different assumptions that the models make about variability are highlighted, and the advantages and disadvantages of each model are pointed out. The CoDa model may be inappropriate when the variability of the compositional coordinates depends on the regressors, for example because the coordinates are based on different total counts, whereas the GLM may considerably underestimate the uncertainty of its predictions for large-scale data.
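The point about total counts can be illustrated with a small simulation. The sketch below is not taken from the paper: the category proportions, the pseudo-count of 0.5 and the choice of centred log-ratio (clr) coordinates are all assumptions made for illustration. It draws multinomial counts for a small and a large total and compares the empirical variance of one log-ratio coordinate.

```python
import math
import random
import statistics

def clr(counts, pseudo=0.5):
    """Centred log-ratio coordinates; the pseudo-count (an assumption) avoids log(0)."""
    logs = [math.log(c + pseudo) for c in counts]
    centre = sum(logs) / len(logs)
    return [v - centre for v in logs]

def sample_counts(rng, probs, total):
    """Draw one multinomial count vector using only the standard library."""
    draws = rng.choices(range(len(probs)), weights=probs, k=total)
    return [draws.count(k) for k in range(len(probs))]

def clr_variance(probs, total, reps=300, seed=1):
    """Empirical variance of the first clr coordinate over repeated samples."""
    rng = random.Random(seed)
    first = [clr(sample_counts(rng, probs, total))[0] for _ in range(reps)]
    return statistics.variance(first)

probs = [0.2, 0.3, 0.5]  # hypothetical proportions of three ordered categories
var_small = clr_variance(probs, total=50)
var_large = clr_variance(probs, total=1000)
print(var_small, var_large)
```

With the same underlying proportions, the coordinate based on 50 responses varies far more than the one based on 1000, which is the kind of regressor-dependent variability the abstract warns about when total counts differ across observations.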

A measure of association for ordered categorical data in population-based studies

Statistical Methods in Medical Research ◽

10.1177/0962280216643347 ◽

2016 ◽

Vol 27 (3) ◽

pp. 812-831 ◽

Cited By ~ 4

Author(s):

Kerrie P Nelson ◽

Don Edwards

Keyword(s):

Categorical Data ◽

Large Scale ◽

Intraclass Correlation ◽

Disease Status ◽

Population Based ◽

Simulation Studies ◽

Ordered Categorical Data ◽

Measure Of Association ◽

Ordered Categorical ◽

Categorical Scale

Ordinal classification scales are commonly used to define a patient’s disease status in screening and diagnostic tests such as mammography. Challenges arise in agreement studies when evaluating the association between many raters’ classifications of patients’ disease or health status when an ordered categorical scale is used. In this paper, we describe a population-based approach and chance-corrected measure of association to evaluate the strength of relationship between multiple raters’ ordinal classifications where any number of raters can be accommodated. In contrast to Shrout and Fleiss’ intraclass correlation coefficient, the proposed measure of association is invariant with respect to changes in disease prevalence. We demonstrate how unique characteristics of individual raters can be explored using random effects. Simulation studies are conducted to demonstrate the properties of the proposed method under varying assumptions. The methods are applied to two large-scale agreement studies of breast cancer screening and prostate cancer severity.
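For orientation, the comparator named in this abstract can be sketched in a few lines. The code below computes the one-way random-effects intraclass correlation, ICC(1,1), in the Shrout and Fleiss family; it is not the chance-corrected measure the paper proposes. Treating ordinal categories as numeric scores and the example ratings are assumptions made only for illustration.

```python
def icc_one_way(ratings):
    """ICC(1,1) for a list of subjects, each a list of k raters' numeric scores."""
    n = len(ratings)
    k = len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    # Between-subject and within-subject mean squares from a one-way ANOVA.
    ms_between = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(ratings, row_means) for x in row
    ) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)

# Hypothetical ratings: 4 patients, 3 raters, ordinal scale coded 1..5.
perfect = [[1, 1, 1], [3, 3, 3], [5, 5, 5], [2, 2, 2]]
noisy = [[1, 2, 1], [3, 4, 2], [5, 4, 5], [2, 1, 3]]
print(icc_one_way(perfect), icc_one_way(noisy))
```

Because the between-subject mean square depends on how the subjects are spread across categories, a coefficient of this kind changes with disease prevalence, which is the limitation the proposed measure is said to avoid.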

Continuous and Ordered Categorical Data in Network Psychometrics: Which Estimation Method to Choose? Deriving Guidelines for Applied Researchers

10.31234/osf.io/mbycn ◽

2021 ◽

Author(s):

Adela-Maria Isvoranu ◽

Sacha Epskamp

Keyword(s):

Categorical Data ◽

Large Scale ◽

Graphical Model ◽

Large Body ◽

Estimation Method ◽

Psychological Research ◽

Estimation Methods ◽

Edge Weight ◽

Ordered Categorical Data ◽

Ordered Categorical

The Gaussian Graphical Model (GGM) has recently grown popular in psychological research, with a large body of estimation methods being proposed and discussed across various fields of study, and several algorithms being identified and recommended as applicable to psychological datasets. Such high-dimensional model estimation, however, is not trivial, and algorithms tend to perform differently in different settings. In addition, psychological research poses unique challenges, including a strong focus on weak edges (e.g., bridge edges), data measured on ordered scales, and relatively limited sample sizes. As a result, there is currently no consensus regarding which estimation procedure performs best in which setting. In this large-scale simulation study, we aimed to close this gap in the literature by comparing the performance of several estimation algorithms suitable for Gaussian and skewed ordered categorical data across a multitude of settings, so as to arrive at concrete guidelines for applied researchers. In total, we investigated 60 different metrics across 564,000 simulated datasets. We summarized our findings through a platform that allows for manually exploring simulation results. Overall, we found that a trade-off between discovery (e.g., sensitivity, edge weight correlation) and caution (e.g., specificity, precision) should always be expected, and that achieving both, which is a requirement for perfect replicability, is difficult. Further, we identified that the estimation method is best chosen in light of each research question and highlighted, alongside desirable asymptotic properties and low sample size discovery, results according to the most common research questions in the field.
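The discovery and caution metrics named above follow from a comparison of the estimated edge set with the true one. The following is a minimal sketch under assumed inputs: the function name, the edge lists and the node count are hypothetical, and the study's own implementation may differ.

```python
def edge_recovery(true_edges, est_edges, n_nodes):
    """Sensitivity, specificity and precision of an estimated undirected network."""
    def ratio(num, den):
        return num / den if den else float("nan")

    possible = n_nodes * (n_nodes - 1) // 2  # number of node pairs
    true_set = {frozenset(e) for e in true_edges}
    est_set = {frozenset(e) for e in est_edges}
    tp = len(true_set & est_set)   # true edges that were recovered
    fp = len(est_set - true_set)   # estimated edges that are not real
    fn = len(true_set - est_set)   # true edges that were missed
    tn = possible - tp - fp - fn   # absent edges correctly left out
    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "precision": ratio(tp, tp + fp),
    }

# Hypothetical 4-node network: three true edges, three estimated edges.
metrics = edge_recovery(
    true_edges=[(1, 2), (2, 3), (3, 4)],
    est_edges=[(1, 2), (3, 2), (1, 4)],
    n_nodes=4,
)
print(metrics)
```

An estimator that adds edges more freely tends to raise sensitivity while lowering specificity and precision, which is the trade-off the abstract describes.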

Generalized linear models for ordered categorical data

Communications in Statistics - Theory and Methods ◽

10.1080/03610926.2021.1921210 ◽

2021 ◽

pp. 1-14

Author(s):

Sture Holm

Keyword(s):

Generalized Linear Models ◽

Categorical Data ◽

Linear Models ◽

Ordered Categorical Data ◽

Ordered Categorical

Maximum Likelihood Methods for Association Models in Ordered Categorical Data: Multi-Way Case

Behaviormetrika ◽

10.2333/bhmk.15.23_85 ◽

1988 ◽

Vol 15 (23) ◽

pp. 85-91 ◽

Cited By ~ 3

Author(s):

Masaaki Tsujitani

Keyword(s):

Maximum Likelihood ◽

Categorical Data ◽

Likelihood Methods ◽

Association Models ◽

Ordered Categorical Data ◽

Maximum Likelihood Methods ◽

Ordered Categorical

Power and sample size for ordered categorical data

Statistical Methods in Medical Research ◽

10.1191/0962280203sm317ra ◽

2003 ◽

Vol 12 (1) ◽

pp. 73-84 ◽

Cited By ~ 11

Author(s):

N Rabbee ◽

B A Coull ◽

C Mehta ◽

N Patel ◽

P Senchaudhuri

Keyword(s):

Sample Size ◽

Categorical Data ◽

Ordered Categorical Data ◽

Ordered Categorical

The Gut Microbiota of Healthy Aged Chinese Is Similar to That of the Healthy Young

mSphere ◽

10.1128/msphere.00327-17 ◽

2017 ◽

Vol 2 (5) ◽

Cited By ~ 65

Author(s):

Gaorui Bian ◽

Gregory B. Gloor ◽

Aihua Gong ◽

Changsheng Jia ◽

Wei Zhang ◽

...

Keyword(s):

Data Analysis ◽

Gut Microbiota ◽

Large Scale ◽

Compositional Data ◽

Healthy Lifestyle ◽

Compositional Data Analysis ◽

Surprising Result ◽

Microbiota Composition ◽

Cross Sectional ◽

Age Cohorts

ABSTRACT The microbiota of the aged is variously described as being more or less diverse than that of younger cohorts, but the comparison groups used and the definitions of the aged population differ between experiments. The differences are often described by null hypothesis statistical tests, which are notoriously irreproducible when dealing with large multivariate samples. We collected and examined the gut microbiota of a cross-sectional cohort of more than 1,000 very healthy Chinese individuals who spanned ages from 3 to over 100 years. The analysis of 16S rRNA gene sequencing results used a compositional data analysis paradigm coupled with measures of effect size, where ordination, differential abundance, and correlation can be explored and analyzed in a unified and reproducible framework. Our analysis showed several surprising results compared to other cohorts. First, the overall microbiota composition of the healthy aged group was similar to that of people decades younger.
Second, the major differences between groups in the gut microbiota profiles were found before age 20. Third, the gut microbiota differed little between individuals from the ages of 30 to >100. Fourth, the gut microbiota of males appeared to be more variable than that of females. Taken together, the present findings suggest that the microbiota of the healthy aged in this cross-sectional study differ little from that of the healthy young in the same population, although the minor variations that do exist depend upon the comparison cohort. IMPORTANCE We report the large-scale use of compositional data analysis to establish a baseline microbiota composition in an extremely healthy cohort of the Chinese population. This baseline will serve for comparison for future cohorts with chronic or acute disease. In addition to the expected difference in the microbiota of children and adults, we found that the microbiota of the elderly in this population was similar in almost all respects to that of healthy people in the same population who are scores of years younger. We speculate that this similarity is a consequence of an active healthy lifestyle and diet, although cause and effect cannot be ascribed in this (or any other) cross-sectional design. One surprising result was that the gut microbiota of persons in their 20s was distinct from those of other age cohorts, and this result was replicated, suggesting that it is a reproducible finding and distinct from those of other populations.
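A core idea behind the compositional paradigm mentioned here is that only ratios between taxa carry information, so a sample's sequencing depth should not affect comparisons. The sketch below illustrates this with the Aitchison distance, the Euclidean distance between centred log-ratio vectors. The read counts are hypothetical and the code assumes strictly positive counts; real data need a zero-handling step that is not shown.

```python
import math

def clr(values):
    """Centred log-ratio transform of a strictly positive vector."""
    logs = [math.log(v) for v in values]
    centre = sum(logs) / len(logs)
    return [v - centre for v in logs]

def aitchison_distance(x, y):
    """Euclidean distance between the clr coordinates of two compositions."""
    return math.dist(clr(x), clr(y))

# Hypothetical read counts for three taxa in three samples.
deep = [100, 200, 700]      # deeply sequenced sample
shallow = [10, 20, 70]      # same proportions at one tenth of the depth
different = [700, 200, 100]

print(aitchison_distance(deep, shallow))    # effectively zero
print(aitchison_distance(deep, different))
```

Samples with identical proportions are at distance zero whatever their total read count, whereas a Euclidean distance on raw counts would separate them.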

Applying R²-Type Measures to Ordered Categorical Data

Technometrics ◽

10.1080/00401706.1986.10488114 ◽

1986 ◽

Vol 28 (2) ◽

pp. 133-138

Author(s):

Alan Agresti

Keyword(s):

Categorical Data ◽

Ordered Categorical Data ◽

Ordered Categorical

Multivariate Permutation Tests for Ordered Categorical Data

Springer Proceedings in Mathematics & Statistics - Nonparametric Statistics ◽

10.1007/978-3-030-57306-5_21 ◽

2020 ◽

pp. 227-238

Author(s):

Huiting Huang ◽

Fortunato Pesarin ◽

Rosa Arboretti ◽

Riccardo Ceccato

Keyword(s):

Categorical Data ◽

Permutation Tests ◽

Ordered Categorical Data ◽

Ordered Categorical

Tree-Aggregated Predictive Modeling of Microbiome Data

10.1101/2020.09.01.277632 ◽

2020 ◽

Author(s):

Jacob Bien ◽

Xiaohan Yan ◽

Léo Simpson ◽

Christian L. Müller

Keyword(s):

Data Analysis ◽

Predictive Modeling ◽

Large Scale ◽

High Throughput Sequencing ◽

Compositional Data ◽

Low Cost ◽

Primary Data ◽

Compositional Data Analysis ◽

Taxonomic Rank ◽

Microbiome Data

Abstract: Modern high-throughput sequencing technologies provide low-cost microbiome survey data across all habitats of life at unprecedented scale. At the most granular level, the primary data consist of sparse counts of amplicon sequence variants or operational taxonomic units that are associated with taxonomic and phylogenetic group information. In this contribution, we leverage the hierarchical structure of amplicon data and propose a data-driven, parameter-free, and scalable tree-guided aggregation framework to associate microbial subcompositions with response variables of interest. The excess number of zero or low count measurements at the read level forces traditional microbiome data analysis workflows to remove rare sequencing variants or group them by a fixed taxonomic rank, such as genus or phylum, or by phylogenetic similarity. By contrast, our framework, which we call trac (tree-aggregation of compositional data), learns data-adaptive taxon aggregation levels for predictive modeling, making user-defined aggregation obsolete while simultaneously integrating seamlessly into the compositional data analysis framework. We illustrate the versatility of our framework in the context of large-scale regression problems in human-gut, soil, and marine microbial ecosystems. We posit that the inferred aggregation levels provide highly interpretable taxon groupings that can help microbial ecologists gain insights into the structure and functioning of the underlying ecosystem of interest.

A REVIEW OF THE ANALYSIS OF ORDERED CATEGORICAL DATA (Part II)

Kodo Keiryogaku (The Japanese Journal of Behaviormetrics) ◽

10.2333/jbhmk.13.33 ◽

1985 ◽

Vol 13 (1) ◽

pp. 33-43 ◽

Cited By ~ 1

Author(s):

Masaaki Tsujitani

Keyword(s):

Categorical Data ◽

Ordered Categorical Data ◽

Ordered Categorical
