Patterns of 'Analytical Irreproducibility' in Multimodal Diseases
Abstract
Background: Multimodal diseases are those in which affected individuals can be divided into subtypes (or ‘data modes’), for instance ‘mild’ vs. ‘severe’, based on (unknown) modifiers of disease severity, as exemplified by the majority of microbiome-mediated human diseases. Studies have shown that, despite the inclusion of a large number of subjects, the causal role of the microbiome in human diseases remains uncertain. The role of the microbiome in multimodal diseases has been studied in animals; however, findings are often deemed irreproducible or unreasonably biased, with pathogenic roles described in 95% of reports. As a solution to repeatability, investigators have recently been advised to seek funds to increase the number of human-microbiome donors (N) and thereby increase the reproducibility of animal studies. Herein, we outline the constraints of that recommendation.
Results: Using published (observed) mean±SD microbiome data from human gut microbiota (hGM)-associated rodent studies, we illustrate through a series of simulations that increasing N will not uniformly or universally enable the identification of consistent statistical differences (patterns of analytical irreproducibility), owing to random sampling from a population with ample variability in disease and the presence of ‘disease data subtypes’ (or modes). To visualize data distribution, we used kernel-density-violin plots (rarely used in rodent studies; 0%, 0/38, 95%CI=6.9e-18,9.1) as a method to identify ‘disease data subtypes’. We also found that hGM preclinical rodent studies do not use cluster statistics when needed (97.4%, 37/38, 95%CI=86.5,99.5), and that scientists who increased N concurrently reduced the number of mice per donor (y = -0.21x, R² = 0.24; and vice versa), indicating that, statistically, scientists replace the disease variance in mice with the variance of human disease in their studies.
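The kind of simulation described in the Results can be illustrated with a minimal sketch. It is not the authors' code, and it does not use the published mean±SD values. It assumes a hypothetical healthy population drawn from N(0, 1) and a hypothetical bimodal disease population that mixes a ‘mild’ mode, N(0, 1), with a ‘severe’ mode, N(2, 1). It also approximates the two-sided p-value of a Welch-type statistic with a normal distribution, which is a simplification at small N. The function names and parameters are invented for illustration.

```python
import random
from statistics import NormalDist, mean, variance


def draw_disease(rng, severe_fraction=0.5):
    """Draw one donor from a hypothetical bimodal disease population."""
    if rng.random() < severe_fraction:
        return rng.gauss(2.0, 1.0)  # assumed 'severe' mode
    return rng.gauss(0.0, 1.0)      # assumed 'mild' mode


def approx_p_value(a, b):
    """Two-sided p-value of a Welch-type statistic (normal approximation)."""
    se = (variance(a) / len(a) + variance(b) / len(b)) ** 0.5
    z = (mean(a) - mean(b)) / se
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def fraction_significant(n_donors, n_studies=500, alpha=0.05, seed=1):
    """Share of simulated studies that report p < alpha for a given N."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_studies):
        healthy = [rng.gauss(0.0, 1.0) for _ in range(n_donors)]
        disease = [draw_disease(rng) for _ in range(n_donors)]
        if approx_p_value(disease, healthy) < alpha:
            hits += 1
    return hits / n_studies


# Repeat the same 'study' many times at several donor counts per group.
results = {n: fraction_significant(n) for n in (6, 12, 24)}
for n, frac in results.items():
    print(f"N={n:>2} donors/group: {frac:.0%} of simulated studies significant")
```

Under these assumptions, a share of the simulated studies reaches significance and the remainder does not at every N tested, so two laboratories sampling the same population can report opposite conclusions. The exact percentages depend on the assumed means, SDs, and mixing fraction.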
Conclusion: Instead of assuming that increasing N will solve reproducibility and identify clinically-predictive findings on causality in preclinical microbiome studies, we propose visualizing data distribution with kernel-density-violin plots to identify ‘disease data subtypes’, and thereby to self-correct, guide and promote the personalized investigation of disease subtype mechanisms.
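A violin plot is a mirrored kernel density estimate, so the subtype check proposed here can be sketched numerically as a count of the peaks in that estimate. The example below is a minimal illustration and not the authors' procedure. The rule-of-thumb bandwidth, the 10% height cut-off, the helper names, and the synthetic quantile-spaced data are all assumptions chosen for clarity.

```python
import math
from statistics import NormalDist, stdev


def kde(data, bandwidth):
    """Return a Gaussian kernel density estimate as a callable."""
    scale = 1.0 / (len(data) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return scale * sum(
            math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in data
        )

    return density


def count_modes(data, grid_size=200, min_rel_height=0.10):
    """Count density peaks that reach min_rel_height of the tallest peak."""
    # Normal-reference rule-of-thumb bandwidth (an assumed choice).
    bandwidth = 1.06 * stdev(data) * len(data) ** (-1 / 5)
    density = kde(data, bandwidth)
    lo = min(data) - 3 * bandwidth
    hi = max(data) + 3 * bandwidth
    step = (hi - lo) / (grid_size - 1)
    heights = [density(lo + i * step) for i in range(grid_size)]
    floor = min_rel_height * max(heights)
    modes = 0
    for i in range(1, grid_size - 1):
        is_peak = heights[i - 1] < heights[i] >= heights[i + 1]
        if is_peak and heights[i] >= floor:
            modes += 1
    return modes


def quantile_sample(mu, sigma, n):
    """Deterministic synthetic sample: evenly spaced normal quantiles."""
    dist = NormalDist(mu, sigma)
    return [dist.inv_cdf((i + 0.5) / n) for i in range(n)]


# Hypothetical data: one homogeneous group vs. two well-separated subtypes.
one_group = quantile_sample(0.0, 1.0, 100)
two_subtypes = quantile_sample(0.0, 1.0, 100) + quantile_sample(6.0, 1.0, 100)
print(count_modes(one_group), count_modes(two_subtypes))
```

With real measurements, the peak count is sensitive to the bandwidth and to sample size, so it works best as a prompt to inspect the plotted distribution (for example with matplotlib's `violinplot`) rather than as a formal test of multimodality.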