AbstractThe potential of normative modeling to make individualized predictions has led to structural neu-roimaging results that go beyond the case-control approach. However, site effects, often con-founded with variables of interest in a complex manner, induce a bias in estimates of normative models, which has impeded the application of normative models to large multi-site neuroimaging data sets. In this study, we suggest accommodating for these site effects by including them as random effects in a hierarchical Bayesian model. We compare the performance of a linear and a non-linear hierarchical Bayesian model in modeling the effect of age on cortical thickness. We used data of 570 healthy individuals from the ABIDE (autism brain imaging data exchange, http://preprocessed-connectomes-project.org/abide/) data set in our experiments. We compare the proposed method to several harmonization techniques commonly used to deal with additive and multiplicative site effects, including regressing out site and harmonizing for site with ComBat, both with and without explicitly preserving variance related to age and sex as biological variation of interest. In addition, we make predictions from raw data, in which site has not been accommodated for. The proposed hierarchical Bayesian method shows the best performance according to multiple metrics. Performance is particularly bad for the regression model and the ComBat model when age and sex are not explicitly modeled. In addition, the predictions of those models are noticeably poorly calibrated, suffering from a loss of more than 90 % of the original variance. From these results we conclude that harmonization techniques like regressing out site and ComBat do not sufficiently accommodate for multi-site effects in pooled neuroimaging data sets. Our results show that the complex interaction between site and variables of interest is likely to be underestimated by those tools. One consequence is that harmonization techniques removed too much variance, which is undesirable and may have unpredictable consequences for subsequent analysis. Our results also show that this can be mostly avoided by explicitly modeling site as part of a hierarchical Bayesian Model. We discuss the potential of z-scores derived from normative models to be used as site corrected variables and of our method as site correction tool.