Abstract. A statistical framework for comparing the output of ensemble simulations from global climate models with networks of climate proxy and instrumental records is developed, focusing on near-surface temperatures for the last millennium. This framework includes the formulation of a joint statistical model for proxy data, instrumental data and simulation data, which is used to optimize a quadratic distance measure for ranking climate model simulations. An essential underlying assumption is that the simulations and the proxy/instrumental series have a shared component of variability that is due to temporal changes in external forcing, such as volcanic aerosol load, solar irradiance changes and greenhouse gas concentrations. Two statistical tests are formulated. The first is a preliminary test of whether a significant temporal correlation exists between instrumental/proxy and simulation data. The second expresses the distance measure as a test statistic for whether a forced simulation is closer to the instrumental/proxy series than unforced simulations are. The proposed framework allows any number of proxy locations to be used jointly, with different seasons, record lengths and statistical precision. The new methods are applied in a pseudo-proxy experiment. Here, a set of previously published millennial forced model simulations, including both "low" and "high" solar radiative forcing histories together with other common forcings, were used to define "true" target temperatures as well as pseudo-proxy and pseudo-instrumental series. The pseudo-proxies were created to reflect current proxy locations and noise levels. With these, the low and high solar full-forcing simulations could be distinguished when the latter were used as targets. When the former were used as targets, a greater number of proxy locations was needed to make this distinction.
It was also found that, to improve detectability of the low solar simulations, increasing the signal-to-noise ratio was more efficient than increasing the spatial coverage of the proxy network. In the next phase of the work, we will apply these methods to real proxy and instrumental data, with the aim of determining which of the two solar forcing histories is more compatible with the observed/reconstructed climate.
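As a rough illustration of the second test, the sketch below computes a weighted quadratic distance between a simulation and a set of proxy-like series, and compares a forced simulation against an ensemble of unforced ones. It is not the framework's actual statistic: the function name `quadratic_distance`, the equal weights, the sinusoidal "forcing" and the white-noise levels are all assumptions chosen for illustration, and the weights are given here rather than optimized through a joint statistical model.

```python
import numpy as np


def quadratic_distance(sim, obs, weights):
    """Weighted mean squared difference between two sets of series.

    sim, obs: arrays of shape (n_sites, n_times).
    weights:  array of shape (n_sites,), assumed given (hypothetical;
              not optimized as in the framework described above).
    """
    sim = np.asarray(sim, dtype=float)
    obs = np.asarray(obs, dtype=float)
    w = np.asarray(weights, dtype=float)
    per_site = np.mean((sim - obs) ** 2, axis=1)  # mean squared difference per site
    return float(np.sum(w * per_site) / np.sum(w))


# Synthetic data (assumption): a shared forced signal plus independent noise.
rng = np.random.default_rng(0)
n_sites, n_times = 5, 200
forcing = np.sin(np.linspace(0.0, 6.0 * np.pi, n_times))

obs = forcing + 0.5 * rng.standard_normal((n_sites, n_times))         # pseudo-proxies
forced_sim = forcing + 0.5 * rng.standard_normal((n_sites, n_times))  # forced run
unforced_sims = 0.5 * rng.standard_normal((20, n_sites, n_times))     # control runs
weights = np.ones(n_sites)

d_forced = quadratic_distance(forced_sim, obs, weights)
d_unforced = np.array(
    [quadratic_distance(u, obs, weights) for u in unforced_sims]
)

# Standardized difference: negative values mean the forced simulation is
# closer to the pseudo-proxies than a typical unforced simulation.
t_stat = (d_forced - d_unforced.mean()) / d_unforced.std(ddof=1)
print(d_forced, d_unforced.mean(), t_stat)
```

In this toy setting the forced run shares the signal with the pseudo-proxies, so its distance is smaller than the unforced average; raising the noise level in `obs` shrinks that gap, which mirrors the role of the signal-to-noise ratio noted above.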