Abstract. Multi-model ensembles are frequently used to assess understanding of the response of ozone and methane lifetime to changes in emissions of ozone precursors such as NOx, VOC and CO. When these ozone changes are used to calculate radiative forcing (RF) (and climate metrics such as the global warming potential (GWP) and global temperature potential (GTP)), there is a methodological choice, determined partly by the available computing resources. Either the mean ozone (and methane lifetime) changes are input to the radiation code, or each model's ozone and methane changes are used as input and the average RF is computed from the individual model RFs. We use data from the Task Force on Hemispheric Transport of Air Pollution Source-Receptor global chemical transport model ensemble to assess the impact of this choice for emission changes in four regions (East Asia, Europe, North America and South Asia). We conclude that using the multi-model mean ozone and methane responses is accurate for calculating the mean RF, with differences of up to 0.6% for CO, 0.7% for VOC and 2% for NOx. Differences of up to 60% for NOx, 7% for VOC and 3% for CO are introduced into the 20 year GWP as a result of the exponential decay terms, with similar values for the 20 year GTP. However, estimates of the standard deviation (SD) calculated from the ensemble-mean input fields (where the SD at each point on the model grid is added to or subtracted from the mean field) are almost always substantially larger in RF, GWP and GTP metrics than the true SD computed from the individual model results, and can even exceed the model range for short-lived ozone RF and for the 20 and 100 year GWP and 100 year GTP. The effect is generally most marked for NOx emissions, where the net effect is a small residual of terms of opposing sign. For example, the SD for the 20 year GWP is two to three times larger when the ensemble-mean fields are used than when the individual models are used to calculate the RF.
Hence, while multi-model mean fields are appropriate for calculating the mean RF, GWP and GTP, they are not a reliable basis for estimating the uncertainty in these metrics, and in general they overestimate it.
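As an illustration only, the sketch below uses a synthetic ensemble (not the HTAP data) and a hypothetical linear forcing function `net_rf` to show three points made above. First, averaging inputs before a linear calculation gives exactly the mean of the per-model results. Second, this is no longer exact once an exponential decay with a model-dependent timescale enters; here the time integral over a horizon H of a decaying perturbation, τ(1 − exp(−H/τ)), is an assumed stand-in for the decay terms in the GWP and GTP. Third, perturbing every input by its own SD overestimates the spread when the net response is a residual of anticorrelated terms of opposite sign, as assumed here for the NOx case. All names, timescales and numbers are hypothetical.

```python
import numpy as np

# Synthetic, hypothetical ensemble (NOT the HTAP model data).
rng = np.random.default_rng(seed=0)
n_models = 10

# Per-model positive short-lived ozone forcing term (arbitrary units) and a
# negative methane-driven term that is anticorrelated with it: models with
# more ozone production are assumed to also destroy more methane.
ozone_term = rng.normal(loc=1.0, scale=0.3, size=n_models)
methane_term = -0.8 * ozone_term + rng.normal(loc=0.0, scale=0.05, size=n_models)


def net_rf(o3, ch4):
    """Hypothetical linear forcing model: net RF is the sum of both terms."""
    return o3 + ch4


# (1) Mean RF: for a linear function, RF of the mean inputs equals the mean RF.
rf_of_mean_inputs = net_rf(ozone_term.mean(), methane_term.mean())
mean_of_model_rfs = net_rf(ozone_term, methane_term).mean()

# (2) Nonlinear decay: time integral over a horizon H of an exponentially
# decaying perturbation, tau * (1 - exp(-H / tau)), with an assumed
# model-dependent timescale tau (years). Averaging tau first is not exact.
horizon = 20.0
tau = np.array([9.0, 11.0, 12.0, 14.0])


def decay_integral(t):
    return t * (1.0 - np.exp(-horizon / t))


integral_of_mean_tau = decay_integral(tau.mean())
mean_of_integrals = decay_integral(tau).mean()

# (3) Uncertainty: SD across individual-model RFs versus the "mean +/- SD"
# method, which shifts every input by its own SD in the same direction.
true_sd = net_rf(ozone_term, methane_term).std(ddof=1)
sd_o3 = ozone_term.std(ddof=1)
sd_ch4 = methane_term.std(ddof=1)
rf_plus = net_rf(ozone_term.mean() + sd_o3, methane_term.mean() + sd_ch4)
rf_minus = net_rf(ozone_term.mean() - sd_o3, methane_term.mean() - sd_ch4)
sd_from_mean_fields = 0.5 * (rf_plus - rf_minus)

print(f"mean RF: {rf_of_mean_inputs:.4f} (mean inputs) vs {mean_of_model_rfs:.4f} (per model)")
print(f"decay integral: {integral_of_mean_tau:.3f} (mean tau) vs {mean_of_integrals:.3f} (per model)")
print(f"SD: {sd_from_mean_fields:.3f} (mean +/- SD fields) vs {true_sd:.3f} (per model)")
```

Because the mean ± SD approach moves both terms in the same direction, in this toy it returns the sum of the individual SDs, which can never be smaller than the SD of the per-model sums. The anticorrelation between the opposing terms makes the gap large. The decay integral is concave in τ, so averaging τ first always gives a slightly larger value than averaging the per-model integrals.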