Information in speech and music is often conveyed through changes in fundamental frequency (f0), the perceptual correlate of which is known as "pitch". One challenge of extracting this information is that such sounds can also vary in their spectral content due to the filtering imposed by a vocal tract or instrument body. Pitch is often thought to be invariant to spectral shape, potentially providing a solution to this challenge, but the extent and nature of this invariance remain poorly understood. We examined the extent to which human pitch judgments are invariant to spectral differences between natural sounds. Listeners performed up/down and interval discrimination tasks with spoken vowels, instrument notes, or synthetic tones, synthesized to be either harmonic or inharmonic (lacking a well-defined f0). Listeners were worse at discriminating pitch across different vowels or instruments than when the vowels or instruments were the same, and were biased by differences in the spectral centroids of the sounds being compared. However, there was no interaction between this effect and that of inharmonicity. In addition, this bias decreased when sounds were separated by short delays. These findings suggest that the representation of a sound's pitch is itself unbiased, but that pitch comparisons between sounds are influenced by changes in timbre, the effect of which weakens over time. Pitch representations thus appear to be relatively invariant to spectral shape, but relative pitch judgments are not, even when spectral shape variation is naturalistic and when such judgments are based on representations of the f0.