Because speaking rates are highly variable, listeners must use cues such as phoneme or sentence duration to scale, or normalize, speech across different contexts. Normalizing speech perception in this way allows listeners to distinguish temporal contrasts, such as voiced versus voiceless stops, across different speech rates. It has long been assumed that this speaking rate normalization can occur over individual phonemes. However, phoneme boundaries are often acoustically ill-defined in running speech, so it is not clear that listeners can rely on them for normalization. To evaluate this, we isolate two potential processing units for speaking rate normalization---the phoneme and the syllable---by manipulating phoneme duration to cue speaking rate while holding syllable duration constant. In doing so, we show that changing the duration of phonemes with both unique acoustic signatures (/k\textscripta/) and overlapping acoustic signatures (/w\textsci/) produces a speaking rate normalization effect. These results suggest that, even absent clear acoustic boundaries within syllables, listeners can normalize for rate differences on the basis of individual phonemes.