Speech events do not typically exhibit the temporal regularity conspicuous in many musical rhythms. In the absence of such surface periodicity, hierarchical approaches to speech timing propose that nested prosodic domains, such as syllables and stress-delimited feet, can be modelled as coupled oscillators, and that surface timing patterns reflect variation in the relative weights of these oscillators. Localized approaches argue, by contrast, that speech timing is largely organized bottom-up, on the basis of segmental identity and subsyllabic organization, with prosodic lengthening effects locally associated with domain heads and edges. This chapter weighs the claims of these two approaches to speech timing against empirical data. It also reviews attempts to develop quantitative indices (‘rhythm metrics’) of cross-linguistic variation in surface timing, in particular in the degree of contrast between stronger and weaker syllables. It further reflects on the shortcomings of categorical ‘rhythm class’ typologies in the face of cross-linguistic evidence from speech production and speech perception.