Traditional experiments indicate that prediction is important for the efficient processing of incoming speech. In three virtual reality (VR) visual world paradigm experiments, we tested whether such findings hold in naturalistic settings (Experiment 1) and investigated whether disfluencies in speech (repairs/hesitations) inform listeners’ predictions in rich environments (Experiments 2-3). In all three experiments, participants’ eye movements were recorded while they listened to sentences spoken by a virtual agent during a virtual tour of eight scenes. Experiment 1 showed that listeners predict upcoming speech in naturalistic environments, with a higher proportion of anticipatory target fixations in Restrictive (predictable) than in Unrestrictive (unpredictable) trials. Experiments 2-3 provided novel findings that disfluencies reduce anticipatory fixations towards a predicted referent in naturalistic environments, relative to Conjunction sentences (Experiment 2) and Fluent sentences (Experiment 3). Unexpectedly, Experiment 2 provided no evidence that participants made new predictions from a repaired verb – there was no increase in the proportion of fixations towards objects compatible with the repaired verb – thereby supporting an attentional rather than a predictive account of the effects of repair disfluencies on sentence processing. Experiment 3 provided novel evidence that the proportion of fixations to the speaker increased upon hearing a hesitation, supporting current theories of the effects of hesitations on sentence processing. Together, these findings contribute to a better understanding of how listeners use visual (objects, speaker) and auditory (speech, including disfluencies) information to predict upcoming words.