A foundational assumption of human communication is that speakers ought to say as much as necessary, but no more. How speakers determine what is necessary in a given context, however, is unclear. In studies of referential communication, this expectation is often formalized as the idea that speakers should construct reference by selecting the shortest sufficiently informative description. Here we propose that reference production is instead a process whereby speakers adopt listeners’ perspectives to facilitate listeners’ visual search, without concern for utterance length. We show that a computational model of our proposal predicts graded acceptability judgments with quantitative accuracy, systematically outperforming brevity models. Our model also explains crosslinguistic differences in speakers’ propensity to over-specify in different visual contexts. Our findings suggest that reference production is best understood as driven by a cooperative goal to help the listener understand the intended message, rather than by an egocentric effort to minimize utterance length.