The vast majority of everyday social practices involve a form of joint interaction with the environment that relies on establishing a shared attentional focus (Tomasello, Carpenter, Call, Behne, & Moll, 2005). Languages mediate humans’ ability to coordinate attention via a specific class of words: spatial demonstratives. Words like the pronouns “this” and “that” or the adverbs “here” and “there” are among the few undisputed language universals (Diessel, 1999, 2014). They are developmental (Capirci, Iverson, Pizzuto, & Volterra, 1996) and evolutionary (Diessel, 2006, 2013) cornerstones of language, and they are among the most frequent words in the lexicon (Leech & Rayson, 2014). Demonstratives are deictic expressions (from Ancient Greek deixis, “demonstration, indication”). They can in principle be used to indicate any object, and their meaning depends on the context of utterance (Clark & Bangerter, 2004; Diessel, 1999; Levinson, 1983a, 2004). Identifying their referent thus hinges on the availability of information about the perceptual context (which objects are perceptually available), multimodal cues (pointing gestures, gaze; Clark & Bangerter, 2004; Cooperrider, 2016; García, Ehlers, & Tylén, 2017; Stevens & Zhang, 2013), expectations (which objects are relevant for the present interaction; what the speaker may intend to refer to; Clark, 1996; Levinson, 1983) and cues provided by the use of specific demonstrative forms (e.g. a proximal “this” vs. a distal “that”). Yet, in spite of their semantically underspecified nature, these expressions function as powerful and effective coordination devices for social interaction.
But what are the neural and cognitive mechanisms that enable the integration of linguistic, perceptual and pragmatic information required for the comprehension of demonstratives? Which cues about the intended referent does the use of proximal versus distal demonstrative forms actually provide? And finally, how can demonstratives function as effective tools for social interaction, in spite of their semantic vagueness? In the present thesis, I report three studies in which these questions were addressed using novel experimental paradigms. The results of the three studies provide novel insights into the neural and cognitive underpinnings of demonstratives, as well as their key function in social interaction. However, their scope goes beyond an understanding of demonstratives per se. First, knowledge of the neural substrates of spatial demonstratives is informative with respect to the bigger question of how the brain extracts meaning from linguistic input. Second, our findings provide general insights into the relationship between language processing and extralinguistic cognition, highlighting the tight link of language processing to perception and attention (Article 1) and to functional, action-oriented representations of the physical world (Article 2), as well as the role of partner-oriented adaptations of language use in successful social coordination (Article 3). Finally, this dissertation contributes to the development of new experimental methods for further research using demonstratives as a searchlight into core aspects of human cognition, and it outlines concrete suggestions in this direction.