Social media data
The increasing availability of huge volumes of social media ‘Big Data’ from Facebook, Flickr, Instagram, Twitter and other social network platforms, combined with the development of software designed to operate at web scale, has fuelled the growth of computational social science. Often analysed by ‘data scientists’, social media data differ substantially from the datasets officially disseminated as by-products of government-sponsored activity, such as population censuses or administrative data, which have long been analysed by professional statisticians. This chapter outlines the characteristics of social media data and identifies key data sources and methods of data capture, introducing several of the technologies used to acquire, store, query, visualise and augment social media data. The unrepresentativeness of social media data, and the lack of (geo)demographic control within them, are problematic for population-based research. These limitations, alongside wider epistemological and ethical concerns surrounding data validity, inadvertent co-option into research and protection of user privacy, suggest that caution should be exercised when analysing social media datasets. While care must be taken to respect personal privacy and to sample assiduously, this chapter concludes that statisticians, who may be unfamiliar with some of the programmatic steps involved in accessing social media data, must play a pivotal role in analysing them.