Archival Data Thinking and Practices for Social Media Collections
In the Web 2.0 era, most social media archives are born-digital and large in scale. As the need to process them quickly grows, researchers and archivists have begun applying data science methods to manage social media data collections. However, many current computational or data-driven archival processing methods lack critical background understanding, such as “why we need to use computational methods” and “how to evaluate and improve data-driven applications.” As a result, many computational archival science (CAS) efforts are comparatively narrow in scope, low in efficiency, and not sufficiently holistic. In this talk, we first introduce the proposed concept of “Archival Data Thinking,” which highlights the comprehensiveness desirable in mapping data science mindsets to archival practices. Next, we examine several examples of implementing “Archival Data Thinking” in processing two social media collections: (i) the COVID-19 Hate Speech Twitter Archive (CHSTA) and (ii) the Counter-anti-Asian Hate Twitter Archive (CAAHTA), both of which contain millions of records with associated metadata and require rapid processing. Finally, as a future research direction, we briefly discuss the standards and infrastructures that can better support the implementation of “Archival Data Thinking.”