Text Analysis in Python for Social Scientists

abstractRecent years have seen rapid developments in automated text analysis methods focused on measuring psychological and demographic properties. While this development has mainly been driven by computer scientists and computational linguists, such methods can be of great value for social scientists in general, and for psychologists in particular. In this paper, we review some of the most popular approaches to automated text analysis from the perspective of social scientists, and give examples of their applications in different theoretical domains. After describing some of the pros and cons of these methods, we speculate about future methodological developments, and how they might change social sciences. We conclude that, despite the fact that current methods have many disadvantages and pitfalls compared to more traditional methods of data collection, the constant increase of computational power and the wide availability of textual data will inevitably make automated text analysis a common tool for psychologists.

Download Full-text

Rethinking the ‘Great Divide’

Science & Technology Studies ◽

10.23987/sts.97321 ◽

2020 ◽

Vol 34 (1) ◽

pp. 19-42

Author(s):

David Moats

Keyword(s):

Big Data ◽

Text Analysis ◽

Data Science ◽

Interdisciplinary Collaboration ◽

Digital Tools ◽

Social Scientists ◽

New Methods ◽

Automated Text Analysis ◽

Growing Body ◽

Truth Claims

It is often claimed that the rise of so called ‘big data’ and computationally advanced methods may exacerbate tensions between disciplines like data science and anthropology. This paper is an attempt to reflect on these possible tensions and their resolution, empirically. It contributes to a growing body of literature which observes interdisciplinary collabrations around new methods and digital infrastructures in practice but argues that many existing arrangements for interdisciplinary collaboration enforce a separation between disciplines in which identities are not really put at risk. In order to disrupt these standard roles and routines we put on a series of workshops in which mainly self-identified qualitative or non-technical researchers were encouraged to use digital tools (scrapers, automated text analysis and data visualisations). The paper focuses on three empirical examples from the workshops in which tensions, both between disciplines and methods, flared up and how they were ultimately managed or settled. In order to characterise both these tensions and negotiating strategies I draw on Woolgar and Stengers’ use of the humour and irony to describe how disciplines relate to each others truth claims. I conclude that while there is great potential in more open-ended collaborative settings, qualitative social scientists may need to confront some of their own disciplinary baggage in order for better dialogue and more radical mixings between disciplines to occur.

Download Full-text

Generalized word shift graphs: a method for visualizing and explaining pairwise comparisons between texts

EPJ Data Science ◽

10.1140/epjds/s13688-021-00260-3 ◽

2021 ◽

Vol 10 (1) ◽

Cited By ~ 1

Author(s):

Ryan J. Gallagher ◽

Morgan R. Frank ◽

Lewis Mitchell ◽

Aaron J. Schwartz ◽

Andrew J. Reagan ◽

...

Keyword(s):

Text Analysis ◽

Weighted Average ◽

Social Scientists ◽

Diagnostic Investigation ◽

Urban Green Spaces ◽

Fine Grained ◽

Common Task ◽

Single Number ◽

Scientific Narratives ◽

Measurement Validity

AbstractA common task in computational text analyses is to quantify how two corpora differ according to a measurement like word frequency, sentiment, or information content. However, collapsing the texts’ rich stories into a single number is often conceptually perilous, and it is difficult to confidently interpret interesting or unexpected textual patterns without looming concerns about data artifacts or measurement validity. To better capture fine-grained differences between texts, we introduce generalized word shift graphs, visualizations which yield a meaningful and interpretable summary of how individual words contribute to the variation between two texts for any measure that can be formulated as a weighted average. We show that this framework naturally encompasses many of the most commonly used approaches for comparing texts, including relative frequencies, dictionary scores, and entropy-based measures like the Kullback–Leibler and Jensen–Shannon divergences. Through a diverse set of case studies ranging from presidential speeches to tweets posted in urban green spaces, we demonstrate how generalized word shift graphs can be flexibly applied across domains for diagnostic investigation, hypothesis generation, and substantive interpretation. By providing a detailed lens into textual shifts between corpora, generalized word shift graphs help computational social scientists, digital humanists, and other text analysis practitioners fashion more robust scientific narratives.

Download Full-text

From Words to Numbers: Getting Started with Text Analysis for Applied Social Scientists

Business Communication Research and Practice ◽

10.22682/bcrp.2020.3.2.122 ◽

2020 ◽

Vol 3 (2) ◽

pp. 122-129

Author(s):

Hyun Woo Kim ◽

Hyejung Chang

Keyword(s):

Text Analysis ◽

Social Scientists

Download Full-text

Analyzing Language in Suicide Notes and Legacy Tokens

Crisis ◽

10.1027/0227-5910/a000363 ◽

2016 ◽

Vol 37 (2) ◽

pp. 140-147 ◽

Cited By ~ 6

Author(s):

Michael J. Egnoto ◽

Darrin J. Griffin

Keyword(s):

Text Analysis ◽

Low Cost ◽

Computer Software ◽

Student Writing ◽

Computer Assisted ◽

Automated Identification ◽

Completed Suicide ◽

Suicide Notes ◽

Self Harm ◽

Harm To Others

Abstract. Background: Identifying precursors that will aid in the discovery of individuals who may harm themselves or others has long been a focus of scholarly research. Aim: This work set out to determine if it is possible to use the legacy tokens of active shooters and notes left from individuals who completed suicide to uncover signals that foreshadow their behavior. Method: A total of 25 suicide notes and 21 legacy tokens were compared with a sample of over 20,000 student writings for a preliminary computer-assisted text analysis to determine what differences can be coded with existing computer software to better identify students who may commit self-harm or harm to others. Results: The results support that text analysis techniques with the Linguistic Inquiry and Word Count (LIWC) tool are effective for identifying suicidal or homicidal writings as distinct from each other and from a variety of student writings in an automated fashion. Conclusion: Findings indicate support for automated identification of writings that were associated with harm to self, harm to others, and various other student writing products. This work begins to uncover the viability or larger scale, low cost methods of automatic detection for individuals suffering from harmful ideation.

Download Full-text