Analysis and Visualization Considerations for Quantitative Social Science Research Using Social Media Data

This is the fifth in a series of white papers summarizing the discussions and future directions derived from these topical meetings. This paper focuses on issues related to analysis and visual analytics. While these two topics are distinct, they clearly overlap. It is common to use different visualizations during analysis, and given the sheer volume of social media data, visual analytic tools can be important during analysis as well as during other parts of the research lifecycle. Choices about analysis may be informed by visualization plans and vice versa; both are key in communicating about a data set and what it means. We also recognized that each field of research has different analysis techniques and different levels of familiarity with visual analytics. Putting these two topics into the same meeting gave us the opportunity to think about analysis and visual analytics/visualization in new, synergistic ways.

Modeling Considerations for Quantitative Social Science Research Using Social Media Data

10.31234/osf.io/3e2ux ◽

2021 ◽

Author(s):

Ceren Budak ◽

Stuart Soroka ◽

Lisa Singh ◽

Michael Bailey ◽

Leticia Bode ◽

...

Keyword(s):

Social Media ◽

Social Science ◽

Social Science Research ◽

Science Research ◽

Model Fit ◽

Model Construction ◽

Interesting Aspect ◽

Future Directions ◽

Social Media Data ◽

Media Data

In this paper, the fourth in a series of white papers, we provide a summary of the discussions and future directions that came from the topical meeting that focused on model construction with social media data. A particularly interesting aspect of this meeting was, in our view, discussion of the different disciplines’ requirements and approaches to modeling and the different considerations that are used to assess model fit.

Study Designs for Quantitative Social Science Research Using Social Media

10.31234/osf.io/zp8q2 ◽

2020 ◽

Author(s):

Leticia Bode ◽

Pamela Davis-Kean ◽

Lisa Singh ◽

Tanya Berger-Wolf ◽

Ceren Budak ◽

...

Keyword(s):

Social Media ◽

Social Science ◽

Computer Science ◽

Quantitative Research ◽

Social Science Research ◽

Basic Research ◽

Science Research ◽

Social Media Data ◽

Computer Scientists ◽

Media Data

Social media provides a rich amount of data on the everyday lives, opinions, thoughts, beliefs, and behaviors of individuals and organizations in near real-time. Leveraging these data effectively and responsibly should therefore improve our ability to understand political, psychological, economic, and sociological behaviors and opinions across time. This article is the first in a series of white papers that will provide a summary of the discussions derived from meetings of social scientists and computer scientists with the goal of creating consensus for how social and computer science could converge to answer important questions about complex human behaviors and dynamics using social media data. We present three basic research designs that are commonly used in social science and are applicable to research using social media data: qualitative observation, experiments, and surveys. We also discuss a fourth design that is primarily informed by computer science, non-designed data, but that can inform social science research. After a brief discussion of the general approach of these designs and their applicability for use with social media data, we discuss the challenges associated with their use with social media data and potential solutions for “convergence” of these methods for future quantitative research in the social sciences.

New Data Sources in Social Science Research: Things to Know Before Working With Reddit Data

Social Science Computer Review ◽

10.1177/0894439319893305 ◽

2019 ◽

pp. 089443931989330 ◽

Cited By ~ 4

Author(s):

Ashley Amaya ◽

Ruben Bach ◽

Florian Keusch ◽

Frauke Kreuter

Keyword(s):

Social Media ◽

Social Science ◽

Social Science Research ◽

Science Research ◽

Lessons Learned ◽

Data Sources ◽

Front Page ◽

Social Media Data ◽

New Research ◽

Media Data

Social media are becoming more popular as a source of data for social science researchers. These data are plentiful and offer the potential to answer new research questions at smaller geographies and for rarer subpopulations. When deciding whether to use data from social media, it is useful to learn as much as possible about the data and its source. Social media data have properties quite different from those with which many social scientists are used to working, so the assumptions often used to plan and manage a project may no longer hold. For example, social media data are so large that they may not be able to be processed on a single machine; they are in file formats with which many researchers are unfamiliar, and they require a level of data transformation and processing that has rarely been required when using more traditional data sources (e.g., survey data). Unfortunately, this type of information is often not obvious ahead of time as much of this knowledge is gained through word-of-mouth and experience. In this article, we attempt to document several challenges and opportunities encountered when working with Reddit, the self-proclaimed “front page of the Internet” and popular social media site. Specifically, we provide descriptive information about the Reddit site and its users, tips for using organic data from Reddit for social science research, some ideas for conducting a survey on Reddit, and lessons learned in merging survey responses with Reddit posts. While this article is specific to Reddit, researchers may also view it as a list of the type of information one may seek to acquire prior to conducting a project that uses any type of social media data.
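
The point about data that are too large for a single machine's memory and that arrive in unfamiliar file formats can be illustrated with a minimal sketch. It assumes, purely for illustration, that the posts are stored as newline-delimited JSON with a `subreddit` field; the abstract does not specify a format, and the function name `count_by_subreddit` and the sample records are hypothetical.

```python
import io
import json
from collections import Counter


def count_by_subreddit(lines):
    """Tally records per subreddit from an iterable of JSON lines.

    Reads one record at a time, so memory use depends on the number of
    distinct subreddits rather than on the size of the input.
    """
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip malformed rows instead of aborting the whole pass
        if not isinstance(record, dict):
            continue  # skip rows that are valid JSON but not objects
        # "subreddit" is an assumed field name for this illustration.
        counts[record.get("subreddit", "<missing>")] += 1
    return counts


# Made-up sample standing in for a large file. With real data, an open file
# handle can be passed instead of the StringIO object.
sample = io.StringIO(
    '{"subreddit": "dementia", "body": "example text"}\n'
    '{"subreddit": "AskReddit", "body": "another example"}\n'
    'not valid json\n'
    '{"subreddit": "dementia", "body": "third example"}\n'
)
print(count_by_subreddit(sample))
```

Streaming line by line is only one option; data that still do not fit this pattern may call for chunked or distributed processing, which this sketch does not cover.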

Steven Lloyd Wilson Discusses Using Social Media Data for Social Science Research

10.4135/9781526492883 ◽

2019 ◽

Keyword(s):

Social Media ◽

Social Science ◽

Social Science Research ◽

Science Research ◽

Social Media Data ◽

Media Data

Measurement Considerations for Quantitative Social Science Research Using Social Media Data

10.31234/osf.io/ga6nc ◽

2020 ◽

Author(s):

Jonathan Ladd ◽

Rebecca Ryan ◽

Lisa Singh ◽

Leticia Bode ◽

Ceren Budak ◽

...

Keyword(s):

Social Media ◽

Best Practices ◽

Social Science ◽

Social Science Research ◽

Science Research ◽

Social Scientists ◽

Social Media Data ◽

Measurement Issues ◽

Computer Scientists ◽

Media Data

Harnessing social media data for social science research entails creating measures out of the largely unstructured, noisy data that users generate on different platforms. This harnessing, particularly of data at scale, requires using methods developed in computer science. But it also typically requires integrating these methods with assessments of measurement quality along social science criteria -- reliability, validity and unbiasedness. In this paper, we outline measurement issues that arise when using social media data. We show examples of how to construct measures and discuss different measurement considerations and best practices. We conclude with a discussion of ways to accelerate research in this space, highlighting contributions that can be made by both social scientists and computer scientists.

Data Acquisition, Sampling, and Data Preparation Considerations for Quantitative Social Science Research Using Social Media Data

10.31234/osf.io/k6vyj ◽

2021 ◽

Author(s):

Zeina Mneimneh ◽

Josh Pasek ◽

Lisa Singh ◽

Rachel Best ◽

Leticia Bode ◽

...

Keyword(s):

Social Media ◽

Data Acquisition ◽

Social Science Research ◽

Science Research ◽

Necessary Condition ◽

Data Preparation ◽

Social Scientists ◽

Social Media Data ◽

Computer Scientists ◽

Media Data

The convergence of methods and relevant theories between computer scientists and social scientists is a necessary condition for leveraging social media data to understand this increasingly important window into human societies. This paper focuses on issues of data acquisition, sampling, and data preparation. These topics incorporate data collection methods, sampling strategies, population mismatch adjustments, and other data acquisition and data preparation decisions.

Mining Social Media Data to Study the Consequences of Dementia Diagnosis on Caregivers and Relatives (Preprint)

10.2196/preprints.10506 ◽

2018 ◽

Author(s):

Anika Oellrich ◽

George Gkotsis ◽

Richard James Butler Dobson ◽

Tim JP Hubbard ◽

Rina Dutta

Keyword(s):

Social Media ◽

Family Relationships ◽

Text Processing ◽

Automated Analysis ◽

Health Concern ◽

Dementia Diagnosis ◽

Data Set ◽

Social Media Data ◽

Real Time Processing ◽

Media Data

BACKGROUND Dementia is a growing public health concern with approximately 50 million people affected worldwide in 2017 and this number is expected to reach more than 131 million by 2050. The toll on caregivers and relatives cannot be underestimated as dementia changes family relationships, leaves people socially isolated, and affects the finances of all those involved. OBJECTIVE The aim of this study was to explore using automated analysis (i) the age and gender of people who post to the social media forum Reddit about dementia diagnoses, (ii) the affected person and their diagnosis, (iii) relevant subreddits authors are posting to, (iv) the types of messages posted and (v) the content of these posts. METHODS We analysed Reddit posts concerning dementia diagnoses. We used a previously developed text analysis pipeline to determine attributes of the posts as well as their authors to characterise online communications about dementia diagnoses. The posts were also examined by manual curation for the diagnosis provided and the person affected. Furthermore, we investigated the communities these people engage in and assessed the contents of the posts with an automated topic gathering technique. RESULTS Our results indicate that the majority of posters in our data set are women, and it is mostly close relatives such as parents and grandparents that are mentioned. Both the communities frequented and topics gathered reflect not only the sufferer's diagnosis but also potential outcomes, e.g. hardships experienced by the caregiver. The trends observed from this dataset are consistent with findings based on qualitative review, validating the robustness of social media automated text processing. 
CONCLUSIONS This work demonstrates the value of social media data sources as a resource for in-depth studies of those affected by a dementia diagnosis and the potential to develop novel support systems based on their real time processing in line with the increasing digitalisation of medical care.

Embed2Detect: temporally clustered embedded words for event detection in social media

Machine Learning ◽

10.1007/s10994-021-05988-7 ◽

2021 ◽

Author(s):

Hansi Hettiarachchi ◽

Mariam Adedoyin-Olowe ◽

Jagdev Bhogal ◽

Mohamed Medhat Gaber

Keyword(s):

Social Media ◽

Event Detection ◽

High Volume ◽

Detection Methods ◽

Word Embeddings ◽

Agglomerative Clustering ◽

Data Set ◽

Social Media Data ◽

Social Media Platforms ◽

Media Data

Social media is becoming a primary medium to discuss what is happening around the world. Therefore, the data generated by social media platforms contain rich information which describes the ongoing events. Further, the timeliness associated with these data is capable of facilitating immediate insights. However, considering the dynamic nature and high volume of data production in social media data streams, it is impractical to filter the events manually, and therefore automated event detection mechanisms are invaluable to the community. Apart from a few notable exceptions, most previous research on automated event detection has focused only on statistical and syntactical features in data and lacked the involvement of underlying semantics, which are important for effective information retrieval from text since they represent the connections between words and their meanings. In this paper, we propose a novel method termed Embed2Detect for event detection in social media by combining the characteristics in word embeddings and hierarchical agglomerative clustering. The adoption of word embeddings gives Embed2Detect the capability to incorporate powerful semantical features into event detection and overcome a major limitation inherent in previous approaches. We evaluated our method on two recent real social media data sets which represent the sports and political domains and also compared the results to several state-of-the-art methods. The obtained results show that Embed2Detect is capable of effective and efficient event detection and that it outperforms the recent event detection methods. For the sports data set, Embed2Detect achieved 27% higher F-measure than the best-performing baseline, and for the political data set, it was an increase of 29%.
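
To make the general idea of combining word embeddings with hierarchical agglomerative clustering concrete, here is a loose sketch. It is not the Embed2Detect algorithm itself, whose details the abstract does not give. The toy two-dimensional vectors, the average-linkage choice, the `threshold` cut-off, and the `cluster_change` score are all assumptions made for illustration.

```python
from itertools import combinations
from math import sqrt


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm


def agglomerative_clusters(vectors, threshold):
    """Average-linkage agglomerative clustering over a {word: vector} dict.

    Repeatedly merges the two closest clusters and stops once the closest
    pair is farther apart than `threshold` (a hypothetical cut-off).
    """
    clusters = [frozenset([w]) for w in sorted(vectors)]

    def linkage(a, b):
        pairs = [(x, y) for x in a for y in b]
        total = sum(cosine_distance(vectors[x], vectors[y]) for x, y in pairs)
        return total / len(pairs)

    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda p: linkage(*p))
        if linkage(a, b) > threshold:
            break
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return set(clusters)


def cluster_change(before, after, threshold=0.3):
    """Fraction of shared word pairs whose same-cluster status differs."""
    shared = sorted(set(before) & set(after))

    def together(vectors):
        groups = agglomerative_clusters({w: vectors[w] for w in shared}, threshold)
        return {frozenset(p) for g in groups for p in combinations(sorted(g), 2)}

    pairs = list(combinations(shared, 2))
    changed = together(before) ^ together(after)
    return len(changed) / len(pairs) if pairs else 0.0


# Toy 2-d "embeddings" for two consecutive time windows (made-up values).
window_1 = {"goal": [1.0, 0.0], "score": [0.9, 0.1], "vote": [0.0, 1.0], "poll": [0.1, 0.9]}
window_2 = {"goal": [1.0, 0.0], "score": [0.1, 0.9], "vote": [0.0, 1.0], "poll": [0.1, 0.9]}

print(agglomerative_clusters(window_1, threshold=0.3))
print(cluster_change(window_1, window_2))
```

In this toy example the word "score" moves from the cluster containing "goal" to the one containing "vote" and "poll", so half of the six word pairs change their co-cluster status. A large change between consecutive windows is the kind of signal an event detector could flag; in practice the embeddings would be learned from the text of each window.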

A Survey on Visual Analytics of Social Media Data

IEEE Transactions on Multimedia ◽

10.1109/tmm.2016.2614220 ◽

2016 ◽

Vol 18 (11) ◽

pp. 2135-2148 ◽

Cited By ~ 55

Author(s):

Yingcai Wu ◽

Nan Cao ◽

David Gotz ◽

Yap-Peng Tan ◽

Daniel A. Keim

Keyword(s):

Social Media ◽

Visual Analytics ◽

Social Media Data ◽

Media Data

What Your Tweets Tell Us About You: Identity, Ownership and Privacy of Twitter Data

International Journal of Digital Curation ◽

10.2218/ijdc.v7i1.224 ◽

2012 ◽

Vol 7 (1) ◽

pp. 174-197 ◽

Cited By ~ 9

Author(s):

Heather Small ◽

Kristine Kasianovitz ◽

Ronald Blanford ◽

Ina Celaya

Keyword(s):

Social Media ◽

Social Networking Sites ◽

Data Sets ◽

Data Set ◽

Social Media Data ◽

Twitter Data ◽

Other Information ◽

Rich Data ◽

Additional Value ◽

Media Data

Social networking sites and other social media have enabled new forms of collaborative communication and participation for users, and created additional value as rich data sets for research. Research based on accessing, mining, and analyzing social media data has risen steadily over the last several years and is increasingly multidisciplinary; researchers from the social sciences, humanities, computer science and other domains have used social media data as the basis of their studies. The broad use of this form of data has implications for how curators address preservation, access and reuse for an audience with divergent disciplinary norms related to privacy, ownership, authenticity and reliability. In this paper, we explore how the characteristics of the Twitter platform, coupled with an ambiguous and evolving understanding of privacy in networked communication, and divergent disciplinary understandings of the resulting data, combine to create complex issues for curators trying to ensure broad-based and ethical reuse of Twitter data. We provide a case study of a specific data set to illustrate how data curators can engage with the topics and questions raised in the paper. While some initial suggestions are offered to librarians and other information professionals who are beginning to receive social media data from researchers, our larger goal is to stimulate discussion and prompt additional research on the curation and preservation of social media data.
