Analyzing social media data: A mixed-methods framework combining computational and qualitative text analysis

2019
Author(s): Matthew Andreotta, Robertus Nugroho, Mark Hurlstone, Fabio Boschetti, Simon Farrell, ...

To qualitative researchers, social media offers a novel opportunity to harvest a massive and diverse range of content, without the need for intrusive or intensive data collection procedures. However, performing a qualitative analysis across a massive social media data set is cumbersome and impractical. Instead, researchers often extract a subset of content to analyze, but a framework to facilitate this process is currently lacking. We present a four-phased framework for improving this extraction process, which blends the capacities of data science techniques to compress large data sets into smaller spaces, with the capabilities of qualitative analysis to address research questions. We demonstrate this framework by investigating the topics of Australian Twitter commentary on climate change, using quantitative (Non-Negative Matrix inter-joint Factorization; Topic Alignment) and qualitative (Thematic Analysis) techniques. Our approach is useful for researchers seeking to perform qualitative analyses of social media, or researchers wanting to supplement their quantitative work with a qualitative analysis of broader social context and meaning.
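To illustrate how a matrix factorization can compress a large document collection into a few topics, the following is a minimal sketch of plain non-negative matrix factorization with multiplicative updates. It is not the inter-joint variant or the topic alignment step named in the abstract; the toy document-term matrix `doc_term`, the function names, and the parameter defaults are all assumptions made for illustration.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def frobenius_error(v, w, h):
    """Squared reconstruction error ||V - WH||^2."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))


def nmf(v, k, iterations=200, seed=0, eps=1e-9):
    """Factor a non-negative matrix V (documents x terms) into W (documents x
    topics) and H (topics x terms) using multiplicative updates."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num = matmul(wt, v)
        den = matmul(matmul(wt, w), h)
        h = [[h[a][j] * num[a][j] / (den[a][j] + eps) for j in range(m)]
             for a in range(k)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num = matmul(v, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[i][a] * num[i][a] / (den[i][a] + eps) for a in range(k)]
             for i in range(n)]
    return w, h


# Hypothetical counts: rows are documents, columns are terms.
doc_term = [
    [2, 1, 0, 0],
    [4, 2, 0, 0],
    [0, 0, 1, 3],
    [0, 0, 2, 6],
]
w, h = nmf(doc_term, k=2)
```

Each row of `h` can be read as a topic (a weighting over terms) and each row of `w` as a document's loading on those topics; a researcher could then sample the highest-loading documents per topic for qualitative reading.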

2018
Author(s): Anika Oellrich, George Gkotsis, Richard James Butler Dobson, Tim JP Hubbard, Rina Dutta

BACKGROUND Dementia is a growing public health concern, with approximately 50 million people affected worldwide in 2017; this number is expected to reach more than 131 million by 2050. The toll on caregivers and relatives should not be underestimated, as dementia changes family relationships, leaves people socially isolated, and affects the finances of all those involved.
OBJECTIVE The aim of this study was to explore, using automated analysis, (i) the age and gender of people who post to the social media forum Reddit about dementia diagnoses, (ii) the affected person and their diagnosis, (iii) the relevant subreddits authors are posting to, (iv) the types of messages posted, and (v) the content of these posts.
METHODS We analysed Reddit posts concerning dementia diagnoses. We used a previously developed text analysis pipeline to determine attributes of the posts as well as their authors to characterise online communications about dementia diagnoses. The posts were also examined by manual curation for the diagnosis provided and the person affected. Furthermore, we investigated the communities these people engage in and assessed the contents of the posts with an automated topic gathering technique.
RESULTS Our results indicate that the majority of posters in our data set are women, and it is mostly close relatives such as parents and grandparents that are mentioned. Both the communities frequented and the topics gathered reflect not only the sufferer's diagnosis but also potential outcomes, e.g. hardships experienced by the caregiver. The trends observed from this dataset are consistent with findings based on qualitative review, validating the robustness of social media automated text processing.
CONCLUSIONS This work demonstrates the value of social media data sources as a resource for in-depth studies of those affected by a dementia diagnosis and the potential to develop novel support systems based on their real-time processing, in line with the increasing digitalisation of medical care.


2021
Author(s): Hansi Hettiarachchi, Mariam Adedoyin-Olowe, Jagdev Bhogal, Mohamed Medhat Gaber

Social media is becoming a primary medium to discuss what is happening around the world. Therefore, the data generated by social media platforms contain rich information which describes ongoing events. Further, the timeliness associated with these data is capable of facilitating immediate insights. However, considering the dynamic nature and high volume of data production in social media data streams, it is impractical to filter the events manually and therefore, automated event detection mechanisms are invaluable to the community. Apart from a few notable exceptions, most previous research on automated event detection has focused only on statistical and syntactical features in data and has not involved the underlying semantics, which are important for effective information retrieval from text since they represent the connections between words and their meanings. In this paper, we propose a novel method termed Embed2Detect for event detection in social media by combining the characteristics of word embeddings and hierarchical agglomerative clustering. The adoption of word embeddings gives Embed2Detect the capability to incorporate powerful semantic features into event detection and overcome a major limitation inherent in previous approaches. We evaluated our method on two recent real social media data sets which represent the sports and political domains and also compared the results to several state-of-the-art methods. The obtained results show that Embed2Detect is capable of effective and efficient event detection and that it outperforms recent event detection methods. For the sports data set, Embed2Detect achieved a 27% higher F-measure than the best-performing baseline, and for the political data set, the increase was 29%.
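As a rough illustration of how word embeddings and hierarchical agglomerative clustering can be combined, the sketch below groups words by cosine distance using average linkage. It is not the Embed2Detect algorithm itself, whose details are not given in the abstract; the three-dimensional `toy_vectors`, the function names, and the `threshold` value are hypothetical.

```python
import math


def cosine_distance(u, v):
    """1 - cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def average_linkage(c1, c2, vectors):
    """Mean pairwise distance between the members of two clusters."""
    total = sum(cosine_distance(vectors[a], vectors[b]) for a in c1 for b in c2)
    return total / (len(c1) * len(c2))


def agglomerative_clusters(vectors, threshold):
    """Start with one cluster per word and repeatedly merge the closest pair
    until the closest pair is farther apart than `threshold`."""
    clusters = [[word] for word in vectors]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = average_linkage(clusters[i], clusters[j], vectors)
                if best is None or d < best[0]:
                    best = (d, i, j)
        if best[0] > threshold:
            break
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Hypothetical embeddings; real ones would be learned from the data stream.
toy_vectors = {
    "goal": [0.9, 0.1, 0.0],
    "penalty": [0.8, 0.2, 0.1],
    "striker": [0.85, 0.15, 0.05],
    "vote": [0.1, 0.9, 0.1],
    "election": [0.05, 0.95, 0.0],
}
clusters = agglomerative_clusters(toy_vectors, threshold=0.2)
```

With these toy vectors the sports-related and politics-related words end up in separate clusters, which is the kind of semantic grouping that purely statistical or syntactical features would miss.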


2012, Vol 7 (1), pp. 174-197
Author(s): Heather Small, Kristine Kasianovitz, Ronald Blanford, Ina Celaya

Social networking sites and other social media have enabled new forms of collaborative communication and participation for users, and created additional value as rich data sets for research. Research based on accessing, mining, and analyzing social media data has risen steadily over the last several years and is increasingly multidisciplinary; researchers from the social sciences, humanities, computer science and other domains have used social media data as the basis of their studies. The broad use of this form of data has implications for how curators address preservation, access and reuse for an audience with divergent disciplinary norms related to privacy, ownership, authenticity and reliability. In this paper, we explore how the characteristics of the Twitter platform, coupled with an ambiguous and evolving understanding of privacy in networked communication, and divergent disciplinary understandings of the resulting data, combine to create complex issues for curators trying to ensure broad-based and ethical reuse of Twitter data. We provide a case study of a specific data set to illustrate how data curators can engage with the topics and questions raised in the paper. While some initial suggestions are offered to librarians and other information professionals who are beginning to receive social media data from researchers, our larger goal is to stimulate discussion and prompt additional research on the curation and preservation of social media data.


DOI: 10.2196/18700
2020, Vol 6 (2), pp. e18700
Author(s): Jiawei Li, Qing Xu, Raphael Cuomo, Vidya Purushothaman, Tim Mackey

Background The coronavirus disease (COVID-19) pandemic, which began in Wuhan, China in December 2019, is rapidly spreading worldwide with over 1.9 million cases as of mid-April 2020. Infoveillance approaches using social media can help characterize disease distribution and public knowledge, attitudes, and behaviors critical to the early stages of an outbreak.
Objective The aim of this study is to conduct a quantitative and qualitative assessment of Chinese social media posts originating in Wuhan City on the Chinese microblogging platform Weibo during the early stages of the COVID-19 outbreak.
Methods Chinese-language messages from Wuhan were collected for 39 days between December 23, 2019, and January 30, 2020, on Weibo. For quantitative analysis, the total daily cases of COVID-19 in Wuhan were obtained from the Chinese National Health Commission, and a linear regression model was used to determine if Weibo COVID-19 posts were predictive of the number of cases reported. Qualitative content analysis and an inductive manual coding approach were used to identify parent classifications of news and user-generated COVID-19 topics.
Results A total of 115,299 Weibo posts were collected during the study time frame, consisting of an average of 2956 posts per day (minimum 0, maximum 13,587). Quantitative analysis found a positive correlation between the number of Weibo posts and the number of reported cases from Wuhan, with approximately 10 more COVID-19 cases per 40 social media posts (P<.001). This effect size was also larger than what was observed for the rest of China excluding Hubei Province (where Wuhan is the capital city) and held when comparing the number of Weibo posts to the incidence proportion of cases in Hubei Province. Qualitative analysis of 11,893 posts during the first 21 days of the study period with COVID-19-related posts uncovered four parent classifications including Weibo discussions about the causative agent of the disease, changing epidemiological characteristics of the outbreak, public reaction to outbreak control and response measures, and other topics. Generally, these themes also exhibited public uncertainty and changing knowledge and attitudes about COVID-19, including posts exhibiting both protective and higher-risk behaviors.
Conclusions The results of this study provide initial insight into the origins of the COVID-19 outbreak based on quantitative and qualitative analysis of Chinese social media data at the initial epicenter in Wuhan City. Future studies should continue to explore the utility of social media data to predict COVID-19 disease severity, measure public reaction and behavior, and evaluate effectiveness of outbreak communication.
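The regression step reported in this abstract can be illustrated with a minimal ordinary least squares sketch. The daily counts below are hypothetical, not the study's data: they are constructed so that the fitted slope is 0.25 cases per post, which corresponds to the reported "10 more cases per 40 posts". The last line checks the reported daily average from the totals stated in the abstract (115,299 posts over 39 days).

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical daily counts, NOT the study's data.
daily_posts = [0, 400, 1200, 2000, 4000, 8000]
daily_cases = [5 + 0.25 * p for p in daily_posts]

slope, intercept = fit_line(daily_posts, daily_cases)
cases_per_40_posts = 40 * slope

# Figures taken from the abstract: total posts divided by days of collection.
average_posts_per_day = 115299 / 39
```

A slope expressed per post is hard to read, which is presumably why the abstract rescales it to a block of 40 posts; the rescaling is just a multiplication of the fitted coefficient.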


2021, Vol 12
Author(s): Muhammad Usman Tariq, Muhammad Babar, Marc Poulin, Akmal Saeed Khattak, Mohammad Dahman Alshehri, ...

Intelligent big data analysis is an evolving pattern in the age of big data science and artificial intelligence (AI). Analysis of organized data has been very successful, but analyzing human behavior using social media data remains challenging. Social media data comprise vast, unstructured data sources that can include likes, comments, tweets, shares, and views. Analytics on social media data has become a challenging task for companies, such as Dailymotion, that have billions of daily users and vast numbers of comments, likes, and views. Social media data is created in significant amounts and at a tremendous pace, so a very high volume must be stored, sorted, processed, and carefully studied before decisions can be made. This article proposes an architecture using a big data analytics mechanism to efficiently and logically process huge social media datasets. The proposed architecture is composed of three layers. The main objective of the project is to demonstrate Apache Spark parallel processing and distributed framework technologies together with other storage and processing mechanisms. Social media data generated from Dailymotion is used in this article to demonstrate the benefits of this architecture. The project used the application programming interface (API) of Dailymotion, allowing it to incorporate functions suitable for fetching and viewing information. An API key is generated to fetch public channel data in the form of text files. The Hive storage mechanism is used with Apache Spark for efficient data processing. The effectiveness of the proposed architecture is also highlighted.


Author(s): F. O. Ostermann, H. Huang, G. Andrienko, N. Andrienko, C. Capineri, ...

The increasing availability of Geo-Social Media (e.g. Facebook, Foursquare and Flickr) has led to the accumulation of large volumes of social media data. These data, especially geotagged ones, contain information about the perception of and experiences in various environments. Harnessing these data can provide a better understanding of the semantics of places. We are interested in the similarities or differences between different Geo-Social Media in the description of places. This extended abstract presents the results of a first step towards a more in-depth study of the semantic similarity of places. In particular, we took places extracted through spatio-temporal clustering from one data source (Twitter) and examined whether their structure is reflected semantically in another data set (Flickr). Based on that, we analyse how the semantic similarity between places varies over space and scale, and how Tobler's first law of geography holds with regard to scale and places.


2021
Author(s): J. Bradford Jensen, Lisa Singh, Pamela Davis-Kean, Katharine Abraham, Paul Beatty, ...

This is the fifth in a series of white papers providing a summary of the discussions and future directions that are derived from these topical meetings. This paper focuses on issues related to analysis and visual analytics. While these two topics are distinct, there are clear overlaps between the two. It is common to use different visualizations during analysis and given the sheer volume of social media data, visual analytic tools can be important during analysis, as well as during other parts of the research lifecycle. Choices about analysis may be informed by visualization plans and vice versa - both are key in communicating about a data set and what it means. We also recognized that each field of research has different analysis techniques and different levels of familiarity with visual analytics. Putting these two topics into the same meeting provided us with the opportunity to think about analysis and visual analytics/visualization in new, synergistic ways.


2019, Vol 3 (3), pp. 38
Author(s): Stefan Spettel, Dimitrios Vagianos

Social media are heavily used to shape political discussions. Thus, it is valuable for corporations and political parties to be able to analyze the content of those discussions. This is exemplified by the work of Cambridge Analytica, in support of the 2016 presidential campaign of Donald Trump. One of the most straightforward metrics is the sentiment of a message, whether it is considered as positive or negative. There are many commercial and/or closed-source tools available which make it possible to analyze social media data, including sentiment analysis (SA). However, to our knowledge, not many publicly available tools have been developed that allow for analyzing social media data and help researchers around the world to enter this quickly expanding field of study. In this paper, we provide a thorough description of implementing a tool that can be used for performing sentiment analysis on tweets. In an effort to underline the necessity for open tools and additional monitoring on the Twittersphere, we propose an implementation model based exclusively on publicly available open-source software. The resulting tool is capable of downloading Tweets in real-time based on hashtags or account names and stores the sentiment for replies to specific tweets. It is therefore capable of measuring the average reaction to one tweet by a person or a hashtag, which can be represented with graphs. Finally, we tested our open-source tool within a case study based on a data set of Twitter accounts and hashtags referring to the Syrian war, covering a short time window of one week in the spring of 2018. The results show that while high accuracy of commercial or other complicated tools may not be achieved, our proposed open source tool makes it possible to get a good overview of the overall replies to specific tweets, as well as a practical perception of tweets, related to specific hashtags, identifying them as positive or negative.
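The idea of measuring the average reaction to a tweet can be sketched as follows. This is a deliberately simple lexicon-based stand-in, not the tool described in the abstract: the word lists `POSITIVE` and `NEGATIVE`, the tweet identifiers, and the reply texts are all hypothetical, and a real system would use a trained sentiment model and replies downloaded from the platform.

```python
import re
from collections import defaultdict

# Hypothetical miniature sentiment lexicon, for illustration only.
POSITIVE = {"good", "great", "hope", "support"}
NEGATIVE = {"bad", "terrible", "fear", "against"}


def sentiment_score(text):
    """Return +1, 0, or -1 depending on whether positive or negative
    lexicon words dominate the text."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum((w in POSITIVE) - (w in NEGATIVE) for w in words)
    return (score > 0) - (score < 0)


def average_reply_sentiment(replies):
    """Average the sentiment of replies, grouped by the tweet they answer.

    `replies` is an iterable of (tweet_id, reply_text) pairs."""
    by_tweet = defaultdict(list)
    for tweet_id, text in replies:
        by_tweet[tweet_id].append(sentiment_score(text))
    return {tid: sum(scores) / len(scores) for tid, scores in by_tweet.items()}


replies = [
    ("tweet-1", "Great news, I support this"),
    ("tweet-1", "This is terrible"),
    ("tweet-1", "Good to hear"),
    ("tweet-2", "Bad idea, I am against it"),
]
averages = average_reply_sentiment(replies)
```

The per-tweet averages lie between -1 and +1 and could be plotted over time per account or hashtag, which matches the kind of overview the abstract describes.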


2017
Author(s): Valentina Grasso, Imad Zaza, Federica Zabini, Gianni Pantaleo, Paolo Nesi, ...

Identifying and monitoring the impact of severe weather through social media data is a worthwhile challenge for data science. Recent years have seen an increase in natural disasters, partly due to climate change. Many studies have shown that during such events people tend to share specific messages by means of social media platforms, especially Twitter. These messages not only contribute to "situational" awareness and improve the dissemination of information during emergencies, but can also be used to assess the social impact of crisis events. In this work we present preliminary findings on how the temporal distribution of weather-related messages may help identify severe events that impacted a community. Severe weather events are recognizable by observing the synchronization of the volumes of Twitter streams extracted using different but semantically graded terms and hashtags, including those containing specific geographic names. Impactful events appear to be immediately recognizable in a graphical representation of the weather streams, when the timeline shows a specific parallel pattern that we named "Half Onion Shape". Different but semantically linked weather-related Twitter streams can exhibit different magnitudes, according to the popularity of their terms, but when a weather event occurs they show the same relative temporal maximum. Given these interesting indications, which need to be confirmed through deeper analysis, and the heavy use of social media such as Twitter during crisis events, it is becoming fundamental to have a suite of suitable tools for monitoring social media data. For Twitter data, a comprehensive suite of tools is presented: the DISIT-Twitter Vigilance Platform for Twitter data retrieval, management, and visualization.
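The observation that related streams differ in magnitude but peak at the same time can be checked with a few lines of code. The sketch below is an assumption-laden illustration, not the DISIT-Twitter Vigilance Platform: the stream names, the hourly counts, and the `tolerance` parameter are hypothetical.

```python
def peak_index(counts):
    """Index of the time bin with the highest message count."""
    return max(range(len(counts)), key=lambda i: counts[i])


def synchronized_peak(streams, tolerance=0):
    """Return the earliest peak index if all streams peak within `tolerance`
    time bins of each other, otherwise None.

    `streams` maps a stream name to a list of counts per time bin."""
    peaks = [peak_index(counts) for counts in streams.values()]
    if max(peaks) - min(peaks) <= tolerance:
        return min(peaks)
    return None


# Hypothetical hourly message counts for three semantically related streams.
weather_streams = {
    "generic weather hashtag": [120, 150, 900, 400, 200],
    "rain keyword": [40, 60, 300, 150, 50],
    "flood keyword with place name": [2, 3, 45, 20, 5],
}
event_bin = synchronized_peak(weather_streams)

# A counter-example in which one stream peaks two bins later.
unrelated_streams = {
    "rain keyword": [40, 60, 300, 150, 50],
    "sports keyword": [10, 12, 11, 14, 80],
}
no_event = synchronized_peak(unrelated_streams)
```

Plotting the three synchronized streams on one timeline would give nested curves of different heights sharing one maximum, which is a plausible reading of the "Half Onion Shape" pattern named in the abstract.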

