What Your Tweets Tell Us About You: Identity, Ownership and Privacy of Twitter Data

Social networking sites and other social media have enabled new forms of collaborative communication and participation for users, and created additional value as rich data sets for research. Research based on accessing, mining, and analyzing social media data has risen steadily over the last several years and is increasingly multidisciplinary; researchers from the social sciences, humanities, computer science and other domains have used social media data as the basis of their studies. The broad use of this form of data has implications for how curators address preservation, access and reuse for an audience with divergent disciplinary norms related to privacy, ownership, authenticity and reliability. In this paper, we explore how the characteristics of the Twitter platform, coupled with an ambiguous and evolving understanding of privacy in networked communication, and divergent disciplinary understandings of the resulting data, combine to create complex issues for curators trying to ensure broad-based and ethical reuse of Twitter data. We provide a case study of a specific data set to illustrate how data curators can engage with the topics and questions raised in the paper. While some initial suggestions are offered to librarians and other information professionals who are beginning to receive social media data from researchers, our larger goal is to stimulate discussion and prompt additional research on the curation and preservation of social media data.

Mining Social Media Data to Study the Consequences of Dementia Diagnosis on Caregivers and Relatives (Preprint)

10.2196/preprints.10506 ◽

2018 ◽

Author(s):

Anika Oellrich ◽

George Gkotsis ◽

Richard James Butler Dobson ◽

Tim JP Hubbard ◽

Rina Dutta

Keyword(s):

Social Media ◽

Family Relationships ◽

Text Processing ◽

Automated Analysis ◽

Health Concern ◽

Dementia Diagnosis ◽

Data Set ◽

Social Media Data ◽

Real Time Processing ◽

Media Data

BACKGROUND Dementia is a growing public health concern with approximately 50 million people affected worldwide in 2017 and this number is expected to reach more than 131 million by 2050. The toll on caregivers and relatives cannot be underestimated as dementia changes family relationships, leaves people socially isolated, and affects the finances of all those involved. OBJECTIVE The aim of this study was to explore using automated analysis (i) the age and gender of people who post to the social media forum Reddit about dementia diagnoses, (ii) the affected person and their diagnosis, (iii) relevant subreddits authors are posting to, (iv) the types of messages posted and (v) the content of these posts. METHODS We analysed Reddit posts concerning dementia diagnoses. We used a previously developed text analysis pipeline to determine attributes of the posts as well as their authors to characterise online communications about dementia diagnoses. The posts were also examined by manual curation for the diagnosis provided and the person affected. Furthermore, we investigated the communities these people engage in and assessed the contents of the posts with an automated topic gathering technique. RESULTS Our results indicate that the majority of posters in our data set are women, and it is mostly close relatives such as parents and grandparents that are mentioned. Both the communities frequented and topics gathered reflect not only the sufferer's diagnosis but also potential outcomes, e.g. hardships experienced by the caregiver. The trends observed from this dataset are consistent with findings based on qualitative review, validating the robustness of social media automated text processing. 
CONCLUSIONS This work demonstrates the value of social media data sources as a resource for in-depth studies of those affected by a dementia diagnosis and the potential to develop novel support systems based on their real time processing in line with the increasing digitalisation of medical care.

Embed2Detect: temporally clustered embedded words for event detection in social media

Machine Learning ◽

10.1007/s10994-021-05988-7 ◽

2021 ◽

Author(s):

Hansi Hettiarachchi ◽

Mariam Adedoyin-Olowe ◽

Jagdev Bhogal ◽

Mohamed Medhat Gaber

Keyword(s):

Social Media ◽

Event Detection ◽

High Volume ◽

Detection Methods ◽

Word Embeddings ◽

Agglomerative Clustering ◽

Data Set ◽

Social Media Data ◽

Social Media Platforms ◽

Media Data

Social media is becoming a primary medium to discuss what is happening around the world. Therefore, the data generated by social media platforms contain rich information which describes the ongoing events. Further, the timeliness associated with these data is capable of facilitating immediate insights. However, considering the dynamic nature and high volume of data production in social media data streams, it is impractical to filter the events manually and therefore, automated event detection mechanisms are invaluable to the community. Apart from a few notable exceptions, most previous research on automated event detection has focused only on statistical and syntactical features in data and lacked the involvement of underlying semantics, which are important for effective information retrieval from text since they represent the connections between words and their meanings. In this paper, we propose a novel method termed Embed2Detect for event detection in social media by combining the characteristics in word embeddings and hierarchical agglomerative clustering. The adoption of word embeddings gives Embed2Detect the capability to incorporate powerful semantical features into event detection and overcome a major limitation inherent in previous approaches. We evaluated our method on two recent real social media data sets which represent the sports and political domains and also compared the results to several state-of-the-art methods. The obtained results show that Embed2Detect is capable of effective and efficient event detection and it outperforms the recent event detection methods. For the sports data set, Embed2Detect achieved 27% higher F-measure than the best-performing baseline and for the political data set, it was an increase of 29%.
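To make the two ingredients named in this abstract concrete, the following is a minimal sketch of hierarchical agglomerative clustering over word vectors. It is not the Embed2Detect implementation: the average-linkage rule, the cosine distance, the stopping threshold, the function names, and the two-dimensional toy vectors are all assumptions chosen for illustration, and real word embeddings would come from a trained model.

```python
import math

def cosine_distance(u, v):
    """Return 1 - cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def average_linkage(c1, c2, vectors):
    """Mean pairwise cosine distance between the words of two clusters."""
    total = sum(cosine_distance(vectors[a], vectors[b]) for a in c1 for b in c2)
    return total / (len(c1) * len(c2))

def agglomerative_clusters(vectors, threshold):
    """Merge the closest pair of clusters until none is within `threshold`."""
    # Start with one singleton cluster per word.
    clusters = [[word] for word in sorted(vectors)]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = average_linkage(clusters[i], clusters[j], vectors)
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        if d > threshold:
            break  # the closest remaining pair is too far apart to merge
        clusters[i] = sorted(clusters[i] + clusters[j])
        del clusters[j]
    return clusters

# Hypothetical 2-d "embeddings": two sports words and two politics words.
TOY_VECTORS = {
    "goal": (0.9, 0.1),
    "penalty": (0.8, 0.2),
    "vote": (0.1, 0.9),
    "ballot": (0.2, 0.8),
}

if __name__ == "__main__":
    print(agglomerative_clusters(TOY_VECTORS, threshold=0.2))
```

With the toy vectors, a threshold of 0.2 groups the sports words and the politics words separately, while a threshold at the maximum cosine distance of 2.0 merges everything into one cluster.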

Changing Trends in Long-Term Sentiments and Neighborhood Determinants in a Shrinking City

Journal of Planning Education and Research ◽

10.1177/0739456x211044215 ◽

2021 ◽

pp. 0739456X2110442

Author(s):

Yunmi Park ◽

Minju Kim ◽

Jiyeon Shin ◽

Megan E. Heim LaFrombois

Keyword(s):

Social Media ◽

Built Environment ◽

Urban Regeneration ◽

Neighborhood Conditions ◽

Social Media Data ◽

Twitter Data ◽

Changing Trends ◽

Shrinking City ◽

Media Data

This research examined social media’s role in understanding perceptions about the spaces in which individuals interact, what planners can learn from social media data, and how to use social media to inform urban regeneration efforts. Using Twitter data from 2010 to 2018 recorded in one U.S. shrinking city, Detroit, Michigan, this paper longitudinally investigated the topics that people discuss, their emotions, and the neighborhood conditions associated with these topics and sentiments. Findings demonstrate that neighborhood demographic, socioeconomic, and built environment conditions impact people’s sentiments.

Using Social Media Data as Research Data

International Journal for Innovation Education and Research ◽

10.31686/ijier.vol1.iss3.114 ◽

2013 ◽

Vol 1 (3) ◽

pp. 49-55

Author(s):

Suman Silwal ◽

Dale W Callahan

Keyword(s):

Social Media ◽

Social Networking ◽

Social Networking Sites ◽

Communication Channel ◽

Social Networking Site ◽

Science And Engineering ◽

Social Media Data ◽

Breaking News ◽

Current State ◽

Media Data

Social Media (SM) is becoming a normal part of everyday life. Information generated from SM data is increasingly used as a communication channel for market trends, brand awareness, breaking news, and person-to-person online social interaction. SM is also rapidly growing and maturing [1]. Further, SM is becoming a reliable tool for industries such as banking, travel, healthcare, biotech, software, and sports. SM data can also be used as a research tool in different areas of the Humanities, Art, Science and Engineering. Social Networking Sites (SNS) offer many possibilities for collecting, processing and evaluating data. This paper reviews the current state of Social Networking Sites and Text-based Language Processes, and how they can be used to generate valuable information.

EXTRACTING AND COMPARING PLACES USING GEO-SOCIAL MEDIA

ISPRS Annals of Photogrammetry Remote Sensing and Spatial Information Sciences ◽

10.5194/isprsannals-ii-3-w5-311-2015 ◽

2015 ◽

Vol II-3/W5 ◽

pp. 311-316

Author(s):

F. O. Ostermann ◽

H. Huang ◽

G. Andrienko ◽

N. Andrienko ◽

C. Capineri ◽

...

Keyword(s):

Social Media ◽

Semantic Similarity ◽

Data Set ◽

Social Media Data ◽

Temporal Clustering ◽

Depth Study ◽

Data Source ◽

Spatio Temporal ◽

Media Data

The increasing availability of Geo-Social Media (e.g. Facebook, Foursquare and Flickr) has led to the accumulation of large volumes of social media data. These data, especially geotagged ones, contain information about the perception of and experiences in various environments. Harnessing these data can provide a better understanding of the semantics of places. We are interested in the similarities or differences between different Geo-Social Media in the description of places. This extended abstract presents the results of a first step towards a more in-depth study of the semantic similarity of places. In particular, we took places extracted through spatio-temporal clustering from one data source (Twitter) and examined whether their structure is reflected semantically in another data set (Flickr). Based on that, we analyse how the semantic similarity between places varies over space and scale, and how Tobler's first law of geography holds with regard to scale and places.

A Comprehensive Analysis of Approaches for Sentiment Analysis Using Twitter Data on COVID-19 Vaccines

Journal of Informatics Electrical and Electronics Engineering (JIEEE) ◽

10.54060/jieee/002.02.009 ◽

2021 ◽

Vol 2 (2) ◽

pp. 1-10

Author(s):

Amrita Mishra ◽

Keyword(s):

Machine Learning ◽

Social Media ◽

Sentiment Analysis ◽

Text Classification ◽

Comprehensive Analysis ◽

Social Media Data ◽

The Public ◽

Opinion Analysis ◽

Twitter Data ◽

Media Data

Sentiment Analysis (SA) has opened the way to analysing the opinions of large populations without territorial limits. With the advent and growth of social media such as Twitter, Facebook, WhatsApp and Snapchat, stakeholders and the public often use them to express their opinions and draw conclusions. While these social media data are extremely informative and well connected, the major challenge lies in incorporating efficient Text Classification strategies which not only overcome the unstructured nature and very large volume of the data but also generate the correct polarity of opinions (i.e. positive, negative, or neutral). This paper provides a brief study of various approaches to SA, including Machine Learning, Lexicon Based, and Automatic Approaches. The paper also compares positive, negative, and neutral tweets about the Sputnik V, Moderna, and Covaxin vaccines used for preventive and emergency use against COVID-19.
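As an illustration of the lexicon-based family of approaches mentioned in this abstract, here is a minimal sketch that assigns one of the three polarities by counting matches against word lists. The word lists, the `polarity` function name, the whitespace tokenisation, and the example sentences are assumptions made for this sketch and are not taken from the paper; practical systems rely on much larger curated lexicons and handle negation, emoji, and hashtags.

```python
# Hypothetical toy lexicons; real lexicons contain thousands of scored terms.
POSITIVE = {"safe", "effective", "good", "relieved"}
NEGATIVE = {"unsafe", "bad", "worried", "painful"}

def polarity(text):
    """Classify text as 'positive', 'negative', or 'neutral' by lexicon counts."""
    # Split on whitespace, trim surrounding punctuation, and lowercase.
    tokens = [t.strip(".,!?;:\"'()").lower() for t in text.split()]
    # Each positive word adds 1 to the score, each negative word subtracts 1.
    score = sum((t in POSITIVE) - (t in NEGATIVE) for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

if __name__ == "__main__":
    for tweet in (
        "The vaccine is safe and effective!",
        "Worried about painful side effects.",
        "Second dose scheduled for Monday",
    ):
        print(polarity(tweet), "|", tweet)
```

A text with no lexicon matches, or with equal numbers of positive and negative matches, falls into the neutral class, which is one simple way to obtain the three-way split described above.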

SPATIAL-TEMPORAL ANALYSIS OF SOCIAL MEDIA DATA RELATED TO NEPAL EARTHQUAKE 2015

ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences ◽

10.5194/isprs-archives-xli-b2-567-2016 ◽

2016 ◽

Vol XLI-B2 ◽

pp. 567-571

Author(s):

L. Thapa

Keyword(s):

Social Media ◽

Temporal Analysis ◽

Archaeological Sites ◽

Social Media Data ◽

User Input ◽

Nepal Earthquake ◽

Twitter Data ◽

The Social ◽

Media Data ◽

Spatial Temporal Analysis

Social media have become an instant communication platform for sharing anything from personal feelings to matters of public concern, and they are among the easiest and most concise ways to deliver information to the public. With the development of Web 2.0 technologies, more and more emphasis has been given to user input on the web; the concept of the Geoweb is taking shape, and in recent years social media such as Twitter and Flickr have become popular location-based social media with locational functionality enabled. Nepal faced a devastating earthquake on 25 April 2015, resulting in the loss of thousands of lives and the destruction of historical and archaeological sites and property. Many countries around the globe offered immediate help, and many NGOs, INGOs and individuals started rescue operations immediately; the concerned authorities and the public used different communication media, such as FM radio stations, television, and social media over the World Wide Web, to gather information about the earthquake and to ease rescue activities. They also initiated campaigns on social media to raise funds and support the victims. Even social media platforms such as Facebook and Twitter themselves announced campaigns to help rebuild Nepal. In this context, this paper analyses Twitter data containing hashtags related to the 2015 Nepal earthquake together with their temporal characteristics, asking when the messages were generated, where they came from, and how they spread spatially over the internet.

Linking Twitter and survey data: asymmetry in quantity and its impact

EPJ Data Science ◽

10.1140/epjds/s13688-021-00286-7 ◽

2021 ◽

Vol 10 (1) ◽

Author(s):

Tarek Al Baghal ◽

Alexander Wenz ◽

Luke Sloan ◽

Curtis Jessop

Keyword(s):

Social Media ◽

Survey Data ◽

Linked Data ◽

Social Research ◽

Data Sources ◽

Social Media Data ◽

Twitter Data ◽

Unique Source ◽

Methodological Aspects ◽

Media Data

Linked social media and survey data have the potential to be a unique source of information for social research. While the potential usefulness of this methodology is widely acknowledged, very few studies have explored methodological aspects of such linkage. Respondents produce planned amounts of survey data, but highly variant amounts of social media data. This study explores this asymmetry by examining the amount of social media data available to link to surveys. The extent of variation in the amount of data collected from social media could affect the ability to derive meaningful linked indicators and could introduce possible biases. Linked Twitter data from respondents to two longitudinal surveys representative of Great Britain, the Innovation Panel and the NatCen Panel, show that there is indeed substantial variation in the number of tweets posted and the number of followers and friends respondents have. Multivariate analyses of both data sources show that only a few respondent characteristics have a statistically significant effect on the number of tweets posted, with the number of followers being the strongest predictor of posting in both panels, women posting less than men, and some evidence that people with higher education post less, but only in the Innovation Panel. We use sentiment analyses of tweets to provide an example of how the amount of Twitter data collected can impact outcomes using these linked data sources. Results show that more negatively coded tweets are related to general happiness, but not the number of positive tweets. Taken together, the findings suggest that the amount of data collected from social media which can be linked to surveys is an important factor to consider and indicate the potential for such linked data sources in social research.

Analysis and Visualization Considerations for Quantitative Social Science Research Using Social Media Data

10.31234/osf.io/p2j5z ◽

2021 ◽

Author(s):

J. Bradford Jensen ◽

Lisa Singh ◽

Pamela Davis-Kean ◽

Katharine Abraham ◽

Paul Beatty ◽

...

Keyword(s):

Social Media ◽

Visual Analytics ◽

Social Science Research ◽

Science Research ◽

Data Set ◽

Future Directions ◽

Social Media Data ◽

Analysis Techniques ◽

Media Data ◽

Different Levels

This is the fifth in a series of white papers providing a summary of the discussions and future directions that are derived from these topical meetings. This paper focuses on issues related to analysis and visual analytics. While these two topics are distinct, there are clear overlaps between the two. It is common to use different visualizations during analysis and given the sheer volume of social media data, visual analytic tools can be important during analysis, as well as during other parts of the research lifecycle. Choices about analysis may be informed by visualization plans and vice versa - both are key in communicating about a data set and what it means. We also recognized that each field of research has different analysis techniques and different levels of familiarity with visual analytics. Putting these two topics into the same meeting provided us with the opportunity to think about analysis and visual analytics/visualization in new, synergistic ways.

Performance Evaluation of Fuzzy C Mean Clustering on Social Media Data Set

International Journal of Computer Sciences and Engineering ◽

10.26438/ijcse/v6i6.13761380 ◽

2018 ◽

Vol 6 (6) ◽

pp. 1376-1380

Author(s):

Kothapalli Revathi ◽

Chalumuri Avinash

Keyword(s):

Social Media ◽

Performance Evaluation ◽

Data Set ◽

Social Media Data ◽

Media Data
