EXTRACTING AND COMPARING PLACES USING GEO-SOCIAL MEDIA

Author(s):  
F. O. Ostermann ◽  
H. Huang ◽  
G. Andrienko ◽  
N. Andrienko ◽  
C. Capineri ◽  
...  

The increasing availability of Geo-Social Media (e.g. Facebook, Foursquare and Flickr) has led to the accumulation of large volumes of social media data. These data, especially geotagged ones, contain information about perceptions of, and experiences in, various environments. Harnessing these data can provide a better understanding of the semantics of places. We are interested in the similarities and differences between different Geo-Social Media in their descriptions of places. This extended abstract presents the results of a first step towards a more in-depth study of the semantic similarity of places. In particular, we took places extracted through spatio-temporal clustering from one data source (Twitter) and examined whether their structure is reflected semantically in another data set (Flickr). On that basis, we analyse how the semantic similarity between places varies over space and scale, and how Tobler's first law of geography holds with regard to scale and places.
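A minimal sketch of the two ingredient steps, not the authors' exact pipeline: candidate places are extracted by density-based clustering of geotagged posts (DBSCAN is one common choice), and the two sources' descriptions of a place are compared via cosine similarity of their term profiles. All parameter values, coordinates, and texts below are illustrative assumptions.

```python
# Sketch only: cluster geotagged posts into candidate places, then compare
# how two Geo-Social Media sources describe the same place semantically.
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def extract_places(coords, eps_deg=0.001, min_samples=25):
    """Cluster (lat, lon) pairs into candidate places; label -1 is noise.
    eps is in raw degrees here, a simplification of proper geodesic distance."""
    return DBSCAN(eps=eps_deg, min_samples=min_samples).fit_predict(coords)

def place_similarity(tweet_texts, flickr_texts):
    """Cosine similarity between the aggregated term profiles of one place
    as described in two different data sources."""
    vec = TfidfVectorizer(stop_words="english")
    profiles = vec.fit_transform([" ".join(tweet_texts), " ".join(flickr_texts)])
    return cosine_similarity(profiles[0], profiles[1])[0, 0]

# Hypothetical usage; real coords/texts would come from harvested geotagged posts.
coords = np.array([[52.2226, 6.8903], [52.2229, 6.8907], [52.2230, 6.8905]])
print(extract_places(coords, min_samples=2))
print(place_similarity(["campus lecture crowd"], ["university photo campus"]))
```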

2018 ◽  
Author(s):  
Anika Oellrich ◽  
George Gkotsis ◽  
Richard James Butler Dobson ◽  
Tim JP Hubbard ◽  
Rina Dutta

BACKGROUND Dementia is a growing public health concern, with approximately 50 million people affected worldwide in 2017, a number expected to exceed 131 million by 2050. The toll on caregivers and relatives cannot be overstated, as dementia changes family relationships, leaves people socially isolated, and affects the finances of all those involved. OBJECTIVE The aim of this study was to explore, using automated analysis, (i) the age and gender of people who post to the social media forum Reddit about dementia diagnoses, (ii) the affected person and their diagnosis, (iii) the relevant subreddits authors are posting to, (iv) the types of messages posted, and (v) the content of these posts. METHODS We analysed Reddit posts concerning dementia diagnoses. We used a previously developed text analysis pipeline to determine attributes of the posts as well as their authors in order to characterise online communications about dementia diagnoses. The posts were also examined by manual curation for the diagnosis provided and the person affected. Furthermore, we investigated the communities these people engage in and assessed the contents of the posts with an automated topic gathering technique. RESULTS Our results indicate that the majority of posters in our data set are women, and that it is mostly close relatives such as parents and grandparents who are mentioned. Both the communities frequented and the topics gathered reflect not only the sufferer's diagnosis but also potential outcomes, e.g. hardships experienced by the caregiver. The trends observed in this data set are consistent with findings based on qualitative review, validating the robustness of automated text processing of social media. CONCLUSIONS This work demonstrates the value of social media data sources as a resource for in-depth studies of those affected by a dementia diagnosis, and the potential to develop novel support systems based on their real-time processing, in line with the increasing digitalisation of medical care.
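The abstract does not name the topic gathering technique, so the sketch below uses Latent Dirichlet Allocation purely as one plausible stand-in; the example posts are invented for illustration.

```python
# Illustrative only: automated topic gathering over Reddit-style posts,
# using LDA as an assumed (not confirmed) choice of technique.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

posts = [  # hypothetical posts mentioning dementia diagnoses
    "my grandmother was diagnosed with alzheimers last week",
    "caring for my dad with vascular dementia is exhausting",
    "looking for caregiver support groups after mums diagnosis",
]
vec = CountVectorizer(stop_words="english")
dtm = vec.fit_transform(posts)                      # document-term matrix
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(dtm)

terms = vec.get_feature_names_out()
for k, weights in enumerate(lda.components_):       # top words per topic
    top = [terms[i] for i in weights.argsort()[-5:][::-1]]
    print(f"topic {k}: {', '.join(top)}")
```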


2021 ◽  
Vol 8 (1) ◽  
Author(s):  
Yasmeen George ◽  
Shanika Karunasekera ◽  
Aaron Harwood ◽  
Kwan Hui Lim

Abstract
A key challenge in mining social media data streams is to identify events that are actively discussed by a group of people in a specific local or global area. Such events are useful for early warning of accidents, protests, elections, or breaking news. However, neither the list of events nor the resolution of event time and space is fixed or known beforehand. In this work, we propose an online spatio-temporal event detection system using social media that is able to detect events at different time and space resolutions. First, to address the unknown spatial resolution of events, a quad-tree method is used to split the geographical space into multiscale regions based on the density of social media data. Then, a statistical unsupervised approach involving a Poisson distribution and a smoothing method highlights regions with an unexpected density of social posts. Further, event duration is precisely estimated by merging events happening in the same region at consecutive time intervals. A post-processing stage filters out events that are spam, fake, or incorrect. Finally, we incorporate simple semantics by using social media entities to assess the integrity and accuracy of detected events. The proposed method is evaluated using different social media datasets (Twitter and Flickr) for different cities (Melbourne, London, Paris, and New York). To verify the effectiveness of the proposed method, we compare our results with two baseline algorithms based on a fixed split of the geographical space and a clustering method. For performance evaluation, we manually compute recall and precision. We also propose a new quality measure, named the strength index, which automatically measures how accurate the reported event is.
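A minimal sketch of the two core ideas described above, under assumed parameters (cell capacity, depth limit, significance level are all illustrative): recursively split space into multiscale cells by post density with a quad-tree, then flag cells whose post count is improbable under a Poisson baseline.

```python
# Sketch of density-driven quad-tree splitting plus a Poisson test for
# unexpectedly dense cells; all thresholds are illustrative assumptions.
from scipy.stats import poisson

def quadtree(points, bbox, max_points=50, depth=0, max_depth=8):
    """Split bbox=(minx, miny, maxx, maxy) while it holds too many points;
    returns leaf cells with their points, so denser areas get finer cells."""
    if len(points) <= max_points or depth == max_depth:
        return [(bbox, points)]
    minx, miny, maxx, maxy = bbox
    cx, cy = (minx + maxx) / 2, (miny + maxy) / 2
    leaves = []
    for sub in [(minx, miny, cx, cy), (cx, miny, maxx, cy),
                (minx, cy, cx, maxy), (cx, cy, maxx, maxy)]:
        inside = [p for p in points
                  if sub[0] <= p[0] < sub[2] and sub[1] <= p[1] < sub[3]]
        leaves += quadtree(inside, sub, max_points, depth + 1, max_depth)
    return leaves

def unexpected(count, expected_rate, alpha=0.01):
    """True if observing `count` posts is improbable under Poisson(rate)."""
    return poisson.sf(count - 1, expected_rate) < alpha  # P(X >= count)

# e.g. a cell that historically averages 5 posts per interval but saw 20:
print(unexpected(20, 5.0))  # True -> candidate event region
```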


2021 ◽  
Author(s):  
Hansi Hettiarachchi ◽  
Mariam Adedoyin-Olowe ◽  
Jagdev Bhogal ◽  
Mohamed Medhat Gaber

Abstract
Social media is becoming a primary medium for discussing what is happening around the world. The data generated by social media platforms therefore contain rich information describing ongoing events, and the timeliness of these data facilitates immediate insights. However, given the dynamic nature and high volume of data production in social media streams, it is impractical to filter events manually, and automated event detection mechanisms are therefore invaluable to the community. Apart from a few notable exceptions, most previous research on automated event detection has focused only on statistical and syntactical features of the data and lacked the involvement of underlying semantics, which are important for effective information retrieval from text since they represent the connections between words and their meanings. In this paper, we propose a novel method, termed Embed2Detect, for event detection in social media that combines the characteristics of word embeddings with hierarchical agglomerative clustering. The adoption of word embeddings gives Embed2Detect the capability to incorporate powerful semantic features into event detection and to overcome a major limitation inherent in previous approaches. We evaluated our method on two recent real social media data sets, representing the sports and political domains, and compared the results with several state-of-the-art methods. The results show that Embed2Detect is capable of effective and efficient event detection and that it outperforms recent event detection methods. For the sports data set, Embed2Detect achieved a 27% higher F-measure than the best-performing baseline, and for the political data set the increase was 29%.
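A generic sketch of the ingredient techniques only, not Embed2Detect itself: words are clustered by the cosine distance of their embeddings using hierarchical agglomerative clustering, so that groups of semantically related co-trending words can be treated as candidate events. The word list, random embeddings, and distance threshold are all placeholder assumptions; in practice the embeddings would come from a model trained on the current stream window.

```python
# Generic sketch: hierarchical agglomerative clustering over word
# embeddings (random placeholders here) with cosine distance.
import numpy as np
from sklearn.cluster import AgglomerativeClustering

words = ["goal", "penalty", "referee", "vote", "ballot", "election"]
rng = np.random.default_rng(0)
emb = rng.normal(size=(len(words), 50))   # stand-in for real embeddings

# sklearn >= 1.2 uses `metric` (formerly `affinity`); threshold is illustrative.
clust = AgglomerativeClustering(
    n_clusters=None, distance_threshold=1.0,
    metric="cosine", linkage="average").fit(emb)
for word, label in zip(words, clust.labels_):
    print(label, word)                    # words sharing a label form a cluster
```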


Author(s):  
Mohamad Hasan

This paper presents a model to collect, save, geocode, and analyze social media data. The model is used to collect and process social media data concerning the ISIS terrorist group (the Islamic State in Iraq and Syria), and to map the areas in Syria most affected by ISIS according to these data. The mapping process consists of the automated compilation of a density map of the geocoded tweets. Data mined from social media (e.g., Twitter and Facebook) are recognized as a dynamic and easily accessible resource that can serve as a data source for spatial analysis and geographic information systems. Social media data can be represented as topic data and geocoding data derived from the mined text and processed using Natural Language Processing (NLP) methods. NLP is a subdomain of artificial intelligence concerned with programming computers to analyze natural human language and texts. NLP makes it possible to identify the words that serve as input to the geocoding algorithm developed here. In this study, the needed words were identified using two corpora. The first corpus contained the names of populated places in Syria. The second corpus was composed, following a statistical analysis of tweet counts, of words that carry a locational meaning (e.g., schools, temples). After identifying the words, the algorithm used the Google Maps Geocoding API to obtain coordinates for the posts.
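A sketch of the described flow under toy assumptions: the two corpora below are tiny placeholder sets (not the paper's actual corpora), tokenization is naive whitespace splitting, and YOUR_API_KEY is a placeholder for a real Google Maps Geocoding API key.

```python
# Sketch: match tweet tokens against place-name corpora, then geocode
# matches with the standard Google Maps Geocoding API endpoint.
import requests

PLACE_NAMES = {"aleppo", "raqqa", "palmyra"}       # corpus 1: populated places (toy)
LOCATION_WORDS = {"school", "temple", "hospital"}  # corpus 2: location-meaning words (toy)

def extract_place_terms(text):
    """Return tokens that appear in either corpus."""
    return [t for t in text.lower().split()
            if t in PLACE_NAMES or t in LOCATION_WORDS]

def geocode(term, api_key="YOUR_API_KEY"):
    """Resolve a matched term to coordinates via the Geocoding API."""
    resp = requests.get(
        "https://maps.googleapis.com/maps/api/geocode/json",
        params={"address": term + ", Syria", "key": api_key}, timeout=10)
    results = resp.json().get("results", [])
    if results:
        loc = results[0]["geometry"]["location"]
        return loc["lat"], loc["lng"]
    return None

for term in extract_place_terms("explosion reported near a school in raqqa"):
    print(term, geocode(term))
```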


2012 ◽  
Vol 7 (1) ◽  
pp. 174-197 ◽  
Author(s):  
Heather Small ◽  
Kristine Kasianovitz ◽  
Ronald Blanford ◽  
Ina Celaya

Social networking sites and other social media have enabled new forms of collaborative communication and participation for users, and created additional value as rich data sets for research. Research based on accessing, mining, and analyzing social media data has risen steadily over the last several years and is increasingly multidisciplinary; researchers from the social sciences, humanities, computer science, and other domains have used social media data as the basis of their studies. The broad use of this form of data has implications for how curators address preservation, access, and reuse for an audience with divergent disciplinary norms related to privacy, ownership, authenticity, and reliability. In this paper, we explore how the characteristics of the Twitter platform, coupled with an ambiguous and evolving understanding of privacy in networked communication and divergent disciplinary understandings of the resulting data, combine to create complex issues for curators trying to ensure broad-based and ethical reuse of Twitter data. We provide a case study of a specific data set to illustrate how data curators can engage with the topics and questions raised in the paper. While some initial suggestions are offered to librarians and other information professionals who are beginning to receive social media data from researchers, our larger goal is to stimulate discussion and prompt additional research on the curation and preservation of social media data.


10.2196/26119 ◽  
2021 ◽  
Vol 23 (8) ◽  
pp. e26119
Author(s):  
Guanghui Fu ◽  
Changwei Song ◽  
Jianqiang Li ◽  
Yue Ma ◽  
Pan Chen ◽  
...  

Background Web-based social media provide ordinary people with a platform to express their emotions conveniently and anonymously. A particular Chinese social media data source has accumulated nearly 2 million messages, and several thousand more are generated each day; it has therefore become impossible to analyze these messages manually. However, these messages have been identified as an important data source for the prevention of suicide related to depressive disorder. Objective We propose in this paper a distant supervision approach to developing a system that can automatically identify textual comments that are indicative of a high suicide risk. Methods To avoid expensive manual data annotation, we used a knowledge graph method to produce approximate annotations for distant supervision, which provided the basis for a deep learning architecture that was built and refined through interactions with psychology experts. There were three annotation levels: free annotations (zero cost), easy annotations (by psychology students), and hard annotations (by psychology experts). Results Our system was evaluated at each level, and its performance was promising throughout. By combining our system with several important psychology features from user blogs, we obtained a precision of 80.75%, a recall of 75.41%, and an F1 score of 77.98% on the hardest test data. Conclusions In this paper, we proposed a distant supervision approach to developing an automatic system that can classify high and low suicide risk based on social media comments. The model can therefore provide volunteers with early warnings to help prevent social media users from committing suicide.
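A deliberately simplified stand-in for the paper's pipeline: the paper derives approximate labels from a knowledge graph and trains a deep architecture, whereas this sketch uses toy keyword rules in place of the knowledge graph and a linear classifier in place of the deep model. All cue phrases and comments are invented.

```python
# Distant supervision in miniature: noisy rule-derived labels (standing in
# for knowledge-graph annotations) train a text classifier without any
# manual annotation. Toy data; not the paper's actual rules or model.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

HIGH_RISK_CUES = {"goodbye", "no way out"}  # toy rules, not the real KG

def weak_label(comment):
    """Approximate (noisy) label: 1 = high risk, 0 = low risk."""
    return int(any(cue in comment.lower() for cue in HIGH_RISK_CUES))

comments = [  # hypothetical comments
    "I just want to say goodbye to everyone",
    "there is no way out of this anymore",
    "had a rough week but my friends helped",
    "feeling better after talking to someone",
]
labels = [weak_label(c) for c in comments]

model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(comments, labels)   # trained on distant (weak) labels only
print(model.predict(["saying goodbye, there is no way out"]))
```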


2021 ◽  
Author(s):  
J. Bradford Jensen ◽  
Lisa Singh ◽  
Pamela Davis-Kean ◽  
Katharine Abraham ◽  
Paul Beatty ◽  
...  

This is the fifth in a series of white papers summarizing the discussions and future directions that emerged from a set of topical meetings. This paper focuses on issues related to analysis and visual analytics. While these two topics are distinct, there are clear overlaps between them. It is common to use different visualizations during analysis, and given the sheer volume of social media data, visual analytic tools can be important during analysis as well as during other parts of the research lifecycle. Choices about analysis may be informed by visualization plans and vice versa; both are key in communicating about a data set and what it means. We also recognized that each field of research has different analysis techniques and different levels of familiarity with visual analytics. Putting these two topics into the same meeting provided us with the opportunity to think about analysis and visual analytics/visualization in new, synergistic ways.


2021 ◽  
Author(s):  
Nick Boettcher

BACKGROUND The study of depression and anxiety using publicly available social media data is a research activity that has grown considerably over the last decade. The discussion platform Reddit has become a popular social media data source in this nascent area of study, in part because of the unique ways in which the platform facilitates research. To date, no work has synthesized existing studies of depression and anxiety using Reddit. OBJECTIVE The objective of this review is to understand the scope and nature of research using Reddit as a primary data source for studying depression and anxiety. METHODS A scoping review was conducted using the Arksey and O’Malley framework. The academic databases searched include MEDLINE/PubMed, EMBASE, CINAHL, PsycINFO, PsycARTICLES, Scopus, ScienceDirect, IEEE Xplore, and the ACM database. Inclusion criteria were developed using the Participants/Concept/Context framework outlined by the Joanna Briggs Institute Scoping Review Methodology Group. Eligible studies featured a methodological focus on analyzing depression and/or anxiety using naturalistic written expressions from Reddit users as the primary data source. RESULTS 54 studies were included for review. Tables and corresponding analysis delineate key methodological features, including a comparatively larger focus on depression versus anxiety, an even split of original and premade data sets, a favored analytic focus on classifying the mental health states of Reddit users, and practical implications that often recommend new methods of professionally driven mental health monitoring and outreach for Reddit users. CONCLUSIONS Studies of depression and anxiety using Reddit data are currently driven by a prevailing methodology that favors a technical, solution-based orientation. Researchers interested in advancing this research area will benefit from further consideration of conceptual issues surrounding the interpretation of Reddit data within the medical model of mental health. Further efforts are also needed to locate accountability and autonomy within practice implications suggesting new forms of engagement with Reddit users.


2019 ◽  
Vol 3 (3) ◽  
pp. 38 ◽  
Author(s):  
Stefan Spettel ◽  
Dimitrios Vagianos

Social media are heavily used to shape political discussions. It is therefore valuable for corporations and political parties to be able to analyze the content of those discussions, as exemplified by the work of Cambridge Analytica in support of the 2016 presidential campaign of Donald Trump. One of the most straightforward metrics is the sentiment of a message, i.e. whether it is considered positive or negative. Many commercial and/or closed-source tools are available that make it possible to analyze social media data, including sentiment analysis (SA). However, to our knowledge, few publicly available tools have been developed that allow for analyzing social media data and help researchers around the world enter this quickly expanding field of study. In this paper, we provide a thorough description of the implementation of a tool that can be used to perform sentiment analysis on tweets. In an effort to underline the necessity of open tools and additional monitoring of the Twittersphere, we propose an implementation model based exclusively on publicly available open-source software. The resulting tool is capable of downloading tweets in real time based on hashtags or account names, and it stores the sentiment of replies to specific tweets. It is therefore capable of measuring the average reaction to one tweet by a person or a hashtag, which can be represented with graphs. Finally, we tested our open-source tool in a case study based on a data set of Twitter accounts and hashtags referring to the Syrian war, covering a short time window of one week in the spring of 2018. The results show that, while the high accuracy of commercial or more sophisticated tools may not be achieved, our proposed open-source tool makes it possible to get a good overview of the replies to specific tweets, as well as a practical sense of whether tweets related to specific hashtags are positive or negative.
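The abstract does not specify which open-source sentiment analyzer the tool uses, so the sketch below assumes VADER (a widely used open-source choice) to reproduce the core measurement: score each reply and average the compound scores per tweet. The replies are hypothetical; a full tool would fetch them live from the Twitter API.

```python
# Sketch of the core measurement: average reply sentiment per tweet,
# using the open-source VADER analyzer (an assumed choice, not confirmed
# as the paper's). Install with: pip install vaderSentiment
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

replies_by_tweet = {  # hypothetical stored replies
    "tweet_1": ["totally agree, well said", "this is terrible and wrong"],
    "tweet_2": ["great news!", "love this", "not convinced at all"],
}

analyzer = SentimentIntensityAnalyzer()
for tweet_id, replies in replies_by_tweet.items():
    # compound score is in [-1, 1]: negative to positive
    scores = [analyzer.polarity_scores(r)["compound"] for r in replies]
    print(tweet_id, sum(scores) / len(scores))
```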

