Normalization Strategies for Enhancing Spatio-Temporal Analysis of Social Media Responses during Extreme Events: A Case Study based on Analysis of Four Extreme Events using Socio-Environmental Data Explorer (SEDE)

Author(s):  
J. Ajayakumar ◽  
E. Shook ◽  
V. K. Turner

With social media becoming increasingly location-based, there has been a greater push from researchers across various domains, including social science, public health, and disaster management, to tap into the spatial, temporal, and textual data available from these sources to analyze public response during extreme events such as an epidemic outbreak or a natural disaster. Studies based on demographics and other socio-economic factors suggest that social media data could be highly skewed owing to place-to-place variations in population density. To capture the spatio-temporal variations in public response during extreme events we have developed the Socio-Environmental Data Explorer (SEDE). SEDE collects and integrates social media, news, and environmental data to support exploration and assessment of public response to extreme events. For this study, using SEDE, we conduct spatio-temporal social media response analysis on four major extreme events in the United States: the “North American storm complex” in December 2015, “snowstorm Jonas” in January 2016, the “West Virginia floods” in June 2016, and “Hurricane Matthew” in October 2016. The analysis draws on geo-tagged social media data from Twitter and warnings from the storm events database provided by the National Centers for Environmental Information (NCEI). Results demonstrate that, to support complex social media analyses, spatial and population-based normalization and filtering are necessary. These results suggest that, when developing software solutions to support analysis of non-conventional data sources such as social media, it is essential to identify the inherent biases associated with the data sources and to adapt techniques and enhance capabilities to mitigate them. The normalization strategies that we have developed and incorporated into SEDE help reduce the population bias associated with social media data and will be useful for researchers and decision makers analyzing spatio-temporal social media responses during extreme events.
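The population-based normalization this abstract describes can be sketched as a simple per-capita rate. The sketch below is illustrative only, not the SEDE implementation; the region names, tweet counts, and populations are made up.

```python
# Sketch of population-based normalization: raw tweet counts are divided
# by population to give a rate per 100,000 residents, so that densely
# populated regions do not dominate the spatial signal. All figures are
# hypothetical.

def normalize_per_capita(tweet_counts, populations, per=100_000):
    """Return tweets per `per` residents for each region with known population."""
    return {
        region: tweet_counts[region] / populations[region] * per
        for region in tweet_counts
        if populations.get(region, 0) > 0
    }

raw = {"urban_county": 5000, "rural_county": 80}
pop = {"urban_county": 1_000_000, "rural_county": 10_000}
rates = normalize_per_capita(raw, pop)
# Per-capita rates reverse the raw ranking: relative to its population,
# the rural county responded more intensely than the urban one.
```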

2021 ◽  
Vol 8 (1) ◽  
Author(s):  
Yasmeen George ◽  
Shanika Karunasekera ◽  
Aaron Harwood ◽  
Kwan Hui Lim

Abstract A key challenge in mining social media data streams is to identify events that are actively discussed by a group of people in a specific local or global area. Such events are useful for early warning of accidents, protests, elections or breaking news. However, neither the list of events nor the resolution of event time and space is fixed or known beforehand. In this work, we propose an online spatio-temporal event detection system using social media that is able to detect events at different time and space resolutions. First, to address the unknown spatial resolution of events, a quad-tree method is exploited to split the geographical space into multiscale regions based on the density of social media data. Then, a statistical unsupervised approach involving a Poisson distribution and a smoothing method highlights regions with an unexpected density of social posts. Further, event duration is estimated precisely by merging events happening in the same region at consecutive time intervals. A post-processing stage filters out events that are spam, fake or wrong. Finally, we incorporate simple semantics by using social media entities to assess the integrity and accuracy of detected events. The proposed method is evaluated using different social media datasets (Twitter and Flickr) for four cities: Melbourne, London, Paris and New York. To verify its effectiveness, we compare our results with two baseline algorithms based on a fixed split of geographical space and a clustering method. For performance evaluation, we manually compute recall and precision. We also propose a new quality measure, named strength index, which automatically measures how accurate the reported event is.
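The two core steps described in this abstract, density-driven quad-tree splitting and Poisson-based surprise scoring, can be sketched as follows. This is an illustrative reconstruction under simplifying assumptions, not the authors' implementation; the point set is made up.

```python
import math

# Sketch: (1) a quad-tree that refines space where posts are dense,
# (2) a Poisson tail probability for flagging unexpectedly dense cells.

def quadtree(points, bbox, max_points=4, depth=0, max_depth=8):
    """Split bbox (x0, y0, x1, y1) into quadrants until each leaf holds
    at most `max_points` points; returns a list of (bbox, points) leaves."""
    if len(points) <= max_points or depth >= max_depth:
        return [(bbox, points)]
    x0, y0, x1, y1 = bbox
    xm, ym = (x0 + x1) / 2, (y0 + y1) / 2
    leaves = []
    for qb in ((x0, y0, xm, ym), (xm, y0, x1, ym),
               (x0, ym, xm, y1), (xm, ym, x1, y1)):
        inside = [p for p in points
                  if qb[0] <= p[0] < qb[2] and qb[1] <= p[1] < qb[3]]
        leaves.extend(quadtree(inside, qb, max_points, depth + 1, max_depth))
    return leaves

def poisson_tail(k, lam):
    """P(X >= k) for X ~ Poisson(lam); a small value marks a cell whose
    observed post count k is surprising for the baseline rate lam."""
    return 1.0 - sum(math.exp(-lam) * lam**i / math.factorial(i)
                     for i in range(k))

# Ten posts clustered in one corner plus one outlier: the tree refines
# the dense corner into small cells and leaves the rest coarse.
posts = [(0.1 + 0.01 * i, 0.1 + 0.01 * i) for i in range(10)] + [(0.9, 0.9)]
leaves = quadtree(posts, (0.0, 0.0, 1.0, 1.0))
```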


2020 ◽  
Vol 39 (3) ◽  
pp. 125-138
Author(s):  
Alina Zajadacz ◽  
Aleksandra Minkwitz

Abstract The purpose of the article is to present the concept of using social media (SM) as data sources and communication tools, useful at the various stages of planning, implementing and monitoring the effects of tourism development at the local level. The first part discusses the stages of planning, then presents the characteristics of SM, along with a discussion of the issues presented in the literature to date. The next part presents data sources and methods of research on SM and the functions they can perform in tourism. The concept presented reviews, on the one hand, the prospects for practical use of SM as a communication tool and source of data and, on the other, the challenges related to the need to further deepen research on tourism planning methods that are adequate to a continuously changing environment.


Author(s):  
Diya Li ◽  
Harshita Chaudhary ◽  
Zhe Zhang

By 29 May 2020, the coronavirus disease (COVID-19) caused by SARS-CoV-2 had spread to 188 countries, infecting more than 5.9 million people and causing 361,249 deaths. Governments issued travel restrictions, institutions cancelled gatherings, and citizens were ordered to socially distance themselves in an effort to limit the spread of the virus. Fear of being infected by the virus and panic over job losses and missed education opportunities have increased people’s stress levels. Psychological studies using traditional surveys are time-consuming and contain cognitive and sampling biases, and therefore cannot be used to build large datasets for a real-time depression analysis. In this article, we propose a CorExQ9 algorithm that integrates a Correlation Explanation (CorEx) learning algorithm and clinical Patient Health Questionnaire (PHQ) lexicon to detect COVID-19 related stress symptoms at a spatiotemporal scale in the United States. The proposed algorithm overcomes the common limitations of traditional topic detection models and minimizes the ambiguity that is caused by human interventions in social media data mining. The results show a strong correlation between stress symptoms and the number of increased COVID-19 cases for major U.S. cities such as Chicago, San Francisco, Seattle, New York, and Miami. The results also show that people’s risk perception is sensitive to the release of COVID-19 related public news and media messages. Between January and March, fear of infection and unpredictability of the virus caused widespread panic and people began stockpiling supplies, but later in April, concerns shifted as financial worries in western and eastern coastal areas of the U.S. left people uncertain of the long-term effects of COVID-19 on their lives.
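The lexicon step of the approach described above can be illustrated in miniature: each PHQ item maps to a keyword list, and a post is tagged with every item whose keywords it mentions. This is a much-simplified sketch in the spirit of the PHQ lexicon matching, not the authors' CorExQ9 code; the keyword lists are hypothetical.

```python
# Hypothetical, abbreviated PHQ-style lexicon: item name -> trigger words.
PHQ_LEXICON = {
    "sleep": {"insomnia", "sleepless", "awake"},
    "fatigue": {"tired", "exhausted", "drained"},
    "anhedonia": {"hopeless", "pointless", "numb"},
}

def tag_symptoms(post):
    """Return the set of PHQ items whose keywords appear in the post."""
    tokens = set(post.lower().split())
    return {item for item, words in PHQ_LEXICON.items() if tokens & words}
```

In the full algorithm these tags would feed the CorEx topic layer; here they simply show how a clinical lexicon anchors topics to symptoms.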


2020 ◽  
Vol 10 (1) ◽  
Author(s):  
Spencer A. Wood ◽  
Samantha G. Winder ◽  
Emilia H. Lia ◽  
Eric M. White ◽  
Christian S. L. Crowley ◽  
...  

Abstract Outdoor and nature-based recreation provides countless social benefits, yet public land managers often lack information on the spatial and temporal extent of recreation activities. Social media is a promising source of data to fill information gaps because the amount of recreational use is positively correlated with social media activity. However, despite the implication that these correlations could be employed to accurately estimate visitation, there are no known transferable models parameterized for use with multiple social media data sources. This study tackles these issues by examining the relative value of multiple sources of social media in models that estimate visitation at unmonitored sites and times across multiple destinations. Using a novel dataset of over 30,000 social media posts and 286,000 observed visits from two regions in the United States, we compare multiple competing statistical models for estimating visitation. We find social media data substantially improve visitor estimates at unmonitored sites, even when a model is parameterized with data from another region. Visitation estimates are further improved when models are parameterized with on-site counts. These findings indicate that while social media do not fully substitute for on-site data, they are a powerful component of recreation research and visitor management.
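The core modeling idea in this abstract, regressing observed visits on social media activity at monitored sites and transferring the fit to unmonitored ones, can be sketched with ordinary least squares. The numbers below are illustrative, not from the study's dataset.

```python
# Sketch: fit visits = a + b * posts on monitored sites, then estimate
# visitation at an unmonitored site from its post count alone.

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return my - b * mx, b

posts  = [10, 40, 80, 120]          # geotagged posts per monitored site
visits = [900, 3200, 6100, 9500]    # on-site visitor counts
a, b = fit_line(posts, visits)
estimate = a + b * 60               # unmonitored site with 60 posts
```

The abstract's finding that on-site counts further improve estimates corresponds to refitting `a` and `b` with whatever local counts are available.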


Author(s):  
F. O. Ostermann ◽  
H. Huang ◽  
G. Andrienko ◽  
N. Andrienko ◽  
C. Capineri ◽  
...  

Increasing availability of Geo-Social Media (e.g. Facebook, Foursquare and Flickr) has led to the accumulation of large volumes of social media data. These data, especially geotagged ones, contain information about perceptions of and experiences in various environments. Harnessing these data can provide a better understanding of the semantics of places. We are interested in the similarities and differences between different Geo-Social Media in their descriptions of places. This extended abstract presents the results of a first step towards a more in-depth study of the semantic similarity of places. In particular, we took places extracted through spatio-temporal clustering from one data source (Twitter) and examined whether their structure is reflected semantically in another dataset (Flickr). Based on that, we analyse how the semantic similarity between places varies over space and scale, and how Tobler's first law of geography holds with regard to scale and places.
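One common way to quantify the cross-source semantic similarity of a place is cosine similarity between term-frequency vectors built from the text each source attaches to it. The sketch below shows that measure under that assumption; the abstract does not specify the authors' exact metric, and the example terms are made up.

```python
import math
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a)   # Counter returns 0 for missing terms
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Hypothetical terms attached to the same clustered place in two sources.
twitter_terms = Counter("beach sunset surf beach waves".split())
flickr_terms  = Counter("beach surf photography sunset".split())
similarity = cosine(twitter_terms, flickr_terms)
```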


2021 ◽  
Vol 10 (1) ◽  
Author(s):  
Tarek Al Baghal ◽  
Alexander Wenz ◽  
Luke Sloan ◽  
Curtis Jessop

Abstract Linked social media and survey data have the potential to be a unique source of information for social research. While the potential usefulness of this methodology is widely acknowledged, very few studies have explored its methodological aspects. Respondents produce planned amounts of survey data, but highly variable amounts of social media data. This study explores this asymmetry by examining the amount of social media data available to link to surveys. The extent of variation in the amount of data collected from social media could affect the ability to derive meaningful linked indicators and could introduce possible biases. Linked Twitter data from respondents to two longitudinal surveys representative of Great Britain, the Innovation Panel and the NatCen Panel, show that there is indeed substantial variation in the number of tweets posted and the number of followers and friends respondents have. Multivariate analyses of both data sources show that only a few respondent characteristics have a statistically significant effect on the number of tweets posted, with the number of followers being the strongest predictor of posting in both panels, women posting less than men, and some evidence that people with higher education post less, but only in the Innovation Panel. We use sentiment analyses of tweets to provide an example of how the amount of Twitter data collected can impact outcomes using these linked data sources. Results show that the number of negatively coded tweets is related to general happiness, but the number of positive tweets is not. Taken together, the findings suggest that the amount of data collected from social media which can be linked to surveys is an important factor to consider, and they indicate the potential for such linked data sources in social research.
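A linked indicator of the kind discussed here can be derived by scoring each respondent's tweets against a sentiment lexicon and merging the counts onto the survey record by respondent id. The sketch below uses a tiny, hypothetical lexicon purely for illustration; it is not the sentiment analysis used in the study.

```python
# Hypothetical mini-lexicon standing in for a real sentiment dictionary.
POSITIVE = {"great", "happy", "love"}
NEGATIVE = {"awful", "sad", "angry"}

def sentiment_counts(tweets):
    """Per-respondent linked indicator: total, positive, and negative tweets."""
    pos = sum(1 for t in tweets if set(t.lower().split()) & POSITIVE)
    neg = sum(1 for t in tweets if set(t.lower().split()) & NEGATIVE)
    return {"n_tweets": len(tweets), "n_pos": pos, "n_neg": neg}

record = sentiment_counts(
    ["love this weather", "awful commute today", "meeting at noon"]
)
```

Because `n_tweets` varies so widely across respondents, analyses of `n_pos` and `n_neg` typically need to condition on it, which is exactly the asymmetry the abstract highlights.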


2019 ◽  
Vol 26 (4) ◽  
pp. 311-313 ◽  
Author(s):  
Sherry Pagoto ◽  
Camille Nebeker

Abstract Social media use has become ubiquitous in the United States, providing unprecedented opportunities for research. However, the rapidly evolving research landscape has far outpaced federal regulations for the protection of human subjects. Recent highly publicized scandals have raised legitimate concerns in the media about how social media data are being used. These circumstances, combined with the absence of ethical standards, put even the best-intentioned scientists at risk of possible research misconduct. The scientific community may need to lead the charge in ensuring the ethical use of social media data in scientific research. We propose six steps the scientific community can take to lead this charge. We underscore the important role of funding agencies and universities in creating the necessary ethics infrastructure to allow social media research to flourish in a way that is pro-technology, pro-science, and most importantly, pro-humanity.


2019 ◽  
pp. 089443931989330 ◽  
Author(s):  
Ashley Amaya ◽  
Ruben Bach ◽  
Florian Keusch ◽  
Frauke Kreuter

Social media are becoming more popular as a source of data for social science researchers. These data are plentiful and offer the potential to answer new research questions at smaller geographies and for rarer subpopulations. When deciding whether to use data from social media, it is useful to learn as much as possible about the data and its source. Social media data have properties quite different from those with which many social scientists are used to working, so the assumptions often used to plan and manage a project may no longer hold. For example, social media data are so large that they may not be able to be processed on a single machine; they are in file formats with which many researchers are unfamiliar, and they require a level of data transformation and processing that has rarely been required when using more traditional data sources (e.g., survey data). Unfortunately, this type of information is often not obvious ahead of time as much of this knowledge is gained through word-of-mouth and experience. In this article, we attempt to document several challenges and opportunities encountered when working with Reddit, the self-proclaimed “front page of the Internet” and popular social media site. Specifically, we provide descriptive information about the Reddit site and its users, tips for using organic data from Reddit for social science research, some ideas for conducting a survey on Reddit, and lessons learned in merging survey responses with Reddit posts. While this article is specific to Reddit, researchers may also view it as a list of the type of information one may seek to acquire prior to conducting a project that uses any type of social media data.
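One practical consequence of the "too large for a single machine" point above is that Reddit dumps, which are distributed as newline-delimited JSON, are best processed as a stream, keeping only running aggregates in memory. The sketch below assumes the `subreddit` field of the public dump schema; the sample records are made up.

```python
import json

def count_posts_by_subreddit(lines):
    """Stream over JSON lines, tallying posts per subreddit without
    ever holding the full file in memory."""
    counts = {}
    for line in lines:
        post = json.loads(line)
        sub = post.get("subreddit")
        if sub:
            counts[sub] = counts.get(sub, 0) + 1
    return counts

# In practice `lines` would be an open file handle over a dump file;
# here a small in-memory sample stands in for it.
sample = [
    '{"subreddit": "science", "id": "a1"}',
    '{"subreddit": "AskReddit", "id": "a2"}',
    '{"subreddit": "science", "id": "a3"}',
]
counts = count_posts_by_subreddit(sample)
```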


2021 ◽  
Vol 8 (1) ◽  
Author(s):  
Hans Christian ◽  
Derwin Suhartono ◽  
Andry Chowanda ◽  
Kamal Z. Zamli

Abstract The ever-increasing number of social media users has contributed to dramatic growth in the volume of online information. The content these users post on social media can often give valuable insights into their personalities (e.g., in terms of predicting job satisfaction, specific preferences, and the success of professional and romantic relationships) without the hassle of a formal personality test. Termed personality prediction, the process involves extracting features from the digital content and mapping them onto a personality model. Owing to its simplicity and proven capability, a well-known personality model, the Big Five personality traits, has often been adopted in the literature as the de facto standard for personality assessment. To date, many algorithms can extract contextualized word embeddings from textual data for personality prediction systems; some are based on ensemble models and deep learning. Although useful, existing algorithms such as RNNs and LSTMs suffer from two limitations. First, they take a long time to train owing to their sequential inputs. Second, they lack the ability to capture the true (semantic) meaning of words, so some context is lost. To address these limitations, this paper introduces a new prediction system using a multi-model deep learning architecture combined with multiple pre-trained language models, such as BERT, RoBERTa, and XLNet, as feature extraction methods on social media data sources. The system then makes its prediction by model averaging. Unlike earlier work, which adopts a single social media dataset with open- and closed-vocabulary extraction methods, the proposed work uses multiple social media data sources, namely Facebook and Twitter, and produces a predictive model for each trait using bidirectional context features combined with the extraction method. Our experience with the proposed work has been encouraging, as it has outperformed similar existing works in the literature. More precisely, our results achieve a maximum accuracy of 86.2% and an F1 score of 0.912 on the Facebook dataset, and 88.5% accuracy and an F1 score of 0.882 on the Twitter dataset.
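The model-averaging decision step described in this abstract reduces to averaging per-model class probabilities and thresholding the mean. The sketch below illustrates that step only; the probabilities are made-up numbers standing in for BERT, RoBERTa, and XLNet outputs, and the 0.5 threshold is an assumption.

```python
# Sketch of ensemble decision by model averaging: a trait is predicted
# "present" when the mean of the per-model probabilities crosses the
# threshold. Scores are illustrative, not model outputs.

def average_predict(model_probs, threshold=0.5):
    """Return (mean probability, boolean prediction) over model scores."""
    mean = sum(model_probs) / len(model_probs)
    return mean, mean >= threshold

probs_openness = [0.81, 0.74, 0.66]   # e.g. BERT, RoBERTa, XLNet scores
mean, present = average_predict(probs_openness)
```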

