Normalization Strategies for Enhancing Spatio-Temporal Analysis of Social Media Responses during Extreme Events: A Case Study based on Analysis of Four Extreme Events using Socio-Environmental Data Explorer (SEDE)

Author(s):  
J. Ajayakumar ◽  
E. Shook ◽  
V. K. Turner

With social media becoming increasingly location-based, there has been a greater push from researchers across various domains, including social science, public health, and disaster management, to tap into the spatial, temporal, and textual data available from these sources to analyze public response during extreme events such as an epidemic outbreak or a natural disaster. Studies based on demographics and other socio-economic factors suggest that social media data could be highly skewed owing to place-to-place variations in population density. To capture the spatio-temporal variations in public response during extreme events we have developed the Socio-Environmental Data Explorer (SEDE). SEDE collects and integrates social media, news, and environmental data to support exploration and assessment of public response to extreme events. For this study, using SEDE, we conduct spatio-temporal social media response analysis on four major extreme events in the United States: the “North American storm complex” in December 2015, “snowstorm Jonas” in January 2016, the “West Virginia floods” in June 2016, and “Hurricane Matthew” in October 2016. The analysis draws on geo-tagged social media data from Twitter and warnings from the storm events database provided by the National Centers for Environmental Information (NCEI). Results demonstrate that, to support complex social media analyses, spatial and population-based normalization and filtering are necessary. These results suggest that, when developing software solutions to support analysis of non-conventional data sources such as social media, it is essential to identify the inherent biases associated with the data sources and to adapt techniques and enhance capabilities to mitigate them. The normalization strategies that we have developed and incorporated into SEDE help reduce the population bias associated with social media data and will be useful for researchers and decision makers analyzing spatio-temporal social media responses during extreme events.
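The population-based normalization this abstract describes can be sketched as a simple per-capita rate. The sketch below is illustrative only, not the SEDE implementation; the region names, tweet counts, and populations are made up.

```python
# Sketch of population-based normalization: raw tweet counts are divided
# by population to give a rate per 100,000 residents, so that densely
# populated regions do not dominate the spatial signal. All figures are
# hypothetical.

def normalize_per_capita(tweet_counts, populations, per=100_000):
    """Return tweets per `per` residents for each region with known population."""
    return {
        region: tweet_counts[region] / populations[region] * per
        for region in tweet_counts
        if populations.get(region, 0) > 0
    }

raw = {"urban_county": 5000, "rural_county": 80}
pop = {"urban_county": 1_000_000, "rural_county": 10_000}
rates = normalize_per_capita(raw, pop)
# Per-capita rates reverse the raw ranking: relative to its population,
# the rural county responded more intensely than the urban one.
```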

2021 ◽  
Vol 8 (1) ◽  
Author(s):  
Yasmeen George ◽  
Shanika Karunasekera ◽  
Aaron Harwood ◽  
Kwan Hui Lim

Abstract A key challenge in mining social media data streams is to identify events that are actively discussed by a group of people in a specific local or global area. Such events are useful for early warning of accidents, protests, elections or breaking news. However, neither the list of events nor the resolution of event time and space is fixed or known beforehand. In this work, we propose an online spatio-temporal event detection system using social media that is able to detect events at different time and space resolutions. First, to address the unknown spatial resolution of events, a quad-tree method is exploited to split the geographical space into multiscale regions based on the density of social media data. Then, a statistical unsupervised approach involving a Poisson distribution and a smoothing method highlights regions with an unexpected density of social posts. Further, event duration is estimated precisely by merging events happening in the same region at consecutive time intervals. A post-processing stage filters out events that are spam, fake or wrong. Finally, we incorporate simple semantics by using social media entities to assess the integrity and accuracy of detected events. The proposed method is evaluated using different social media datasets (Twitter and Flickr) for four cities: Melbourne, London, Paris and New York. To verify its effectiveness, we compare our results with two baseline algorithms based on a fixed split of geographical space and a clustering method. For performance evaluation, we manually compute recall and precision. We also propose a new quality measure, named strength index, which automatically measures how accurate the reported event is.
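The two core steps described in this abstract, density-driven quad-tree splitting and Poisson-based surprise scoring, can be sketched as follows. This is an illustrative reconstruction under simplifying assumptions, not the authors' implementation; the point set is made up.

```python
import math

# Sketch: (1) a quad-tree that refines space where posts are dense,
# (2) a Poisson tail probability for flagging unexpectedly dense cells.

def quadtree(points, bbox, max_points=4, depth=0, max_depth=8):
    """Split bbox (x0, y0, x1, y1) into quadrants until each leaf holds
    at most `max_points` points; returns a list of (bbox, points) leaves."""
    if len(points) <= max_points or depth >= max_depth:
        return [(bbox, points)]
    x0, y0, x1, y1 = bbox
    xm, ym = (x0 + x1) / 2, (y0 + y1) / 2
    leaves = []
    for qb in ((x0, y0, xm, ym), (xm, y0, x1, ym),
               (x0, ym, xm, y1), (xm, ym, x1, y1)):
        inside = [p for p in points
                  if qb[0] <= p[0] < qb[2] and qb[1] <= p[1] < qb[3]]
        leaves.extend(quadtree(inside, qb, max_points, depth + 1, max_depth))
    return leaves

def poisson_tail(k, lam):
    """P(X >= k) for X ~ Poisson(lam); a small value marks a cell whose
    observed post count k is surprising for the baseline rate lam."""
    return 1.0 - sum(math.exp(-lam) * lam**i / math.factorial(i)
                     for i in range(k))

# Ten posts clustered in one corner plus one outlier: the tree refines
# the dense corner into small cells and leaves the rest coarse.
posts = [(0.1 + 0.01 * i, 0.1 + 0.01 * i) for i in range(10)] + [(0.9, 0.9)]
leaves = quadtree(posts, (0.0, 0.0, 1.0, 1.0))
```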


2020 ◽  
Vol 39 (3) ◽  
pp. 125-138
Author(s):  
Alina Zajadacz ◽  
Aleksandra Minkwitz

Abstract The purpose of the article is to present the concept of using social media (SM) as data sources and communication tools, useful at the various stages of planning, implementing and monitoring the effects of tourism development at the local level. The first part discusses the stages of planning, then presents the characteristics of SM, along with a discussion of the issues presented in the literature to date. The next part presents data sources and methods of research on SM and the functions they can perform in tourism. The concept presented reviews, on the one hand, the prospects for practical use of SM as a communication tool and source of data and, on the other, the challenges related to the need to further deepen research on tourism planning methods that are adequate to a continuously changing environment.


Author(s):  
Diya Li ◽  
Harshita Chaudhary ◽  
Zhe Zhang

By 29 May 2020, the coronavirus disease (COVID-19) caused by SARS-CoV-2 had spread to 188 countries, infecting more than 5.9 million people and causing 361,249 deaths. Governments issued travel restrictions, institutions cancelled gatherings, and citizens were ordered to socially distance themselves in an effort to limit the spread of the virus. Fear of being infected by the virus and panic over job losses and missed education opportunities have increased people’s stress levels. Psychological studies using traditional surveys are time-consuming and contain cognitive and sampling biases, and therefore cannot be used to build large datasets for a real-time depression analysis. In this article, we propose a CorExQ9 algorithm that integrates a Correlation Explanation (CorEx) learning algorithm and clinical Patient Health Questionnaire (PHQ) lexicon to detect COVID-19 related stress symptoms at a spatiotemporal scale in the United States. The proposed algorithm overcomes the common limitations of traditional topic detection models and minimizes the ambiguity that is caused by human interventions in social media data mining. The results show a strong correlation between stress symptoms and the number of increased COVID-19 cases for major U.S. cities such as Chicago, San Francisco, Seattle, New York, and Miami. The results also show that people’s risk perception is sensitive to the release of COVID-19 related public news and media messages. Between January and March, fear of infection and unpredictability of the virus caused widespread panic and people began stockpiling supplies, but later in April, concerns shifted as financial worries in western and eastern coastal areas of the U.S. left people uncertain of the long-term effects of COVID-19 on their lives.
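The lexicon step of the approach described above can be illustrated in miniature: each PHQ item maps to a keyword list, and a post is tagged with every item whose keywords it mentions. This is a much-simplified sketch in the spirit of the PHQ lexicon matching, not the authors' CorExQ9 code; the keyword lists are hypothetical.

```python
# Hypothetical, abbreviated PHQ-style lexicon: item name -> trigger words.
PHQ_LEXICON = {
    "sleep": {"insomnia", "sleepless", "awake"},
    "fatigue": {"tired", "exhausted", "drained"},
    "anhedonia": {"hopeless", "pointless", "numb"},
}

def tag_symptoms(post):
    """Return the set of PHQ items whose keywords appear in the post."""
    tokens = set(post.lower().split())
    return {item for item, words in PHQ_LEXICON.items() if tokens & words}
```

In the full algorithm these tags would feed the CorEx topic layer; here they simply show how a clinical lexicon anchors topics to symptoms.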


2020 ◽  
Vol 10 (1) ◽  
Author(s):  
Spencer A. Wood ◽  
Samantha G. Winder ◽  
Emilia H. Lia ◽  
Eric M. White ◽  
Christian S. L. Crowley ◽  
...  

Abstract Outdoor and nature-based recreation provides countless social benefits, yet public land managers often lack information on the spatial and temporal extent of recreation activities. Social media is a promising source of data to fill information gaps because the amount of recreational use is positively correlated with social media activity. However, despite the implication that these correlations could be employed to accurately estimate visitation, there are no known transferable models parameterized for use with multiple social media data sources. This study tackles these issues by examining the relative value of multiple sources of social media in models that estimate visitation at unmonitored sites and times across multiple destinations. Using a novel dataset of over 30,000 social media posts and 286,000 observed visits from two regions in the United States, we compare multiple competing statistical models for estimating visitation. We find social media data substantially improve visitor estimates at unmonitored sites, even when a model is parameterized with data from another region. Visitation estimates are further improved when models are parameterized with on-site counts. These findings indicate that while social media do not fully substitute for on-site data, they are a powerful component of recreation research and visitor management.
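The core modeling idea in this abstract, regressing observed visits on social media activity at monitored sites and transferring the fit to unmonitored ones, can be sketched with ordinary least squares. The numbers below are illustrative, not from the study's dataset.

```python
# Sketch: fit visits = a + b * posts on monitored sites, then estimate
# visitation at an unmonitored site from its post count alone.

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return my - b * mx, b

posts  = [10, 40, 80, 120]          # geotagged posts per monitored site
visits = [900, 3200, 6100, 9500]    # on-site visitor counts
a, b = fit_line(posts, visits)
estimate = a + b * 60               # unmonitored site with 60 posts
```

The abstract's finding that on-site counts further improve estimates corresponds to refitting `a` and `b` with whatever local counts are available.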


Author(s):  
F. O. Ostermann ◽  
H. Huang ◽  
G. Andrienko ◽  
N. Andrienko ◽  
C. Capineri ◽  
...  

Increasing availability of Geo-Social Media (e.g. Facebook, Foursquare and Flickr) has led to the accumulation of large volumes of social media data. These data, especially geotagged ones, contain information about perceptions of and experiences in various environments. Harnessing these data can provide a better understanding of the semantics of places. We are interested in the similarities and differences between different Geo-Social Media in their descriptions of places. This extended abstract presents the results of a first step towards a more in-depth study of the semantic similarity of places. In particular, we took places extracted through spatio-temporal clustering from one data source (Twitter) and examined whether their structure is reflected semantically in another dataset (Flickr). Based on that, we analyse how the semantic similarity between places varies over space and scale, and how Tobler's first law of geography holds with regard to scale and places.
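One common way to quantify the cross-source semantic similarity of a place is cosine similarity between term-frequency vectors built from the text each source attaches to it. The sketch below shows that measure under that assumption; the abstract does not specify the authors' exact metric, and the example terms are made up.

```python
import math
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a)   # Counter returns 0 for missing terms
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Hypothetical terms attached to the same clustered place in two sources.
twitter_terms = Counter("beach sunset surf beach waves".split())
flickr_terms  = Counter("beach surf photography sunset".split())
similarity = cosine(twitter_terms, flickr_terms)
```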


2021 ◽  
Vol 10 (1) ◽  
Author(s):  
Tarek Al Baghal ◽  
Alexander Wenz ◽  
Luke Sloan ◽  
Curtis Jessop

Abstract Linked social media and survey data have the potential to be a unique source of information for social research. While the potential usefulness of this methodology is widely acknowledged, very few studies have explored its methodological aspects. Respondents produce planned amounts of survey data, but highly variable amounts of social media data. This study explores this asymmetry by examining the amount of social media data available to link to surveys. The extent of variation in the amount of data collected from social media could affect the ability to derive meaningful linked indicators and could introduce possible biases. Linked Twitter data from respondents to two longitudinal surveys representative of Great Britain, the Innovation Panel and the NatCen Panel, show that there is indeed substantial variation in the number of tweets posted and the number of followers and friends respondents have. Multivariate analyses of both data sources show that only a few respondent characteristics have a statistically significant effect on the number of tweets posted, with the number of followers being the strongest predictor of posting in both panels, women posting less than men, and some evidence that people with higher education post less, but only in the Innovation Panel. We use sentiment analyses of tweets to provide an example of how the amount of Twitter data collected can impact outcomes using these linked data sources. Results show that the number of negatively coded tweets is related to general happiness, but the number of positive tweets is not. Taken together, the findings suggest that the amount of data collected from social media which can be linked to surveys is an important factor to consider, and they indicate the potential for such linked data sources in social research.
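A linked indicator of the kind discussed here can be derived by scoring each respondent's tweets against a sentiment lexicon and merging the counts onto the survey record by respondent id. The sketch below uses a tiny, hypothetical lexicon purely for illustration; it is not the sentiment analysis used in the study.

```python
# Hypothetical mini-lexicon standing in for a real sentiment dictionary.
POSITIVE = {"great", "happy", "love"}
NEGATIVE = {"awful", "sad", "angry"}

def sentiment_counts(tweets):
    """Per-respondent linked indicator: total, positive, and negative tweets."""
    pos = sum(1 for t in tweets if set(t.lower().split()) & POSITIVE)
    neg = sum(1 for t in tweets if set(t.lower().split()) & NEGATIVE)
    return {"n_tweets": len(tweets), "n_pos": pos, "n_neg": neg}

record = sentiment_counts(
    ["love this weather", "awful commute today", "meeting at noon"]
)
```

Because `n_tweets` varies so widely across respondents, analyses of `n_pos` and `n_neg` typically need to condition on it, which is exactly the asymmetry the abstract highlights.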


2019 ◽  
Vol 26 (4) ◽  
pp. 311-313 ◽  
Author(s):  
Sherry Pagoto ◽  
Camille Nebeker

Abstract Social media use has become ubiquitous in the United States, providing unprecedented opportunities for research. However, the rapidly evolving research landscape has far outpaced federal regulations for the protection of human subjects. Recent highly publicized scandals have raised legitimate concerns in the media about how social media data are being used. These circumstances, combined with the absence of ethical standards, put even the best-intentioned scientists at risk of possible research misconduct. The scientific community may need to lead the charge in ensuring the ethical use of social media data in scientific research. We propose six steps the scientific community can take to lead this charge. We underscore the important role of funding agencies and universities in creating the necessary ethics infrastructure to allow social media research to flourish in a way that is pro-technology, pro-science, and most importantly, pro-humanity.


2019 ◽  
pp. 089443931989330 ◽  
Author(s):  
Ashley Amaya ◽  
Ruben Bach ◽  
Florian Keusch ◽  
Frauke Kreuter

Social media are becoming more popular as a source of data for social science researchers. These data are plentiful and offer the potential to answer new research questions at smaller geographies and for rarer subpopulations. When deciding whether to use data from social media, it is useful to learn as much as possible about the data and its source. Social media data have properties quite different from those with which many social scientists are used to working, so the assumptions often used to plan and manage a project may no longer hold. For example, social media data are so large that they may not be able to be processed on a single machine; they are in file formats with which many researchers are unfamiliar, and they require a level of data transformation and processing that has rarely been required when using more traditional data sources (e.g., survey data). Unfortunately, this type of information is often not obvious ahead of time as much of this knowledge is gained through word-of-mouth and experience. In this article, we attempt to document several challenges and opportunities encountered when working with Reddit, the self-proclaimed “front page of the Internet” and popular social media site. Specifically, we provide descriptive information about the Reddit site and its users, tips for using organic data from Reddit for social science research, some ideas for conducting a survey on Reddit, and lessons learned in merging survey responses with Reddit posts. While this article is specific to Reddit, researchers may also view it as a list of the type of information one may seek to acquire prior to conducting a project that uses any type of social media data.
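One practical consequence of the "too large for a single machine" point above is that Reddit dumps, which are distributed as newline-delimited JSON, are best processed as a stream, keeping only running aggregates in memory. The sketch below assumes the `subreddit` field of the public dump schema; the sample records are made up.

```python
import json

def count_posts_by_subreddit(lines):
    """Stream over JSON lines, tallying posts per subreddit without
    ever holding the full file in memory."""
    counts = {}
    for line in lines:
        post = json.loads(line)
        sub = post.get("subreddit")
        if sub:
            counts[sub] = counts.get(sub, 0) + 1
    return counts

# In practice `lines` would be an open file handle over a dump file;
# here a small in-memory sample stands in for it.
sample = [
    '{"subreddit": "science", "id": "a1"}',
    '{"subreddit": "AskReddit", "id": "a2"}',
    '{"subreddit": "science", "id": "a3"}',
]
counts = count_posts_by_subreddit(sample)
```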


2021 ◽  
Vol 8 (1) ◽  
Author(s):  
Hans Christian ◽  
Derwin Suhartono ◽  
Andry Chowanda ◽  
Kamal Z. Zamli

Abstract The ever-increasing number of social media users has contributed to dramatic growth in the volume of online information. The content these users post on social media can often give valuable insights into their personalities (e.g., in terms of predicting job satisfaction, specific preferences, and the success of professional and romantic relationships) without the hassle of a formal personality test. Termed personality prediction, the process involves extracting features from the digital content and mapping them onto a personality model. Owing to its simplicity and proven capability, a well-known personality model, the Big Five personality traits, has often been adopted in the literature as the de facto standard for personality assessment. To date, many algorithms can extract contextualized word embeddings from textual data for personality prediction systems; some are based on ensemble models and deep learning. Although useful, existing algorithms such as RNNs and LSTMs suffer from two limitations. First, they take a long time to train owing to their sequential inputs. Second, they lack the ability to capture the true (semantic) meaning of words, so some context is lost. To address these limitations, this paper introduces a new prediction system using a multi-model deep learning architecture combined with multiple pre-trained language models, such as BERT, RoBERTa, and XLNet, as feature extraction methods on social media data sources. The system then makes its prediction by model averaging. Unlike earlier work, which adopts a single social media dataset with open- and closed-vocabulary extraction methods, the proposed work uses multiple social media data sources, namely Facebook and Twitter, and produces a predictive model for each trait using bidirectional context features combined with the extraction method. Our experience with the proposed work has been encouraging, as it has outperformed similar existing works in the literature. More precisely, our results achieve a maximum accuracy of 86.2% and an F1 score of 0.912 on the Facebook dataset, and 88.5% accuracy and an F1 score of 0.882 on the Twitter dataset.
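The model-averaging decision step described in this abstract reduces to averaging per-model class probabilities and thresholding the mean. The sketch below illustrates that step only; the probabilities are made-up numbers standing in for BERT, RoBERTa, and XLNet outputs, and the 0.5 threshold is an assumption.

```python
# Sketch of ensemble decision by model averaging: a trait is predicted
# "present" when the mean of the per-model probabilities crosses the
# threshold. Scores are illustrative, not model outputs.

def average_predict(model_probs, threshold=0.5):
    """Return (mean probability, boolean prediction) over model scores."""
    mean = sum(model_probs) / len(model_probs)
    return mean, mean >= threshold

probs_openness = [0.81, 0.74, 0.66]   # e.g. BERT, RoBERTa, XLNet scores
mean, present = average_predict(probs_openness)
```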

