Perceiving Residents’ Festival Activities Based on Social Media Data: A Case Study in Beijing, China

Bingqing Wang; Bin Meng; Juan Wang; Siyu Chen; Jian Liu

doi:10.3390/ijgi10070474

ISPRS International Journal of Geo-Information ◽

10.3390/ijgi10070474 ◽

2021 ◽

Vol 10 (7) ◽

pp. 474

Author(s):

Bingqing Wang ◽

Bin Meng ◽

Juan Wang ◽

Siyu Chen ◽

Jian Liu

Keyword(s):

Social Media ◽

Language Processing ◽

Topic Model ◽

Central Area ◽

Classification Model ◽

Social Media Data ◽

Ring Road ◽

Different Types ◽

Spatial Differences ◽

Media Data

Social media data contains information expressed in real time, including text and geographical location. As a new data source for crowd behavior research in the era of big data, it can reflect some aspects of residents' behavior. In this study, a text classification model based on BERT and the Transformers framework was constructed and used to classify and extract more than 210,000 posts on residents’ festival activities from 1.13 million Sina Weibo (the Chinese counterpart of Twitter) posts collected in Beijing in 2019. On this basis, word frequency statistics, part-of-speech analysis, topic modeling, sentiment analysis, and other methods were used to perceive different types of festival activities and to quantitatively analyze the spatial differences among different types of festivals. The results show that traditional culture significantly influences residents’ festivals, reflecting residents’ motivation to participate in festivals, how they participate, and how they express their emotions. There are apparent spatial differences in residents' participation in festival activities. The main festival activities are distributed in the central area within the Fifth Ring Road in Beijing. In contrast, posts expressing feelings during festivals are mainly distributed outside the Fifth Ring Road. The research integrates natural language processing, topic model analysis, spatial statistical analysis, and other techniques. It also broadens the application of social media data, especially text data, providing a new research paradigm for studying residents’ festival activities and adding residents’ perception of the festival. The research results provide a basis for the design and management of the Chinese festival system.


A Pipeline to Understand Emerging Illness Via Social Media Data Analysis: Case Study on Breast Implant Illness (Preprint)

10.2196/preprints.29768 ◽

2021 ◽

Author(s):

Vishal Dey ◽

Peter Krasniak ◽

Minh Nguyen ◽

Clara Lee ◽

Xia Ning

Keyword(s):

Mental Health ◽

Social Media ◽

Natural Language Processing ◽

Data Analysis ◽

Natural Language ◽

Language Processing ◽

Breast Implant ◽

Public Attention ◽

Social Media Data ◽

Media Data

BACKGROUND: A new illness can come to public attention through social media before it is medically defined, formally documented, or systematically studied. One example is a condition known as breast implant illness (BII), which has been extensively discussed on social media, although it is vaguely defined in the medical literature. OBJECTIVE: The objective of this study is to construct a data analysis pipeline to understand emerging illnesses using social media data and to apply the pipeline to understand the key attributes of BII. METHODS: We constructed a pipeline of social media data analysis using natural language processing and topic modeling. Mentions related to signs, symptoms, diseases, disorders, and medical procedures were extracted from social media data using the clinical Text Analysis and Knowledge Extraction System. We mapped the mentions to standard medical concepts and then summarized these mapped concepts as topics using latent Dirichlet allocation. Finally, we applied this pipeline to understand BII from several BII-dedicated social media sites. RESULTS: Our pipeline identified topics related to toxicity, cancer, and mental health issues that were highly associated with BII. Our pipeline also showed that cancers, autoimmune disorders, and mental health problems were emerging concerns associated with breast implants, based on social media discussions. Furthermore, the pipeline identified mentions such as rupture, infection, pain, and fatigue as common self-reported issues among the public, as well as concerns about toxicity from silicone implants. CONCLUSIONS: Our study could inspire future studies on the suggested symptoms and factors of BII. Our study provides the first analysis and derived knowledge of BII from social media using natural language processing techniques and demonstrates the potential of using social media information to better understand similar emerging illnesses.


Using social media data to map the areas most affected by ISIS in Syria

Proceedings of the International conference “InterCarto/InterGIS” ◽

10.35595/2414-9179-2020-1-26-464-470 ◽

2020 ◽

Vol 26 (1) ◽

pp. 464-470

Author(s):

Mohamad Hasan

Keyword(s):

Social Media ◽

Language Processing ◽

Geographical Information ◽

Data Mapping ◽

Islamic State ◽

Social Media Data ◽

The Social ◽

Mapping Process ◽

Data Source ◽

Media Data

This paper presents a model to collect, save, geocode, and analyze social media data. The model is used to collect and process social media data concerning the ISIS terrorist group (the Islamic State in Iraq and Syria) and to map the areas in Syria most affected by ISIS according to that data. The mapping process is the automated compilation of a density map of the geocoded tweets. Data mined from social media (e.g., Twitter and Facebook) is recognized as a dynamic and easily accessible resource that can serve as a data source for spatial analysis and geographical information systems. Social media data can be represented as topic data and geocoding data based on the text mined from social media and processed using Natural Language Processing (NLP) methods. NLP is a subdomain of artificial intelligence concerned with programming computers to analyze natural human language and texts. NLP makes it possible to identify the words that the developed geocoding algorithm uses as its initial data. In this study, the needed words were identified using two corpora. The first corpus contained the names of populated places in Syria. The second corpus was compiled from a statistical analysis of the tweets by picking the words that carry a location meaning (e.g., schools, temples). After identifying the words, the algorithm used the Google Maps geocoding API to obtain the coordinates for the posts.
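The place-name matching and density counting described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the author's implementation: the `GAZETTEER` table, its approximate coordinates, the sample posts, and the function names are all assumptions made for the example, and a local lookup table stands in for the Google Maps geocoding API that the paper actually used.

```python
import re
from collections import Counter

# Hypothetical gazetteer: place name -> (latitude, longitude).
# Coordinates are approximate and for illustration only; the paper
# obtained coordinates from the Google Maps geocoding API instead.
GAZETTEER = {
    "aleppo": (36.20, 37.16),
    "raqqa": (35.95, 39.01),
    "palmyra": (34.56, 38.28),
}


def find_places(text, gazetteer):
    """Return the gazetteer names that occur as whole words in text."""
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    return sorted(name for name in gazetteer if name in tokens)


def density_by_place(posts, gazetteer):
    """Count how many posts mention each known place name."""
    counts = Counter()
    for post in posts:
        counts.update(find_places(post, gazetteer))
    return counts


# Made-up example posts, not data from the study.
posts = [
    "Clashes reported near Raqqa this morning",
    "Aid convoy reaches Aleppo; more expected in Raqqa",
    "No location in this post",
]
counts = density_by_place(posts, GAZETTEER)
for name, n in counts.most_common():
    print(name, GAZETTEER[name], n)
```

The per-place counts are the raw input a density map would be drawn from; single-word matching is a simplification, since multi-word place names would need phrase matching.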


Anatomy of a Protest: Spatial Information, Social Media, and Urban Space

Social Media + Society ◽

10.1177/2056305119897320 ◽

2020 ◽

Vol 6 (1) ◽

pp. 205630511989732

Author(s):

Alireza Karduni ◽

Eric Sauda

Keyword(s):

Social Media ◽

Public Space ◽

Urban Space ◽

Language Processing ◽

Local Community ◽

Spatial Information ◽

Social Media Data ◽

Public Events ◽

Use Of Social Media ◽

Media Data

Black Lives Matter, like many modern movements in the age of information, makes significant use of social media as well as public space to demand justice. In this article, we study the protests in response to the shooting of Keith Lamont Scott by police in Charlotte, North Carolina, in September 2016. Our goal is to measure the significance of urban space within the virtual and physical network of protesters. Using a mixed-methods approach, we identify and study urban space and social media generated by these protests. We conducted interviews with protesters who were among the first to join the Keith Lamont Scott shooting demonstrations. From the interviews, we identify places that were significant in our interviewees’ narratives. Using a combination of natural language processing and social network analysis, we analyze social media data related to the Charlotte protests retrieved from Twitter. We found that social media, local community, and public space work together to organize and motivate protests and that public events such as protests cause a discernible increase in social media activity. Finally, we find that there are two distinct communities that engage with social media in different ways: one group involved with social media, local community, and urban space, and a second group connected almost exclusively through social media.


Citizens’ Spatial Footprint on Twitter—Anomaly, Trend and Bias Investigation in Istanbul

ISPRS International Journal of Geo-Information ◽

10.3390/ijgi9040222 ◽

2020 ◽

Vol 9 (4) ◽

pp. 222 ◽

Cited By ~ 2

Author(s):

Ayse Giz Gulnerman ◽

Himmet Karaman ◽

Direnc Pekaslan ◽

Serdar Bilgi

Keyword(s):

Social Media ◽

Spatial Data ◽

Central Area ◽

Research Area ◽

Time Of Day ◽

Activity Levels ◽

Social Media Data ◽

Time Space ◽

The City ◽

Media Data

Social media (SM) can be an invaluable resource for understanding and managing the effects of catastrophic disasters. In order to use SM platforms for public participatory (PP) mapping of emergency management activities, a bias investigation should be undertaken on the data for the study area (urban, regional, national, etc.) to determine the spatial data dynamics. Such an investigation shows how SM can be used and interpreted in terms of PP. In this study, the city of Istanbul was chosen as the research area for social media data, as it is one of the most crowded cities in the world and is expecting a major earthquake. The methodology for the data investigation is: (1) obtain data and engage sampling; (2) identify the representation and temporal biases in the data and normalize it in response to representation bias; (3) identify general anomalies and spatial anomalies; (4) manipulate the trend of the dataset with the discretization of anomalies; and (5) examine the spatiotemporal bias. Using this bias investigation methodology, citizen footprint dynamics in the city were determined and reference maps (most likely regional anomaly maps, representation maps, time-space bias maps, etc.) were produced. The outcomes of the study can be summarized in four points. First, highly active users generate the majority of the data, and removing this data as a general approach within a pseudo-cleaning process means concealing a large amount of data. Second, data normalization in terms of activity levels changes the anomaly outcome resulting from diverse representation levels of users. Third, spatiotemporally normalized data present a strong spatial anomaly tendency in some parts of the central area. Fourth, trend data is dense in the central area, and the spatiotemporal bias assessments show that the data density varies by time of day, day of week, and season of the year. The methodology proposed in this study can be used to extract the unbiased daily routines of a region from its social media data on normal days, which can then serve as a reference for detecting changes or impacts during emergencies or unexpected events.
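One way to picture the activity-level normalization mentioned above is to weight each post by the inverse of its author's total post count, so that a handful of highly active users cannot dominate a grid cell. The sketch below is only an assumption about how such a normalization might look; the abstract does not specify the authors' formula, and the function names, cell labels, and sample data are hypothetical.

```python
from collections import Counter, defaultdict


def raw_counts(posts):
    """Posts per grid cell; posts is a list of (user_id, cell_id) pairs."""
    return Counter(cell for _, cell in posts)


def user_normalized_counts(posts):
    """Weight each post by 1 / (its author's total number of posts), so
    every user contributes a total weight of 1 however active they are.
    This weighting scheme is an illustrative assumption."""
    per_user = Counter(user for user, _ in posts)
    weighted = defaultdict(float)
    for user, cell in posts:
        weighted[cell] += 1.0 / per_user[user]
    return dict(weighted)


# Made-up data: one very active user in cell A, two casual users in cell B.
posts = [("u1", "A")] * 8 + [("u2", "B"), ("u3", "B")]

raw = raw_counts(posts)
norm = user_normalized_counts(posts)
print(dict(raw))   # cell A dominates on raw volume
print(norm)        # cell B leads once each user counts equally
```

Unlike dropping the highly active users outright, this keeps all of their posts in the dataset while limiting their influence, which is in line with the abstract's caution that removing such users conceals a large amount of data.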


Artificial Immune Systems-Based Classification Model for Code-Mixed Social Media Data

IRBM ◽

10.1016/j.irbm.2020.07.004 ◽

2020 ◽

Author(s):

S. Shekhar ◽

D.K. Sharma ◽

D.K. Agarwal ◽

Y. Pathak

Keyword(s):

Social Media ◽

Artificial Immune Systems ◽

Classification Model ◽

Artificial Immune ◽

Social Media Data ◽

Immune Systems ◽

Media Data


Mining social media for prescription medication abuse monitoring: a review and proposal for a data-centric framework

Journal of the American Medical Informatics Association ◽

10.1093/jamia/ocz162 ◽

2019 ◽

Vol 27 (2) ◽

pp. 315-329 ◽

Cited By ~ 6

Author(s):

Abeed Sarker ◽

Annika DeRoos ◽

Jeanmarie Perrone

Keyword(s):

Social Media ◽

Language Processing ◽

Prescription Medication ◽

Inclusion Criteria ◽

Major Health Problem ◽

Social Media Data ◽

Use Of Data ◽

Media Source ◽

Multiple Characteristics ◽

Media Data

Abstract Objective Prescription medication (PM) misuse and abuse is a major health problem globally, and a number of recent studies have focused on exploring social media as a resource for monitoring nonmedical PM use. Our objectives are to present a methodological review of social media–based PM abuse or misuse monitoring studies, and to propose a potential generalizable, data-centric processing pipeline for the curation of data from this resource. Materials and Methods We identified studies involving social media, PMs, and misuse or abuse (inclusion criteria) from Medline, Embase, Scopus, Web of Science, and Google Scholar. We categorized studies based on multiple characteristics including but not limited to data size; social media source(s); medications studied; and primary objectives, methods, and findings. Results A total of 39 studies met our inclusion criteria, with 31 (∼79.5%) published since 2015. Twitter has been the most popular resource, with Reddit and Instagram gaining popularity recently. Early studies focused mostly on manual, qualitative analyses, with a growing trend toward the use of data-centric methods involving natural language processing and machine learning. Discussion There is a paucity of standardized, data-centric frameworks for curating social media data for task-specific analyses and near real-time surveillance of nonmedical PM use. Many existing studies do not quantify human agreements for manual annotation tasks or take into account the presence of noise in data. Conclusion The development of reproducible and standardized data-centric frameworks that build on the current state-of-the-art methods in data and text mining may enable effective utilization of social media data for understanding and monitoring nonmedical PM use.


Comparison of Social Media, Syndromic Surveillance, and Microbiologic Acute Respiratory Infection Data: Observational Study

JMIR Public Health and Surveillance ◽

10.2196/14986 ◽

2020 ◽

Vol 6 (2) ◽

pp. e14986 ◽

Cited By ~ 2

Author(s):

Ashlynn R Daughton ◽

Rumi Chunara ◽

Michael J Paul

Keyword(s):

Infectious Disease ◽

Social Media ◽

Random Sample ◽

Topic Model ◽

Ground Truth ◽

Ground Truth Data ◽

Social Media Data ◽

Individual Level ◽

Small Effect Size ◽

Media Data

Background Internet data can be used to improve infectious disease models. However, the representativeness and individual-level validity of internet-derived measures are largely unexplored as this requires ground truth data for study. Objective This study sought to identify relationships between Web-based behaviors and/or conversation topics and health status using a ground truth, survey-based dataset. Methods This study leveraged a unique dataset of self-reported surveys, microbiological laboratory tests, and social media data from the same individuals toward understanding the validity of individual-level constructs pertaining to influenza-like illness in social media data. Logistic regression models were used to identify illness in Twitter posts using user posting behaviors and topic model features extracted from users’ tweets. Results Of 396 original study participants, only 81 met the inclusion criteria for this study. Of these participants’ tweets, we identified only two instances that were related to health and occurred within 2 weeks (before or after) of a survey indicating symptoms. It was not possible to predict when participants reported symptoms using features derived from topic models (area under the curve [AUC]=0.51; P=.38), though it was possible using behavior features, albeit with a very small effect size (AUC=0.53; P≤.001). Individual symptoms were also generally not predictable either. The study sample and a random sample from Twitter are predictably different on held-out data (AUC=0.67; P≤.001), meaning that the content posted by people who participated in this study was predictably different from that posted by random Twitter users. Individuals in the random sample and the GoViral sample used Twitter with similar frequencies (similar @ mentions, number of tweets, and number of retweets; AUC=0.50; P=.19). 
Conclusions To our knowledge, this is the first instance of an attempt to use a ground truth dataset to validate infectious disease observations in social media data. The lack of signal, the lack of predictability among behaviors or topics, and the demonstrated volunteer bias in the study population are important findings for the large and growing body of disease surveillance using internet-sourced data.
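To help read the AUC values reported in this abstract, the following sketch computes the area under the ROC curve directly from its rank interpretation: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one. The function name and the toy scores are assumptions for illustration; this is not the study's code or data.

```python
def auc(scores, labels):
    """Probability that a random positive outscores a random negative
    (ties count one half); equal to the area under the ROC curve."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


labels = [1, 1, 0, 0]
perfect = auc([0.9, 0.8, 0.2, 0.1], labels)   # every positive ranked first
chance = auc([0.5, 0.5, 0.5, 0.5], labels)    # scores carry no information
print(perfect, chance)
```

Under this reading, an AUC near 0.5 means the model ranks cases no better than chance, which is why the reported values of 0.51 and 0.53 indicate little or no usable signal.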


Identifying Different Types of Social Ties in Events from Publicly Available Social Media Data

Proceedings of the 11th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management ◽

10.5220/0008065501760186 ◽

2019 ◽

Author(s):

Jayesh Gupta ◽

Hannu Kärkkäinen ◽

Karan Menon ◽

Jukka Huhtamäki ◽

Raghava Mukkamala ◽

...

Keyword(s):

Social Media ◽

Social Ties ◽

Social Media Data ◽

Different Types ◽

Media Data


Mining Consumer Brand Relationship from Social Media Data: A Natural Language Processing Approach

Lecture Notes in Computer Science - Artificial Intelligence and Security ◽

10.1007/978-3-030-78609-0_47 ◽

2021 ◽

pp. 553-565

Author(s):

Di Shang ◽

Zhenda Hu ◽

Zhaoxia Wang

Keyword(s):

Social Media ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Social Media Data ◽

Brand Relationship ◽

Processing Approach ◽

Media Data


Sentimental analysis on social media data using R programming

International Journal of Engineering & Technology ◽

10.14419/ijet.v7i2.31.13402 ◽

2018 ◽

Vol 7 (2.31) ◽

pp. 80 ◽

Cited By ~ 1

Author(s):

Mandava Geetha Bhargava ◽

Duvvada Rajeswara Rao

Keyword(s):

Social Media ◽

Research Field ◽

Virtual Currency ◽

Ongoing Research ◽

Social Media Data ◽

Digital Currency ◽

Alternative Currency ◽

Different Types ◽

R Programming ◽

Media Data

Sentiment analysis is an ongoing research field within text mining that aims to determine the market's view of a particular entity, such as a product or service; it can be described as the computational treatment of reviews, subjectivity, and sentiment in text. A cryptocurrency is a type of digital asset designed to work as a medium of trade and exchange that uses cryptography to secure transactions and operates under decentralized control, as opposed to centralized transaction systems. Cryptocurrencies are a type of virtual, digital, and alternative currency. Depending on the category, cryptocurrencies use different architectures and security protocols to secure transactions, and different types are available on the market, such as Bitcoin, Litecoin, and Namecoin. This paper surveys different types of sentiment analysis methods. Its main contribution is a sentiment analysis of social media data on different types of cryptocurrencies, by category and by related terms such as cryptocurrency, virtual currency, and digital currency, together with a discussion of cryptocurrency trends in the present market.
