A spatial regression and clustering method for developing place-specific social vulnerability indices using census and social media data

Abstract. Individual travel trajectories denote a series of places people visit over time. These places (e.g., home, workplace, and park) reflect people’s corresponding activities (e.g., dwelling, work, and entertainment), which can be regarded as semantic knowledge that is often implicit in raw data (Yan et al. 2013, Cai et al. 2016). Traditional survey data directly describe people’s activities at certain places, but collecting them costs tremendous labor and resources (Huang and Wong 2016). GPS data such as taxi logs record exact origin-destination pairs as well as people’s stay times along the way, from which semantics can be readily inferred when combined with geographical context data (Yan et al. 2013). Research using these dense data has sought to understand the activity sequences indicated by either individual or collective spatiotemporal (ST) travel trajectories. Different models have been proposed for trajectory mining and activity inference, including location categorization and frequent region detection (Njoo et al. 2015). A typical method for matching a location or region with a known activity type is to detect stay points and stay intervals in trajectories and then find the geographical context of these stays (Furtado et al. 2013, Njoo et al. 2015, Beber et al. 2016, Beber et al. 2017). However, limited progress has been made in mining the semantics of trajectory data collected from social media platforms. Specifically, detecting stay points and their intervals from online trajectories can be inaccurate because of data sparsity. Huang et al. (2014) define the notion of an activity zone to detect activity types from digital footprints. In this method, individual travel trajectories are first aggregated using a spatial clustering method such as density-based spatial clustering of applications with noise (DBSCAN). The resulting clusters are then classified based on a regional land use map and the Google Places application programming interface (API).
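A minimal sketch of stay-point detection follows. It uses one common distance/time-threshold formulation, not the exact procedure of any work cited above. It assumes each fix is a `(timestamp_seconds, x, y)` tuple whose coordinates are already projected to meters. The parameters `dist_thresh` and `time_thresh` are illustrative. Geo-tagged tweets are often hours apart, so a run of fixes may rarely satisfy both thresholds. This illustrates the sparsity problem noted above.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

Fix = Tuple[float, float, float]  # (t_seconds, x_m, y_m); planar coordinates assumed


@dataclass
class StayPoint:
    x: float       # mean x of the fixes in the stay
    y: float       # mean y of the fixes in the stay
    arrive: float  # timestamp of the first fix in the stay
    leave: float   # timestamp of the last fix in the stay


def detect_stay_points(fixes: List[Fix], dist_thresh: float,
                       time_thresh: float) -> List[StayPoint]:
    """Distance/time-threshold stay-point detection (illustrative formulation).

    A stay is a maximal run of consecutive fixes that all lie within
    dist_thresh of the run's first fix and that spans at least time_thresh.
    """
    fixes = sorted(fixes)  # chronological order (tuples sort by timestamp first)
    stays: List[StayPoint] = []
    i, n = 0, len(fixes)
    while i < n:
        t_i, x_i, y_i = fixes[i]
        j = i + 1
        # Extend the run while fixes stay close to the anchor fix i.
        while j < n and math.hypot(fixes[j][1] - x_i, fixes[j][2] - y_i) <= dist_thresh:
            j += 1
        if fixes[j - 1][0] - t_i >= time_thresh:
            run = fixes[i:j]
            stays.append(StayPoint(
                x=sum(f[1] for f in run) / len(run),
                y=sum(f[2] for f in run) / len(run),
                arrive=t_i,
                leave=fixes[j - 1][0],
            ))
            i = j  # continue after the detected stay
        else:
            i += 1  # run too short in time: slide the anchor forward
    return stays
```

The stay's location is the mean of its fixes, and its interval runs from the first fix to the last fix. These two outputs are what a context lookup (land use, points of interest) would then classify.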
Such land use data are published only by specific sources, such as the state cartography office’s website at the University of Wisconsin-Madison, so researchers must search for them separately for each study area. Moreover, while major land use maps are available for large areas such as the entire United States, detailed statewide or citywide land use data are produced under diverse standards, which adds extra work when classifying activity zones consistently. In addition, the Google Places API is a service that Google offers to developers: given a place location (e.g., an address or GPS coordinates) in a search request, it returns information about that place. However, an API key must be generated before the interface can be used, and each user can make only a limited number of free requests per day (i.e., 1,000 requests per 24-hour period). In sum, previous methods for detecting activity zone types from social media data are insufficient and can hardly achieve effective data fusion. Compared with the high cost of using officially published datasets, emerging Volunteered Geographic Information (VGI) data offer an alternative for inferring the types of activities an individual performs in each zone (i.e., cluster). Using geo-tagged tweets as an example, this research proposes a framework for mining social media data, detecting individual semantic travel trajectories, and constructing individual representative daily travel trajectory paths by fusing these data with VGI data, specifically OpenStreetMap (OSM) datasets. First, inactive users and abnormal users (e.g., a company account shared by many employees) are removed through data pre-processing (Step 1 in Figure 1). Next, a multi-scale spatial clustering method is developed to aggregate the online trajectories captured through the geo-tagged tweets of a group of users into collective spatial hot spots (i.e., activity zones; Step 2).
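This abstract does not specify the multi-scale clustering used in Step 2. For reference, the plain-Python sketch below shows the single-scale DBSCAN that Huang et al. (2014) use and that serves as a baseline in the experiments. It assumes tweet locations are projected to planar coordinates in meters. The parameters `eps` (neighborhood radius) and `min_pts` (minimum neighborhood size for a core point, counting the point itself) are illustrative. The naive O(n²) neighbor search is for readability only. scikit-learn's `sklearn.cluster.DBSCAN` provides an indexed implementation.

```python
import math
from typing import List, Optional, Tuple

Point = Tuple[float, float]  # (x, y) in projected meters (assumed)

NOISE = -1


def region_query(points: List[Point], i: int, eps: float) -> List[int]:
    """Indices of all points within eps of points[i], including i itself."""
    xi, yi = points[i]
    return [j for j, (xj, yj) in enumerate(points)
            if math.hypot(xi - xj, yi - yj) <= eps]


def dbscan(points: List[Point], eps: float, min_pts: int) -> List[int]:
    """Label each point with a cluster id (0, 1, ...) or NOISE (-1)."""
    labels: List[Optional[int]] = [None] * len(points)  # None = unvisited
    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may be relabeled later as a border point
            continue
        labels[i] = cluster_id  # i is a core point: start a new cluster
        seeds = list(neighbors)
        k = 0
        while k < len(seeds):
            j = seeds[k]
            k += 1
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point reached from a core point
            if labels[j] is not None:
                continue  # already assigned (or just claimed as a border point)
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                seeds.extend(j_neighbors)  # j is also core: keep expanding
        cluster_id += 1
    return labels  # every entry has been assigned by now
```

A single global `eps` is what makes DBSCAN struggle when dense downtown tweets and sparse suburban tweets coexist. That limitation is a common motivation for multi-scale or varied-density variants.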
By integrating multiple OSM datasets, the activity type (e.g., dwelling, service, transportation, and work) of each collective zone can then be identified (Step 3). Each geo-tagged tweet of an individual, represented as an ST point, is then attached to a collective activity zone that either contains or overlaps a buffer zone around the ST point. Here, the buffer zone is a circle centered on the point with a predefined threshold as its radius. Given an individual’s ST points with semantics (i.e., activity type information) derived from the attached collective activity zones, a semantic activity clustering method is then developed to detect the individual’s daily representative activity clusters (Step 4). Finally, individual representative daily semantic travel trajectory paths (i.e., semantic travel trajectories, defined as chronological sequences of travel activities) are constructed between every two consecutive activity clusters (Step 5). Experiments with historical geo-tagged tweets collected within Madison, Wisconsin, reveal that: 1) the proposed method can detect most significant activity zones and accurately identify their types (Figure 2); and 2) the semantic activity clustering method based on the derived activity zones aggregates individual travel trajectories into activity clusters more efficiently than DBSCAN and varied-density DBSCAN (VDBSCAN).
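The point-to-zone attachment described above can be sketched as follows. The sketch assumes each activity zone is available as a planar polygon in meters, labeled with its activity type. A circular buffer of radius `radius` meets a polygon exactly when the point lies inside the polygon or some polygon edge lies within `radius` of the point. The `Zone` structure is hypothetical. So is the tie-break rule for when several zones qualify: a containing zone wins, otherwise the nearest boundary. Neither detail comes from the abstract. `semantic_sequence` is only a simplified stand-in for Steps 4 and 5. It orders per-point activity types by time and merges consecutive repeats, and it does not perform the study's semantic activity clustering.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]           # planar coordinates in meters (assumed)
Zone = Tuple[str, Sequence[Point]]    # (activity type, polygon): hypothetical structure


def point_in_polygon(p: Point, poly: Sequence[Point]) -> bool:
    """Ray-casting test: True if p lies inside the polygon."""
    x, y = p
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal ray through p
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def dist_point_segment(p: Point, a: Point, b: Point) -> float:
    """Euclidean distance from p to the segment ab."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0:
        return math.hypot(px - ax, py - ay)
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len2))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def attach_zone(p: Point, radius: float, zones: List[Zone]) -> Optional[str]:
    """Activity type of the zone whose polygon meets the buffer around p, or None.

    Assumed tie-break: a zone containing p wins; otherwise the zone whose
    boundary is closest to p (and within radius).
    """
    best: Optional[Tuple[float, str]] = None
    for activity, poly in zones:
        if point_in_polygon(p, poly):
            return activity
        n = len(poly)
        d = min(dist_point_segment(p, poly[i], poly[(i + 1) % n]) for i in range(n))
        if d <= radius and (best is None or d < best[0]):
            best = (d, activity)
    return best[1] if best is not None else None


def semantic_sequence(st_points: List[Tuple[float, Point]], radius: float,
                      zones: List[Zone]) -> List[str]:
    """Chronological activity types with consecutive repeats merged (simplified)."""
    sequence: List[str] = []
    for _, p in sorted(st_points, key=lambda tp: tp[0]):
        activity = attach_zone(p, radius, zones)
        if activity is not None and (not sequence or sequence[-1] != activity):
            sequence.append(activity)
    return sequence
```

For example, a user who tweets at home, twice near the office, and at home again in the evening yields the sequence `["dwelling", "work", "dwelling"]`. A tweet far from every zone is left unattached.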

Post, Mine, and Be Disturbed: Social Media Data Mining

PsycCRITIQUES ◽

10.1037/a0040619 ◽

2016 ◽

Vol 61 (51) ◽

Author(s):

Daniel Keyes

Keyword(s):

Data Mining ◽

Social Media ◽

Social Media Data ◽

Media Data

Understanding the Interrelationships between Infrastructure Resilience and Social Equity Using Social Media Data

Construction Research Congress 2020 ◽

10.1061/9780784482858.065 ◽

2020 ◽

Author(s):

Sunil Dhakal ◽

Lu Zhang

Keyword(s):

Social Media ◽

Social Equity ◽

Social Media Data ◽

Infrastructure Resilience ◽

Media Data

Psychological Stress Detection from Social Media Data using a Novel Hybrid Model

International Journal of Computer Sciences and Engineering ◽

10.26438/ijcse/v6i8.853862 ◽

2018 ◽

Vol 6 (8) ◽

pp. 853-862

Author(s):

Shaikha Hajera ◽

Mohammed Mahmood Ali

Keyword(s):

Social Media ◽

Psychological Stress ◽

Hybrid Model ◽

Stress Detection ◽

Social Media Data ◽

Media Data

Smart De-Identification of Social Media Data

10.21236/ada608548 ◽

2014 ◽

Author(s):

Kathleen M. Carley ◽

L. R. Carley ◽

Jonathan Storrick

Keyword(s):

Social Media ◽

Social Media Data ◽

Media Data

The Psychology of Job Loss: Using Social Media Data to Characterize and Predict Unemployment

SSRN Electronic Journal ◽

10.2139/ssrn.2783520 ◽

2016 ◽

Cited By ~ 1

Author(s):

Davide Proserpio ◽

Scott Counts ◽

Apurv Jain

Keyword(s):

Social Media ◽

Job Loss ◽

Social Media Data ◽

Media Data

Mining Social Media Data to Study the Consequences of Dementia Diagnosis on Caregivers and Relatives (Preprint)

10.2196/preprints.10506 ◽

2018 ◽

Author(s):

Anika Oellrich ◽

George Gkotsis ◽

Richard James Butler Dobson ◽

Tim JP Hubbard ◽

Rina Dutta

Keyword(s):

Social Media ◽

Family Relationships ◽

Text Processing ◽

Automated Analysis ◽

Health Concern ◽

Dementia Diagnosis ◽

Data Set ◽

Social Media Data ◽

Real Time Processing ◽

Media Data

BACKGROUND Dementia is a growing public health concern, with approximately 50 million people affected worldwide in 2017, a number expected to exceed 131 million by 2050. The toll on caregivers and relatives should not be underestimated, as dementia changes family relationships, leaves people socially isolated, and affects the finances of all those involved. OBJECTIVE The aim of this study was to use automated analysis to explore (i) the age and gender of people who post about dementia diagnoses to the social media forum Reddit, (ii) the affected person and their diagnosis, (iii) the relevant subreddits to which authors post, (iv) the types of messages posted, and (v) the content of these posts. METHODS We analysed Reddit posts concerning dementia diagnoses. We used a previously developed text analysis pipeline to determine attributes of the posts and their authors in order to characterise online communications about dementia diagnoses. The posts were also manually curated to identify the diagnosis provided and the person affected. Furthermore, we investigated the communities these people engage in and assessed the contents of the posts with an automated topic gathering technique. RESULTS Our results indicate that the majority of posters in our data set are women, and that it is mostly close relatives, such as parents and grandparents, who are mentioned. Both the communities frequented and the topics gathered reflect not only the sufferer's diagnosis but also potential outcomes, e.g. hardships experienced by the caregiver. The trends observed in this dataset are consistent with findings based on qualitative review, validating the robustness of automated text processing of social media.
CONCLUSIONS This work demonstrates the value of social media data sources as a resource for in-depth studies of those affected by a dementia diagnosis, and the potential to develop novel support systems based on their real-time processing, in line with the increasing digitalisation of medical care.

Citizens, Elites, and Social Media Methodological Challenges and Opportunities in the Study of Persuasion and Mobilization

The Oxford Handbook of Electoral Persuasion ◽

10.1093/oxfordhb/9780190860806.013.27 ◽

2019 ◽

pp. 1036-1058

Author(s):

Philip Habel ◽

Yannis Theocharis

Keyword(s):

Social Media ◽

Big Data ◽

Supply And Demand ◽

Political Process ◽

The Political ◽

The Novel ◽

Complete Understanding ◽

Social Media Data ◽

Challenges And Opportunities ◽

Media Data

In the last decade, big data, and social media in particular, have seen increased popularity among citizens, organizations, politicians, and other elites—which in turn has created new and promising avenues for scholars studying long-standing questions of communication flows and influence. Studies of social media play a prominent role in our evolving understanding of the supply and demand sides of the political process, including the novel strategies adopted by elites to persuade and mobilize publics, as well as the ways in which citizens react, interact with elites and others, and utilize platforms to persuade audiences. While recognizing some challenges, this chapter speaks to the myriad opportunities that social media data afford for evaluating questions of mobilization and persuasion, ultimately bringing us closer to a more complete understanding of Lasswell’s (1948) famous maxim: “who, says what, in which channel, to whom, [and] with what effect.”

Emergency flood detection using multiple information sources: Integrated analysis of natural hazard monitoring and social media data

The Science of The Total Environment ◽

10.1016/j.scitotenv.2020.144371 ◽

2020 ◽

pp. 144371

Author(s):

Kikuko Shoyama ◽

Qinglin Cui ◽

Makoto Hanashima ◽

Hiroaki Sano ◽

Yuichiro Usuda

Keyword(s):

Social Media ◽

Information Sources ◽

Natural Hazard ◽

Integrated Analysis ◽

Social Media Data ◽

Flood Detection ◽

Media Data

Missing value or behaviour: how to increase the signal of social media data

METRON ◽

10.1007/s40300-021-00216-7 ◽

2021 ◽

Author(s):

Paolo Mariani ◽

Andrea Marletta

Keyword(s):

Social Media ◽

Missing Data ◽

Everyday Life ◽

Processing Technique ◽

Missing Value ◽

Social Media Data ◽

Practical Strategy ◽

Specific Behaviour ◽

Complex Features ◽

Media Data

Abstract. Social media has become a widespread element of people’s everyday life, used to communicate and generate content. Among the several ways to react to social media content, “Likes” are critical. Indeed, they convey preferences, which drive existing markets or allow the creation of new ones. Nevertheless, appreciation indicators have some complex features, such as how to interpret the absence of “Likes”. In this case, the lack of approval may be considered a specific behaviour. The present study aimed to determine whether the absence of Likes may indicate a specific behaviour by contextualizing the treatment of missing data in real cases. We provided a practical strategy for extracting more knowledge from social media data, whose synthesis raises several measurement problems. We proposed an approach based on disambiguating missing data into two modalities: “Dislike” and “Nothing”. Finally, a data pre-processing technique was suggested to increase the signal of social media data.
