Semantic trajectory inference from geo-tagged tweets

Abstract. An individual travel trajectory is the series of places a person visits over time. These places (e.g., home, workplace, and park) reflect the person’s corresponding activities (e.g., dwelling, work, and entertainment), which are treated as semantic knowledge and may be implicit in raw data (Yan et al. 2013, Cai et al. 2016). Traditional survey data directly describe people’s activities at certain places but are costly in labor and resources (Huang and Wong 2016). GPS data such as taxi logs record exact origin-destination pairs as well as people’s stay times along the way, from which semantics can be easily inferred by combining them with geographical context data (Yan et al. 2013). Research has been done to understand the activity sequences indicated by either individual or collective spatiotemporal (ST) travel trajectories using such dense data. Different models have been proposed for trajectory mining and activity inference, including location categorization, frequent region detection, and so on (Njoo et al. 2015). A typical method for matching a location or region with a known activity type is to detect the stay points and stay intervals of trajectories and to find the geographical context of these stay occurrences (Furtado et al. 2013, Njoo et al. 2015, Beber et al. 2016, Beber et al. 2017).
However, limited progress has been made in mining the semantics of trajectory data collected from social media platforms. Specifically, detection of stay points and their intervals can be inaccurate for online trajectories because of data sparsity. Huang et al. (2014) define the notion of an activity zone to detect activity types from digital footprints. In this method, individual travel trajectories are first aggregated using a spatial clustering method such as density-based spatial clustering of applications with noise (DBSCAN). The resulting clusters are then classified based on a regional land use map and the Google Places application programming interface (API).
Such land use data are published only at specific places, such as the state cartography office’s website at the University of Wisconsin-Madison. Researchers need to search for those data based on their study area. Moreover, while major land use maps can be found for large areas such as the whole United States, detailed land use data for statewide or citywide areas follow diverse standards, which adds extra work to classify activity zones consistently. In addition, the Google Places API is a service that Google opened to developers; it returns information about a place given the place location (e.g., address or GPS coordinates) in the search request. However, API keys need to be generated before these interfaces can be used, and each user can make only a limited number of free requests every day (i.e., 1,000 requests per 24-hour period). In sum, previous methods for detecting activity zone types using social media data are not sufficient and can hardly achieve effective data fusion. Compared with the high cost of using officially published datasets, emerging Volunteered Geographic Information (VGI) data offer an alternative for inferring the types of an individual’s activities performed in each zone (i.e., cluster).
Using geo-tagged tweets as an example, this research proposes a framework for mining social media data and detecting individual semantic travel trajectories and individual representative daily travel trajectory paths by fusing with VGI data, specifically OpenStreetMap (OSM) datasets. First, inactive users and abnormal users (e.g., users representing a company whose account is shared by many employees) are removed through data pre-processing (Step 1 in Figure 1). Next, a multi-scale spatial clustering method is developed to aggregate the online trajectories captured through the geo-tagged tweets of a group of users into collective spatial hot-spots (i.e., activity zones; Step 2).
By integrating multiple OSM datasets, the activity type (e.g., dwelling, service, transportation, and work) of each collective zone can then be identified (Step 3). Each geo-tagged tweet of an individual, represented as an ST point, is then attached to a collective activity zone that either includes or overlaps a buffer zone of the ST point. Here, the buffer zone is generated by using the point as the centroid and a predefined threshold as the radius. Given an individual’s ST points with semantics (i.e., activity type information) derived from the attached collective activity zone, a semantic activity clustering method is then developed to detect the daily representative activity clusters of the individual (Step 4). Finally, individual representative daily semantic travel trajectory paths (i.e., semantic travel trajectories, defined as chronological travel activity sequences) are constructed between every two subsequent activity clusters (Step 5). Experiments with historic geo-tagged tweets collected within Madison, Wisconsin reveal that: 1) the proposed method can detect the most significant activity zones with accurate zone types identified (Figure 2); and 2) the semantic activity clustering method based on the derived activity zones can aggregate individual travel trajectories into activity clusters more efficiently than DBSCAN and varying DBSCAN (VDBSCAN).
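The buffer-based attachment and the chronological activity sequence described above can be illustrated with a minimal sketch. It is not the authors' implementation: the circular zones, the planar coordinates in metres, the 50 m buffer, and the names `ZONES`, `attach_zone`, and `semantic_trajectory` are all assumptions made for illustration, and the semantic activity clustering of Step 4 is replaced here by simply merging consecutive identical activities.

```python
from math import hypot

# Hypothetical activity zones: (x, y, radius_m, activity_type) in planar metres.
# Real zones would come from spatial clustering plus OSM data, not from this list.
ZONES = [
    (0.0, 0.0, 100.0, "dwelling"),
    (1000.0, 0.0, 150.0, "work"),
    (1000.0, 1000.0, 80.0, "service"),
]


def attach_zone(x, y, buffer_m, zones=ZONES):
    """Return the activity type of the closest zone that contains the point
    or overlaps its circular buffer, or None if no zone qualifies."""
    best = None
    for zx, zy, zr, activity in zones:
        # Distance from the point to the zone edge (negative means inside).
        gap = hypot(x - zx, y - zy) - zr
        if gap <= buffer_m and (best is None or gap < best[0]):
            best = (gap, activity)
    return best[1] if best else None


def semantic_trajectory(points, buffer_m=50.0):
    """points: iterable of (timestamp, x, y).
    Returns the chronological activity sequence, merging consecutive repeats
    and skipping points that cannot be attached to any zone."""
    path = []
    for _, x, y in sorted(points):
        activity = attach_zone(x, y, buffer_m)
        if activity is not None and (not path or path[-1] != activity):
            path.append(activity)
    return path


# Made-up ST points for one user on one day: (hour, x, y).
tweets = [
    (8, 10.0, 5.0),
    (9, 990.0, 20.0),
    (12, 1010.0, 1060.0),
    (13, 1000.0, 10.0),
    (19, 120.0, 0.0),    # outside the dwelling zone, but its buffer overlaps it
    (20, 5000.0, 5000.0),  # far from every zone, so it is dropped
]
print(semantic_trajectory(tweets))
```

With real data, the zones would be polygons and the overlap test would be done with a geometry library; the circle test here only keeps the sketch free of dependencies.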

A dynamic human activity-driven model for mixed land use evaluation using social media data

Transactions in GIS ◽

10.1111/tgis.12447 ◽

2018 ◽

Vol 22 (5) ◽

pp. 1130-1151 ◽

Cited By ~ 9

Author(s):

Hanfa Xing ◽

Yuan Meng ◽

Yan Shi

Keyword(s):

Social Media ◽

Land Use ◽

Human Activity ◽

Social Media Data ◽

Mixed Land Use ◽

Media Data

Classifying urban land use by integrating remote sensing and social media data

International Journal of Geographical Information Science ◽

10.1080/13658816.2017.1324976 ◽

2017 ◽

Vol 31 (8) ◽

pp. 1675-1696 ◽

Cited By ~ 85

Author(s):

Xiaoping Liu ◽

Jialv He ◽

Yao Yao ◽

Jinbao Zhang ◽

Haolin Liang ◽

...

Keyword(s):

Remote Sensing ◽

Social Media ◽

Land Use ◽

Urban Land ◽

Urban Land Use ◽

Social Media Data ◽

Media Data

Urban Land Use and Land Cover Classification Using Multisource Remote Sensing Images and Social Media Data

Remote Sensing ◽

10.3390/rs11222719 ◽

2019 ◽

Vol 11 (22) ◽

pp. 2719 ◽

Cited By ~ 4

Author(s):

Shi ◽

Qi ◽

Liu ◽

Niu ◽

Zhang

Keyword(s):

Remote Sensing ◽

Social Media ◽

Land Use ◽

Land Cover ◽

Commercial Buildings ◽

Landsat 8 ◽

Remote Sensing Images ◽

Social Media Data ◽

Urban Village ◽

Media Data

Land use and land cover (LULC) are diverse and complex in urban areas. Remotely sensed images are commonly used for land cover classification but can hardly identify urban land use and functional areas because of the semantic gap (i.e., different definitions of similar or identical buildings). Social media data, “marks” left by people using mobile phones, have great potential to overcome this semantic gap. Multisource remote sensing data are also expected to be useful in distinguishing different LULC types. This study examined the capability of combined multisource remote sensing images and social media data in urban LULC classification. Multisource remote sensing images included a Chinese ZiYuan-3 (ZY-3) high-resolution image, a Landsat 8 Operational Land Imager (OLI) multispectral image, and a Sentinel-1A synthetic aperture radar (SAR) image. Social media data consisted of the hourly spatial distribution of users of WeChat, a ubiquitous messaging and payment platform in China. LULC was classified into 10 types, namely, vegetation, bare land, road, water, urban village, greenhouses, residential, commercial, industrial, and educational buildings. A method that integrates object-based image analysis, decision trees, and random forests was used for LULC classification. The overall accuracy and kappa value attained by the combination of multisource remote sensing images and WeChat data were 87.55% and 0.84, respectively. They further improved to 91.55% and 0.89, respectively, by integrating the textural and spatial features extracted from the ZY-3 image. The ZY-3 high-resolution image was essential for urban LULC classification because it is necessary for the accurate delineation of land parcels. The addition of Landsat 8 OLI, Sentinel-1A SAR, or WeChat data also made an irreplaceable contribution to the classification of different LULC types.
The Landsat 8 OLI image helped distinguish between the urban village, residential buildings, commercial buildings, and roads, while the Sentinel-1A SAR data reduced the confusion between commercial buildings, greenhouses, and water. Rendering the spatial and temporal dynamics of population density, the WeChat data improved the classification accuracies of an urban village, greenhouses, and commercial buildings.
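For readers unfamiliar with the two scores reported above, the following minimal sketch shows how overall accuracy and Cohen's kappa are conventionally computed from a confusion matrix. The 2 x 2 matrix is made up for illustration and has no relation to the study's results; the function name `accuracy_and_kappa` is likewise hypothetical.

```python
def accuracy_and_kappa(confusion):
    """confusion[i][j] = number of samples of true class i predicted as class j.
    Returns (overall accuracy, Cohen's kappa)."""
    n = sum(sum(row) for row in confusion)
    k = len(confusion)
    # Observed agreement: share of samples on the diagonal.
    observed = sum(confusion[i][i] for i in range(k)) / n
    # Chance agreement: sum over classes of (row total * column total) / n^2.
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(confusion[i][j] for i in range(k)) for j in range(k)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (n * n)
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Made-up two-class example (not data from the study).
toy_matrix = [
    [40, 10],
    [5, 45],
]
overall_accuracy, kappa = accuracy_and_kappa(toy_matrix)
print(f"overall accuracy = {overall_accuracy:.2f}, kappa = {kappa:.2f}")
```

Kappa is lower than overall accuracy because it discounts the agreement that would be expected by chance given the class totals.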

A spatial regression and clustering method for developing place-specific social vulnerability indices using census and social media data

International Journal of Disaster Risk Reduction ◽

10.1016/j.ijdrr.2019.101224 ◽

2019 ◽

Vol 38 ◽

pp. 101224 ◽

Cited By ~ 2

Author(s):

Danielle Nicholson ◽

O. Arda Vanli ◽

Sungmoon Jung ◽

Eren Erman Ozguven

Keyword(s):

Social Media ◽

Social Vulnerability ◽

Spatial Regression ◽

Clustering Method ◽

Social Media Data ◽

Media Data

REVEALING TOURIST HOTSPOTS IN YOGYAKARTA CITY BASED ON SOCIAL MEDIA DATA CLUSTERING

GeoJournal of Tourism and Geosites ◽

10.30892/gtg.34129-640 ◽

2021 ◽

Vol 34 (1) ◽

pp. 218-225

Author(s):

Totok Wahyu WIBOWO ◽

Sigit Heru Murti Budi SANTOSA ◽

Bowo SUSILO ◽

Taufik Hery PURWANTO ◽

...

Keyword(s):

Social Media ◽

Spatial Clustering ◽

Clustering Algorithms ◽

Research Area ◽

City Development ◽

Social Media Data ◽

Shopping Centres ◽

The Impact ◽

Media Data ◽

Yogyakarta City

Cities have a common characteristic in the form of land utilisation, which is dominated by built-up areas. Tourism is an essential aspect of city development because it can involve the identity of the city. Historical buildings, landmarks, shopping centres and museums are generally interesting places for tourists to visit. Yogyakarta, the research area, is known as a city of culture and of students. Knowledge of the spatial clustering patterns of tourists can be one of the references for urban development. Social media data were used in the study as an alternative to direct data collection, which requires considerable resources. Flickr and Twitter were used as proxies to determine the distribution of tourists, and the DBSCAN and HDBSCAN clustering algorithms were used to determine the centres of tourist activity. Furthermore, Flickr data were analysed temporally to determine the impact of the COVID-19 pandemic on tourism in Yogyakarta City. The results of clustering the social media data show that there are several city hotspots besides the already well-known tourist attractions. Apart from city landmarks, several other tourist hotspots were revealed through the clustering process, such as accommodation, shopping centres, entertainment venues and souvenir shops, which also support tourism activities. The impact of COVID-19 on tourism in Yogyakarta City is reflected in the number of photos uploaded by tourists to Flickr, which has decreased since March 2020.
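As a rough illustration of how density-based clustering turns geo-tagged posts into hotspots, here is a minimal, dependency-free sketch of the standard DBSCAN procedure. The coordinates, the `eps` and `min_pts` values, and the function name `dbscan` are assumptions for illustration only; the study's own parameters and data are not given in the abstract, and a production analysis would normally rely on an established library implementation.

```python
from math import hypot


def dbscan(points, eps, min_pts):
    """Label each (x, y) point with a cluster id (0, 1, ...) or -1 for noise.
    A point is a core point if at least min_pts points (itself included)
    lie within distance eps of it."""
    noise = -1
    labels = [None] * len(points)  # None marks a point not yet visited

    def neighbours(i):
        xi, yi = points[i]
        return [j for j, (xj, yj) in enumerate(points)
                if hypot(xi - xj, yi - yj) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = noise  # may later be claimed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == noise:
                labels[j] = cluster  # border point: joins but does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbours(j)
            if len(reachable) >= min_pts:
                queue.extend(reachable)  # j is a core point, keep growing
    return labels


# Made-up photo locations in arbitrary planar units: two hotspots and one outlier.
photos = [(0, 0), (1, 0), (0, 1), (1, 1),
          (10, 10), (11, 10), (10, 11),
          (50, 50)]
hotspot_labels = dbscan(photos, eps=1.5, min_pts=3)
print(hotspot_labels)
```

Points labelled -1 belong to no hotspot; with real latitude and longitude data, the distance function would need to be geodesic rather than planar.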
