Land use classification from social media data and satellite imagery

Abstract. Individual travel trajectories denote a series of places people visit over time. These places (e.g., home, workplace, and park) reflect people’s corresponding activities (e.g., dwelling, work, and entertainment), which are regarded as semantic knowledge and can be implicit in raw data (Yan et al. 2013, Cai et al. 2016). Traditional survey data directly describe people’s activities at certain places but cost tremendous labor and resources (Huang and Wong 2016). GPS data such as taxi logs record exact origin-destination pairs as well as people’s stay time along the way, from which semantics can be easily inferred when combined with geographical context data (Yan et al. 2013). Research has used these dense data to understand the activity sequences indicated by either individual or collective spatiotemporal (ST) travel trajectories. Different models have been proposed for trajectory mining and activity inference, including location categorization, frequent region detection, and so on (Njoo et al. 2015). A typical method for matching a location or region with a known activity type is to detect the stay points and stay intervals of trajectories and then to find the geographical context of these stay occurrences (Furtado et al. 2013, Njoo et al. 2015, Beber et al. 2016, Beber et al. 2017). However, limited progress has been made in mining the semantics of trajectory data collected from social media platforms. Specifically, the detection of stay points and their intervals can be inaccurate for online trajectories because of data sparsity. Huang et al. (2014) define the notion of an activity zone to detect activity types from digital footprints. In this method, individual travel trajectories are first aggregated using a spatial clustering method such as density-based spatial clustering of applications with noise (DBSCAN). The resulting clusters are then classified based on a regional land use map and the Google Places application programming interface (API).
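
The DBSCAN step used in the activity-zone approach can be illustrated with a minimal from-scratch implementation. This is a sketch, not the code of Huang et al. (2014): it assumes tweet locations have already been projected to planar coordinates (e.g., meters), and the function names and the `eps`/`min_pts` values in the example are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]  # planar (x, y), e.g. projected meters (assumption)
UNVISITED = -2
NOISE = -1


def region_query(points: List[Point], i: int, eps: float) -> List[int]:
    """Indices of all points within eps of points[i], including i itself."""
    xi, yi = points[i]
    return [j for j, (xj, yj) in enumerate(points)
            if math.hypot(xi - xj, yi - yj) <= eps]


def dbscan(points: List[Point], eps: float, min_pts: int) -> List[int]:
    """Label each point with a cluster id (0, 1, ...) or NOISE (-1)."""
    labels = [UNVISITED] * len(points)
    cluster_id = 0
    for i in range(len(points)):
        if labels[i] != UNVISITED:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be absorbed as a border point
            continue
        # i is a core point: start a new cluster and expand it
        labels[i] = cluster_id
        seeds = list(neighbors)
        k = 0
        while k < len(seeds):
            j = seeds[k]
            k += 1
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point; not expanded further
                continue
            if labels[j] != UNVISITED:
                continue  # already assigned to this cluster
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                seeds.extend(j_neighbors)  # j is also a core point
        cluster_id += 1
    return labels


if __name__ == "__main__":
    pts = [(0, 0), (1, 0), (0, 1), (1, 1), (10, 10), (10, 11), (11, 10), (50, 50)]
    print(dbscan(pts, eps=1.5, min_pts=3))  # [0, 0, 0, 0, 1, 1, 1, -1]
```

Each resulting cluster would then be one candidate activity zone to classify; this naive version is O(n²), so a spatial index would be needed for large tweet collections.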
The land use data required by this method are published only at specific sources, such as the state cartographer’s office website at the University of Wisconsin-Madison, so researchers need to search for data covering their own study area. Moreover, while major land use maps can be found for large areas such as the whole United States, detailed statewide or citywide land use data are produced under diverse standards, which adds extra work to classify activity zones consistently. In addition, the Google Places API is a service that Google provides to developers: given a place location (e.g., an address or GPS coordinates) in the search request, it returns information about that place. However, an API key must be generated before the interface can be used, and each user can make only a limited number of free requests every day (i.e., 1,000 requests per 24-hour period). In sum, previous methods for detecting activity zone types from social media data are insufficient and can hardly achieve effective data fusion. Compared to the high cost of using officially published datasets, emerging Volunteered Geographic Information (VGI) data offer an alternative for inferring the types of activities an individual performs in each zone (i.e., cluster). Using geo-tagged tweets as an example, this research proposes a framework for mining social media data, detecting individual semantic travel trajectories, and constructing individual representative daily travel trajectory paths by fusing the tweets with VGI data, specifically OpenStreetMap (OSM) datasets. First, inactive users and abnormal users (e.g., users representing a company whose account is shared by many employees) are removed through data pre-processing (Step 1 in Figure 1). Next, a multi-scale spatial clustering method is developed to aggregate the online trajectories captured through the geo-tagged tweets of a group of users into collective spatial hot spots (i.e., activity zones; Step 2).
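
Step 1 can be sketched as a per-user filter. The source does not state the removal criteria, so the two rules below (too few geo-tagged tweets marks a user as inactive; an implausibly high posting rate per active day serves as a proxy for a shared, organizational account) and their thresholds are hypothetical placeholders.

```python
from collections import defaultdict
from datetime import date, datetime, timedelta
from typing import Dict, List, Set, Tuple

Tweet = Tuple[str, datetime]  # (user_id, timestamp) of one geo-tagged tweet


def filter_users(tweets: List[Tweet], min_tweets: int = 10,
                 max_per_day: float = 50.0) -> Set[str]:
    """Return ids of users kept after removing inactive and abnormal users.

    min_tweets and max_per_day are hypothetical thresholds, not study values.
    """
    counts: Dict[str, int] = defaultdict(int)
    active_days: Dict[str, Set[date]] = defaultdict(set)
    for user, ts in tweets:
        counts[user] += 1
        active_days[user].add(ts.date())

    kept: Set[str] = set()
    for user, n in counts.items():
        if n < min_tweets:
            continue  # inactive: too few geo-tagged tweets overall
        if n / len(active_days[user]) > max_per_day:
            continue  # abnormal: posting rate suggests a shared account
        kept.add(user)
    return kept


if __name__ == "__main__":
    base = datetime(2016, 1, 1, 12)
    tweets = [("a", base + timedelta(days=d)) for d in range(12)]
    tweets += [("b", base) for _ in range(3)]
    tweets += [("c", base + timedelta(minutes=m)) for m in range(120)]
    print(sorted(filter_users(tweets)))  # ['a']
```

In practice, abnormal accounts would likely need richer signals (e.g., spatial spread or content), which is why the rule here is only a placeholder.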
By integrating multiple OSM datasets, the activity type (e.g., dwelling, service, transportation, and work) of each collective zone can then be identified (Step 3). Each geo-tagged tweet of an individual, represented as an ST point, is then attached to a collective activity zone that either includes or overlaps the buffer zone of the ST point. Here, the buffer zone is a circle centered on the point with a predefined threshold as its radius. Given an individual’s ST points with semantics (i.e., activity type information) derived from the attached collective activity zones, a semantic activity clustering method is then developed to detect the individual’s daily representative activity clusters (Step 4). Finally, individual representative daily semantic travel trajectory paths (i.e., semantic travel trajectories, defined as chronological sequences of travel activities) are constructed between every two subsequent activity clusters (Step 5). Experiments with historical geo-tagged tweets collected within Madison, Wisconsin reveal that: 1) the proposed method can detect most significant activity zones and identify their zone types accurately (Figure 2); and 2) the semantic activity clustering method based on the derived activity zones aggregates individual travel trajectories into activity clusters more efficiently than DBSCAN and varying DBSCAN (VDBSCAN).
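
The buffer-based attachment and a simple chronological activity sequence can be sketched in plain Python. Assumptions, all hypothetical: zones are simple polygons in planar coordinates; a zone is attached when it intersects the circular buffer of radius `radius` around the point (this covers both the "includes" and "overlaps" cases); ties between several candidate zones go to the nearest one; and the output only collapses repeated consecutive activity types, so it does not reproduce the paper's semantic activity clustering (Step 4).

```python
import math
from typing import List, Optional, Tuple

Point = Tuple[float, float]
Polygon = List[Point]  # vertices in order; the last edge closes back to the first
Zone = Tuple[Polygon, str]  # (polygon, activity type), e.g. (..., "dwelling")


def point_in_polygon(p: Point, poly: Polygon) -> bool:
    """Ray-casting test: count edge crossings of a ray going in the +x direction."""
    x, y = p
    inside = False
    n = len(poly)
    for i in range(n):
        (x1, y1), (x2, y2) = poly[i], poly[(i + 1) % n]
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def segment_distance(p: Point, a: Point, b: Point) -> float:
    """Shortest distance from p to the segment ab."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    len2 = dx * dx + dy * dy
    t = 0.0 if len2 == 0 else max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / len2))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def zone_distance(p: Point, poly: Polygon) -> float:
    """0 if p lies inside the polygon, else the distance to its boundary."""
    if point_in_polygon(p, poly):
        return 0.0
    n = len(poly)
    return min(segment_distance(p, poly[i], poly[(i + 1) % n]) for i in range(n))


def attach_activity(p: Point, zones: List[Zone], radius: float) -> Optional[str]:
    """Activity type of the nearest zone intersecting the buffer of p, or None."""
    best_type, best_d = None, math.inf
    for poly, activity in zones:
        d = zone_distance(p, poly)
        if d <= radius and d < best_d:
            best_type, best_d = activity, d
    return best_type


def semantic_sequence(st_points: List[Tuple[float, Point]], zones: List[Zone],
                      radius: float) -> List[str]:
    """Chronological activity types, dropping unmatched points and repeats."""
    seq: List[str] = []
    for _, p in sorted(st_points, key=lambda tp: tp[0]):
        activity = attach_activity(p, zones, radius)
        if activity is not None and (not seq or seq[-1] != activity):
            seq.append(activity)
    return seq


if __name__ == "__main__":
    home = [(0, 0), (10, 0), (10, 10), (0, 10)]
    office = [(100, 0), (110, 0), (110, 10), (100, 10)]
    zones = [(home, "dwelling"), (office, "work")]
    # (time in hours, location); hypothetical data
    pts = [(8, (5, 5)), (9, (12, 5)), (13, (105, 5)), (18, (50, 50)), (20, (1, 1))]
    print(semantic_sequence(pts, zones, radius=3))  # ['dwelling', 'work', 'dwelling']
```

The point at (12, 5) lies outside the home polygon but within 3 units of its edge, so the buffer rule still attaches it to "dwelling"; the point at (50, 50) matches no zone and is dropped.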

A dynamic human activity-driven model for mixed land use evaluation using social media data

Transactions in GIS, Vol 22 (5), pp. 1130-1151, 2018. DOI: 10.1111/tgis.12447. Cited by ~9

Author(s): Hanfa Xing, Yuan Meng, Yan Shi

Keyword(s): Social Media, Land Use, Human Activity, Social Media Data, Mixed Land Use, Media Data

Classifying urban land use by integrating remote sensing and social media data

International Journal of Geographical Information Science, Vol 31 (8), pp. 1675-1696, 2017. DOI: 10.1080/13658816.2017.1324976. Cited by ~85

Author(s): Xiaoping Liu, Jialv He, Yao Yao, Jinbao Zhang, Haolin Liang, ...

Keyword(s): Remote Sensing, Social Media, Land Use, Urban Land, Urban Land Use, Social Media Data, Media Data

Supplementing Satellite Imagery with Social Media Data for Remote Reconnaissance: A Case Study of the 2020 Taal Volcano Eruption

ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Vol XLVI-4/W6-2021, pp. 329-335, 2021. DOI: 10.5194/isprs-archives-xlvi-4-w6-2021-329-2021

Author(s): A. L. F. Yute, E. J. G. Merin, C. J. S. Sarmiento, E. E. Elazagui

Keyword(s): Social Media, Satellite Imagery, The Philippines, Entity Recognition, Support Vector, Social Media Data, Volcano Eruption, Taal Volcano, Dot Density, Media Data

Abstract. Social sensing and satellite imagery have been named among the top emerging data sources for disaster management. A wealth of data, in both quantity and quality, can be extracted from social media platforms such as Twitter, given that content is generally published by users in real time and includes a geotag or toponym. To reduce costs, risks, and time, performing reconnaissance using remote sources of information is highly recommended. This study explores how social media data can supplement satellite imagery in post-disaster remote reconnaissance, using the January 2020 Taal Volcano eruption in the Philippines as a case study. Tweets about the volcanic eruption were scraped, and ashfall-affected locations mentioned in the tweet content were extracted using Named Entity Recognition (NER). To visualize the progression of the tweeted locations, dot density maps and hotspot maps were generated. Additionally, a potential ashfall extent map was generated from processed DIWATA-2 satellite imagery using Support Vector Machine (SVM) classification. The dot density map and the ashfall extent map were then intersected for a comparative analysis of the two data sources. Validation was carried out by matching the ashfall-affected locations with ground reports from local government offices and news reports. Social media data complement satellite image classification in detecting disaster damage, enabling quick and cost-efficient remote reconnaissance. This information can help rescue teams respond faster in emergency response and relief operations during and after a disaster.
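
The comparison of tweeted locations with the ashfall extent can be sketched on a regular grid. This is an illustration under stated assumptions, not the study's GIS workflow: tweet toponyms are assumed to be already geocoded to planar (x, y) coordinates, the SVM ashfall map is assumed to be rasterized into a set of ashfall cells on the same grid, and the function names, grid origin, and cell size are hypothetical.

```python
from collections import Counter
from typing import Iterable, Set, Tuple

Cell = Tuple[int, int]


def to_cell(x: float, y: float, origin: Tuple[float, float], cell_size: float) -> Cell:
    """Grid cell (column, row) containing the point (x, y)."""
    return (int((x - origin[0]) // cell_size), int((y - origin[1]) // cell_size))


def dot_density(points: Iterable[Tuple[float, float]], origin: Tuple[float, float],
                cell_size: float) -> Counter:
    """Number of geocoded tweet locations per grid cell."""
    return Counter(to_cell(x, y, origin, cell_size) for x, y in points)


def agreement(density: Counter, ash_cells: Set[Cell]) -> float:
    """Share of tweet locations that fall inside the classified ashfall extent."""
    total = sum(density.values())
    inside = sum(n for cell, n in density.items() if cell in ash_cells)
    return inside / total if total else 0.0


if __name__ == "__main__":
    pts = [(0.5, 0.5), (0.7, 0.2), (1.5, 0.5), (3.2, 3.9)]
    dens = dot_density(pts, origin=(0.0, 0.0), cell_size=1.0)
    print(agreement(dens, ash_cells={(0, 0), (1, 0)}))  # 0.75
```

A single agreement ratio is only one possible summary of the intersection; the study itself compared the maps visually and validated against ground and news reports.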

Urban Land Use and Land Cover Classification Using Multisource Remote Sensing Images and Social Media Data

Remote Sensing, Vol 11 (22), pp. 2719, 2019. DOI: 10.3390/rs11222719. Cited by ~4

Author(s): Shi, Qi, Liu, Niu, Zhang

Keyword(s): Remote Sensing, Social Media, Land Use, Land Cover, Commercial Buildings, Landsat 8, Remote Sensing Images, Social Media Data, Urban Village, Media Data

Land use and land cover (LULC) are diverse and complex in urban areas. Remotely sensed images are commonly used for land cover classification but can hardly identify urban land use and functional areas because of the semantic gap (i.e., similar or identical buildings can have different functional definitions). Social media data, the “marks” left by people using mobile phones, have great potential to bridge this semantic gap. Multisource remote sensing data are also expected to help distinguish different LULC types. This study examined the capability of combining multisource remote sensing images and social media data for urban LULC classification. The multisource remote sensing images included a Chinese ZiYuan-3 (ZY-3) high-resolution image, a Landsat 8 Operational Land Imager (OLI) multispectral image, and a Sentinel-1A synthetic aperture radar (SAR) image. The social media data consisted of the hourly spatial distribution of users of WeChat, a ubiquitous messaging and payment platform in China. LULC was classified into 10 types: vegetation, bare land, road, water, urban village, greenhouses, and residential, commercial, industrial, and educational buildings. A method integrating object-based image analysis, decision trees, and random forests was used for LULC classification. The overall accuracy and kappa value achieved by combining the multisource remote sensing images and WeChat data were 87.55% and 0.84, respectively. They further improved to 91.55% and 0.89, respectively, after integrating the textural and spatial features extracted from the ZY-3 image. The ZY-3 high-resolution image was essential for urban LULC classification because it is necessary for accurately delineating land parcels. The addition of Landsat 8 OLI, Sentinel-1A SAR, or WeChat data also made an irreplaceable contribution to the classification of different LULC types.
The Landsat 8 OLI image helped distinguish among urban villages, residential buildings, commercial buildings, and roads, while the Sentinel-1A SAR data reduced the confusion among commercial buildings, greenhouses, and water. By capturing the spatial and temporal dynamics of population density, the WeChat data improved the classification accuracies of urban villages, greenhouses, and commercial buildings.
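
Overall accuracy and the kappa coefficient are standard confusion-matrix metrics: overall accuracy is p_o = Σ n_ii / N, and Cohen's kappa is κ = (p_o - p_e) / (1 - p_e), where p_e = Σ (row total_i × column total_i) / N² is the agreement expected by chance. The sketch below computes both; the 2 × 2 matrix in the example is made up for illustration and is not data from the study.

```python
from typing import List, Tuple


def accuracy_and_kappa(cm: List[List[int]]) -> Tuple[float, float]:
    """Overall accuracy and Cohen's kappa from a square confusion matrix.

    Rows are reference (ground-truth) classes; columns are predicted classes.
    """
    k = len(cm)
    n = sum(sum(row) for row in cm)
    if n == 0:
        raise ValueError("empty confusion matrix")
    p_o = sum(cm[i][i] for i in range(k)) / n  # observed agreement = overall accuracy
    row_tot = [sum(row) for row in cm]
    col_tot = [sum(cm[i][j] for i in range(k)) for j in range(k)]
    p_e = sum(r * c for r, c in zip(row_tot, col_tot)) / n ** 2  # chance agreement
    if p_e == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return p_o, (p_o - p_e) / (1 - p_e)


if __name__ == "__main__":
    oa, kappa = accuracy_and_kappa([[45, 5], [10, 40]])
    print(f"overall accuracy = {oa:.2%}, kappa = {kappa:.2f}")
    # overall accuracy = 85.00%, kappa = 0.70
```

Kappa is lower than overall accuracy because it discounts the agreement a classifier would reach by chance given the class proportions.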
