A dynamic human activity-driven model for mixed land use evaluation using social media data

Abstract. Individual travel trajectories record the series of places a person visits over time. These places (e.g., home, workplace, and park) reflect the person's corresponding activities (e.g., dwelling, work, and entertainment), which are regarded as semantic knowledge and may be implicit in raw data (Yan et al. 2013, Cai et al. 2016). Traditional survey data directly describe people's activities at certain places but cost tremendous labor and resources (Huang and Wong 2016). GPS data such as taxi logs record exact origin-destination pairs as well as people's stay times along the way, from which semantics can be readily inferred when combined with geographical context data (Yan et al. 2013). Research has used these dense data to understand the activity sequences indicated by either individual or collective spatiotemporal (ST) travel trajectories. Different models have been proposed for trajectory mining and activity inference, including location categorization and frequent region detection (Njoo et al. 2015). A typical method for matching a location or region with a known activity type is to detect the stay points and stay intervals of trajectories and then find the geographical context of these stay occurrences (Furtado et al. 2013, Njoo et al. 2015, Beber et al. 2016, Beber et al. 2017). However, limited progress has been made in mining the semantics of trajectory data collected from social media platforms. Specifically, detecting stay points and their intervals can be inaccurate for online trajectories because of data sparsity. Huang et al. (2014) define the notion of an activity zone to detect activity types from digital footprints. In this method, individual travel trajectories are first aggregated using a spatial clustering method such as density-based spatial clustering of applications with noise (DBSCAN). The resulting clusters are then classified based on a regional land use map and the Google Places application programming interface (API).
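Huang et al. (2014) aggregate trajectory points with DBSCAN before classifying the resulting clusters. As a rough illustration of that clustering step, the sketch below is a minimal, dependency-free version of standard DBSCAN; it is not the authors' code. It assumes tweet locations have already been projected to planar coordinates in meters, and the `eps` and `min_pts` values in the usage line are arbitrary, not parameters from the study.

```python
import math
from typing import List, Optional, Tuple

Point = Tuple[float, float]  # projected (x, y) coordinates in meters (assumption)
NOISE = -1


def region_query(points: List[Point], i: int, eps: float) -> List[int]:
    """Indices of all points within eps of points[i], including i itself."""
    xi, yi = points[i]
    return [j for j, (xj, yj) in enumerate(points) if math.hypot(xi - xj, yi - yj) <= eps]


def dbscan(points: List[Point], eps: float, min_pts: int) -> List[int]:
    """Label each point with a cluster id (0, 1, ...) or NOISE (-1)."""
    labels: List[Optional[int]] = [None] * len(points)  # None = not yet visited
    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        labels[i] = cluster_id  # i is a core point: start a new cluster
        seeds = list(neighbors)
        k = 0
        while k < len(seeds):
            j = seeds[k]
            k += 1
            if labels[j] == NOISE:
                labels[j] = cluster_id  # former noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                seeds.extend(j_neighbors)  # j is also a core point: keep expanding
        cluster_id += 1
    return [NOISE if lab is None else lab for lab in labels]


# Two tight groups of three points and one isolated point.
pts = [(0, 0), (1, 0), (0, 1), (50, 50), (51, 50), (50, 51), (200, 200)]
print(dbscan(pts, eps=2.0, min_pts=3))  # [0, 0, 0, 1, 1, 1, -1]
```

The neighborhood search here is a brute-force O(n²) scan, which is fine for a toy example; a spatial index would be needed for a large tweet collection.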
The regional land use data required by this method are published only at specific sources, such as the State Cartographer's Office website at the University of Wisconsin-Madison, so researchers must search for the data covering their own study area. Moreover, while land use maps are available for large areas such as the whole United States, detailed statewide or citywide land use data are produced under diverse standards, which adds extra work to classifying activity zones consistently. In addition, the Google Places API, a service that Google offers to developers, returns information about a place given its location (e.g., an address or GPS coordinates) in the search request. However, an API key must be generated before these interfaces can be used, and each user can make only a limited number of free requests per day (i.e., 1,000 requests per 24-hour period). In sum, previous methods for detecting activity zone types from social media data are insufficient and can hardly achieve effective data fusion. Compared with the high cost of using officially published datasets, emerging Volunteered Geographic Information (VGI) data offer an alternative way to infer the types of activities an individual performs in each zone (i.e., cluster). Using geo-tagged tweets as an example, this research proposes a framework for mining social media data and detecting individual semantic travel trajectories and individual representative daily travel trajectory paths by fusing the data with VGI, specifically OpenStreetMap (OSM) datasets. First, inactive users and abnormal users (e.g., users representing a company whose account is shared by many employees) are removed during data pre-processing (Step 1 in Figure 1). Next, a multi-scale spatial clustering method is developed to aggregate the online trajectories captured through the geo-tagged tweets of a group of users into collective spatial hot spots (i.e., activity zones; Step 2).
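The abstract does not state how inactive or abnormal users are identified in Step 1. The sketch below shows one hypothetical filter: users with fewer than `min_tweets` tweets are treated as inactive, and an account is flagged as abnormal (for example, shared by many people) when too many consecutive tweets imply a travel speed above `max_speed_kmh`. The `Tweet` class, the function names, and all thresholds are illustrative assumptions, not the paper's criteria.

```python
import math
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Tweet:
    user_id: str
    timestamp: float  # Unix time in seconds
    lat: float
    lon: float


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometers on a spherical Earth (R = 6371 km)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * 6371.0 * math.asin(min(1.0, math.sqrt(a)))


def is_abnormal(tweets: List[Tweet], max_speed_kmh: float, max_fast_ratio: float) -> bool:
    """Hypothetical rule: flag an account if too many consecutive tweet pairs
    imply an implausible travel speed (e.g., many people sharing one account)."""
    ordered = sorted(tweets, key=lambda t: t.timestamp)
    moves = list(zip(ordered, ordered[1:]))
    if not moves:
        return False
    fast = 0
    for a, b in moves:
        # Floor the gap at 1 s so simultaneous tweets do not divide by zero.
        hours = max(b.timestamp - a.timestamp, 1.0) / 3600.0
        if haversine_km(a.lat, a.lon, b.lat, b.lon) / hours > max_speed_kmh:
            fast += 1
    return fast / len(moves) > max_fast_ratio


def preprocess(tweets: List[Tweet], min_tweets: int = 10,
               max_speed_kmh: float = 300.0, max_fast_ratio: float = 0.1) -> Dict[str, List[Tweet]]:
    """Group tweets by user and drop inactive and (hypothetically) abnormal users."""
    by_user: Dict[str, List[Tweet]] = {}
    for t in tweets:
        by_user.setdefault(t.user_id, []).append(t)
    return {
        uid: ts for uid, ts in by_user.items()
        if len(ts) >= min_tweets and not is_abnormal(ts, max_speed_kmh, max_fast_ratio)
    }
```

A real pipeline would tune these thresholds against the data; the point here is only that both filters can be expressed as simple per-user predicates.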
By integrating multiple OSM datasets, the activity type (e.g., dwelling, service, transportation, and work) of each collective zone can then be identified (Step 3). Each geo-tagged tweet of an individual, represented as an ST point, is then attached to a collective activity zone that either contains the point or overlaps a buffer zone around it. Here, the buffer zone is a circle centered on the point with a predefined threshold as its radius. Given an individual's ST points with semantics (i.e., activity type information) derived from the attached collective activity zones, a semantic activity clustering method is then developed to detect the individual's daily representative activity clusters (Step 4). Finally, individual representative daily semantic travel trajectory paths (i.e., semantic travel trajectories, defined as chronological sequences of travel activities) are constructed between every two consecutive activity clusters (Step 5). Experiments with historical geo-tagged tweets collected within Madison, Wisconsin, reveal that: 1) the proposed method can detect most significant activity zones and identify their zone types accurately (Figure 2); and 2) the semantic activity clustering method based on the derived activity zones aggregates individual travel trajectories into activity clusters more efficiently than DBSCAN and varying DBSCAN (VDBSCAN).
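The buffer-based attachment rule can be made concrete with simple geometry: a circle of radius r around a point overlaps a polygon exactly when the point lies inside the polygon or within distance r of its boundary. The sketch below assumes projected coordinates in meters and zones given as simple polygons labeled with an activity type. Breaking ties toward the nearest zone is a hypothetical choice, and `activity_sequence` is a much simplified stand-in for Steps 4 and 5 (it only collapses consecutive repeats), not the paper's semantic activity clustering method.

```python
import math
from typing import List, Optional, Sequence, Tuple

XY = Tuple[float, float]  # projected coordinates in meters (assumption)
Zone = Tuple[str, Sequence[XY]]  # (activity type, polygon vertices)


def point_in_polygon(p: XY, poly: Sequence[XY]) -> bool:
    """Ray-casting test: True if p lies inside the polygon."""
    x, y = p
    inside = False
    n = len(poly)
    for i in range(n):
        (x1, y1), (x2, y2) = poly[i], poly[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal ray through p
            if x < x1 + (y - y1) * (x2 - x1) / (y2 - y1):
                inside = not inside
    return inside


def dist_point_segment(p: XY, a: XY, b: XY) -> float:
    """Euclidean distance from p to the segment ab."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    len2 = dx * dx + dy * dy
    t = 0.0 if len2 == 0 else max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / len2))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def dist_to_polygon(p: XY, poly: Sequence[XY]) -> float:
    """0 if p is inside the polygon, else the distance to its nearest edge."""
    if point_in_polygon(p, poly):
        return 0.0
    n = len(poly)
    return min(dist_point_segment(p, poly[i], poly[(i + 1) % n]) for i in range(n))


def attach_zone(p: XY, zones: List[Zone], radius: float) -> Optional[str]:
    """Activity type of the zone that contains p or overlaps its buffer;
    ties go to the nearest zone (hypothetical rule). None if no zone qualifies."""
    best: Optional[Tuple[float, str]] = None
    for activity, poly in zones:
        d = dist_to_polygon(p, poly)
        if d <= radius and (best is None or d < best[0]):
            best = (d, activity)
    return None if best is None else best[1]


def activity_sequence(points: List[Tuple[float, XY]], zones: List[Zone], radius: float) -> List[str]:
    """Order (timestamp, point) pairs by time, label each point, and collapse
    consecutive repeats into a chronological activity sequence."""
    seq: List[str] = []
    for _, p in sorted(points, key=lambda tp: tp[0]):
        act = attach_zone(p, zones, radius)
        if act is not None and (not seq or seq[-1] != act):
            seq.append(act)
    return seq


zones = [
    ("dwelling", [(0, 0), (100, 0), (100, 100), (0, 100)]),
    ("work", [(500, 0), (600, 0), (600, 100), (500, 100)]),
]
points = [(3.0, (550, 50)), (1.0, (50, 50)), (2.0, (120, 50)), (4.0, (300, 50)), (5.0, (20, 20))]
print(activity_sequence(points, zones, radius=30.0))  # ['dwelling', 'work', 'dwelling']
```

In the usage line, the point at (120, 50) lies outside the dwelling zone but within 30 m of it, so it is attached through its buffer; the point at (300, 50) is too far from both zones and is left unlabeled.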


Classifying urban land use by integrating remote sensing and social media data

International Journal of Geographical Information Science, 2017, Vol 31 (8), pp. 1675-1696

DOI: 10.1080/13658816.2017.1324976

Cited By ~ 85

Author(s): Xiaoping Liu, Jialv He, Yao Yao, Jinbao Zhang, Haolin Liang, ...

Keyword(s): Remote Sensing; Social Media; Land Use; Urban Land; Urban Land Use; Social Media Data; Media Data


Urban Land Use and Land Cover Classification Using Multisource Remote Sensing Images and Social Media Data

Remote Sensing, 2019, Vol 11 (22), pp. 2719

DOI: 10.3390/rs11222719

Cited By ~ 4

Author(s): Shi, Qi, Liu, Niu, Zhang

Keyword(s): Remote Sensing; Social Media; Land Use; Land Cover; Commercial Buildings; Landsat 8; Remote Sensing Images; Social Media Data; Urban Village; Media Data

Land use and land cover (LULC) are diverse and complex in urban areas. Remotely sensed images are commonly used for land cover classification but can hardly identify urban land use and functional areas because of the semantic gap (i.e., different definitions of similar or identical buildings). Social media data, the “marks” left by people using mobile phones, have great potential to overcome this semantic gap. Multisource remote sensing data are also expected to be useful in distinguishing different LULC types. This study examined the capability of combining multisource remote sensing images and social media data for urban LULC classification. The multisource remote sensing images included a Chinese ZiYuan-3 (ZY-3) high-resolution image, a Landsat 8 Operational Land Imager (OLI) multispectral image, and a Sentinel-1A synthetic aperture radar (SAR) image. The social media data consisted of the hourly spatial distribution of users of WeChat, a ubiquitous messaging and payment platform in China. LULC was classified into 10 types, namely, vegetation, bare land, road, water, urban village, greenhouses, and residential, commercial, industrial, and educational buildings. A method integrating object-based image analysis, decision trees, and random forests was used for LULC classification. The overall accuracy and kappa value attained by combining the multisource remote sensing images and WeChat data were 87.55% and 0.84, respectively. They further improved to 91.55% and 0.89, respectively, when textural and spatial features extracted from the ZY-3 image were integrated. The ZY-3 high-resolution image was essential for urban LULC classification because it is necessary for accurately delineating land parcels. The addition of Landsat 8 OLI, Sentinel-1A SAR, or WeChat data also made an irreplaceable contribution to the classification of different LULC types.
The Landsat 8 OLI image helped distinguish among urban villages, residential buildings, commercial buildings, and roads, while the Sentinel-1A SAR data reduced the confusion among commercial buildings, greenhouses, and water. By capturing the spatial and temporal dynamics of population density, the WeChat data improved the classification accuracies of urban villages, greenhouses, and commercial buildings.
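The results above are reported as overall accuracy and kappa. For reference, both can be computed from a confusion matrix as in the sketch below, taking kappa to be Cohen's kappa, (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e the agreement expected by chance. The 2x2 matrix in the usage line is made up for illustration and is not data from the study; rows are taken as reference classes and columns as predicted classes.

```python
from typing import Sequence


def overall_accuracy(cm: Sequence[Sequence[int]]) -> float:
    """Fraction of samples on the diagonal of a square confusion matrix."""
    total = sum(sum(row) for row in cm)
    return sum(cm[i][i] for i in range(len(cm))) / total


def cohen_kappa(cm: Sequence[Sequence[int]]) -> float:
    """Cohen's kappa = (p_o - p_e) / (1 - p_e), where p_e is the agreement
    expected by chance from the reference (row) and predicted (column) totals."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    p_o = overall_accuracy(cm)
    row_totals = [sum(cm[i]) for i in range(n)]
    col_totals = [sum(cm[i][j] for i in range(n)) for j in range(n)]
    p_e = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    return (p_o - p_e) / (1 - p_e)


cm = [[45, 5], [10, 40]]  # made-up example, not data from the study
print(overall_accuracy(cm), cohen_kappa(cm))
```

For this made-up matrix, p_o = 85/100 = 0.85 and p_e = (50 × 55 + 50 × 45) / 100² = 0.5, so kappa = (0.85 - 0.5) / (1 - 0.5) = 0.7. Kappa is lower than overall accuracy because it discounts the agreement a random labeling with the same class totals would achieve.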


Social media data as a proxy for hourly fine-scale electric power consumption estimation

Environment and Planning A: Economy and Space, 2018, Vol 50 (8), pp. 1553-1557

DOI: 10.1177/0308518x18786250

Cited By ~ 2

Author(s): Chengbin Deng, Weiying Lin, Xinyue Ye, Zhenlong Li, Ziang Zhang, ...

Keyword(s): Social Media; Power Consumption; Electric Power; Human Activity; Electric Power Consumption; Fine Scale; Human Beings; Social Media Data; Building Level; Media Data

Accurate forecasting of electric demand is essential for operating modern power systems. Inaccurate load forecasting considerably affects power grid efficiency. Forecasting the electric demand of a small area, such as a building, has long been a well-known challenge. In this research, we examined the association between geotagged tweets and hourly electric consumption at a fine scale. All available geotagged tweets and electric meter readings were retrieved and spatially aggregated to each building in the study area. Compared with traditional studies, this study uses geotagged tweets to reflect human activity dynamics to some degree, treating human beings as sensors, which allows the analysis to be conducted at the building level. A high correlation was found between the human activity indicator and power consumption, with a correlation coefficient above 0.8. To the best of our knowledge, few studies have focused on hourly electric power consumption using social media data, especially at such a fine scale. This research shows the great potential of using Twitter data as a proxy for human activities to model hourly electric power consumption at the building level. Further studies are warranted to examine the effectiveness of the proposed method.
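The abstract does not specify which correlation coefficient was used or how tweets were aggregated over time. Assuming Pearson's r and hourly tweet counts as the human activity indicator, the analysis for one building might look like the sketch below. Tweets are assumed to be already matched to building footprints and are represented here as hypothetical `(building_id, hour_of_day)` pairs; the load series is synthetic.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def hourly_tweet_counts(tweets: Sequence[Tuple[str, int]], building_id: str) -> List[int]:
    """Count tweets in each hour of the day (0-23) for one building.
    Each tweet is a hypothetical (building_id, hour_of_day) pair produced by an
    earlier spatial join of tweet points to building footprints (not shown)."""
    counts = Counter(hour for b, hour in tweets if b == building_id)
    return [counts[h] for h in range(24)]


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two series of equal length >= 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / math.sqrt(sxx * syy)


# Toy data: tweet activity and a synthetic hourly load for one building.
tweets = [("B1", h) for h in range(24) for _ in range(h % 5)]
activity = hourly_tweet_counts(tweets, "B1")
load_kwh = [100.0 + 10.0 * c for c in activity]  # synthetic, perfectly linear in activity
print(pearson_r(activity, load_kwh))  # close to 1.0 for this perfectly linear toy data
```

With real meter readings the relationship would not be perfectly linear; the reported coefficient above 0.8 indicates a strong but imperfect association.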
