Urban Region Function Mining Service Based on Social Media Text Analysis

Author(s):  
Yanchun Sun ◽  
Hang Yin ◽  
Jiu Wen ◽  
Zhiyu Sun

Urban region functions are the types of potential activities in an urban region, such as residence, commerce, transportation, and entertainment. A service that mines urban region functions is of great value for applications such as urban planning and transportation management. Many studies have sought to identify the functions of different regions, but few are based on social media text analysis. Considering that the semantic information embedded in social media texts is very useful for inferring an urban region’s main functions, we design a service that extracts human activities from Sina Weibo posts with location information (www.weibo.com; the largest Chinese microblog system, similar to Twitter) and then describes a region’s main functions with a function vector based on those activities. First, we predefine a variety of human activities and use an urban function classification model to obtain the activities corresponding to each Weibo post. Second, we generate function vectors for urban regions, which make higher-level tasks such as similar place recommendation straightforward. Finally, using the generated function vectors, we develop a Web application for querying urban region functions. We also conduct a case study of urban regions in Beijing, and the experimental results demonstrate the feasibility of our method.
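The function-vector idea in this abstract can be illustrated with a minimal sketch. It assumes each post has already been assigned one activity label by some classifier; the four category names come from the abstract's examples, while the function names, the normalization by post count, and the use of cosine similarity for similar place recommendation are illustrative assumptions, not the authors' published method.

```python
from collections import Counter
from math import sqrt

# Assumed category list: the four example functions named in the abstract.
FUNCTIONS = ["residence", "commerce", "transportation", "entertainment"]

def function_vector(post_labels):
    """Turn the activity labels of a region's posts into a normalized vector."""
    counts = Counter(post_labels)
    total = sum(counts[f] for f in FUNCTIONS)
    if total == 0:
        return [0.0] * len(FUNCTIONS)
    return [counts[f] / total for f in FUNCTIONS]

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def most_similar(target, regions):
    """Return the name of the region whose vector is closest to the target's."""
    target_vec = regions[target]
    candidates = [
        (cosine_similarity(target_vec, vec), name)
        for name, vec in regions.items()
        if name != target
    ]
    return max(candidates)[1]

# Made-up labels for three hypothetical regions (not data from the study).
regions = {
    "A": function_vector(["commerce", "commerce", "entertainment"]),
    "B": function_vector(["commerce", "entertainment", "commerce", "commerce"]),
    "C": function_vector(["residence", "residence", "transportation"]),
}
recommended = most_similar("A", regions)
```

With these made-up labels, regions A and B are both dominated by commerce, so B is the recommendation for A, while C shares no activity with A.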

2019 ◽  
pp. 016555151988860
Author(s):  
Salim Afra ◽  
Reda Alhajj

Extracting criminals’ information and discovering their networks are techniques that investigators often rely on to obtain additional information about criminal incidents and potential criminals. With the recent advances known as Web 2.0, the Web has become a rich source of data offering a variety of information sources. In this article, we propose an integrated framework that combines a variety of available components and draws on different Web sources to gain better knowledge about criminals or terrorists (we use "criminals" to cover terrorists as well in the rest of this article). Our system extracts criminals’ information and their corresponding network from Web sources such as online newspapers, official reports, and social media. It uses text analysis to identify key persons and topics in crawled Web documents. We build a criminal graph from the analysed text based on the co-occurrence of mentions of criminals. Further analysis is applied to the constructed graph to identify key people, hidden relationships and interactions between criminals, and hierarchical criminal groups within a network. For every process in the framework, we analysed the available works and implementations that could be used in that process. While analysing social media posts, we identified several challenges that indicate which solutions could be used for that purpose. Finally, we provide a Web application that implements the proposed framework and shows how helpful and efficient the system is in extracting and analysing criminal information.
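The co-occurrence graph described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions: plain substring matching against a known name list stands in for the text analysis step, document-level co-occurrence defines the edges, and weighted degree stands in for the "key people" analysis. None of these choices is confirmed by the abstract, and all names and documents are placeholders.

```python
from collections import defaultdict
from itertools import combinations

def build_cooccurrence_graph(documents, known_names):
    """Weighted undirected graph: edge weight = documents mentioning both names."""
    graph = defaultdict(lambda: defaultdict(int))
    for text in documents:
        # Assumption: substring matching replaces real named-entity recognition.
        mentioned = sorted(name for name in known_names if name in text)
        for a, b in combinations(mentioned, 2):
            graph[a][b] += 1
            graph[b][a] += 1
    return graph

def rank_by_weighted_degree(graph):
    """Order names by total edge weight, breaking ties alphabetically."""
    scores = {name: sum(neighbors.values()) for name, neighbors in graph.items()}
    return sorted(scores, key=lambda name: (-scores[name], name))

# Placeholder documents and names, invented for illustration only.
documents = [
    "Person A met Person B.",
    "Person A and Person C were seen together.",
    "Person B was arrested.",
]
names = ["Person A", "Person B", "Person C"]

graph = build_cooccurrence_graph(documents, names)
ranking = rank_by_weighted_degree(graph)
```

Here "Person A" co-occurs with both other names and therefore ranks first; a document that mentions only one name adds no edge.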


Author(s):  
Chao Ye ◽  
Fan Zhang ◽  
Lan Mu ◽  
Yong Gao ◽  
Yu Liu

Recognizing urban functions is crucial for understanding urban spatial structures and for urban planning. Previous work has investigated urban functions based on human activities derived from mobile phone positioning data, check-in data, taxi data, etc. However, urban functions can only be comprehensively sensed from human activities and the physical environment together. To do so, a deep learning method was proposed to predict urban functions by integrating social media data and street-level imagery. The verbs extracted from social media posts were taken as the proxy for human activities, and urban physical environmental information was identified from street-level imagery. Urban functions were then uncovered both from the verbs, in terms of human activities, and from street-level imagery, from the perspective of the physical environment. Within the 5th Ring Road of Beijing, China, twelve types of urban function were recognized from verbs in social media posts, and the recognition was then improved by integrating street-level imagery. The experiment demonstrated that verbs, as direct proxies for human activities, can avoid noise, and that multi-source data integration eliminated biases caused by a single data source. This work provides a comprehensive understanding of urban structure and dynamics for urban management and planning.


Author(s):  
Shashi Shekhar ◽  
Hitendra Garg ◽  
Rohit Agrawal ◽  
Shivendra Shivani ◽  
Bhisham Sharma

The paper describes the use of a self-learning Hierarchical LSTM (HLSTM) technique for classifying hate and trolling content in code-mixed social media data. Hierarchical LSTM-based learning is a novel learning architecture inspired by neural learning models. The proposed HLSTM model is trained to identify the hate and trolling words found in social media content. The model is equipped with a self-learning and predicting mechanism for annotating hate words in the transliteration domain. The Hindi–English data are categorized with Hindi, English, and hate labels for classification. Word-embedding and character-embedding features are used to represent words in a sentence in order to detect hate words. The method built on the HLSTM model helps recognize the context of a hate word by mining the user's intention in using that word in the sentence. Extensive experiments suggest that the HLSTM-based classification model achieves an accuracy of 97.49% when compared against standard models such as BLSTM, CRF, LR, SVM, Random Forest, and Decision Tree, especially when the social media data contain hate and trolling words.


Author(s):  
Dominik Wawrzuta ◽  
Mariusz Jaworski ◽  
Joanna Gotlib ◽  
Mariusz Panczyk

Author(s):  
Edward Ombui ◽  
Lawrence Muchemi ◽  
Peter Wagacha

This study uses natural language processing to identify hate speech in codeswitched social media text. It trains nine models and tests how well they predict hate speech in a 50k human-annotated dataset. The article proposes a novel hierarchical approach that leverages Latent Dirichlet Allocation to develop topic models that help build a high-level psychosocial feature set we call PDC. PDC organizes words into word families, which helps capture codeswitching during preprocessing for supervised learning models. Informed by the duplex theory of hate, the PDC features are based on a hate speech annotation framework. Frequency-based models employing the PDC features on tweets from the 2012 and 2017 Kenyan presidential elections yielded an f-score of 83 percent (precision: 81 percent, recall: 85 percent) in recognizing hate speech. The study is notable for two reasons. First, it publicly releases a rich codeswitched dataset for comparative studies. Second, it describes how to create a novel PDC feature set to detect subtle types of hate speech hidden in codeswitched data that previous approaches could not detect.
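The reported scores can be checked against each other with a short calculation. The sketch below assumes that the abstract's "f-score" is the balanced F1 score, that is, the harmonic mean of precision and recall; the abstract does not state which F-measure was used.

```python
def f_score(precision, recall):
    """Balanced F1 score: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Precision and recall as reported in the abstract (81 and 85 percent).
reported = f_score(0.81, 0.85)
rounded_percent = round(reported * 100)
```

Under that assumption, a precision of 81 percent and a recall of 85 percent give roughly 0.8295, which rounds to the 83 percent stated in the abstract.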

