Spatio-Temporal Machine Learning Analysis of Social Media Data and Refugee Movement Statistics

Clemens Havas; Lorenz Wendlinger; Julian Stier; Sahib Julka; Veronika Krieger; Cornelia Ferner; Andreas Petutschnig; Michael Granitzer; Stefan Wegenkittl; Bernd Resch

doi:10.3390/ijgi10080498

ISPRS International Journal of Geo-Information, 2021, Vol 10 (8), pp. 498

Keyword(s): Machine Learning; Social Media; Public Authorities; Social Media Data; Spatial Features; Syrian Civil War; Information Gap; The Status; Spatio Temporal; Media Data

In 2015, within the timespan of only a few months, more than a million people made their way from Turkey to Central Europe in the wake of the Syrian civil war. At the time, public authorities and relief organisations struggled with the admission, transfer, care, and accommodation of refugees because of the information gap about ongoing refugee movements. We therefore propose an approach that uses machine learning methods and publicly available data to provide more information about refugee movements. The approach combines methods for analysing the textual, temporal and spatial features of social media data with the numbers of arriving refugees recorded in historical refugee movement statistics, in order to provide relevant, up-to-date information about refugee movements and expected numbers. The results include spatial patterns and factual information about collective refugee movements, extracted from social media data, that match actual movement patterns. Furthermore, our approach enables us to forecast and simulate refugee movements, both to predict an increase or decrease in the number of incoming refugees and to analyse potential future scenarios. We demonstrate that the approach proposed in this article benefits refugee management and vastly improves on the status quo.

Hybrid features prediction model of movie quality using Multi-machine learning techniques for effective business resource planning

Journal of Intelligent & Fuzzy Systems, 2021, Vol 40 (5), pp. 9361-9382
doi:10.3233/jifs-201844
Cited by ~1

Author(s): Naeem Iqbal; Rashid Ahmad; Faisal Jamil; Do-Hyeun Kim

Keyword(s): Machine Learning; Social Media; Resource Planning; Experimental Results; Quality Prediction; Classification Models; Hybrid Features; Social Media Data; Media Data

Quality prediction plays an essential role in the business outcome of a product, and because of its commercial interest it has been studied extensively in recent years. With advances in machine learning (ML) techniques and the advent of robust, sophisticated ML algorithms, it has become necessary to analyze the factors that influence the success of movies. This paper presents a hybrid features prediction model based on pre-release and social media data features, using multiple ML techniques to predict the quality of movies before release for effective business resource planning. The study integrates pre-release and social media data features to form a hybrid features-based movie quality prediction (MQP) model. The proposed model comprises two experimental models: (i) predicting movie quality using the original set of features, and (ii) predicting the movie success class using a subset of features derived with principal component analysis. This work implements several ML-based classification models, namely Decision Tree (DT), Support Vector Machines with linear and quadratic kernels (L-SVM and Q-SVM), Logistic Regression (LR), Bagged Tree (BT) and Boosted Tree (BOT), to predict movie quality. The models are evaluated using Accuracy (AC), Precision (PR), Recall (RE) and F-Measure (FM). The experimental results show that the BT and BOT classifiers achieved higher accuracy than the other classifiers (DT, LR, L-SVM and Q-SVM), reaching 90.1% and 89.7%, respectively, which demonstrates the effectiveness of the proposed MQP model compared with other state-of-the-art techniques. The proposed work is also compared with existing prediction models, and the experimental results indicate that the proposed MQP model performs slightly better than those models. The results will help the movie industry plan business resources, such as investment, number of screens and release date, more effectively.
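The four evaluation measures named above (accuracy, precision, recall and F-measure) have standard definitions when one class is treated as positive. The following minimal Python sketch spells those definitions out, assuming a binary success/failure label; the function name `evaluation_measures` and the toy labels are hypothetical, and the paper's multi-class setup may average these measures across classes differently.

```python
from typing import Dict, Sequence


def evaluation_measures(y_true: Sequence[int], y_pred: Sequence[int],
                        positive: int = 1) -> Dict[str, float]:
    """Accuracy, precision, recall and F-measure with `positive` as the target class."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F-measure is the harmonic mean of precision and recall.
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return {"AC": accuracy, "PR": precision, "RE": recall, "FM": f_measure}


if __name__ == "__main__":
    # Hypothetical labels: 1 = successful movie, 0 = unsuccessful movie.
    truth = [1, 1, 0, 0, 1, 0, 1, 0]
    preds = [1, 0, 0, 1, 1, 0, 1, 0]
    print(evaluation_measures(truth, preds))  # all four measures equal 0.75 here
```

With three true positives, one false positive and one false negative out of eight cases, all four measures coincide at 0.75; on imbalanced data they typically diverge, which is why studies report them together.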

Real-time spatio-temporal event detection on geotagged social media

Journal of Big Data, 2021, Vol 8 (1)
doi:10.1186/s40537-021-00482-2

Author(s): Yasmeen George; Shanika Karunasekera; Aaron Harwood; Kwan Hui Lim

Keyword(s): New York; Social Media; Event Detection; Detection System; Time and Space; Social Media Data; Event Time; Spatio Temporal; Geographical Space; Media Data

A key challenge in mining social media data streams is to identify events that are actively discussed by a group of people in a specific local or global area. Such events are useful for early warning of accidents, protests, elections or breaking news. However, neither the list of events nor the resolution of event time and space is fixed or known beforehand. In this work, we propose an online spatio-temporal event detection system for social media that can detect events at different time and space resolutions. First, to address the unknown spatial resolution of events, a quad-tree method splits the geographical space into multiscale regions based on the density of social media data. Then, a statistical unsupervised approach based on the Poisson distribution and a smoothing method highlights regions with an unexpected density of social posts. Next, event duration is estimated precisely by merging events that occur in the same region in consecutive time intervals. A post-processing stage filters out events that are spam, fake or wrong. Finally, we incorporate simple semantics by using social media entities to assess the integrity and accuracy of detected events. The proposed method is evaluated on Twitter and Flickr datasets for four cities: Melbourne, London, Paris and New York. To verify its effectiveness, we compare our results with two baseline algorithms, one based on a fixed split of the geographical space and one based on clustering. For performance evaluation, we manually compute recall and precision. We also propose a new quality measure, named strength index, which automatically measures how accurate a reported event is.
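The first two stages described above, a density-driven quad-tree split followed by a Poisson test for regions with unexpectedly many posts, can be illustrated with a short Python sketch. It is not the authors' implementation: the `capacity` threshold, the significance level `alpha`, the use of a region's historical mean count as the Poisson rate, and all names (`QuadNode`, `split`, `poisson_sf`, `flag_bursts`) are hypothetical, and the smoothing, duration-merging and post-processing stages are omitted.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (x, y), e.g. projected coordinates of a geotagged post


@dataclass
class QuadNode:
    """Axis-aligned quad-tree cell; bounds are half-open [x0, x1) x [y0, y1)."""
    x0: float
    y0: float
    x1: float
    y1: float
    children: List["QuadNode"] = field(default_factory=list)

    def contains(self, p: Point) -> bool:
        return self.x0 <= p[0] < self.x1 and self.y0 <= p[1] < self.y1


def split(node: QuadNode, points: List[Point], capacity: int,
          depth: int = 0, max_depth: int = 12) -> List[QuadNode]:
    """Split any cell holding more than `capacity` posts into four quadrants.

    Dense areas end up covered by small cells and sparse areas by large ones,
    giving multiscale regions. Returns the leaf cells.
    """
    inside = [p for p in points if node.contains(p)]
    if len(inside) <= capacity or depth >= max_depth:
        return [node]
    mx, my = (node.x0 + node.x1) / 2, (node.y0 + node.y1) / 2
    node.children = [
        QuadNode(node.x0, node.y0, mx, my), QuadNode(mx, node.y0, node.x1, my),
        QuadNode(node.x0, my, mx, node.y1), QuadNode(mx, my, node.x1, node.y1),
    ]
    leaves: List[QuadNode] = []
    for child in node.children:
        leaves.extend(split(child, inside, capacity, depth + 1, max_depth))
    return leaves


def poisson_sf(k: int, lam: float) -> float:
    """Upper-tail probability P(X >= k) for X ~ Poisson(lam), summed in log space."""
    if k <= 0:
        return 1.0
    if lam <= 0:
        return 0.0
    cdf = sum(math.exp(-lam + i * math.log(lam) - math.lgamma(i + 1)) for i in range(k))
    return max(0.0, 1.0 - cdf)


def flag_bursts(counts: Dict[str, int], baseline: Dict[str, float],
                alpha: float = 0.01) -> List[str]:
    """Return regions whose post count in the current interval is improbably high.

    Assumption: the expected count of a region is its historical mean count per
    interval. Regions without a baseline are skipped.
    """
    return [region for region, k in counts.items()
            if region in baseline and poisson_sf(k, baseline[region]) < alpha]


if __name__ == "__main__":
    # A regular 10 x 10 grid of posts; capacity 20 forces two levels of splitting.
    grid = [(x + 0.5, y + 0.5) for x in range(10) for y in range(10)]
    regions = split(QuadNode(0.0, 0.0, 10.0, 10.0), grid, capacity=20)
    print(len(regions), "regions")  # 16 regions
    # Region "A" usually sees about 8 posts per interval but now has 30; "B" is ordinary.
    print(flag_bursts({"A": 30, "B": 6}, {"A": 8.0, "B": 5.0}))  # ['A']
```

The tail probability is summed in log space so that large expected counts do not underflow `math.exp(-lam)` to zero; merging flagged regions across consecutive intervals would be the natural next step toward estimating event duration.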

Predicting ethnicity with data on personal names in Russia

2021
doi:10.31235/osf.io/wf6p4

Author(s): Alexey Bessudnov; Denis Tarasov; Viacheslav Panasovets; Veronica Kostenko; Ivan Smirnov; ...

Keyword(s): Machine Learning; Social Media; Ethnic Groups; Geographical Location; Ethnic Relations; Social Media Data; Personal Names; Learning Classifier; Media Data

In this paper we develop a machine learning classifier that predicts perceived ethnicity from data on personal names for the major ethnic groups living in Russia. We collect data from VK, the largest Russian social media website. Ethnicity was determined from the languages spoken by users and their geographical location, and the data were manually cleaned by crowd workers. The classifier achieves an accuracy of 0.82 for a scheme with 24 ethnic groups and 0.92 for 15 aggregated ethnic groups. It can be used for research on ethnicity and ethnic relations in Russia, in particular with VK and other social media data.

Sentiment Analysis in Social Media using Machine Learning Techniques

Iraqi Journal of Science, 2020, pp. 193-201
doi:10.24996/ijs.2020.61.1.22
Cited by ~1

Author(s): Hayder A. Alatabi; Ayad R. Abbas

Keyword(s): Machine Learning; Social Media; Sentiment Analysis; Machine Learning Techniques; Great Success; Social Media Data; Learning Techniques; The World; Analysis System; Media Data

In recent years, social media has come into widespread use worldwide; statistics indicate that more than three billion people use social media, producing large quantities of online data. To analyze these data, a classification method known as sentiment analysis is used. This paper presents a new sentiment analysis system based on machine learning techniques that extracts polarity from social media texts. Sentiment analysis based on machine learning techniques has achieved considerable success around the world. This paper investigates the topic and proposes a sentiment analysis system built on the Bayesian Rough Decision Tree (BRDT) algorithm. The experimental results show that the system achieves an accuracy of more than 95% on social media data.

Communication Sentiment Analyzer using Machine Learning with Naive Bayes BernoulliNB

International Journal of Engineering and Advanced Technology - Regular Issue, 2019, Vol 9 (1), pp. 5976-5979
doi:10.35940/ijeat.a1610.109119

Keyword(s): Machine Learning; Social Media; Major Part; Naive Bayes; User Preferences; Social Media Data; Machine Learning Model; The World; Media Data

In the current social media era, it is estimated that over 5 billion people use smartphones, of whom over 1.5 billion are active users worldwide. Most of us are among them, and before opening a message we are curious about what we have received and hope that it is good news. Sentiment analysis of social media data has therefore been seen by many as an effective tool for monitoring user preferences and inclinations. We propose a scalable machine learning model that analyzes the polarity of a communicative text using a Bernoulli Naive Bayes classifier. The paper considers only two polarities: whether a sentence is positive or negative. The Bernoulli classifier is used because it is best suited to binary inputs, and the model reaches an accuracy of up to 97%.
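The abstract names the classifier but not its details, so the following is a generic from-scratch Python sketch of a Bernoulli Naive Bayes polarity classifier, assuming each message is reduced to a set of binary word-presence features and Laplace smoothing (`alpha`) is applied. The class and helper names and the toy messages are hypothetical, not the paper's code or data; scikit-learn's `BernoulliNB` class implements the same model for practical use.

```python
import math
from collections import Counter
from typing import Dict, Iterable, List, Set


def features(text: str) -> Set[str]:
    """Binary features: the set of lower-cased words present in a message."""
    return set(text.lower().split())


class BernoulliNaiveBayes:
    """Bernoulli Naive Bayes over word-presence features with Laplace smoothing."""

    def __init__(self, alpha: float = 1.0) -> None:
        self.alpha = alpha
        self.vocab: Set[str] = set()
        self.log_prior: Dict[str, float] = {}
        self.p_present: Dict[str, Dict[str, float]] = {}

    def fit(self, texts: Iterable[str], labels: Iterable[str]) -> "BernoulliNaiveBayes":
        docs: List[Set[str]] = [features(t) for t in texts]
        ys: List[str] = list(labels)
        self.vocab = set().union(*docs)
        for c, n_c in Counter(ys).items():
            self.log_prior[c] = math.log(n_c / len(ys))
            # Number of class-c messages that contain each word.
            doc_freq = Counter(w for d, y in zip(docs, ys) if y == c for w in d)
            # Smoothed P(word present | class); never exactly 0 or 1.
            self.p_present[c] = {
                w: (doc_freq[w] + self.alpha) / (n_c + 2 * self.alpha) for w in self.vocab
            }
        return self

    def log_scores(self, text: str) -> Dict[str, float]:
        present = features(text) & self.vocab  # words unseen in training are ignored
        scores: Dict[str, float] = {}
        for c, prior in self.log_prior.items():
            score = prior
            for w, p in self.p_present[c].items():
                # Unlike multinomial NB, absent words also contribute evidence.
                score += math.log(p) if w in present else math.log(1.0 - p)
            scores[c] = score
        return scores

    def predict(self, text: str) -> str:
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


if __name__ == "__main__":
    # Hypothetical toy messages, not data from the paper.
    train = ["great news see you soon", "happy birthday my friend",
             "sorry bad news", "i hate this terrible day"]
    labels = ["positive", "positive", "negative", "negative"]
    model = BernoulliNaiveBayes().fit(train, labels)
    print(model.predict("happy birthday friend"))  # positive
    print(model.predict("bad terrible day"))        # negative
```

The design choice that distinguishes the Bernoulli variant is visible in `log_scores`: every vocabulary word contributes either log p or log(1 - p), so the absence of a typically positive word counts as evidence too, which suits short messages with binary inputs.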

An unsupervised machine learning model for discovering latent infectious diseases using social media data

Journal of Biomedical Informatics, 2017, Vol 66, pp. 82-94
doi:10.1016/j.jbi.2016.12.007
Cited by ~43

Author(s): Sunghoon Lim; Conrad S. Tucker; Soundar Kumara

Keyword(s): Machine Learning; Social Media; Infectious Diseases; Learning Model; Unsupervised Machine Learning; Social Media Data; Machine Learning Model; Media Data

Analysis of Social Media Data to Classify and Detect Frequent Issues Using Machine Learning Approach

2020 2nd International Conference on Advanced Information and Communication Technology (ICAICT), 2020
doi:10.1109/icaict51780.2020.9333452

Author(s): Pankaj Bhowmik; Md. Sohrawordi; U.A. Md. Ehsan Ali; Md. Najmul Hasan; Prodip Kumar Roy

Keyword(s): Machine Learning; Social Media; Learning Approach; Social Media Data; Machine Learning Approach; Media Data

Extracting and Comparing Places Using Geo-Social Media

ISPRS Annals of Photogrammetry Remote Sensing and Spatial Information Sciences, 2015, Vol II-3/W5, pp. 311-316
doi:10.5194/isprsannals-ii-3-w5-311-2015

Author(s): F. O. Ostermann; H. Huang; G. Andrienko; N. Andrienko; C. Capineri; ...

Keyword(s): Social Media; Semantic Similarity; Data Set; Social Media Data; Temporal Clustering; Depth Study; Data Source; Spatio Temporal; Media Data

The increasing availability of Geo-Social Media (e.g. Facebook, Foursquare and Flickr) has led to the accumulation of large volumes of social media data. These data, especially geotagged data, contain information about the perception of, and experiences in, various environments, and they can be harnessed to provide a better understanding of the semantics of places. We are interested in the similarities and differences in how different Geo-Social Media describe places. This extended abstract presents the results of a first step towards a more in-depth study of the semantic similarity of places. In particular, we took places extracted through spatio-temporal clustering from one data source (Twitter) and examined whether their structure is reflected semantically in another data set (Flickr). On that basis, we analyse how the semantic similarity between places varies over space and scale, and how well Tobler's first law of geography holds with regard to scale and places.
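The abstract does not state which similarity measure was used. As a hedged illustration only, one common way to compare how two sources describe the same place is the cosine similarity between term-frequency vectors built from the posts assigned to that place; the whitespace tokenisation, the measure itself, the function names and the example posts below are assumptions, not the study's method or data.

```python
import math
from collections import Counter
from typing import Iterable


def term_vector(posts: Iterable[str]) -> Counter:
    """Bag-of-words term frequencies over all posts (tweets, photo tags) of one place."""
    return Counter(word for post in posts for word in post.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse term vectors (0.0 if either is empty)."""
    dot = sum(count * b[term] for term, count in a.items())  # missing terms count as 0
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


if __name__ == "__main__":
    # Hypothetical posts for the same place from two sources.
    twitter_place = term_vector(["bridge river walk", "river sunset"])
    flickr_place = term_vector(["sunset over the river", "old bridge"])
    print(round(cosine_similarity(twitter_place, flickr_place), 3))  # 0.617
```

A score near 1 means both sources use similar vocabulary for the place and a score near 0 means little overlap; computing it for pairs of places at different distances or cluster scales is one way to probe the space and scale effects the abstract mentions.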

Assessing Patient-Perceived Hospital Service Quality and Sentiment in Malaysian Public Hospitals Using Machine Learning and Facebook Reviews

International Journal of Environmental Research and Public Health, 2021, Vol 18 (18), pp. 9912
doi:10.3390/ijerph18189912

Author(s): Afiq Izzudin A. Rahim; Mohd Ismail Ibrahim; Kamarul Imran Musa; Sook-Ling Chua; Najib Majdi Yaacob

Keyword(s): Machine Learning; Social Media; Quality Of Care; Service Quality; Hospital Quality; Public Hospitals; Social Media Data; Positive Sentiment; Media Data

Social media is emerging as a new avenue for hospitals and patients to solicit input on the quality of care. However, social media data are unstructured and enormous in volume. Moreover, no empirical research on the use of social media data and perceived hospital quality of care based on patient online reviews has been performed in Malaysia. The purpose of this study was to investigate the determinants of positive sentiment expressed in hospital Facebook reviews in Malaysia, as well as the association between hospital accreditation and the sentiments expressed in Facebook reviews. From 2017 to 2019, we retrieved comments from the official Facebook pages of 48 public hospitals. We used machine learning to build a sentiment analyzer and a service quality (SERVQUAL) classifier that automatically classify sentiment and SERVQUAL dimensions. We used logistic regression analysis to address our objectives. We evaluated a total of 1852 reviews; our machine learning sentiment analyzer classified 72.1% of them as positive and 27.9% as negative. Using our machine learning SERVQUAL classifier, we classified 240 reviews as tangible, 1257 as trustworthy, 125 as responsive, 356 as assurance and 1174 as empathy. After adjusting for hospital characteristics, all SERVQUAL dimensions except tangible were associated with positive sentiment. However, no significant relationship between hospital accreditation and online sentiment was found. Facebook reviews analyzed with machine learning algorithms provide valuable, real-time data that may be missed by traditional hospital quality assessments. Additionally, online patient reviews offer a hitherto untapped indicator of quality that may benefit all healthcare stakeholders. Our results confirm prior studies and support the use of Facebook reviews as an adjunct method for assessing the quality of hospital services in Malaysia.

Suicide Risk and Protective Factors in Online Support Forum Posts: Annotation Scheme Development and Validation Study (Preprint)

2020
doi:10.2196/preprints.24471

Author(s): Stevie Chancellor; Steven A Sumner; Corinne David-Ferdon; Tahirah Ahmad; Munmun De Choudhury

Keyword(s): Public Health; Machine Learning; Social Media; Protective Factors; Suicide Risk; Risk And Protective Factors; Prior Work; Annotation Scheme; Social Media Data; Media Data

BACKGROUND: Online communities provide support for individuals seeking help with suicidal ideation and crisis. Although community data are increasingly used to build machine learning models that infer who might be at risk, there have been few efforts to identify both risk and protective factors in web-based posts. Such annotations can enrich and augment computational assessment approaches to identify appropriate intervention points, which are useful to public health professionals and suicide prevention researchers.

OBJECTIVE: This qualitative study aims to develop a valid and reliable annotation scheme for evaluating risk and protective factors for suicidal ideation in posts in suicide crisis forums.

METHODS: We designed a valid, reliable, and clinically grounded process for identifying risk and protective markers in social media data. The scheme draws on prior work on construct validity and on the social science of measurement. We then applied the scheme to annotate 200 posts from r/SuicideWatch, a Reddit community focused on suicide crisis.

RESULTS: We produced an annotation scheme that is consistent with leading public health information coding schemes for suicide and gives greater attention to protective factors. Our study showed high internal validity, and our results indicate that the approach is consistent with findings from prior work.

CONCLUSIONS: Our work formalizes a framework that incorporates construct validity into the development of annotation schemes for suicide risk on social media. This study furthers the understanding of risk and protective factors expressed in social media data. This may help public health programs aimed at preventing suicide, as well as computational social science research and other investigations that rely on the quality of labels for downstream machine learning tasks.
