A Novel Machine Learning Framework for Comparison of Viral COVID-19–Related Sina Weibo and Twitter Posts: Workflow Development and Content Analysis (Preprint)

BACKGROUND Social media plays a critical role in health communications, especially during global health emergencies such as the current COVID-19 pandemic. However, there is a lack of a universal analytical framework to extract, quantify, and compare content features in public discourse of emerging health issues on different social media platforms across a broad sociocultural spectrum. OBJECTIVE We aimed to develop a novel and universal content feature extraction and analytical framework and contrast how content features differ with sociocultural background in discussions of the emerging COVID-19 global health crisis on major social media platforms. METHODS We sampled the 1000 most shared viral Twitter and Sina Weibo posts regarding COVID-19, developed a comprehensive coding scheme to identify 77 potential features across six major categories (eg, clinical and epidemiological, countermeasures, politics and policy, responses), quantified feature values (0 or 1, indicating whether or not the content feature is mentioned in the post) in each viral post across social media platforms, and performed subsequent comparative analyses. Machine learning dimension reduction and clustering analysis were then applied to harness the power of social media data and provide more unbiased characterization of web-based health communications. RESULTS There were substantially different distributions, prevalence, and associations of content features in public discourse about the COVID-19 pandemic on the two social media platforms. Weibo users were more likely to focus on the disease itself and health aspects, while Twitter users engaged more about policy, politics, and other societal issues. CONCLUSIONS We extracted a rich set of content features from social media data to accurately characterize public discourse related to COVID-19 in different sociocultural backgrounds. In addition, this universal framework can be adopted to analyze social media discussions of other emerging health issues beyond the COVID-19 pandemic.
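
The binary coding described in the methods (each feature scored 0 or 1 per post) lends itself to a simple prevalence comparison between platforms. The following is a minimal sketch under stated assumptions, not the authors' workflow: the feature names, the toy posts, and the helper functions `feature_prevalence` and `prevalence_gap` are all hypothetical and only illustrate how such a comparison could be computed.

```python
from typing import Dict, List

# A coded post maps each content feature to 0 (absent) or 1 (mentioned).
Post = Dict[str, int]


def feature_prevalence(posts: List[Post], features: List[str]) -> Dict[str, float]:
    """Return, for each feature, the share of posts in which it is coded as 1."""
    if not posts:
        raise ValueError("at least one coded post is required")
    n = len(posts)
    return {f: sum(p.get(f, 0) for p in posts) / n for f in features}


def prevalence_gap(a: List[Post], b: List[Post], features: List[str]) -> Dict[str, float]:
    """Return prevalence in sample a minus prevalence in sample b, per feature."""
    pa = feature_prevalence(a, features)
    pb = feature_prevalence(b, features)
    return {f: pa[f] - pb[f] for f in features}


# Invented toy data: two hypothetical features, four posts per platform.
features = ["symptoms", "policy"]
weibo = [
    {"symptoms": 1, "policy": 0},
    {"symptoms": 1, "policy": 0},
    {"symptoms": 0, "policy": 1},
    {"symptoms": 1, "policy": 0},
]
twitter = [
    {"symptoms": 0, "policy": 1},
    {"symptoms": 1, "policy": 1},
    {"symptoms": 0, "policy": 1},
    {"symptoms": 0, "policy": 0},
]

gap = prevalence_gap(weibo, twitter, features)
print(gap)  # positive values: feature is more prevalent in the first sample
```

With real coded data, the same per-feature vectors could feed the dimension reduction and clustering steps the abstract mentions; those steps are not sketched here.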

A Novel Machine Learning Framework for Comparison of Viral COVID-19–Related Sina Weibo and Twitter Posts: Workflow Development and Content Analysis

Journal of Medical Internet Research ◽ 10.2196/24889 ◽ 2021 ◽ Vol 23 (1) ◽ pp. e24889

Author(s): Shi Chen ◽ Lina Zhou ◽ Yunya Song ◽ Qian Xu ◽ Ping Wang ◽ ...

Keyword(s): Machine Learning ◽ Social Media ◽ Public Discourse ◽ Analytical Framework ◽ Health Issues ◽ Social Media Data ◽ Sina Weibo ◽ Social Media Platforms ◽ Media Data ◽ Content Feature

Social Media Data-Based Sentiment Analysis of Tourists’ Air Quality Perceptions

Sustainability ◽ 10.3390/su11185070 ◽ 2019 ◽ Vol 11 (18) ◽ pp. 5070 ◽ Cited By ~ 3

Author(s): Yuguo Tao ◽ Feng Zhang ◽ Chunyun Shi ◽ Yun Chen

Keyword(s): Machine Learning ◽ Social Media ◽ Content Analysis ◽ Air Quality ◽ Sentiment Analysis ◽ Social Media Data ◽ Sina Weibo ◽ Emotion Words ◽ Tourist Destinations ◽ Media Data

Analyzing tourists’ perceptions of air quality is of great significance to the study of tourist experience satisfaction and the image construction of tourism destinations. In this study, using a web crawler, we collected 27,500 comments regarding the air quality of 195 of China’s Class 5A tourist destinations posted by tourists on Sina Weibo from January 2011 to December 2017; these comments were then subjected to a content analysis using the Gooseeker, ROST CM (Content Mining System) and BosonNLP (Natural Language Processing) tools. Based on an analysis of the proportions of sentences with different emotional polarities with ROST EA (Emotion Analysis), we measured the sentiment value of texts using the artificial neural network (ANN) machine learning method implemented through the Python-based Boson platform, which is oriented toward Chinese social media data. The content analysis results indicated that in the adaption stage in Sina Weibo, tourists’ perceptions of air quality were mainly positive and tourists had poor awareness of air pollution crises. Objective emotion words made up a similarly high proportion as subjective emotion words, indicating that taking both into account simultaneously helps to comprehensively understand the emotional content of the comments. The sentiment analysis results showed that for the entire text, sentences with positive emotions accounted for 85.53% of the total comments, with a sentiment value of 0.786, which corresponds to a medium positive level; the temporal “up-down-up” changes and the spatial pattern of high values in the south and low values in the north (with little difference between the east and the west) were basically consistent with reality. A further exploration of the theoretical basis of the semi-supervised ANN approach or the introduction of other machine learning methods using different data sources will help to analyze this phenomenon in greater depth. The paper provides evidence for new data and methods for air quality research in tourist destinations and provides a new tool for air quality monitoring.

Hybrid features prediction model of movie quality using Multi-machine learning techniques for effective business resource planning

Journal of Intelligent & Fuzzy Systems ◽ 10.3233/jifs-201844 ◽ 2021 ◽ Vol 40 (5) ◽ pp. 9361-9382 ◽ Cited By ~ 1

Author(s): Naeem Iqbal ◽ Rashid Ahmad ◽ Faisal Jamil ◽ Do-Hyeun Kim

Keyword(s): Machine Learning ◽ Social Media ◽ Resource Planning ◽ Experimental Results ◽ Quality Prediction ◽ Classification Models ◽ Hybrid Features ◽ Social Media Data ◽ Media Data

Quality prediction plays an essential role in the business outcome of a product. Because of its business relevance, it has been studied extensively in the last few years. With advances in machine learning (ML) techniques and the advent of robust and sophisticated ML algorithms, it has become necessary to analyze the factors influencing the success of movies. This paper presents a hybrid features prediction model based on pre-released and social media data features using multiple ML techniques to predict the quality of pre-released movies for effective business resource planning. This study aims to integrate pre-released and social media data features to form a hybrid features-based movie quality prediction (MQP) model. The proposed model comprises two experimental models: (i) predicting movie quality using the original set of features and (ii) deriving a subset of features with principal component analysis to predict the movie success class. This work employs and implements different ML-based classification models, such as Decision Tree (DT), Support Vector Machines with linear and quadratic kernels (L-SVM and Q-SVM), Logistic Regression (LR), Bagged Tree (BT) and Boosted Tree (BOT), to predict the quality of the movies. Different performance measures are used to evaluate the proposed ML-based classification models, such as Accuracy (AC), Precision (PR), Recall (RE), and F-Measure (FM). The experimental results reveal that the BT and BOT classifiers achieved higher accuracy than the other classifiers, such as DT, LR, L-SVM, and Q-SVM. The BT and BOT classifiers achieved accuracies of 90.1% and 89.7%, respectively, which shows the efficiency of the proposed MQP model compared to other state-of-the-art techniques. The proposed work is also compared with existing prediction models, and experimental results indicate that the proposed MQP model performed slightly better than the other models. The experimental results will help the movie industry to plan business resources effectively, such as investment, number of screens, and release dates.
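
The four performance measures named in the abstract have standard definitions, which can be sketched as follows. This is an illustrative example only: it assumes a binary success label (1 = success, 0 = not), whereas the paper may use more classes, and the function name `binary_metrics` and the toy labels are hypothetical.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute accuracy, precision, recall, and F-measure for 0/1 labels."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("label sequences must be non-empty and of equal length")

    # Confusion-matrix counts, treating 1 as the positive class.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    accuracy = (tp + tn) / len(y_true)
    # Fall back to 0.0 when a denominator is zero (no predicted or actual positives).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f_measure = (
        2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    )
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }


# Invented toy labels: 3 true positives, 3 true negatives, 1 false positive, 1 false negative.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```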

Embed2Detect: temporally clustered embedded words for event detection in social media

Machine Learning ◽ 10.1007/s10994-021-05988-7 ◽ 2021

Author(s): Hansi Hettiarachchi ◽ Mariam Adedoyin-Olowe ◽ Jagdev Bhogal ◽ Mohamed Medhat Gaber

Keyword(s): Social Media ◽ Event Detection ◽ High Volume ◽ Detection Methods ◽ Word Embeddings ◽ Agglomerative Clustering ◽ Data Set ◽ Social Media Data ◽ Social Media Platforms ◽ Media Data

Abstract Social media is becoming a primary medium to discuss what is happening around the world. Therefore, the data generated by social media platforms contain rich information which describes ongoing events. Further, the timeliness of these data can facilitate immediate insights. However, considering the dynamic nature and high volume of data production in social media data streams, it is impractical to filter events manually, and therefore automated event detection mechanisms are invaluable to the community. Apart from a few notable exceptions, most previous research on automated event detection has focused only on statistical and syntactical features in data and has not involved the underlying semantics, which are important for effective information retrieval from text since they represent the connections between words and their meanings. In this paper, we propose a novel method termed Embed2Detect for event detection in social media that combines the characteristics of word embeddings and hierarchical agglomerative clustering. The adoption of word embeddings gives Embed2Detect the capability to incorporate powerful semantic features into event detection and overcome a major limitation inherent in previous approaches. We evaluated our method on two recent real social media data sets, which represent the sports and political domains, and compared the results to several state-of-the-art methods. The obtained results show that Embed2Detect is capable of effective and efficient event detection and that it outperforms recent event detection methods. For the sports data set, Embed2Detect achieved a 27% higher F-measure than the best-performing baseline, and for the political data set, the increase was 29%.
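
To make the idea of clustering words by their embeddings concrete, here is a minimal sketch of hierarchical agglomerative clustering with average linkage over cosine distance. It is not the Embed2Detect implementation: the two-dimensional vectors are invented stand-ins for learned word embeddings, the choice of average linkage and the `threshold` stopping rule are assumptions, and the helper names are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def cosine_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Return 1 - cosine similarity; assumes both vectors are non-zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def agglomerative_clusters(
    vectors: Dict[str, Sequence[float]], threshold: float
) -> List[List[str]]:
    """Merge the closest pair of clusters until the closest pair exceeds threshold."""
    clusters: List[List[str]] = [[word] for word in vectors]

    def average_linkage(c1: List[str], c2: List[str]) -> float:
        # Mean pairwise distance between members of the two clusters.
        total = sum(cosine_distance(vectors[a], vectors[b]) for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        best = None  # (distance, index_i, index_j) of the closest pair so far
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = average_linkage(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        if d > threshold:
            break  # remaining clusters are too far apart to merge
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return [sorted(c) for c in clusters]


# Invented toy "embeddings": two sports-like and two politics-like words.
toy_vectors = {
    "goal": (1.0, 0.1),
    "penalty": (0.9, 0.2),
    "vote": (0.1, 1.0),
    "ballot": (0.2, 0.9),
}
groups = agglomerative_clusters(toy_vectors, threshold=0.2)
print(groups)
```

The pairwise search is quadratic per merge, which is acceptable for a toy vocabulary but would need a more efficient implementation for a real data stream.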

Google Plus as a Contentious Field of Revolutionary Identity

Comparative Sociology ◽ 10.1163/15691330-bja10036 ◽ 2021 ◽ Vol 20 (3) ◽ pp. 402-416

Author(s): Amirhossein Teimouri

Keyword(s): Social Media ◽ Iranian Revolution ◽ Social Media Data ◽ Social Media Platforms ◽ New Generation ◽ Media Data

Abstract Social media platforms have been increasingly reinvigorating extreme movements, especially rightist movements. Utilizing unique Google Plus data, the author shows the rise and fall of the 2015 rightist anti-Nuclear Deal movement in Iran. He argues that the Google Plus platform in 2015 provided the new generation of revolutionary Islamist rightist activists with a contentious space of mobilization, enabling them to develop a new revolutionary rightist identity. This revolutionary identity and its corresponding language and discourse did not fully unfold in Iranian mainstream rightist media, even though rightist groups, compared to liberal groups, are not censored and repressed. The new generation of rightist activists perceived the Nuclear Deal as an existential threat to revolutionary principles of the country, and thus played out their outrage and identity anxieties on Google Plus. The author contends that this online outrage, due to the activists’ identity bond with the regime and the 1979 Iranian Revolution, however, did not translate into any massive offline mobilization against the Nuclear Deal. He also discusses the methodological implications of using social media data, especially the discontinuation of Google Plus.

Predicting ethnicity with data on personal names in Russia

10.31235/osf.io/wf6p4 ◽ 2021

Author(s): Alexey Bessudnov ◽ Denis Tarasov ◽ Viacheslav Panasovets ◽ Veronica Kostenko ◽ Ivan Smirnov ◽ ...

Keyword(s): Machine Learning ◽ Social Media ◽ Ethnic Groups ◽ Geographical Location ◽ Ethnic Relations ◽ Social Media Data ◽ Personal Names ◽ Learning Classifier ◽ Media Data

In this paper we develop a machine learning classifier that predicts perceived ethnicity from data on personal names for major ethnic groups populating Russia. We collect data from VK, the largest Russian social media website. Ethnicity was determined from the languages spoken by users and their geographical location, with the data manually cleaned by crowd workers. The classifier achieves an accuracy of 0.82 for a scheme with 24 ethnic groups and 0.92 for 15 aggregated ethnic groups. It can be used for research on ethnicity and ethnic relations in Russia, in particular with VK and other social media data.

Sentiment Analysis in Social Media using Machine Learning Techniques

Iraqi Journal of Science ◽ 10.24996/ijs.2020.61.1.22 ◽ 2020 ◽ pp. 193-201 ◽ Cited By ~ 1

Author(s): Hayder A. Alatabi ◽ Ayad R. Abbas

Keyword(s): Machine Learning ◽ Social Media ◽ Sentiment Analysis ◽ Machine Learning Techniques ◽ Great Success ◽ Social Media Data ◽ Learning Techniques ◽ The World ◽ Analysis System ◽ Media Data

In recent years, social media has achieved widespread use worldwide; statistics indicate that more than three billion people are on social media, generating large quantities of data online. To analyze these data, a classification method known as sentiment analysis is used. This paper presents a new sentiment analysis system based on machine learning techniques, which aims to extract the polarity of social media texts. Sentiment analysis using machine learning techniques has achieved great success around the world. This paper investigates this topic and proposes a sentiment analysis system built on the Bayesian Rough Decision Tree (BRDT) algorithm. The experimental results show that the system achieves an accuracy of more than 95% on social media data.

Communication Sentiment Analyzer using Machine Learning with Naive Bayes Bernoullinb

International Journal of Engineering and Advanced Technology - Regular Issue ◽ 10.35940/ijeat.a1610.109119 ◽ 2019 ◽ Vol 9 (1) ◽ pp. 5976-5979

Keyword(s): Machine Learning ◽ Social Media ◽ Major Part ◽ Naive Bayes ◽ Naïve Bayes ◽ User Preferences ◽ Social Media Data ◽ Machine Learning Model ◽ The World ◽ Media Data

In this never-ending social media era, it is estimated that over 5 billion people use smartphones. Of these, over 1.5 billion are active users in the world. We are all a major part of this group, and before opening a message we are all curious about what we have received; no doubt, we always hope it is a good one. Sentiment analysis on social media data has therefore been seen by many as an effective tool to monitor user preferences and inclinations. We propose a scalable machine learning model to analyze the polarity of a communicative text using the Bernoulli Naive Bayes classifier. This paper considers only two polarities, that is, whether a sentence is positive or negative. The Bernoulli classifier is used because it is best suited for binary inputs, which in turn raises accuracy to as much as 97%.
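
The Bernoulli variant of Naive Bayes models each vocabulary word as present or absent in a text, which is why it pairs naturally with binary inputs. The sketch below is a from-scratch illustration under stated assumptions, not the paper's model: the class name `BernoulliNaiveBayes`, the Laplace smoothing parameter `alpha`, and the four training messages are hypothetical.

```python
import math
from typing import Dict, List, Set


class BernoulliNaiveBayes:
    """Bernoulli Naive Bayes over word presence/absence with Laplace smoothing."""

    def __init__(self, alpha: float = 1.0) -> None:
        self.alpha = alpha
        self.vocab: List[str] = []
        self.classes: List[str] = []
        self.log_prior: Dict[str, float] = {}
        self.word_prob: Dict[str, Dict[str, float]] = {}

    def fit(self, docs: List[Set[str]], labels: List[str]) -> "BernoulliNaiveBayes":
        if not docs or len(docs) != len(labels):
            raise ValueError("docs and labels must be non-empty and of equal length")
        self.vocab = sorted(set().union(*docs))
        self.classes = sorted(set(labels))
        for c in self.classes:
            class_docs = [d for d, y in zip(docs, labels) if y == c]
            self.log_prior[c] = math.log(len(class_docs) / len(docs))
            # Smoothed estimate of P(word present | class); the "+ 2 * alpha"
            # accounts for the two outcomes (present, absent).
            self.word_prob[c] = {
                w: (sum(w in d for d in class_docs) + self.alpha)
                / (len(class_docs) + 2 * self.alpha)
                for w in self.vocab
            }
        return self

    def predict(self, doc: Set[str]) -> str:
        scores: Dict[str, float] = {}
        for c in self.classes:
            score = self.log_prior[c]
            for w in self.vocab:
                p = self.word_prob[c][w]
                # Absent words also contribute, via log(1 - p).
                score += math.log(p) if w in doc else math.log(1.0 - p)
            scores[c] = score
        return max(scores, key=scores.get)


# Invented toy messages, each reduced to its set of words.
train_docs = [
    {"good", "happy"},
    {"great", "good"},
    {"bad", "sad"},
    {"terrible", "bad"},
]
train_labels = ["positive", "positive", "negative", "negative"]

model = BernoulliNaiveBayes().fit(train_docs, train_labels)
print(model.predict({"good", "great"}))
print(model.predict({"bad"}))
```

Words outside the training vocabulary are ignored at prediction time in this sketch; a production system would need a deliberate policy for them.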

An unsupervised machine learning model for discovering latent infectious diseases using social media data

Journal of Biomedical Informatics ◽ 10.1016/j.jbi.2016.12.007 ◽ 2017 ◽ Vol 66 ◽ pp. 82-94 ◽ Cited By ~ 43

Author(s): Sunghoon Lim ◽ Conrad S. Tucker ◽ Soundar Kumara

Keyword(s): Machine Learning ◽ Social Media ◽ Infectious Diseases ◽ Learning Model ◽ Unsupervised Machine Learning ◽ Social Media Data ◽ Machine Learning Model ◽ Media Data

Review of Data Visualization for Social Media Postings

International Journal of Engineering & Technology ◽ 10.14419/ijet.v7i4.38.27613 ◽ 2018 ◽ Vol 7 (4.38) ◽ pp. 939

Author(s): Nur Atiqah Sia Abdullah ◽ Hamizah Binti Anuar

Keyword(s): Social Media ◽ Data Visualization ◽ Line Graph ◽ Data Types ◽ Social Media Data ◽ Data Analyst ◽ The Social ◽ Social Media Platforms ◽ Types Of Information ◽ Media Data

Facebook and Twitter are the most popular social media platforms among netizens. People now express their opinions, perceptions, and emotions more aggressively through social media platforms. These massive data provide great value for data analysts seeking to understand patterns and emotions related to a certain issue. Mining the data takes technique and time; therefore, data visualization has become a popular way to represent these types of information. This paper aims to review data visualization studies that involved data from social media postings. Past literature used node-link diagrams, node-link trees, directed graphs, line graphs, heatmaps, and stream graphs to represent the data collected from social media platforms. Previous studies are compared by social media data type, representation, and data visualization technique. This paper critically discusses the comparison and suggests suitable data visualizations for the type of social media data at hand.
