Towards Generating Spam Queries for Retrieving Spam Accounts in Large-Scale Twitter Data

Author(s):  
Mahdi Washha ◽  
Aziz Qaroush ◽  
Manel Mezghani ◽  
Florence Sedes
Author(s):  
Usman Naseem ◽  
Imran Razzak ◽  
Matloob Khushi ◽  
Peter W. Eklund ◽  
Jinman Kim

Author(s):  
Fan Zuo ◽  
Abdullah Kurkcu ◽  
Kaan Ozbay ◽  
Jingqin Gao

Emergency events affect human security and safety as well as the integrity of the local infrastructure. Emergency response officials are required to make decisions using limited information and time. During emergency events, people post updates to social media networks, such as tweets, containing information about their status, help requests, incident reports, and other useful information. In this research project, the Latent Dirichlet Allocation (LDA) model is used to automatically classify incident-related tweets and incident types using Twitter data. Unlike the previous social media information models proposed in the related literature, the LDA is an unsupervised learning model which can be utilized directly without prior knowledge and preparation for data in order to save time during emergencies. Twitter data including messages and geolocation information during two recent events in New York City, the Chelsea explosion and Hurricane Sandy, are used as two case studies to test the accuracy of the LDA model for extracting incident-related tweets and labeling them by incident type. Results showed that the model could extract emergency events and classify them for both small and large-scale events, and the model’s hyper-parameters can be shared in a similar language environment to save model training time. Furthermore, the list of keywords generated by the model can be used as prior knowledge for emergency event classification and training of supervised classification models such as support vector machine and recurrent neural network.
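The abstract notes that the keyword lists produced by the topic model can serve as prior knowledge for classifying emergency events. As a minimal sketch of that idea, the snippet below labels a tweet by counting overlaps with per-incident keyword sets. The keyword sets, `INCIDENT_KEYWORDS`, `tokenize`, and `label_tweet` are all hypothetical names and values chosen for illustration; they are not taken from the paper, and simple keyword matching stands in here for the authors' actual classifier.

```python
import re
from collections import Counter

# Hypothetical keyword lists (assumption). In practice these would be the top
# words of the topics that a fitted topic model associates with each incident.
INCIDENT_KEYWORDS = {
    "explosion": {"explosion", "blast", "bomb", "chelsea"},
    "hurricane": {"hurricane", "sandy", "flood", "storm", "outage"},
}


def tokenize(text):
    """Lowercase a tweet and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def label_tweet(text, keywords=INCIDENT_KEYWORDS):
    """Return the incident type with the most keyword hits, or None if no keyword matches."""
    counts = Counter(tokenize(text))
    scores = {
        label: sum(counts[word] for word in words)
        for label, words in keywords.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None
```

A tweet with no keyword hits is treated as not incident-related, which mirrors the two-stage framing in the abstract (extract incident-related tweets, then label them by type).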


2019 ◽  
Vol 11 (9) ◽  
pp. 190 ◽  
Author(s):  
Jamal ◽  
Xianqiao ◽  
Aldabbas

Emotion detection in social media is an effective way to measure people's mood about a specific topic, news item, or product. It has a wide range of applications, including identifying psychological conditions such as anxiety or depression in users. However, it is challenging to distinguish useful emotion features in a large corpus of text because emotions are subjective, with limited, fuzzy boundaries that may be expressed in different terminologies and perceptions. To tackle this issue, this paper presents a hybrid deep learning approach based on TensorFlow with Keras for emotion detection on large-scale imbalanced tweet data. First, preprocessing steps are used to extract useful features from raw tweets and remove noisy data. Second, the entropy weighting method is used to compute the importance of each feature. Third, a class balancer is applied to balance the classes. Fourth, Principal Component Analysis (PCA) is applied to transform highly correlated features into normalized forms. Finally, a TensorFlow-based deep learning algorithm with Keras is proposed to predict high-quality features for emotion classification. The proposed methodology is evaluated on a dataset of 1,600,000 tweets collected from Kaggle. The proposed approach is compared with other state-of-the-art techniques at different training ratios and is reported to outperform them.
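The second step, entropy weighting, can be illustrated with the commonly used entropy weight method. The abstract does not give the authors' exact formulation, so the following is a sketch under that assumption: each feature column is normalized to a distribution, its normalized Shannon entropy is computed, and features with lower entropy (more variation across samples) receive higher weight. The function name `entropy_weights` and the input layout (a list of non-negative feature rows) are illustrative choices.

```python
import math


def entropy_weights(rows):
    """Compute entropy-method weights for the columns of a non-negative matrix.

    rows: list of equal-length lists, one list per sample.
    Returns one weight per column; the weights sum to 1.
    """
    n = len(rows)
    m = len(rows[0])
    diversities = []
    for j in range(m):
        column = [row[j] for row in rows]
        total = sum(column)
        if total == 0 or n < 2:
            # A column with no mass (or a single sample) carries no information.
            diversities.append(0.0)
            continue
        entropy = 0.0
        for x in column:
            if x > 0:  # the term 0 * log(0) is taken as 0
                p = x / total
                entropy -= p * math.log(p)
        entropy /= math.log(n)  # normalize entropy to the range [0, 1]
        diversities.append(1.0 - entropy)
    norm = sum(diversities)
    if norm == 0:
        # All columns are equally uninformative; fall back to uniform weights.
        return [1.0 / m] * m
    return [d / norm for d in diversities]
```

A feature that takes the same value for every sample has maximal entropy and therefore a weight of (approximately) zero, which is the behavior one would want before passing features on to balancing and PCA.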


Author(s):  
Yunwei Zhao ◽  
Can Wang ◽  
Chi-Hung Chi ◽  
Kwok-Yan Lam ◽  
Sen Wang

The availability of massive social media data has enabled the prediction of people’s future behavioral trends at an unprecedentedly large scale. The study of information cascades on Twitter has been an integral part of behavior analysis. A number of methods based on transactional features (such as keyword frequency) and semantic features (such as sentiment) have been proposed to predict future cascading trends. However, an in-depth understanding of the pros and cons of semantic and transactional models is lacking. This paper conducts a comparative study of both approaches in predicting information diffusion with three mechanisms: retweet cascade, URL cascade, and hashtag cascade. Experiments on Twitter data show that the semantic model outperforms the transactional model when the exterior pattern is less directly observable (i.e., hashtag cascade). When the pattern is more directly observable (i.e., retweet and URL cascades), the semantic method delivers comparable accuracy (URL cascade) or even worse accuracy (retweet cascade). Further, we demonstrate that the transactional and semantic models are not independent, and that performance improves greatly when the two are combined.


Author(s):  
Yasunobu Sumikawa ◽  
Adam Jatowt

Microblogging platforms such as Twitter are increasingly used to share information between users. They are also a convenient means of propagating content related to history. Hence, from a research viewpoint, they offer opportunities to analyze the way in which users refer to the past, how and when such references appear, and what purposes they serve. Such a study could make it possible to quantify the degree of interest and the mechanisms behind content dissemination. We report the results of a large-scale exploratory analysis of history-oriented posts in microblogs based on a 28-month-long snapshot of Twitter data. The results can increase our understanding of the characteristics of history-focused content sharing on Twitter. They can also be used to guide the design of content recommendation systems as well as time-aware search applications.


2018 ◽  
Vol 44 (4) ◽  
pp. 619-632 ◽  
Author(s):  
Xinyan Zhao ◽  
Mengqi Zhan ◽  
Cheng Jie

2017 ◽  
Vol 22 (3) ◽  
pp. 65-88
Author(s):  
Mahdi WASHHA ◽  
Manel MEZGHANI ◽  
Florence SÈDES

2020 ◽  
Author(s):  
Tasmiah Nuzhath ◽  
Samia Tasnim ◽  
Rahul Kumar Sanjwal ◽  
Nusrat Fahmida Trisha ◽  
Mariya Rahman ◽  
...  

Background: The coronavirus disease (COVID-19) pandemic has caused a significant burden of mortality and morbidity. A vaccine will be the most effective global preventive strategy to end the pandemic. Studies have maintained that exposure to negative sentiments related to vaccination on social media increases vaccine hesitancy and refusal. Despite the influence social media has on vaccination behavior, there is a lack of studies exploring the public's exposure to misinformation, conspiracy theories, and concerns on Twitter regarding a potential COVID-19 vaccination. Objective: The study aims to identify the major thematic areas about a potential COVID-19 vaccination based on the contents of Twitter data. Method: We retrieved 1,286,659 publicly available tweets posted between July 19, 2020, and August 19, 2020, leveraging the Twint package. Following the extraction, we used Latent Dirichlet Allocation for topic modelling and identified 20 topics discussed in the tweets. We selected 4,868 tweets with the highest probability of belonging to the specific cluster and manually labeled them as positive, negative, neutral, or irrelevant. The negative tweets were further assigned to a theme and subtheme based on the content. Result: The negative tweets were further categorized into 7 major themes: "safety and effectiveness," "misinformation," "conspiracy theories," "mistrust of scientists and governments," "lack of intent to get a COVID-19 vaccine," "freedom of choice," and "religious beliefs." Negative tweets predominantly consisted of misleading statements (n=424) that immunization against coronavirus is unnecessary as the survival rate is high. The second most prevalent theme to emerge was tweets constituting safety and effectiveness related concerns (n=276) regarding the side effects of a potential vaccine developed at an unprecedented speed.
Conclusion: Our findings suggest a need to formulate a large-scale vaccine communication plan that will address the safety concerns and debunk the misinformation and conspiracy theories spreading across social media platforms, increasing the public's acceptance of a COVID-19 vaccination.
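The Method step of selecting the tweets with the highest probability of belonging to a topic can be sketched as follows. The abstract does not specify the exact selection rule, so this is one plausible reading: each tweet is assigned to its dominant topic, and for each topic the highest-scoring tweets are kept for manual labeling. The function name `select_representative`, the `per_topic` parameter, and the input layout (one list of topic probabilities per tweet, as a fitted LDA model would produce) are assumptions made for illustration.

```python
from collections import defaultdict


def select_representative(doc_topic, per_topic):
    """Pick, for each topic, the documents that belong to it most strongly.

    doc_topic: list of per-document topic probability lists.
    per_topic: maximum number of documents to keep for each topic.
    Returns a dict mapping topic index to a list of document indices,
    ordered from highest to lowest probability.
    """
    by_topic = defaultdict(list)
    for doc_id, probs in enumerate(doc_topic):
        # Dominant topic: the index with the largest probability.
        topic = max(range(len(probs)), key=probs.__getitem__)
        by_topic[topic].append((probs[topic], doc_id))

    selected = {}
    for topic, scored in by_topic.items():
        # Sort by descending probability; break ties by document index.
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        selected[topic] = [doc_id for _, doc_id in scored[:per_topic]]
    return selected
```

For example, with `doc_topic = [[0.9, 0.1], [0.6, 0.4], [0.2, 0.8], [0.7, 0.3]]` and `per_topic=2`, topic 0 keeps documents 0 and 3, and topic 1 keeps document 2.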

