Cluster-discovery of Twitter messages for event detection and trending.

10.32920/ryerson.14663040 ◽

2021 ◽

Author(s):

Shakira Banu Kaleel

Keyword(s):

Event Detection ◽

Large Scale ◽

Feature Vector ◽

Research Work ◽

Locality Sensitive Hashing ◽

Social Media Data ◽

Speed Up ◽

Event Based ◽

Media Data ◽

Cluster Quality

Social media data carries abundant hidden occurrences of real-time events in the world, which raises the demand for an efficient event detection and trending system. The Locality Sensitive Hashing (LSH) technique is capable of processing large-scale datasets. In this thesis, a novel framework is proposed for detecting and trending events from tweet clusters, discovered using LSH, that are present in a Twitter dataset. The experimental results obtained from this research work showed that the LSH technique took only 12.99% of the running time that K-means required to find all of the tweet clusters. Key challenges include: 1) constructing a dictionary using incremental TF-IDF on high-dimensional data in order to create tweet feature vectors; 2) leveraging LSH to find truly interesting events; 3) trending the behavior of events based on time, geo-location and cluster size; and 4) speeding up the cluster-discovery process while retaining cluster quality.
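The abstract above does not say which LSH family or TF-IDF variant the thesis uses. As a minimal sketch, assuming random-hyperplane signatures for cosine similarity and a plain batch TF-IDF (not the incremental version the thesis describes), tweets whose vectors point in similar directions tend to land in the same bucket. The function names `tfidf_vectors` and `lsh_buckets`, the parameter `n_bits`, and the sample tweets are all hypothetical.

```python
import math
import random
from collections import Counter, defaultdict


def tfidf_vectors(docs):
    """Build one sparse TF-IDF vector (term -> weight) per tokenised tweet.

    Batch computation for illustration only; assumes every tweet is non-empty.
    """
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        term_freq = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * (math.log(n_docs / doc_freq[term]) + 1.0)
            for term, count in term_freq.items()
        })
    return vectors


def lsh_buckets(vectors, n_bits=8, seed=0):
    """Group vectors by a random-hyperplane signature (assumed LSH family)."""
    rng = random.Random(seed)
    vocab = sorted({term for vec in vectors for term in vec})
    # One Gaussian coefficient per (term, hyperplane) pair.
    planes = {term: [rng.gauss(0.0, 1.0) for _ in range(n_bits)] for term in vocab}
    buckets = defaultdict(list)
    for idx, vec in enumerate(vectors):
        # Each bit records which side of a hyperplane the vector falls on.
        signature = tuple(
            sum(weight * planes[term][bit] for term, weight in vec.items()) >= 0.0
            for bit in range(n_bits)
        )
        buckets[signature].append(idx)
    return dict(buckets)


# Made-up example tweets, already tokenised.
tweets = [
    ["earthquake", "hits", "city", "centre"],
    ["earthquake", "hits", "city", "centre"],
    ["team", "wins", "cup", "final"],
]
for signature, members in lsh_buckets(tfidf_vectors(tweets)).items():
    print(members)
```

Identical tweets always share a bucket because they produce identical signatures. Near-duplicates collide with a probability that grows with their cosine similarity and falls as `n_bits` increases, which is the usual trade-off between bucket purity and recall.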


Real-time spatio-temporal event detection on geotagged social media

Journal Of Big Data ◽

10.1186/s40537-021-00482-2 ◽

2021 ◽

Vol 8 (1) ◽

Author(s):

Yasmeen George ◽

Shanika Karunasekera ◽

Aaron Harwood ◽

Kwan Hui Lim

Keyword(s):

New York ◽

Social Media ◽

Event Detection ◽

Detection System ◽

Time And Space ◽

Social Media Data ◽

Event Time ◽

Spatio Temporal ◽

Geographical Space ◽

Media Data

A key challenge in mining social media data streams is to identify events which are actively discussed by a group of people in a specific local or global area. Such events are useful for early warning of accidents, protests, elections or breaking news. However, neither the list of events nor the resolution of both event time and space is fixed or known beforehand. In this work, we propose an online spatio-temporal event detection system using social media that is able to detect events at different time and space resolutions. First, to address the challenge related to the unknown spatial resolution of events, a quad-tree method is exploited in order to split the geographical space into multiscale regions based on the density of social media data. Then, a statistical unsupervised approach is performed that involves Poisson distribution and a smoothing method for highlighting regions with unexpected density of social posts. Further, event duration is precisely estimated by merging events happening in the same region at consecutive time intervals. A post-processing stage is introduced to filter out events that are spam, fake or wrong. Finally, we incorporate simple semantics by using social media entities to assess the integrity and accuracy of detected events. The proposed method is evaluated using different social media datasets, Twitter and Flickr, for different cities: Melbourne, London, Paris and New York. To verify the effectiveness of the proposed method, we compare our results with two baseline algorithms based on fixed split of geographical space and clustering method. For performance evaluation, we manually compute recall and precision. We also propose a new quality measure named strength index, which automatically measures how accurate the reported event is.
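The two spatial steps in this abstract (density-driven quad-tree splitting, then a Poisson test for unexpected density) can be illustrated with a short sketch. This is not the authors' implementation: the splitting rule, the thresholds `max_points` and `max_depth`, the baseline rate `expected_per_cell`, the significance cut-off and the sample coordinates are all assumptions, and the smoothing step mentioned in the abstract is omitted.

```python
import math


def quadtree_cells(points, bounds, max_points=4, max_depth=6):
    """Split bounds (x0, y0, x1, y1) into four until a cell holds <= max_points.

    Returns a list of (cell_bounds, point_count). Cells are half-open, so every
    point strictly inside the root bounds is counted in exactly one leaf.
    """
    x0, y0, x1, y1 = bounds
    inside = [(x, y) for x, y in points if x0 <= x < x1 and y0 <= y < y1]
    if len(inside) <= max_points or max_depth == 0:
        return [(bounds, len(inside))]
    mx, my = (x0 + x1) / 2, (y0 + y1) / 2
    cells = []
    for sub in ((x0, y0, mx, my), (mx, y0, x1, my), (x0, my, mx, y1), (mx, my, x1, y1)):
        cells.extend(quadtree_cells(inside, sub, max_points, max_depth - 1))
    return cells


def poisson_tail(k, lam):
    """Return P(X >= k) for X ~ Poisson(lam)."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)  # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= lam / i    # P(X = i) from P(X = i - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


# Made-up post locations in a unit square: a dense burst plus scattered posts.
posts = [(0.10 + 0.01 * i, 0.10) for i in range(10)] + [(0.9, 0.9), (0.6, 0.2), (0.3, 0.7)]
expected_per_cell = 1.0  # hypothetical historical rate of posts per cell

for cell, count in quadtree_cells(posts, (0.0, 0.0, 1.0, 1.0)):
    if poisson_tail(count, expected_per_cell) < 0.05:
        print("unexpectedly dense cell:", cell, "posts:", count)
```

Dense areas end up covered by small cells and sparse areas by large ones, so the same count threshold corresponds to different spatial resolutions. In a real system the expected rate would be estimated per cell from historical data rather than fixed.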


Embed2Detect: temporally clustered embedded words for event detection in social media

Machine Learning ◽

10.1007/s10994-021-05988-7 ◽

2021 ◽

Author(s):

Hansi Hettiarachchi ◽

Mariam Adedoyin-Olowe ◽

Jagdev Bhogal ◽

Mohamed Medhat Gaber

Keyword(s):

Social Media ◽

Event Detection ◽

High Volume ◽

Detection Methods ◽

Word Embeddings ◽

Agglomerative Clustering ◽

Data Set ◽

Social Media Data ◽

Social Media Platforms ◽

Media Data

Social media is becoming a primary medium to discuss what is happening around the world. Therefore, the data generated by social media platforms contain rich information which describes the ongoing events. Further, the timeliness associated with these data is capable of facilitating immediate insights. However, considering the dynamic nature and high volume of data production in social media data streams, it is impractical to filter the events manually and therefore, automated event detection mechanisms are invaluable to the community. Apart from a few notable exceptions, most previous research on automated event detection has focused only on statistical and syntactical features in data and lacked the involvement of underlying semantics, which are important for effective information retrieval from text since they represent the connections between words and their meanings. In this paper, we propose a novel method termed Embed2Detect for event detection in social media by combining the characteristics in word embeddings and hierarchical agglomerative clustering. The adoption of word embeddings gives Embed2Detect the capability to incorporate powerful semantical features into event detection and overcome a major limitation inherent in previous approaches. We evaluated our method on two recent real social media data sets which represent the sports and political domains and also compared the results to several state-of-the-art methods. The obtained results show that Embed2Detect is capable of effective and efficient event detection and that it outperforms the recent event detection methods. For the sports data set, Embed2Detect achieved 27% higher F-measure than the best-performing baseline and for the political data set, it was an increase of 29%.
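To make the combination of word embeddings and hierarchical agglomerative clustering concrete, here is a minimal sketch under stated assumptions. It is not the Embed2Detect pipeline: the two-dimensional vectors are invented for illustration (a real system would learn embeddings from the data stream), and the average-linkage criterion and cosine distance are assumed choices that the abstract does not confirm.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm


def agglomerative(embeddings, n_clusters):
    """Average-linkage agglomerative clustering of words by embedding distance."""
    clusters = [[word] for word in embeddings]  # start with one cluster per word
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Average pairwise distance between the two clusters' members.
                total = sum(
                    cosine_distance(embeddings[a], embeddings[b])
                    for a in clusters[i]
                    for b in clusters[j]
                )
                dist = total / (len(clusters[i]) * len(clusters[j]))
                if best is None or dist < best[0]:
                    best = (dist, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge the closest pair
        del clusters[j]
    return clusters


# Hypothetical toy embeddings; real ones would have hundreds of dimensions.
toy_embeddings = {
    "goal": (1.0, 0.1),
    "match": (0.9, 0.2),
    "vote": (0.1, 1.0),
    "election": (0.2, 0.9),
}
print(agglomerative(toy_embeddings, n_clusters=2))
```

Because the merge order depends on embedding distances, words used in similar contexts are grouped early. Tracking how such groupings change between consecutive time windows is one way semantic shifts in a stream could be surfaced.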


Learning Topic Map from Large Scale Social Media Data

Companion Proceedings of the Web Conference 2020 ◽

10.1145/3366424.3382088 ◽

2020 ◽

Author(s):

Hui-Kuo Yang

Keyword(s):

Social Media ◽

Large Scale ◽

Social Media Data ◽

Topic Map ◽

Media Data


Semantic-Aware Visual Abstraction of Large-Scale Social Media Data With Geo-Tags

IEEE Access ◽

10.1109/access.2019.2935471 ◽

2019 ◽

Vol 7 ◽

pp. 114851-114861 ◽

Cited By ~ 1

Author(s):

Zhiguang Zhou ◽

Xinlong Zhang ◽

Xiaoyun Zhou ◽

Yuhua Liu

Keyword(s):

Social Media ◽

Large Scale ◽

Social Media Data ◽

Media Data


Who’s Tweeting About the President? What Big Survey Data Can Tell Us About Digital Traces?

Social Science Computer Review ◽

10.1177/0894439318822007 ◽

2019 ◽

Vol 38 (5) ◽

pp. 633-650 ◽

Cited By ~ 2

Author(s):

Josh Pasek ◽

Colleen A. McClain ◽

Frank Newport ◽

Stephanie Marken

Keyword(s):

Social Media ◽

Survey Data ◽

Large Scale ◽

Presidential Approval ◽

Social Phenomena ◽

Social Media Data ◽

Complex Picture ◽

Large Scale Survey ◽

Media Data

Researchers hoping to make inferences about social phenomena using social media data need to answer two critical questions: What is it that a given social media metric tells us? And who does it tell us about? Drawing from prior work on these questions, we examine whether Twitter sentiment about Barack Obama tells us about Americans’ attitudes toward the president, the attitudes of particular subsets of individuals, or something else entirely. Specifically, using large-scale survey data, this study assesses how patterns of approval among population subgroups compare to tweets about the president. The findings paint a complex picture of the utility of digital traces. Although attention to subgroups improves the extent to which survey and Twitter data can yield similar conclusions, the results also indicate that sentiment surrounding tweets about the president is no proxy for presidential approval. Instead, after adjusting for demographics, these two metrics tell similar macroscale, long-term stories about presidential approval but very different stories at a more granular level and over shorter time periods.


Fad or Here to Stay: Predicting Product Market Adoption and Longevity Using Large Scale, Social Media Data

Volume 2B: 33rd Computers and Information in Engineering Conference ◽

10.1115/detc2013-12661 ◽

2013 ◽

Cited By ~ 20

Author(s):

Suppawong Tuarob ◽

Conrad S. Tucker

Keyword(s):

Social Media ◽

Knowledge Discovery ◽

Large Scale ◽

Product Information ◽

Product Market ◽

Knowledge Discovery In Databases ◽

Product Reviews ◽

Product Model ◽

Social Media Data ◽

Media Data

The authors of this work propose a Knowledge Discovery in Databases (KDD) model for predicting product market adoption and longevity using large scale, social media data. Social media data, available through sites such as Twitter® and Facebook®, have been shown to be leading indicators and predictors of events ranging from influenza spread to financial stock market prices and movie revenues. The ubiquitous and colloquial nature of social media allows users to honestly express their opinions in a unified, dynamic manner. This makes social media a relatively new data gathering source that can potentially appeal to designers and enterprise decision makers aiming to understand consumers' response to their upcoming/newly launched products. Existing design methodologies for leveraging large scale data have traditionally relied on product reviews available on the internet to mine product information. However, such web reviews often come from disparate sources, making the aggregation and knowledge discovery process quite cumbersome, especially reviews for poorly received products. Furthermore, such web reviews have not been shown to be strong indicators of new product market adoption. In this paper, the authors demonstrate how social media can be used to predict and mine information relating to product features, product competition and market adoption. In particular, the authors analyze the sentiment in tweets and use the results to predict product sales. The authors present a mathematical model that can quantify the correlations between social media sentiment and product market adoption in an effort to compute the ability of individual products to stay in the market. The proposed technique involves computing the Subjectivity, Polarity, and Favorability of the product. Finally, the authors utilize Information Retrieval techniques to mine users’ opinions about strong, weak, and controversial features of a given product model.
The authors evaluate their approaches using real-world smartphone data obtained from www.statista.com and www.gsmarena.com.
