Keyword Extraction for Social Media Short Text

Author(s):  
Dexin Zhao ◽  
Nana Du ◽  
Zhi Chang ◽  
Yukun Li
Author(s):  
Anastasios Lytos ◽  
Thomas Lagkas ◽  
Panagiotis Sarigiannidis ◽  
Vasileios Argyriou ◽  
George Eleftherakis
2021 ◽  
pp. 1-10
Author(s):  
Wang Gao ◽  
Hongtao Deng ◽  
Xun Zhu ◽  
Yuan Fang

Harmful information identification is a critical research topic in natural language processing. Existing approaches have focused either on rule-based methods or on identifying harmful text in normal-length documents. In this paper, we propose a BERT-based model, called Topic-BERT, to identify harmful information on social media. First, Topic-BERT feeds additional information into BERT as input to alleviate the sparseness of short texts; the GPU-DMM topic model is used to capture the hidden topics of short texts for attention-weight calculation. Second, the proposed model divides harmful short-text identification into two stages, in which labels of different granularity are identified by two similar sub-models. Finally, we conduct extensive experiments on a real-world social media dataset to evaluate our model. Experimental results demonstrate that our model significantly improves classification performance compared with baseline methods.
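The abstract's topic-based attention-weight calculation can be illustrated with a minimal sketch: topic relevance scores (which Topic-BERT would obtain from GPU-DMM) are softmax-normalized and used to pool token embeddings into one text vector. The function names and the toy embeddings below are hypothetical; this is a sketch of the weighting idea, not the paper's implementation.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def topic_weighted_pooling(token_embeddings, topic_scores):
    """Pool token embeddings into a single text vector, weighting each
    token by its (softmax-normalized) relevance to the inferred topic."""
    weights = softmax(topic_scores)
    dim = len(token_embeddings[0])
    pooled = [0.0] * dim
    for w, emb in zip(weights, token_embeddings):
        for i in range(dim):
            pooled[i] += w * emb[i]
    return pooled

# Toy example: three 2-d token embeddings; the second token is most topic-relevant
emb = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
scores = [0.1, 2.0, 0.5]
vec = topic_weighted_pooling(emb, scores)
```

Because the second token carries the highest topic score, the pooled vector is dominated by that token's embedding, which is how topic information can sharpen a short text's representation despite its sparsity.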


2019 ◽  
Vol 127 ◽  
pp. 113142 ◽  
Author(s):  
Cecil Eng Huang Chua ◽  
Veda C. Storey ◽  
Xiaolin Li ◽  
Mala Kaul

2020 ◽  
Vol 4 (1) ◽  
pp. 18
Author(s):  
Sozan Abdulla Mahmood ◽  
Qani Qabil Qasim

With the rapid evolution of the internet, social media networks such as Twitter, Facebook, and Tumblr have become so common that they affect every aspect of human life. Twitter is one of the most popular micro-blogging platforms, allowing people to share their emotions in short texts about a variety of topics such as a company's products, people, politics, and services. Because emotions and reviews on these topics are shared every second, social media has become a useful source of information for sentiment analysis in fields such as business, politics, applications, and services. The Twitter Application Programming Interface (Twitter API), an interface between developers and Twitter, lets developers search for tweets matching a desired keyword using a set of secret keys and tokens. In this work, the Twitter API was used to download the most recent tweets for four keywords (Trump, Bitcoin, IoT, and Toyota), with a different number of tweets per keyword. VADER, a lexicon- and rule-based method, was used to categorize the downloaded tweets as "Positive" or "Negative" based on their polarity, and the tweets were then stored in a MongoDB database for subsequent processing. After pre-processing, the hold-out technique was used to split each dataset into an 80% training set and a 20% testing set. A deep-learning-based Document-to-Vector model was then used for feature extraction, and a support vector machine (SVM) with a Radial Basis Function (RBF) kernel performed the classification. The accuracy of the RBF-SVM depends mainly on the values of the soft-margin penalty C and the kernel coefficient γ (gamma). The main goal of this work is to select the best values for these parameters in order to improve the accuracy of the RBF-SVM classifier.
The objective of this study is to show the impact of four meta-heuristic optimizer algorithms, namely particle swarm optimization (PSO), modified PSO (MPSO), the grey wolf optimizer (GWO), and a hybrid PSO-GWO, on SVM classification accuracy by selecting the best values for those parameters. To the best of our knowledge, hybrid PSO-GWO has never been used for SVM optimization. The results show that these optimizers have a significant impact on increasing SVM accuracy. The best accuracy of the model with a traditional SVM was 87.885%. After optimization, the highest accuracy, 91.053%, was obtained with GWO, while the best accuracies with PSO, hybrid PSO-GWO, and MPSO were 90.736%, 90.657%, and 90.557%, respectively.
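The hyperparameter search the abstract describes can be sketched with a minimal particle swarm optimizer over the (C, γ) plane. The objective function below is a hypothetical stand-in that peaks at C = 10, γ = 0.1 purely for illustration; in the study, each particle's fitness would instead be the validation accuracy of an RBF-SVM trained with that particle's (C, γ). The swarm parameters (w, c1, c2) are conventional choices, not the authors'.

```python
import random

random.seed(0)

def objective(c, gamma):
    """Stand-in for RBF-SVM validation accuracy; higher is better.
    Peaks at (C, gamma) = (10, 0.1) for illustration only."""
    return -((c - 10.0) ** 2 + 100.0 * (gamma - 0.1) ** 2)

def pso(n_particles=20, iters=60, bounds=((0.1, 100.0), (1e-4, 1.0))):
    """Minimal particle swarm optimizer searching (C, gamma)."""
    dim = 2
    pos = [[random.uniform(*bounds[d]) for d in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                       # per-particle best positions
    pbest_val = [objective(*p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]      # swarm-wide best
    w, c1, c2 = 0.7, 1.5, 1.5                         # inertia, cognitive, social weights
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = random.random(), random.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Keep the particle inside the search bounds
                pos[i][d] = min(max(pos[i][d] + vel[i][d], bounds[d][0]), bounds[d][1])
            val = objective(*pos[i])
            if val > pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val > gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest

best_c, best_gamma = pso()
```

The swarm converges toward the objective's peak, which is exactly the role PSO (and its GWO/hybrid variants) plays in the study: steering (C, γ) toward the values that maximize classifier accuracy.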


2020 ◽  
Author(s):  
Qixuan Hou ◽  
Meng Han ◽  
Feiyang Qu

Abstract: Social media has been broadly applied in many areas, including sales, marketing, and event detection. With its high-volume, real-time data, social media has also been used for disaster response. However, distinguishing rumors from reliable information can be challenging, because social media is a user-generated content system in which a great number of users post massive amounts of information every second. Furthermore, the rich information is included not only in short text content but also embedded in images and videos. In this paper, to address the emerging challenge of disaster response, we introduce a reliable framework for disaster information understanding and response, with a practical study on Twitter. The framework integrates both the textual and the imagery content of tweets in order to fully utilize the available information. The text classifier, built to remove noise, achieves a 0.92 F1-score in classifying individual tweets. The image classifier, constructed by fine-tuning a pre-trained VGG-F network, achieves 90% accuracy and serves as a verifier in the pipeline, rejecting or confirming detected events. The evaluation indicates that the verifier can significantly reduce false-positive events. We also explore a Twitter-based drought-management system and an infrastructure-monitoring system to further study the impact of imagery content on event-detection systems, and we are able to pinpoint additional benefits that can be gained from social media imagery content.
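The verifier pipeline described above can be sketched as a two-stage filter: the text classifier proposes candidate disaster tweets, and the image classifier confirms or rejects each candidate. The classifiers below are hypothetical keyword/filename stand-ins (the paper's models are a trained text classifier and a fine-tuned VGG-F network); only the pipeline logic is illustrated.

```python
def detect_events(tweets, text_classifier, image_verifier, threshold=0.5):
    """Two-stage pipeline: the text classifier proposes candidate disaster
    events, and the image verifier confirms or rejects each candidate,
    reducing false positives."""
    confirmed = []
    for tweet in tweets:
        if text_classifier(tweet["text"]) < threshold:
            continue  # text looks like noise
        images = tweet.get("images", [])
        # Confirm only if at least one attached image also looks disaster-related
        if images and any(image_verifier(img) >= threshold for img in images):
            confirmed.append(tweet["id"])
    return confirmed

# Toy stand-ins returning hypothetical confidence scores
text_clf = lambda t: 0.9 if "flood" in t else 0.1
img_clf = lambda img: 0.8 if img == "flood.jpg" else 0.2

tweets = [
    {"id": 1, "text": "flood downtown", "images": ["flood.jpg"]},
    {"id": 2, "text": "flood sale today", "images": ["ad.png"]},  # text-only false positive
    {"id": 3, "text": "nice weather", "images": ["sky.jpg"]},
]
events = detect_events(tweets, text_clf, img_clf)
# events == [1]
```

Tweet 2 shows the verifier's value: its text passes the first stage, but the attached image fails verification, so the false-positive event is dropped.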


Author(s):  
Sucipto Sucipto ◽  
Aditya Gusti Tammam ◽  
Rini Indriati

Hoax news is a current issue that troubles the public and causes unrest in various fields, ranging from politics, culture, and public security and order to economics. This problem cannot be separated from the impact of the rapid use of social media: every day, thousands of pieces of information, not necessarily valid, spread on social media, so people are potentially exposed to hoaxes. The hoax detection system in this study was designed with an unsupervised learning approach, so it requires no training data. The system is built with the TextRank algorithm for keyword extraction and the cosine similarity measure to calculate the degree of document similarity. The extracted keywords are used to search for content related to the user's input via a search engine, and the similarity value between the input and the retrieved content is then calculated. If the related content tends to come from trusted media, the content is potentially factual; likewise, if the related content tends to be published by unreliable media, it is potentially a hoax. The hoax detection system was tested with a confusion matrix on 20 news items, consisting of 10 true issues and 10 false issues. The system classified 13 items as hoax and 7 as factual, and 15 of these classifications matched the original labels. Based on these results, an accuracy of 75% was obtained.
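The document-similarity step above can be sketched in a few lines: both the user's input and each retrieved article are represented as term-frequency vectors, and cosine similarity scores their overlap. This is a minimal bag-of-words sketch (the example strings are invented); a production system would add tokenization and stop-word handling.

```python
import math
from collections import Counter

def cosine_similarity(text_a, text_b):
    """Cosine similarity between two documents represented as
    term-frequency (bag-of-words) vectors; ranges from 0 to 1."""
    a, b = Counter(text_a.lower().split()), Counter(text_b.lower().split())
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

claim = "government bans all social media tomorrow"
trusted = "no plan to ban social media officials say"
sim = cosine_similarity(claim, trusted)
```

A low similarity to content from trusted media (combined with high similarity to unreliable sources) is the signal the system uses to flag a potential hoax.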

