Topic Modelling pada Sentimen Terhadap Headline Berita Online Berbahasa Indonesia Menggunakan LDA dan LSTM

2021 ◽  
Vol 5 (1) ◽  
pp. 24 ◽  
Author(s):  
Chairullah Naury ◽  
Dhomas Hatta Fudholi ◽  
Ahmad Fathan Hidayatullah

Online mass media is the fastest and most up-to-date source of information, and a model that maps this information helps sort it more precisely. In this study, the authors applied topic modeling to the results of sentiment analysis on Indonesian-language online news headlines. The data were obtained from Indonesian-language online mass media. Sentiment in the collected data was analyzed with the Long Short-Term Memory (LSTM) method to obtain news headlines with positive, negative, and neutral sentiment. The headlines classified by the sentiment analysis were then passed to topic modeling with the Latent Dirichlet Allocation (LDA) method, and the results were visualized as word clouds and an intertopic distance map (pyLDAvis) to show how the topics relate to one another. The sentiment analysis model reached an accuracy of 71.13%, and the topic modeling produced topics that are easy to interpret.
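The hand-off between the two stages described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the sentiment labels are assumed to come from an already trained classifier (an LSTM in the study, not shown here), the headlines are hypothetical, and the helper names `group_by_sentiment` and `to_bag_of_words` are invented for this example. Each sentiment group becomes its own bag-of-words corpus, which is the kind of input an LDA implementation typically expects.

```python
from collections import Counter, defaultdict


def group_by_sentiment(labeled_headlines):
    """Group headlines by their (already predicted) sentiment label."""
    groups = defaultdict(list)
    for headline, label in labeled_headlines:
        groups[label].append(headline)
    return dict(groups)


def to_bag_of_words(headlines):
    """Turn each headline into a word-count mapping (lowercase, whitespace tokens)."""
    return [Counter(headline.lower().split()) for headline in headlines]


# Hypothetical headlines and labels, for illustration only.
labeled_headlines = [
    ("Rupiah strengthens as exports rise", "positive"),
    ("Flood damages hundreds of homes", "negative"),
    ("Exports rise for third month", "positive"),
    ("Parliament schedules budget hearing", "neutral"),
]

groups = group_by_sentiment(labeled_headlines)

# One bag-of-words corpus per sentiment class; each would be modeled separately.
corpora = {label: to_bag_of_words(items) for label, items in groups.items()}
```

Fitting a separate topic model per sentiment class is one plausible reading of the abstract; the original study may organize this step differently.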

Author(s):  
Puji Winar Cahyo ◽  
Muhammad Habibi

The efficiency of social media has changed the nature and communication of modern society: people are more interested in talking through social media than in meeting in the real world. The volume of conversation on social media depends on the topic being discussed; the more interesting the topic, the more data it generates. These data can be analyzed to gauge the influence of actors (account mentions) on a conversation. An actor's influence can be measured by how often the actor is mentioned in the conversation. This paper conducts entity profiling on social media content to analyze an actor's influence on a discussion. Sentiment analysis is then used to determine the sentiment toward an actor within a conversation topic. The Latent Dirichlet Allocation (LDA) method is used for topic modeling, while the Support Vector Machine (SVM) is used for sentiment analysis. The results show that topics with positive sentiment are more likely to involve disaster management accounts, while topics with negative sentiment tend to involve politicians, critics, and online news.
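The mention-frequency measure of actor influence can be illustrated with a short sketch. It assumes Twitter-style handles (an `@` followed by letters, digits, or underscores) and counts each handle at most once per post; both choices are assumptions of this example, and the posts and the `count_mentions` helper are hypothetical.

```python
import re
from collections import Counter

# Assumed handle format: "@" followed by letters, digits, or underscores.
MENTION_PATTERN = re.compile(r"@(\w+)")


def count_mentions(posts):
    """Count in how many posts each account is mentioned (case-insensitive)."""
    counts = Counter()
    for post in posts:
        # A set ensures that repeating a handle inside one post counts once.
        handles = {handle.lower() for handle in MENTION_PATTERN.findall(post)}
        counts.update(handles)
    return counts


# Hypothetical posts, for illustration only.
posts = [
    "Flood warning issued by @DisasterAgency for the coast",
    "@DisasterAgency and @CityNews confirm the evacuation routes",
    "Thanks @disasteragency! @disasteragency is always fast",
]

mention_counts = count_mentions(posts)
print(mention_counts.most_common(2))
```

Ranking accounts by these counts gives a simple influence ordering; the paper may weight or normalize mentions differently.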


2021 ◽  
Author(s):  
Shimon Ohtani

Abstract The importance of biodiversity conservation is gradually being recognized worldwide, and 2020 was the final year of the Aichi Biodiversity Targets formulated at the 10th Conference of the Parties to the Convention on Biological Diversity (COP10) in 2010. Unfortunately, the majority of the targets were assessed as unachievable. While it is essential to measure public awareness of biodiversity when setting the post-2020 targets, it is also a difficult task to propose a method to do so. This study provides a diachronic exploration of the discourse on “biodiversity” from 2010 to 2020, using Twitter posts, in combination with sentiment analysis and topic modeling, which are commonly used in data science. Through the aggregation and comparison of n-grams, the visualization of eight types of emotional tendencies using the NRC emotion lexicon, the construction of topic models using Latent Dirichlet allocation (LDA), and the qualitative analysis of tweet texts based on these models, I was able to classify and analyze unstructured tweets in a meaningful way. The results revealed the evolution of words used with “biodiversity” on Twitter over the past decade, the emotional tendencies behind the contexts in which “biodiversity” has been used, and the approximate content of tweet texts that have constituted topics with distinctive characteristics. While the search for people's awareness through SNS analysis still has many limitations, it is undeniable that important suggestions can be obtained. In order to further refine the research method, it will be essential to improve the skills of analysts and accumulate research examples as well as to advance data science.
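The aggregation and comparison of n-grams mentioned above can be sketched in a few lines. This is only an illustration under simple assumptions: lowercase whitespace tokenization, bigrams, and two hypothetical sets of tweets standing in for an earlier and a later period. The helper names are invented for this example.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) in a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_counts(texts, n=2):
    """Aggregate n-gram counts over texts using lowercase whitespace tokens."""
    counts = Counter()
    for text in texts:
        counts.update(ngrams(text.lower().split(), n))
    return counts


# Hypothetical tweets for two periods, for illustration only.
tweets_early = ["biodiversity loss is real", "stop biodiversity loss now"]
tweets_late = ["biodiversity crisis and climate crisis", "the biodiversity crisis deepens"]

early = ngram_counts(tweets_early)
late = ngram_counts(tweets_late)

# Bigrams that appear in the later period but not in the earlier one.
new_bigrams = sorted(set(late) - set(early))
```

Comparing the count tables year by year is one simple way to trace how the words used alongside a keyword shift over time.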


2020 ◽  
Vol 44 (5) ◽  
pp. 1027-1055
Author(s):  
Thanh-Tho Quan ◽  
Duc-Trung Mai ◽  
Thanh-Duy Tran

Purpose: This paper proposes an approach to identify categorical influencers (i.e., influencers who are active in the targeted categories) in social media channels. Categorical influencers are important for media marketing, but detecting them automatically remains a challenge. Design/methodology/approach: We deployed emerging deep learning approaches. Specifically, we used word embedding to encode the semantic information of words occurring in the common microtext of social media and used a variational autoencoder (VAE) to approximate the topic modeling process, through which the active categories of influencers are automatically detected. We developed a system known as Categorical Influencer Detection (CID) to realize those ideas. Findings: The approach of using a VAE to simulate the Latent Dirichlet Allocation (LDA) process can effectively handle the task of topic modeling on the vast dataset of microtext on social media channels. Research limitations/implications: This work makes two major contributions. The first is the detection of topics in microtexts using a deep learning approach. The second is the identification of categorical influencers in social media. Practical implications: This work can help brands do digital marketing on social media effectively by approaching appropriate influencers. A real case study is given to illustrate it. Originality/value: In this paper, we discuss an approach to automatically identify the active categories of influencers by performing topic detection on the microtext related to the influencers in social media channels. To do so, we use deep learning to approximate the topic modeling process of conventional approaches (such as LDA).


2021 ◽  
Author(s):  
Adebayo Abayomi-Alli ◽  
Olusola Abayomi-Alli ◽  
Sanjay Misra ◽  
Luis Fernandez-Sanz

Background: Social media opinion has become a medium for quickly accessing large, valuable, and rich information on any subject matter within a short period. Twitter, a social microblogging site, generates over 330 million tweets monthly across different countries. Analyzing trending topics on Twitter presents opportunities to extract meaningful insight into different opinions on various issues. Aim: This study aims to gain insights into the trending yahoo-yahoo topic on Twitter using content analysis of selected historical tweets. Methodology: The widgets and workflow engine in the Orange Data Mining toolbox were employed for all the text mining tasks. A total of 5500 tweets were collected from Twitter using the 'yahoo yahoo' hashtag. The corpus was pre-processed using a pre-trained tweet tokenizer; Valence Aware Dictionary and sEntiment Reasoner (VADER) was used for sentiment and opinion mining; Latent Dirichlet Allocation (LDA) and Latent Semantic Indexing (LSI) were used for topic modeling; and multidimensional scaling (MDS) was used to visualize the modeled topics. Results: Results showed that "yahoo" appeared in the corpus 9555 times, and 175 unique tweets were returned after duplicate removal. Contrary to expectation, Spain had the highest number of participants tweeting on the 'yahoo yahoo' topic within the period. The VADER sentiment analysis returned 35.85%, 24.53%, 15.09%, and 24.53% negative, neutral, no-zone, and positive sentiment tweets, respectively. The word "yahoo" was highly representative of LDA topics 1, 3, 4, and 6 and LSI topic 1. Conclusion: It can be concluded that emojis represent the sentiments in tweets better, and more quickly, than the textual content. Also, despite popular belief, a significant number of youths regard cybercrime as a detriment to society.


2020 ◽  
Author(s):  
Qian Liu ◽  
Zequan Zheng ◽  
Jiabin Zheng ◽  
Qiuyi Chen ◽  
Guan Liu ◽  
...  

BACKGROUND In December 2019, a few coronavirus disease (COVID-19) cases were first reported in Wuhan, Hubei, China. Soon after, increasing numbers of cases were detected in other parts of China, eventually leading to a disease outbreak in China. As this dreadful disease spreads rapidly, the mass media has been active in community education on COVID-19 by delivering health information about this novel coronavirus, such as its pathogenesis, spread, prevention, and containment. OBJECTIVE The aim of this study was to collect media reports on COVID-19 and investigate the patterns of media-directed health communications as well as the role of the media in this ongoing COVID-19 crisis in China. METHODS We used the WiseSearch database to extract related news articles about the coronavirus from major press media between January 1, 2020, and February 20, 2020. We then sorted and analyzed the data using Python and the Python package Jieba. We selected a suitable number of topics based on the coherence value. We ran latent Dirichlet allocation topic modeling with that number of topics and generated corresponding keywords and topic names. We then divided these topics into different themes by plotting them onto a 2D plane via multidimensional scaling. RESULTS After removing duplicates and irrelevant reports, our search identified 7791 relevant news reports. We listed the number of articles published per day. According to the coherence value, we chose 20 as the number of topics and generated the topics’ themes and keywords. These topics were categorized into nine primary themes based on the topic visualization figure. The top three most popular themes were prevention and control procedures, medical treatment and research, and global or local social and economic influences, accounting for 32.57% (n=2538), 16.08% (n=1258), and 11.79% (n=919) of the collected reports, respectively.
CONCLUSIONS Topic modeling of news articles can produce useful information about the significance of mass media for early health communication. Comparing the number of articles for each day and the outbreak development, we noted that mass media news reports in China lagged behind the development of COVID-19. The major themes accounted for around half the content and tended to focus on the larger society rather than on individuals. The COVID-19 crisis has become a worldwide issue, and society has become concerned about donations and support as well as mental health among others. We recommend that future work addresses the mass media’s actual impact on readers during the COVID-19 crisis through sentiment analysis of news data.
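Choosing the number of topics by coherence can be sketched as follows. The abstract does not say which coherence measure was used, so this example assumes the UMass measure, which scores a topic by how often its top words co-occur in the same documents. The documents and the candidate topic word lists are hypothetical; in practice the word lists would come from LDA models fitted with different numbers of topics.

```python
import math


def umass_coherence(top_words, documents):
    """UMass coherence of one topic: sum of log((D(w_m, w_l) + 1) / D(w_l)) over l < m.

    Assumes every word in top_words occurs in at least one document.
    """
    doc_sets = [set(doc) for doc in documents]

    def doc_freq(*words):
        # Number of documents that contain all the given words.
        return sum(1 for doc in doc_sets if all(w in doc for w in words))

    score = 0.0
    for m in range(1, len(top_words)):
        for l in range(m):
            co_occurrences = doc_freq(top_words[m], top_words[l])
            score += math.log((co_occurrences + 1) / doc_freq(top_words[l]))
    return score


def mean_coherence(topics, documents):
    """Average coherence over all topics of one candidate model."""
    return sum(umass_coherence(t, documents) for t in topics) / len(topics)


# Hypothetical tokenized documents and candidate models, for illustration only.
documents = [
    ["virus", "mask", "prevention"],
    ["virus", "mask", "hospital"],
    ["economy", "market", "trade"],
    ["economy", "market", "virus"],
]
candidate_models = {
    2: [["virus", "mask"], ["economy", "market"]],
    3: [["virus", "market"], ["economy", "mask"], ["mask", "trade"]],
}

scores = {k: mean_coherence(topics, documents) for k, topics in candidate_models.items()}
best_k = max(scores, key=scores.get)  # candidate with the highest mean coherence
```

Higher (less negative) values indicate topics whose top words tend to appear together, which is why the candidate with the highest mean score is selected.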


2021 ◽  
Vol 17 (1) ◽  
pp. 37-48
Author(s):  
Nurman Ando Setianas Nugroho

This research analyzed news quality on an Islamic online news portal in Solo, pancaran.net; concern about the quality of Islamic online media in Solo motivated the research. This is descriptive research using a qualitative approach, and the data were analyzed descriptively. Analysis began as soon as data were collected, so the researchers analyzed data in the field until the research was complete. The analysis used a set of parameters to check whether the news fulfilled the elements of news, and thus whether it could be judged to be of good quality, of lesser quality, or not worthy of publication because of code of ethics violations. These elements were news value, the 5W + 1H structure, the inverted pyramid structure, news headlines, news lead, news content, news quotations, and the journalistic code of ethics. The analysis found that seven of these elements were fulfilled in the news on pancaran.net, and on those seven elements the news could be considered to be of good quality. However, one element was not fulfilled: the journalistic code of ethics. The research concluded that the pancaran.net website was not recommended for online news readers in Solo because of the violations of the journalistic code of ethics found in its news.


2019 ◽  
Author(s):  
Murilo C. Medeiros ◽  
Vinicius R. P. Borges

This paper describes a methodology for sentiment analysis and knowledge discovery in tweets about the Brazilian stock market. The proposed methodology starts by preprocessing and characterizing tweets to obtain an associated vector-space model. Dimensionality reduction is then performed using Principal Component Analysis and t-distributed Stochastic Neighbor Embedding. Sentiment analysis of stock market tweets is performed by considering the tasks of sentiment classification, topic modeling, and clustering, along with a visual analysis process. Experimental results showed satisfactory performance in single-label and multi-label sentiment classification scenarios. The visual analysis process also revealed interesting relationships among topics and clusters.


2020 ◽  
Author(s):  
Kai Zhang ◽  
Yuan Zhou ◽  
Zheng Chen ◽  
Yufei Liu ◽  
Zhuo Tang ◽  
...  

Abstract The prevalence of short texts on the Web has made mining the latent topic structures of short texts a critical and fundamental task for many applications. However, due to the lack of word co-occurrence information induced by the content sparsity of short texts, it is challenging for traditional topic models like latent Dirichlet allocation (LDA) to extract coherent topic structures on short texts. Incorporating external semantic knowledge into the topic modeling process is an effective strategy to improve the coherence of inferred topics. In this paper, we develop a novel topic model—called biterm correlation knowledge-based topic model (BCK-TM)—to infer latent topics from short texts. Specifically, the proposed model mines biterm correlation knowledge automatically based on recent progress in word embedding, which can represent semantic information of words in a continuous vector space. To incorporate external knowledge, a knowledge incorporation mechanism is designed over the latent topic layer to regularize the topic assignment of each biterm during the topic sampling process. Experimental results on three public benchmark datasets illustrate the superior performance of the proposed approach over several state-of-the-art baseline models.
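To make the notion of a biterm concrete, here is a minimal sketch that assumes the usual definition from biterm topic models: an unordered pair of words that co-occur in the same short text. It only extracts and counts biterms; the correlation knowledge and the knowledge incorporation mechanism of BCK-TM are not reproduced. The documents and the `extract_biterms` helper are hypothetical.

```python
from collections import Counter
from itertools import combinations


def extract_biterms(tokens):
    """Return all unordered word pairs from one short text, each as a sorted tuple."""
    return [tuple(sorted(pair)) for pair in combinations(tokens, 2)]


# Hypothetical tokenized short texts, for illustration only.
docs = [
    ["apple", "phone", "price"],
    ["phone", "price", "drop"],
]

# Pooling biterms over the whole corpus compensates for the sparse
# word co-occurrence information inside each individual short text.
biterm_counts = Counter()
for tokens in docs:
    biterm_counts.update(extract_biterms(tokens))

print(biterm_counts.most_common(1))
```

Because the pairs are pooled across the corpus, a pair such as ("phone", "price") accumulates evidence even though each text is only three words long.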

