Topic analysis of Indonesian K-pop YouTube channel content using Latent Dirichlet Allocation

Teknologi ◽  
2021 ◽  
Vol 11 (1) ◽  
pp. 16-25
Author(s):  
Alfrida Rahmawati ◽  
Najla Lailin Nikmah ◽  
Reynaldi Drajat Ageng Perwira ◽  
Nur Aini Rakhmawati ◽  
...  

The development of digital technology has brought new media, one of which is YouTube, now one of the most widely used applications among internet users worldwide. The growth of the audience, known as viewers, is also supported by the contribution of content creators, or YouTubers, from Indonesia. As the number of viewers grows, the demand for trending content also grows at a surprising speed, and one such topic is K-pop. In this study, the authors wanted to identify the dominant topics that K-pop YouTubers most often upload, in order to support content creators. This research was conducted using the Latent Dirichlet Allocation method. The analysis was carried out after applying text mining to 2563 videos from 10 K-pop YouTuber accounts with more than 100,000 subscribers. The optimal number of topics was determined by examining the perplexity and topic coherence values. The results are the top 5 topics that make up the content of the uploaded videos: reactions to dance covers, album unboxings and reviews, riddles about K-pop dances, joint vlogs discussing covers, and reactions to the vocals in K-pop songs.
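As a rough illustration of the model-selection step described above (choosing the topic count by perplexity and topic coherence), the sketch below uses gensim; the file name and tokenisation are assumptions for illustration, not the authors' actual pipeline.

```python
# Illustrative sketch only: selecting the number of LDA topics by perplexity
# and topic coherence. "kpop_videos.txt" and the tokenisation are assumptions.
from gensim.corpora import Dictionary
from gensim.models import LdaModel, CoherenceModel

# One pre-tokenised document (video title/description) per line.
docs = [line.split() for line in open("kpop_videos.txt", encoding="utf-8")]

dictionary = Dictionary(docs)
corpus = [dictionary.doc2bow(doc) for doc in docs]

for k in range(2, 11):
    lda = LdaModel(corpus=corpus, id2word=dictionary, num_topics=k,
                   passes=10, random_state=42)
    # log_perplexity returns a per-word likelihood bound; c_v coherence is
    # computed against the original texts.
    perp = lda.log_perplexity(corpus)
    coh = CoherenceModel(model=lda, texts=docs, dictionary=dictionary,
                         coherence="c_v").get_coherence()
    print(f"k={k}  log-perplexity bound={perp:.3f}  coherence={coh:.3f}")
```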

2019 ◽  
Vol 9 (24) ◽  
pp. 5496 ◽  
Author(s):  
Wafa Shafqat ◽  
Yung-Cheol Byun

The accelerated growth of internet users and internet applications, primarily e-business, has accustomed people to writing comments and reviews about the products they receive. These reviews are remarkably competent to shape customers’ decisions. However, in crowdfunding, where investors finance innovative ideas in exchange for rewards or products, the comments of investors are often ignored. These comments can play a markedly significant role in helping crowdfunding platforms battle the bitter challenge of fraudulent activities. We take advantage of language modeling techniques and aim to merge them with neural networks to identify hidden discussion patterns in the comments. Our objective is to design a language-modeling-based neural network architecture in which a Recurrent Neural Network (RNN) with Long Short-Term Memory (LSTM) is used to predict discussion trends, i.e., towards either scam or non-scam. The LSTM layers are fed with the latent topic distribution learned from a pre-trained Latent Dirichlet Allocation (LDA) model. To optimize the recommendations, we used Particle Swarm Optimization (PSO) as a baseline algorithm. This module helps investors find secure projects to invest in (with the highest chances of delivery) within their preferred categories. We used prediction accuracy, the optimal number of identified topics, and the number of epochs as performance-evaluation metrics for the proposed approach. We compared our results with simple Neural Networks (NNs) and NN-LDA on these performance metrics. The strengths of both integrated models suggest that the proposed model can play a substantial role in a better understanding of crowdfunding comments.
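A minimal sketch, under assumptions, of the core idea: each comment is represented by its LDA topic distribution, and the sequence of such vectors for a project is fed to an LSTM that outputs a scam/non-scam prediction. The class name, dimensions, and framework (PyTorch) are illustrative choices, not the paper's implementation.

```python
# Minimal sketch (assumptions, not the paper's architecture): an LSTM over a
# sequence of per-comment LDA topic distributions, classifying scam vs. non-scam.
import torch
import torch.nn as nn

class TopicLSTM(nn.Module):
    def __init__(self, num_topics=20, hidden_size=64):
        super().__init__()
        # Each timestep is the K-dimensional topic distribution of one comment.
        self.lstm = nn.LSTM(input_size=num_topics, hidden_size=hidden_size,
                            batch_first=True)
        self.classifier = nn.Linear(hidden_size, 2)   # scam / non-scam logits

    def forward(self, topic_seq):                      # (batch, n_comments, num_topics)
        _, (h_n, _) = self.lstm(topic_seq)
        return self.classifier(h_n[-1])                # (batch, 2)

# Example: 4 projects, 30 comments each, 20 LDA topics per comment.
model = TopicLSTM(num_topics=20)
logits = model(torch.rand(4, 30, 20))
```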


Author(s):  
Małgorzata Molęda-Zdziech

The aim of the study is to analyse the role of media in shaping the identity of the modern individual. I narrow my focus to the postmodern approaches of A. Giddens, M. Castells and M. Maffesoli. In their work, these authors connect changes taking place in the world of media with changes at the level of individual identity. Based on the work of M. Maffesoli, I reconstruct the ideal type of postmodern individual identity, homo creator. I then describe mediality as a postmodern value and a component of postmodern identity. The study presents the results of a 2014 TNS Connected Life research report prepared on a sample of 55,000 internet users from around the world. The results illustrate habits in the use of traditional and new media.


2021 ◽  
Vol 13 (19) ◽  
pp. 10856
Author(s):  
I-Cheng Chang ◽  
Tai-Kuei Yu ◽  
Yu-Jie Chang ◽  
Tai-Yi Yu

Facing the big data wave, this study applied artificial intelligence to organize knowledge and find a feasible process that can play a crucial role in supplying innovative value in environmental education. Intelligent agents and natural language processing (NLP) are two key areas leading the trend in artificial intelligence; this research adopted NLP to analyze the research topics of environmental education research journals in the Web of Science (WoS) database during 2011–2020 and to interpret the categories and characteristics of the abstracts of environmental education papers. The corpus was selected from the abstracts and keywords of research journal papers, which were analyzed with text mining, cluster analysis, latent Dirichlet allocation (LDA), and co-word analysis methods. The classification of feature words was determined and reviewed by domain experts, and the associated TF-IDF weights were calculated for the subsequent cluster analysis, which combined hierarchical clustering and K-means analysis. Hierarchical clustering and LDA set the number of required categories at seven, and K-means cluster analysis classified the documents into seven categories. This study utilized co-word analysis to check the suitability of the K-means classification, analyzed the terms with high TF-IDF weights in the distinct K-means groups, and examined the terms of the different topics with the LDA technique. A comparison of the results demonstrated that most categories recognized with the K-means and LDA methods were the same and shared similar words; however, two categories differed slightly. The involvement of field experts supported the consistency and correctness of the classified topics and documents.
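The clustering portion of such a pipeline could look roughly like the sketch below (scikit-learn, with a placeholder corpus standing in for the WoS abstracts); it illustrates TF-IDF weighting followed by Ward hierarchical clustering and K-means with seven clusters, and is not the study's actual code.

```python
# Illustrative sketch: TF-IDF features, Ward hierarchical clustering, and
# K-means with k=7. `abstracts` is a placeholder for the WoS abstracts.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import AgglomerativeClustering, KMeans

abstracts = [f"placeholder environmental education abstract {i}" for i in range(20)]

tfidf = TfidfVectorizer(stop_words="english", max_features=5000)
X = tfidf.fit_transform(abstracts)

# Ward linkage needs dense Euclidean input; inspecting the hierarchy helps
# suggest a plausible number of categories.
hier = AgglomerativeClustering(n_clusters=7, linkage="ward").fit(X.toarray())

# K-means with the seven categories indicated by the hierarchy and by LDA.
kmeans = KMeans(n_clusters=7, random_state=42, n_init=10).fit(X)
print(kmeans.labels_)
```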


2020 ◽  
Vol 12 (16) ◽  
pp. 6673 ◽  
Author(s):  
Kiattipoom Kiatkawsin ◽  
Ian Sutherland ◽  
Jin-Young Kim

Airbnb has emerged as a platform where unique accommodation options can be found. Due to the uniqueness of each accommodation unit and host combination, each listing offers a one-of-a-kind experience. As consumers increasingly rely on the text reviews of other customers, managers are also increasingly gaining insight from customer reviews. Thus, the present study aimed to extract those insights from reviews using latent Dirichlet allocation, an unsupervised type of topic modeling that extracts latent discussion topics from text data. Findings from Hong Kong’s 185,695 and Singapore’s 93,571 Airbnb reviews, two long-term rival destinations, were compared. Hong Kong produced 12 topics in total, which can be categorized into four distinct groups, whereas Singapore’s optimal number of topics was only five. The topics produced for both destinations covered the same range of attributes, but Hong Kong’s 12 topics provide a greater degree of precision for formulating managerial recommendations. While many topics are similar to established hotel attributes, topics related to the host and to listing management are unique to the Airbnb experience. The findings also revealed keywords used when evaluating the experience that provide more insight beyond typical numeric ratings.
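As a hedged sketch of the kind of output this analysis yields, the snippet below fits a separate gensim LDA model to each destination's tokenised reviews and lists each topic's top keywords; the tiny inline corpora are placeholders, not the study's data.

```python
# Illustrative sketch: separate LDA models for Hong Kong (12 topics) and
# Singapore (5 topics) reviews, printing each topic's top keywords.
from gensim.corpora import Dictionary
from gensim.models import LdaModel

def topic_keywords(tokenised_reviews, num_topics, topn=8):
    dictionary = Dictionary(tokenised_reviews)
    corpus = [dictionary.doc2bow(r) for r in tokenised_reviews]
    lda = LdaModel(corpus, id2word=dictionary, num_topics=num_topics, random_state=1)
    return [[w for w, _ in lda.show_topic(t, topn=topn)] for t in range(num_topics)]

# Placeholder reviews; the real corpora hold 185,695 (HK) and 93,571 (SG) reviews.
reviews_hk = [["great", "host", "mtr", "station"], ["clean", "room", "quiet", "host"]]
reviews_sg = [["clean", "condo", "pool"], ["near", "mrt", "host", "helpful"]]

for words in topic_keywords(reviews_hk, num_topics=12):
    print("HK:", words)
for words in topic_keywords(reviews_sg, num_topics=5):
    print("SG:", words)
```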


2020 ◽  
pp. 016555152095467
Author(s):  
Xian Cheng ◽  
Qiang Cao ◽  
Stephen Shaoyi Liao

The unprecedented outbreak of COVID-19 is one of the most serious global threats to public health in this century. During this crisis, specialists in information science could play key roles in supporting the efforts of scientists in the health and medical community to combat COVID-19. In this article, we demonstrate that information specialists can support the health and medical community by applying text mining with the latent Dirichlet allocation procedure to produce an overview of a mass of coronavirus literature. This overview presents the generic research themes of the coronavirus diseases COVID-19, MERS and SARS, reveals the representative literature for each main research theme, and displays a network visualisation to explore the overlap, similarity and difference among these themes. The overview can help the health and medical communities extract useful information and interrelationships from coronavirus-related studies.
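One way such a theme-overlap network could be built (an assumed approach for illustration, not necessarily the article's) is to compare LDA topic-word distributions pairwise and link themes whose similarity exceeds a threshold:

```python
# Assumed illustration: link research themes whose LDA topic-word distributions
# have cosine similarity above an arbitrary threshold.
import numpy as np
import networkx as nx

# Hypothetical topic-word matrix (7 themes x 500 vocabulary terms, rows sum to 1);
# in practice this would come from the fitted LDA model, e.g. lda.get_topics().
rng = np.random.default_rng(0)
topic_word = rng.dirichlet(np.ones(500), size=7)

norms = np.linalg.norm(topic_word, axis=1)
sim = (topic_word @ topic_word.T) / np.outer(norms, norms)

graph = nx.Graph()
for i in range(sim.shape[0]):
    for j in range(i + 1, sim.shape[0]):
        if sim[i, j] > 0.2:            # arbitrary cut-off for "overlapping" themes
            graph.add_edge(f"theme_{i}", f"theme_{j}", weight=float(sim[i, j]))
print(graph.edges(data=True))
```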


Author(s):  
Bambang Subeno ◽  
Retno Kusumaningrum ◽  
Farikhin Farikhin

Latent Dirichlet Allocation (LDA) is a probabilistic model for grouping hidden topics in documents into a predefined number of topics. If the number of topics K is determined incorrectly, words will correlate only weakly with topics; a number of topics that is too large or too small causes inaccuracies when grouping topics during the formation of training models. This study aims to determine the optimal number of corpus topics in the LDA method using the maximum likelihood and Minimum Description Length (MDL) approaches. The experiments use Indonesian news articles: the numbers of documents are 25, 50, 90, and 600, and the corresponding numbers of words are 3898, 7760, 13005, and 4365. The results show that the maximum likelihood and MDL approaches yield the same optimal number of topics. The optimal number of topics is influenced by the alpha and beta parameters. In addition, the number of documents does not affect computation time, but the number of words does: the computation times for these datasets are 2.9721, 6.49637, 13.2967, and 3.7152 seconds, respectively. The optimal number of LDA topics was then used in a classification model. This experiment shows that the highest average accuracy is 61%, obtained with alpha 0.1 and beta 0.001.
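As a simplified, assumed illustration of how an MDL-style criterion trades model fit against complexity (not the authors' implementation; the log-likelihoods and vocabulary size below are hypothetical), each candidate K can be scored as minus the log-likelihood plus a parameter-count penalty, and the K with the lowest score wins:

```python
# Hypothetical illustration of a two-part MDL criterion for choosing K:
# MDL(K) = -log L(K) + (p(K) / 2) * log(N), with p(K) the free parameters of a
# K-topic LDA (K*(V-1) topic-word + D*(K-1) document-topic probabilities).
import math

def mdl_score(log_likelihood, k, vocab_size, num_docs, num_tokens):
    num_params = k * (vocab_size - 1) + num_docs * (k - 1)
    return -log_likelihood + 0.5 * num_params * math.log(num_tokens)

# Hypothetical log-likelihoods for candidate topic counts; lowest MDL wins.
for k, ll in [(5, -31000.0), (10, -29500.0), (15, -29200.0)]:
    print(k, round(mdl_score(ll, k, vocab_size=1200, num_docs=600, num_tokens=4365)))
```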

