Discovering Computer Science Research Topic Trends using Latent Dirichlet Allocation

2021 · Vol 6 (1) · pp. 17
Author(s): Kartika Rizqi Nastiti, Ahmad Fathan Hidayatullah, Ahmad Rafie Pratama

Before conducting a research project, researchers must identify the trends and the state of the art in their research field. However, that is not necessarily an easy job, partly due to the lack of specific tools for filtering the required information by time range. This study aims to provide a solution to that problem by applying a topic modeling approach to data scraped from Google Scholar between 2010 and 2019. We utilized Latent Dirichlet Allocation (LDA) combined with Term Frequency-Inverse Document Frequency (TF-IDF) to build topic models and employed the coherence score to determine how many distinct topics each year's data contains. We also provided a visualization of the topic interpretation and the word distribution for each topic, as well as its relevance, using word clouds and pyLDAvis. In the future, we expect to add more features that show the relevance of and interconnections between topics, to make this tool even easier for researchers to use in their research projects.
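A minimal sketch of the per-year modeling step this abstract describes, using gensim; `tokenized_docs` (one token list per scraped abstract) and the candidate topic-count range are illustrative assumptions, not details from the paper.

```python
# Minimal sketch, assuming `tokenized_docs` holds one token list per
# scraped abstract for a given year; the k range is illustrative.
from gensim.corpora import Dictionary
from gensim.models import TfidfModel, LdaModel, CoherenceModel

def best_lda_for_year(tokenized_docs, k_range=range(2, 11)):
    dictionary = Dictionary(tokenized_docs)
    bow = [dictionary.doc2bow(doc) for doc in tokenized_docs]
    tfidf_corpus = TfidfModel(bow)[bow]  # reweight raw counts with TF-IDF

    best_model, best_score = None, float("-inf")
    for k in k_range:
        lda = LdaModel(tfidf_corpus, num_topics=k, id2word=dictionary,
                       random_state=42, passes=10)
        score = CoherenceModel(model=lda, texts=tokenized_docs,
                               dictionary=dictionary,
                               coherence="c_v").get_coherence()
        if score > best_score:  # keep the topic count with the best coherence
            best_model, best_score = lda, score
    return best_model, best_score
```

The winning model for each year can then be handed to pyLDAvis for the interactive topic-relevance view mentioned above.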

AI · 2021 · Vol 2 (2) · pp. 179-194
Author(s): Nils Horn, Fabian Gampfer, Rüdiger Buchkremer

As the amount of scientific information increases steadily, it is crucial to improve fast-reading comprehension. To grasp many scientific articles in a short period, artificial intelligence becomes essential. This paper applies artificial intelligence methodologies to examine broad topics, such as enterprise architecture, in scientific articles. Analyzing abstracts with latent Dirichlet allocation or inverse document frequency appears to be more beneficial than exploring full texts. Furthermore, we demonstrate that t-distributed stochastic neighbor embedding (t-SNE) is well suited to exploring the degree of connectivity to neighboring topics, such as complexity theory. Artificial intelligence produces results similar to those obtained by manual reading. Our full-text study confirms enterprise architecture trends such as sustainability and modeling languages.
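As a rough illustration of the t-SNE step, the sketch below projects per-document topic mixtures into two dimensions; the Dirichlet-sampled `doc_topic` matrix is a synthetic stand-in for real LDA output, and all names are assumptions.

```python
# Hedged sketch: project doc-topic distributions to 2-D with t-SNE and
# read off topic neighborhoods. `doc_topic` is synthetic stand-in data.
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
doc_topic = rng.dirichlet(np.ones(20), size=500)  # (n_docs, n_topics)

embedding = TSNE(n_components=2, perplexity=30.0, init="pca",
                 random_state=0).fit_transform(doc_topic)
dominant_topic = doc_topic.argmax(axis=1)  # color key for a scatter plot
# documents whose dominant topics land close together in `embedding`
# suggest connected themes (e.g. enterprise architecture vs. complexity)
```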


Author(s): Nur Annisa Tresnasari, Teguh Bharata Adji, Adhistya Erna Permanasari

Children are the future of the nation, and all the treatment and learning they receive will affect their future. Nowadays, there are various kinds of social problems related to children. To ensure the right solution to a child's problem, social workers usually refer to social-child-case (SCC) documents to find similar cases from the past and adapt their solutions. Nevertheless, reading through a large number of documents to find similar cases is tedious and time-consuming. Hence, this work aims to categorize those documents into several groups according to case type. We use topic modeling with the Latent Dirichlet Allocation (LDA) approach to extract topics from the documents and classify them based on their similarities. Coherence score and perplexity graphs are used to determine the best model. The result is a model with 5 topics that match the targeted case types, supporting the reuse of knowledge about SCC handling by easing the retrieval of documents with similar cases.
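A sketch of the coherence-versus-perplexity model selection this abstract relies on, assuming `texts` is a list of tokenized SCC documents; the candidate topic counts are illustrative (the paper's best model had 5 topics).

```python
# Sketch of coherence-vs-perplexity model selection, assuming `texts`
# is a list of tokenized SCC documents; the k values are illustrative.
from gensim.corpora import Dictionary
from gensim.models import LdaModel, CoherenceModel

def score_topic_counts(texts, k_values=(2, 3, 4, 5, 6, 7)):
    dictionary = Dictionary(texts)
    corpus = [dictionary.doc2bow(t) for t in texts]
    scores = []
    for k in k_values:
        lda = LdaModel(corpus, num_topics=k, id2word=dictionary,
                       random_state=0, passes=10)
        coherence = CoherenceModel(model=lda, texts=texts,
                                   dictionary=dictionary,
                                   coherence="c_v").get_coherence()
        log_perplexity = lda.log_perplexity(corpus)  # per-word bound
        scores.append((k, coherence, log_perplexity))
    return scores  # plot both curves and pick the coherence peak
```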


2019 · pp. 108-118
Author(s): Shushanik Sargsyan, Edita Gzoyan

Based on the Web of Science InCites database, this article analyzes the publication output of the Transcaucasian and Baltic states in the computer science research field, their citations, and their international collaboration in computer science. The results show that the Baltic states produce nearly four times as many computer science publications as the Transcaucasian states. Among the Baltic states, Lithuania holds the leading position, followed by Estonia and Latvia; in Transcaucasia, Azerbaijan leads, followed by Armenia and Georgia. The same picture emerges for citations of the computer science publications of the studied states. Within the international collaboration framework, European states are the Baltic states' most frequent partners. The same tendency holds for Georgia and Armenia, while Azerbaijan shows a markedly different vector of scientific internationalization.


Author(s): Gabriel Grill

Efforts to surveil social media platforms at scale using big data techniques have recently manifested in government-funded research to predict protests following the election of President Trump. This work is part of a computer science research field focused on online "civil unrest prediction" dedicated to forecasting protests across the globe (e.g., in Indonesia, Brazil, and Australia). Researchers draw on established data science techniques such as event detection and prediction, but also conceive approaches specific to surveilling social movements. Besides furthering the academic knowledge base on civil unrest and protests, work in this field envisions supporting a variety of stakeholders with different interests, such as governments, the military, law enforcement, human rights organizations, and industries such as insurance and supply chain management. I analyze the recent history of civil unrest prediction on social media platforms by examining discourses, implicated actors, and technological affordances as encountered in publications and other public online artifacts. In this paper, I discuss the different risk frames employed by researchers, consider the politics of the technology, and argue for a needed public debate on the role of online protest surveillance in democratic societies.


Author(s): Sujatha Arun Kokatnoor, Balachandran Krishnan

The main focus of this research is to find the reasons behind fresh cases of COVID-19 from the public's perception, using data specific to India. The analysis is done using machine learning approaches, and the inferences are validated with medical professionals. The data processing and analysis proceed in three steps. First, the dimensionality of the vector space model (VSM) is reduced with an improvised feature engineering (FE) process that uses a weighted term frequency-inverse document frequency (TF-IDF) and forward scan trigrams (FST), followed by the removal of weak features using a feature hashing technique. In the second step, an enhanced K-means clustering algorithm is used to group the public posts collected from Twitter®. In the last step, latent Dirichlet allocation (LDA) is applied to discover the trigram topics relevant to the reasons behind the increase in fresh COVID-19 cases. The enhanced K-means clustering improved the Dunn index value by 18.11% compared with the traditional K-means method. By incorporating the improvised two-step FE process, the LDA model improved by 14% in terms of coherence score, and by 19% and 15% compared with latent semantic analysis (LSA) and the hierarchical Dirichlet process (HDP) respectively, resulting in 14 root causes for the spike in the disease.
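A condensed sketch of the three-step pipeline, under stated assumptions: HashingVectorizer stands in for the feature-hashing step, plain KMeans stands in for the paper's enhanced variant, and the Dunn index is computed from pairwise distances; the sample tweets and cluster count are illustrative.

```python
# Hedged sketch of the pipeline: hashed n-gram features with TF-IDF
# weighting, K-means clustering, and a Dunn-index check. Plain KMeans
# stands in for the paper's enhanced variant; the data is illustrative.
import numpy as np
from sklearn.feature_extraction.text import HashingVectorizer, TfidfTransformer
from sklearn.cluster import KMeans
from sklearn.metrics import pairwise_distances

tweets = ["lockdown fatigue rising cases", "vaccine hesitancy in rural areas",
          "crowded markets ignore distancing", "mask mandates widely flouted",
          "festival gatherings spread infection", "testing delays hide spread"]

hashed = HashingVectorizer(ngram_range=(1, 3), n_features=2**10,
                           alternate_sign=False).fit_transform(tweets)
X = TfidfTransformer().fit_transform(hashed)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

def dunn_index(X, labels):
    """Smallest inter-cluster gap divided by largest intra-cluster diameter."""
    D = pairwise_distances(X)
    ids = np.unique(labels)
    intra = max(D[np.ix_(labels == c, labels == c)].max() for c in ids)
    inter = min(D[np.ix_(labels == a, labels == b)].min()
                for a in ids for b in ids if a < b)
    return inter / intra

print(dunn_index(X, labels))  # higher means better-separated clusters
```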


Author(s): Priyanka R. Patil, Shital A. Patil

Similarity View is an application for visually comparing and exploring multiple models of text over a collection of documents. Friendbook discovers users' lifestyles from user-centric sensor data, measures the similarity of lifestyles between users, and recommends friends to users if their lifestyles are highly similar. Modeling a user's daily life as life documents, their lifestyles are extracted using the Latent Dirichlet Allocation algorithm. Manual techniques cannot be used for checking research papers, as the assigned reviewer may have insufficient knowledge of the research disciplines or differing subjective views, causing possible misinterpretations. There is an urgent need for an effective and feasible approach to checking submitted research papers with the support of automated software. Text mining methods address the problem of automatically checking research papers semantically. The proposed method finds the similarity of text across the collection of documents using the Latent Dirichlet Allocation (LDA) algorithm and Latent Semantic Analysis (LSA): an LSA-with-synonym variant finds synonyms of the text, index-wise, using the English WordNet dictionary, while an LSA-without-synonym variant finds the similarity of text based on the index alone. The accuracy of LSA with synonyms is higher when synonyms are considered for matching.
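A minimal sketch contrasting LSA with and without WordNet synonym expansion; the expansion rule (appending lemmas of each token's first synset) is an illustrative assumption, not the paper's exact algorithm.

```python
# Sketch: LSA document similarity with optional WordNet synonym
# expansion. The expansion rule (lemmas of the first synset) is an
# assumption; requires nltk.download("wordnet") beforehand.
from nltk.corpus import wordnet
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

def expand_with_synonyms(text):
    tokens = text.lower().split()
    extras = []
    for tok in tokens:
        for syn in wordnet.synsets(tok)[:1]:           # first sense only
            extras += [l.name().replace("_", " ") for l in syn.lemmas()[:2]]
    return " ".join(tokens + extras)

def lsa_similarity(docs, use_synonyms=False, n_components=2):
    if use_synonyms:
        docs = [expand_with_synonyms(d) for d in docs]
    X = TfidfVectorizer().fit_transform(docs)
    Z = TruncatedSVD(n_components=n_components,
                     random_state=0).fit_transform(X)
    return cosine_similarity(Z)  # pairwise similarity in LSA space

# comparing lsa_similarity(docs) with lsa_similarity(docs, True) mirrors
# the LSA-without- vs. LSA-with-synonym comparison described above
```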

