Discovering Computer Science Research Topic Trends using Latent Dirichlet Allocation

2021 · Vol 6 (1) · pp. 17
Author(s): Kartika Rizqi Nastiti, Ahmad Fathan Hidayatullah, Ahmad Rafie Pratama

Before conducting a research project, researchers must identify the trends and the state of the art in their research field. However, that is not necessarily an easy job, partly due to the lack of specific tools for filtering the required information by time range. This study aims to provide a solution to that problem by applying a topic modeling approach to data scraped from Google Scholar between 2010 and 2019. We utilized Latent Dirichlet Allocation (LDA) combined with Term Frequency-Inverse Document Frequency (TF-IDF) to build topic models and employed the coherence score to determine how many distinct topics each year's data contains. We also provided a visualization of the topic interpretation and the word distribution for each topic, as well as its relevance, using word clouds and pyLDAvis. In the future, we expect to add more features that show the relevance of and interconnections between topics, to make this tool even easier for researchers to use in their research projects.
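A minimal sketch of the per-year modeling step this abstract describes, using gensim; `tokenized_docs` (one token list per scraped abstract) and the candidate topic-count range are illustrative assumptions, not details from the paper.

```python
# Minimal sketch, assuming `tokenized_docs` holds one token list per
# scraped abstract for a given year; the k range is illustrative.
from gensim.corpora import Dictionary
from gensim.models import TfidfModel, LdaModel, CoherenceModel

def best_lda_for_year(tokenized_docs, k_range=range(2, 11)):
    dictionary = Dictionary(tokenized_docs)
    bow = [dictionary.doc2bow(doc) for doc in tokenized_docs]
    tfidf_corpus = TfidfModel(bow)[bow]  # reweight raw counts with TF-IDF

    best_model, best_score = None, float("-inf")
    for k in k_range:
        lda = LdaModel(tfidf_corpus, num_topics=k, id2word=dictionary,
                       random_state=42, passes=10)
        score = CoherenceModel(model=lda, texts=tokenized_docs,
                               dictionary=dictionary,
                               coherence="c_v").get_coherence()
        if score > best_score:  # keep the topic count with the best coherence
            best_model, best_score = lda, score
    return best_model, best_score
```

The winning model for each year can then be handed to pyLDAvis for the interactive topic-relevance view mentioned above.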

AI · 2021 · Vol 2 (2) · pp. 179-194
Author(s): Nils Horn, Fabian Gampfer, Rüdiger Buchkremer

As the amount of scientific information increases steadily, it is crucial to improve fast-reading comprehension. To grasp many scientific articles in a short period, artificial intelligence becomes essential. This paper applies artificial intelligence methodologies to examine broad topics, such as enterprise architecture, in scientific articles. Analyzing abstracts with latent Dirichlet allocation or inverse document frequency appears to be more beneficial than exploring full texts. Furthermore, we demonstrate that t-distributed stochastic neighbor embedding (t-SNE) is well suited to exploring the degree of connectivity to neighboring topics, such as complexity theory. Artificial intelligence produces results similar to those obtained by manual reading. Our full-text study confirms enterprise architecture trends such as sustainability and modeling languages.
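As a rough illustration of the t-SNE step, the sketch below projects per-document topic mixtures into two dimensions; the Dirichlet-sampled `doc_topic` matrix is a synthetic stand-in for real LDA output, and all names are assumptions.

```python
# Hedged sketch: project doc-topic distributions to 2-D with t-SNE and
# read off topic neighborhoods. `doc_topic` is synthetic stand-in data.
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
doc_topic = rng.dirichlet(np.ones(20), size=500)  # (n_docs, n_topics)

embedding = TSNE(n_components=2, perplexity=30.0, init="pca",
                 random_state=0).fit_transform(doc_topic)
dominant_topic = doc_topic.argmax(axis=1)  # color key for a scatter plot
# documents whose dominant topics land close together in `embedding`
# suggest connected themes (e.g. enterprise architecture vs. complexity)
```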


Author(s): Nur Annisa Tresnasari, Teguh Bharata Adji, Adhistya Erna Permanasari

Children are the future of the nation, and all the treatment and learning they receive will affect their future. Nowadays, there are various kinds of social problems related to children. To ensure the right solution to a child's problem, social workers usually refer to social-child-case (SCC) documents to find similar cases from the past and adapt their solutions. Nevertheless, reading through a large number of documents to find similar cases is tedious and time-consuming. Hence, this work aims to categorize those documents into several groups according to case type. We use topic modeling with the Latent Dirichlet Allocation (LDA) approach to extract topics from the documents and classify them based on their similarities. Coherence score and perplexity graphs are used to determine the best model. The result is a model with 5 topics that match the targeted case types, supporting the reuse of knowledge about SCC handling by easing the retrieval of documents with similar cases.
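A sketch of the coherence-versus-perplexity model selection this abstract relies on, assuming `texts` is a list of tokenized SCC documents; the candidate topic counts are illustrative (the paper's best model had 5 topics).

```python
# Sketch of coherence-vs-perplexity model selection, assuming `texts`
# is a list of tokenized SCC documents; the k values are illustrative.
from gensim.corpora import Dictionary
from gensim.models import LdaModel, CoherenceModel

def score_topic_counts(texts, k_values=(2, 3, 4, 5, 6, 7)):
    dictionary = Dictionary(texts)
    corpus = [dictionary.doc2bow(t) for t in texts]
    scores = []
    for k in k_values:
        lda = LdaModel(corpus, num_topics=k, id2word=dictionary,
                       random_state=0, passes=10)
        coherence = CoherenceModel(model=lda, texts=texts,
                                   dictionary=dictionary,
                                   coherence="c_v").get_coherence()
        log_perplexity = lda.log_perplexity(corpus)  # per-word bound
        scores.append((k, coherence, log_perplexity))
    return scores  # plot both curves and pick the coherence peak
```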


2019 · pp. 108-118
Author(s): Shushanik Sargsyan, Edita Gzoyan

Based on the Web of Science InCites database, this article analyzes the publication output of the Transcaucasian and Baltic states in the computer science research field, their citations, and their international collaboration in computer science. The results show that the Baltic states produce nearly four times as many computer science publications as the Transcaucasian states. Among the Baltic states, Lithuania holds the leading position, followed by Estonia and Latvia; in Transcaucasia, Azerbaijan leads, followed by Armenia and Georgia. The same picture emerges for citations of the computer science publications of the studied states. Within the international collaboration framework, European states are the Baltic states' most frequent partners. The same tendency holds for Georgia and Armenia, while Azerbaijan shows a markedly different vector of scientific internationalization.


Author(s): Gabriel Grill

Efforts to surveil social media platforms at scale using big data techniques have recently manifested in government-funded research to predict protests following the election of President Trump. This work is part of a computer science research field focused on online "civil unrest prediction" dedicated to forecasting protests across the globe (e.g., in Indonesia, Brazil, and Australia). Researchers draw on established data science techniques such as event detection and prediction, but also conceive approaches specific to surveilling social movements. Besides furthering the academic knowledge base on civil unrest and protests, work in this field envisions supporting a variety of stakeholders with different interests, such as governments, the military, law enforcement, human rights organizations, and industries such as insurance and supply chain management. I analyze the recent history of civil unrest prediction on social media platforms by examining discourses, implicated actors, and technological affordances as encountered in publications and other public online artifacts. In this paper, I discuss the different risk frames employed by researchers, consider the politics of the technology, and argue for a needed public debate on the role of online protest surveillance in democratic societies.


Author(s): Sujatha Arun Kokatnoor, Balachandran Krishnan

The main focus of this research is to find the reasons behind fresh cases of COVID-19 from the public's perception, using data specific to India. The analysis is done using machine learning approaches, and the inferences are validated with medical professionals. The data processing and analysis proceed in three steps. First, the dimensionality of the vector space model (VSM) is reduced with an improvised feature engineering (FE) process that uses a weighted term frequency-inverse document frequency (TF-IDF) and forward scan trigrams (FST), followed by the removal of weak features using a feature hashing technique. In the second step, an enhanced K-means clustering algorithm is used to group the public posts collected from Twitter®. In the last step, latent Dirichlet allocation (LDA) is applied to discover the trigram topics relevant to the reasons behind the increase in fresh COVID-19 cases. The enhanced K-means clustering improved the Dunn index value by 18.11% compared with the traditional K-means method. By incorporating the improvised two-step FE process, the LDA model improved by 14% in terms of coherence score, and by 19% and 15% compared with latent semantic analysis (LSA) and the hierarchical Dirichlet process (HDP) respectively, resulting in 14 root causes for the spike in the disease.
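A condensed sketch of the three-step pipeline, under stated assumptions: HashingVectorizer stands in for the feature-hashing step, plain KMeans stands in for the paper's enhanced variant, and the Dunn index is computed from pairwise distances; the sample tweets and cluster count are illustrative.

```python
# Hedged sketch of the pipeline: hashed n-gram features with TF-IDF
# weighting, K-means clustering, and a Dunn-index check. Plain KMeans
# stands in for the paper's enhanced variant; the data is illustrative.
import numpy as np
from sklearn.feature_extraction.text import HashingVectorizer, TfidfTransformer
from sklearn.cluster import KMeans
from sklearn.metrics import pairwise_distances

tweets = ["lockdown fatigue rising cases", "vaccine hesitancy in rural areas",
          "crowded markets ignore distancing", "mask mandates widely flouted",
          "festival gatherings spread infection", "testing delays hide spread"]

hashed = HashingVectorizer(ngram_range=(1, 3), n_features=2**10,
                           alternate_sign=False).fit_transform(tweets)
X = TfidfTransformer().fit_transform(hashed)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

def dunn_index(X, labels):
    """Smallest inter-cluster gap divided by largest intra-cluster diameter."""
    D = pairwise_distances(X)
    ids = np.unique(labels)
    intra = max(D[np.ix_(labels == c, labels == c)].max() for c in ids)
    inter = min(D[np.ix_(labels == a, labels == b)].min()
                for a in ids for b in ids if a < b)
    return inter / intra

print(dunn_index(X, labels))  # higher means better-separated clusters
```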


Author(s): Priyanka R. Patil, Shital A. Patil

Similarity View is an application for visually comparing and exploring multiple models of text over a collection of documents. Friendbook discovers users' lifestyles from user-centric sensor data, measures the similarity of lifestyles between users, and recommends friends to users if their lifestyles are highly similar. Modeling a user's daily life as life documents, their lifestyles are extracted using the Latent Dirichlet Allocation algorithm. Manual techniques cannot be used for checking research papers, as the assigned reviewer may have insufficient knowledge of the research disciplines or differing subjective views, causing possible misinterpretations. There is an urgent need for an effective and feasible approach to checking submitted research papers with the support of automated software. Text mining methods address the problem of automatically checking research papers semantically. The proposed method finds the similarity of text across the collection of documents using the Latent Dirichlet Allocation (LDA) algorithm and Latent Semantic Analysis (LSA): an LSA-with-synonym variant finds synonyms of the text, index-wise, using the English WordNet dictionary, while an LSA-without-synonym variant finds the similarity of text based on the index alone. The accuracy of LSA with synonyms is higher when synonyms are considered for matching.
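A minimal sketch contrasting LSA with and without WordNet synonym expansion; the expansion rule (appending lemmas of each token's first synset) is an illustrative assumption, not the paper's exact algorithm.

```python
# Sketch: LSA document similarity with optional WordNet synonym
# expansion. The expansion rule (lemmas of the first synset) is an
# assumption; requires nltk.download("wordnet") beforehand.
from nltk.corpus import wordnet
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

def expand_with_synonyms(text):
    tokens = text.lower().split()
    extras = []
    for tok in tokens:
        for syn in wordnet.synsets(tok)[:1]:           # first sense only
            extras += [l.name().replace("_", " ") for l in syn.lemmas()[:2]]
    return " ".join(tokens + extras)

def lsa_similarity(docs, use_synonyms=False, n_components=2):
    if use_synonyms:
        docs = [expand_with_synonyms(d) for d in docs]
    X = TfidfVectorizer().fit_transform(docs)
    Z = TruncatedSVD(n_components=n_components,
                     random_state=0).fit_transform(X)
    return cosine_similarity(Z)  # pairwise similarity in LSA space

# comparing lsa_similarity(docs) with lsa_similarity(docs, True) mirrors
# the LSA-without- vs. LSA-with-synonym comparison described above
```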

