A Novel Approach for Clone Group Mapping by Using Topic Modeling

2015 ◽  
Vol 6 (2) ◽  
pp. 11-20
Author(s):  
Ruixia Zhang ◽  
Liping Zhang ◽  
Huan Wang ◽  
Zhuo Chen
2019 ◽  
Vol 10 (3) ◽  
pp. 38-50
Author(s):  
Nouh Talal Alhindawi ◽  
Belal Abu Ata ◽  
Lana Mahmoud Obeidat ◽  
Mohammad Subhi Al-Batah ◽  
Muad Abu-Ata

In information retrieval, the accuracy of the retrieval process depends mainly on the selection of query terms; the user must therefore choose these terms carefully. Traditionally, query terms are selected manually. However, over the last two decades, much research has been directed towards automating the selection and enhancement of query terms. This article presents a novel approach that relies on topic modeling for query building and expansion. Two open source systems were selected for the experiments. The results show that adding a topic's terms to the user's query clearly improves the query's quality and thus the ranking results.
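The query expansion idea can be illustrated with a minimal sketch. It assumes the topics have already been extracted and are available as ranked term lists; the function name `expand_query`, the `TOPICS` data, and the overlap-based topic selection are hypothetical choices made for illustration, not the procedure used in the article.

```python
from typing import Dict, List


def expand_query(query: str, topics: Dict[str, List[str]], extra_terms: int = 2) -> List[str]:
    """Append the leading terms of the best-matching topic to the query terms."""
    query_terms = query.lower().split()

    def overlap(terms: List[str]) -> int:
        # Number of query terms that also occur in the topic's term list.
        return len(set(query_terms) & set(terms))

    best_topic = max(topics, key=lambda name: overlap(topics[name]))
    if overlap(topics[best_topic]) == 0:
        return query_terms  # no topic matches, so the query is left unchanged
    additions = [t for t in topics[best_topic] if t not in query_terms][:extra_terms]
    return query_terms + additions


# Hypothetical topics, each a list of terms ranked by importance.
TOPICS = {
    "printing": ["print", "page", "printer", "margin", "preview"],
    "network": ["socket", "connection", "timeout", "proxy", "http"],
}

expanded = expand_query("printer crash", TOPICS)
unchanged = expand_query("database index", TOPICS)
```

Here the query "printer crash" shares a term with the hypothetical "printing" topic, so that topic's two highest-ranked unused terms are appended before the query is passed to the retrieval engine.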


2019 ◽  
Vol 06 (03) ◽  
pp. 285-309
Author(s):  
Thi Kim Thoa Ho ◽  
Quang Vu Bui ◽  
Marc Bui

In this research, we explore a novel approach to propagation processes on a network related to textual information by using topic modeling and pretopology theory. We first introduce the textual agent’s network, in which each agent is a node with specific properties, in particular the agent’s interest. An agent’s interest is represented by a probability distribution over topics, which is estimated from textual information using topic modeling. Based on the textual agent’s network, we propose two information diffusion models. The first model, Textual-Homo-IC, extends the independent cascade model: the probability of infection is based on homophily, which is measured by the similarity of the agents’ interests. In addition to formulating Textual-Homo-IC on a static network, we also apply it to a dynamic agent’s network in which both the structure and the nodes’ properties change during the spreading process. We conducted experiments on two datasets, collected from NIPS and from the social network platform Twitter, and obtained satisfactory results. We further study the dissemination process on a multi-relational agent’s network by integrating the pseudo-closure function from pretopology theory into the cascade model. By using pseudo-closure or stochastic pseudo-closure functions to define the set of neighbors, we can capture more complex kinds of neighbors of a set. The second model we propose, Textual-Homo-PCM, extends the pretopological cascade model, a general model for information diffusion processes that can take place in more complex networks such as multi-relational networks or stochastic graphs. In Textual-Homo-PCM, pretopology theory is applied to determine the neighborhood set on a multi-relational agent’s network through pseudo-closure functions, and a threshold rule based on homophily is used for activation. Experiments simulating Textual-Homo-PCM produced the expected results. This paper is an extended version of our paper [T. K. T. Ho, Q. V. Bui and M. Bui, Homophily independent cascade diffusion model based on textual information, in Computational Collective Intelligence, eds. N. T. Nguyen, E. Pimenidis, Z. Khan and B. Trawiski, Lecture Notes in Computer Science, Vol. 11055 (Springer International Publishing, 2018), pp. 134–145], presented at the ICCCI 2018 conference.
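As a rough illustration of an independent cascade whose infection probability depends on interest similarity, the sketch below runs one cascade on a static directed graph. Using cosine similarity between topic distributions as the infection probability is an assumption made here for illustration; the abstract does not state which similarity measure Textual-Homo-IC uses, and the names `cosine` and `homophily_cascade` are hypothetical.

```python
import math
import random
from typing import Dict, List, Sequence, Set


def cosine(p: Sequence[float], q: Sequence[float]) -> float:
    """Cosine similarity between two topic distributions (0.0 for a zero vector)."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm if norm else 0.0


def homophily_cascade(
    graph: Dict[str, List[str]],
    interests: Dict[str, Sequence[float]],
    seeds: Set[str],
    rng: random.Random,
) -> Set[str]:
    """Run one independent cascade; each newly active node gets a single
    chance to infect each inactive out-neighbor, with probability equal to
    the (assumed) cosine similarity of their interests."""
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for u in frontier:
            for v in graph.get(u, []):
                if v in active:
                    continue
                if rng.random() < cosine(interests[u], interests[v]):
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return active


# Hypothetical four-agent network: a, b, c share one interest, d has another.
GRAPH = {"a": ["b", "d"], "b": ["c"]}
INTERESTS = {"a": [1.0, 0.0], "b": [1.0, 0.0], "c": [1.0, 0.0], "d": [0.0, 1.0]}

reached = homophily_cascade(GRAPH, INTERESTS, {"a"}, random.Random(0))
```

In this toy network the similarities are exactly 1 or 0, so the cascade always reaches the like-minded agents and never the dissimilar one. With real topic distributions the outcome is stochastic and would be averaged over many runs.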


2021 ◽  
Vol 9 (1) ◽  
pp. 1270-1282
Author(s):  
Venkateswara Rao P ◽  
A.P Siva kumar

The emerging trend in technical research is to use customer-generated data collected by community media to probe community opinion and scientific communication on employment and care issues. This review of the collected data, the launch of a question-and-answer social website, is a separate stack for exploring the key factors that influence public preferences for technical knowledge and opinions. By means of a web search engine, topic modeling, and regression data modeling, this study quantified the effect of a response's textual and auxiliary features on the number of votes the response received. Compared to previous studies based on open estimates, the model results show that Quora users are more likely to only talk about technology. It can fail when the keywords in the query do not match the text content of large documents that contain relevant questions of existing methods, i.e. CNNMF and NMF, as well as some restrictions are not enough. Also, users are often not experts and provide ambiguous queries, which leads to mixed results and to problems with existing methods. To address this problem, in this article we propose a Hadoop model, distributed using semantics, non-negative matrix factorization (HDiSANNMF), to find topics for short texts. It effectively incorporates the semantic correlations of the word context into the model, where the semantic connections between words and their context are learned by omitting the grammatical view of the corpus. The researchers are trying to reorganize the main results and present modern techniques for modeling distributed themes to address technologies and platforms with increasing attributes, as well as how much time and space it takes to generate the model. This document briefly describes the structure of public questions and answers around the world and tracks the development of the main topics, housing and employment opportunities for next-generation technologies, around the world in real time.


2019 ◽  
Vol 8 (2S8) ◽  
pp. 1366-1371

Topic modeling, such as LDA, is considered a useful tool for the statistical analysis of text document collections and other text-based data. Recently, topic modeling has become an attractive research field due to its wide range of applications. However, traditional topic models such as LDA still have disadvantages, owing to the shortcomings of the bag-of-words (BOW) model and to low performance in handling large text corpora. Therefore, in this paper, we present a novel topic model, called LDA-GOW, which combines the word co-occurrence model, also called the graph-of-words (GOW) model, with the traditional LDA topic discovery model. The LDA-GOW topic model is able not only to extract more informative topics from text but also to improve the topic discovery process on large-scale text corpora. We compare our proposed model with the traditional LDA topic model on several standardized datasets, including WebKB, Reuters-R8, and annotated scientific documents collected from the ACM Digital Library, to demonstrate its effectiveness. Over all experiments, our proposed LDA-GOW model achieves approximately 70.86% accuracy.
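To make the graph-of-words idea concrete, the sketch below builds an undirected, weighted co-occurrence graph from a token sequence with a sliding window. It covers only the co-occurrence graph, not the combination with LDA, and the function name `graph_of_words` and the window convention (each token is linked to the `window` tokens that follow it) are assumptions made for illustration rather than details taken from the paper.

```python
from collections import Counter
from typing import List, Tuple


def graph_of_words(tokens: List[str], window: int = 2) -> Counter:
    """Count co-occurrences of distinct words within a sliding window.

    Returns a Counter mapping an alphabetically sorted word pair (an
    undirected edge) to the number of times the pair co-occurs.
    """
    edges: Counter = Counter()
    for i, left in enumerate(tokens):
        # Link the current token to the next `window` tokens.
        for right in tokens[i + 1 : i + 1 + window]:
            if left != right:  # skip self-loops
                edge: Tuple[str, str] = tuple(sorted((left, right)))
                edges[edge] += 1
    return edges


edges = graph_of_words("topic model graph topic model".split(), window=2)
```

Unlike a bag-of-words count, the edge weights retain some information about word proximity, which is the kind of information the bag-of-words model discards.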


2016 ◽  
Vol 25 (01) ◽  
pp. 1660002 ◽  
Author(s):  
Guangbing Yang

Oft-decried information overload is a serious problem that negatively impacts the comprehension of information in the digital age. Text summarization is a helpful process that can alleviate this problem. To enhance the performance of multi-document summarization, this study proposes a novel approach based on a mixture model, consisting of a contextual topic model from the Bayesian hierarchical topic modeling family for selecting candidate summary sentences, and a machine-learning regression model for generating the summary. By investigating hierarchical topics and their correlations with respect to the lexical co-occurrences of words, the proposed contextual topic model can determine the relevance of sentences more effectively, recognize latent topics, and arrange them hierarchically. The quantitative evaluation results from a practical application demonstrate that a system implementing this model can significantly improve the performance of summarization and make it comparable to state-of-the-art summarization systems.

