Topic Modeling for Unsupervised Concept Extraction and Document Ranking

Author(s):  
V. S. Anoop ◽  
S. Asharaf ◽  
P. Deepak
Author(s):  
Youngseok Lee ◽  
Jungwon Cho

In this paper, we propose a web document ranking method using topic modeling for effective information collection and classification. The proposed method is applied to the document ranking technique to avoid duplicated crawling when crawling at high speed. Through the proposed document ranking technique, it is feasible to remove redundant documents, classify the documents efficiently, and confirm that the crawler service is running. The proposed method enables rapid collection of many web documents; the user can search the web pages with constant data update efficiently. In addition, the efficiency of data retrieval can be improved because new information can be automatically classified and transmitted. By expanding the scope of the method to big data based web pages and improving it for application to various websites, it is expected that more effective information retrieval will be possible.


Author(s):  
Maria A. Milkova

Nowadays the process of information accumulation is so rapid that the concept of the usual iterative search requires revision. Being in the world of oversaturated information in order to comprehensively cover and analyze the problem under study, it is necessary to make high demands on the search methods. An innovative approach to search should flexibly take into account the large amount of already accumulated knowledge and a priori requirements for results. The results, in turn, should immediately provide a roadmap of the direction being studied with the possibility of as much detail as possible. The approach to search based on topic modeling, the so-called topic search, allows you to take into account all these requirements and thereby streamline the nature of working with information, increase the efficiency of knowledge production, avoid cognitive biases in the perception of information, which is important both on micro and macro level. In order to demonstrate an example of applying topic search, the article considers the task of analyzing an import substitution program based on patent data. The program includes plans for 22 industries and contains more than 1,500 products and technologies for the proposed import substitution. The use of patent search based on topic modeling allows to search immediately by the blocks of a priori information – terms of industrial plans for import substitution and at the output get a selection of relevant documents for each of the industries. This approach allows not only to provide a comprehensive picture of the effectiveness of the program as a whole, but also to visually obtain more detailed information about which groups of products and technologies have been patented.


2020 ◽  
Vol 16 (2) ◽  
pp. 83-115
Author(s):  
Mira Kim ◽  
◽  
Hye Sun Hwang ◽  
Xu Li

2019 ◽  
Vol 58 (6) ◽  
pp. 197-207
Author(s):  
Juhae Baeck ◽  
Hyungil Kwon ◽  
Mihwa Choi ◽  
Yi-Hsiu Lin

Sign in / Sign up

Export Citation Format

Share Document