A method of query expansion based on topic models and user profile for search in folksonomy

2021 ◽  
pp. 1-11
Author(s):  
Zhinan Gou ◽  
Yan Li

With the development of Web 2.0 communities, information retrieval has been widely applied on top of collaborative tagging systems. However, users often issue brief queries of only one or two keywords, which leads to problems such as inaccurate query words, information overload and information disorientation. Query expansion addresses this issue by reformulating each search query with additional words. After analyzing the limitations of existing query expansion methods in folksonomy, this paper proposes a novel query expansion method, based on user profile and topic model, for search in folksonomy. In detail, a topic model is first constructed using a variational autoencoder with Word2Vec. Then, query expansion is conducted using the user profile and the topic model. Finally, the proposed method is evaluated on a real dataset. Evaluation results show that the proposed method outperforms the baseline methods.

2016 ◽  
Vol 68 (4) ◽  
pp. 448-477 ◽  
Author(s):  
Dong Zhou ◽  
Séamus Lawless ◽  
Xuan Wu ◽  
Wenyu Zhao ◽  
Jianxun Liu

Purpose – With an increase in the amount of multilingual content on the World Wide Web, users are often striving to access information provided in a language of which they are non-native speakers. The purpose of this paper is to present a comprehensive study of user profile representation techniques and investigate their use in personalized cross-language information retrieval (CLIR) systems through the means of personalized query expansion. Design/methodology/approach – The user profiles consist of weighted terms computed by using frequency-based methods such as tf-idf and BM25, as well as various latent semantic models trained on monolingual documents and cross-lingual comparable documents. This paper also proposes an automatic evaluation method for comparing various user profile generation techniques and query expansion methods. Findings – Experimental results suggest that latent semantic-weighted user profile representation techniques are superior to frequency-based methods, and are particularly suitable for users with a sufficient amount of historical data. The study also confirmed that user profiles represented by latent semantic models trained on a cross-lingual level gained better performance than the models trained on a monolingual level. Originality/value – Previous studies on personalized information retrieval systems have primarily investigated user profiles and personalization strategies on a monolingual level. The effect of utilizing such monolingual profiles for personalized CLIR remains unclear. The current study fills the gap by a comprehensive study of user profile representation for personalized CLIR and a novel personalized CLIR evaluation methodology to ensure repeatable and controlled experiments can be conducted.
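
The frequency-based profile described above can be illustrated with a minimal sketch. It assumes a user profile is a dictionary of tf-idf weighted terms built from the user's historical documents, and that a query is expanded with the highest-weighted profile terms. The function names, the smoothed idf formula and the toy data are illustrative assumptions, not details taken from the paper.

```python
import math
from collections import Counter


def tfidf_profile(user_docs, all_docs):
    """Build a weighted-term user profile (hypothetical helper).

    user_docs: tokenized documents from the user's history.
    all_docs:  tokenized documents of the whole collection.
    """
    n_docs = len(all_docs)
    # Document frequency: number of collection documents containing each term.
    doc_freq = Counter()
    for doc in all_docs:
        doc_freq.update(set(doc))
    # Term frequency over the user's whole history.
    term_freq = Counter()
    for doc in user_docs:
        term_freq.update(doc)
    # Smoothed idf (an assumption) avoids division by zero for unseen terms.
    return {
        term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
        for term, count in term_freq.items()
    }


def expand_with_profile(query, profile, k=2):
    """Append the k highest-weighted profile terms not already in the query."""
    candidates = [(w, t) for t, w in profile.items() if t not in query]
    candidates.sort(key=lambda pair: (-pair[0], pair[1]))  # ties: alphabetical
    return list(query) + [term for _, term in candidates[:k]]


# Toy data, invented for illustration only.
COLLECTION = [
    ["jazz", "concert", "paris"],
    ["jazz", "guitar"],
    ["football", "paris"],
    ["football", "score"],
]
USER_HISTORY = COLLECTION[:2]

profile = tfidf_profile(USER_HISTORY, COLLECTION)
expanded = expand_with_profile(["jazz"], profile, k=2)
```

A BM25-weighted or latent semantic profile would replace only `tfidf_profile`; the expansion step could stay the same.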


Author(s):  
Max Chevalier ◽  
Christine Julien ◽  
Chantal Soulé-Dupuy

Searching for information is made possible by specific tools called Information Retrieval Systems (IRS), also called “search engines”. To provide more accurate results to users, most such systems offer personalization features. To do this, each system models a user in order to adapt the search results that will be displayed. In a multi-application context (e.g., when using several search engines for a single query), personalization techniques are limited because the user model (also called profile) is incomplete: it does not exploit actions or queries coming from other search engines. Sharing user models between several search engines is therefore a challenge that must be met to provide more efficient personalization techniques. A semantic architecture for user profile interoperability is proposed to reach this goal. This architecture is also important because it can be used in many other contexts to share various resource models, for instance a document model, between applications. It also lets every system keep its own representation of each resource while providing a solution to easily share it.


2019 ◽  
Vol 48 (4) ◽  
pp. 626-636
Author(s):  
Bo Xu ◽  
Hongfei Lin ◽  
Yuan Lin ◽  
Kan Xu ◽  
Lin Wang ◽  
...  

Microblog information retrieval has attracted much attention from researchers seeking to capture the desired information in daily communications on social networks. Since the contents of microblogs are often non-standardized and flexible, including many popular Internet expressions, the retrieval accuracy of microblogs has much room for improvement. To enhance microblog information retrieval, we propose a novel query expansion method to enrich user queries with semantic word representations. In our method, we use a neural network model to map each word in the corpus to a low-dimensional vector representation. The mapped word vectors support algebraic vector addition, and the new vector obtained by adding two word vectors can express some common attributes of the two words. In this sense, we represent keywords in user queries as vectors, sum all the keyword vectors, and use the obtained query vectors to select the expansion words. In addition, we combine the traditional pseudo-relevance feedback query expansion method with the proposed query expansion method. Experimental results show that the proposed method is effective and reduces noise in the expanded query, which improves the accuracy of microblog retrieval.
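
The vector-addition step described above can be sketched as follows. The sketch assumes word vectors are already available as a plain dictionary and that expansion words are chosen by cosine similarity to the summed query vector. The similarity measure, the function names and the three-dimensional toy vectors are assumptions made for illustration; the abstract does not specify them.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def expand_query(keywords, embeddings, k=2):
    """Return the k vocabulary words closest to the summed keyword vector."""
    known = [w for w in keywords if w in embeddings]
    if not known:
        return []
    dim = len(embeddings[known[0]])
    # Sum all keyword vectors into a single query vector.
    query_vec = [sum(embeddings[w][i] for w in known) for i in range(dim)]
    scored = [
        (cosine(query_vec, vec), word)
        for word, vec in embeddings.items()
        if word not in keywords
    ]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # ties: alphabetical
    return [word for _, word in scored[:k]]


# Toy vectors, invented for illustration; real ones would come from a
# trained neural embedding model.
TOY_EMBEDDINGS = {
    "phone": [0.9, 0.1, 0.0],
    "battery": [0.7, 0.3, 0.0],
    "charger": [0.8, 0.25, 0.05],
    "screen": [0.85, 0.05, 0.1],
    "recipe": [0.0, 0.1, 0.9],
    "pasta": [0.05, 0.0, 0.95],
}

expansion = expand_query(["phone", "battery"], TOY_EMBEDDINGS, k=2)
```

With these toy vectors the two device-related words rank above the cooking words, which is the behavior the summed query vector is meant to capture.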


2012 ◽  
Vol 3 (1) ◽  
pp. 18-30 ◽  
Author(s):  
Jiangning Wu ◽  
Yunfei Shi ◽  
Chonghui Guo

Collaborative tagging has become very popular with the development of Web 2.0, helping users manage, share and utilize resources effectively. With so many kinds of resources, recommending appropriate resources to the right users is the key problem in a tagging system. This paper proposes a user taste diffusion model based on the tripartite hypergraph to deal with the user-resource-tag tri-relation in folksonomies and the data sparsity problem in personalized recommendation. Through the defined tri-relation model and diffusion probability matrix, the user’s taste is diffused from the user to other users, resources and tags. When diffusion stops, the candidate resources are identified and then ranked according to their taste values. The top resources that have not been collected by the given user are selected as the final recommendations. Thanks to the iterative diffusion mechanism, the recommendation results cover not only the resources collected by the given user’s direct neighbors but also those collected by his/her extended neighbors. Experimental results show that our method performs better in terms of precision and recall than other recommendation methods.
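
The diffusion idea above can be approximated with a much simpler sketch. It treats each (user, resource, tag) annotation as three pairwise links, starts with all taste on one user, and at every step splits each node's taste evenly among its neighbors. The uniform split stands in for the paper's diffusion probability matrix, which the abstract does not define, so this is an assumption-laden illustration and not the authors' model; all names and data are hypothetical.

```python
from collections import defaultdict


def build_graph(triples):
    """Link user, resource and tag nodes that co-occur in an annotation."""
    adj = defaultdict(set)
    for user, resource, tag in triples:
        u, r, t = ("u", user), ("r", resource), ("t", tag)
        adj[u].update((r, t))
        adj[r].update((u, t))
        adj[t].update((u, r))
    return adj


def diffuse(adj, start, steps=2):
    """Spread taste from `start`, splitting evenly among neighbors each step."""
    taste = {start: 1.0}
    for _ in range(steps):
        nxt = defaultdict(float)
        for node, value in taste.items():
            neighbors = adj.get(node, ())
            if not neighbors:
                continue  # isolated node: its taste is dropped
            share = value / len(neighbors)
            for nb in neighbors:
                nxt[nb] += share
        taste = dict(nxt)
    return taste


def recommend(triples, user, steps=2, k=3):
    """Rank resources the user has not collected by their diffused taste."""
    adj = build_graph(triples)
    start = ("u", user)
    collected = {n for n in adj.get(start, ()) if n[0] == "r"}
    taste = diffuse(adj, start, steps)
    candidates = [
        (value, node[1])
        for node, value in taste.items()
        if node[0] == "r" and node not in collected
    ]
    candidates.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in candidates[:k]]


# Toy annotations, invented for illustration only.
TRIPLES = [
    ("alice", "r1", "python"),
    ("bob", "r1", "python"),
    ("bob", "r2", "python"),
    ("carol", "r3", "cooking"),
]

suggestions = recommend(TRIPLES, "alice")
```

In this toy data, taste reaches `r2` through the shared tag, while `r3` is never reached because nothing connects it to the starting user. Increasing `steps` lets taste travel to more distant neighbors.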


2018 ◽  
Vol 15 (2) ◽  
pp. 595-600
Author(s):  
R. Sathish Kumar ◽  
M. Chandrasekaran

Web query classification, the task of inferring topical categories from a web search query, is a non-trivial problem in the Information Retrieval domain. The topic categories inferred by a Web query classification system may provide a rich set of features for improving query expansion and web advertising. Conventional methods for Web query classification derive corpus statistics from the web and employ machine-learning techniques to infer Open Directory Project categories. But they suffer from two major drawbacks: the computational overhead of deriving corpus statistics, and the inference of topic categories that are too abstract for semantic discrimination due to polysemy. Concepts too shallow or too deep in the semantic gradient are produced when the wrong senses of the query terms coalesce with the correct senses. This paper proposes and demonstrates a succinct solution to these problems through a method based on the Tree Cut Model and the WordNet thesaurus to infer fine-grained topic categories for Web query classification, and also suggests an enhancement to the Tree Cut Model to resolve sense ambiguities.


2016 ◽  
Vol 40 (7) ◽  
pp. 1054-1070 ◽  
Author(s):  
Shihchieh Chou ◽  
Zhangting Dai

Purpose – Conventional studies mainly classify a term’s appearance in the retrieved documents as either relevant or irrelevant for application. The purpose of this paper is to differentiate the term’s appearances in the retrieved documents in more detailed situations to generate relevance information and demonstrate the applicability of the derived information in combination with current methods of query expansion. Design/methodology/approach – A method was designed first to utilize the derived information owing to term appearance differentiation within a conventional query expansion approach that has been proven as an effective technology in the enhancement of information retrieval. Then, an information retrieval system was developed to demonstrate the realization and sustain the study of the method. Formal tests were conducted to examine the distinguishing capability of the proposed information utilized in the method. Findings – The experimental results show that substantial differences in performances can be achieved between the proposed method and the conventional query expansion method alone. Practical implications – Since the proposed information resides at the bottom of the information hierarchy of relevance feedback, any technology regarding the application of relevance feedback information could consider the utilization of this piece of information. Originality/value – The importance of the study is the disclosure of the applicability of the proposed information beyond current usage of term appearances in relevant/irrelevant documents and the initiation of a query expansion technology in the application of this information.


2014 ◽  
Vol 667 ◽  
pp. 277-285 ◽  
Author(s):  
Fang Chen ◽  
Yan Hui Zhou

With the rapid development of the Internet, tag technology has been widely used on various sites. Brief text labels on network resources make it much more convenient for people to access massive data. Social tagging allows users to tag network objects with any word and to share these tags; because of its simple and flexible operation, it has become one of the most popular applications. However, there are problems such as noisy tags, a lack of usage criteria, and sparse distribution. In particular, the sparsity of tags seriously limits their application in the semantic analysis of web pages. This paper exploits a user-related tag expansion method to overcome this problem and, at the same time, uses the LDA topic model to model web tags, mine latent topics from large-scale web pages, and obtain the topic distribution of each text for text clustering analysis. The experimental results show that, compared with the traditional clustering algorithm, the LDA-based clustering method applied to web tags achieves a considerable improvement.


Author(s):  
Aicha Ghoulam ◽  
Fatiha Barigou ◽  
Ghalem Belalem ◽  
Farid Meziane

This article describes how many user queries contain references to named entities, which is particularly true in the medical field. Doctors express their information needs using medical entities because these are information-rich elements that help target relevant documents. At the same time, many resources, such as clinical reports (medical texts written by doctors), have been recognized as large containers of medical entities and the relationships between them. In this article, the authors present a query expansion method that uses medical entities and their semantic relations in the query context, based on an external resource in OWL. The goal of this method is to improve the effectiveness of an information retrieval system that supports doctors in easily accessing relevant information. Experiments on a collection of real clinical reports show that the approach yields improvements in precision, recall and MAP in medical information retrieval.

