Information Retrieval
Latest Publications

Total documents: 473 (five years: 51)
H-index: 41 (five years: 3)
Published by: Springer-Verlag
ISSN: 1573-7659, 1386-4564

Author(s): Meng Yuan, Justin Zobel, Pauline Lin

Abstract: Clustering of the contents of a document corpus is used to create sub-corpora, with the intention that each consists of documents that are related to each other. However, while clustering is used in a variety of ways in document applications such as information retrieval, and a range of methods have been applied to the task, there has been relatively little exploration of how well it works in practice. Indeed, given the high dimensionality of the data, it is possible that clustering may not always produce meaningful outcomes. In this paper we use a well-known clustering method to explore a variety of techniques, existing and novel, for measuring clustering effectiveness. Results with our new extrinsic techniques, based on relevance judgements or retrieved documents, demonstrate that retrieval-based information can be used to assess the quality of clustering, and also show that clustering can succeed to some extent at gathering together similar material. Further, they show that intrinsic clustering techniques that have been shown to be informative in other domains do not work for information retrieval. Whether clustering is sufficiently effective to have a significant impact on practical retrieval is unclear, but, as the results show, our measurement techniques can effectively distinguish between clustering methods.
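
To make the extrinsic idea concrete, the following minimal sketch (an illustration of the general idea, not the paper's exact method) clusters TF-IDF document vectors with k-means and then uses relevance judgements as an external signal: for each query it reports the fraction of that query's relevant documents that fall into the single best cluster. The function name, the qrels format and the choice of k-means are assumptions made for the example.

from collections import Counter

from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

def relevance_concentration(docs, qrels, n_clusters=10):
    """docs: {doc_id: text}; qrels: {query_id: set of relevant doc_ids} (assumed formats)."""
    doc_ids = list(docs)
    vectors = TfidfVectorizer(stop_words="english").fit_transform([docs[d] for d in doc_ids])
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(vectors)
    cluster_of = dict(zip(doc_ids, labels))

    scores = {}
    for qid, rel_docs in qrels.items():
        counts = Counter(cluster_of[d] for d in rel_docs if d in cluster_of)
        if counts:
            # Fraction of the query's relevant documents captured by its best cluster;
            # values near 1 suggest the clustering gathers related material together.
            scores[qid] = max(counts.values()) / sum(counts.values())
    return scores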


Author(s): Graham McDonald, Craig Macdonald, Iadh Ounis

Abstract: Providing users with relevant search results has been the primary focus of information retrieval research. However, focusing on relevance alone can lead to undesirable side effects. For example, small differences between the relevance scores of documents ranked by relevance alone can result in large differences in the exposure that the authors of relevant documents receive, i.e., the likelihood that the documents will be seen by searchers. Therefore, developing fair ranking techniques that try to ensure search results are not dominated, for example, by certain information sources is of growing interest as a way to mitigate such biases. In this work, we argue that generating fair rankings can be cast as a search results diversification problem across a number of assumed fairness groups, where groups can represent the demographics or other characteristics of information sources. In the context of academic search, as in the TREC Fair Ranking Track, which aims to be fair to unknown groups of authors, we evaluate three well-known search results diversification approaches from the literature for generating rankings that are fair to multiple assumed fairness groups, e.g. early-career researchers vs. highly experienced authors. Our experiments on the 2019 and 2020 TREC datasets show that explicit search results diversification is a viable approach for generating effective rankings that are fair to information sources. In particular, we show that building on xQuAD diversification as a fairness component can result in a significant (p < 0.05) increase (up to 50% in our experiments) in the fairness of exposure that authors from unknown protected groups receive.
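
The sketch below illustrates how an xQuAD-style greedy re-ranker can be repurposed for fairness by treating assumed fairness groups as the "aspects" to be covered. It shows only the general scoring scheme, not the authors' exact configuration; the function name, argument formats and the interpolation parameter lam are illustrative assumptions.

def xquad_fair_rerank(candidates, rel_scores, group_scores, group_weights, lam=0.5, k=100):
    """
    candidates:    candidate doc ids, initially ranked by relevance
    rel_scores:    {doc: P(doc | query)} relevance estimates
    group_scores:  {group: {doc: P(doc | query, group)}} per-group coverage estimates
    group_weights: {group: P(group | query)} importance of each fairness group
    """
    selected, remaining = [], list(candidates)
    # Probability that each group is still *not* covered by the selected documents.
    not_covered = {g: 1.0 for g in group_weights}

    while remaining and len(selected) < k:
        def gain(doc):
            coverage = sum(group_weights[g] * group_scores[g].get(doc, 0.0) * not_covered[g]
                           for g in group_weights)
            return (1 - lam) * rel_scores.get(doc, 0.0) + lam * coverage

        best = max(remaining, key=gain)
        remaining.remove(best)
        selected.append(best)
        for g in group_weights:
            not_covered[g] *= 1.0 - group_scores[g].get(best, 0.0)
    return selected

Here lam trades off plain relevance (lam = 0) against spreading exposure across the fairness groups (lam = 1).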


Author(s): Mohamed Trabelsi, Zhiyu Chen, Brian D. Davison, Jeff Heflin

Abstract: Ranking models are the main components of information retrieval systems. Several approaches to ranking are based on traditional machine learning algorithms that use a set of hand-crafted features. Recently, researchers have leveraged deep learning models in information retrieval. These models are trained end-to-end to extract features from the raw data for ranking tasks, thereby overcoming the limitations of hand-crafted features. A variety of deep learning models have been proposed, and each model presents a set of neural network components to extract features that are used for ranking. In this paper, we compare the proposed models in the literature along different dimensions in order to understand the major contributions and limitations of each model. In our discussion of the literature, we analyze the promising neural components and propose future research directions. We also show the analogy between document retrieval and other retrieval tasks in which the items to be ranked are structured documents, answers, images and videos.
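
As a deliberately simplified illustration of the end-to-end feature-extraction idea, the sketch below implements one common neural ranking pattern, a representation-based "dual encoder" that embeds query and document independently and scores them with a dot product. It is a generic stand-in for the model families such a survey compares, not any single surveyed architecture.

import torch
import torch.nn as nn

class DualEncoderRanker(nn.Module):
    def __init__(self, vocab_size, dim=128):
        super().__init__()
        # A single learned embedding table; mean-pooling turns token ids into a fixed vector.
        self.embed = nn.EmbeddingBag(vocab_size, dim, mode="mean")

    def forward(self, query_ids, doc_ids):
        # query_ids, doc_ids: LongTensors of shape (batch, seq_len)
        q = self.embed(query_ids)
        d = self.embed(doc_ids)
        return (q * d).sum(dim=-1)  # higher score = predicted more relevant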


2021, Vol. 24 (4-5), pp. 347-369
Author(s): Zaiqiao Meng, Richard McCreadie, Craig Macdonald, Iadh Ounis

Abstract: Representation learning has been widely applied in real-world recommendation systems to capture the features of both users and items. Existing grocery recommendation methods represent each user and item only as a single deterministic point in a low-dimensional continuous space, which limits the expressive ability of their embeddings and results in recommendation performance bottlenecks. In addition, existing representation learning methods for grocery recommendation consider the items (products) as independent entities, neglecting their other valuable side information, such as the textual descriptions and the categorical data of items. In this paper, we propose the Variational Bayesian Context-Aware Representation (VBCAR) model for grocery recommendation. VBCAR is a novel variational Bayesian model that learns distributional representations of users and items by leveraging basket context information from historical interactions. Our VBCAR model is also extendable to leverage side information by encoding contextual features into representations based on the inference encoder. We conduct extensive experiments on three real-world grocery datasets to assess the effectiveness of our model as well as the impact of different construction strategies for item side information. Our results show that our VBCAR model outperforms the current state-of-the-art grocery recommendation models, while integrating item side information (especially the categorical features with the textual information of items) results in further significant performance gains. Furthermore, we demonstrate through analysis that our model is able to effectively encode similarities between product types, which we argue is the primary reason for the observed effectiveness gains.
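
As a rough illustration of the variational part of this approach, the sketch below gives each user and item a Gaussian embedding (a mean and a log-variance), samples with the reparameterisation trick, and returns a KL term for the variational loss. It omits VBCAR's basket-context likelihood and side-information encoders, and the class and function names are illustrative assumptions.

import torch
import torch.nn as nn

class GaussianEmbeddings(nn.Module):
    def __init__(self, n_entities, dim=64):
        super().__init__()
        self.mu = nn.Embedding(n_entities, dim)
        self.log_var = nn.Embedding(n_entities, dim)

    def sample(self, ids):
        mu, log_var = self.mu(ids), self.log_var(ids)
        z = mu + torch.exp(0.5 * log_var) * torch.randn_like(mu)  # reparameterisation trick
        # KL divergence from the unit-Gaussian prior, one value per entity.
        kl = 0.5 * (mu.pow(2) + log_var.exp() - 1.0 - log_var).sum(dim=-1)
        return z, kl

def interaction_score(user_emb, item_emb, user_ids, item_ids):
    zu, kl_u = user_emb.sample(user_ids)
    zi, kl_i = item_emb.sample(item_ids)
    score = (zu * zi).sum(dim=-1)   # predicted user-item affinity from sampled embeddings
    return score, kl_u + kl_i       # the KL terms regularise the variational objective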


Author(s): Binsheng Liu, Xiaolu Lu, J. Shane Culpepper

Author(s): Chia-Yang Chang, Shie-Jue Lee, Chih-Hung Wu, Chih-Feng Liu, Ching-Kuan Liu

Author(s): Avi Arampatzis, Georgios Peikos, Symeon Symeonidis

Author(s): Shicheng Tan, Zhen Duan, Shu Zhao, Jie Chen, Yanping Zhang
