Information Retrieval
Latest Publications

Total documents: 473 (five years: 51)
H-index: 41 (five years: 3)
Published by: Springer-Verlag
ISSN: 1573-7659, 1386-4564

Author(s): Meng Yuan, Justin Zobel, Pauline Lin

Abstract: Clustering of the contents of a document corpus is used to create sub-corpora, with the intention that each consists of documents that are related to each other. However, while clustering is used in a variety of ways in document applications such as information retrieval, and a range of methods have been applied to the task, there has been relatively little exploration of how well it works in practice. Indeed, given the high dimensionality of the data, it is possible that clustering may not always produce meaningful outcomes. In this paper we use a well-known clustering method to explore a variety of techniques, existing and novel, for measuring clustering effectiveness. Results with our new extrinsic techniques, based on relevance judgements or retrieved documents, demonstrate that retrieval-based information can be used to assess the quality of clustering, and also show that clustering can succeed to some extent at gathering together similar material. Further, they show that intrinsic clustering techniques that have been shown to be informative in other domains do not work for information retrieval. Whether clustering is sufficiently effective to have a significant impact on practical retrieval is unclear, but, as the results show, our measurement techniques can effectively distinguish between clustering methods.
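
To make the extrinsic idea concrete, the following minimal sketch (an illustration of the general idea, not the paper's exact method) clusters TF-IDF document vectors with k-means and then uses relevance judgements as an external signal: for each query it reports the fraction of that query's relevant documents that fall into the single best cluster. The function name, the qrels format and the choice of k-means are assumptions made for the example.

from collections import Counter

from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

def relevance_concentration(docs, qrels, n_clusters=10):
    """docs: {doc_id: text}; qrels: {query_id: set of relevant doc_ids} (assumed formats)."""
    doc_ids = list(docs)
    vectors = TfidfVectorizer(stop_words="english").fit_transform([docs[d] for d in doc_ids])
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(vectors)
    cluster_of = dict(zip(doc_ids, labels))

    scores = {}
    for qid, rel_docs in qrels.items():
        counts = Counter(cluster_of[d] for d in rel_docs if d in cluster_of)
        if counts:
            # Fraction of the query's relevant documents captured by its best cluster;
            # values near 1 suggest the clustering gathers related material together.
            scores[qid] = max(counts.values()) / sum(counts.values())
    return scores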


Author(s): Graham McDonald, Craig Macdonald, Iadh Ounis

Abstract: Providing users with relevant search results has been the primary focus of information retrieval research. However, focusing on relevance alone can lead to undesirable side effects. For example, small differences between the relevance scores of documents ranked by relevance alone can result in large differences in the exposure that the authors of relevant documents receive, i.e., the likelihood that the documents will be seen by searchers. Therefore, developing fair ranking techniques that try to ensure search results are not dominated, for example, by certain information sources is of growing interest as a way to mitigate such biases. In this work, we argue that generating fair rankings can be cast as a search results diversification problem across a number of assumed fairness groups, where groups can represent the demographics or other characteristics of information sources. In the context of academic search, as in the TREC Fair Ranking Track, which aims to be fair to unknown groups of authors, we evaluate three well-known search results diversification approaches from the literature for generating rankings that are fair to multiple assumed fairness groups, e.g. early-career researchers vs. highly experienced authors. Our experiments on the 2019 and 2020 TREC datasets show that explicit search results diversification is a viable approach for generating effective rankings that are fair to information sources. In particular, we show that building on xQuAD diversification as a fairness component can result in a significant (p < 0.05) increase (up to 50% in our experiments) in the fairness of exposure that authors from unknown protected groups receive.
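
The sketch below illustrates how an xQuAD-style greedy re-ranker can be repurposed for fairness by treating assumed fairness groups as the "aspects" to be covered. It shows only the general scoring scheme, not the authors' exact configuration; the function name, argument formats and the interpolation parameter lam are illustrative assumptions.

def xquad_fair_rerank(candidates, rel_scores, group_scores, group_weights, lam=0.5, k=100):
    """
    candidates:    candidate doc ids, initially ranked by relevance
    rel_scores:    {doc: P(doc | query)} relevance estimates
    group_scores:  {group: {doc: P(doc | query, group)}} per-group coverage estimates
    group_weights: {group: P(group | query)} importance of each fairness group
    """
    selected, remaining = [], list(candidates)
    # Probability that each group is still *not* covered by the selected documents.
    not_covered = {g: 1.0 for g in group_weights}

    while remaining and len(selected) < k:
        def gain(doc):
            coverage = sum(group_weights[g] * group_scores[g].get(doc, 0.0) * not_covered[g]
                           for g in group_weights)
            return (1 - lam) * rel_scores.get(doc, 0.0) + lam * coverage

        best = max(remaining, key=gain)
        remaining.remove(best)
        selected.append(best)
        for g in group_weights:
            not_covered[g] *= 1.0 - group_scores[g].get(best, 0.0)
    return selected

Here lam trades off plain relevance (lam = 0) against spreading exposure across the fairness groups (lam = 1).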


Author(s): Mohamed Trabelsi, Zhiyu Chen, Brian D. Davison, Jeff Heflin

Abstract: Ranking models are the main components of information retrieval systems. Several approaches to ranking are based on traditional machine learning algorithms that use a set of hand-crafted features. Recently, researchers have leveraged deep learning models in information retrieval. These models are trained end-to-end to extract features from the raw data for ranking tasks, thereby overcoming the limitations of hand-crafted features. A variety of deep learning models have been proposed, and each model presents a set of neural network components to extract features that are used for ranking. In this paper, we compare the proposed models in the literature along different dimensions in order to understand the major contributions and limitations of each model. In our discussion of the literature, we analyze the promising neural components and propose future research directions. We also show the analogy between document retrieval and other retrieval tasks in which the items to be ranked are structured documents, answers, images and videos.
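
As a deliberately simplified illustration of the end-to-end feature-extraction idea, the sketch below implements one common neural ranking pattern, a representation-based "dual encoder" that embeds query and document independently and scores them with a dot product. It is a generic stand-in for the model families such a survey compares, not any single surveyed architecture.

import torch
import torch.nn as nn

class DualEncoderRanker(nn.Module):
    def __init__(self, vocab_size, dim=128):
        super().__init__()
        # A single learned embedding table; mean-pooling turns token ids into a fixed vector.
        self.embed = nn.EmbeddingBag(vocab_size, dim, mode="mean")

    def forward(self, query_ids, doc_ids):
        # query_ids, doc_ids: LongTensors of shape (batch, seq_len)
        q = self.embed(query_ids)
        d = self.embed(doc_ids)
        return (q * d).sum(dim=-1)  # higher score = predicted more relevant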


2021, Vol. 24 (4-5), pp. 347-369
Author(s): Zaiqiao Meng, Richard McCreadie, Craig Macdonald, Iadh Ounis

Abstract: Representation learning has been widely applied in real-world recommendation systems to capture the features of both users and items. Existing grocery recommendation methods represent each user and item only as a single deterministic point in a low-dimensional continuous space, which limits the expressive ability of their embeddings and results in recommendation performance bottlenecks. In addition, existing representation learning methods for grocery recommendation consider the items (products) as independent entities, neglecting their other valuable side information, such as the textual descriptions and the categorical data of items. In this paper, we propose the Variational Bayesian Context-Aware Representation (VBCAR) model for grocery recommendation. VBCAR is a novel variational Bayesian model that learns distributional representations of users and items by leveraging basket context information from historical interactions. Our VBCAR model is also extendable to leverage side information by encoding contextual features into representations based on the inference encoder. We conduct extensive experiments on three real-world grocery datasets to assess the effectiveness of our model as well as the impact of different construction strategies for item side information. Our results show that our VBCAR model outperforms the current state-of-the-art grocery recommendation models, while integrating item side information (especially the categorical features with the textual information of items) results in further significant performance gains. Furthermore, we demonstrate through analysis that our model is able to effectively encode similarities between product types, which we argue is the primary reason for the observed effectiveness gains.
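
As a rough illustration of the variational part of this approach, the sketch below gives each user and item a Gaussian embedding (a mean and a log-variance), samples with the reparameterisation trick, and returns a KL term for the variational loss. It omits VBCAR's basket-context likelihood and side-information encoders, and the class and function names are illustrative assumptions.

import torch
import torch.nn as nn

class GaussianEmbeddings(nn.Module):
    def __init__(self, n_entities, dim=64):
        super().__init__()
        self.mu = nn.Embedding(n_entities, dim)
        self.log_var = nn.Embedding(n_entities, dim)

    def sample(self, ids):
        mu, log_var = self.mu(ids), self.log_var(ids)
        z = mu + torch.exp(0.5 * log_var) * torch.randn_like(mu)  # reparameterisation trick
        # KL divergence from the unit-Gaussian prior, one value per entity.
        kl = 0.5 * (mu.pow(2) + log_var.exp() - 1.0 - log_var).sum(dim=-1)
        return z, kl

def interaction_score(user_emb, item_emb, user_ids, item_ids):
    zu, kl_u = user_emb.sample(user_ids)
    zi, kl_i = item_emb.sample(item_ids)
    score = (zu * zi).sum(dim=-1)   # predicted user-item affinity from sampled embeddings
    return score, kl_u + kl_i       # the KL terms regularise the variational objective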


Author(s): Binsheng Liu, Xiaolu Lu, J. Shane Culpepper

Author(s): Chia-Yang Chang, Shie-Jue Lee, Chih-Hung Wu, Chih-Feng Liu, Ching-Kuan Liu

Author(s): Avi Arampatzis, Georgios Peikos, Symeon Symeonidis

Author(s): Shicheng Tan, Zhen Duan, Shu Zhao, Jie Chen, Yanping Zhang
