Neural methods for effective, efficient, and exposure-aware information retrieval

2021 ◽  
Vol 55 (1) ◽  
pp. 1-2
Author(s):  
Bhaskar Mitra

Neural networks with deep architectures have demonstrated significant performance improvements in computer vision, speech recognition, and natural language processing. The challenges in information retrieval (IR), however, are different from those in these other application areas. A common form of IR involves ranking documents, or short passages, in response to keyword-based queries. Effective IR systems must deal with the query-document vocabulary mismatch problem by modeling relationships between different query and document terms and how they indicate relevance. Models should also consider lexical matches when the query contains rare terms not seen during training, such as a person's name or a product model number, and should avoid retrieving semantically related but irrelevant results. In many real-life IR tasks, retrieval involves extremely large collections, such as the document index of a commercial Web search engine, containing billions of documents. Efficient IR methods should take advantage of specialized IR data structures, such as the inverted index, to retrieve efficiently from large collections. Given an information need, the IR system also mediates how much exposure an information artifact receives by deciding whether it should be displayed, and where it should be positioned, among other results. Exposure-aware IR systems may optimize for additional objectives besides relevance, such as parity of exposure for retrieved items and content publishers. In this thesis, we present novel neural architectures and methods motivated by the specific needs and challenges of IR tasks. We ground our contributions with a detailed survey of the growing body of neural IR literature [Mitra and Craswell, 2018].
Our key contribution towards improving the effectiveness of deep ranking models is the Duet principle [Mitra et al., 2017], which emphasizes the importance of incorporating evidence based on both patterns of exact term matches and similarities between learned latent representations of query and document. To efficiently retrieve from large collections, we develop a framework for incorporating query term independence [Mitra et al., 2019] into any arbitrary deep model, which enables large-scale precomputation and the use of an inverted index for fast retrieval. In the context of stochastic ranking, we further develop optimization strategies for exposure-based objectives [Diaz et al., 2020]. Finally, this dissertation also summarizes our contributions towards benchmarking neural IR models in the presence of large training datasets [Craswell et al., 2019] and explores the application of neural methods to other IR tasks, such as query auto-completion.
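The query term independence idea mentioned above can be illustrated with a minimal Python sketch. It assumes that the document score is a plain sum of per-term scores, so that every (term, document) score can be precomputed offline and stored as postings in an inverted index. The names `build_index`, `search`, and `toy_score` are hypothetical, and `toy_score` (length-normalized term frequency) is only a stand-in for a learned neural scorer; this is not the thesis's implementation.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple


def build_index(
    docs: Dict[str, str],
    vocabulary: Iterable[str],
    term_doc_score: Callable[[str, str], float],
) -> Dict[str, Dict[str, float]]:
    """Precompute a score for every (term, document) pair offline and
    keep the non-zero scores as postings, keyed by term."""
    index: Dict[str, Dict[str, float]] = defaultdict(dict)
    for term in vocabulary:
        for doc_id, text in docs.items():
            score = term_doc_score(term, text)
            if score != 0.0:
                index[term][doc_id] = score
    return index


def search(
    index: Dict[str, Dict[str, float]], query: str, k: int = 10
) -> List[Tuple[str, float]]:
    """Score documents as the sum of precomputed per-term scores.
    Only the postings of the query terms are touched at query time."""
    totals: Dict[str, float] = defaultdict(float)
    for term in query.lower().split():
        for doc_id, score in index.get(term, {}).items():
            totals[doc_id] += score
    # Highest score first; ties broken by document id for determinism.
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))[:k]


def toy_score(term: str, text: str) -> float:
    """Hypothetical stand-in for a neural term-document scorer:
    term frequency normalized by document length."""
    tokens = text.lower().split()
    return tokens.count(term) / len(tokens) if tokens else 0.0


# Made-up toy collection, for illustration only.
DOCS = {
    "d1": "neural ranking models for web search",
    "d2": "inverted index for fast retrieval",
    "d3": "neural retrieval with an inverted index",
}
VOCAB = {tok for text in DOCS.values() for tok in text.lower().split()}
INDEX = build_index(DOCS, VOCAB, toy_score)
RESULTS = search(INDEX, "neural retrieval")
```

Because the scorer is only ever called inside `build_index`, an expensive model would run offline, and `search` reduces to posting-list lookups and additions.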

Author(s):  
Qiaozhu Mei ◽  
Dragomir Radev

This chapter is a basic introduction to text information retrieval. Information Retrieval (IR) refers to the activities of obtaining information resources (usually in the form of textual documents) from a much larger collection, which are relevant to an information need of the user (usually expressed as a query). Practical instances of an IR system include digital libraries and Web search engines. This chapter presents the typical architecture of an IR system, an overview of the methods corresponding to the design and the implementation of each major component of an information retrieval system, a discussion of evaluation methods for an IR system, and finally a summary of recent developments and research trends in the field of information retrieval.


2018 ◽  
pp. 1307-1321
Author(s):  
Vinh-Tiep Nguyen ◽  
Thanh Duc Ngo ◽  
Minh-Triet Tran ◽  
Duy-Dinh Le ◽  
Duc Anh Duong

Large-scale image retrieval has shown remarkable potential in real-life applications. The standard approach is based on Inverted Indexing, with images represented using the Bag-of-Words model. However, one major limitation of both the Inverted Index and the Bag-of-Words representation is that they ignore the spatial information of visual words in image representation and comparison. As a result, retrieval accuracy decreases. In this paper, the authors investigate an approach to integrate spatial information into the Inverted Index to improve accuracy while maintaining short retrieval time. Experiments conducted on several benchmark datasets (Oxford Building 5K, Oxford Building 5K+100K and Paris 6K) demonstrate the effectiveness of the proposed approach.


2021 ◽  
Vol 4 (1) ◽  
pp. 87-89
Author(s):  
Janardan Bhatta

Searching images in a large database is a major requirement in Information Retrieval Systems. Returning image search results for a text query is a challenging task. In this paper, we leverage the power of Computer Vision and Natural Language Processing in Distributed Machines to lower the latency of search results. Image pixel features are computed based on a contrastive loss function for image search. Text features are computed based on the Attention Mechanism for text search. These features are aligned together, preserving the information in each text and image feature. Previously, the approach was tested only in multilingual models. However, we have tested it on an image-text dataset, and it enabled us to search using any form of text or images with high accuracy.


2020 ◽  
Author(s):  
Nikhil Ranjan Nayak

Information retrieval (IR) is the activity of obtaining information resources that are relevant to an information need from a collection of those resources. Searches can be based on full-text or other content-based indexing. Information retrieval is the science of searching for information in a document, searching for documents themselves, and also searching for the metadata that describes data, and for databases of texts, images or sounds. Automated information retrieval systems are used to reduce what has been called information overload. An IR system is a software system that provides access to books, journals and other documents, and that stores and manages those documents. Web search engines are the most visible IR applications. Naive Bayes is a classification technique based on Bayes' Theorem with an assumption of independence among predictors. In simple terms, a Naive Bayes classifier assumes that the presence of a particular feature in a class is unrelated to the presence of any other feature. For example, a fruit may be considered to be an apple if it is red, round, and about 3 inches in diameter. Even if these features depend on each other or upon the existence of the other features, all of these properties are treated as contributing independently to the probability that this fruit is an apple, which is why the method is known as 'Naive'. A Naive Bayes model is easy to build and particularly useful for very large data sets. Along with its simplicity, Naive Bayes is reported to outperform even highly sophisticated classification methods in some settings.
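The independence assumption described above can be made concrete with a small Python sketch of a categorical Naive Bayes classifier. It assumes add-one (Laplace) smoothing, which the abstract does not mention, and the function names `train` and `predict` as well as the fruit data are made up for illustration.

```python
import math
from collections import Counter, defaultdict


def train(examples, alpha=1.0):
    """Count classes and per-class feature values.
    `examples` is a list of (features_dict, label) pairs;
    `alpha` is the additive smoothing constant (an assumption here)."""
    class_counts = Counter(label for _, label in examples)
    value_counts = defaultdict(Counter)  # (label, feature) -> value counts
    feature_values = defaultdict(set)    # feature -> distinct values seen
    for features, label in examples:
        for name, value in features.items():
            value_counts[(label, name)][value] += 1
            feature_values[name].add(value)
    return class_counts, value_counts, feature_values, alpha


def predict(model, features):
    """Return the label maximizing log P(label) + sum of log P(value | label).
    Summing per-feature terms is exactly the 'naive' independence assumption."""
    class_counts, value_counts, feature_values, alpha = model
    total = sum(class_counts.values())
    best_label, best_logp = None, -math.inf
    for label, n in class_counts.items():
        logp = math.log(n / total)
        for name, value in features.items():
            count = value_counts[(label, name)][value]
            n_values = len(feature_values[name])
            logp += math.log((count + alpha) / (n + alpha * n_values))
        if logp > best_logp:
            best_label, best_logp = label, logp
    return best_label


# Made-up training data, for illustration only.
EXAMPLES = [
    ({"color": "red", "shape": "round"}, "apple"),
    ({"color": "red", "shape": "round"}, "apple"),
    ({"color": "green", "shape": "round"}, "apple"),
    ({"color": "yellow", "shape": "long"}, "banana"),
    ({"color": "yellow", "shape": "long"}, "banana"),
    ({"color": "green", "shape": "long"}, "banana"),
]
MODEL = train(EXAMPLES)
RED_ROUND = predict(MODEL, {"color": "red", "shape": "round"})
GREEN_LONG = predict(MODEL, {"color": "green", "shape": "long"})
```

Working in log space avoids numeric underflow when many feature probabilities are multiplied, and the smoothing keeps a single unseen feature value from zeroing out a class.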


Author(s):  
Suruchi Chawla

This chapter explains the multi-agent system for effective information retrieval using information scent in query log mining. The precision of search results is often low because it is difficult to infer the information need behind a short search query, and therefore the information need of the user is not satisfied effectively. Information Scent is used for modeling the information need of a user's web search session, and clustering is performed to identify sessions with similar information needs. Hyperlink-Induced Topic Search (HITS) is executed on the clusters to generate the Hubs and Authorities for web page recommendations to users who search with similar intents. This multi-agent system based on clustered query sessions uses query operations like expansion and recommendation to infer the information need of user search queries and recommends Hubs and Authorities for effective web search.
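As a rough illustration of the HITS step named above, the following Python sketch runs the standard hub/authority power iteration on a tiny link graph. It does not reproduce the chapter's clustered, multi-agent setup; the function name `hits`, the iteration count, and the example graph are assumptions made for illustration.

```python
import math


def hits(graph, iterations=50):
    """Standard HITS power iteration.
    `graph` maps each page to the list of pages it links to.
    Returns (hub_scores, authority_scores), each L2-normalized."""
    nodes = set(graph) | {t for targets in graph.values() for t in targets}
    hubs = {n: 1.0 for n in nodes}
    auths = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority update: sum of hub scores of pages linking to the page.
        auths = {n: 0.0 for n in nodes}
        for src, targets in graph.items():
            for target in targets:
                auths[target] += hubs[src]
        norm = math.sqrt(sum(v * v for v in auths.values())) or 1.0
        auths = {n: v / norm for n, v in auths.items()}
        # Hub update: sum of authority scores of the pages it links to.
        hubs = {n: sum(auths[t] for t in graph.get(n, ())) for n in nodes}
        norm = math.sqrt(sum(v * v for v in hubs.values())) or 1.0
        hubs = {n: v / norm for n, v in hubs.items()}
    return hubs, auths


# Made-up link graph: pages "a" and "b" link out, "c" and "d" are linked to.
GRAPH = {"a": ["c", "d"], "b": ["c"], "c": [], "d": []}
HUBS, AUTHS = hits(GRAPH)
TOP_HUB = max(HUBS, key=HUBS.get)
TOP_AUTHORITY = max(AUTHS, key=AUTHS.get)
```

In this toy graph, "c" receives links from both linking pages and so ends up as the strongest authority, while "a" links to both authorities and ends up as the strongest hub.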


2020 ◽  
Author(s):  
Elma Kerz ◽  
Daniel Wiechmann ◽  
Felicity Frinsel ◽  
Morten H. Christiansen

A large body of research over the past two decades has demonstrated that children and adults are equipped with statistical learning mechanisms that facilitate their language processing and boost their acquisition. However, this research has been conducted primarily using artificial languages that are highly simplified relative to real language input. Here, we aimed to determine to what extent adult native and non-native speakers show sensitivity to real-life language statistics obtained from large-scale analyses of authentic language use. Through a within-subject design, we conducted a series of behavioral experiments geared towards assessing the sensitivity to two types of distributional statistics (frequency and entropy) during online processing of multiword sequences across four registers of English (spoken, fiction, news and academic language). Our results show that both native and non-native speakers are able to 'tune to' multiple distributional statistics inherent in different types of real language input.


Author(s):  
Juntao Li ◽  
Ruidan He ◽  
Hai Ye ◽  
Hwee Tou Ng ◽  
Lidong Bing ◽  
...  

Recent research indicates that pretraining cross-lingual language models on large-scale unlabeled texts yields significant performance improvements over various cross-lingual and low-resource tasks. Through training on one hundred languages and terabytes of texts, cross-lingual language models have proven to be effective in leveraging high-resource languages to enhance low-resource language processing and outperform monolingual models. In this paper, we further investigate the cross-lingual and cross-domain (CLCD) setting when a pretrained cross-lingual language model needs to adapt to new domains. Specifically, we propose a novel unsupervised feature decomposition method that can automatically extract domain-specific features and domain-invariant features from the entangled pretrained cross-lingual representations, given unlabeled raw texts in the source language. Our proposed model leverages mutual information estimation to decompose the representations computed by a cross-lingual model into domain-invariant and domain-specific parts. Experimental results show that our proposed method achieves significant performance improvements over the state-of-the-art pretrained cross-lingual language model in the CLCD setting.


2020 ◽  
Vol 54 (1) ◽  
pp. 1-11
Author(s):  
Avishek Anand ◽  
Lawrence Cavedon ◽  
Matthias Hagen ◽  
Hideo Joho ◽  
Mark Sanderson ◽  
...  

In the week of November 10-15, 2019, 44 researchers from the fields of information retrieval and Web search, natural language processing, human-computer interaction, and dialogue systems met for the Dagstuhl Seminar 19461 "Conversational Search" to share the latest developments in the area of conversational search and discuss its research agenda and future directions. The clear signal from the seminar is that research opportunities to advance conversational search are available to many areas and that collaboration in an interdisciplinary community is essential to achieve the goals. This report overviews the program and selected findings of the working groups.


