Proactive Information Retrieval: Anticipating Users’ Information Need

Author(s):  
Sumit Bhatia ◽  
Debapriyo Majumdar ◽  
Nitish Aggarwal
2021 ◽  
Vol 55 (1) ◽  
pp. 1-2

Author(s):  
Bhaskar Mitra

Neural networks with deep architectures have demonstrated significant performance improvements in computer vision, speech recognition, and natural language processing. The challenges in information retrieval (IR), however, are different from those in these other application areas. A common form of IR involves ranking documents (or short passages) in response to keyword-based queries. Effective IR systems must deal with the query-document vocabulary mismatch problem by modeling relationships between different query and document terms and how they indicate relevance. Models should also consider lexical matches when the query contains rare terms, such as a person's name or a product model number, that were not seen during training, and should avoid retrieving semantically related but irrelevant results. In many real-life IR tasks, retrieval involves extremely large collections, such as the document index of a commercial Web search engine, containing billions of documents. Efficient IR methods should take advantage of specialized IR data structures, such as the inverted index, to retrieve efficiently from large collections. Given an information need, the IR system also mediates how much exposure an information artifact receives by deciding whether it should be displayed and where it should be positioned among other results. Exposure-aware IR systems may optimize for objectives besides relevance, such as parity of exposure for retrieved items and content publishers. In this thesis, we present novel neural architectures and methods motivated by the specific needs and challenges of IR tasks. We ground our contributions in a detailed survey of the growing body of neural IR literature [Mitra and Craswell, 2018].
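To make "exposure" concrete, here is a small calculation. It is an illustration under stated assumptions, not the thesis's method. It assumes a Plackett-Luce distribution over rankings, where items are drawn in proportion to positive weights. It also assumes an RBP-style browsing model in which the item at 0-based rank k receives exposure `gamma ** k`; the patience value 0.5 is an arbitrary choice. Each item's expected exposure is computed exactly by enumerating every ranking, which is only feasible for a handful of items. The function names are hypothetical.

```python
import itertools


def plackett_luce_prob(ranking, weights):
    """Probability of drawing `ranking` under a Plackett-Luce model.

    Assumes all weights are strictly positive.
    """
    remaining = sum(weights[i] for i in ranking)
    prob = 1.0
    for item in ranking:
        prob *= weights[item] / remaining  # pick `item` among those left
        remaining -= weights[item]
    return prob


def expected_exposure(weights, gamma=0.5):
    """Exact expected exposure per item, enumerating all n! rankings.

    Assumed browsing model: the item at 0-based rank k gets gamma ** k.
    """
    n = len(weights)
    exposure = [0.0] * n
    for ranking in itertools.permutations(range(n)):
        p = plackett_luce_prob(ranking, weights)
        for rank, item in enumerate(ranking):
            exposure[item] += p * gamma ** rank
    return exposure
```

With weights `[3.0, 1.0]` the favored item receives expected exposure 0.875 and the other 0.625. With equal weights, exposure is split evenly. A stochastic ranker can trade that kind of parity off against relevance.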
Our key contribution towards improving the effectiveness of deep ranking models is the Duet principle [Mitra et al., 2017], which emphasizes the importance of incorporating evidence from both patterns of exact term matches and similarities between learned latent representations of the query and document. To retrieve efficiently from large collections, we develop a framework for incorporating query term independence [Mitra et al., 2019] into arbitrary deep models, which enables large-scale precomputation and the use of an inverted index for fast retrieval. In the context of stochastic ranking, we further develop optimization strategies for exposure-based objectives [Diaz et al., 2020]. Finally, this dissertation summarizes our contributions towards benchmarking neural IR models in the presence of large training datasets [Craswell et al., 2019] and explores the application of neural methods to other IR tasks, such as query auto-completion.
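Query term independence means a document's score decomposes into a sum of per-term scores, score(q, d) = Σ_{t ∈ q} φ(t, d). Because of that decomposition, φ can be precomputed offline and stored in an inverted index. The sketch below assumes exactly that decomposition. `term_doc_score` is a hypothetical placeholder (log(1 + tf)) standing in for a learned deep model. Indexing only terms that occur in each document is a simplification made here.

```python
import math
from collections import defaultdict


def term_doc_score(term, doc_tokens):
    """Hypothetical stand-in for a learned per-term model phi(t, d)."""
    return math.log1p(doc_tokens.count(term))


def build_impact_index(docs):
    """Precompute phi(t, d) offline into postings: term -> [(doc_id, score)]."""
    index = defaultdict(list)
    for doc_id, text in docs.items():
        tokens = text.lower().split()
        for term in set(tokens):
            score = term_doc_score(term, tokens)
            if score > 0:
                index[term].append((doc_id, score))
    return index


def retrieve(index, query, k=10):
    """Score documents by summing precomputed per-term impacts."""
    acc = defaultdict(float)
    for term in query.lower().split():
        for doc_id, score in index.get(term, []):
            acc[doc_id] += score
    # Highest score first; ties broken by document id for determinism.
    return sorted(acc.items(), key=lambda p: (-p[1], p[0]))[:k]
```

At query time only the postings of the query terms are touched, so no deep model runs online. That is the efficiency argument behind the framework described above.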


2021 ◽  
Vol 20 (4) ◽  
pp. 50-64
Author(s):  
Bissan Audeh ◽  
Michel Beigbeder ◽  
Christine Largeron ◽  
Diana Ramírez-Cifuentes

Digital libraries have become an essential tool for researchers in all scientific domains. With almost unlimited storage capacity, current digital libraries hold a tremendous number of documents. Although some efforts have been made to facilitate access to documents relevant to a specific information need, this task remains a real challenge for a new researcher. Indeed, neophytes do not necessarily use appropriate keywords to express their information need, and they might not be qualified enough to correctly evaluate the relevance of the documents retrieved by the system. In this study, we hypothesize that, to better meet the needs of neophytes, the information retrieval system in a digital library should take into consideration features other than content-based relevance. To test this hypothesis, we use machine learning methods and build new features from several kinds of document metadata. More precisely, we propose to use as machine learning features: content-based scores, scores based on the citation graph, and scores based on metadata extracted from external resources. As acquiring such features is not a trivial task, we analyze their usefulness and their capacity to detect relevant documents. Our analysis concludes that using these additional features improves the performance of the system for a neophyte: with the new features, the system returns more documents suitable for neophytes than it does with content-based features alone.
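The feature-combination idea can be illustrated with a pointwise learning-to-rank sketch. Each document gets three features: a content-based score, a citation-graph score, and an external-metadata score. A logistic model trained on relevance labels then ranks documents. The features, toy labels, and the choice of logistic regression are all assumptions made for illustration; the study does not say which learner it uses. The function names are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(X, y, lr=0.5, epochs=500):
    """Batch gradient descent on the logistic loss.

    X: rows of [content_score, citation_score, metadata_score] (assumed).
    y: 1 if the document suits a neophyte, else 0 (toy labels).
    """
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


def rank_documents(docs, w, b):
    """Sort documents by predicted probability of being suitable."""
    scored = [
        (doc_id, sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b))
        for doc_id, x in docs.items()
    ]
    return sorted(scored, key=lambda p: -p[1])
```

In the toy data used for testing, the content score alone does not separate suitable from unsuitable documents, but the citation and metadata scores do. The learned weights therefore favor those extra features.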


Author(s):  
Qiaozhu Mei ◽  
Dragomir Radev

This chapter is a basic introduction to text information retrieval. Information Retrieval (IR) refers to the activity of obtaining information resources (usually in the form of textual documents) that are relevant to a user's information need (usually expressed as a query) from a much larger collection. Practical instances of IR systems include digital libraries and Web search engines. This chapter presents the typical architecture of an IR system, an overview of the methods used to design and implement each major component of an IR system, a discussion of evaluation methods for IR systems, and finally a summary of recent developments and research trends in the field of information retrieval.
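The sketch below gives a concrete anchor for the evaluation methods mentioned above. It computes three standard measures for a single ranked list: precision at k, recall at k, and average precision. The document IDs and relevance judgments are made-up examples, not data from the chapter. The function names are illustrative.

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k results that are relevant."""
    return sum(1 for d in ranked[:k] if d in relevant) / k


def recall_at_k(ranked, relevant, k):
    """Fraction of all relevant documents that appear in the top k."""
    return sum(1 for d in ranked[:k] if d in relevant) / len(relevant)


def average_precision(ranked, relevant):
    """Mean of precision values at the rank of each relevant document.

    Relevant documents that are never retrieved contribute 0.
    """
    hits, total = 0, 0.0
    for i, d in enumerate(ranked, start=1):
        if d in relevant:
            hits += 1
            total += hits / i
    return total / len(relevant) if relevant else 0.0
```

For the ranking `["d3", "d1", "d5", "d2", "d4"]` with relevant set `{"d1", "d2"}`, precision@2 and recall@2 are both 0.5. Average precision is (1/2 + 2/4) / 2 = 0.5.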


2017 ◽  
Vol 10 (2) ◽  
pp. 311-325
Author(s):  
Suruchi Chawla

The main challenge for effective web information retrieval (IR) is to infer the information need from the user's query and retrieve relevant documents. Because user queries are often vague and imprecise, search results have low precision and fail to include enough relevant documents. Fuzzy set based query expansion handles vague and imprecise queries to infer the user's information need, while trust-based web page recommendation retrieves search results according to that need. In this paper, an algorithm is designed for intelligent information retrieval that uses a hybrid of fuzzy sets and trust in web query session mining: fuzzy query expansion infers the user's information need, and trust is used to recommend web pages according to that need. Experiments were performed on a data set collected in the Academics, Entertainment and Sports domains, and the search results confirm an improvement in precision.
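Fuzzy query expansion with an alpha-cut can be sketched as follows. The paper's actual membership function is not given in the abstract. Here it is assumed to be the number of related-session documents in which a candidate term co-occurs with any query term, normalized to [0, 1]. Candidate terms whose membership reaches the threshold `alpha` are appended to the query. The function names and the toy session data are hypothetical, and the trust-based recommendation step is not modeled.

```python
from collections import Counter


def fuzzy_memberships(query_terms, session_docs):
    """Assumed membership: co-occurrence count with query terms, scaled to [0, 1]."""
    qset = set(query_terms)
    counts = Counter()
    for doc in session_docs:
        tokens = set(doc.lower().split())
        if tokens & qset:  # only documents sharing a query term count
            for t in tokens - qset:
                counts[t] += 1
    if not counts:
        return {}
    peak = max(counts.values())
    return {t: c / peak for t, c in counts.items()}


def expand_query(query, session_docs, alpha=0.5):
    """Append terms whose membership is at least alpha (the alpha-cut)."""
    terms = query.lower().split()
    mu = fuzzy_memberships(terms, session_docs)
    extra = sorted((t for t, m in mu.items() if m >= alpha),
                   key=lambda t: (-mu[t], t))
    return terms + extra
```

Raising `alpha` makes expansion more conservative. In the test data, only the strongest co-occurring term survives a cut of 0.6.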


Author(s):  
Iris Xie

Information retrieval (IR) is interactive by nature. However, the traditional IR model focuses only on the comparison between user input and system output; it does not capture the changing interaction process (Saracevic, 1997). Human involvement makes the IR process complicated and dynamic. Belkin (1993) further identified two underlying assumptions of the traditional IR view: (1) the information need is static and can be specified; and (2) there is only one form of information-seeking behavior. The limitations of the traditional IR model are becoming increasingly evident. In the 1990s, researchers started to develop interactive IR models. Among them, Ingwersen’s cognitive model (1992, 1996), Belkin’s episode model of interaction with texts (1996), and Saracevic’s stratified model (1996a, 1997) are the most cited.


10.2196/12621 ◽  
2019 ◽  
Vol 21 (8) ◽  
pp. e12621
Author(s):  
Alvet Miranda ◽  
Shah Jahan Miah

Background: Practicing evidence-based health care is challenging because of the overwhelming volume of results presented to practitioners by Google-like Web-scale discovery (WSD) services, which index millions of resources and retrieve information using relevancy algorithms with limited consideration of the user's information need.
Objective: Based on the user-oriented theory of information need and following design science principles, this study aimed to develop and evaluate an innovative contextual model for information retrieval from WSD services to improve evidence-based practice (EBP) by health care practitioners.
Methods: We identified problems from the literature to establish real-world requirements for this study. We used design science research methodology to guide artefact design. We iteratively improved a prototype of the context model using artificial formative evaluation. We performed a naturalistic summative evaluation, using convergent interviewing of health care practitioners and content analysis of a confirmatory focus group of health researchers, to evaluate the model’s validity and utility.
Results: The study iteratively designed and applied the context model to a WSD service to meet 5 identified requirements. All 5 health care practitioners interviewed found that the artefact satisfied the 5 requirements, so the model was evaluated as having validity and utility. Content analysis of the confirmatory focus group mapped the top 5 descriptors per requirement, supporting the hypothesis that there was significant discussion among participants to justify concluding that the artefact had validity and utility.
Conclusions: The context model for WSD satisfied all requirements and was evaluated successfully for information retrieval to improve EBP. These outcomes justify further research into the model.


2020 ◽  
Author(s):  
Nikhil Ranjan Nayak

Information retrieval (IR) is the activity of obtaining information resources that are relevant to an information need from a collection of those resources. Searches can be based on full-text or other content-based indexing. Information retrieval is the science of searching for information in a document, searching for documents themselves, searching for the metadata that describes data, and searching databases of texts, images or sounds. Automated information retrieval systems are used to reduce what has been called information overload. An IR system is a software system that provides access to books, journals and other documents, and stores and manages those documents. Web search engines are the most visible IR applications. Naive Bayes is a classification technique based on Bayes’ theorem with an assumption of independence among predictors. In simple terms, a Naive Bayes classifier assumes that the presence of a particular feature in a class is unrelated to the presence of any other feature. For example, a fruit may be considered to be an apple if it is red, round, and about 3 inches in diameter. Even if these features depend on each other or on the existence of other features, the classifier treats each property as contributing independently to the probability that the fruit is an apple, which is why it is called ‘naive’. A Naive Bayes model is easy to build and particularly useful for very large data sets. Along with its simplicity, Naive Bayes has been known to outperform even highly sophisticated classification methods in some settings.
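The independence assumption described above can be shown with a minimal multinomial Naive Bayes text classifier using Laplace (add-alpha) smoothing. Each word is treated as an independent feature, so a document's log score is the log prior plus a sum of per-word log likelihoods. The class name, the smoothing default `alpha=1.0`, and the toy fruit data are illustrative assumptions, not taken from the source.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesText:
    """Multinomial Naive Bayes over whitespace-separated words."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing constant (assumed default)

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for doc, label in zip(docs, labels):
            tokens = doc.lower().split()
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_words = {c: sum(wc.values())
                            for c, wc in self.word_counts.items()}
        return self

    def log_joint(self, doc, label):
        """log P(label) + sum of log P(word | label): the 'naive' factorization."""
        n_docs = sum(self.class_counts.values())
        score = math.log(self.class_counts[label] / n_docs)
        v = len(self.vocab)
        for tok in doc.lower().split():
            count = self.word_counts[label][tok]
            score += math.log((count + self.alpha) /
                              (self.total_words[label] + self.alpha * v))
        return score

    def predict(self, doc):
        return max(self.class_counts, key=lambda c: self.log_joint(doc, c))
```

Training on four short descriptions ("red round apple", "green round apple", "yellow long banana", "yellow curved banana") classifies "red round" as apple and "yellow long" as banana. Working in log space avoids underflow when many word probabilities are multiplied.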


Author(s):  
Thomas Mandl

This article describes the most prominent approaches to applying artificial intelligence technologies to information retrieval (IR). Information retrieval is a key technology for knowledge management. It deals with the search for information and the representation, storage and organization of knowledge. Information retrieval is concerned with search processes in which a user needs to identify, within a large amount of knowledge, a subset of information that is relevant to his information need. The information seeker formulates a query trying to describe his information need. The query is compared to document representations that were extracted during an indexing phase. The representations of documents and queries are typically matched by a similarity function such as the cosine measure. The most similar documents are presented to the user, who can evaluate their relevance with respect to the problem at hand (Belkin, 2000). The problem of properly representing documents and matching imprecise representations soon led to the application of techniques developed within Artificial Intelligence to information retrieval.
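The matching step described above, where query and document representations are compared with the cosine measure, can be sketched with TF-IDF vectors stored as sparse dictionaries. The weighting (raw term frequency times log(N / df)) is one common choice assumed here, not something the article specifies. The function names and example documents are illustrative.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Indexing phase: build sparse TF-IDF vectors (assumed weighting tf * log(N/df))."""
    tokenized = [d.lower().split() for d in docs]
    n = len(docs)
    df = Counter(t for toks in tokenized for t in set(toks))
    idf = {t: math.log(n / df_t) for t, df_t in df.items()}
    vecs = [{t: tf * idf[t] for t, tf in Counter(toks).items()}
            for toks in tokenized]
    return vecs, idf


def cosine(u, v):
    """Cosine of the angle between two sparse vectors (0.0 if either is zero)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def search(query, docs):
    """Rank document indices by cosine similarity to the query."""
    vecs, idf = tfidf_vectors(docs)
    q = {t: tf * idf.get(t, 0.0)
         for t, tf in Counter(query.lower().split()).items()}
    scores = [(i, cosine(q, v)) for i, v in enumerate(vecs)]
    return sorted(scores, key=lambda p: (-p[1], p[0]))
```

Query terms absent from the collection get weight 0, and a document sharing no weighted terms with the query scores exactly 0.0.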


Author(s):  
Suruchi Chawla

This chapter explains a multi-agent system for effective information retrieval using information scent in query log mining. The precision of search results is low because it is difficult to infer the information need behind short search queries, so the user's information need is not satisfied effectively. Information scent is used to model the information need of a user's web search session, and clustering is performed to identify sessions with similar information needs. Hyperlink-Induced Topic Search (HITS) is executed on the clusters to generate hubs and authorities for web page recommendations to users who search with similar intents. This multi-agent system, based on clustered query sessions, uses query operations such as expansion and recommendation to infer the information need behind user search queries and recommends hubs and authorities for effective web search.
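The HITS step can be sketched as the standard mutual-reinforcement iteration. A page's authority score is the sum of the hub scores of pages linking to it, and a page's hub score is the sum of the authority scores of pages it links to. Both score vectors are L2-normalized each round. The link graph is assumed to be the pages of a single session cluster. The graph format, iteration count, and toy data are assumptions, and the function name is illustrative.

```python
import math


def hits(graph, iterations=50):
    """HITS on a link graph given as {page: [pages it links to]}."""
    nodes = set(graph) | {v for outs in graph.values() for v in outs}
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority update: sum hub scores of in-linking pages.
        auth = {n: 0.0 for n in nodes}
        for u, outs in graph.items():
            for v in outs:
                auth[v] += hub[u]
        norm = math.sqrt(sum(a * a for a in auth.values())) or 1.0
        auth = {n: a / norm for n, a in auth.items()}
        # Hub update: sum authority scores of out-linked pages.
        hub = {n: sum(auth[v] for v in graph.get(n, [])) for n in nodes}
        norm = math.sqrt(sum(h * h for h in hub.values())) or 1.0
        hub = {n: h / norm for n, h in hub.items()}
    return hub, auth
```

For a cluster where pages `p1` and `p2` both link to `a1` and `a2`, and `p3` links only to `a1`, `a1` emerges as the top authority, and `p1` and `p2` outrank `p3` as hubs. These are the pages such a system would recommend to users with similar intents.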

