Information Retrieval and Web Search

Neural networks with deep architectures have demonstrated significant performance improvements in computer vision, speech recognition, and natural language processing. The challenges in information retrieval (IR), however, are different from those in these other application areas. A common form of IR involves ranking documents (or short passages) in response to keyword-based queries. Effective IR systems must deal with the query-document vocabulary mismatch problem by modeling relationships between different query and document terms and how they indicate relevance. Models should also consider lexical matches when the query contains rare terms not seen during training (such as a person's name or a product model number), and should avoid retrieving semantically related but irrelevant results. In many real-life IR tasks, retrieval involves extremely large collections, such as the document index of a commercial Web search engine, containing billions of documents. Efficient IR methods should take advantage of specialized IR data structures, such as the inverted index, to retrieve efficiently from large collections. Given an information need, the IR system also mediates how much exposure an information artifact receives by deciding whether it should be displayed, and where it should be positioned, among other results. Exposure-aware IR systems may optimize for additional objectives besides relevance, such as parity of exposure for retrieved items and content publishers. In this thesis, we present novel neural architectures and methods motivated by the specific needs and challenges of IR tasks. We ground our contributions with a detailed survey of the growing body of neural IR literature [Mitra and Craswell, 2018].
Our key contribution towards improving the effectiveness of deep ranking models is the Duet principle [Mitra et al., 2017], which emphasizes the importance of incorporating evidence based on both patterns of exact term matches and similarities between learned latent representations of query and document. To efficiently retrieve from large collections, we develop a framework to incorporate query term independence [Mitra et al., 2019] into any arbitrary deep model, which enables large-scale precomputation and the use of an inverted index for fast retrieval. In the context of stochastic ranking, we further develop optimization strategies for exposure-based objectives [Diaz et al., 2020]. Finally, this dissertation also summarizes our contributions towards benchmarking neural IR models in the presence of large training datasets [Craswell et al., 2019] and explores the application of neural methods to other IR tasks, such as query auto-completion.
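The query term independence idea can be illustrated with a minimal sketch: if a document's score is assumed to be a sum of per-term scores, each term-document score can be computed offline and stored in an inverted index. The function `term_doc_score` below is a hypothetical stand-in (a simple term-frequency ratio) for a learned model, and the toy corpus and all names are illustrative assumptions, not taken from the thesis.

```python
from collections import defaultdict

def term_doc_score(term, doc_tokens):
    # Hypothetical stand-in for a learned model's per-term score.
    return doc_tokens.count(term) / len(doc_tokens)

def build_index(docs):
    """Precompute term-document scores offline into an inverted index."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        tokens = text.lower().split()
        for term in set(tokens):
            index[term][doc_id] = term_doc_score(term, tokens)
    return index

def retrieve(index, query, k=10):
    """Score each document as the sum of its precomputed per-term scores."""
    scores = defaultdict(float)
    for term in query.lower().split():
        for doc_id, score in index.get(term, {}).items():
            scores[doc_id] += score
    # Sort by descending score, breaking ties by document id.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:k]

# Toy corpus (illustrative only).
docs = {
    "d1": "neural ranking models for web search",
    "d2": "inverted index structures for fast retrieval",
    "d3": "neural models and inverted index retrieval",
}
index = build_index(docs)
results = retrieve(index, "neural inverted index")
```

At query time only the posting lists of the query terms are touched, which is what makes retrieval from a large collection cheap under this assumption.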


Query Recommendation Using Hybrid Query Relevance

Future Internet ◽

10.3390/fi10110112 ◽

2018 ◽

Vol 10 (11) ◽

pp. 112

Author(s):

Jialu Xu ◽

Feiyue Ye

Keyword(s):

Information Retrieval ◽

Information Search ◽

Search Engines ◽

Web Search ◽

Superior Performance ◽

Recommendation Algorithm ◽

Web Information ◽

Query Recommendation

With the explosion of web information, search engines have become the main tools for information retrieval. However, most queries submitted in web search are ambiguous and multifaceted, so understanding queries and mining query intent is critical for search engines. In this paper, we present a novel query recommendation algorithm that combines query information and URL information to obtain broad and accurate query relevance. Query relevance based on query information is calculated from query co-occurrence and query embedding vectors. Adding rank information to query-URL pairs allows the strength of the relationship between a query and a URL to be calculated more precisely. Empirical experiments are performed on the AOL log. The results demonstrate the effectiveness of the proposed query recommendation algorithm, which achieves superior performance compared to other algorithms.
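As a rough illustration of combining co-occurrence with embedding similarity, the sketch below mixes the two signals with a weight. The weight `alpha`, the normalization, the toy sessions, and the two-dimensional embeddings are all assumptions made for illustration; the paper's actual formulation, including its use of query-URL information, is not reproduced here.

```python
import math
from collections import Counter
from itertools import combinations

def cooccurrence(sessions):
    """Count how often two distinct queries appear in the same session."""
    counts = Counter()
    for session in sessions:
        for a, b in combinations(sorted(set(session)), 2):
            counts[(a, b)] += 1
    return counts

def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0

def hybrid_relevance(q1, q2, counts, embeddings, alpha=0.5):
    """Weighted mix of normalized co-occurrence and embedding similarity.

    alpha is a hypothetical mixing weight, not a value from the paper.
    """
    max_count = max(counts.values(), default=1)
    co = counts.get(tuple(sorted((q1, q2))), 0) / max_count
    sim = cosine(embeddings[q1], embeddings[q2])
    return alpha * co + (1 - alpha) * sim

# Toy search sessions and embeddings (illustrative only).
sessions = [
    ["jaguar", "jaguar car"],
    ["jaguar", "jaguar car", "jaguar price"],
    ["jaguar", "jaguar animal"],
]
embeddings = {
    "jaguar": [1.0, 1.0],
    "jaguar car": [1.0, 0.2],
    "jaguar animal": [0.2, 1.0],
    "jaguar price": [1.0, 0.0],
}
counts = cooccurrence(sessions)
candidates = [q for q in embeddings if q != "jaguar"]
ranked = sorted(
    candidates,
    key=lambda q: hybrid_relevance("jaguar", q, counts, embeddings),
    reverse=True,
)
```

Here `ranked` orders candidate recommendations for the query "jaguar"; a query that both co-occurs often and lies close in embedding space ranks highest.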


Information Retrieval

The Oxford Handbook of Computational Linguistics 2nd edition ◽

10.1093/oxfordhb/9780199573691.013.022 ◽

2016 ◽

Author(s):

Qiaozhu Mei ◽

Dragomir Radev

Keyword(s):

Information Retrieval ◽

Digital Libraries ◽

Web Search ◽

Retrieval System ◽

Information Retrieval System ◽

Information Need ◽

System A ◽

Recent Developments ◽

Text Information ◽

Text Information Retrieval

This chapter is a basic introduction to text information retrieval. Information Retrieval (IR) refers to the activities of obtaining, from a much larger collection, information resources (usually in the form of textual documents) that are relevant to an information need of the user (usually expressed as a query). Practical instances of an IR system include digital libraries and Web search engines. This chapter presents the typical architecture of an IR system, an overview of the methods used to design and implement each major component of an information retrieval system, a discussion of evaluation methods for an IR system, and finally a summary of recent developments and research trends in the field of information retrieval.


Memory versus logic: two models of organizing information and their influences on web retrieval strategies

tripleC Communication Capitalism & Critique Open Access Journal for a Global Sustainable Information Society ◽

10.31269/triplec.v4i2.34 ◽

2006 ◽

Vol 4 (2) ◽

pp. 178-186

Author(s):

Teresa Numerico

Keyword(s):

Information Retrieval ◽

Web Search ◽

Data Representation ◽

Philosophical Tradition ◽

Formal Representation ◽

Mathematical Functions ◽

Von Neumann ◽

Vannevar Bush ◽

Social Topology ◽

Information Retrieval Methods

We can find the first anticipation of the World Wide Web's hypertextual structure in Bush's 1945 paper, where he described a “selection” and storage machine called the Memex, capable of keeping the useful information of a user and connecting it to other relevant material present in the machine or added by other users. We will argue that Vannevar Bush conceived this type of machine because of his involvement with analogue devices. During the 1930s, in fact, he invented and built the Differential Analyzer, a powerful analogue machine used to calculate various relevant mathematical functions. The model of the Memex is not the digital one, because it relies on another form of data representation, one that emulates the procedures of memory more than the logic used by the intellect. Memory seems to select and arrange information according to association strategies, i.e., using analogies and connections that are very often arbitrary, sometimes even chaotic and completely subjective. The organization of information and the knowledge creation process suggested by logic and symbolic formal representation of data are deeply different from the former, though the logic approach is at the core of the birth of computer science (i.e., the Turing Machine and the Von Neumann Machine). We will discuss the issues raised by these two “visions” of information management and the influences of the philosophical tradition of the theory of knowledge on the hypertextual organization of content. We will also analyze the consequences of these different attitudes for information retrieval techniques in a hypertextual environment such as the web. Our position is that it is necessary to take into account the nature and the dynamic social topology of the network when we choose information retrieval methods for the network; otherwise, we risk creating a misleading service for the end user of web search tools (i.e., search engines).


Information Retrieval systems and Web Search Engines: A Survey

10.22161/ijaers/nctet.2017.25 ◽

2017 ◽

Author(s):

Arun Kumar ◽

M. A. Jabbar ◽

Y.V. Bhaskar Reddy

Keyword(s):

Information Retrieval ◽

Search Engines ◽

Web Search ◽

Retrieval Systems ◽

Information Retrieval Systems ◽

Web Search Engines


Web Search: Bridging Information Retrieval and Microeconomic Modeling

High Performance Computing – HiPC 2007 - Lecture Notes in Computer Science ◽

10.1007/978-3-540-77220-0_5 ◽

2008 ◽

pp. 6-6

Author(s):

Prabhakar Raghavan

Keyword(s):

Information Retrieval ◽

Web Search


A Roadmap to Integrate Document Clustering in Information Retrieval

Information Retrieval Methods for Multidisciplinary Applications ◽

10.4018/978-1-4666-3898-3.ch003 ◽

2013 ◽

pp. 31-45

Author(s):

R. Subhashini ◽

V.Jawahar Senthil Kumar

Keyword(s):

Information Retrieval ◽

Search Engines ◽

World Wide ◽

Clustering Algorithm ◽

Web Search ◽

Full Potential ◽

Digital Information ◽

Search Results ◽

The World ◽

The Web

The World Wide Web is a large distributed digital information space. The ability to search and retrieve information from the Web efficiently and effectively is an enabling technology for realizing its full potential. Information Retrieval (IR) plays an important role in search engines. Today’s most advanced engines use the keyword-based (“bag of words”) paradigm, which has inherent disadvantages. Organizing web search results into clusters helps the user browse search results quickly. Traditional clustering techniques are inadequate because they do not generate clusters with highly readable names. This paper proposes an approach to clustering web search results based on a phrase-based clustering algorithm. As an alternative to the single ordered result list of search engines, this approach presents a list of clusters to the user. Experimental results verify the method’s feasibility and effectiveness.
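To make the idea of phrase-based clustering with readable cluster names concrete, the sketch below groups result snippets by shared two-word phrases and uses each phrase as the cluster label. This is a deliberately simplified assumption for illustration, not the algorithm proposed in the paper; the snippets and the `min_size` threshold are hypothetical.

```python
from collections import defaultdict

def bigrams(text):
    """Return the set of two-word phrases in a snippet."""
    tokens = text.lower().split()
    return {" ".join(tokens[i:i + 2]) for i in range(len(tokens) - 1)}

def phrase_clusters(snippets, min_size=2):
    """Group snippet indices under each phrase shared by >= min_size snippets."""
    by_phrase = defaultdict(set)
    for idx, text in enumerate(snippets):
        for phrase in bigrams(text):
            by_phrase[phrase].add(idx)
    return {p: sorted(ids) for p, ids in by_phrase.items() if len(ids) >= min_size}

# Toy search result snippets (illustrative only).
snippets = [
    "jaguar cars for sale",
    "used jaguar cars dealer",
    "jaguar habitat in rainforest",
    "rainforest jaguar habitat facts",
]
clusters = phrase_clusters(snippets)
```

Because each cluster is keyed by the phrase its members share, the label shown to the user is a readable phrase rather than an opaque cluster id.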


Web Semantics for Personalized Information Retrieval

Web Semantics for Textual and Visual Information Retrieval - Advances in Data Mining and Database Management ◽

10.4018/978-1-5225-2483-0.ch008 ◽

2017 ◽

pp. 166-186 ◽

Cited By ~ 1

Author(s):

Aarti Singh ◽

Anu Sharma

Keyword(s):

Information Retrieval ◽

Semantic Web ◽

Web Search ◽

Information Overload ◽

Software Agents ◽

Web Personalization ◽

Comprehensive Review ◽

Web Technologies ◽

Intelligent Interface

This chapter explores the synergy between Semantic Web (SW) technologies and Web Personalization (WP) to demonstrate an intelligent interface for Personalized Information Retrieval (PIR) on the web. The benefits of adding semantics to WP through ontologies and Software Agents (SA) have already been realized. These approaches are expected to prove useful in handling the information overload problem encountered in web search. A brief introduction to the PIR process is given, followed by a description of SW, ontologies, and SA. A comprehensive review of existing web technologies for PIR is presented. Although researchers have made substantial contributions, which are analyzed here, there remain gap areas where the benefits of these technologies are yet to be realized in future personalized web search.


World Wide Web Search Engines

Architectural Issues of Web-Enabled Electronic Business ◽

10.4018/978-1-59140-049-3.ch010 ◽

2011 ◽

pp. 155-169 ◽

Cited By ~ 1

Author(s):

Wen-Chen Hu ◽

Jyh-Haw Yeh

Keyword(s):

Information Retrieval ◽

World Wide Web ◽

Search Engines ◽

World Wide ◽

Web Search ◽

Future Research ◽

Structural Style ◽

Future Research Directions ◽

Web Search Engines ◽

Almost All

The World Wide Web now holds more than 800 million pages covering almost all issues. The Web’s fast-growing size and lack of structural style present a new challenge for information retrieval. Numerous search technologies have been applied to Web search engines; however, the dominant search method has yet to be identified. This chapter provides an overview of the existing technologies for Web search engines and classifies them into six categories: 1) hyperlink exploration, 2) information retrieval, 3) metasearches, 4) SQL approaches, 5) content-based multimedia searches, and 6) others. At the end of this chapter, a comparative study of major commercial and experimental search engines is presented, and some future research directions for Web search engines are suggested.


Enhancing Web Search through Query Expansion

Encyclopedia of Data Warehousing and Mining, Second Edition ◽

10.4018/978-1-60566-010-3.ch116 ◽

2011 ◽

pp. 752-757 ◽

Cited By ~ 2

Author(s):

Daniel Crabtree

Keyword(s):

Information Retrieval ◽

Search Engines ◽

Query Expansion ◽

Web Search ◽

User Involvement ◽

Semantic Knowledge ◽

Web Pages ◽

Search Performance ◽

Interactive Query ◽

Web Search Engines

Web search engines help users find relevant web pages by returning a result set containing the pages that best match the user’s query. When the identified pages have low relevance, the query must be refined to capture the search goal more effectively. However, finding appropriate refinement terms is difficult and time consuming for users, so researchers developed query expansion approaches to identify refinement terms automatically. There are two broad approaches to query expansion: automatic query expansion (AQE) and interactive query expansion (IQE) (Ruthven et al., 2003). AQE has no user involvement, which is simpler for the user but limits its performance. IQE has user involvement, which is more complex for the user but means it can tackle more problems, such as ambiguous queries. Searches fail by finding too many irrelevant pages (low precision) or by finding too few relevant pages (low recall). AQE has a long history in the field of information retrieval, where the focus has been on improving recall (Velez et al., 1997). Unfortunately, AQE often decreased precision because the terms used to expand a query often changed the query’s meaning; Croft and Harper (1979) identified this effect and named it query drift. The problem is that users typically consider just the first few results (Jansen et al., 2005), which makes precision vital to web search performance. In contrast, IQE has historically balanced precision and recall, leading to an earlier uptake within web search. However, like AQE, the precision of IQE approaches needs improvement. Most recently, approaches have started to improve precision by incorporating semantic knowledge.
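The trade-off between precision and recall described above can be made concrete with a small sketch. The document ids and result lists are hypothetical; the example simply shows an expanded query retrieving more relevant pages (higher recall) while also pulling in irrelevant ones (lower precision).

```python
def precision_recall(retrieved, relevant):
    """Precision = hits / retrieved; recall = hits / relevant."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical relevance judgments and result sets.
relevant = {"d1", "d2", "d3", "d4"}
original_results = ["d1", "d2"]
expanded_results = ["d1", "d2", "d3", "d7", "d8", "d9"]

p_orig, r_orig = precision_recall(original_results, relevant)
p_exp, r_exp = precision_recall(expanded_results, relevant)
```

In this toy case the expanded query finds one more relevant page but adds three irrelevant ones, so recall rises while precision falls.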
