Next Generation Search Engines

Studying Web Search Engines from a User Perspective

This chapter shows that the wider use of Web search engines calls for reconsidering the theoretical and methodological frameworks used to grasp new information practices. Beginning with an overview of the recent challenges implied by the dynamic nature of the Web, this chapter then traces the concepts related to information behavior in order to present the different approaches from the user perspective. The authors pay special attention to the concept of “information practice” and to related concepts such as “use”, “activity”, and “behavior”, which are widely used in the literature but not always strictly defined. The authors provide an overview of user-oriented studies that help in understanding the different contexts of use of electronic information access systems, focusing on five approaches: the system-oriented approaches, the theories of information seeking, the cognitive and psychological approaches, the management science approaches, and the marketing approaches. Future directions of work are then outlined, including social searching and the ethical, cultural, and political dimensions of Web search engines. The authors conclude by considering the importance of critical theory for better understanding the role of Web search engines in modern society.

Spatio-Temporal Based Personalization for Mobile Search

Next Generation Search Engines · 10.4018/978-1-4666-0330-1.ch017 · 2012 · pp. 386-409 · Cited By ~ 3

Author(s): Ourdia Bouidghaghen, Lynda Tamine

Keyword(s): Information Retrieval, Contextual Information, The Internet, Search Results, Search Result, Retrieval Systems, Information Retrieval Systems, Spatio Temporal, Mobile Context

The explosion of information available on the Internet has made traditional information retrieval systems, characterized by one-size-fits-all approaches, less effective. Indeed, users are overwhelmed by the information delivered by such systems in response to their queries, particularly when the queries are ambiguous. To tackle this problem, the state of the art shows a growing interest in contextual information retrieval (CIR), which relies on various sources of evidence drawn from the user’s search background and environment in order to improve retrieval accuracy. This chapter focuses on mobile contexts, highlights the challenges they present for IR, and gives an overview of CIR approaches applied in this environment. Then, the authors present an approach to personalize search results for mobile users by exploiting both cognitive and spatio-temporal contexts. The experimental evaluation, undertaken against Yahoo search, shows that the approach improves the quality of top search result lists and enhances search result precision.

Finding Answers to Questions, in Text Collections or Web, in Open Domain or Specialty Domains

Next Generation Search Engines · 10.4018/978-1-4666-0330-1.ch015 · 2012 · pp. 344-370

Author(s): Brigitte Grau

Keyword(s): Natural Language, Information Management, Question Answering, Target Language, Open Domain, Text Collections, Factual Question, Structured Knowledge, Answering Questions, The Web

This chapter is dedicated to factual question answering, i.e., extracting precise and exact answers from texts to questions given in natural language. A question in natural language gives more information than a bag-of-words query (i.e., a query made of a list of words), and provides clues for finding precise answers. The author first presents the underlying problems, which are mainly due to linguistic variations between questions and the pieces of text that answer them, and which affect both the selection of relevant passages and the extraction of reliable answers. The author then presents how to answer factual questions in the open domain. The author also presents question answering in specialty domains, as it requires dealing with semi-structured knowledge and specialized terminologies, and can lead to different applications, such as information management in corporations. Searching for answers on the Web constitutes another application frame and introduces specificities linked to Web redundancy or collaborative usage. Besides, the Web is multilingual, and a challenging problem consists in searching for answers in target-language documents written in a language other than the source language of the question. For all these topics, this chapter presents the main approaches and the remaining problems.

Extensions of Web Browsers useful to Knowledge Workers

Next Generation Search Engines · 10.4018/978-1-4666-0330-1.ch011 · 2012 · pp. 239-273

Author(s): Sarah Vert

Keyword(s): Knowledge Workers, Knowledge Worker, Working Environment, The Internet, Web Browsers, Web Page, Web Browser, Internet Explorer, The One, The Web

This chapter focuses on the Internet working environment of Knowledge Workers through the customization of the Web browser on their computer. Given that a Web browser is designed to be used by anyone browsing the Internet, its initial configuration must meet generic needs such as reading a Web page, searching for information, and bookmarking. In the absence of a universal solution that meets the specific needs of each user, browser developers offer additional programs known as extensions, or add-ons. Among the various browsers that can be modified with add-ons, Mozilla’s Firefox is perhaps the one that first springs to mind; indeed, Mozilla has built the Firefox brand around these extensions. Using this example, and also considering the browsers Google Chrome, Internet Explorer, Opera and Safari, the author will attempt to demonstrate the potential of Web browsers in terms of the resources they can offer when they are customizable and available within the working environment of a Knowledge Worker.

The Use of Text Mining Techniques in Electronic Discovery for Legal Matters

Next Generation Search Engines · 10.4018/978-1-4666-0330-1.ch008 · 2012 · pp. 174-190

Author(s): Michael W. Berry, Reed Esau, Bruce Kiefer

Keyword(s): Information Retrieval, Text Mining, Matrix Factorization, Latent Semantic Analysis, Semantic Analysis, Electronic Documents, Collection Process, Relevance Judgments, Electronic Discovery, Non Negative Matrix Factorization

Electronic discovery (eDiscovery) is the process of collecting and analyzing electronic documents to determine their relevance to a legal matter. Office technology has advanced and eased the requirements necessary to create a document. As such, the volume of data has outgrown the manual processes previously used to make relevance judgments. Methods of text mining and information retrieval have been put to use in eDiscovery to help tame the volume of data; however, the results have been uneven. This chapter looks at the historical bias of the collection process. The authors examine how tools like classifiers, latent semantic analysis, and non-negative matrix factorization deal with nuances of the collection process.

Semantic Models in Information Retrieval

Next Generation Search Engines · 10.4018/978-1-4666-0330-1.ch007 · 2012 · pp. 138-173

Author(s): Edmond Lassalle, Emmanuel Lassalle

Keyword(s): Information Retrieval, Probabilistic Models, Boolean Model, Point Of View, Language Models, Linguistic Processing, Dirichlet Prior, Set Up, Frequency Counting, Formal Point

Robertson and Spärck Jones pioneered experimental probabilistic models (the Binary Independence Model) with a typology generalizing the Boolean model, frequency counting to calculate elementary weightings, and their combination into a global probabilistic estimation. However, this model did not consider dependencies between indexing terms. An extension to mixture models (e.g., using a 2-Poisson law) made it possible to take these dependencies into account from a macroscopic point of view (BM25), as well as a shallow linguistic processing of co-references. New approaches (language models, for example “bag of words” models, probabilistic dependencies between requests and documents, and consequently Bayesian inference using conjugate Dirichlet priors) furnished new solutions for document structuring (categorization) and for index smoothing. Presently, in these probabilistic models the main issues have been addressed from a formal point of view only. Thus, linguistic properties are neglected in the indexing language. The authors examine how linguistic and semantic modeling can be integrated into indexing languages and set up a hybrid model that makes it possible to deal with different information retrieval problems in a unified way.
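
For readers unfamiliar with BM25, which the abstract names, the following is a minimal sketch of the commonly used Okapi BM25 scoring formula. It is not the hybrid model proposed in the chapter. The function name `bm25_scores`, the parameter defaults `k1=1.5` and `b=0.75`, the smoothed IDF variant, and the toy documents are all assumptions chosen for illustration.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each tokenized document in `docs` against the tokenized `query`.

    k1 and b are conventional default values, assumed here for illustration.
    """
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    # Document frequency: number of documents containing each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # Smoothed IDF variant; the "+ 1.0" keeps the value non-negative.
            idf = math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            # Term-frequency saturation with document-length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

# Hypothetical toy collection.
docs = [
    "web search engines index the web".split(),
    "probabilistic models for information retrieval".split(),
    "cooking pasta at home".split(),
]
scores = bm25_scores("web search".split(), docs)
```

In this sketch only the first document shares terms with the query, so it is the only one with a non-zero score. The terms are still scored independently, which is the limitation the abstract attributes to purely formal probabilistic models.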

Decentralized Search and the Clustering Paradox in Large Scale Information Networks

Next Generation Search Engines · 10.4018/978-1-4666-0330-1.ch002 · 2012 · pp. 29-46

Author(s): Weimao Ke

Keyword(s): Information Retrieval, Large Scale, Information Networks, Search Performance, Retrieval Models, Information Spaces, Decentralized Search, Key Issues, Large Scale Networks, The Impact

Amid the rapid growth of information today is the increasing challenge for people to navigate its magnitude. Dynamics and heterogeneity of large information spaces such as the Web raise important questions about information retrieval in these environments. Collection of all information in advance and centralization of IR operations are extremely difficult, if not impossible, because systems are dynamic and information is distributed. The chapter discusses some of the key issues facing classic information retrieval models and presents a decentralized, organic view of information systems pertaining to search in large scale networks. It focuses on the impact of network structure on search performance and discusses a phenomenon we refer to as the Clustering Paradox, in which the topology of interconnected systems imposes a scalability limit.

Indexing the World Wide Web

Next Generation Search Engines · 10.4018/978-1-4666-0330-1.ch001 · 2012 · pp. 1-28 · Cited By ~ 2

Author(s): Abhishek Das, Ankit Jain

Keyword(s): World Wide Web, Search Engine, Data Structures, World Wide, Web Search, Distributed Processing, Data Partitioning, Information Organization, Trade Offs, The World

In this chapter, the authors describe the key indexing components of today’s web search engines. As the World Wide Web has grown, the systems and methods for indexing have changed significantly. The authors present the data structures used, the features extracted, the infrastructure needed, and the options available for designing a brand new search engine. They highlight techniques that improve the relevance of results, discuss trade-offs to best utilize machine resources, and cover distributed processing concepts in this context. In particular, the authors delve into the topics of indexing phrases instead of terms, storage in memory vs. on disk, and data partitioning. Some thoughts on information organization for the newly emerging data-forms conclude the chapter.
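
To make the indexing data structures mentioned above more concrete, here is a minimal sketch of a positional inverted index, one common way to answer phrase queries. The abstract does not say which structure the chapter uses for phrase indexing, so this is an assumed illustration only. The helper names `build_positional_index` and `phrase_search` and the sample documents are hypothetical.

```python
from collections import defaultdict

def build_positional_index(docs):
    """Map each term to {doc_id: [positions]} (a positional inverted index)."""
    index = defaultdict(dict)
    for doc_id, text in enumerate(docs):
        for pos, term in enumerate(text.lower().split()):
            index[term].setdefault(doc_id, []).append(pos)
    return index

def phrase_search(index, phrase):
    """Return sorted ids of documents in which the phrase terms occur consecutively."""
    terms = phrase.lower().split()
    if not terms or any(t not in index for t in terms):
        return []
    # Candidate documents must contain every term of the phrase.
    candidates = set(index[terms[0]])
    for t in terms[1:]:
        candidates &= set(index[t])
    hits = []
    for doc_id in sorted(candidates):
        later = [set(index[t][doc_id]) for t in terms[1:]]
        # The phrase starts at position p if the i-th following term sits at p + i + 1.
        if any(all(p + i + 1 in later[i] for i in range(len(later)))
               for p in index[terms[0]][doc_id]):
            hits.append(doc_id)
    return hits

# Hypothetical sample documents.
docs = ["the world wide web", "wide open web", "web wide world"]
index = build_positional_index(docs)
result = phrase_search(index, "wide web")
```

All three sample documents contain both "wide" and "web", but only the first has them adjacent and in order. Storing positions makes the index larger, which is one instance of the resource trade-offs the abstract alludes to.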

A Framework for Evaluating the Retrieval Effectiveness of Search Engines

Next Generation Search Engines · 10.4018/978-1-4666-0330-1.ch020 · 2012 · pp. 456-479 · Cited By ~ 5

Author(s): Dirk Lewandowski

Keyword(s): Search Engines, Web Search, Theoretical Framework, Short Description, Next Generation, Reliable Test, General Design, Additional Information, Retrieval Effectiveness

This chapter presents a theoretical framework for evaluating next generation search engines. The author focuses on search engines whose results presentation is enriched with additional information and does not merely present the usual list of “10 blue links,” that is, of ten links to results, accompanied by a short description. While Web search is used as an example here, the framework can easily be applied to search engines in any other area. The framework not only addresses the results presentation, but also takes into account an extension of the general design of retrieval effectiveness tests. The chapter examines the ways in which this design might influence the results of such studies and how a reliable test is best designed.

Using Association Rules for Query Reformulation

Next Generation Search Engines · 10.4018/978-1-4666-0330-1.ch013 · 2012 · pp. 291-303

Author(s): Ismaïl Biskri, Louis Rompré

Keyword(s): Data Mining, Information Retrieval, Association Rules, Text Classification, Large Volume, Query Reformulation, Long Time, Hidden Knowledge

In this paper the authors present research on the combination of two data mining methods: text classification and maximal association rules. Text classification has long been a focus of interest for many researchers. However, its results take the form of lists of words (classes) that people often do not know what to do with. The use of maximal association rules offers a number of advantages: (1) the detection of dependencies and correlations between the relevant units of information (words) of different classes, and (2) the extraction of hidden knowledge, often relevant, from a large volume of data. The authors show how this combination can improve the process of information retrieval.
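
As background for the association rules mentioned above, the following sketch computes ordinary support and confidence for single-term rules x -> y over small sets of words. It deliberately illustrates classical association rules, not the maximal association rules the paper relies on. The function name `association_rules`, the thresholds, and the sample word sets are assumptions for illustration.

```python
from itertools import permutations
from collections import Counter

def association_rules(transactions, min_support=0.5, min_confidence=0.6):
    """Return {(x, y): (support, confidence)} for single-term rules x -> y."""
    n = len(transactions)
    sets = [set(t) for t in transactions]
    # Number of transactions containing each item.
    item_count = Counter(item for s in sets for item in s)
    # Number of transactions containing each ordered pair of distinct items.
    pair_count = Counter(pair for s in sets for pair in permutations(sorted(s), 2))
    rules = {}
    for (x, y), count in pair_count.items():
        support = count / n                # fraction of transactions with both x and y
        confidence = count / item_count[x] # fraction of x-transactions that also contain y
        if support >= min_support and confidence >= min_confidence:
            rules[(x, y)] = (support, confidence)
    return rules

# Hypothetical word sets, e.g. terms taken from classes of documents.
transactions = [
    ["search", "engine", "index"],
    ["search", "engine"],
    ["search", "query"],
    ["engine", "index"],
]
rules = association_rules(transactions)
```

A rule such as index -> engine, which holds in every word set containing "index", hints at how a query on one term could be reformulated by adding a strongly associated term.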

Next Generation Search Engines
Latest Publications

Published By IGI Global

Studying Web Search Engines from a User Perspective

Spatio-Temporal Based Personalization for Mobile Search

Finding Answers to Questions, in Text Collections or Web, in Open Domain or Specialty Domains

Extensions of Web Browsers useful to Knowledge Workers

The Use of Text Mining Techniques in Electronic Discovery for Legal Matters

Semantic Models in Information Retrieval

Decentralized Search and the Clustering Paradox in Large Scale Information Networks

Indexing the World Wide Web

A Framework for Evaluating the Retrieval Effectiveness of Search Engines

Using Association Rules for Query Reformulation
