Information Retrieval and Search Engines

2018 ◽  
pp. 259-304 ◽  
Author(s):  
Charu C. Aggarwal


Infolib ◽  
2020 ◽  
Vol 24 (4) ◽  
pp. 16-21
Author(s):  
Irina Krasilnikova ◽  

The urgency of the problem stems from the growing number of electronic resources in information and library institutions, the need to search for information in any source, including external ones, the provision of documents from a group of collections (corporations), and the presence of electronic catalogs and search systems. Searching catalogs and other search engines has always preceded the fulfillment of orders in interlibrary service. Borrowing and using documents from different collections (the provision of interlibrary services) is possible only with up-to-date metadata in modern information retrieval systems. The purpose of the article is to summarize the results of studying several types of search engines, with attention to new scientific publications on the topic under study. An analysis of domestic and foreign materials on options for searching for information is presented, which is essential for users, including remote users, in the provision of interlibrary services.


2018 ◽  
Vol 10 (11) ◽  
pp. 112
Author(s):  
Jialu Xu ◽  
Feiyue Ye

With the explosion of web information, search engines have become the main tools for information retrieval. However, most queries submitted in web search are ambiguous and multifaceted. Understanding queries and mining query intent are therefore critical for search engines. In this paper, we present a novel query recommendation algorithm that combines query information and URL information to obtain broad and accurate query relevance. Query relevance is calculated from query information using query co-occurrence and query embedding vectors. Adding ranking information to query-URL pairs allows the strength between a query and a URL to be calculated more precisely. Empirical experiments are performed on the AOL log. The results demonstrate the effectiveness of the proposed query recommendation algorithm, which achieves superior performance compared to other algorithms.
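
As a rough illustration of the kind of combination this abstract describes, the following Python sketch (not the authors' code; the toy session log, word vectors, and the mixing weight alpha are all invented for illustration) scores query-query relevance by mixing session co-occurrence evidence with the cosine similarity of averaged query embeddings.

```python
# Illustrative sketch (not the paper's algorithm): scoring query-query
# relevance by combining session co-occurrence with embedding similarity.
from collections import Counter
from itertools import combinations
import math

# Hypothetical toy log: each inner list is one user session of queries.
sessions = [
    ["jaguar price", "jaguar car dealers"],
    ["jaguar habitat", "big cats"],
    ["jaguar price", "used car prices"],
]

# Co-occurrence counts of query pairs within a session.
co_counts = Counter()
for session in sessions:
    for q1, q2 in combinations(set(session), 2):
        co_counts[frozenset((q1, q2))] += 1

# Hypothetical pre-trained word vectors (normally loaded from a model).
word_vec = {
    "jaguar": [0.9, 0.1], "price": [0.2, 0.8], "car": [0.3, 0.7],
    "dealers": [0.25, 0.75], "habitat": [0.8, 0.3], "big": [0.7, 0.2],
    "cats": [0.85, 0.25], "used": [0.2, 0.75], "prices": [0.2, 0.8],
}

def embed(query):
    """Average the word vectors of a query's terms."""
    dims = len(next(iter(word_vec.values())))
    vecs = [word_vec[w] for w in query.split() if w in word_vec]
    if not vecs:
        return [0.0] * dims
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dims)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def relevance(q1, q2, alpha=0.5):
    """Weighted mix of co-occurrence evidence and embedding similarity."""
    co = co_counts[frozenset((q1, q2))]
    co_score = co / (1 + co)           # squash raw counts into [0, 1)
    emb_score = cosine(embed(q1), embed(q2))
    return alpha * co_score + (1 - alpha) * emb_score

print(relevance("jaguar price", "used car prices"))
print(relevance("jaguar price", "jaguar habitat"))
```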


Author(s):  
R. Subhashini ◽  
V.Jawahar Senthil Kumar

The World Wide Web is a large, distributed digital information space. The ability to search and retrieve information from the Web efficiently and effectively is an enabling technology for realizing its full potential. Information Retrieval (IR) plays an important role in search engines. Today's most advanced engines use the keyword-based ("bag of words") paradigm, which has inherent disadvantages. Organizing web search results into clusters facilitates the user's quick browsing of search results. Traditional clustering techniques are inadequate because they do not generate clusters with highly readable names. This paper proposes an approach to clustering web search results based on a phrase-based clustering algorithm, as an alternative to the single ordered result list of search engines. The approach presents a list of clusters to the user. Experimental results verify the method's feasibility and effectiveness.
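
The abstract does not give the algorithm's details; the sketch below is only a generic illustration of phrase-based grouping of result snippets (the snippets and the bigram-based phrase extraction are assumptions for illustration), in which each cluster is labelled by the shared phrase it contains.

```python
# Illustrative sketch (not the paper's algorithm): grouping search-result
# snippets by shared phrases so each cluster gets a readable label.
from collections import defaultdict

# Hypothetical result snippets returned for the query "jaguar".
snippets = [
    "jaguar cars for sale at local dealers",
    "official jaguar cars site with new models",
    "the jaguar is a big cat native to the americas",
    "big cat conservation and the jaguar habitat",
]

def phrases(text, n=2):
    """Yield word n-grams (candidate cluster labels) from a snippet."""
    words = text.lower().split()
    for i in range(len(words) - n + 1):
        yield " ".join(words[i:i + n])

# Map each phrase to the snippets that contain it.
phrase_to_docs = defaultdict(set)
for idx, snip in enumerate(snippets):
    for ph in phrases(snip):
        phrase_to_docs[ph].add(idx)

# Keep phrases shared by at least two snippets and present them as
# labelled clusters, largest first.
clusters = sorted(
    ((ph, docs) for ph, docs in phrase_to_docs.items() if len(docs) >= 2),
    key=lambda item: len(item[1]),
    reverse=True,
)
for label, docs in clusters:
    print(f"{label}: results {sorted(docs)}")
```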


Author(s):  
Wen-Chen Hu ◽  
Jyh-Haw Yeh

The World Wide Web now holds more than 800 million pages covering almost all issues. The Web’s fast-growing size and lack of structural style present a new challenge for information retrieval. Numerous search technologies have been applied to Web search engines; however, the dominant search method has yet to be identified. This chapter provides an overview of the existing technologies for Web search engines and classifies them into six categories: 1) hyperlink exploration, 2) information retrieval, 3) metasearches, 4) SQL approaches, 5) content-based multimedia searches, and 6) others. At the end of this chapter, a comparative study of major commercial and experimental search engines is presented, and some future research directions for Web search engines are suggested.


Author(s):  
Cecil Eng Huang Chua ◽  
Roger H. Chiang ◽  
Veda C. Storey

Search engines are ubiquitous tools for seeking information from the Internet and, as such, have become an integral part of our information society. New search engines that combine ideas from separate search engines generally outperform the search engines from which they took ideas. Designers, however, may not be aware of the work of other search engine developers or such work may not be available in modules that can be incorporated into another search engine. This research presents an interoperability architecture for building customized search engines. Existing search engines are analyzed and decomposed into self-contained components that are classified into six categories. A prototype, called the Automated Software Development Environment for Information Retrieval, was developed to implement the interoperability architecture, and an assessment of its feasibility was carried out. The prototype resolves conflicts between components of separate search engines and demonstrates how design features across search engines can be integrated.


Author(s):  
Thomas Mandl

In the 1960s, automatic indexing methods for texts were developed. They already implemented the “bag-of-words” approach, which still prevails. Although automatic indexing is widely used today, many information providers and even Internet services still rely on human information work. In the 1970s, research shifted its interest to partial-match retrieval models and proved their superiority over Boolean retrieval models. Vector-space and later probabilistic retrieval models were developed. However, it took until the 1990s for partial-match models to succeed in the market. The Internet played a great role in this success. All Web search engines were based on partial-match models and provided ranked lists as results rather than unordered sets of documents. Consumers got used to this kind of search system, and all big search engines included partial-match functionality. However, there are many niches in which Boolean methods still dominate, for example, patent retrieval. The basis for information retrieval systems may be pictures, graphics, videos, music objects, structured documents, or combinations thereof. This article is mainly concerned with information retrieval for text documents.
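
The partial-match behaviour described here (ranked lists instead of Boolean sets) can be illustrated with a minimal vector-space sketch; the documents, query, and TF-IDF weighting below are a textbook-style toy, not a reconstruction of any particular system mentioned in the article.

```python
# Minimal sketch of the vector-space (partial-match) model: documents and
# the query become TF-IDF vectors, and results are a ranked list by cosine
# similarity rather than an unordered Boolean result set.
import math
from collections import Counter

docs = {
    "d1": "information retrieval systems rank documents",
    "d2": "boolean retrieval returns unordered sets of documents",
    "d3": "web search engines rank pages for users",
}

def tf_idf_vectors(texts):
    tokenized = {name: text.split() for name, text in texts.items()}
    n_docs = len(tokenized)
    df = Counter(term for toks in tokenized.values() for term in set(toks))
    idf = {t: math.log(n_docs / df[t]) for t in df}
    vectors = {}
    for name, toks in tokenized.items():
        tf = Counter(toks)
        vectors[name] = {t: tf[t] * idf[t] for t in tf}
    return vectors, idf

def cosine(u, v):
    shared = set(u) & set(v)
    dot = sum(u[t] * v[t] for t in shared)
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

vectors, idf = tf_idf_vectors(docs)
query = "rank documents"
q_tf = Counter(query.split())
q_vec = {t: q_tf[t] * idf.get(t, 0.0) for t in q_tf}

# Ranked list: every document gets a score, best match first.
ranking = sorted(
    ((name, cosine(q_vec, vec)) for name, vec in vectors.items()),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, score in ranking:
    print(name, round(score, 3))
```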


Author(s):  
Daniel Crabtree

Web search engines help users find relevant web pages by returning a result set containing the pages that best match the user’s query. When the identified pages have low relevance, the query must be refined to capture the search goal more effectively. However, finding appropriate refinement terms is difficult and time consuming for users, so researchers developed query expansion approaches to identify refinement terms automatically. There are two broad approaches to query expansion: automatic query expansion (AQE) and interactive query expansion (IQE) (Ruthven et al., 2003). AQE has no user involvement, which is simpler for the user but limits its performance. IQE has user involvement, which is more complex for the user but means it can tackle a wider range of problems, such as ambiguous queries. Searches fail by finding too many irrelevant pages (low precision) or too few relevant pages (low recall). AQE has a long history in the field of information retrieval, where the focus has been on improving recall (Velez et al., 1997). Unfortunately, AQE often decreased precision, because the terms used to expand a query frequently changed the query’s meaning (Croft and Harper (1979) identified this effect and named it query drift). The problem is that users typically consider just the first few results (Jansen et al., 2005), which makes precision vital to web search performance. In contrast, IQE has historically balanced precision and recall, leading to an earlier uptake within web search. However, like AQE, the precision of IQE approaches needs improvement. Most recently, approaches have started to improve precision by incorporating semantic knowledge.
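
A common way to realise AQE is pseudo-relevance feedback; the sketch below is a generic illustration of that idea, not a reconstruction of any system cited above, and the result snippets, stopword list, and parameter k are assumptions for illustration. The closing comment notes how the added terms can cause the query drift described in the abstract.

```python
# Illustrative sketch of automatic query expansion via pseudo-relevance
# feedback: take the top-ranked results for the original query and add
# their most frequent new terms to the query.
from collections import Counter

STOPWORDS = {"the", "a", "of", "and", "for", "in", "to", "is"}

# Hypothetical top-ranked result snippets for the query "jaguar speed".
top_results = [
    "the jaguar is the fastest big cat in the americas",
    "jaguar speed and hunting behaviour of big cats",
    "top speed of the jaguar compared to other cats",
]

def expand(query, results, k=3):
    """Add the k most frequent non-stopword terms not already in the query."""
    original = set(query.lower().split())
    counts = Counter(
        term
        for snippet in results
        for term in snippet.lower().split()
        if term not in STOPWORDS and term not in original
    )
    expansion = [term for term, _ in counts.most_common(k)]
    return query + " " + " ".join(expansion)

print(expand("jaguar speed", top_results))
# Adds shared terms such as "big" and "cats"; if the top results had been
# about Jaguar cars instead, the same step would drift the query off-topic.
```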


2021 ◽  
pp. 50-71
Author(s):  
Shakeel Ahmed ◽  
Shubham Sharma ◽  
Saneh Lata Yadav

Information retrieval is the activity of finding material of an unstructured nature within large collections stored on computers. The surface web consists of indexed content accessible by traditional browsers, whereas deep or hidden web content cannot be found with traditional search engines and requires a password or network permissions. Within the deep web, the dark web is also growing as new tools make it easier to navigate hidden content, which is accessible only with special software such as Tor. According to a study by Nature, Google indexes no more than 16% of the surface web and misses all of the deep web; any given search turns up just 0.03% of the information that exists online. Thus, the key part of the hidden web remains inaccessible to users. This chapter poses several questions about this research area, provides detailed definitions and analogies, discusses related work, and sets out the advantages and limitations of the existing work proposed by researchers. It identifies the need for a system that processes both surface and hidden web data and returns integrated results to users.

