Information Retrieval and Search Engines

2018 ◽  
pp. 259-304 ◽  
Author(s):  
Charu C. Aggarwal


Infolib ◽  
2020 ◽  
Vol 24 (4) ◽  
pp. 16-21
Author(s):  
Irina Krasilnikova ◽  

The urgency of the problem stems from the growing number of electronic resources in information and library institutions, the need to search for information in any source, including external ones, the provision of documents from a group of collections (corporations), and the presence of electronic catalogs and search systems. Searching catalogs and other search engines has always preceded the fulfillment of orders in interlibrary service. Borrowing and using documents from different collections (the provision of interlibrary services) is possible only with up-to-date metadata in modern information retrieval systems. The purpose of the article is to summarize the results of studying several types of search engines, with attention to new scientific publications on the topic under study. An analysis of domestic and foreign materials on options for searching for information is presented, which is essential for users, including remote users, in the provision of interlibrary services.


2018 ◽  
Vol 10 (11) ◽  
pp. 112
Author(s):  
Jialu Xu ◽  
Feiyue Ye

With the explosion of web information, search engines have become the main tools for information retrieval. However, most queries submitted in web search are ambiguous and multifaceted. Understanding queries and mining query intent are therefore critical for search engines. In this paper, we present a novel query recommendation algorithm that combines query information and URL information to obtain broad and accurate query relevance. Query relevance is calculated from query information using query co-occurrence and query embedding vectors. Adding ranking information to query-URL pairs allows the strength between a query and a URL to be calculated more precisely. Empirical experiments are performed on the AOL log. The results demonstrate the effectiveness of the proposed query recommendation algorithm, which achieves superior performance compared to other algorithms.
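
As a rough illustration of the kind of combination this abstract describes, the following Python sketch (not the authors' code; the toy session log, word vectors, and the mixing weight alpha are all invented for illustration) scores query-query relevance by mixing session co-occurrence evidence with the cosine similarity of averaged query embeddings.

```python
# Illustrative sketch (not the paper's algorithm): scoring query-query
# relevance by combining session co-occurrence with embedding similarity.
from collections import Counter
from itertools import combinations
import math

# Hypothetical toy log: each inner list is one user session of queries.
sessions = [
    ["jaguar price", "jaguar car dealers"],
    ["jaguar habitat", "big cats"],
    ["jaguar price", "used car prices"],
]

# Co-occurrence counts of query pairs within a session.
co_counts = Counter()
for session in sessions:
    for q1, q2 in combinations(set(session), 2):
        co_counts[frozenset((q1, q2))] += 1

# Hypothetical pre-trained word vectors (normally loaded from a model).
word_vec = {
    "jaguar": [0.9, 0.1], "price": [0.2, 0.8], "car": [0.3, 0.7],
    "dealers": [0.25, 0.75], "habitat": [0.8, 0.3], "big": [0.7, 0.2],
    "cats": [0.85, 0.25], "used": [0.2, 0.75], "prices": [0.2, 0.8],
}

def embed(query):
    """Average the word vectors of a query's terms."""
    dims = len(next(iter(word_vec.values())))
    vecs = [word_vec[w] for w in query.split() if w in word_vec]
    if not vecs:
        return [0.0] * dims
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dims)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def relevance(q1, q2, alpha=0.5):
    """Weighted mix of co-occurrence evidence and embedding similarity."""
    co = co_counts[frozenset((q1, q2))]
    co_score = co / (1 + co)           # squash raw counts into [0, 1)
    emb_score = cosine(embed(q1), embed(q2))
    return alpha * co_score + (1 - alpha) * emb_score

print(relevance("jaguar price", "used car prices"))
print(relevance("jaguar price", "jaguar habitat"))
```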


Author(s):  
R. Subhashini ◽  
V.Jawahar Senthil Kumar

The World Wide Web is a large, distributed digital information space. The ability to search and retrieve information from the Web efficiently and effectively is an enabling technology for realizing its full potential. Information Retrieval (IR) plays an important role in search engines. Today's most advanced engines use the keyword-based ("bag of words") paradigm, which has inherent disadvantages. Organizing web search results into clusters facilitates the user's quick browsing of search results. Traditional clustering techniques are inadequate because they do not generate clusters with highly readable names. This paper proposes an approach to clustering web search results based on a phrase-based clustering algorithm, as an alternative to the single ordered result list of search engines. The approach presents a list of clusters to the user. Experimental results verify the method's feasibility and effectiveness.
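
The abstract does not give the algorithm's details; the sketch below is only a generic illustration of phrase-based grouping of result snippets (the snippets and the bigram-based phrase extraction are assumptions for illustration), in which each cluster is labelled by the shared phrase it contains.

```python
# Illustrative sketch (not the paper's algorithm): grouping search-result
# snippets by shared phrases so each cluster gets a readable label.
from collections import defaultdict

# Hypothetical result snippets returned for the query "jaguar".
snippets = [
    "jaguar cars for sale at local dealers",
    "official jaguar cars site with new models",
    "the jaguar is a big cat native to the americas",
    "big cat conservation and the jaguar habitat",
]

def phrases(text, n=2):
    """Yield word n-grams (candidate cluster labels) from a snippet."""
    words = text.lower().split()
    for i in range(len(words) - n + 1):
        yield " ".join(words[i:i + n])

# Map each phrase to the snippets that contain it.
phrase_to_docs = defaultdict(set)
for idx, snip in enumerate(snippets):
    for ph in phrases(snip):
        phrase_to_docs[ph].add(idx)

# Keep phrases shared by at least two snippets and present them as
# labelled clusters, largest first.
clusters = sorted(
    ((ph, docs) for ph, docs in phrase_to_docs.items() if len(docs) >= 2),
    key=lambda item: len(item[1]),
    reverse=True,
)
for label, docs in clusters:
    print(f"{label}: results {sorted(docs)}")
```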


Author(s):  
Wen-Chen Hu ◽  
Jyh-Haw Yeh

The World Wide Web now holds more than 800 million pages covering almost all issues. The Web’s fast-growing size and lack of structural style present a new challenge for information retrieval. Numerous search technologies have been applied to Web search engines; however, the dominant search method has yet to be identified. This chapter provides an overview of the existing technologies for Web search engines and classifies them into six categories: 1) hyperlink exploration, 2) information retrieval, 3) metasearches, 4) SQL approaches, 5) content-based multimedia searches, and 6) others. At the end of this chapter, a comparative study of major commercial and experimental search engines is presented, and some future research directions for Web search engines are suggested.


Author(s):  
Cecil Eng Huang Chua ◽  
Roger H. Chiang ◽  
Veda C. Storey

Search engines are ubiquitous tools for seeking information from the Internet and, as such, have become an integral part of our information society. New search engines that combine ideas from separate search engines generally outperform the search engines from which they took ideas. Designers, however, may not be aware of the work of other search engine developers or such work may not be available in modules that can be incorporated into another search engine. This research presents an interoperability architecture for building customized search engines. Existing search engines are analyzed and decomposed into self-contained components that are classified into six categories. A prototype, called the Automated Software Development Environment for Information Retrieval, was developed to implement the interoperability architecture, and an assessment of its feasibility was carried out. The prototype resolves conflicts between components of separate search engines and demonstrates how design features across search engines can be integrated.


Author(s):  
Thomas Mandl

In the 1960s, automatic indexing methods for texts were developed. They already implemented the “bag-of-words” approach, which still prevails. Although automatic indexing is widely used today, many information providers and even Internet services still rely on human information work. In the 1970s, research shifted its interest to partial-match retrieval models and proved their superiority over Boolean retrieval models. Vector-space and later probabilistic retrieval models were developed. However, it took until the 1990s for partial-match models to succeed in the market. The Internet played a great role in this success. All Web search engines were based on partial-match models and provided ranked lists as results rather than unordered sets of documents. Consumers got used to this kind of search system, and all big search engines included partial-match functionality. However, there are many niches in which Boolean methods still dominate, for example, patent retrieval. The basis for information retrieval systems may be pictures, graphics, videos, music objects, structured documents, or combinations thereof. This article is mainly concerned with information retrieval for text documents.
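
The partial-match behaviour described here (ranked lists instead of Boolean sets) can be illustrated with a minimal vector-space sketch; the documents, query, and TF-IDF weighting below are a textbook-style toy, not a reconstruction of any particular system mentioned in the article.

```python
# Minimal sketch of the vector-space (partial-match) model: documents and
# the query become TF-IDF vectors, and results are a ranked list by cosine
# similarity rather than an unordered Boolean result set.
import math
from collections import Counter

docs = {
    "d1": "information retrieval systems rank documents",
    "d2": "boolean retrieval returns unordered sets of documents",
    "d3": "web search engines rank pages for users",
}

def tf_idf_vectors(texts):
    tokenized = {name: text.split() for name, text in texts.items()}
    n_docs = len(tokenized)
    df = Counter(term for toks in tokenized.values() for term in set(toks))
    idf = {t: math.log(n_docs / df[t]) for t in df}
    vectors = {}
    for name, toks in tokenized.items():
        tf = Counter(toks)
        vectors[name] = {t: tf[t] * idf[t] for t in tf}
    return vectors, idf

def cosine(u, v):
    shared = set(u) & set(v)
    dot = sum(u[t] * v[t] for t in shared)
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

vectors, idf = tf_idf_vectors(docs)
query = "rank documents"
q_tf = Counter(query.split())
q_vec = {t: q_tf[t] * idf.get(t, 0.0) for t in q_tf}

# Ranked list: every document gets a score, best match first.
ranking = sorted(
    ((name, cosine(q_vec, vec)) for name, vec in vectors.items()),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, score in ranking:
    print(name, round(score, 3))
```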


Author(s):  
Daniel Crabtree

Web search engines help users find relevant web pages by returning a result set containing the pages that best match the user’s query. When the identified pages have low relevance, the query must be refined to capture the search goal more effectively. However, finding appropriate refinement terms is difficult and time consuming for users, so researchers developed query expansion approaches to identify refinement terms automatically. There are two broad approaches to query expansion: automatic query expansion (AQE) and interactive query expansion (IQE) (Ruthven et al., 2003). AQE has no user involvement, which is simpler for the user but limits its performance. IQE has user involvement, which is more complex for the user but means it can tackle a wider range of problems, such as ambiguous queries. Searches fail by finding too many irrelevant pages (low precision) or too few relevant pages (low recall). AQE has a long history in the field of information retrieval, where the focus has been on improving recall (Velez et al., 1997). Unfortunately, AQE often decreased precision, because the terms used to expand a query frequently changed the query’s meaning (Croft and Harper (1979) identified this effect and named it query drift). The problem is that users typically consider just the first few results (Jansen et al., 2005), which makes precision vital to web search performance. In contrast, IQE has historically balanced precision and recall, leading to an earlier uptake within web search. However, like AQE, the precision of IQE approaches needs improvement. Most recently, approaches have started to improve precision by incorporating semantic knowledge.
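
A common way to realise AQE is pseudo-relevance feedback; the sketch below is a generic illustration of that idea, not a reconstruction of any system cited above, and the result snippets, stopword list, and parameter k are assumptions for illustration. The closing comment notes how the added terms can cause the query drift described in the abstract.

```python
# Illustrative sketch of automatic query expansion via pseudo-relevance
# feedback: take the top-ranked results for the original query and add
# their most frequent new terms to the query.
from collections import Counter

STOPWORDS = {"the", "a", "of", "and", "for", "in", "to", "is"}

# Hypothetical top-ranked result snippets for the query "jaguar speed".
top_results = [
    "the jaguar is the fastest big cat in the americas",
    "jaguar speed and hunting behaviour of big cats",
    "top speed of the jaguar compared to other cats",
]

def expand(query, results, k=3):
    """Add the k most frequent non-stopword terms not already in the query."""
    original = set(query.lower().split())
    counts = Counter(
        term
        for snippet in results
        for term in snippet.lower().split()
        if term not in STOPWORDS and term not in original
    )
    expansion = [term for term, _ in counts.most_common(k)]
    return query + " " + " ".join(expansion)

print(expand("jaguar speed", top_results))
# Adds shared terms such as "big" and "cats"; if the top results had been
# about Jaguar cars instead, the same step would drift the query off-topic.
```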


2021 ◽  
pp. 50-71
Author(s):  
Shakeel Ahmed ◽  
Shubham Sharma ◽  
Saneh Lata Yadav

Information retrieval is the activity of finding material of an unstructured nature within large collections stored on computers. The surface web consists of indexed content accessible by traditional browsers, whereas deep or hidden web content cannot be found with traditional search engines and requires a password or network permissions. Within the deep web, the dark web is also growing as new tools make it easier to navigate hidden content, which is accessible only with special software such as Tor. According to a study by Nature, Google indexes no more than 16% of the surface web and misses all of the deep web; any given search turns up just 0.03% of the information that exists online. Thus, the key part of the hidden web remains inaccessible to users. This chapter poses several questions about this research area, provides detailed definitions and analogies, discusses related work, and sets out the advantages and limitations of the existing work proposed by researchers. It identifies the need for a system that processes both surface and hidden web data and returns integrated results to users.

