A New Stemming Algorithm for Efficient Information Retrieval Systems and Web Search Engines

Author(s):  
Safaa I. Hajeer ◽  
Rasha M. Ismail ◽  
Nagwa L. Badr ◽  
Mohamed Fahmy Tolba
Infolib ◽  
2020 ◽  
Vol 24 (4) ◽  
pp. 16-21
Author(s):  
Irina Krasilnikova ◽  

The urgency of the problem is associated with an increase in the number of electronic resources in many information and library institutions, the need to search for information from any sources, including external ones, the provision of documents from a group of funds (corporations), the presence of electronic catalogs and search systems. Finding information from catalogs and other search engines has always preceded the execution of orders in the interlibrary service. Borrowing and using documents from different collections (provision of interlibrary services) is possible only if there is up-to-date metadata of modern information retrieval systems (ISS). The purpose of the article is to summarize the results of studying several types of search engines. At the same time, attention was drawn to new scientific publications on the topic under study. An analysis of domestic and foreign materials on the options for searching for information is presented, which is very necessary for users, including those who are remote in the provision of interlibrary services.


Author(s):  
Wen-Chen Hu ◽  
Jyh-Haw Yeh

The World Wide Web now holds more than 800 million pages covering almost all issues. The Web’s fast growing size and lack of structural style present a new challenge for information retrieval. Numerous search technologies have been applied to Web search engines; however, the dominant search method has yet to be identified. This chapter provides an overview of the existing technologies for Web search engines and classifies them into six categories: 1) hyperlink exploration, 2) information retrieval, 3) metasearches, 4) SQL approaches, 5) content-based multimedia searches, and 6) others. At the end of this chapter, a comparative study of major commercial and experimental search engines is presented, and some future research directions for Web search engines are suggested.


Author(s):  
Daniel Crabtree

Web search engines help users find relevant web pages by returning a result set containing the pages that best match the user’s query. When the identified pages have low relevance, the query must be refined to capture the search goal more effectively. However, finding appropriate refinement terms is difficult and time consuming for users, so researchers developed query expansion approaches to identify refinement terms automatically. There are two broad approaches to query expansion, automatic query expansion (AQE) and interactive query expansion (IQE) (Ruthven et al., 2003). AQE has no user involvement, which is simpler for the user, but limits its performance. IQE has user involvement, which is more complex for the user, but means it can tackle more problems such as ambiguous queries. Searches fail by finding too many irrelevant pages (low precision) or by finding too few relevant pages (low recall). AQE has a long history in the field of information retrieval, where the focus has been on improving recall (Velez et al., 1997). Unfortunately, AQE often decreased precision as the terms used to expand a query often changed the query’s meaning (Croft and Harper (1979) identified this effect and named it query drift). The problem is that users typically consider just the first few results (Jansen et al., 2005), which makes precision vital to web search performance. In contrast, IQE has historically balanced precision and recall, leading to an earlier uptake within web search. However, like AQE, the precision of IQE approaches needs improvement. Most recently, approaches have started to improve precision by incorporating semantic knowledge.


Author(s):  
Max Chevalier ◽  
Christine Julien ◽  
Chantal Soulé-Dupuy

Searching information can be realized thanks to specific tools called Information Retrieval Systems IRS (also called “search engines”). To provide more accurate results to users, most of such systems offer personalization features. To do this, each system models a user in order to adapt search results that will be displayed. In a multi-application context (e.g., when using several search engines for a unique query), personalization techniques can be considered as limited because the user model (also called profile) is incomplete since it does not exploit actions/queries coming from other search engines. So, sharing user models between several search engines is a challenge in order to provide more efficient personalization techniques. A semantic architecture for user profile interoperability is proposed to reach this goal. This architecture is also important because it can be used in many other contexts to share various resources models, for instance a document model, between applications. It is also ensuring the possibility for every system to keep its own representation of each resource while providing a solution to easily share it.


Author(s):  
Wen-Chen Hu ◽  
Hung-Jen Yang ◽  
Jyh-haw Yeh ◽  
Chung-wei Lee

The World Wide Web now holds more than six billion pages covering almost all daily issues. The Web’s fast growing size and lack of structural style present a new challenge for information retrieval (Lawrence & Giles, 1999a). Traditional search techniques are based on users typing in search keywords which the search services can then use to locate the desired Web pages. However, this approach normally retrieves too many documents, of which only a small fraction are relevant to the users’ needs. Furthermore, the most relevant documents do not necessarily appear at the top of the query output list. Numerous search technologies have been applied to Web search engines; however, the dominant search methods have yet to be identified. This article provides an overview of the existing technologies for Web search engines and classifies them into six categories: i) hyperlink exploration, ii) information retrieval, iii) metasearches, iv) SQL approaches, v) content-based multimedia searches, and vi) others. At the end of this article, a comparative study of major commercial and experimental search engines is presented, and some future research directions for Web search engines are suggested. Related Web search technology review can also be found in Arasu, Cho, Garcia-Molina, Paepcke, and Raghavan (2001) and Lawrence and Giles (1999b).


Author(s):  
S. Naseehath

Webometric research has fallen into two main categories, namely link analysis and search engine evaluation. Search engines are also used to collect data for link analysis. A set of measurements is proposed for evaluating web search engine performance. Some measurements are adapted from the concepts of recall and precision, which are commonly used in evaluating traditional information retrieval systems. Others are newly developed to evaluate search engine stability, which is unique to web information retrieval systems. Overlapping of search results, annual growth of search results on each search engines, variation of results on search using synonyms are also used to evaluate the relative efficiency of search engines. In this study, the investigator attempts to conduct a webometric study on the topic medical tourism in Kerala using six search engines; these include three general search engines, namely Bing, Google, and Lycos, and three metasearch engines, namely Dogpile, ixquick, and WebCrawler.


Author(s):  
Fabrizio Sebastiani

The categorization of documents into subject-specific categories is a useful enhancement for large document collections addressed by information retrieval systems, as a user can first browse a category tree in search of the category that best matches her interests and then issue a query for more specific documents “from within the category.” This approach combines two modalities in information seeking that are most popular in Web-based search engines, i.e., category-based site browsing (as exemplified by, e.g., Yahoo™) and keyword-based document querying (as exemplified by, e.g., AltaVista™). Appropriate query expansion tools need to be provided, though, in order to allow the user to incrementally refine her query through further retrieval passes, thus allowing the system to produce a series of subsequent document rankings that hopefully converge to the user’s expected ranking. In this work we propose that automatically generated, category-specific “associative” thesauri be used for such purpose. We discuss a method for their generation and discuss how the thesaurus specific to a given category may usefully be endowed with “gateways” to the thesauri specific to its parent and children categories.


Author(s):  
Pankaj Dadure ◽  
Partha Pakray ◽  
Sivaji Bandyopadhyay

Mathematical formulas are widely used to express ideas and fundamental principles of science, technology, engineering, and mathematics. The rapidly growing research in science and engineering leads to a generation of a huge number of scientific documents which contain both textual as well as mathematical terms. In a scientific document, the sense of mathematical formulae is conveyed through the context and the symbolic structure which follows the strong domain specific conventions. In contrast to textual information, developed mathematical information retrieval systems have demonstrated the unique and elite indexing and matching approaches which are beneficial to the retrieval of formulae and scientific term. This chapter discusses the recent advancement in formula-based search engines, various formula representation styles and indexing techniques, benefits of formula-based search engines in various future applications like plagiarism detection, math recommendation system, etc.


Sign in / Sign up

Export Citation Format

Share Document