Towards a Model for Evaluating Web Retrieval Systems in Non-English Queries

Author(s):  
Fotis Lazarinis

As the Web population continues to grow, more non-English speakers are coming online. The purpose of this chapter is to describe the methods and criteria used for evaluating search engines and to propose a model for evaluating the searching effectiveness of Web retrieval systems on non-English queries. The strengths and weaknesses of search engines in handling Greek and Italian queries are assessed with this method. The fundamental purpose of the methodology is to establish quality measurements of search engine utilization from the perspective of end users. Applying the proposed evaluation methodology helps users select the most effective search engine and helps developers identify the modules of their software that need improvement.
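
The abstract does not reproduce the measurement formulae, so the following is only a minimal sketch of the kind of end-user scoring such a methodology implies: each engine is rated by the average fraction of relevant results among its top-k answers over a set of non-English test queries. The function names and the example Greek queries are illustrative assumptions, not the chapter's actual instrument.

```python
def precision_at_k(results, relevant, k=10):
    """Fraction of the first k results judged relevant by an assessor."""
    top_k = results[:k]
    if not top_k:
        return 0.0
    return sum(1 for url in top_k if url in relevant) / len(top_k)

def score_engine(run, judgments, k=10):
    """Average precision@k over all test queries for one engine.

    run: dict mapping query -> ranked list of result URLs
    judgments: dict mapping query -> set of URLs judged relevant
    """
    scores = [precision_at_k(run[q], judgments.get(q, set()), k) for q in run]
    return sum(scores) / len(scores) if scores else 0.0

# Hypothetical Greek test queries and relevance judgments:
run = {"αθλητικά νέα": ["u1", "u2", "u3"], "ελληνική μουσική": ["u4", "u5"]}
judgments = {"αθλητικά νέα": {"u1", "u3"}, "ελληνική μουσική": {"u5"}}
print(score_engine(run, judgments, k=3))
```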

2001, Vol 1 (3), pp. 28-31
Author(s):  
Valerie Stevenson

Looking back to 1999, there were a number of search engines which performed equally well. I recommended defining the search strategy very carefully, using Boolean logic and field search techniques, and always running the search in more than one search engine. Numerous articles and Web columns comparing the performance of different search engines came to different conclusions on the ‘best’ search engines. Over the last year, however, all the speakers at conferences and seminars I have attended have recommended Google as their preferred tool for locating all kinds of information on the Web. I confess that I have now abandoned most of my carefully worked out search strategies and comparison tests, and use Google for most of my own Web searches.


Author(s):  
Suely Fragoso

This chapter proposes that search engines apply a verticalizing pressure on the WWW's many-to-many information distribution model, forcing it to revert to a distributive model similar to that of the mass media. The argument starts with a critical descriptive examination of the history of search mechanisms for the Internet, in parallel with a discussion of the increasing ties between search engines and the advertising market. The chapter then raises questions concerning the concentration of Web traffic around a small number of search engines, which are in the hands of an equally limited number of enterprises. This concentration is accentuated by the confidence that users place in search engines and by the large engines' ongoing acquisition of collaborative systems and smaller players. The scenario demonstrates the verticalizing pressure that search engines exert on the majority of WWW users, pulling the Web back toward the mass-distribution model.


Author(s):  
Max Chevalier
Christine Julien
Chantal Soulé-Dupuy

Information searching is carried out with specific tools called information retrieval systems (IRS), also known as "search engines." To provide more accurate results, most such systems offer personalization features: each system models a user in order to adapt the search results that will be displayed. In a multi-application context (e.g., when several search engines are used for a single query), personalization techniques are limited because each user model (also called a profile) is incomplete: it does not exploit the actions and queries issued through other search engines. Sharing user models between several search engines is therefore a challenge for providing more effective personalization. A semantic architecture for user profile interoperability is proposed to reach this goal. The architecture is also valuable in many other contexts for sharing the models of various resources between applications, for instance a document model. It ensures that every system can keep its own representation of each resource while providing a solution to share it easily.
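
The abstract does not detail the proposed architecture, so the sketch below only illustrates the general interoperability pattern it implies, under assumed names and a hypothetical shared schema: each engine keeps its native profile representation and exposes adapters to and from a common exchange model.

```python
from dataclasses import dataclass, field

@dataclass
class SharedProfile:
    """Engine-neutral user model exchanged between search systems (assumed schema)."""
    user_id: str
    interests: dict = field(default_factory=dict)   # topic -> weight in [0, 1]
    recent_queries: list = field(default_factory=list)

class EngineAdapter:
    """Per-engine mapping between a native profile and the shared model."""
    def export_profile(self, native) -> SharedProfile:
        raise NotImplementedError
    def import_profile(self, shared: SharedProfile):
        raise NotImplementedError

class ToyEngineAdapter(EngineAdapter):
    """Hypothetical engine whose native profile is a list of (topic, clicks)."""
    def export_profile(self, native):
        total = sum(clicks for _, clicks in native) or 1
        return SharedProfile(
            user_id="u42",  # assumed identifier
            interests={topic: clicks / total for topic, clicks in native},
        )
    def import_profile(self, shared):
        return [(t, round(w * 100)) for t, w in shared.interests.items()]

adapter = ToyEngineAdapter()
shared = adapter.export_profile([("python", 30), ("travel", 10)])
print(shared.interests)   # {'python': 0.75, 'travel': 0.25}
```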


Author(s):  
Pavel Šimek
Jiří Vaněk
Jan Jarolímek

The majority of Internet users use the global network to search for information with full-text search engines such as Google, Yahoo!, or Seznam. Website operators try, with the help of various optimization techniques, to reach the top places in the results of these engines. This is where Search Engine Optimization and Search Engine Marketing matter greatly, because typical users follow links only on the first few pages of full-text search results for given keywords, and in catalogs they primarily use the links placed higher in each category's hierarchy. The key to success is the application of optimization methods that deal with keywords, the structure and quality of content, domain names, individual pages, and the quantity and reliability of backlinks. The process is demanding and long lasting, with no guaranteed outcome.

Without advanced analytical tools, a website operator cannot identify the contribution of the individual documents that make up the website. Operators who want an overview of their documents and of the website as a whole should therefore quantify these positions in a specific way for specific keywords. This is the purpose of quantifying the competitive value of documents, which in turn yields the global competitive value of a website. The quantification is performed on a specific full-text search engine, and the results can be, and often are, different for each engine. According to published reports by the ClickZ agency and Market Share, Google is the most widely used engine by number of searches among English-speaking users, with a market share of more than 80%. The overall procedure for quantifying competitive values is the same everywhere; however, the initial step, the analysis of keywords, depends on the choice of full-text search engine.
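
The abstract leaves the scoring formula unspecified; purely as an illustration, a document's competitive value could be taken as a sum of position-dependent weights over the monitored keywords, with the site's global value as the total. Everything below (the weighting, the cutoff, the sample data) is an assumption, not the authors' method.

```python
def position_weight(rank, max_rank=50):
    """Hypothetical weight: rank 1 is worth the most, beyond max_rank nothing."""
    if rank is None or rank > max_rank:
        return 0.0
    return (max_rank - rank + 1) / max_rank

def document_value(rankings):
    """rankings: dict mapping keyword -> rank of this document (or None)."""
    return sum(position_weight(r) for r in rankings.values())

def site_value(documents):
    """documents: dict mapping URL -> per-keyword rankings for that page."""
    return sum(document_value(r) for r in documents.values())

# Illustrative positions of two pages in one full-text engine's results:
site = {
    "example.com/a": {"farm software": 3, "crop prices": None},
    "example.com/b": {"farm software": 12, "crop prices": 1},
}
print(site_value(site))
```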


Compiler, 2021, Vol 10 (2), pp. 71
Author(s):  
Aris Wahyu Murdiyanto
Adri Priadana

Keyword research is one of the essential activities in Search Engine Optimization (SEO). One technique in keyword research is to find out how many article titles indexed by the Google search engine contain a particular keyword, the so-called "allintitle" count. Search engines can also provide keyword suggestions. Gathering keyword suggestions and allintitle counts manually is neither effective, efficient, nor economical for relatively extensive keyword research, because deciding whether a keyword is worth optimizing takes a long time. Based on these problems, this study analyzes the implementation of a web scraping technique to obtain relevant keyword suggestions from the Google search engine, together with their allintitle counts, automatically. The experimental data consist of ten keywords, each of which generates a maximum of ten keyword suggestions; the ten keywords therefore yield at most 100 keyword suggestions and the corresponding allintitle counts. The evaluation shows an accuracy of 100%, indicating that the technique can be applied to obtain keyword suggestions and allintitle counts from the Google search engine with outstanding accuracy.
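
The paper's exact scraping code is not given in the abstract; the sketch below shows one plausible way to perform the two steps it describes, using Google's publicly reachable autocomplete endpoint and parsing the result-count element of a result page. Both interfaces are unofficial, change without notice, and are rate-limited, so treat this as an assumption-laden illustration rather than a reliable implementation.

```python
import re
import requests
from bs4 import BeautifulSoup

def keyword_suggestions(keyword):
    """Fetch suggestions from Google's autocomplete endpoint (unofficial)."""
    resp = requests.get(
        "https://suggestqueries.google.com/complete/search",
        params={"client": "firefox", "q": keyword},
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()[1]  # payload is [query, [suggestion, ...]]

def allintitle_count(keyword):
    """Scrape the result count for an allintitle: query (fragile by nature)."""
    resp = requests.get(
        "https://www.google.com/search",
        params={"q": f"allintitle:{keyword}"},
        headers={"User-Agent": "Mozilla/5.0"},  # plain clients get an error page
        timeout=10,
    )
    resp.raise_for_status()
    soup = BeautifulSoup(resp.text, "html.parser")
    stats = soup.find(id="result-stats")       # e.g. "About 1,230 results"
    if stats is None:                          # markup changed or request blocked
        return None
    match = re.search(r"([\d.,]+)", stats.get_text())
    return int(re.sub(r"[.,]", "", match.group(1))) if match else None

for suggestion in keyword_suggestions("keyword research"):
    print(suggestion, allintitle_count(suggestion))
```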


2020, Vol 38 (3), pp. 477-492
Author(s):  
Mahdi Zeynali Tazehkandi
Mohsen Nowkarizi

Purpose: This paper reviews the use of the recall metric for evaluating information retrieval systems, especially search engines.
Design/methodology/approach: The paper investigates different researchers' views of the recall metric.
Findings: Five different definitions of recall were identified. For the first group, recall refers to completeness, without specifying where all the relevant documents are located. For the second group, recall refers to retrieving all the relevant documents from the collection; however, the term "collection" is ambiguous. For the third group (first approach), the collection is the search engine's index, while for the fourth group (second approach) it is the Web. For the fifth group (third approach), the ranking of the retrieved documents should also be accounted for in calculating recall.
Practical implications: In the first, second, and third approaches, the evaluation covers, respectively, the retrieval algorithm alone; the retrieval algorithm and the crawler; and the retrieval algorithm, the crawler, and the ranker. To determine the effectiveness of search engines as users experience them, the third approach to measuring recall is preferable.
Originality/value: The paper collects, identifies, and analyses the literature on recall and distinguishes the different views that researchers hold of it.
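
Restated compactly, and in notation the paper itself does not use (D for the retrieved set, D_k for its first k items, R for all relevant documents, I for the engine's index, W for the Web), the three approaches correspond roughly to:

```latex
% Notation assumed here, not taken from the paper:
% D = retrieved documents, D_k = first k retrieved, R = all relevant documents,
% I = the engine's index, W = the whole Web.
\[
\mathrm{recall}_{\mathrm{index}} = \frac{|D \cap R \cap I|}{|R \cap I|}
\qquad
\mathrm{recall}_{\mathrm{Web}} = \frac{|D \cap R|}{|R \cap W|}
\qquad
\mathrm{recall}@k = \frac{|D_k \cap R|}{|R|}
\]
```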


Author(s):  
S. Naseehath

Webometric research has fallen into two main categories, namely link analysis and search engine evaluation; search engines are also used to collect data for link analysis. A set of measurements is proposed for evaluating web search engine performance. Some measurements are adapted from the concepts of recall and precision, which are commonly used in evaluating traditional information retrieval systems. Others are newly developed to evaluate search engine stability, which is unique to web information retrieval systems. The overlap of search results, the annual growth of search results on each search engine, and the variation of results when searching with synonyms are also used to evaluate the relative efficiency of search engines. In this study, the investigator conducts a webometric study on the topic of medical tourism in Kerala using six search engines: three general search engines, namely Bing, Google, and Lycos, and three metasearch engines, namely Dogpile, ixquick, and WebCrawler.
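
As a small illustration of one of these measurements, the overlap between engines can be computed from collected top-k result lists; the overlap coefficient and the sample data below are assumptions, not figures from the study.

```python
from itertools import combinations

def overlap(a, b):
    """Overlap coefficient between two sets of result URLs."""
    a, b = set(a), set(b)
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))

# Hypothetical top results collected from three engines for one query:
runs = {
    "Google":     ["u1", "u2", "u3", "u4"],
    "Bing":       ["u2", "u3", "u9", "u10"],
    "WebCrawler": ["u1", "u2", "u9", "u11"],
}
for (e1, r1), (e2, r2) in combinations(runs.items(), 2):
    print(f"{e1} vs {e2}: {overlap(r1, r2):.2f}")
```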


2018, pp. 742-748
Author(s):  
Viveka Vardhan Jumpala

The Internet, an information superhighway, has practically compressed the world into a cyber colony through various interconnected networks. With the development of the Internet, the World Wide Web (WWW) emerged as a common vehicle for communication and for instantaneous access to search engines and databases. A search engine is designed to facilitate searching for information on the WWW: search engines are essentially the tools that help find the required information on the web quickly and in an organized manner. Different search engines do the same job in different ways, thus giving different results for the same query, which makes search strategies an increasingly important topic on the Web.


Author(s):  
Cláudio Elízio Calazans Campelo
Cláudio de Souza Baptista
Ricardo Madeira Fernandes

It is well known that documents available on the Web are extremely heterogeneous in several aspects, such as the use of various languages and different formats to represent content, besides external factors like source reputation and refresh frequency (Page & Brin, 1998). Altogether, these factors increase the complexity of Web information retrieval systems. Superficially, the traditional search engines available on the Web today retrieve documents that contain the keywords supplied by users. Nevertheless, among the variety of search possibilities, it is evident that users need a process involving more sophisticated analysis; for example, temporal or spatial contextualization might be considered. In such keyword-based search engines, a Web page containing the phrase "…due to the company arrival in London, a thousand java programming jobs will be open…" would not be found by the query "jobs programming England" unless the word "England" appeared elsewhere on the page, because the term "London" is treated merely as another word, without regard to its geographical meaning. In a spatial search engine, the expected behavior would be to return that page, since the system should have information indicating that the term "London" refers to a city located in the country referred to by the term "England." In a traditional search engine, that result would only be feasible if the user repeatedly submitted searches for all possible sub-regions of England (e.g., its cities). As the example shows, for many user searches the most interesting results are those related to certain geographical regions.

A variety of feature extraction and automatic document classification techniques have been proposed; however, acquiring the geographical features of a Web page involves some peculiar complexities, such as ambiguity (e.g., many places with the same name, various names for a single place, things named after places, etc.). Moreover, a Web page can refer to a place that contains, or is contained by, the one given in the user query, which requires knowing the region topologies used by the system. Many features related to geographical context can be added to the process of computing the relevance ranking of returned documents. For example, a document can be more relevant than another if its content refers to a place closer to the user's location. Nonetheless, in spatial search engines there are more complex ranking issues to consider because of the spatial dimension. Jones, Alani, and Tudhope (2001) propose combining the Euclidean distance between place centroids with hierarchical distances to generate a hybrid spatial distance that can be used in computing the relevance ranking of returned documents.

Further important issues are the indexing mechanisms and query processing. In general, solutions try to combine well-known textual indexing techniques (e.g., inverted files) with spatial indexing mechanisms. With respect to the user interface, spatial search engines are more complex because users need to specify regions of interest and possible spatial relationships in addition to keywords. To visualize the results, it is convenient to use digital map resources alongside textual information.
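
As a rough sketch of the hybrid distance attributed to Jones, Alani, and Tudhope (2001), the code below mixes the Euclidean distance between place centroids with a simple count of steps through the place hierarchy; the toy gazetteer, the linear weighting, and the parameter alpha are illustrative assumptions rather than the authors' formulation.

```python
import math

# Hypothetical gazetteer: place -> centroid coordinates and parent region.
CENTROIDS = {"London": (0.0, 0.0), "Reading": (0.6, 0.1), "England": (0.3, 0.5)}
PARENT = {"London": "England", "Reading": "England", "England": None}

def euclidean(a, b):
    """Straight-line distance between the two place centroids."""
    (x1, y1), (x2, y2) = CENTROIDS[a], CENTROIDS[b]
    return math.hypot(x2 - x1, y2 - y1)

def ancestors(place):
    """Chain of regions from the place up to the root of the hierarchy."""
    chain = []
    while place is not None:
        chain.append(place)
        place = PARENT[place]
    return chain

def hierarchical(a, b):
    """Steps from a and b up to their lowest common region."""
    chain_a, chain_b = ancestors(a), ancestors(b)
    common = next(p for p in chain_a if p in chain_b)
    return chain_a.index(common) + chain_b.index(common)

def hybrid_distance(a, b, alpha=0.5):
    """Weighted mix of the two distances; alpha is an assumed tuning knob."""
    return alpha * euclidean(a, b) + (1 - alpha) * hierarchical(a, b)

print(hybrid_distance("London", "Reading"))
```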


Author(s):  
Rahul Pradhan
Dilip Kumar Sharma

Users issuing a query on a search engine expect the results to be relevant to the query topic rather than merely a textual match with the query terms. Studies conducted by several researchers show that users want the search engine to understand the implicit intent of a query rather than just look for textual matches in the hypertext structure of a document or web page. In this paper the authors address queries that have a temporal intent and help web search engines classify them into certain categories. These classes help a search engine understand and cater to the need behind the query. The authors consider temporal expressions (e.g., 1943) in documents and categorize queries on the basis of their temporal boundaries. Their experiment classifies queries and suggests a further course of action for search engines. The results show that assigning queries to these classes helps users reach the information they seek faster.
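
The paper's classifier is not reproduced in the abstract; the sketch below is a deliberately simplified stand-in that detects explicit year expressions with a regular expression and buckets a query relative to an assumed recency boundary.

```python
import re
from datetime import date

YEAR = re.compile(r"\b(1[5-9]\d{2}|20\d{2})\b")  # crude matcher for years 1500-2099

def classify_temporal_intent(query, recency_window=2):
    """Return one of: 'atemporal', 'past', 'recent', 'future' (assumed classes)."""
    match = YEAR.search(query)
    if match is None:
        return "atemporal"
    year, current = int(match.group()), date.today().year
    if year > current:
        return "future"
    if current - year <= recency_window:
        return "recent"
    return "past"

for q in ["battle of kursk 1943", "olympics 2032", "python tutorial"]:
    print(q, "->", classify_temporal_intent(q))
```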

