Retriever

Author(s):  
Anupam Joshi ◽  
Zhihua Jiang

Web search engines have become increasingly ineffective as the number of documents on the Web has proliferated. A typical query retrieves hundreds of documents, most of which have no relation to what the user was looking for. This chapter describes a system named Retriever that uses a recently proposed robust fuzzy clustering algorithm, RFCMdd, to cluster the results of a query from a search engine into groups. These groups and their associated keywords are presented to the user, who can then examine the URLs in whichever group(s) s/he finds interesting. This application requires clustering in the presence of a significant amount of noise, which our system handles efficiently. N-Gram and Vector Space methods are used to create the dissimilarity matrix for clustering. We discuss the performance of our system by comparing it with state-of-the-art peers such as Husky Search, and we present results analyzing the effectiveness of the N-Gram and Vector Space methods in generating the dissimilarity matrices.
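
To make the pipeline concrete, the sketch below shows the vector-space step only: building a pairwise dissimilarity matrix (1 - cosine similarity over bag-of-words vectors) from result snippets. The tokenization and function names are illustrative assumptions, not the authors' implementation, and the clustering step (RFCMdd) is omitted.

    import math
    from collections import Counter

    def vector_space_dissimilarity(snippets):
        """Pairwise dissimilarity matrix: 1 - cosine similarity of
        bag-of-words vectors built from the snippets (illustrative only)."""
        bags = [Counter(s.lower().split()) for s in snippets]
        norms = [math.sqrt(sum(c * c for c in bag.values())) for bag in bags]
        n = len(snippets)
        dissim = [[0.0] * n for _ in range(n)]
        for i in range(n):
            for j in range(i + 1, n):
                dot = sum(count * bags[j][term] for term, count in bags[i].items())
                cos = dot / (norms[i] * norms[j]) if norms[i] and norms[j] else 0.0
                dissim[i][j] = dissim[j][i] = 1.0 - cos
        return dissim

    # Toy snippets for the ambiguous query "jaguar":
    snippets = [
        "new and used jaguar cars for sale",
        "the jaguar is a large cat of the americas",
        "used jaguar cars dealers and prices",
    ]
    for row in vector_space_dissimilarity(snippets):
        print([round(x, 2) for x in row])

A matrix like this (from either the N-Gram or the Vector Space method) is exactly the input a medoid-based fuzzy clusterer such as RFCMdd consumes.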

Author(s):  
Rahul Pradhan ◽  
Dilip Kumar Sharma

Users issuing a query on a search engine expect the results to be relevant to the query topic rather than merely a textual match with the query's terms. Studies conducted by several researchers show that users want the search engine to understand the implicit intent of a query rather than look for a textual match in the hypertext structure of a document or web page. In this paper the authors address queries that carry temporal intent and help web search engines classify them into categories. These classes or categories help a search engine understand and serve the need behind the query. The authors consider temporal expressions (e.g., 1943) in documents and categorize queries on the basis of their temporal boundaries. Their experiment classifies queries and suggests a further course of action for search engines. Results show that classifying queries into these classes helps users reach the information they seek faster.
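
As a rough illustration of this kind of temporal classification, the sketch below extracts explicit year expressions from a query and buckets it relative to the present. The bucket names and the regular expression are our own assumptions; the paper's actual taxonomy and features may differ.

    import re
    from datetime import date

    def classify_temporal_intent(query, today=None):
        """Bucket a query by its explicit year expressions (e.g. '1943').
        Buckets ('atemporal'/'past'/'future'/'recent') are illustrative."""
        today = today or date.today()
        years = [int(y) for y in re.findall(r"\b(1[89]\d{2}|20\d{2})\b", query)]
        if not years:
            return "atemporal"   # no explicit temporal expression
        if all(y < today.year for y in years):
            return "past"        # e.g. "battle of kursk 1943"
        if all(y > today.year for y in years):
            return "future"      # e.g. "2040 population projections"
        return "recent"          # current year, or a span mixing past and future

    print(classify_temporal_intent("battle of kursk 1943"))   # past
    print(classify_temporal_intent("cheap laptop deals"))     # atemporal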


Author(s):  
Shanfeng Zhu ◽  
Xiaotie Deng ◽  
Qizhi Fang ◽  
Weimin Zhang

Web search engines are among the most popular services helping users find useful information on the Web. Although many studies have been carried out to estimate the size and overlap of general web search engines, they may not benefit ordinary web users, who care more about the overlap of the top N (N = 10, 20, or 50) results for concrete queries than about the overlap of the engines' total index databases. In this study, we present experimental results comparing the overlap of the top N (N = 10, 20, or 50) search results from AlltheWeb, Google, AltaVista, and WiseNut for the 58 most popular queries, as well as the rank distance of the overlapping results. These 58 queries were chosen from the WordTracker service, which records the most popular queries submitted to well-known metasearch engines such as MetaCrawler and Dogpile. We divide these 58 queries into three categories for further investigation. Through this in-depth study, we observe a number of interesting results: the overlap of the top N results retrieved by different search engines is very small; the search results of queries in different categories behave in dramatically different ways; Google, on average, has the highest overlap with the other three engines; and each search engine appears to adopt its own ranking algorithm independently.
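
The two measurements the study reports, top-N overlap and the rank distance of shared results, are straightforward to sketch. The snippet below assumes each engine's results are available as ranked lists of URLs; the function and variable names are illustrative, not the study's code.

    def overlap_and_rank_distance(results_a, results_b, n=10):
        """Return the fraction of URLs shared by the two top-n lists and,
        for each shared URL, the absolute difference in its ranks."""
        top_a, top_b = results_a[:n], results_b[:n]
        shared = set(top_a) & set(top_b)
        distance = {u: abs(top_a.index(u) - top_b.index(u)) for u in shared}
        return len(shared) / n, distance

    # Hypothetical top-5 lists for the same query from two engines:
    engine_a = ["u1", "u2", "u3", "u4", "u5"]
    engine_b = ["u3", "u9", "u1", "u8", "u7"]
    print(overlap_and_rank_distance(engine_a, engine_b, n=5))
    # -> (0.4, {'u1': 2, 'u3': 2})  (dict order may vary)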


Author(s):  
B. J. Jansen ◽  
A. Spink

People are now confronted with the task of locating the electronic information needed to address the issues of their daily lives. The Web is presently the major information source for many people in the U.S. (Cole, Suman, Schramm, Lunn, & Aquino, 2003), used more than newspapers, magazines, and television as a source of information. Americans are expanding their use of the Web for all sorts of informational and commercial purposes (Horrigan, 2004; Horrigan & Rainie, 2002; National Telecommunications and Information Administration, 2002). Searching for information is one of the most popular Web activities, second only to the use of e-mail (Nielsen Media, 1997). However, successfully locating needed information remains a difficult and challenging task (Eastman & Jansen, 2003). Locating relevant information affects not only individuals but also commercial, educational, and governmental organizations. This is especially true with regard to people interacting with their governmental agencies. Executive Order 13011 (Clinton, 1996) directed the U.S. federal government to move aggressively on strategies to utilize the Internet. Birdsell and Muzzio (1999) document the growing presence of governmental Web sites, classifying them into three general categories: (1) provision of information, (2) delivery of forms, and (3) transactions. In 2004, 29% of Americans said they had visited a government Web site to contact some governmental entity, 18% had sent an e-mail, and 22% had used multiple means (Horrigan, 2004). It seems clear that the Web is a major conduit for accessing governmental information and perhaps services. Search engines are the primary means for people to locate Web sites (Nielsen Media, 1997). Given the Web's importance, we need to understand how Web search engines perform (Lawrence & Giles, 1998) and how people use and interact with them to locate governmental information. Examining Web searching for governmental information is an important area of research with the potential to increase our understanding of users of Web-based governmental information, advance our knowledge of Web searchers' governmental information needs, and positively impact the design of Web search engines and sites that specialize in governmental information.


2011 ◽  
Vol 10 (05) ◽  
pp. 913-931 ◽  
Author(s):  
XIANYONG FANG ◽  
CHRISTIAN JACQUEMIN ◽  
FRÉDÉRIC VERNIER

Since the results from Semantic Web search engines are highly structured XML documents, they cannot be efficiently visualized with traditional explorers. Therefore, the Semantic Web calls for a new generation of search query visualizers that can rely on document metadata. This paper introduces such a visualization system called WebContent Visualizer that is used to display and browse search engine results. The visualization is organized into three levels: (1) Carousels contain documents with the same ranking, (2) carousels are piled into stacks, one for each date, and (3) these stacks are organized along a meta-carousel to display the results for several dates. Carousel stacks are piles of local carousels with increasing radii to visualize the ranks of classes. For document comparison, colored links connect documents between neighboring classes on the basis of shared entities. Based on these techniques, the interface is made of three collaborative components: an inspector window, a visualization panel, and a detailed dialog component. With this architecture, the system is intended to offer an efficient way to explore the results returned by Semantic Web search engines.
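
As a way to picture the three-level organization, the sketch below groups flat result records into carousels by rank class, stacks carousels by date, and collects the stacks into a meta-carousel. The record fields and function name are assumptions for illustration, not the WebContent Visualizer's actual data model.

    from collections import defaultdict

    def build_meta_carousel(results):
        """Group flat result records into the three levels described above:
        carousel (same rank class) -> stack (same date) -> meta-carousel.
        Field names ('date', 'rank_class', 'url') are illustrative."""
        stacks = defaultdict(lambda: defaultdict(list))
        for r in results:
            stacks[r["date"]][r["rank_class"]].append(r["url"])
        # Sort carousels inside each stack by rank class, and stacks by date.
        return {d: dict(sorted(cs.items())) for d, cs in sorted(stacks.items())}

    results = [
        {"url": "a.xml", "date": "2010-05", "rank_class": 1},
        {"url": "b.xml", "date": "2010-05", "rank_class": 2},
        {"url": "c.xml", "date": "2010-06", "rank_class": 1},
    ]
    print(build_meta_carousel(results))
    # {'2010-05': {1: ['a.xml'], 2: ['b.xml']}, '2010-06': {1: ['c.xml']}}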


2019 ◽  
Vol 49 (5) ◽  
pp. 707-731 ◽  
Author(s):  
Malte Ziewitz

When measures come to matter, those measured find themselves in a precarious situation. On the one hand, they have a strong incentive to respond to measurement so as to score a favourable rating. On the other hand, too much of an adjustment runs the risk of being flagged and penalized by system operators as an attempt to ‘game the system’. Measures, the story goes, are most useful when they depict those measured as they usually are and not how they intend to be. In this article, I explore the practices and politics of optimization in the case of web search engines. Drawing on materials from ethnographic fieldwork with search engine optimization (SEO) consultants in the United Kingdom, I show how maximizing a website’s visibility in search results involves navigating the shifting boundaries between ‘good’ and ‘bad’ optimization. Specifically, I am interested in the ethical work performed as SEO consultants artfully arrange themselves to cope with moral ambiguities provoked and delegated by the operators of the search engine. Building on studies of ethics as a practical accomplishment, I suggest that the ethicality of optimization has itself become a site of governance and contestation. Studying such practices of ‘being ethical’ not only offers opportunities for rethinking popular tropes like ‘gaming the system’, but also draws attention to often-overlooked struggles for authority at the margins of contemporary ranking schemes.


2017 ◽  
Author(s):  
Xi Zhu ◽  
Xiangmiao Qiu ◽  
Dingwang Wu ◽  
Shidong Chen ◽  
Jiwen Xiong ◽  
...  

BACKGROUND Electronic health practices such as apps and software depend on web search engines, which are a convenient way for patients to find information, so the success of electronic health is linked to the success of web search engines in the field of health. Yet the reliability of the information in search engine results remains to be evaluated, and a detailed analysis can reveal shortcomings and suggest improvements. OBJECTIVE To assess the reliability of information related to women with epilepsy in the results of the main search engines in China. METHODS Six physicians conducted the searches every week. The search keywords were one of four antiepileptic drugs (valproic acid/oxcarbazepine/levetiracetam/lamotrigine) plus "huaiyun" or "renshen", both of which mean pregnancy in Chinese. The searches were conducted on different devices (computer/cellphone) and different engines (Baidu/Sogou/360). The top ten results of every search result page were included. Two physicians classified each result into one of 9 categories according to its content and also evaluated its reliability. RESULTS A total of 16,411 search results were included. 85.1% of the web pages carried advertisements, and 55% were categorized as question-and-answer pages based on their content. Only 9% of the search results were reliable, 50.7% were partly reliable, and 40.3% were unreliable. Results ranked higher carried more advertisements and were more often unreliable. All content from hospital websites was unreliable, whereas all content from academic publishing was reliable. CONCLUSIONS Several principles must be emphasized to further the use of web search engines in healthcare. First, identifying registered physicians and developing an efficient system to guide patients to physicians would guarantee the quality of the information provided. Second, the relevant authorities should restrict excessive advertising sales in the healthcare area through specific regulations to avoid a negative impact on patients. Third, information from hospital websites should be judged carefully before being embraced wholeheartedly.


Author(s):  
Ali Shiri ◽  
Lydia Zvyagintseva

The purpose of this study is to examine the performance of dynamic query suggestion in three popular web search engines, namely Google, Yahoo! and Bing. Using the TREC (Text Retrieval Conference) Web Track topics, this study conducts a comparative examination of the number, type, and variations of the query term suggestions provided by the three Web search engines.

