Improved Query Processing in Web Search Engines Using Grey Wolf Algorithm

2018 ◽  
Vol 7 (2.24) ◽  
pp. 353
Author(s):  
Nishant Pal ◽  
Akshat Chawla ◽  
A Meena Priyadharsini

In large-scale information systems where retrieval is an essential operation, such as web search engines, users are concerned not only with the quality of the results but also with how long their queries take. These concerns lead to a natural tradeoff: approaches that consider more data tend to return better results but incur larger response times, and vice versa. Hence, as the demand for faster query processing together with effective results grows, other ways of increasing efficiency need to be identified. This work proposes applying the meta-heuristic Grey Wolf Optimization (GWO) algorithm to improve query processing time in search engines. GWO mimics the social organisation and hunting behaviour of grey wolves: a single pack is divided into four categories, alpha, beta, delta, and omega, which cooperate in a simulated leadership hierarchy. This hierarchy is used to achieve better search results at decreased query response times.

Abakós ◽  
2012 ◽  
Vol 1 (1) ◽  
pp. 28-49
Author(s):  
Kaio Wagner ◽  
Edleno Silva de Moura ◽  
David Fernandes ◽  
Marco Cristo ◽  
Altigran Soares da Silva

Previous work in the literature has indicated that web page templates represent noisy information in web collections, and has advocated that simply removing templates improves the quality of results provided by Web search systems. In this paper, we study the impact of template removal in two distinct scenarios: large-scale web search collections, which comprise many distinct websites, and intrasite web collections, involving searches within a single website. Our work is the first in the literature to study the impact of template removal on search systems in large-scale Web collections. The study was carried out using an automatic template detection method previously proposed by us. As contributions, we present statistics about the application of this automatic template detection method to the well-known GOV2 reference collection, a large-scale Web collection. We also present experiments comparing the amount of template detected by our automatic method to that obtained when humans select templates. Finally, we present experiments indicating that, in both scenarios, template removal does not improve the quality of results provided by search systems, but can play the role of an effective lossy compression method by reducing the size of their indexes.


2021 ◽  
pp. 089443932110068
Author(s):  
Aleksandra Urman ◽  
Mykola Makhortykh ◽  
Roberto Ulloa

We examine how six search engines filter and rank information in relation to queries on the U.S. 2020 presidential primary elections under default (i.e., nonpersonalized) conditions. For that, we utilize an algorithmic auditing methodology that uses virtual agents to conduct large-scale analysis of algorithmic information curation in a controlled environment. Specifically, we look at the text search results for the queries "us elections," "donald trump," "joe biden," and "bernie sanders" on Google, Baidu, Bing, DuckDuckGo, Yahoo, and Yandex during the 2020 primaries. Our findings indicate substantial differences in the search results between search engines and multiple discrepancies within the results generated for different agents using the same search engine. This highlights that whether users see certain information is decided by chance due to the inherent randomization of search results. We also find that some search engines prioritize different categories of information sources with respect to specific candidates. These observations demonstrate that algorithmic curation of political information can create information inequalities between search engine users even under nonpersonalized conditions. Such inequalities are particularly troubling considering that search results are highly trusted by the public and can shift the opinions of undecided voters, as demonstrated by previous research.


2011 ◽  
pp. 762-784
Author(s):  
Le-Shin Wu ◽  
Ruj Akavipat ◽  
Ana Gabriela Maguitman ◽  
Filippo Menczer

This chapter proposes a collaborative peer network application called 6Search (6S) to address the scalability limitations of centralized search engines. Each peer crawls the Web in a focused way, guided by its user's information context; through this approach, better (distributed) coverage can be achieved. Each peer also acts as a search "servent" (server + client) by submitting queries to, and responding to queries from, its neighbors, so the search process has no centralized bottleneck. Peers rely on a local adaptive routing algorithm to dynamically change the topology of the peer network and find the best neighbors to answer their queries. We present and evaluate learning techniques to improve local query routing. We validate prototypes of the 6S network via simulations with model users based on actual Web crawls. We find that the network topology rapidly converges from a random network to a small-world network, with clusters emerging from user communities with shared interests. Finally, we compare the quality of the results with those obtained by centralized search engines such as Google.



Author(s):  
Denis Shestakov

Finding information on the Web using a web search engine is one of the primary activities of today's web users. For a majority of users, the results returned by conventional search engines seem to be an essentially complete set of links to all pages on the Web relevant to their queries. However, current-day search engines do not crawl and index a significant portion of the Web, and hence web users relying on search engines alone are unable to discover and access a large amount of information in the non-indexable part of the Web. Specifically, dynamic pages generated from parameters provided by a user via web search forms are not indexed by search engines and cannot be found in their results. Such search interfaces provide web users with online access to myriads of databases on the Web. To obtain information from a web database of interest, a user issues a query by specifying query terms in a search form and receives the query results: a set of dynamic pages that embed the required information from the database. At the same time, issuing a query via an arbitrary search interface is an extremely complex task for any kind of automatic agent, including web crawlers, which, at least up to the present day, do not even attempt to pass through web forms on a large scale.


Author(s):  
Martin Feuz ◽  
Matthew Fuller ◽  
Felix Stalder

Web search engines have become indispensable tools for finding information online effectively. As the range of information, contexts and users of Internet searches has grown, the relationship between the search query, the search interest and the user has become more tenuous: not all users are seeking the same information, even if they use the same query term. Thus, the quality of search results has, at least potentially, been decreasing. Search engines have begun to respond to this problem by trying to personalise search in order to deliver more relevant results to their users. A query is now evaluated in the context of a user's search history and other data compiled into a personal profile and associated with statistical groups. This, at least, is the promise stated by the search engines themselves. This paper assesses the current reality of the personalisation of search results. We analyse the mechanisms of personalisation in the case of Google web search by empirically testing three commonly held assumptions about what personalisation does; to do this, we developed new digital methods, which are explained here. The findings suggest that Google personal search does not fully provide the much-touted benefits for its search users. More likely, it seems to serve the interest of advertisers by providing more relevant audiences to them.

