Techniques for Improving Web Search by Understanding Queries

2021
Author(s): Daniel Wayne Crabtree

<p>This thesis investigates the refinement of web search results, with a special focus on the use of clustering and the role of queries. It presents a collection of new methods for evaluating clustering, for clustering effectively, and for refining queries. The thesis identifies different types of queries, the situations where refinement is necessary, and the factors affecting search difficulty. It then analyses hard searches and argues that many of them fail because users and search engines have different query models. The thesis identifies best practice for evaluating web search results and search refinement methods. It finds that none of the commonly used evaluation measures for clustering satisfies all the properties of a good evaluation measure. It then presents new quality and coverage measures that satisfy all the desired properties and that rank clusterings correctly in all web page clustering situations. The thesis argues that current web page clustering methods work well when different interpretations of the query have distinct vocabularies, but that they still have several limitations and often produce incomprehensible clusters. It then presents a new clustering method that uses the query to guide the construction of semantically meaningful clusters; the new method significantly improves performance. Finally, the thesis explores how searches and queries are composed of different aspects and shows how to use aspects to reduce the distance between the query models of search engines and users. It then presents fully automatic methods that identify query aspects, identify underrepresented aspects, and predict query difficulty. Used in combination, these methods have many applications; the thesis describes two of them. The first improves the search results for hard queries with underrepresented aspects by automatically expanding the query with semantically orthogonal keywords related to the underrepresented aspects. The second helps users refine hard, ambiguous queries by identifying the different query interpretations through a clustering of a diverse set of refinements. Both methods significantly outperform existing methods.</p>
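The aspect-based expansion idea in the abstract can be sketched roughly as follows. This is a hypothetical illustration, not the thesis's actual algorithm: the aspect lists, the coverage threshold, and the snippets are invented, and aspect coverage is approximated by simple substring matching.

```python
# Hypothetical sketch of aspect-based query expansion: an aspect is
# "underrepresented" when few top-result snippets mention any of its
# keywords, and the query is expanded with a keyword from that aspect.

def aspect_coverage(results, aspect_keywords):
    """Fraction of result snippets mentioning at least one aspect keyword."""
    if not results:
        return 0.0
    hits = sum(
        1 for snippet in results
        if any(kw in snippet.lower() for kw in aspect_keywords)
    )
    return hits / len(results)

def expand_query(query, results, aspects, threshold=0.2):
    """Append one keyword from each underrepresented aspect to the query."""
    extra = []
    for name, keywords in aspects.items():
        if aspect_coverage(results, keywords) < threshold:
            extra.append(keywords[0])  # pick a representative keyword
    return query + " " + " ".join(extra) if extra else query

# Invented snippets for the ambiguous query "jaguar".
snippets = [
    "jaguar top speed and habitat in the amazon rainforest",
    "jaguar cars unveil new electric model",
    "jaguar cars dealership prices and reviews",
    "the jaguar is a big cat native to the americas",
]
aspects = {
    "animal": ["habitat", "cat", "wildlife"],
    "car": ["dealership", "model", "price"],
    "os": ["macos", "apple", "operating system"],  # absent from results
}
print(expand_query("jaguar", snippets, aspects))
```

Here the "os" aspect appears in no snippet, so its representative keyword is appended to the query while the well-covered aspects leave it unchanged.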



2021
pp. 089443932110068
Author(s): Aleksandra Urman, Mykola Makhortykh, Roberto Ulloa

We examine how six search engines filter and rank information in relation to queries on the 2020 U.S. presidential primary elections under default (that is, nonpersonalized) conditions. To do so, we employ an algorithmic auditing methodology that uses virtual agents to conduct a large-scale analysis of algorithmic information curation in a controlled environment. Specifically, we look at the text search results for the queries “us elections,” “donald trump,” “joe biden,” and “bernie sanders” on Google, Baidu, Bing, DuckDuckGo, Yahoo, and Yandex during the 2020 primaries. Our findings indicate substantial differences in the search results between search engines and multiple discrepancies within the results generated for different agents using the same search engine. This highlights that whether users see certain information is partly decided by chance, owing to the inherent randomization of search results. We also find that some search engines prioritize different categories of information sources with respect to specific candidates. These observations demonstrate that algorithmic curation of political information can create information inequalities between search engine users even under nonpersonalized conditions. Such inequalities are particularly troubling given that search results are highly trusted by the public and, as previous research has demonstrated, can shift the opinions of undecided voters.
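The agent-level comparison described above can be illustrated with a small overlap computation: how similar are the top-N lists returned to different virtual agents issuing the identical query to the same engine? This is my own sketch, not the authors' auditing code, and the agents' result lists are invented.

```python
# Measure between-agent result variation for one engine and one query
# via the mean pairwise Jaccard overlap of the agents' top-N URL lists.

from itertools import combinations

def jaccard(a, b):
    """Jaccard similarity of two result sets (compared by URL)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 1.0

def mean_pairwise_overlap(agent_results):
    """Average Jaccard overlap over all pairs of agents' result lists."""
    pairs = list(combinations(agent_results, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)

# Invented top-3 lists for three agents issuing the identical query.
agents = [
    ["cnn.com/a", "nytimes.com/b", "bbc.com/c"],
    ["cnn.com/a", "bbc.com/c", "foxnews.com/d"],
    ["cnn.com/a", "nytimes.com/b", "foxnews.com/d"],
]
print(round(mean_pairwise_overlap(agents), 3))
```

A mean overlap well below 1.0 for agents in identical conditions is the kind of within-engine discrepancy the study reports.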



Author(s): Martin Feuz, Matthew Fuller, Felix Stalder

Web search engines have become indispensable tools for finding information online effectively. As the range of information, context and users of Internet searches has grown, the relationship between the search query, search interest and user has become more tenuous. Not all users are seeking the same information, even if they use the same query term. Thus, the quality of search results has, at least potentially, been decreasing. Search engines have begun to respond to this problem by trying to personalise search in order to deliver more relevant results to the users. A query is now evaluated in the context of a user’s search history and other data compiled into a personal profile and associated with statistical groups. This, at least, is the promise stated by the search engines themselves. This paper tries to assess the current reality of the personalisation of search results. We analyse the mechanisms of personalisation in the case of Google web search by empirically testing three commonly held assumptions about what personalisation does. To do this, we developed new digital methods which are explained here. The findings suggest that Google personal search does not fully provide the much-touted benefits for its search users. More likely, it seems to serve the interest of advertisers in providing more relevant audiences to them.
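One simple way to quantify the kind of personalisation effect the paper tests is to diff a signed-in result list against an anonymous baseline for the same query. This sketch is illustrative only and is not one of the paper's digital methods; the URLs are invented.

```python
# Summarise how a personalised top-N list departs from an anonymous
# baseline: URLs added, URLs removed, and URLs that changed rank.

def personalisation_diff(baseline, personalised):
    """Compare two ranked result lists for the same query."""
    base_pos = {url: i for i, url in enumerate(baseline)}
    added = [u for u in personalised if u not in base_pos]
    removed = [u for u in baseline if u not in personalised]
    moved = sum(
        1 for i, u in enumerate(personalised)
        if u in base_pos and base_pos[u] != i
    )
    return {"added": added, "removed": removed, "moved": moved}

baseline     = ["a.com", "b.com", "c.com", "d.com"]
personalised = ["a.com", "c.com", "e.com", "b.com"]
print(personalisation_diff(baseline, personalised))
```

Aggregating such diffs over many queries and profiles gives a crude picture of how much, and where, personalisation actually changes what users see.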


Author(s): Anselm Spoerri

This paper analyzes which pages and topics are the most popular on Wikipedia and why. For the period from September 2006 to January 2007, the 100 most visited Wikipedia pages in each month are identified and categorized in terms of the major topics of interest. The observed topics are compared with search behavior on the Web. Search queries identical to the titles of the most popular Wikipedia pages are submitted to major search engines, and the positions of popular Wikipedia pages in the top 10 search results are determined. The presented data helps to explain how search engines, and Google in particular, fuel Wikipedia's growth and shape what is popular on it.
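The position check described above (finding where a popular Wikipedia page lands in the top 10 results for its title query) can be sketched as follows; the result URLs are invented for illustration.

```python
# Given a ranked list of result URLs for a query, report the 1-based
# rank of the first Wikipedia page within the top-N window, if any.

def wikipedia_rank(results, top_n=10):
    """1-based rank of the first Wikipedia result, or None if absent."""
    for i, url in enumerate(results[:top_n], start=1):
        if "wikipedia.org" in url:
            return i
    return None

# Invented top results for the query "The Shawshank Redemption".
results = [
    "imdb.com/title/tt0111161",
    "en.wikipedia.org/wiki/The_Shawshank_Redemption",
    "rottentomatoes.com/m/shawshank_redemption",
]
print(wikipedia_rank(results))
```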


Author(s): Rotimi-Williams Bello, Firstman Noah Otobo

Search Engine Optimization (SEO) is a technique that helps search engines find and rank one site over another in response to a search query; it thus helps site owners gain traffic from search engines. Although all search engines operate on the same basic principle, minor differences between them lead to major changes in result relevancy. Choosing the right keywords to optimize for is therefore the first and most crucial step in a successful SEO campaign. In the context of SEO, keyword density can be used as one factor in determining whether a webpage is relevant to a specified keyword or keyword phrase. SEO is known as a process that affects the online visibility of a website or webpage in a search engine's results. In general, the earlier (higher ranked on the results page) and more frequently a website appears in the results list, the more visitors it will receive from the search engine's users, and these visitors can then be converted into customers. The objective of this paper is to re-present the black hat SEO technique as an unprofessional but profitable method of converting website users into customers. The paper reviews white hat, black hat, and gray hat SEO, together with the crawling, indexing, processing, and retrieval methods that search engines, as web software programs or web-based scripts, use to search documents and files for keywords over the internet and return lists of results containing those keywords. It concludes that the proper application of SEO gives a website a better user experience, builds brand awareness through high rankings, helps circumvent competition, and allows a high return on investment.
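The keyword-density factor mentioned above has a direct definition: the number of occurrences of a keyword divided by the total number of words on the page. A minimal sketch, assuming a single-word keyword and simple tokenization (real engines' treatment of this signal is not public):

```python
# Keyword density = occurrences of the keyword / total word count.

import re

def keyword_density(text, keyword):
    """Occurrences of a single-word keyword as a fraction of all words."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if not words:
        return 0.0
    return words.count(keyword.lower()) / len(words)

page = "Cheap flights to Paris. Compare cheap flights and book cheap flights today."
print(round(keyword_density(page, "cheap"), 3))
```

Very high densities are a classic black hat signal (keyword stuffing), which is precisely why density is treated as one relevance factor among many rather than a target to maximize.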


2011
Vol 1 (1)
pp. 31-44
Author(s): R. Subhashini, V. Jawahar Senthil Kumar

The World Wide Web is a large, distributed digital information space. The ability to search and retrieve information from the Web efficiently and effectively is an enabling technology for realizing its full potential. Information Retrieval (IR) plays an important role in search engines. Today’s most advanced engines use the keyword-based (“bag of words”) paradigm, which has inherent disadvantages. Organizing web search results into clusters facilitates the user’s quick browsing of the results. Traditional clustering techniques are inadequate here because they do not generate clusters with highly readable names. This paper proposes an approach for clustering web search results based on a phrase-based clustering algorithm, as an alternative to the single ordered result list of search engines. The approach presents a list of clusters to the user. Experimental results verify the method’s feasibility and effectiveness.
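The spirit of phrase-based clustering (shared multi-word phrases form clusters, and the phrase itself becomes a readable cluster label) can be sketched in a toy form. This is not the authors' algorithm; it simply groups result snippets by shared bigrams.

```python
# Toy phrase-based clustering: every n-gram phrase shared by at least
# `min_docs` snippets defines a cluster, labelled by the phrase itself.

from collections import defaultdict

def phrase_clusters(snippets, min_docs=2, n=2):
    """Map each shared n-gram phrase to the sorted ids of its documents."""
    phrase_docs = defaultdict(set)
    for doc_id, text in enumerate(snippets):
        words = text.lower().split()
        for i in range(len(words) - n + 1):
            phrase_docs[" ".join(words[i:i + n])].add(doc_id)
    return {p: sorted(ids) for p, ids in phrase_docs.items()
            if len(ids) >= min_docs}

# Invented snippets for the ambiguous query "apple".
snippets = [
    "apple pie recipe with fresh apples",
    "easy apple pie recipe for beginners",
    "apple stock price hits record high",
    "apple stock forecast for next year",
]
clusters = phrase_clusters(snippets)
print(clusters)
```

The phrase labels ("apple pie", "apple stock") are immediately readable, which is the usability gap this family of methods targets; a full algorithm would additionally merge overlapping phrase clusters and rank them.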


Author(s): Rahul Pradhan, Dilip Kumar Sharma

Users issuing a query on a search engine expect the results to be relevant to the query topic, not merely a textual match with the query's terms. Studies by several researchers show that users want the search engine to understand the implicit intent of the query rather than simply matching text in the hypertext structure of a document or web page. In this paper, the authors address queries that have a temporal intent and help web search engines classify them into certain categories. These classes help the search engine understand and cater to the needs of the query. The authors consider temporal expressions (e.g., 1943) in documents and categorize queries on the basis of their temporal boundaries. Their experiments classify queries and suggest further courses of action for search engines. Results show that classifying queries into these classes helps users reach the information they seek faster.
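The classification idea above can be sketched by detecting explicit year expressions in a query and bucketing the query relative to a reference year. The category names and the reference year are illustrative assumptions, not the authors' taxonomy.

```python
# Bucket a query by the explicit years it contains, relative to a
# reference year (an assumed stand-in for "now").

import re

def classify_temporal(query, current_year=2015):
    """Label a query 'past', 'current', 'future', or 'atemporal'."""
    years = [int(y) for y in re.findall(r"\b(1[89]\d{2}|20\d{2})\b", query)]
    if not years:
        return "atemporal"
    latest = max(years)
    if latest < current_year:
        return "past"
    if latest > current_year:
        return "future"
    return "current"

print(classify_temporal("world war 1943 battles"))
print(classify_temporal("olympics 2020 host city"))
print(classify_temporal("python tutorial"))
```

A real classifier would also handle implicit temporal intent ("latest iphone", "upcoming elections"), which no regex over digits can capture.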


Author(s): Shanfeng Zhu, Xiaotie Deng, Qizhi Fang, Weimin Zhang

Web search engines are among the most popular services for helping users find useful information on the Web. Although many studies have estimated the size and overlap of general web search engines, such estimates may not benefit ordinary searchers, who care more about the overlap of the top N (N = 10, 20, or 50) results for concrete queries than about the overlap of the engines' total index databases. In this study, we present experimental results comparing the overlap of the top N (N = 10, 20, or 50) search results from AlltheWeb, Google, AltaVista, and WiseNut for the 58 most popular queries, as well as the distance between the overlapping results. These 58 queries were chosen from the WordTracker service, which records the most popular queries submitted to well-known metasearch engines such as MetaCrawler and Dogpile. We divide the 58 queries into three categories for further investigation. Through this in-depth study, we observe a number of interesting results: the overlap of the top N results retrieved by different search engines is very small; the search results of queries in different categories behave in dramatically different ways; Google, on average, has the highest overlap among the four search engines; and each search engine tends to adopt its own ranking algorithm independently.
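The top-N overlap measurement described above reduces to counting shared URLs between two engines' truncated result lists. A minimal sketch with invented result lists:

```python
# Count URLs shared by the top-n results of two engines for one query.

def overlap_at_n(results_a, results_b, n):
    """Number of URLs appearing in both engines' top-n results."""
    return len(set(results_a[:n]) & set(results_b[:n]))

# Invented ranked URL lists for the same query on two engines.
engine_a = ["u1", "u2", "u3", "u4", "u5"]
engine_b = ["u2", "u6", "u1", "u7", "u8"]

for n in (3, 5):
    print(n, overlap_at_n(engine_a, engine_b, n))
```

Run over many queries and all engine pairs, with URLs normalized first (scheme, trailing slashes, redirects), this yields exactly the kind of overlap tables the study reports.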

