An architecture for non-linear discovery of aggregated multimedia document web search results

Searching web documents using a summarization approach

International Journal of Web Information Systems ◽

10.1108/ijwis-11-2015-0039 ◽

2016 ◽

Vol 12 (1) ◽

pp. 83-101 ◽

Cited By ~ 6

Author(s):

Rani Qumsiyeh ◽

Yiu-Kai Ng

Keyword(s):

Search Engines ◽

Web Search ◽

Specific Information ◽

Information Need ◽

Search Query ◽

Content Type ◽

Additional Information ◽

Search Results ◽

User Query ◽

Web Search Engines

Purpose: This paper introduces a summarization method that enhances current web-search approaches by offering a summary of each clustered set of web-search results whose contents address the same topic, which should allow the user to quickly identify the information covered in the clustered search results. Web search engines, such as Google, Bing and Yahoo!, rank the set of documents S retrieved in response to a user query and represent each document D in S using a title and a snippet, which serves as an abstract of D. Snippets, however, are not as useful as they are designed to be, i.e. in helping users quickly identify results of interest. They are inadequate in providing distinct information and in capturing the main contents of the corresponding documents. Moreover, when the information need specified in a search query is ambiguous, it is very difficult, if not impossible, for a search engine to identify precisely the set of documents that satisfy the user’s intended request without additional information. Furthermore, a document title is not always a good indicator of the content of the corresponding document. Design/methodology/approach: The authors propose a query-based summarizer, called QSum, to address these problems of Web search engines, which use titles and abstracts to capture the contents of retrieved documents. QSum generates a concise, comprehensive summary for each cluster of documents retrieved in response to a user query, which saves the user time and effort in searching for specific information of interest by removing the need to browse through the retrieved documents one by one. Findings: Experimental results show that QSum is effective and efficient in creating a high-quality summary for each cluster to enhance Web search. Originality/value: The proposed query-based summarizer, QSum, is unique in its searching approach. QSum is also a significant contribution to the Web search community, as it handles the ambiguity of a search query by creating summaries in response to different interpretations of the query, which offer a “road map” to help users quickly identify information of interest.
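The idea of a query-based summary per cluster can be illustrated with a minimal sketch. This is not QSum itself, whose scoring method the abstract does not describe. It assumes a much simpler rule: pick the sentences in a cluster that share the most terms with the query. The names `tokenize` and `summarize_cluster` are hypothetical.

```python
import re


def tokenize(text):
    """Lowercase the text and return its alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def summarize_cluster(query, documents, max_sentences=2):
    """Return up to max_sentences sentences that overlap most with the query.

    Assumption: term overlap stands in for whatever relevance
    scoring a real query-based summarizer would use.
    """
    query_terms = set(tokenize(query))
    scored = []
    for doc in documents:
        # Split on whitespace that follows sentence-ending punctuation.
        for sentence in re.split(r"(?<=[.!?])\s+", doc.strip()):
            if not sentence:
                continue
            overlap = len(query_terms & set(tokenize(sentence)))
            scored.append((overlap, sentence))
    # The sort is stable, so equally scored sentences keep document order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [sentence for score, sentence in scored[:max_sentences] if score > 0]


cluster = [
    "Jaguar is a large cat native to the Americas. It hunts at night.",
    "The jaguar cat lives in rainforests.",
]
summary = summarize_cluster("jaguar cat", cluster)
```

For an ambiguous query, one such summary would be produced per cluster, so each interpretation of the query gets its own short description.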

The Matter of Chance: Auditing Web Search Results Related to the 2020 U.S. Presidential Primary Elections Across Six Search Engines

Social Science Computer Review ◽

10.1177/08944393211006863 ◽

2021 ◽

pp. 089443932110068

Author(s):

Aleksandra Urman ◽

Mykola Makhortykh ◽

Roberto Ulloa

Keyword(s):

Search Engine ◽

Search Engines ◽

Large Scale ◽

Web Search ◽

Primary Elections ◽

Virtual Agents ◽

Search Results ◽

Presidential Primary ◽

Large Scale Analysis ◽

Algorithmic Information

We examine how six search engines filter and rank information in relation to the queries on the U.S. 2020 presidential primary elections under the default (that is, nonpersonalized) conditions. For that, we utilize an algorithmic auditing methodology that uses virtual agents to conduct large-scale analysis of algorithmic information curation in a controlled environment. Specifically, we look at the text search results for “us elections,” “donald trump,” “joe biden,” “bernie sanders” queries on Google, Baidu, Bing, DuckDuckGo, Yahoo, and Yandex, during the 2020 primaries. Our findings indicate substantial differences in the search results between search engines and multiple discrepancies within the results generated for different agents using the same search engine. This highlights that whether users see certain information is decided by chance due to the inherent randomization of search results. We also find that some search engines prioritize different categories of information sources with respect to specific candidates. These observations demonstrate that algorithmic curation of political information can create information inequalities between the search engine users even under nonpersonalized conditions. Such inequalities are particularly troubling considering that search results are highly trusted by the public and can shift the opinions of undecided voters as demonstrated by previous research.
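One common way to quantify how much two agents' result lists differ is the Jaccard index over the sets of returned URLs. The sketch below is illustrative only: the abstract does not say which similarity measure the authors used, and the function name and sample URLs are hypothetical.

```python
def jaccard(results_a, results_b):
    """Share of distinct results that two result lists have in common.

    Returns 1.0 for identical sets and 0.0 for disjoint sets.
    Ranking order is ignored (an assumption of this sketch).
    """
    set_a, set_b = set(results_a), set(results_b)
    if not set_a and not set_b:
        return 1.0  # two empty lists are treated as identical
    return len(set_a & set_b) / len(set_a | set_b)


# Hypothetical top results seen by two agents for the same query.
agent_1 = ["a.example", "b.example", "c.example"]
agent_2 = ["b.example", "c.example", "d.example"]
similarity = jaccard(agent_1, agent_2)
```

A value well below 1.0 for two agents on the same engine with the same query would correspond to the within-engine discrepancies the abstract reports.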

Techniques for Improving Web Search by Understanding Queries

10.26686/wgtn.16985482 ◽

2021 ◽

Author(s):

Daniel Wayne Crabtree

Keyword(s):

Search Engines ◽

Best Practice ◽

Web Search ◽

Special Focus ◽

Clustering Methods ◽

Web Page ◽

Clustering Method ◽

Evaluation Measures ◽

Search Results ◽

Web Page Clustering

This thesis investigates the refinement of web search results with a special focus on the use of clustering and the role of queries. It presents a collection of new methods for evaluating clustering methods, performing clustering effectively, and for performing query refinement. The thesis identifies different types of query, the situations where refinement is necessary, and the factors affecting search difficulty. It then analyses hard searches and argues that many of them fail because users and search engines have different query models. The thesis identifies best practice for evaluating web search results and search refinement methods. It finds that none of the commonly used evaluation measures for clustering meet all of the properties of good evaluation measures. It then presents new quality and coverage measures that satisfy all the desired properties and that rank clusterings correctly in all web page clustering situations. The thesis argues that current web page clustering methods work well when different interpretations of the query have distinct vocabulary, but still have several limitations and often produce incomprehensible clusters. It then presents a new clustering method that uses the query to guide the construction of semantically meaningful clusters. The new clustering method significantly improves performance. Finally, the thesis explores how searches and queries are composed of different aspects and shows how to use aspects to reduce the distance between the query models of search engines and users. It then presents fully automatic methods that identify query aspects, identify underrepresented aspects, and predict query difficulty. Used in combination, these methods have many applications; the thesis describes methods for two of them. The first method improves the search results for hard queries with underrepresented aspects by automatically expanding the query using semantically orthogonal keywords related to the underrepresented aspects. The second method helps users refine hard ambiguous queries by identifying the different query interpretations using a clustering of a diverse set of refinements. Both methods significantly outperform existing methods.

A Roadmap to Integrate Document Clustering in Information Retrieval

Information Retrieval Methods for Multidisciplinary Applications ◽

10.4018/978-1-4666-3898-3.ch003 ◽

2013 ◽

pp. 31-45

Author(s):

R. Subhashini ◽

V. Jawahar Senthil Kumar

Keyword(s):

Information Retrieval ◽

Search Engines ◽

World Wide ◽

Clustering Algorithm ◽

Web Search ◽

Full Potential ◽

Digital Information ◽

Search Results ◽

The World ◽

The Web

The World Wide Web is a large distributed digital information space. The ability to search and retrieve information from the Web efficiently and effectively is an enabling technology for realizing its full potential. Information Retrieval (IR) plays an important role in search engines. Today’s most advanced engines use the keyword-based (“bag of words”) paradigm, which has inherent disadvantages. Organizing web search results into clusters facilitates the user’s quick browsing of search results. Traditional clustering techniques are inadequate because they do not generate clusters with highly readable names. This paper proposes an approach for clustering web search results based on a phrase-based clustering algorithm. It is an alternative to the single ordered result list of search engines. This approach presents a list of clusters to the user. Experimental results verify the method’s feasibility and effectiveness.
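To show why phrase-based clustering yields readable cluster names, here is a minimal sketch. It is not the algorithm proposed in the paper, which the abstract does not specify. It assumes a deliberately simple rule: snippets that share a two-word phrase are grouped together, and the shared phrase serves as the cluster label. The function names are hypothetical.

```python
import re
from collections import defaultdict


def bigrams(text):
    """Return the set of adjacent two-word phrases in the text."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {" ".join(pair) for pair in zip(words, words[1:])}


def phrase_clusters(snippets, min_size=2):
    """Map each shared phrase to the indices of the snippets containing it.

    Assumption: a phrase forms a cluster only if at least
    min_size snippets contain it.
    """
    by_phrase = defaultdict(list)
    for index, snippet in enumerate(snippets):
        for phrase in bigrams(snippet):
            by_phrase[phrase].append(index)
    return {p: ids for p, ids in by_phrase.items() if len(ids) >= min_size}


snippets = [
    "Apple pie recipe with cinnamon",
    "Easy apple pie for beginners",
    "Apple stock price today",
]
clusters = phrase_clusters(snippets)
```

Because every cluster is defined by a phrase that appears in its members, the label is human-readable by construction, unlike a centroid of term weights.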

Personal Web searching in the age of semantic capitalism: Diagnosing the mechanisms of personalisation

First Monday ◽

10.5210/fm.v16i2.3344 ◽

2011 ◽

Cited By ~ 26

Author(s):

Martin Feuz ◽

Matthew Fuller ◽

Felix Stalder

Keyword(s):

Search Engines ◽

Web Search ◽

Web Searching ◽

Search Results ◽

Digital Methods ◽

Internet Searches ◽

The Relationship ◽

Search History ◽

Current Reality

Web search engines have become indispensable tools for finding information online effectively. As the range of information, context and users of Internet searches has grown, the relationship between the search query, search interest and user has become more tenuous. Not all users are seeking the same information, even if they use the same query term. Thus, the quality of search results has, at least potentially, been decreasing. Search engines have begun to respond to this problem by trying to personalise search in order to deliver more relevant results to the users. A query is now evaluated in the context of a user’s search history and other data compiled into a personal profile and associated with statistical groups. This, at least, is the promise stated by the search engines themselves. This paper tries to assess the current reality of the personalisation of search results. We analyse the mechanisms of personalisation in the case of Google web search by empirically testing three commonly held assumptions about what personalisation does. To do this, we developed new digital methods which are explained here. The findings suggest that Google personal search does not fully provide the much-touted benefits for its search users. More likely, it seems to serve the interest of advertisers in providing more relevant audiences to them.

What is popular on Wikipedia and why?

First Monday ◽

10.5210/fm.v12i4.1765 ◽

2007 ◽

Cited By ~ 29

Author(s):

Anselm Spoerri

Keyword(s):

Search Engines ◽

Web Search ◽

Search Behavior ◽

Search Queries ◽

Search Results ◽

The Web

This paper analyzes which pages and topics are the most popular on Wikipedia and why. For the period of September 2006 to January 2007, the 100 most visited Wikipedia pages in a month are identified and categorized in terms of the major topics of interest. The observed topics are compared with search behavior on the Web. Search queries, which are identical to the titles of the most popular Wikipedia pages, are submitted to major search engines and the positions of popular Wikipedia pages in the top 10 search results are determined. The presented data helps to explain how search engines, and Google in particular, fuel the growth and shape what is popular on Wikipedia.

The Semanference System: Better Search Results through Better Queries

10.28945/2570 ◽

2002 ◽

Author(s):

Anthony Scime ◽

Colleen Powderly

Keyword(s):

Web Search ◽

Information Need ◽

Semantic Approach ◽

Search Queries ◽

Search Results ◽

Frequency Of Use ◽

Key Phrases

A method to create more effective Web search queries is to combine elements of a semantic approach with a template that requests specific details about the searcher’s information need. Fundamental to this process is the use of semantics. Nouns, key phrases, and verbs are scored according to their frequency of use, then ranked as keywords and used to create the query. Key phrases and words in the query accurately represent the concepts of the text, generating search results that are significantly more accurate than those available using current methods.
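The frequency-based ranking step described above can be sketched as follows. This is a simplified illustration, not the Semanference system: it assumes a small hand-written stopword list in place of the part-of-speech analysis that selecting nouns, verbs, and key phrases would require. `STOPWORDS` and `build_query` are hypothetical names.

```python
import re
from collections import Counter

# Assumed stopword list; a real system would filter by part of speech.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "for", "is", "on", "with"}


def build_query(text, top_k=3):
    """Rank words by frequency and join the top_k into a query string."""
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]
    counts = Counter(words)
    # Sort by descending count, then alphabetically, so ties are deterministic.
    ranked = sorted(counts, key=lambda w: (-counts[w], w))
    return " ".join(ranked[:top_k])


text = (
    "Solar panels convert sunlight to power. "
    "Solar panels need sunlight. Solar power is clean."
)
query = build_query(text)
```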

Conversion of Website Users to Customers-The Black Hat SEO Technique

International Journal of Advanced Research in Computer Science and Software Engineering ◽

10.23956/ijarcsse.v8i6.714 ◽

2018 ◽

Vol 8 (6) ◽

pp. 29 ◽

Cited By ~ 1

Author(s):

Rotimi-Williams Bello ◽

Firstman Noah Otobo

Keyword(s):

Search Engines ◽

Return On Investment ◽

Web Search ◽

Brand Awareness ◽

The Internet ◽

Search Query ◽

Web Based ◽

Search Results ◽

Rank One ◽

The Right

Search Engine Optimization (SEO) is a technique that helps search engines find a site and rank it above others in response to a search query. SEO thus helps site owners get traffic from search engines. Although the basic principle of operation of all search engines is the same, the minor differences between them lead to major changes in the relevancy of results. Choosing the right keywords to optimize for is thus the first and most crucial step to a successful SEO campaign. In the context of SEO, keyword density can be used as a factor in determining whether a webpage is relevant to a specified keyword or keyword phrase. SEO is known as a process that affects the online visibility of a website or a webpage in a web search engine's results. In general, the earlier (or higher ranked on the search results page) and the more frequently a website appears in the search results list, the more visitors it will receive from the search engine's users; these visitors can then be converted into customers. The objective of this paper is to re-present the black hat SEO technique as an unprofessional but profitable method of converting website users to customers. Having studied white hat SEO, black hat SEO, gray hat SEO, and the crawling, indexing, processing and retrieving methods that search engines use to search documents and files on the internet for keywords and return the list of results containing those keywords, the paper argues that proper application of SEO gives a website a better user experience, helps build brand awareness through high rankings, helps circumvent competition, and leaves room for an increased return on investment.
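Keyword density, mentioned above as a relevance factor, can be computed in a few lines. The abstract gives no formula, so this sketch assumes one common definition: the words covered by occurrences of the keyword phrase, as a percentage of all words on the page. The function name is hypothetical.

```python
import re


def keyword_density(text, keyword):
    """Percentage of words in text that belong to occurrences of keyword.

    Assumed formula: 100 * occurrences * words_in_keyword / total_words.
    """
    words = re.findall(r"[a-z0-9']+", text.lower())
    phrase = keyword.lower().split()
    if not words or not phrase:
        return 0.0
    n = len(phrase)
    # Count every position where the phrase matches a window of n words.
    hits = sum(1 for i in range(len(words) - n + 1) if words[i:i + n] == phrase)
    return 100.0 * hits * n / len(words)


page = "Cheap flights to Rome and cheap hotels in central Rome"
density = keyword_density(page, "Rome")
```

Here "Rome" accounts for 2 of the 10 words, a density of 20 percent. Pushing this number up artificially is the kind of keyword stuffing usually classed as black hat SEO.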

A Roadmap to Integrate Document Clustering in Information Retrieval

International Journal of Information Retrieval Research ◽

10.4018/ijirr.2011010103 ◽

2011 ◽

Vol 1 (1) ◽

pp. 31-44 ◽

Cited By ~ 1

Author(s):

R. Subhashini ◽

V. Jawahar Senthil Kumar

Keyword(s):

Information Retrieval ◽

Search Engines ◽

Clustering Algorithm ◽

Web Search ◽

Full Potential ◽

Digital Information ◽

Enabling Technology ◽

Clustering Techniques ◽

Search Results ◽

The World

The World Wide Web is a large distributed digital information space. The ability to search and retrieve information from the Web efficiently and effectively is an enabling technology for realizing its full potential. Information Retrieval (IR) plays an important role in search engines. Today’s most advanced engines use the keyword-based (“bag of words”) paradigm, which has inherent disadvantages. Organizing web search results into clusters facilitates the user’s quick browsing of search results. Traditional clustering techniques are inadequate because they do not generate clusters with highly readable names. This paper proposes an approach for clustering web search results based on a phrase-based clustering algorithm. It is an alternative to the single ordered result list of search engines. This approach presents a list of clusters to the user. Experimental results verify the method’s feasibility and effectiveness.