On Low Overlap among Search Results of Academic Search Engines

Author(s):  
Anasua Mitra ◽  
Amit Awekar
2013 ◽  
Vol 284-287 ◽  
pp. 3051-3055
Author(s):  
Lin Chih Chen

Academic search engines, such as Google Scholar and Scirus, provide researchers with a Web-based interface for effectively finding relevant scientific articles. However, current academic search engines lack the ability to cluster search results into a hierarchical tree structure. In this paper, we develop a post-search academic search engine based on a mixed clustering method. In this method, we first adopt suffix tree clustering and a two-way hash mechanism to generate all meaningful labels. We then develop a divisive hierarchical clustering algorithm to organize the labels into a hierarchical tree. The experimental results show that clustering search results with our mixed clustering method yields significant performance gains over current academic search engines. This paper makes two contributions. First, we present a high-performance academic search engine based on our mixed clustering method. Second, we develop a divisive hierarchical clustering algorithm that organizes all returned search results into a hierarchical tree structure.
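The abstract gives only the outline of the mixed clustering method. As a rough illustration, the sketch below substitutes a simple shared-n-gram pass for the suffix tree and two-way hash step, then splits documents divisively under the most frequent label; the function names and the splitting criterion are our assumptions, not the paper's.

```python
from collections import defaultdict

def phrase_labels(snippets, max_len=3, min_support=2):
    # Enumerate word n-grams and keep those shared by enough snippets;
    # a simplified stand-in for suffix tree label generation.
    support = defaultdict(set)
    for idx, text in enumerate(snippets):
        words = text.lower().split()
        for n in range(1, max_len + 1):
            for i in range(len(words) - n + 1):
                support[" ".join(words[i:i + n])].add(idx)
    return {p: docs for p, docs in support.items() if len(docs) >= min_support}

def divisive_tree(labels, depth=0, max_depth=3):
    # Pick the label covering the most documents, split those documents
    # off into a subtree, and recurse on the remainder (toy criterion).
    if not labels or depth >= max_depth:
        return []
    top = max(labels, key=lambda p: len(labels[p]))
    covered = labels[top]
    inside = {p: d & covered for p, d in labels.items() if p != top and d & covered}
    outside = {p: d - covered for p, d in labels.items() if p != top and d - covered}
    node = (top, sorted(covered), divisive_tree(inside, depth + 1, max_depth))
    return [node] + divisive_tree(outside, depth + 1, max_depth)

snippets = [
    "suffix tree clustering of search results",
    "hierarchical clustering of search results",
    "suffix tree construction in linear time",
]
print(divisive_tree(phrase_labels(snippets)))
```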


2021 ◽  
pp. 089443932110068
Author(s):  
Aleksandra Urman ◽  
Mykola Makhortykh ◽  
Roberto Ulloa

We examine how six search engines filter and rank information in relation to queries on the 2020 U.S. presidential primary elections under default, that is, nonpersonalized, conditions. For that, we utilize an algorithmic auditing methodology that uses virtual agents to conduct a large-scale analysis of algorithmic information curation in a controlled environment. Specifically, we look at the text search results for the queries "us elections," "donald trump," "joe biden," and "bernie sanders" on Google, Baidu, Bing, DuckDuckGo, Yahoo, and Yandex during the 2020 primaries. Our findings indicate substantial differences in the search results between search engines and multiple discrepancies within the results generated for different agents using the same search engine. This highlights that whether users see certain information is decided by chance, owing to the inherent randomization of search results. We also find that some search engines prioritize different categories of information sources with respect to specific candidates. These observations demonstrate that algorithmic curation of political information can create information inequalities between search engine users even under nonpersonalized conditions. Such inequalities are particularly troubling considering that search results are highly trusted by the public and, as previous research has demonstrated, can shift the opinions of undecided voters.
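The auditing infrastructure itself is not detailed in the abstract; the minimal sketch below shows one way the reported divergence could be quantified, using Jaccard similarity over top-result URL sets collected by different virtual agents. All engine names here match the study, but the result data is made up for illustration.

```python
from itertools import combinations

def jaccard(a, b):
    # Jaccard similarity of two sets of result URLs: 1.0 means identical
    # result sets, 0.0 means no overlap at all.
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 1.0

# Hypothetical top results per (engine, agent) for one query; a real
# audit would scrape live result pages from each agent's environment.
results = {
    ("google", "agent1"): ["a.com", "b.com", "c.com"],
    ("google", "agent2"): ["a.com", "c.com", "d.com"],
    ("bing", "agent1"): ["a.com", "e.com", "f.com"],
}
for k1, k2 in combinations(results, 2):
    print(k1, "vs", k2, "->", round(jaccard(results[k1], results[k2]), 2))
```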


2021 ◽  
Vol 11 (15) ◽  
pp. 7063
Author(s):  
Esmaeel Rezaee ◽  
Ali Mohammad Saghiri ◽  
Agostino Forestiero

With the increasing growth of different types of data, search engines have become an essential tool on the Internet. Every day, billions of queries are run through a handful of search engines, with accompanying privacy violations and monopoly problems. The blockchain, as a trending technology applied in various fields including banking, IoT, and education, can be a beneficial alternative. Blockchain-based search engines, unlike monopolistic ones, have no centralized controls. With a blockchain-based search system, no company can lay claim to users' data or access their search history and other related information. All these data are encrypted and stored on a blockchain. Valuing users' searches and paying them in return is another advantage of a blockchain-based search engine. Additionally, in smart environments, a trending research field, blockchain-based search engines can provide context-aware and privacy-preserving search results. According to our research, few efforts have been made to develop blockchain-based search engines; the existing work consists mostly of early-stage studies and a few white papers. To the best of our knowledge, no research article has been published on this topic thus far. In this paper, we provide a survey of blockchain-based search engines. Additionally, we argue, by describing its advantages, that the blockchain is an essential paradigm for the search ecosystem.
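To make the storage idea concrete, here is a toy, stdlib-only sketch of tamper-evident logging of hashed queries. It is emphatically not a real blockchain: there is no consensus, no network, and the client-side encryption the abstract mentions is omitted (a real system would encrypt each record before appending).

```python
import hashlib
import json
import time

def append_block(chain, record):
    # Link each record to the previous block's hash so any later edit
    # to the log invalidates every subsequent hash.
    prev = chain[-1]["hash"] if chain else "0" * 64
    body = {"prev": prev, "ts": time.time(), "record": record}
    body["hash"] = hashlib.sha256(
        json.dumps(body, sort_keys=True).encode()).hexdigest()
    chain.append(body)
    return body

chain = []
# Store only a digest of the query, standing in for real encryption.
append_block(chain, {"query_digest": hashlib.sha256(b"my query").hexdigest()})
append_block(chain, {"query_digest": hashlib.sha256(b"another query").hexdigest()})
# Verify the chain links are intact.
print(all(b["prev"] == (chain[i - 1]["hash"] if i else "0" * 64)
          for i, b in enumerate(chain)))
```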


Information ◽  
2021 ◽  
Vol 12 (2) ◽  
pp. 65
Author(s):  
Yuta Nemoto ◽  
Vitaly Klyuev

While users benefit greatly from the latest communication technology, with popular platforms such as social networking services like Facebook or search engines such as Google, scientists warn of filter bubble effects. A solution for escaping filtered information is urgently needed. We implement an approach based on the mechanism of a metasearch engine to present less-filtered information to users. We develop a practical application named MosaicSearch that selects search results from diversified categories of sources collected from multiple search engines. To determine the power of MosaicSearch, we conduct an evaluation of its retrieval quality. According to the results, MosaicSearch is more intelligent than other general-purpose search engines: it generates a smaller number of links while providing users with almost the same amount of objective information. Our approach contributes to transparent information retrieval. This application helps users play the main role in choosing the information they consume.
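MosaicSearch's selection logic is not spelled out in the abstract; the sketch below illustrates only the generic metasearch step it builds on: interleaving and deduplicating rankings from several engines so no single engine's filtering dominates the final list. Engine names and rankings are hypothetical.

```python
from itertools import zip_longest

def mosaic_merge(ranked_lists):
    # Round-robin over the engines' rankings, keeping the first
    # occurrence of each URL, so each engine contributes in turn.
    seen, merged = set(), []
    for tier in zip_longest(*ranked_lists.values()):
        for url in tier:
            if url is not None and url not in seen:
                seen.add(url)
                merged.append(url)
    return merged

# Hypothetical per-engine rankings for one query.
print(mosaic_merge({
    "engineA": ["a.com", "b.com", "c.com"],
    "engineB": ["b.com", "d.com"],
    "engineC": ["e.com", "a.com", "f.com"],
}))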


2019 ◽  
Vol 71 (1) ◽  
pp. 54-71 ◽  
Author(s):  
Artur Strzelecki

Purpose
The purpose of this paper is to clarify how many removal requests are made, how often, and who makes these requests, as well as which websites are reported to search engines so they can be removed from the search results.

Design/methodology/approach
The paper undertakes a deep analysis of more than 3.2bn pages removed from Google's search results at the request of reporting organizations from 2011 to 2018, and over 460m pages removed from Bing's search results at the request of reporting organizations from 2015 to 2017. It focuses on pages that belong to the .pl country code top-level domain (ccTLD).

Findings
Although the number of requests to remove data from search results has been growing year on year, fewer URLs have been reported in recent years. Some of the requests are, however, unjustified and are rejected by the teams representing the search engines. In terms of reporting copyright violations, one company in particular stands out (AudioLock.Net), accounting for 28.1 percent of all reports sent to Google (the top ten companies combined were responsible for 61.3 percent of the total number of reports).

Research limitations/implications
As not every request can be published, the study is based only on what is publicly available. Also, the data assigned to Poland are based solely on the ccTLD domain name (.pl); other domain extensions used by Polish internet users were not considered.

Originality/value
This is the first global analysis of data from transparency reports published by search engine companies, as prior research has been based on specific notices.
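The concentration figures in the findings reduce to simple share arithmetic over per-reporter counts. The sketch below recomputes such shares from made-up numbers; only the AudioLock.Net name comes from the abstract, and real figures would be parsed from the transparency-report dumps Google and Bing publish.

```python
from collections import Counter

# Hypothetical per-reporter URL counts for illustration only.
reports = Counter({
    "AudioLock.Net": 28_100, "reporter_b": 9_000, "reporter_c": 7_500,
    "reporter_d": 6_000, "reporter_e": 4_400, "long_tail_total": 45_000,
})
# Exclude the aggregated long tail when ranking named reporters.
named = Counter({k: v for k, v in reports.items() if k != "long_tail_total"})
total = sum(reports.values())
top10 = named.most_common(10)
print(f"top reporter share: {top10[0][1] / total:.1%}")
print(f"top-10 combined share: {sum(n for _, n in top10) / total:.1%}")
```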


Author(s):  
Novario Jaya Perdana

The accuracy of search results obtained with a search engine depends on the keywords used. Insufficient information in the keywords can reduce the accuracy of the results, which makes searching for information on the internet hard work. In this research, a software tool has been built to create document keyword sequences. The tool uses Google Latent Semantic Distance, which can extract relevant information from a document. The information is expressed in the form of specific word sequences that can be used as keyword recommendations in search engines. The results show that the implemented method for creating document keyword recommendations achieved high accuracy and could find the most relevant information in the top search results.
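The abstract does not define Google Latent Semantic Distance, so the sketch below uses a related, well-known co-occurrence measure instead, the Normalized Google Distance, to show how term relatedness can be scored from search hit counts; the hit counts here are invented for illustration.

```python
import math

def ngd(fx, fy, fxy, n):
    # Normalized Google Distance from page-hit counts: 0 means the two
    # terms always co-occur, larger values mean they rarely do. Not the
    # paper's Google Latent Semantic Distance, whose details the
    # abstract omits, but a related relatedness measure.
    lx, ly, lxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(lx, ly) - lxy) / (math.log(n) - min(lx, ly))

# Hypothetical hit counts for two terms and their co-occurrence,
# with n as the (assumed) total number of indexed pages.
print(ngd(fx=120_000, fy=90_000, fxy=45_000, n=25_000_000_000))
```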


2021 ◽  
Author(s):  
◽  
Daniel Wayne Crabtree

This thesis investigates the refinement of web search results, with a special focus on the use of clustering and the role of queries. It presents a collection of new methods for evaluating clustering methods, for performing clustering effectively, and for performing query refinement. The thesis identifies different types of query, the situations where refinement is necessary, and the factors affecting search difficulty. It then analyses hard searches and argues that many of them fail because users and search engines have different query models. The thesis identifies best practice for evaluating web search results and search refinement methods. It finds that none of the commonly used evaluation measures for clustering meets all of the properties of good evaluation measures. It then presents new quality and coverage measures that satisfy all the desired properties and that rank clusterings correctly in all web page clustering situations. The thesis argues that current web page clustering methods work well when different interpretations of the query have distinct vocabulary, but they still have several limitations and often produce incomprehensible clusters. It then presents a new clustering method that uses the query to guide the construction of semantically meaningful clusters. The new clustering method significantly improves performance. Finally, the thesis explores how searches and queries are composed of different aspects and shows how to use aspects to reduce the distance between the query models of search engines and users. It then presents fully automatic methods that identify query aspects, identify underrepresented aspects, and predict query difficulty. Used in combination, these methods have many applications; the thesis describes methods for two of them. The first improves the search results for hard queries with underrepresented aspects by automatically expanding the query using semantically orthogonal keywords related to the underrepresented aspects. The second helps users refine hard, ambiguous queries by identifying the different query interpretations using a clustering of a diverse set of refinements. Both methods significantly outperform existing methods.
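The first application lends itself to a small sketch: if an aspect of a hard query barely appears in the current results, append one of its related keywords to steer the engine toward it. The aspect detection and keyword selection in the thesis are far more sophisticated; everything below, including the threshold and the example aspects, is an assumption for illustration.

```python
def expand_query(query, aspect_keywords, result_aspects):
    # Find aspects with (assumed) coverage below 10% of the results
    # and append one related keyword for each underrepresented aspect.
    missing = [a for a in aspect_keywords if result_aspects.get(a, 0.0) < 0.1]
    extra = [aspect_keywords[a][0] for a in missing if aspect_keywords[a]]
    return query + " " + " ".join(extra) if extra else query

# Hypothetical aspects for the query "jaguar speed": the animal aspect
# dominates the current results, the car aspect is underrepresented.
print(expand_query(
    "jaguar speed",
    aspect_keywords={"animal": ["wildcat"], "car": ["xk-e", "coupe"]},
    result_aspects={"animal": 0.9, "car": 0.02},
))
```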


Author(s):  
Natasha Tusikov

This chapter explains how the transnational regime uses search engines (especially Google) and domain name registrars (specifically GoDaddy) to throttle access to infringing sites. It traces efforts by the U.S. and U.K. governments, along with rights holders, to pressure Google and GoDaddy into adopting the non-binding agreements. It then presents two case studies. The first discusses search engines' regulation of search results linking to infringing sites and a non-binding agreement struck among search engines (Google, Yahoo, and Microsoft) at the behest of the U.K. government. The second examines GoDaddy's efforts to disable so-called illegal online pharmacies that operate in violation of U.S. federal and state laws. The chapter concludes that Internet firms' practice of using chokepoints to deter access to targeted websites is highly problematic, as legitimate websites are mistakenly targeted and sanctioned. Automated enforcement programs exacerbate this problem because they significantly increase the scale and speed of rights holders' enforcement efforts without a corresponding increase in oversight.

