Improving search results with data mining in a thematic search engine

2004 ◽  
Vol 31 (14) ◽  
pp. 2387-2404 ◽  
Author(s):  
M Caramia ◽  
G Felici ◽  
A Pezzoli

2021 ◽  
Vol 5 (2) ◽  
Author(s):  
Hannah C Cai ◽  
Leanne E King ◽  
Johanna T Dwyer

ABSTRACT We assessed the quality of online health and nutrition information using a Google™ search on “supplements for cancer”. Search results were scored using the Health Information Quality Index (HIQI), a quality-rating tool consisting of 12 objective criteria related to website domain, lack of commercial aspects, and authoritative nature of the health and nutrition information provided. Possible scores ranged from 0 (lowest) to 12 (“perfect” or highest quality). After eliminating irrelevant results, the remaining 160 search results had median and mean scores of 8. One-quarter of the results were of high quality (score of 10–12). There was no correlation between high-quality scores and early appearance in the sequence of search results, where results are presumably more visible. Also, 496 advertisements, over twice the number of search results, appeared. We conclude that the Google™ search engine may have shortcomings when used to obtain information on dietary supplements and cancer.
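
The scoring and the rank-versus-quality check described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' procedure: it assumes each of the 12 HIQI criteria contributes one point when met, and it assumes a Spearman rank correlation, although the abstract does not name the correlation statistic used. The function names `hiqi_score`, `ranks`, and `spearman` are hypothetical.

```python
from statistics import mean


def hiqi_score(criteria_met):
    """Sum 12 yes/no criteria into a 0-12 score (assumed: one point each)."""
    if len(criteria_met) != 12:
        raise ValueError("expected exactly 12 criteria")
    return sum(1 for met in criteria_met if met)


def ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    rx, ry = ranks(xs), ranks(ys)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return float("nan")  # undefined when one variable is constant
    return cov / (var_x * var_y) ** 0.5
```

Calling `spearman(positions, scores)` with each result's position in the listing and its quality score gives a value near zero when, as the abstract reports, quality is unrelated to position.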


2021 ◽  
pp. 089443932110068
Author(s):  
Aleksandra Urman ◽  
Mykola Makhortykh ◽  
Roberto Ulloa

We examine how six search engines filter and rank information in relation to queries on the U.S. 2020 presidential primary elections under default (that is, nonpersonalized) conditions. To do so, we use an algorithmic auditing methodology in which virtual agents conduct large-scale analysis of algorithmic information curation in a controlled environment. Specifically, we look at the text search results for the queries “us elections,” “donald trump,” “joe biden,” and “bernie sanders” on Google, Baidu, Bing, DuckDuckGo, Yahoo, and Yandex during the 2020 primaries. Our findings indicate substantial differences in the search results between search engines and multiple discrepancies within the results generated for different agents using the same search engine. This highlights that whether users see certain information is decided by chance, owing to the inherent randomization of search results. We also find that some search engines prioritize different categories of information sources with respect to specific candidates. These observations demonstrate that algorithmic curation of political information can create information inequalities between search engine users even under nonpersonalized conditions. Such inequalities are particularly troubling considering that search results are highly trusted by the public and can shift the opinions of undecided voters, as demonstrated by previous research.
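
One simple way to quantify discrepancies between the results shown to different agents is set overlap. The sketch below is illustrative only: the abstract does not state which similarity measure the authors used, so the Jaccard index here is an assumption, and the names `jaccard` and `mean_pairwise_overlap` are hypothetical.

```python
from itertools import combinations


def jaccard(results_a, results_b):
    """Jaccard index of two result lists, ignoring rank order."""
    set_a, set_b = set(results_a), set(results_b)
    if not set_a and not set_b:
        return 1.0  # two empty result lists are treated as identical
    return len(set_a & set_b) / len(set_a | set_b)


def mean_pairwise_overlap(results_by_agent):
    """Average Jaccard index over all pairs of agents.

    results_by_agent maps an agent id to the list of URLs it received
    for the same query on the same search engine.
    """
    pairs = list(combinations(results_by_agent.values(), 2))
    if not pairs:
        raise ValueError("need results from at least two agents")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)
```

A mean overlap of 1.0 would mean every agent saw the same set of results. Values below 1.0 indicate that agents issuing the same query received different results, which is the kind of discrepancy the abstract describes.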


2006 ◽  
Vol 23 (5) ◽  
pp. 313-319
Author(s):  
Yogesh P Awate ◽  
Jagger Bodas ◽  
Sachin Deshpande ◽  
Pushpak Bhattacharyya

2019 ◽  
Vol 71 (1) ◽  
pp. 54-71 ◽  
Author(s):  
Artur Strzelecki

Purpose: The purpose of this paper is to clarify how many removal requests are made, how often, and who makes these requests, as well as which websites are reported to search engines so they can be removed from the search results. Design/methodology/approach: The paper undertakes a deep analysis of more than 3.2bn removed pages from Google’s search results requested by reporting organizations from 2011 to 2018 and over 460m removed pages from Bing’s search results requested by reporting organizations from 2015 to 2017. The paper focuses on pages that belong to the .pl country code top-level domain (ccTLD). Findings: Although the number of requests to remove data from search results has been growing year on year, fewer URLs have been reported in recent years. Some of the requests are, however, unjustified and are rejected by teams representing the search engines. In terms of reporting copyright violations, one company in particular stands out (AudioLock.Net), accounting for 28.1 percent of all reports sent to Google (the top ten companies combined were responsible for 61.3 percent of the total number of reports). Research limitations/implications: As not every request can be published, the study is based only on what is publicly available. Also, the data assigned to Poland is based only on the ccTLD domain name (.pl); other domain extensions for Polish internet users were not considered. Originality/value: This is the first global analysis of data from transparency reports published by search engine companies, as prior research has been based on specific notices.


Author(s):  
Novario Jaya Perdana

The accuracy of search results from a search engine depends on the keywords that are used. A lack of information in the keywords can reduce the accuracy of the search results, which makes searching for information on the internet hard work. In this research, software has been built to create document keyword sequences. The software uses Google Latent Semantic Distance, which can extract relevant information from the document. The information is expressed in the form of specific word sequences that can be used as keyword recommendations in search engines. The results show that the implementation of the method for creating document keyword recommendations achieved high accuracy and could find the most relevant information in the top search results.


2014 ◽  
Vol 2 (1) ◽  
pp. e13 ◽  
Author(s):  
Leo Anthony Celi ◽  
Andrew J Zimolzak ◽  
David J Stone

Author(s):  
Gong Cheng ◽  
Yuzhong Qu

The rapid development of the data Web is accompanied by increasing information needs from ordinary Web users for searching objects and their relations. To meet the challenge, this chapter presents Falcons Object Search, a keyword-based search engine for linked objects. To support various user needs expressed via keyword queries, an extensive virtual document is indexed for each object, consisting of not only associated literals but also the textual descriptions of associated links and linked objects. The resulting objects are ranked according to a combination of their relevance to the query and their popularity. For each resulting object, a query-relevant structured snippet is provided to show the associated literals and linked objects matched with the query, reflecting query relevance and even directly answering the question behind the query. To exploit ontological semantics for more precise search results, the type information of objects is leveraged to support class-based query refinement, and Web-scale class-inclusion reasoning is performed to discover implicit type information. Further, a subclass recommendation technique is proposed to allow users to navigate class hierarchies for incremental results filtering. A task-based experiment demonstrates the promising features of the system.
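
The virtual-document and ranking ideas in this abstract can be sketched roughly as follows. This is not the Falcons implementation: the `LinkedObject` fields, the term-overlap relevance measure, the logarithmic popularity term, and the multiplicative combination are all assumptions chosen to keep the example small, since the abstract says only that relevance and popularity are combined.

```python
import math
from dataclasses import dataclass, field


@dataclass
class LinkedObject:
    uri: str
    literals: list                      # text values attached to the object
    link_descriptions: list = field(default_factory=list)  # text of links and linked objects
    inbound_links: int = 0              # hypothetical popularity signal


def virtual_document(obj):
    """Bag of lowercase terms from literals and link descriptions."""
    text = " ".join(obj.literals + obj.link_descriptions)
    return set(text.lower().split())


def score(obj, query):
    """Assumed scoring: fraction of query terms matched times log popularity."""
    terms = set(query.lower().split())
    if not terms:
        return 0.0
    relevance = len(terms & virtual_document(obj)) / len(terms)
    popularity = math.log(2 + obj.inbound_links)  # always positive
    return relevance * popularity


def search(objects, query):
    """Return URIs of matching objects, best score first (ties by URI)."""
    scored = [(score(o, query), o.uri) for o in objects]
    ranked = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [uri for s, uri in ranked if s > 0]
```

Because link descriptions are part of the virtual document, an object can match a query term that appears only in the description of something it links to, which is the behavior the abstract attributes to the extended virtual document.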


Author(s):  
Aboubakr Aqle ◽  
Dena Al-Thani ◽  
Ali Jaoua

Abstract: Few studies address the challenges that visually impaired (VI) users face when viewing search results on a search engine interface with a screen reader. This study investigates the effect of providing an overview of search results to VI users. We present a novel interactive search engine interface called InteractSE to support VI users during the results exploration stage in order to improve their interactive experience and web search efficiency. An overview of the search results is generated using an unsupervised machine learning approach to present the discovered concepts via a formal concept analysis that is domain-independent. These concepts are arranged in a multi-level tree following a hierarchical order and covering all retrieved documents that share maximal features. The InteractSE interface was evaluated by 16 legally blind users and compared with the Google search engine interface for complex search tasks. The evaluation results were obtained based on both quantitative measures (such as task completion time) and qualitative measures (such as participants’ feedback). These results are promising and indicate that InteractSE enhances search efficiency and consequently advances user experience. Our observations and analysis of the user interactions and feedback yielded design suggestions to support VI users when exploring and interacting with search results.
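
Formal concept analysis, which this abstract names as the basis for the results overview, pairs each maximal set of shared features (an intent) with the documents that have all of them (an extent). The sketch below enumerates those pairs for a tiny document-feature table. It is a naive illustration rather than the InteractSE algorithm: the function name `formal_concepts` and the input shape are hypothetical, and ordering by extent size is only a rough stand-in for the multi-level tree described above.

```python
def formal_concepts(context):
    """Enumerate formal concepts of a document-feature context.

    context maps a document id to the set of features it has.
    Returns a list of (extent, intent) pairs as frozensets, with the
    most general concepts (largest extents) first.
    """
    all_features = frozenset().union(*context.values())
    # Every intent is an intersection of document feature sets,
    # plus the full feature set for the most specific concept.
    intents = {all_features}
    for features in context.values():
        row = frozenset(features)
        intents |= {row & intent for intent in intents}

    concepts = []
    for intent in intents:
        # The extent is every document that has all features in the intent.
        extent = frozenset(d for d, f in context.items() if intent <= f)
        concepts.append((extent, intent))
    return sorted(concepts, key=lambda c: (-len(c[0]), sorted(c[1])))
```

For three documents that all mention "search", where some also mention "engine" or "blind", the first concept returned groups all three under "search", and the narrower concepts below it group the documents that share each additional feature.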

