Test-framework: performance profiling and testing a web search engine on non-factoid queries

Author(s):  
Althaf Ali A ◽  
Mahammad Shafi R

Performance profiling and testing is one of the interesting topics in big data management and cloud computing. In testing, we use test cases composed of different types of queries to evaluate the performance aspects of an information retrieval system over a large-scale information collection. This test scenario evaluates retrieval accuracy for all kinds of ambiguous and non-factoid queries, using the result set as training data. It is difficult to evaluate the retrieval method in order to schedule or optimize the recommendation and prediction techniques of the IR method for real-time queries. Queries are treated as requirement specifications that are supplied to a search engine or web information provider application for information or web-page retrieval. In this paper, we propose a novel technique named the “Test Retrieval Framework,” a performance profiling and testing approach for web search engines on the information retrieved for non-factoid queries. In this technique, we apply the expectation-maximization (EM) algorithm as an iterative method to find maximum likelihood estimates. We discuss the important aspects of this work: recommendation models integrating domain and web usage, query optimization for navigational and transactional queries, and query result records. The experimental results demonstrate that the proposed technique outperforms state-of-the-art approaches in terms of set-based measures (precision, recall, and F-measure) and rank-based measures (mean average precision and cumulative gain).
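The abstract names expectation-maximization as the iterative maximum-likelihood method but does not spell out the model it is applied to. As a minimal sketch of the EM iteration itself (assuming, purely for illustration, a one-dimensional two-component Gaussian mixture rather than the paper's retrieval model):

```python
import math

def em_gmm_1d(data, iters=50):
    """Fit a two-component 1-D Gaussian mixture by expectation-maximization.

    E-step: compute responsibilities (posterior component probabilities).
    M-step: re-estimate weights, means, and variances from responsibilities.
    Each iteration never decreases the data log-likelihood.
    """
    data = sorted(data)
    n = len(data)
    # Naive initialization: seed the means at the lower/upper quartiles.
    mu = [data[n // 4], data[3 * n // 4]]
    var = [1.0, 1.0]
    w = [0.5, 0.5]

    def pdf(x, m, v):
        return math.exp(-((x - m) ** 2) / (2 * v)) / math.sqrt(2 * math.pi * v)

    for _ in range(iters):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in data:
            p = [w[k] * pdf(x, mu[k], var[k]) for k in range(2)]
            s = sum(p)
            resp.append([pk / s for pk in p])
        # M-step: maximize the expected complete-data log-likelihood.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            w[k] = nk / n
            mu[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var[k] = sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, data)) / nk
            var[k] = max(var[k], 1e-6)  # guard against variance collapse
    return w, mu, var
```

The same E-step/M-step alternation applies whatever latent-variable model the framework actually uses; only the responsibility and update formulas change.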

Author(s):  
Jizhou Huang ◽  
Wei Zhang ◽  
Shiqi Zhao ◽  
Shiqiang Ding ◽  
Haifeng Wang

Providing a plausible explanation for the relationship between two related entities is an important task in some applications of knowledge graphs, such as search engines. However, most existing methods require large amounts of manually labeled training data, which makes them inapplicable to large-scale knowledge graphs due to the expense of data annotation. In addition, these methods typically rely on costly handcrafted features. In this paper, we propose an effective pairwise ranking model that leverages the clickthrough data of a Web search engine to address these two problems. We first construct large-scale training data from the query-title pairs derived from the clickthrough data. Then, we build a pairwise ranking model that employs a convolutional neural network to automatically learn relevant features. The proposed model can be easily trained with backpropagation to perform the ranking task. The experiments show that our method significantly outperforms several strong baselines.
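The core of a pairwise ranking model is a loss that pushes the score of the better item above the score of the worse one. The paper learns features with a CNN; the sketch below substitutes a plain linear scorer (an assumption for brevity) so the pairwise hinge-loss training loop fits in a few lines:

```python
def train_pairwise_ranker(pairs, dim, lr=0.1, epochs=100):
    """Pairwise ranking with a margin (hinge) loss.

    `pairs` holds (better, worse) feature-vector tuples; the model learns
    weights w such that score(better) > score(worse) + margin. A CNN scorer
    would be trained the same way via backpropagation of this loss.
    """
    w = [0.0] * dim
    margin = 1.0
    for _ in range(epochs):
        for pos, neg in pairs:
            s_pos = sum(wi * xi for wi, xi in zip(w, pos))
            s_neg = sum(wi * xi for wi, xi in zip(w, neg))
            if margin - (s_pos - s_neg) > 0:  # loss is active: take a subgradient step
                for i in range(dim):
                    w[i] += lr * (pos[i] - neg[i])
    return w
```

In the paper's setting, the "better" item in each pair would come from clicked query-title data and the "worse" item from a non-clicked or sampled alternative.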


2021 ◽  
pp. 089443932110068
Author(s):  
Aleksandra Urman ◽  
Mykola Makhortykh ◽  
Roberto Ulloa

We examine how six search engines filter and rank information in relation to queries on the 2020 U.S. presidential primary elections under default—that is, nonpersonalized—conditions. To do so, we utilize an algorithmic auditing methodology that uses virtual agents to conduct a large-scale analysis of algorithmic information curation in a controlled environment. Specifically, we look at the text search results for the queries “us elections,” “donald trump,” “joe biden,” and “bernie sanders” on Google, Baidu, Bing, DuckDuckGo, Yahoo, and Yandex during the 2020 primaries. Our findings indicate substantial differences in the search results between search engines and multiple discrepancies within the results generated for different agents using the same search engine. This highlights that whether users see certain information is decided by chance due to the inherent randomization of search results. We also find that some search engines prioritize different categories of information sources with respect to specific candidates. These observations demonstrate that algorithmic curation of political information can create information inequalities between search engine users even under nonpersonalized conditions. Such inequalities are particularly troubling considering that search results are highly trusted by the public and can shift the opinions of undecided voters, as demonstrated by previous research.
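The abstract does not specify how discrepancies between agents' result lists are quantified; one simple set-based measure that such audits commonly report is the Jaccard overlap of the top-k results (offered here only as an illustrative sketch, not the paper's actual metric):

```python
def topk_overlap(results_a, results_b, k=10):
    """Jaccard overlap of two agents' top-k result lists.

    Returns 1.0 when the lists contain exactly the same items (order
    ignored) and 0.0 when they share nothing; values in between indicate
    partial agreement between the two agents' curated results.
    """
    a, b = set(results_a[:k]), set(results_b[:k])
    return len(a & b) / len(a | b) if a | b else 1.0
```

Running such a measure pairwise over many identical virtual agents issuing the same query at the same time exposes the randomization the study describes: identical inputs, measurably different outputs.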


2013 ◽  
Vol 2013 ◽  
pp. 1-10
Author(s):  
Lei Luo ◽  
Chao Zhang ◽  
Yongrui Qin ◽  
Chunyuan Zhang

With the explosive growth of the data volume in modern applications such as web search and multimedia retrieval, hashing is becoming increasingly important for efficient nearest neighbor (similar item) search. Recently, a number of data-dependent methods have been developed, reflecting the great potential of learning for hashing. Inspired by the classic nonlinear dimensionality reduction algorithm—maximum variance unfolding, we propose a novel unsupervised hashing method, named maximum variance hashing, in this work. The idea is to maximize the total variance of the hash codes while preserving the local structure of the training data. To solve the derived optimization problem, we propose a column generation algorithm, which directly learns the binary-valued hash functions. We then extend it using anchor graphs to reduce the computational cost. Experiments on large-scale image datasets demonstrate that the proposed method outperforms state-of-the-art hashing methods in many cases.
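Maximum variance hashing maximizes the total variance of the hash codes subject to local-structure constraints, learning the binary hash functions by column generation. The sketch below keeps only the variance-maximization half of that objective (an intentional simplification): for a single bit, the maximum-variance hash function is the sign of the projection onto the top principal direction, recovered here by power iteration:

```python
def hash_by_max_variance(points, iters=100):
    """One-bit hash: sign of the projection onto the highest-variance direction.

    This is only the unconstrained variance objective; the full method adds
    local-structure preservation and column generation, and extends to
    anchor graphs for scalability.
    """
    n, d = len(points), len(points[0])
    mean = [sum(p[j] for p in points) / n for j in range(d)]
    centered = [[p[j] - mean[j] for j in range(d)] for p in points]
    # Power iteration on the covariance matrix C = X^T X / n,
    # computed as X^T (X v) without ever forming C explicitly.
    v = [1.0] * d
    for _ in range(iters):
        proj = [sum(x[j] * v[j] for j in range(d)) for x in centered]
        u = [sum(proj[i] * centered[i][j] for i in range(n)) / n for j in range(d)]
        norm = sum(uj * uj for uj in u) ** 0.5
        v = [uj / norm for uj in u]
    # Threshold at the mean: points on each side of the top principal
    # direction receive opposite bits.
    return [1 if sum(x[j] * v[j] for j in range(d)) >= 0 else 0 for x in centered]
```

Stacking several such bits (each maximizing residual variance) yields short binary codes whose Hamming distances approximate neighborhoods in the original space, which is what makes hashing attractive for large-scale nearest neighbor search.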


2019 ◽  
Vol 1 (4) ◽  
pp. 333-349 ◽  
Author(s):  
Peilu Wang ◽  
Hao Jiang ◽  
Jingfang Xu ◽  
Qi Zhang

Knowledge graph (KG) has played an important role in enhancing the performance of many intelligent systems. In this paper, we introduce the solution for building a large-scale multi-source knowledge graph from scratch at Sogou Inc., including its architecture, technical implementation, and applications. Unlike previous works that build knowledge graphs with graph databases, we build the knowledge graph on top of SogouQdb, a distributed search engine developed by the Sogou Web Search Department, which can be easily scaled to support petabytes of data. As a supplement to the search engine, we also introduce a series of models to support inference and graph-based querying. Currently, the data of the Sogou knowledge graph, collected from 136 different websites and constantly updated, consist of 54 million entities and over 600 million entity links. We also introduce three applications of the knowledge graph at Sogou Inc.: entity detection and linking, knowledge-based question answering, and knowledge-based dialog systems. These applications have been used in Web search products to help users acquire information more efficiently.
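The distinctive design choice here is storing the graph in a search engine rather than a graph database. A toy illustration of that idea (all names hypothetical; SogouQdb's actual record layout is not described in the abstract) is to keep edges as flat (head, relation, tail) records and answer one-hop graph queries the way an index would answer a filter query:

```python
class TinyEntityStore:
    """Entities and links kept as flat records, queried like search documents.

    Stand-in for the graph-on-search-engine idea: edges are plain triples,
    and a one-hop traversal is just a filtered scan, which an inverted
    index over (head, relation) would serve at scale.
    """

    def __init__(self):
        self.triples = []

    def add(self, head, relation, tail):
        """Record one entity link."""
        self.triples.append((head, relation, tail))

    def one_hop(self, head, relation=None):
        """Tails reachable from `head`, optionally restricted to one relation."""
        return [t for h, r, t in self.triples
                if h == head and (relation is None or r == relation)]
```

Because each hop is an independent filter query, the store scales horizontally like any sharded search index; multi-hop inference is then layered on top as a sequence of such queries, which matches the paper's framing of inference models as a supplement to the engine.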


Healthcare ◽  
2021 ◽  
Vol 9 (2) ◽  
pp. 156
Author(s):  
Abdullah Bin Shams ◽  
Ehsanul Hoque Apu ◽  
Ashiqur Rahman ◽  
Md. Mohsin Sarker Raihan ◽  
Nazeeba Siddika ◽  
...  

Misinformation from untrusted sources, such as on coronavirus disease 2019 (COVID-19) drugs, vaccination, or treatment, has had dramatic consequences for public health. Authorities have deployed several surveillance tools to detect and slow down the rapid spread of misinformation online. Large quantities of unverified information are available online, and at present there is no real-time tool available to alert a user to false information during online health inquiries over a web search engine. To bridge this gap, we propose a web search engine misinformation notifier extension (SEMiNExt). Natural language processing (NLP) and a machine learning algorithm have been successfully integrated into the extension. This enables SEMiNExt to read the user query from the search bar, classify the veracity of the query, and notify the user of the query's authenticity, all in real time, to prevent the spread of misinformation. Our results show that SEMiNExt with an artificial neural network (ANN) works best, with an accuracy of 93%, an F1-score of 92%, a precision of 92%, and a recall of 93% when 80% of the data is used for training. Moreover, the ANN is able to predict with very high accuracy even for a small training data size. This is very important for the early detection of new misinformation from the small data samples available online, which can significantly reduce the spread of misinformation and maximize public health safety. The SEMiNExt approach has introduced the possibility of improving online health management systems by showing misinformation notifications in real time, enabling safer web-based searching on health-related issues.
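The classify-and-notify flow reduces to: featurize the query text, score it with a trained model, and flag it if the score crosses a threshold. SEMiNExt uses NLP features with an ANN; the sketch below substitutes a bag-of-words perceptron (an assumption chosen purely to keep the pipeline self-contained) so the whole flow fits in one function:

```python
def make_query_classifier(labeled_queries, epochs=50, lr=0.5):
    """Train a bag-of-words perceptron to flag likely-misinformation queries.

    `labeled_queries` is a list of (query_text, label) pairs with label 1
    for misinformation. Returns a classify(text) -> 0/1 function, the role
    SEMiNExt's ANN plays when it reads the search bar in real time.
    """
    # Build the vocabulary from the training queries.
    vocab = {}
    for text, _ in labeled_queries:
        for tok in text.lower().split():
            vocab.setdefault(tok, len(vocab))

    def featurize(text):
        x = [0.0] * len(vocab)
        for tok in text.lower().split():
            if tok in vocab:
                x[vocab[tok]] = 1.0
        return x

    # Perceptron training: update weights only on misclassified queries.
    w = [0.0] * len(vocab)
    b = 0.0
    for _ in range(epochs):
        for text, y in labeled_queries:
            x = featurize(text)
            pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0
            if pred != y:
                for i, xi in enumerate(x):
                    w[i] += lr * (y - pred) * xi
                b += lr * (y - pred)

    def classify(text):
        x = featurize(text)
        return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0

    return classify
```

In the extension, a positive classification would trigger the browser notification; swapping the perceptron for an ANN changes only the scoring function, not the surrounding read-classify-notify loop.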


2012 ◽  
Vol 56 (18) ◽  
pp. 3825-3833 ◽  
Author(s):  
Sergey Brin ◽  
Lawrence Page
