Test-framework: performance profiling and testing a web search engine on non-factoid queries

Author(s):  
Althaf Ali A ◽  
Mahammad Shafi R

Performance profiling and testing is one of the interesting topics in big data management and cloud computing. In testing, we use test cases composed of different types of queries to evaluate the performance aspects of an information retrieval system over a large-scale information collection. This test scenario evaluates retrieval accuracy for all kinds of ambiguous and non-factoid queries, using the result set as training data. It is difficult to evaluate the retrieval method in order to schedule or optimize the recommendation and prediction techniques of the IR method for real-time queries. Queries are treated as requirement specifications that are supplied to a search engine or web information provider application for information or web-page retrieval. In this paper, we propose a novel technique named the “Test Retrieval Framework,” a performance profiling and testing approach for web search engines on the information retrieved for non-factoid queries. In this technique, we apply the expectation-maximization (EM) algorithm as an iterative method to find maximum likelihood estimates. We discuss the important aspects of this work: recommendation models integrating domain and web usage, query optimization for navigational and transactional queries, and query result records. The experimental results demonstrate that the proposed technique outperforms state-of-the-art approaches in terms of set-based measures (precision, recall, and F-measure) and rank-based measures (mean average precision and cumulative gain).
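The abstract names expectation-maximization as the iterative maximum-likelihood method but does not spell out the model it is applied to. As a minimal sketch of the EM iteration itself (assuming, purely for illustration, a one-dimensional two-component Gaussian mixture rather than the paper's retrieval model):

```python
import math

def em_gmm_1d(data, iters=50):
    """Fit a two-component 1-D Gaussian mixture by expectation-maximization.

    E-step: compute responsibilities (posterior component probabilities).
    M-step: re-estimate weights, means, and variances from responsibilities.
    Each iteration never decreases the data log-likelihood.
    """
    data = sorted(data)
    n = len(data)
    # Naive initialization: seed the means at the lower/upper quartiles.
    mu = [data[n // 4], data[3 * n // 4]]
    var = [1.0, 1.0]
    w = [0.5, 0.5]

    def pdf(x, m, v):
        return math.exp(-((x - m) ** 2) / (2 * v)) / math.sqrt(2 * math.pi * v)

    for _ in range(iters):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in data:
            p = [w[k] * pdf(x, mu[k], var[k]) for k in range(2)]
            s = sum(p)
            resp.append([pk / s for pk in p])
        # M-step: maximize the expected complete-data log-likelihood.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            w[k] = nk / n
            mu[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var[k] = sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, data)) / nk
            var[k] = max(var[k], 1e-6)  # guard against variance collapse
    return w, mu, var
```

The same E-step/M-step alternation applies whatever latent-variable model the framework actually uses; only the responsibility and update formulas change.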

Author(s):  
Jizhou Huang ◽  
Wei Zhang ◽  
Shiqi Zhao ◽  
Shiqiang Ding ◽  
Haifeng Wang

Providing a plausible explanation for the relationship between two related entities is an important task in some applications of knowledge graphs, such as search engines. However, most existing methods require large amounts of manually labeled training data, which makes them inapplicable to large-scale knowledge graphs due to the expense of data annotation. In addition, these methods typically rely on costly handcrafted features. In this paper, we propose an effective pairwise ranking model that leverages the clickthrough data of a Web search engine to address these two problems. We first construct large-scale training data from the query-title pairs derived from the clickthrough data. Then, we build a pairwise ranking model that employs a convolutional neural network to automatically learn relevant features. The proposed model can be easily trained with backpropagation to perform the ranking task. The experiments show that our method significantly outperforms several strong baselines.
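The core of a pairwise ranking model is a loss that pushes the score of the better item above the score of the worse one. The paper learns features with a CNN; the sketch below substitutes a plain linear scorer (an assumption for brevity) so the pairwise hinge-loss training loop fits in a few lines:

```python
def train_pairwise_ranker(pairs, dim, lr=0.1, epochs=100):
    """Pairwise ranking with a margin (hinge) loss.

    `pairs` holds (better, worse) feature-vector tuples; the model learns
    weights w such that score(better) > score(worse) + margin. A CNN scorer
    would be trained the same way via backpropagation of this loss.
    """
    w = [0.0] * dim
    margin = 1.0
    for _ in range(epochs):
        for pos, neg in pairs:
            s_pos = sum(wi * xi for wi, xi in zip(w, pos))
            s_neg = sum(wi * xi for wi, xi in zip(w, neg))
            if margin - (s_pos - s_neg) > 0:  # loss is active: take a subgradient step
                for i in range(dim):
                    w[i] += lr * (pos[i] - neg[i])
    return w
```

In the paper's setting, the "better" item in each pair would come from clicked query-title data and the "worse" item from a non-clicked or sampled alternative.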


2021 ◽  
pp. 089443932110068
Author(s):  
Aleksandra Urman ◽  
Mykola Makhortykh ◽  
Roberto Ulloa

We examine how six search engines filter and rank information in relation to queries on the 2020 U.S. presidential primary elections under default—that is, nonpersonalized—conditions. To do so, we utilize an algorithmic auditing methodology that uses virtual agents to conduct a large-scale analysis of algorithmic information curation in a controlled environment. Specifically, we look at the text search results for the queries “us elections,” “donald trump,” “joe biden,” and “bernie sanders” on Google, Baidu, Bing, DuckDuckGo, Yahoo, and Yandex during the 2020 primaries. Our findings indicate substantial differences in the search results between search engines and multiple discrepancies within the results generated for different agents using the same search engine. This highlights that whether users see certain information is decided by chance due to the inherent randomization of search results. We also find that some search engines prioritize different categories of information sources with respect to specific candidates. These observations demonstrate that algorithmic curation of political information can create information inequalities between search engine users even under nonpersonalized conditions. Such inequalities are particularly troubling considering that search results are highly trusted by the public and can shift the opinions of undecided voters, as demonstrated by previous research.
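The abstract does not specify how discrepancies between agents' result lists are quantified; one simple set-based measure that such audits commonly report is the Jaccard overlap of the top-k results (offered here only as an illustrative sketch, not the paper's actual metric):

```python
def topk_overlap(results_a, results_b, k=10):
    """Jaccard overlap of two agents' top-k result lists.

    Returns 1.0 when the lists contain exactly the same items (order
    ignored) and 0.0 when they share nothing; values in between indicate
    partial agreement between the two agents' curated results.
    """
    a, b = set(results_a[:k]), set(results_b[:k])
    return len(a & b) / len(a | b) if a | b else 1.0
```

Running such a measure pairwise over many identical virtual agents issuing the same query at the same time exposes the randomization the study describes: identical inputs, measurably different outputs.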


2013 ◽  
Vol 2013 ◽  
pp. 1-10
Author(s):  
Lei Luo ◽  
Chao Zhang ◽  
Yongrui Qin ◽  
Chunyuan Zhang

With the explosive growth of the data volume in modern applications such as web search and multimedia retrieval, hashing is becoming increasingly important for efficient nearest neighbor (similar item) search. Recently, a number of data-dependent methods have been developed, reflecting the great potential of learning for hashing. Inspired by the classic nonlinear dimensionality reduction algorithm—maximum variance unfolding, we propose a novel unsupervised hashing method, named maximum variance hashing, in this work. The idea is to maximize the total variance of the hash codes while preserving the local structure of the training data. To solve the derived optimization problem, we propose a column generation algorithm, which directly learns the binary-valued hash functions. We then extend it using anchor graphs to reduce the computational cost. Experiments on large-scale image datasets demonstrate that the proposed method outperforms state-of-the-art hashing methods in many cases.
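Maximum variance hashing maximizes the total variance of the hash codes subject to local-structure constraints, learning the binary hash functions by column generation. The sketch below keeps only the variance-maximization half of that objective (an intentional simplification): for a single bit, the maximum-variance hash function is the sign of the projection onto the top principal direction, recovered here by power iteration:

```python
def hash_by_max_variance(points, iters=100):
    """One-bit hash: sign of the projection onto the highest-variance direction.

    This is only the unconstrained variance objective; the full method adds
    local-structure preservation and column generation, and extends to
    anchor graphs for scalability.
    """
    n, d = len(points), len(points[0])
    mean = [sum(p[j] for p in points) / n for j in range(d)]
    centered = [[p[j] - mean[j] for j in range(d)] for p in points]
    # Power iteration on the covariance matrix C = X^T X / n,
    # computed as X^T (X v) without ever forming C explicitly.
    v = [1.0] * d
    for _ in range(iters):
        proj = [sum(x[j] * v[j] for j in range(d)) for x in centered]
        u = [sum(proj[i] * centered[i][j] for i in range(n)) / n for j in range(d)]
        norm = sum(uj * uj for uj in u) ** 0.5
        v = [uj / norm for uj in u]
    # Threshold at the mean: points on each side of the top principal
    # direction receive opposite bits.
    return [1 if sum(x[j] * v[j] for j in range(d)) >= 0 else 0 for x in centered]
```

Stacking several such bits (each maximizing residual variance) yields short binary codes whose Hamming distances approximate neighborhoods in the original space, which is what makes hashing attractive for large-scale nearest neighbor search.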


2019 ◽  
Vol 1 (4) ◽  
pp. 333-349 ◽  
Author(s):  
Peilu Wang ◽  
Hao Jiang ◽  
Jingfang Xu ◽  
Qi Zhang

Knowledge graph (KG) has played an important role in enhancing the performance of many intelligent systems. In this paper, we introduce the solution for building a large-scale multi-source knowledge graph from scratch at Sogou Inc., including its architecture, technical implementation, and applications. Unlike previous works that build knowledge graphs with graph databases, we build the knowledge graph on top of SogouQdb, a distributed search engine developed by the Sogou Web Search Department, which can be easily scaled to support petabytes of data. As a supplement to the search engine, we also introduce a series of models to support inference and graph-based querying. Currently, the data of the Sogou knowledge graph, collected from 136 different websites and constantly updated, consist of 54 million entities and over 600 million entity links. We also introduce three applications of the knowledge graph at Sogou Inc.: entity detection and linking, knowledge-based question answering, and knowledge-based dialog systems. These applications have been used in Web search products to help users acquire information more efficiently.
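The distinctive design choice here is storing the graph in a search engine rather than a graph database. A toy illustration of that idea (all names hypothetical; SogouQdb's actual record layout is not described in the abstract) is to keep edges as flat (head, relation, tail) records and answer one-hop graph queries the way an index would answer a filter query:

```python
class TinyEntityStore:
    """Entities and links kept as flat records, queried like search documents.

    Stand-in for the graph-on-search-engine idea: edges are plain triples,
    and a one-hop traversal is just a filtered scan, which an inverted
    index over (head, relation) would serve at scale.
    """

    def __init__(self):
        self.triples = []

    def add(self, head, relation, tail):
        """Record one entity link."""
        self.triples.append((head, relation, tail))

    def one_hop(self, head, relation=None):
        """Tails reachable from `head`, optionally restricted to one relation."""
        return [t for h, r, t in self.triples
                if h == head and (relation is None or r == relation)]
```

Because each hop is an independent filter query, the store scales horizontally like any sharded search index; multi-hop inference is then layered on top as a sequence of such queries, which matches the paper's framing of inference models as a supplement to the engine.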


Healthcare ◽  
2021 ◽  
Vol 9 (2) ◽  
pp. 156
Author(s):  
Abdullah Bin Shams ◽  
Ehsanul Hoque Apu ◽  
Ashiqur Rahman ◽  
Md. Mohsin Sarker Raihan ◽  
Nazeeba Siddika ◽  
...  

Misinformation from untrusted sources, such as on coronavirus disease 2019 (COVID-19) drugs, vaccination, or treatment, has had dramatic consequences for public health. Authorities have deployed several surveillance tools to detect and slow down the rapid spread of misinformation online. Large quantities of unverified information are available online, and at present there is no real-time tool available to alert a user to false information during online health inquiries over a web search engine. To bridge this gap, we propose a web search engine misinformation notifier extension (SEMiNExt). Natural language processing (NLP) and a machine learning algorithm have been successfully integrated into the extension. This enables SEMiNExt to read the user query from the search bar, classify the veracity of the query, and notify the user of the query's authenticity, all in real time, to prevent the spread of misinformation. Our results show that SEMiNExt with an artificial neural network (ANN) works best, with an accuracy of 93%, an F1-score of 92%, a precision of 92%, and a recall of 93% when 80% of the data is used for training. Moreover, the ANN is able to predict with very high accuracy even for a small training data size. This is very important for the early detection of new misinformation from the small data samples available online, which can significantly reduce the spread of misinformation and maximize public health safety. The SEMiNExt approach has introduced the possibility of improving online health management systems by showing misinformation notifications in real time, enabling safer web-based searching on health-related issues.
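The classify-and-notify flow reduces to: featurize the query text, score it with a trained model, and flag it if the score crosses a threshold. SEMiNExt uses NLP features with an ANN; the sketch below substitutes a bag-of-words perceptron (an assumption chosen purely to keep the pipeline self-contained) so the whole flow fits in one function:

```python
def make_query_classifier(labeled_queries, epochs=50, lr=0.5):
    """Train a bag-of-words perceptron to flag likely-misinformation queries.

    `labeled_queries` is a list of (query_text, label) pairs with label 1
    for misinformation. Returns a classify(text) -> 0/1 function, the role
    SEMiNExt's ANN plays when it reads the search bar in real time.
    """
    # Build the vocabulary from the training queries.
    vocab = {}
    for text, _ in labeled_queries:
        for tok in text.lower().split():
            vocab.setdefault(tok, len(vocab))

    def featurize(text):
        x = [0.0] * len(vocab)
        for tok in text.lower().split():
            if tok in vocab:
                x[vocab[tok]] = 1.0
        return x

    # Perceptron training: update weights only on misclassified queries.
    w = [0.0] * len(vocab)
    b = 0.0
    for _ in range(epochs):
        for text, y in labeled_queries:
            x = featurize(text)
            pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0
            if pred != y:
                for i, xi in enumerate(x):
                    w[i] += lr * (y - pred) * xi
                b += lr * (y - pred)

    def classify(text):
        x = featurize(text)
        return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0

    return classify
```

In the extension, a positive classification would trigger the browser notification; swapping the perceptron for an ANN changes only the scoring function, not the surrounding read-classify-notify loop.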


2012 ◽  
Vol 56 (18) ◽  
pp. 3825-3833 ◽  
Author(s):  
Sergey Brin ◽  
Lawrence Page
