commercial search engine Latest Research Papers

Mining Domain Terminologies Using Search Engine's Query Log

Domain terminologies are a basic resource for various natural language processing tasks. To automatically discover terminologies for a domain of interest, traditional approaches mostly rely on a domain-specific corpus given in advance; thus, their performance can only be guaranteed when a high-quality domain-specific corpus has been collected, which requires extensive human involvement and domain expertise. In this article, we propose a novel approach that is capable of automatically mining domain terminologies using a search engine's query log, a type of domain-independent corpus of higher availability, coverage, and timeliness than a manually collected domain-specific corpus. In particular, we represent the query log as a heterogeneous network and formulate the task of mining domain terminology as transductive learning on the heterogeneous network. In the proposed approach, the manifold structure of domain-specificity inherent in the query log is captured by a novel network embedding algorithm and further exploited to reduce the need for manual annotation in domain terminology classification. We select Agriculture and Healthcare as the target domains and experiment using a real query log from a commercial search engine. Experimental results show that the proposed approach outperforms several state-of-the-art approaches.

AOL4PS: A Large-Scale Dataset for Personalized Search

Data Intelligence ◽ 10.1162/dint_a_00104 ◽ 2021 ◽ pp. 1-17

Author(s): Qian Guo ◽ Wei Chen ◽ Huaiyu Wan

Keyword(s): Data Processing ◽ Large Scale ◽ Web Search ◽ Personalized Search ◽ Search Methods ◽ Search Models ◽ Large Scale Dataset ◽ Query Logs ◽ Commercial Search Engine ◽ Public Datasets

Personalized search is a promising way to improve the quality of web search, and it has attracted much attention from both academic and industrial communities. Much of the current related research is based on commercial search engine data, which cannot be released publicly for reasons such as privacy protection and information security. This leads to a serious lack of accessible public datasets in this field. The few datasets that have been released to the public have not become widely used in academia because they are complex to process. The lack of datasets, together with the difficulties of data processing, has hindered fair comparison and evaluation of personalized search models. In this paper, we construct AOL4PS, a large-scale dataset for evaluating personalized search methods, collected and processed from AOL query logs. We present the complete and detailed data processing and construction process. Specifically, to address the processing time and storage space demands brought by massive data volumes, we optimized the process of dataset construction and proposed an improved BM25 algorithm. Experiments are performed on AOL4PS with some classic and state-of-the-art personalized search methods, and the experimental results demonstrate that AOL4PS can measure the effect of personalized search models. AOL4PS is publicly available at http://github.com/wanhuaiyu/AOL4PS.
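
For readers unfamiliar with the baseline that the abstract builds on, the following is a minimal sketch of the classic Okapi BM25 ranking function. It is not the improved variant proposed in the paper, whose details the abstract does not give. The function name `bm25_scores`, the toy documents, and the parameter defaults `k1=1.5` and `b=0.75` are illustrative assumptions.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each tokenized document in `docs` against the tokenized `query`.

    Classic Okapi BM25 with a smoothed IDF; assumes `docs` is non-empty
    and contains at least one token overall.
    """
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # Smoothed IDF; the "+ 1" inside the log keeps it non-negative.
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            # Term-frequency saturation with document-length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Hypothetical toy corpus, not taken from AOL4PS.
docs = [
    "cheap flights to paris".split(),
    "paris hotel deals".split(),
    "python list comprehension".split(),
]
scores = bm25_scores("paris flights".split(), docs)
```

In this toy run the first document matches both query terms, the second matches one, and the third matches none, so the scores come out in that order. `k1` controls how quickly repeated terms stop adding to the score, and `b` controls how strongly long documents are penalized.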

ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion

Proceedings of the AAAI Conference on Artificial Intelligence ◽ 10.1609/aaai.v34i05.6450 ◽ 2020 ◽ Vol 34 (05) ◽ pp. 9146-9153

Author(s): Bingning Wang ◽ Ting Yao ◽ Qi Zhang ◽ Jingfang Xu ◽ Xiaochuan Wang

Keyword(s): Reading Comprehension ◽ Search Engine ◽ Large Scale ◽ Question Answering ◽ Quality Analysis ◽ Chinese Reading ◽ Large Margin ◽ Commercial Search Engine ◽ Machine Reading ◽ Support Evidence

This paper presents ReCO, a human-curated Chinese Reading Comprehension dataset on Opinion. The questions in ReCO are opinion-based queries issued to a commercial search engine. The passages are provided by crowdworkers, who extract the supporting snippet from the retrieved documents. Finally, the crowdworkers give an abstractive yes/no/uncertain answer. The release of ReCO consists of 300k questions, which to our knowledge makes it the largest Chinese reading comprehension dataset. A prominent characteristic of ReCO is that, in addition to the original context paragraph, we also provide the support evidence that can be used directly to answer the question. Quality analysis shows that ReCO is challenging because it requires various types of reasoning skills, such as causal inference and logical reasoning. Current QA models that perform very well on many question answering problems, such as BERT (Devlin et al. 2018), achieve only 77% accuracy on this dataset, a large margin behind human performance of nearly 92%, indicating that ReCO presents a good challenge for machine reading comprehension. The code, dataset, and leaderboard will be freely available at https://github.com/benywon/ReCO.

An Active and Deep Semantic Matching Framework for Query Rewrite in E-Commercial Search Engine

Proceedings of the 28th ACM International Conference on Information and Knowledge Management - CIKM '19 ◽ 10.1145/3357384.3358012 ◽ 2019

Author(s): Yatao Yang ◽ Jun Tan ◽ Hongbo Deng ◽ Zibin Zheng ◽ Yutong Lu ◽ ...

Keyword(s): Search Engine ◽ Semantic Matching ◽ Commercial Search Engine

The Use of Support Vector Machines When Designing a User-Defined Niche Search Engine

Journal of information and organizational sciences ◽ 10.31341/jios.42.1.5 ◽ 2018 ◽ Vol 42 (1) ◽ pp. 87-109

Author(s): Maria Jakovljevic ◽ Alfred Coleman

Keyword(s): Support Vector Machine ◽ Support Vector Machines ◽ Search Engine ◽ Support Vector ◽ Web Pages ◽ Specific Focus ◽ Textual Data ◽ Vector Machines ◽ Commercial Search Engine

This study presents the construction of a niche search engine whose search topic domain is user-defined. The specific focus of this study is the role that a Support Vector Machine (SVM) plays when classifying textual data from web pages. Furthermore, the aim is to establish whether this niche search engine can return results that are more relevant to a user than those returned by a commercial search engine. Through various experiments across a number of appropriate datasets, the SVM has been shown to classify web pages well enough to meet the needs of a niche search engine. A subset of the most useful webpage-specific features has been identified, with the best-performing feature being a web page's Text & Title component. The user-defined niche search engine was successfully designed, and an experiment showed that it returned more relevant results than a commercial search engine.

Using Primo for undergraduate research: a usability study

Library Hi Tech ◽ 10.1108/lht-05-2016-0052 ◽ 2016 ◽ Vol 34 (4) ◽ pp. 566-584 ◽ Cited By ~ 6

Author(s): Greta Kliewer ◽ Amalia Monroe-Gulick ◽ Stephanie Gamble ◽ Erik Radio

Keyword(s): Undergraduate Students ◽ Web Search ◽ Undergraduate Research ◽ Usability Study ◽ Content Type ◽ Web Tools ◽ Level Of Satisfaction ◽ Commercial Search Engine ◽ High Level ◽ Additional Feedback

Purpose: The purpose of this paper is to observe how undergraduate students approach open-ended searching for a research assignment, specifically as it affected their use of the discovery interface Primo. Design/methodology/approach: In total, 30 undergraduate students were provided with a sample research assignment and instructed to find resources for it using web tools of their choice, followed by the Primo discovery tool. Students were observed for 30 minutes. A survey was provided at the end to solicit additional feedback. Sources students found were evaluated for relevance and utility. Findings: Students expressed a high level of satisfaction with Primo despite some difficulty navigating through more complicated tasks. Despite their interest in the tool and previous exposure to it, it was usually not the first discovery tool students used when given the research assignment. Students approached the open-ended search environment much as they would a commercial search engine. Originality/value: This paper focused on an open-ended search environment, as opposed to a known-item scenario, in order to assess students' preferences for web search tools and how a library discovery layer such as Primo fit into that situation. The resources students found relevant were also analyzed to determine to what degree the students understood the level of quality those resources exhibited and from which tool they were obtained.

Based on DNS of a Layered Web Search Engine Study

Advanced Materials Research ◽ 10.4028/www.scientific.net/amr.971-973.1870 ◽ 2014 ◽ Vol 971-973 ◽ pp. 1870-1873

Author(s): Xiao Gang Dong

Keyword(s): Search Engine ◽ Web Search ◽ Web Pages ◽ Distributed Architecture ◽ Search System ◽ Web Search Engine ◽ Original Idea ◽ Commercial Search Engine ◽ New System ◽ The Web

This paper introduces a web search engine based on DNS, the standard proposed solution of IETF for a public web search system. At present, no web search engine can cover more than 60 percent of all the pages on the Internet, and the update interval of most page databases is almost one month. This situation has not changed for many years. Coverage and recency have become the bottleneck of current web search engines. To solve these problems, this paper proposes a new system, a search engine based on DNS. The system adopts a hierarchical distributed architecture like that of DNS, which differs from any current commercial search engine. In theory, this system can cover all the web pages on the Internet, and its update interval could be as short as one day. The original idea, detailed content, and implementation of this system are all introduced in this paper.

Machine Learning Approach to Search Query Classification

Machine Learning ◽ 10.4018/978-1-60960-818-7.ch308 ◽ 2012 ◽ pp. 467-482

Author(s): Isak Taksa ◽ Sarah Zelikovitz ◽ Amanda Spink

Keyword(s): Machine Learning ◽ Information Retrieval ◽ Hierarchical Classification ◽ Search Query ◽ Short Text ◽ Search Queries ◽ Tuning Parameters ◽ Commercial Search Engine ◽ Query Classification

Search query classification is a necessary step for a number of information retrieval tasks. This chapter presents an approach to non-hierarchical classification of search queries that focuses on two specific areas of machine learning: short text classification and limited manual labeling. Typically, search queries are short, display little class specific information per single query and are therefore a weak source for traditional machine learning. To improve the effectiveness of the classification process the chapter introduces background knowledge discovery by using information retrieval techniques. The proposed approach is applied to a task of age classification of a corpus of queries from a commercial search engine. In the process, various classification scenarios are generated and executed, providing insight into choice, significance and range of tuning parameters.

On judgments obtained from a commercial search engine

Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval - SIGIR '12 ◽ 10.1145/2348283.2348496 ◽ 2012 ◽ Cited By ~ 1

Author(s): Emine Yilmaz ◽ Gabriella Kazai ◽ Nick Craswell ◽ Saied Mehrizi Tahaghoghi

Keyword(s): Search Engine ◽ Commercial Search Engine

Machine Learning Approach to Search Query Classification

Handbook of Research on Web Log Analysis ◽ 10.4018/978-1-59904-974-8.ch016 ◽ 2011 ◽ pp. 329-344

Author(s): Isak Taksa ◽ Sarah Zelikovitz ◽ Amanda Spink

Keyword(s): Machine Learning ◽ Information Retrieval ◽ Hierarchical Classification ◽ Search Query ◽ Short Text ◽ Search Queries ◽ Tuning Parameters ◽ Commercial Search Engine ◽ Query Classification

Search query classification is a necessary step for a number of information retrieval tasks. This chapter presents an approach to non-hierarchical classification of search queries that focuses on two specific areas of machine learning: short text classification and limited manual labeling. Typically, search queries are short, display little class specific information per single query and are therefore a weak source for traditional machine learning. To improve the effectiveness of the classification process the chapter introduces background knowledge discovery by using information retrieval techniques. The proposed approach is applied to a task of age classification of a corpus of queries from a commercial search engine. In the process, various classification scenarios are generated and executed, providing insight into choice, significance and range of tuning parameters.

commercial search engine
Recently Published Documents

Mining Domain Terminologies Using Search Engine's Query Log

AOL4PS: A Large-Scale Dataset for Personalized Search

ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion

An Active and Deep Semantic Matching Framework for Query Rewrite in E-Commercial Search Engine

The Use of Support Vector Machines When Designing a User-Defined Niche Search Engine

Using Primo for undergraduate research: a usability study

Based on DNS of a Layered Web Search Engine Study

Machine Learning Approach to Search Query Classification

On judgments obtained from a commercial search engine

Machine Learning Approach to Search Query Classification
