commercial search engine
Recently Published Documents


TOTAL DOCUMENTS

12
(FIVE YEARS 4)

H-INDEX

2
(FIVE YEARS 0)

Author(s):  
Weijian Ni ◽  
Tong Liu ◽  
Qingtian Zeng ◽  
Nengfu Xie

Domain terminologies are a basic resource for various natural language processing tasks. To automatically discover terminologies for a domain of interest, most traditional approaches rely on a domain-specific corpus given in advance; their performance can therefore be guaranteed only when a high-quality domain-specific corpus has been collected, which requires extensive human involvement and domain expertise. In this article, we propose a novel approach that automatically mines domain terminologies from a search engine's query log, a type of domain-independent corpus with higher availability, coverage, and timeliness than a manually collected domain-specific corpus. In particular, we represent the query log as a heterogeneous network and formulate the task of mining domain terminology as transductive learning on that network. In the proposed approach, the manifold structure of domain-specificity inherent in the query log is captured by a novel network embedding algorithm and further exploited to reduce the manual annotation effort needed for domain terminology classification. We select Agriculture and Healthcare as the target domains and experiment on a real query log from a commercial search engine. Experimental results show that the proposed approach outperforms several state-of-the-art approaches.
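
As a rough illustration of transductive learning on a query-log network, the sketch below runs plain label propagation over a toy query-URL click graph. This is a swapped-in classic technique, not the network embedding algorithm of the article; the node names, the seed labels, and the function `propagate_labels` are all hypothetical.

```python
from collections import defaultdict


def propagate_labels(edges, seeds, n_iter=20):
    """Spread domain scores from labeled seed nodes over an undirected graph.

    edges: iterable of (node, node) pairs, e.g. query-URL click pairs.
    seeds: dict mapping node -> score (1.0 = in-domain, 0.0 = out-of-domain).
    Returns a dict mapping every node to a score in [0, 1].
    """
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)

    # Unlabeled nodes start at an uninformative 0.5.
    scores = {node: seeds.get(node, 0.5) for node in neighbors}
    for _ in range(n_iter):
        updated = {}
        for node, nbrs in neighbors.items():
            if node in seeds:
                # Seed labels stay clamped to their annotated value.
                updated[node] = seeds[node]
            else:
                # Unlabeled nodes take the mean score of their neighbors.
                updated[node] = sum(scores[n] for n in nbrs) / len(nbrs)
        scores = updated
    return scores


# Hypothetical toy log: two queries clicked the same agriculture page,
# two other queries clicked the same unrelated page.
edges = [
    ("q:wheat rust treatment", "u:farm-site/wheat"),
    ("q:stripe rust fungicide", "u:farm-site/wheat"),
    ("q:cheap flights", "u:travel-site/deals"),
    ("q:last minute airfare", "u:travel-site/deals"),
]
seeds = {"q:wheat rust treatment": 1.0, "q:cheap flights": 0.0}
scores = propagate_labels(edges, seeds)
```

Only two of the four queries are annotated here; the other two inherit a score through the URLs they share with a seed, which is the sense in which unlabeled nodes reduce the annotation needed.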


2021 ◽  
pp. 1-17
Author(s):  
Qian Guo ◽  
Wei Chen ◽  
Huaiyu Wan

Personalized search is a promising way to improve the quality of web search, and it has attracted much attention from both academic and industrial communities. Much of the current related research is based on commercial search engine data, which cannot be released publicly for reasons such as privacy protection and information security. This leads to a serious lack of accessible public datasets in this field. The few datasets that have been released to the public have not become widely used in academia because they are complex to process. The lack of datasets, together with the difficulties of data processing, has hindered fair comparison and evaluation of personalized search models. In this paper, we construct AOL4PS, a large-scale dataset for evaluating personalized search methods, collected and processed from AOL query logs. We present the complete and detailed data processing and construction process. Specifically, to address the processing time and storage space demands of massive data volumes, we optimized the dataset construction process and proposed an improved BM25 algorithm. Experiments are performed on AOL4PS with some classic and state-of-the-art personalized search methods, and the results demonstrate that AOL4PS can measure the effect of personalized search models. AOL4PS is publicly available at http://github.com/wanhuaiyu/AOL4PS.
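
For readers unfamiliar with BM25, the following is a minimal sketch of the standard Okapi BM25 scoring function, not the improved variant proposed in the paper (the abstract does not describe it). The function name `bm25_scores`, the toy documents, and the parameter defaults `k1=1.5` and `b=0.75` are assumptions chosen for illustration.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each tokenized document against a tokenized query (standard BM25).

    query: list of terms; docs: non-empty list of non-empty term lists.
    """
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents that contain each term.
    df = Counter(term for d in docs for term in set(d))

    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in set(query):
            if term not in tf:
                continue
            # Smoothed IDF; the "+ 1.0" inside the log keeps it non-negative.
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            # Term-frequency saturation with document-length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Hypothetical candidate documents for one query.
docs = [
    "cheap flights to new york".split(),
    "new york weather forecast".split(),
    "python list comprehension tutorial".split(),
]
scores = bm25_scores("new york flights".split(), docs)
ranking = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
```

The first document matches all three query terms and ranks first, while the third shares no term with the query and scores zero.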


2020 ◽  
Vol 34 (05) ◽  
pp. 9146-9153
Author(s):  
Bingning Wang ◽  
Ting Yao ◽  
Qi Zhang ◽  
Jingfang Xu ◽  
Xiaochuan Wang

This paper presents ReCO, a human-curated Chinese Reading Comprehension dataset on Opinion. The questions in ReCO are opinion-based queries issued to a commercial search engine. The passages are provided by crowdworkers, who extract the supporting snippet from the retrieved documents. Finally, the crowdworkers give an abstractive yes/no/uncertain answer. The release of ReCO consists of 300k questions, which to our knowledge makes it the largest Chinese reading comprehension dataset. A prominent characteristic of ReCO is that, in addition to the original context paragraph, we also provide the supporting evidence that can be used directly to answer the question. Quality analysis demonstrates the challenge of ReCO: it requires various types of reasoning skills, such as causal inference and logical reasoning. Current QA models that perform very well on many question answering problems, such as BERT (Devlin et al. 2018), achieve only 77% accuracy on this dataset, a large margin behind human performance of nearly 92%, indicating that ReCO presents a good challenge for machine reading comprehension. The code, dataset, and leaderboard will be freely available at https://github.com/benywon/ReCO.


2018 ◽  
Vol 42 (1) ◽  
pp. 87-109
Author(s):  
Maria Jakovljevic ◽  
Alfred Coleman

This study presents the construction of a niche search engine whose search topic domain is user-defined. The specific focus of this study is the role that a Support Vector Machine (SVM) plays when classifying textual data from web pages. Furthermore, the aim is to establish whether this niche search engine can return results that are more relevant to a user than those returned by a commercial search engine. Through various experiments across a number of appropriate datasets, the SVM has been shown to classify web pages well enough to meet the needs of a niche search engine. A subset of the most useful web-page-specific features has been identified, with the best-performing feature being a web page's Text & Title component. The user-defined niche search engine was successfully designed, and an experiment showed that it returned more relevant results than a commercial search engine.


2016 ◽  
Vol 34 (4) ◽  
pp. 566-584
Author(s):  
Greta Kliewer ◽  
Amalia Monroe-Gulick ◽  
Stephanie Gamble ◽  
Erik Radio

Purpose: This paper observes how undergraduate students approach open-ended searching for a research assignment, specifically as it affected their use of the discovery interface Primo. Design/methodology/approach: In total, 30 undergraduate students were given a sample research assignment and instructed to find resources for it using web tools of their choice, followed by the Primo discovery tool. Students were observed for 30 minutes. A survey was provided at the end to solicit additional feedback. The sources students found were evaluated for relevance and utility. Findings: Students expressed a high level of satisfaction with Primo despite some difficulty navigating through more complicated tasks. Despite their interest in the tool and previous exposure to it, it was usually not the first discovery tool students used when given the research assignment. Students approached the open-ended search environment much as they would a commercial search engine. Originality/value: This paper focused on an open-ended search environment, as opposed to a known-item scenario, in order to assess students' preferences for web search tools and how a library discovery layer such as Primo fit into that situation. The resources students found relevant were also analyzed to determine to what degree the students understood the level of quality those resources exhibited and from which tool they were obtained.


2014 ◽  
Vol 971-973 ◽  
pp. 1870-1873
Author(s):  
Xiao Gang Dong

A web search engine based on DNS, a proposed IETF standard solution for public web search systems, is introduced in this paper. Currently, no web search engine covers more than 60 percent of all the pages on the Internet, and the update interval of most page databases is almost one month. This situation has not changed for many years. Coverage and recency have become the bottleneck of current web search engines. To solve these problems, this paper proposes a new system: a search engine based on DNS. The system adopts a hierarchical distributed architecture like that of DNS, which differs from any current commercial search engine. In theory, this system can cover all the web pages on the Internet, and its update interval could be as short as one day. The original idea, detailed content, and implementation of this system are all introduced in this paper.


2012 ◽  
pp. 467-482
Author(s):  
Isak Taksa ◽  
Sarah Zelikovitz ◽  
Amanda Spink

Search query classification is a necessary step for a number of information retrieval tasks. This chapter presents an approach to non-hierarchical classification of search queries that focuses on two specific areas of machine learning: short text classification and limited manual labeling. Typically, search queries are short, carry little class-specific information per query, and are therefore a weak source for traditional machine learning. To improve the effectiveness of the classification process, the chapter introduces background knowledge discovery using information retrieval techniques. The proposed approach is applied to the task of age classification of a corpus of queries from a commercial search engine. In the process, various classification scenarios are generated and executed, providing insight into the choice, significance, and range of tuning parameters.



