Identifying comparable entities with indirectly associative relations and word embeddings from web search logs

Author(s):

Kamal Al-Sabahi ◽

Zhang Zuping

Keyword(s):

Language Processing ◽

Web Search ◽

Question Answering ◽

Information Overload ◽

Good Representation ◽

Intelligence Analysis ◽

Word Embeddings ◽

Question Answering Systems ◽

Active Research ◽

News Recommendation

In the era of information overload, text summarization has become a focus of attention in diverse fields such as question answering systems, intelligence analysis, news recommendation systems, and search results in web search engines. A good document representation is the key to any successful summarizer, and learning this representation has become a very active research area in natural language processing (NLP). Traditional approaches mostly fail to deliver a good representation, whereas word embeddings have shown excellent performance in learning one. In this paper, a modified BM25 combined with word embeddings is used to build the sentence vectors from word vectors. The entire document is represented as a set of sentence vectors. Then, the similarity between every pair of sentence vectors is computed. After that, TextRank, a graph-based model, is used to rank the sentences. The summary is generated by picking the top-ranked sentences according to the compression rate. Two well-known datasets, DUC2002 and DUC2004, are used to evaluate the models. The experimental results show that the proposed models perform comprehensively better than the state-of-the-art methods.
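The pipeline described in this abstract (weighted sentence vectors, pairwise similarity, TextRank, top-ranked selection) might be sketched roughly as follows. This is only an illustration under stated assumptions: it uses the standard Okapi BM25 weight rather than the paper's modified variant, which the abstract does not specify, and all function names, parameter defaults, and the toy two-dimensional embeddings are hypothetical.

```python
import math

def bm25_weight(tf, df, n_sents, sent_len, avg_len, k1=1.5, b=0.75):
    """Standard Okapi BM25 term weight (an assumption; not the paper's modified form)."""
    idf = math.log(1 + (n_sents - df + 0.5) / (df + 0.5))
    return idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * sent_len / avg_len))

def sentence_vector(tokens, embeddings, sentences):
    """BM25-weighted sum of the word vectors of one tokenized sentence."""
    n = len(sentences)
    avg_len = sum(len(s) for s in sentences) / n
    dim = len(next(iter(embeddings.values())))
    vec = [0.0] * dim
    for word in set(tokens):
        if word not in embeddings:
            continue  # skip out-of-vocabulary words
        df = sum(1 for s in sentences if word in s)
        w = bm25_weight(tokens.count(word), df, n, len(tokens), avg_len)
        for i, x in enumerate(embeddings[word]):
            vec[i] += w * x
    return vec

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    if nu == 0 or nv == 0:
        return 0.0
    return sum(x * y for x, y in zip(u, v)) / (nu * nv)

def textrank(sim, damping=0.85, iterations=50):
    """Weighted PageRank over the sentence-similarity graph."""
    n = len(sim)
    scores = [1.0 / n] * n
    out_sums = [sum(sim[j][k] for k in range(n) if k != j) for j in range(n)]
    for _ in range(iterations):
        scores = [
            (1 - damping) / n + damping * sum(
                sim[j][i] / out_sums[j] * scores[j]
                for j in range(n) if j != i and out_sums[j] > 0
            )
            for i in range(n)
        ]
    return scores

def summarize(sentences, embeddings, compression_rate=0.5):
    """Return indices of the top-ranked sentences, in document order."""
    vectors = [sentence_vector(s, embeddings, sentences) for s in sentences]
    sim = [[max(0.0, cosine(u, v)) for v in vectors] for u in vectors]
    scores = textrank(sim)
    k = max(1, round(len(sentences) * compression_rate))
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return sorted(top)

# Hypothetical toy data: tokenized sentences and 2-d word vectors.
toy_embeddings = {"cat": [1.0, 0.0], "dog": [0.9, 0.1], "pet": [0.8, 0.2],
                  "stock": [0.0, 1.0], "market": [0.1, 0.9]}
toy_sentences = [["cat", "pet"], ["dog", "pet"], ["stock", "market"], ["cat", "dog"]]
print(summarize(toy_sentences, toy_embeddings, compression_rate=0.5))
```

In practice the embeddings would come from a pretrained model and the sentences from a real tokenizer; the compression rate here simply sets the fraction of sentences kept.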

A Method of Subtopic Classification of Search Engine Suggests by Integrating a Topic Model and Word Embeddings

International Journal of Software Innovation ◽

10.4018/ijsi.2018070105 ◽

2018 ◽

Vol 6 (3) ◽

pp. 67-78

Author(s):

Tian Nie ◽

Yi Ding ◽

Chen Zhao ◽

Youchao Lin ◽

Takehito Utsuro

Keyword(s):

Search Engine ◽

Information Needs ◽

Web Search ◽

Topic Model ◽

Japanese Version ◽

Word Embedding ◽

Coarse Grained ◽

Web Pages ◽

Word Embeddings

This article addresses the issue of how to obtain an overview of the knowledge associated with a given query keyword. In particular, the authors focus on the concerns of those who search for web pages with a given query keyword. The Web search information needs for a given query keyword are collected through search engine suggests. Given a query keyword, the authors collect up to around 1,000 suggests, many of which are redundant. They classify redundant search engine suggests based on a topic model. However, one limitation of the topic-model-based classification of search engine suggests is that the granularity of the topics, i.e., the clusters of search engine suggests, is too coarse. In order to overcome the problem of the coarse-grained classification of search engine suggests, this article further applies the word embedding technique to the webpages used during the training of the topic model, in addition to the text data of the whole Japanese version of Wikipedia. Then, the authors examine the word-embedding-based similarity between search engine suggests and further classify search engine suggests within a single topic into finer-grained subtopics based on the similarity of word embeddings. Evaluation results show that the proposed approach performs well in the task of subtopic classification of search engine suggests.
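The second stage described above, splitting the suggests of one topic into finer subtopics by embedding similarity, could look something like the sketch below. The abstract does not say which grouping procedure the authors use, so the greedy single-pass grouping, the `threshold` value, and the toy vectors are all assumptions made purely for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    if nu == 0 or nv == 0:
        return 0.0
    return sum(x * y for x, y in zip(u, v)) / (nu * nv)

def split_into_subtopics(suggest_vectors, threshold=0.8):
    """Greedily group suggests of ONE topic into subtopics.

    suggest_vectors maps each suggest string to its embedding. A suggest
    joins the first subtopic whose first member is similar enough;
    otherwise it starts a new subtopic. (Hypothetical procedure.)
    """
    subtopics = []
    for suggest, vec in suggest_vectors.items():
        for group in subtopics:
            representative = suggest_vectors[group[0]]
            if cosine(vec, representative) >= threshold:
                group.append(suggest)
                break
        else:
            subtopics.append([suggest])
    return subtopics

# Hypothetical suggests for one coarse topic, with toy 2-d embeddings.
toy = {
    "camera price": [1.0, 0.0],
    "camera cheap": [0.95, 0.05],
    "camera repair": [0.0, 1.0],
}
print(split_into_subtopics(toy))
```

A suggest's embedding would normally be derived from the word vectors of its terms, for example by averaging them, which is again an assumption rather than something stated in the abstract.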

Using Web Search Logs to Identify Query Classification Terms

Fourth International Conference on Information Technology (ITNG'07) ◽

10.1109/itng.2007.202 ◽

2007 ◽

Author(s):

Isak Taksa ◽

Sarah Zelikovitz ◽

Amanda Spink

Keyword(s):

Web Search ◽

Query Classification ◽

Learn from web search logs to organize search results

Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval - SIGIR '07 ◽

10.1145/1277741.1277759 ◽

2007 ◽

Cited By ~ 68

Author(s):

Xuanhui Wang ◽

ChengXiang Zhai

Keyword(s):

Web Search ◽

Search Results ◽

BEYOND RANKED LISTS IN WEB SEARCH: AGGREGATING WEB CONTENT INTO TOPIC PAGES

International Journal of Semantic Computing ◽

10.1142/s1793351x10001103 ◽

2010 ◽

Vol 04 (04) ◽

pp. 509-534 ◽

Cited By ~ 3

Author(s):

NIRANJAN BALASUBRAMANIAN ◽

SILVIU CUCERZAN

Keyword(s):

Web Search ◽

Automatic Generation ◽

Selection Method ◽

Web Content ◽

Search Results ◽

Aggregate Information ◽

Search Logs ◽

The Web

We investigate the automatic generation of topic pages as an alternative to the current Web search paradigm. Topic pages explicitly aggregate information across documents, filter redundancy, and promote diversity of topical aspects. We propose a novel framework for building rich topical aspect models and selecting diverse information from the Web. In particular, we use Web search logs to build aspect models with various degrees of specificity, and then employ these aspect models as input to a sentence selection method that identifies relevant and non-redundant sentences from the Web. Automatic and manual evaluations on biographical topics show that topic pages built by our system compare favorably to regular Web search results and to MDS-style summaries of the Web results on all metrics employed.

Using web search logs to identify query classification terms

International Journal of Web Information Systems ◽

10.1108/17440080710848107 ◽

2007 ◽

Vol 3 (4) ◽

pp. 315-327 ◽

Author(s):

Isak Taksa ◽

Sarah Zelikovitz ◽

Amanda Spink

Keyword(s):

Web Search ◽

Query Classification ◽

Processing and Analysis of Search Query Logs in Chinese

Handbook of Research on Web Log Analysis ◽

10.4018/978-1-59904-974-8.ch019 ◽

2011 ◽

pp. 378-388 ◽

Author(s):

Michael Chau ◽

Yan Lu ◽

Xiao Fang ◽

Christopher C. Yang

Keyword(s):

World Wide ◽

Web Search ◽

Searching Behavior ◽

Web Searching ◽

Search Queries ◽

Web Search Engine ◽

The World ◽

Query Logs ◽

Search Logs ◽

The Web

More non-English content is now available on the World Wide Web and the number of non-English users on the Web is increasing. While it is important to understand the Web searching behavior of these non-English users, many previous studies on Web query logs have focused on analyzing English search logs and their results may not be directly applicable to other languages. In this chapter, we discuss some methods and techniques that can be used to analyze search queries in Chinese. We also show an example of applying our methods to a Chinese Web search engine. Some interesting findings are reported.

Detecting Hot Events from Web Search Logs

Web-Age Information Management - Lecture Notes in Computer Science ◽

10.1007/978-3-642-14246-8_41 ◽

2010 ◽

pp. 417-428 ◽

Author(s):

Yingqin Gu ◽

Jianwei Cui ◽

Hongyan Liu ◽

Xuan Jiang ◽

Jun He ◽

...

Keyword(s):

Web Search ◽

Are Topics Interesting or Not? An LDA-based Topic-graph Probabilistic Model for Web Search Personalization

ACM Transactions on Information Systems ◽

10.1145/3476106 ◽

2022 ◽

Vol 40 (3) ◽

pp. 1-24

Author(s):

Jiashu Zhao ◽

Jimmy Xiangji Huang ◽

Hongbo Deng ◽

Yi Chang ◽

Long Xia

Keyword(s):

Probabilistic Model ◽

Large Scale ◽

Web Search ◽

Latent Dirichlet Allocation ◽

State Of The Art ◽

User Profile ◽

New Approach ◽

Latent Topic ◽

Search History ◽

In this article, we propose a Latent Dirichlet Allocation (LDA)-based topic-graph probabilistic personalization model for Web search. This model represents a user graph in a latent topic graph and simultaneously estimates the probabilities that the user is interested in the topics, as well as the probabilities that the user is not interested in the topics. For a given query issued by the user, the webpages that are more relevant to the interesting topics are promoted, and the webpages more relevant to the non-interesting topics are penalized. In particular, we simulate a user’s search intent by building two profiles: a positive user profile for the probabilities that the user is interested in the topics and a corresponding negative user profile for the probabilities that the user is not interested in the topics. The profiles are estimated based on the user’s search logs. A clicked webpage is assumed to include interesting topics. A skipped (viewed but not clicked) webpage is assumed to cover some topics that are not interesting to the user. Such estimations are performed in the latent topic space generated by LDA. Moreover, a new approach is proposed to estimate the correlation between a given query and the user’s search history so as to determine how much personalization should be considered for the query. We compare our proposed models with several strong baselines including state-of-the-art personalization approaches. Experiments conducted on a large-scale real user search log collection illustrate the effectiveness of the proposed models.
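The idea of promoting pages that match a positive profile and penalizing pages that match a negative one can be illustrated with a much simplified sketch. It is not the authors' topic-graph model: profiles here are plain averages of the topic distributions of clicked and skipped pages, and the linear scoring rule, the `weight` parameter, and all names are assumptions. Topic distributions are taken as given, for example from a trained LDA model.

```python
def average(distributions, num_topics):
    """Mean topic distribution; a zero vector if there are no pages."""
    if not distributions:
        return [0.0] * num_topics
    n = len(distributions)
    return [sum(d[t] for d in distributions) / n for t in range(num_topics)]

def build_profiles(clicked, skipped, num_topics):
    """Positive profile from clicked pages, negative from skipped pages."""
    return average(clicked, num_topics), average(skipped, num_topics)

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def rerank(results, positive, negative, weight=1.0):
    """Re-rank (doc_id, base_score, topic_distribution) triples.

    Pages close to the positive profile are promoted; pages close to
    the negative profile are penalized. (Hypothetical scoring rule.)
    """
    def personalized(result):
        _, base_score, topics = result
        return base_score + weight * (dot(positive, topics) - dot(negative, topics))
    ranked = sorted(results, key=personalized, reverse=True)
    return [doc_id for doc_id, _, _ in ranked]

# Hypothetical two-topic example: the user clicks topic-0 pages and skips topic-1 pages.
pos, neg = build_profiles(clicked=[[0.9, 0.1]], skipped=[[0.1, 0.9]], num_topics=2)
results = [("a", 1.0, [0.1, 0.9]), ("b", 0.9, [0.9, 0.1])]
print(rerank(results, pos, neg))  # "b" overtakes "a" after personalization
```

Setting `weight` to 0 recovers the original ranking, which loosely mirrors the abstract's point that the amount of personalization should depend on how well the query correlates with the user's search history.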

The Methodology of Search Log Analysis

Handbook of Research on Web Log Analysis ◽

10.4018/978-1-59904-974-8.ch006 ◽

2011 ◽

pp. 100-123 ◽

Cited By ~ 2

Author(s):

Bernard J. Jansen

Keyword(s):

Information System ◽

System Design ◽

Web Search ◽

Log Analysis ◽

Web Searching ◽

Information Searching ◽

Information System Design ◽

Analysis Methodology ◽

Three Stages ◽

Exploiting the data stored in search logs of Web search engines, Intranets, and Websites can provide important insights into understanding the information searching tactics of online searchers. This understanding can inform information system design, interface development, and information architecture construction for content collections. This chapter presents a review of and foundation for conducting Web search transaction log analysis. A search log analysis methodology is outlined consisting of three stages (i.e., collection, preparation, and analysis). The three stages of the methodology are presented in detail with discussions of the goals, metrics, and processes at each stage. The critical terms in transaction log analysis for Web searching are defined. Suggestions are provided on ways to leverage the strengths and address the limitations of transaction log analysis for Web searching research.