Vertical intent prediction approach based on Doc2vec and convolutional neural networks for improving vertical selection in aggregated search

Vertical selection is the task of selecting the verticals most relevant to a given query in order to improve the diversity and quality of web search results. This task requires not only predicting relevant verticals but also ensuring that these are the verticals the user expects to be relevant to their particular information need. Most existing work has focused on using traditional machine learning techniques to combine multiple types of features for selecting several relevant verticals. Although these techniques are very efficient, performing vertical selection with high accuracy remains a challenging research task. In this paper, we propose an approach for improving vertical selection in order to satisfy the user's vertical intent and reduce the user’s browsing time and effort. First, it generates query embedding vectors using the doc2vec algorithm, which preserves syntactic and semantic information within each query. Second, these vectors are used as input to a convolutional neural network model that enriches the query representation with multiple levels of abstraction, including rich semantic information, and then creates a global summarization of the query features. We demonstrate the effectiveness of our approach through comprehensive experimentation on various datasets. Our experimental findings show that our system achieves high accuracy. Furthermore, it makes accurate predictions on new, unseen data.
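The second stage of this pipeline (a CNN over the query embedding followed by a global summarization) can be illustrated with a minimal forward pass. This is a sketch under stated assumptions, not the authors' model. It assumes a single layer of "valid" 1D filters sliding over the doc2vec vector, ReLU, global max pooling as the global summarization, and a softmax output layer with one unit per vertical. The filter sizes, weights, query vector and vertical names are all hypothetical. In practice the query vector could come from a trained doc2vec model (for example gensim's `Doc2Vec.infer_vector`).

```python
import math
from typing import Dict, List, Sequence

def conv1d_valid(x: Sequence[float], kernel: Sequence[float], bias: float = 0.0) -> List[float]:
    """'Valid' 1D cross-correlation, the operation CNN layers call convolution."""
    k = len(kernel)
    return [sum(x[i + j] * kernel[j] for j in range(k)) + bias
            for i in range(len(x) - k + 1)]

def softmax(z: Sequence[float]) -> List[float]:
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def predict_verticals(query_vec, filters, out_weights, out_bias, verticals) -> Dict[str, float]:
    # Convolution + ReLU + global max pooling: one summary feature per filter.
    features = [max(max(0.0, v) for v in conv1d_valid(query_vec, f)) for f in filters]
    # Dense output layer: one logit per vertical, turned into probabilities by softmax.
    logits = [sum(w * h for w, h in zip(row, features)) + b
              for row, b in zip(out_weights, out_bias)]
    return dict(zip(verticals, softmax(logits)))

# Hypothetical values: query_vec stands in for a doc2vec embedding, and the
# filters/weights stand in for parameters learned from labelled queries.
query_vec = [0.2, -0.1, 0.4, 0.3, -0.2, 0.1]
filters = [[0.5, 0.5], [1.0, -1.0], [0.3, 0.3, 0.3]]
out_weights = [[1.0, 0.0, 0.5], [0.0, 1.0, -0.5], [-0.5, 0.5, 1.0]]
out_bias = [0.0, 0.0, 0.0]
probs = predict_verticals(query_vec, filters, out_weights, out_bias, ["images", "news", "video"])
print(max(probs, key=probs.get))  # vertical with the highest predicted intent
```

Filters of different widths capture patterns over different spans of the embedding. Max pooling then reduces each filter's response map to a single value, regardless of the vector's length.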


Neural methods for effective, efficient, and exposure-aware information retrieval

ACM SIGIR Forum, Vol 55 (1), pp. 1-2, 2021. DOI: 10.1145/3476415.3476434

Author(s): Bhaskar Mitra

Keyword(s): Information Retrieval, Language Processing, Large Scale, Web Search, Real Life, Inverted Index, Information Need, Product Model, Performance Improvements, Deep Model

Neural networks with deep architectures have demonstrated significant performance improvements in computer vision, speech recognition, and natural language processing. The challenges in information retrieval (IR), however, are different from those in these other application areas. A common form of IR involves ranking documents (or short passages) in response to keyword-based queries. Effective IR systems must deal with the query-document vocabulary mismatch problem by modeling relationships between different query and document terms and how they indicate relevance. Models should also consider lexical matches when the query contains rare terms that were not seen during training, such as a person's name or a product model number, in order to avoid retrieving semantically related but irrelevant results. In many real-life IR tasks, retrieval involves extremely large collections, such as the document index of a commercial Web search engine, containing billions of documents. Efficient IR methods should take advantage of specialized IR data structures, such as the inverted index, to retrieve efficiently from large collections. Given an information need, the IR system also mediates how much exposure an information artifact receives by deciding whether it should be displayed, and where it should be positioned, among other results. Exposure-aware IR systems may optimize for additional objectives besides relevance, such as parity of exposure for retrieved items and content publishers. In this thesis, we present novel neural architectures and methods motivated by the specific needs and challenges of IR tasks. We ground our contributions in a detailed survey of the growing body of neural IR literature [Mitra and Craswell, 2018].
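The paragraph above contrasts two kinds of evidence. Similarity in a learned latent space addresses vocabulary mismatch, and exact lexical matches matter for rare terms such as a product model number. A toy scorer can illustrate both. This is a minimal sketch assuming averaged word embeddings, cosine similarity and a fixed interpolation weight `alpha`. It is not the thesis's Duet architecture, which learns both components with deep networks. The embeddings, terms and the model number `xk-9000` are hypothetical.

```python
import math
from typing import Dict, List

Embeddings = Dict[str, List[float]]

def exact_match_score(query: List[str], doc: List[str]) -> float:
    """Fraction of query terms that occur verbatim in the document."""
    doc_terms = set(doc)
    return sum(t in doc_terms for t in query) / len(query)

def mean_vector(terms: List[str], emb: Embeddings, dim: int) -> List[float]:
    vecs = [emb[t] for t in terms if t in emb]  # terms without an embedding are skipped
    if not vecs:
        return [0.0] * dim
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]

def cosine(a: List[float], b: List[float]) -> float:
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)

def hybrid_score(query, doc, emb, dim, alpha=0.5):
    """Interpolate lexical (exact-match) and latent (embedding) evidence."""
    lexical = exact_match_score(query, doc)
    latent = cosine(mean_vector(query, emb, dim), mean_vector(doc, emb, dim))
    return alpha * lexical + (1 - alpha) * latent

# Hypothetical 2-d embeddings; the rare model number "xk-9000" has none.
EMB = {"car": [1.0, 0.0], "automobile": [0.9, 0.1], "repair": [0.0, 1.0], "fix": [0.1, 0.9]}

# Vocabulary mismatch: no shared terms, but the latent component still matches.
print(hybrid_score(["car", "repair"], ["automobile", "fix"], EMB, 2))
# Rare term: only the exact-match component rewards the page naming the model.
q = ["xk-9000", "repair"]
print(hybrid_score(q, ["xk-9000", "repair", "manual"], EMB, 2),
      hybrid_score(q, ["fix", "guide"], EMB, 2))
```

With the latent component alone, the two candidate pages for the rare-term query score almost identically. The exact-match component is what separates them.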
Our key contribution towards improving the effectiveness of deep ranking models is the Duet principle [Mitra et al., 2017], which emphasizes the importance of incorporating evidence based both on patterns of exact term matches and on similarities between learned latent representations of the query and document. To retrieve efficiently from large collections, we develop a framework for incorporating query term independence [Mitra et al., 2019] into arbitrary deep models, which enables large-scale precomputation and the use of an inverted index for fast retrieval. In the context of stochastic ranking, we further develop optimization strategies for exposure-based objectives [Diaz et al., 2020]. Finally, this dissertation also summarizes our contributions towards benchmarking neural IR models in the presence of large training datasets [Craswell et al., 2019] and explores the application of neural methods to other IR tasks, such as query auto-completion.
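Query term independence means the relevance score decomposes as a sum over query terms: score(q, d) = sum over t in q of phi(t, d). Because of this, phi can be evaluated offline for every (term, document) pair and stored as postings in an inverted index. The following is a minimal sketch of that precompute-then-sum pattern. `toy_term_scorer` is a hypothetical stand-in for a deep model's per-term output, and the documents are toy data.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

TermScorer = Callable[[str, List[str]], float]

def toy_term_scorer(term: str, doc: List[str]) -> float:
    """Hypothetical stand-in for a learned per-term model phi(t, d)."""
    tf = doc.count(term)
    return tf / (tf + 1.0)  # saturating term-frequency score

def build_impact_index(docs: Dict[str, List[str]], vocabulary, phi: TermScorer):
    """Offline: precompute phi(t, d) for every term and document; keep nonzero scores."""
    index: Dict[str, List[Tuple[str, float]]] = defaultdict(list)
    for doc_id, doc in docs.items():
        for term in vocabulary:
            score = phi(term, doc)
            if score > 0.0:
                index[term].append((doc_id, score))
    return index

def retrieve(query: List[str], index, k: int) -> List[Tuple[str, float]]:
    """Online: score(q, d) is the sum of precomputed phi(t, d) over query terms."""
    acc: Dict[str, float] = defaultdict(float)
    for term in query:
        for doc_id, score in index.get(term, []):
            acc[doc_id] += score
    # Highest score first; ties broken by document id for determinism.
    return sorted(acc.items(), key=lambda item: (-item[1], item[0]))[:k]

docs = {"d1": ["neural", "ranking", "ranking"], "d2": ["inverted", "index"], "d3": ["neural", "index"]}
vocab = sorted({t for doc in docs.values() for t in doc})
index = build_impact_index(docs, vocab, toy_term_scorer)
print(retrieve(["neural", "index"], index, k=2))  # [('d3', 1.0), ('d1', 0.5)]
```

The offline loop runs over the whole vocabulary rather than only each document's own terms. This is because a learned phi may assign nonzero scores to terms that do not occur in the document, and the sum decomposition is what lets those scores be indexed ahead of time.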


Information Retrieval

The Oxford Handbook of Computational Linguistics 2nd edition, 2016. DOI: 10.1093/oxfordhb/9780199573691.013.022

Author(s): Qiaozhu Mei, Dragomir Radev

Keyword(s): Information Retrieval, Digital Libraries, Web Search, Retrieval System, Information Retrieval System, Information Need, System A, Recent Developments, Text Information, Text Information Retrieval

This chapter is a basic introduction to text information retrieval. Information Retrieval (IR) refers to the activity of obtaining, from a much larger collection, information resources (usually in the form of textual documents) that are relevant to a user's information need (usually expressed as a query). Practical instances of IR systems include digital libraries and Web search engines. This chapter presents the typical architecture of an IR system, an overview of the methods used to design and implement each major component of an IR system, a discussion of evaluation methods for IR systems, and finally a summary of recent developments and research trends in the field of information retrieval.
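The evaluation methods mentioned above typically include rank-based measures computed from a ranked list and a set of relevance judgments. The following generic illustration uses standard textbook definitions of precision at rank k, average precision and mean average precision (MAP). It is not code from the chapter, and the run and judgment data are hypothetical.

```python
from typing import Dict, List, Set

def precision_at_k(ranking: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k retrieved documents that are relevant."""
    return sum(doc in relevant for doc in ranking[:k]) / k

def average_precision(ranking: List[str], relevant: Set[str]) -> float:
    """Mean of the precision values at the rank of each relevant document;
    relevant documents that are never retrieved contribute zero."""
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0

def mean_average_precision(runs: Dict[str, List[str]], qrels: Dict[str, Set[str]]) -> float:
    """MAP over queries; runs and qrels are keyed by query id."""
    return sum(average_precision(runs[q], qrels[q]) for q in qrels) / len(qrels)

runs = {"q1": ["d3", "d1", "d7", "d2"], "q2": ["d5", "d4"]}
qrels = {"q1": {"d1", "d2"}, "q2": {"d5"}}
print(precision_at_k(runs["q1"], qrels["q1"], 2))  # 0.5
print(average_precision(runs["q1"], qrels["q1"]))  # (1/2 + 2/4) / 2 = 0.5
print(mean_average_precision(runs, qrels))         # (0.5 + 1.0) / 2 = 0.75
```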


A General Framework for Implicit and Explicit Debiasing of Distributional Word Vector Spaces

Proceedings of the AAAI Conference on Artificial Intelligence, Vol 34 (05), pp. 8131-8138, 2020. DOI: 10.1609/aaai.v34i05.6325

Author(s): Anne Lauscher, Goran Glavaš, Simone Paolo Ponzetto, Ivan Vulić

Keyword(s): General Framework, Semantic Information, Evaluation Framework, Vector Spaces, Racial Biases, Definition Of, Cross Lingual, Experimental Findings, Implicit And Explicit, Embedding Methods

Distributional word vectors have recently been shown to encode many of the human biases, most notably gender and racial biases, and models for attenuating such biases have consequently been proposed. However, existing models and studies (1) operate on under-specified and mutually differing bias definitions, (2) are tailored for a particular bias (e.g., gender bias) and (3) have been evaluated inconsistently and non-rigorously. In this work, we introduce a general framework for debiasing word embeddings. We operationalize the definition of a bias by discerning two types of bias specification: explicit and implicit. We then propose three debiasing models that operate on explicit or implicit bias specifications and that can be composed towards more robust debiasing. Finally, we devise a full-fledged evaluation framework in which we couple existing bias metrics with newly proposed ones. Experimental findings across three embedding methods suggest that the proposed debiasing models are robust and widely applicable: they often completely remove the bias both implicitly and explicitly without degradation of semantic information encoded in any of the input distributional spaces. Moreover, we successfully transfer debiasing models, by means of cross-lingual embedding spaces, and remove or attenuate biases in distributional word vector spaces of languages that lack readily available bias specifications.
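The paper's three debiasing models are not reproduced here. As a simpler, related illustration of debiasing from an explicit bias specification, the following sketch estimates a single bias direction from pairs of attribute words and projects it out of target words. This is the projection idea behind the earlier "hard debiasing" of Bolukbasi et al. (2016), swapped in for illustration. The 3-dimensional vectors are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vec = List[float]

def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

def normalize(v: Sequence[float]) -> Vec:
    n = math.sqrt(dot(v, v))
    return [x / n for x in v]

def bias_direction(emb: Dict[str, Vec], pairs: List[Tuple[str, str]]) -> Vec:
    """Average of normalized difference vectors over explicit attribute pairs."""
    diffs = [normalize([x - y for x, y in zip(emb[a], emb[b])]) for a, b in pairs]
    mean = [sum(col) / len(diffs) for col in zip(*diffs)]
    return normalize(mean)

def project_out(v: Sequence[float], direction: Sequence[float]) -> Vec:
    """Remove the component of v along a unit-length direction."""
    c = dot(v, direction)
    return [x - c * d for x, d in zip(v, direction)]

def debias(emb: Dict[str, Vec], pairs: List[Tuple[str, str]], targets: List[str]):
    b = bias_direction(emb, pairs)
    out = dict(emb)  # attribute words themselves are left untouched
    for w in targets:
        out[w] = project_out(emb[w], b)
    return out, b

# Hypothetical toy vectors; the first axis plays the role of the bias direction.
emb = {"he": [1.0, 0.0, 0.2], "she": [-1.0, 0.0, 0.2],
       "man": [0.5, 0.1, 0.0], "woman": [-0.5, 0.1, 0.0],
       "engineer": [0.4, 0.8, 0.1]}
debiased, b = debias(emb, [("he", "she"), ("man", "woman")], ["engineer"])
print(debiased["engineer"])  # [0.0, 0.8, 0.1]
```

A single linear direction is a strong simplification. The paper's framework also covers implicit specifications and evaluates debiasing with a broader set of bias metrics.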


A Framework for Personalizing Atypical Web Search Sessions with Concept-Based User Profiles Using Selective Machine Learning Techniques

Advanced Computing and Intelligent Technologies - Lecture Notes in Networks and Systems, pp. 279-291, 2021. DOI: 10.1007/978-981-16-2164-2_23

Author(s): Pradeep Bedi, S. B. Goyal, Anand Singh Rajawat, Rabindra Nath Shaw, Ankush Ghosh

Keyword(s): Machine Learning, Web Search, Machine Learning Techniques, User Profiles, Learning Techniques


Application of Genetic Algorithm and Back Propagation Neural Network for Effective Personalize Web Search-Based on Clustered Query Sessions

International Journal of Applied Evolutionary Computation, Vol 7 (1), pp. 33-49, 2016. DOI: 10.4018/ijaec.2016010103. Cited by ~2

Author(s): Suruchi Chawla

Keyword(s): Neural Network, Genetic Algorithm, Web Search, Back Propagation, Back Propagation Neural Network, Information Need, Data Set, User Query, User Queries

In this paper, a novel method is proposed that uses a hybrid of a Genetic Algorithm (GA) and a Back Propagation (BP) Artificial Neural Network (ANN) to learn to classify user queries into clusters for effective Personalized Web Search. The GA-BP ANN is trained offline to classify input queries and user query session profiles into a specific cluster based on clustered web query sessions. During online web search, the trained GA-BP ANN then classifies new user queries into a cluster, and the selected cluster is used for web page recommendations. This process of classification and recommendation continues until the search is effectively personalized to the user's information need. An experiment was conducted on a data set of web user query sessions to evaluate the effectiveness of Personalized Web Search using the GA-optimized BP ANN, and the results confirm an improvement in the precision of search results.
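The abstract does not give the network topology, the query features or the GA operators, so the following is only a minimal sketch of the GA-then-BP idea under stated assumptions. Queries are assumed to be already encoded as numeric feature vectors (hypothetical toy values). The network is a single softmax layer mapping a query to a cluster. A GA with truncation selection, one-point crossover and Gaussian mutation searches for initial weights, and back propagation (here, gradient descent on one layer) refines them.

```python
import math
import random
from typing import List, Tuple

Weights = List[List[float]]           # one row per cluster; last entry of each row is a bias
Data = List[Tuple[List[float], int]]  # (hypothetical query feature vector, cluster id)

def predict_proba(w: Weights, x: List[float]) -> List[float]:
    xb = x + [1.0]
    z = [sum(wi * xi for wi, xi in zip(row, xb)) for row in w]
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def loss(w: Weights, data: Data) -> float:
    """Mean cross-entropy of the true cluster (lower is better)."""
    return -sum(math.log(predict_proba(w, x)[y] + 1e-12) for x, y in data) / len(data)

def ga_init(data: Data, n_clusters: int, dim: int,
            pop: int = 20, gens: int = 30, seed: int = 0) -> Weights:
    """Genetic search over flattened weight vectors; fitness is the loss."""
    rng = random.Random(seed)
    size = n_clusters * (dim + 1)

    def unflatten(g: List[float]) -> Weights:
        return [g[i * (dim + 1):(i + 1) * (dim + 1)] for i in range(n_clusters)]

    population = [[rng.uniform(-1, 1) for _ in range(size)] for _ in range(pop)]
    for _ in range(gens):
        population.sort(key=lambda g: loss(unflatten(g), data))
        parents = population[: pop // 2]  # truncation selection; parents survive unchanged
        children = []
        while len(parents) + len(children) < pop:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, size)
            child = [g + rng.gauss(0, 0.1) if rng.random() < 0.1 else g
                     for g in a[:cut] + b[cut:]]  # one-point crossover, then sparse mutation
            children.append(child)
        population = parents + children
    return unflatten(min(population, key=lambda g: loss(unflatten(g), data)))

def backprop(w: Weights, data: Data, lr: float = 0.1, epochs: int = 200) -> Weights:
    """Batch gradient descent on softmax cross-entropy for a single layer."""
    w = [row[:] for row in w]
    for _ in range(epochs):
        grads = [[0.0] * len(row) for row in w]
        for x, y in data:
            p = predict_proba(w, x)
            xb = x + [1.0]
            for c in range(len(w)):
                err = p[c] - (1.0 if c == y else 0.0)  # gradient of the loss w.r.t. logit c
                for j, xj in enumerate(xb):
                    grads[c][j] += err * xj / len(data)
        for c, row in enumerate(w):
            for j in range(len(row)):
                row[j] -= lr * grads[c][j]
    return w

data = [([1.0, 0.0], 0), ([0.9, 0.2], 0), ([0.0, 1.0], 1), ([0.1, 0.8], 1)]
w_ga = ga_init(data, n_clusters=2, dim=2)
w_bp = backprop(w_ga, data)
print(loss(w_ga, data), loss(w_bp, data))
```

In this setup the GA provides a global, gradient-free starting point, and back propagation does the local fine-tuning. With these small inputs and step size, gradient descent cannot increase the loss.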


Mobile Geographic Web Search Personalization with Language Model

Applied Mechanics and Materials, Vol 303-306, pp. 1420-1425, 2013. DOI: 10.4028/www.scientific.net/amm.303-306.1420

Author(s): Qiang Pu, Ahmed Lbath, Da Qing He

Keyword(s): Query Expansion, Web Search, Mixed Model, Topic Model, Language Model, Mobile Users, Information Need, User Interest, Modeling Framework, Model Based

Mobile personalized web search has been introduced to distinguish the different personal search interests of mobile users. We first take the user's location information into account to perform geographic query expansion, and then present an approach to personalizing web search for mobile users within the language modeling framework. We estimate a mixed user model from both activated ontological topic model-based feedback and a user interest model, and use it to re-rank the results of the geographic query expansion. Experiments show that the language model-based re-ranking method is effective in placing more relevant documents among the top retrieved results for mobile users. The improvements come mainly from considering geographic information, ontological topic information and user interests together to find more relevant documents that satisfy users' personal information needs.
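The estimation of the activated ontological topic model and the user interest model is not described in the abstract. The following is therefore a minimal sketch assuming both are given as unigram distributions and linearly interpolated with the query model. Documents are re-ranked by negative cross-entropy against a Dirichlet-smoothed document language model, a standard language-modeling ranking function. The smoothing parameter `mu`, the mixing weights and all word lists are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List

Dist = Dict[str, float]

def mle(tokens: List[str]) -> Dist:
    """Maximum-likelihood unigram model."""
    counts = Counter(tokens)
    return {w: c / len(tokens) for w, c in counts.items()}

def mix(models: List[Dist], weights: List[float]) -> Dist:
    """Linear interpolation of unigram models (weights should sum to 1)."""
    out: Dist = {}
    for model, lam in zip(models, weights):
        for w, p in model.items():
            out[w] = out.get(w, 0.0) + lam * p
    return out

def score(user_model: Dist, doc: List[str], collection: Dist, mu: float = 100.0) -> float:
    """Negative cross-entropy between the user model and a Dirichlet-smoothed document model."""
    counts = Counter(doc)
    total = 0.0
    for w, p in user_model.items():
        p_c = collection.get(w, 1e-9)  # floor for words unseen in the collection
        total += p * math.log((counts[w] + mu * p_c) / (len(doc) + mu))
    return total

def rerank(docs: Dict[str, List[str]], user_model: Dist, collection: Dist) -> List[str]:
    return sorted(docs, key=lambda d: (-score(user_model, docs[d], collection), d))

docs = {"d1": ["museum", "paris", "louvre", "paris"],
        "d2": ["restaurant", "paris", "cafe", "food"]}
collection = mle([t for doc in docs.values() for t in doc])
query_model = mle(["paris"])
topic_feedback = mle(["restaurant", "food"])  # hypothetical topic feedback model
user_interest = mle(["cafe", "food"])         # hypothetical user interest model
mixed = mix([query_model, topic_feedback, user_interest], [0.5, 0.25, 0.25])
print(rerank(docs, query_model, collection))  # ['d1', 'd2']
print(rerank(docs, mixed, collection))        # ['d2', 'd1']
```

With the query model alone, the document that mentions "paris" twice ranks first. Once the mixed user model is used, the document matching the user's food-related interests moves to the top.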


Clustering web search results using semantic information

2009 International Conference on Machine Learning and Cybernetics, 2009. DOI: 10.1109/icmlc.2009.5212332. Cited by ~1

Author(s): Han Wen, Guo-Shun Huang, Zhao Li

Keyword(s): Web Search, Semantic Information, Search Results


Improving Tourist Arrival Prediction: A Big Data and Artificial Neural Network Approach

Journal of Travel Research, pp. 004728752092124, 2020. DOI: 10.1177/0047287520921244. Cited by ~2

Author(s): Wolfram Höpken, Tobias Eberle, Matthias Fuchs, Maria Lexhagen

Keyword(s): Neural Network, Machine Learning, Artificial Neural Network, Big Data, Web Search, Prediction Models, Arima Models, Study Results, Artificial Neural, Prediction Approach

Because tourism demand fluctuates strongly, accurate predictions of tourist arrivals are highly important for tourism organizations. This study presents an approach that enhances autoregressive prediction models by including travelers’ web search traffic as an external input attribute for tourist arrival prediction. The study proposes a novel method to identify relevant search terms and aggregate them into a compound web-search index, which is used as an additional input to an autoregressive prediction approach. As methods to predict tourist arrivals, the study compares autoregressive integrated moving average (ARIMA) models with the machine learning–based technique of artificial neural networks (ANNs). Study results show that (1) Google Trends data, which mirror travelers’ online search behavior (i.e., a big data information source), significantly increase the performance of tourist arrival prediction compared to autoregressive approaches using past arrivals alone, and (2) the machine learning technique ANN has the capacity to outperform ARIMA models.
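The abstract does not describe how search terms are selected or weighted. The following is therefore a minimal sketch of one plausible way to build a compound web-search index, not the authors' method. It keeps search-term series (for example, Google Trends volumes) whose lagged correlation with arrivals reaches a threshold, z-scores them and averages them. The resulting index could then be supplied, alongside past arrivals, as an extra input to an ARIMA-type or ANN model. The lag, threshold, term names and data are hypothetical.

```python
import math
from typing import Dict, List, Tuple

def pearson(x: List[float], y: List[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0

def zscore(x: List[float]) -> List[float]:
    m = sum(x) / len(x)
    sd = math.sqrt(sum((a - m) ** 2 for a in x) / len(x))
    return [(a - m) / sd if sd > 0 else 0.0 for a in x]

def compound_index(arrivals: List[float], search: Dict[str, List[float]],
                   lag: int, threshold: float) -> Tuple[List[float], List[str]]:
    """Select terms whose volume `lag` periods earlier correlates with arrivals at
    least `threshold`, then average their z-scored series (all series aligned)."""
    target = arrivals[lag:]
    selected = [term for term, series in search.items()
                if pearson(series[:len(series) - lag], target) >= threshold]
    if not selected:
        return [], []
    columns = zip(*(zscore(search[term]) for term in selected))
    return [sum(col) / len(selected) for col in columns], selected

arrivals = [10, 12, 15, 11, 9, 14, 18, 13]
search = {"hotel booking": [12, 15, 11, 9, 14, 18, 13, 12],  # leads arrivals by one period
          "weather": [5, 5, 6, 5, 5, 6, 5, 5]}
index, terms = compound_index(arrivals, search, lag=1, threshold=0.8)
print(terms)  # ['hotel booking']
```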


The Semanference System: Better Search Results through Better Queries

2002. DOI: 10.28945/2570

Author(s): Anthony Scime, Colleen Powderly

Keyword(s): Web Search, Information Need, Semantic Approach, Search Queries, Search Results, Frequency Of Use, Key Phrases

A method to create more effective Web search queries is to combine elements of a semantic approach with a template that requests specific details about the searcher’s information need. Fundamental to this process is the use of semantics. Nouns, key phrases, and verbs are scored according to their frequency of use, then ranked as keywords and used to create the query. Key phrases and words in the query accurately represent the concepts of the text, generating search results that are significantly more accurate than those available using current methods.
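The following is a minimal sketch of the frequency-based scoring step: candidate words and two-word key phrases are counted, ranked by frequency of use, and joined into a query. It assumes a small hypothetical stopword list in place of noun and verb identification, which would normally require a part-of-speech tagger. The Semanference template and its exact scoring are not reproduced.

```python
import re
from collections import Counter
from typing import List

# Hypothetical stopword list standing in for part-of-speech filtering.
STOPWORDS = {"a", "an", "and", "are", "for", "in", "is", "of", "on", "the", "to", "with"}

def rank_keywords(text: str, k: int = 5) -> List[str]:
    """Rank single words and adjacent two-word phrases by frequency of use."""
    counts: Counter = Counter()
    for sentence in re.split(r"[.!?]+", text.lower()):  # phrases never cross sentences
        tokens = re.findall(r"[a-z0-9]+", sentence)
        counts.update(t for t in tokens if t not in STOPWORDS)
        counts.update(f"{a} {b}" for a, b in zip(tokens, tokens[1:])
                      if a not in STOPWORDS and b not in STOPWORDS)
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))  # ties: alphabetical
    return [term for term, _ in ranked[:k]]

def build_query(text: str, k: int = 5) -> str:
    """Join the top keywords into a query, quoting multi-word phrases."""
    return " ".join(f'"{t}"' if " " in t else t for t in rank_keywords(text, k))

text = ("Solar panels convert sunlight. Solar panels need cleaning. "
        "Panels on roofs collect sunlight.")
print(build_query(text, k=4))  # panels solar "solar panels" sunlight
```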


INGOT-DR: an interpretable classifier for predicting drug resistance in M. tuberculosis

Algorithms for Molecular Biology, Vol 16 (1), 2021. DOI: 10.1186/s13015-021-00198-1

Author(s): Hooman Zabeti, Nick Dexter, Amir Hosein Safari, Nafiseh Sedaghat, Maxwell Libbrecht, ...

Keyword(s): Machine Learning, Drug Resistance, Predictive Accuracy, Group Testing, Predictive Performance, Machine Learning Techniques, Evaluation Metrics, Lower Accuracy, Unseen Data, The One

Motivation: Prediction of drug resistance and identification of its mechanisms in bacteria such as Mycobacterium tuberculosis, the etiological agent of tuberculosis, is a challenging problem. Solving this problem requires a transparent, accurate, and flexible predictive model. The methods currently used for this purpose rarely satisfy all of these criteria. On the one hand, approaches based on testing strains against a catalogue of previously identified mutations often yield poor predictive performance; on the other hand, machine learning techniques typically have higher predictive accuracy, but often lack interpretability and may learn patterns that produce accurate predictions for the wrong reasons. Current interpretable methods may either exhibit lower accuracy or lack the flexibility needed to generalize to previously unseen data.

Contribution: In this paper, we propose a novel technique, inspired by group testing and Boolean compressed sensing, which yields highly accurate predictions and interpretable results, and is flexible enough to be optimized for various evaluation metrics at the same time.

Results: We test the predictive accuracy of our approach on five first-line and seven second-line antibiotics used for treating tuberculosis. We find that it has higher or comparable accuracy to that of commonly used machine learning models, and that it is able to identify variants in genes with previously reported associations with drug resistance. Our method is intrinsically interpretable and can be customized for different evaluation metrics. Our implementation is available at github.com/hoomanzabeti/INGOT_DR and can be installed from the Python Package Index (PyPI) under ingotdr. This package is also compatible with most of the tools in the scikit-learn machine learning library.
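The group-testing framing can be read as follows: each strain plays the role of a pooled test and each genetic variant the role of an item. A strain is then predicted resistant if it carries at least one variant from a small selected set (a Boolean OR rule). INGOT-DR learns this set by optimization. The following sketch swaps in a simple greedy set-cover heuristic purely to illustrate the shape of the resulting interpretable rule. The variant names and data are hypothetical, and the code does not reflect the `ingotdr` package's API.

```python
from typing import List, Set

Strain = Set[str]  # identifiers of the genetic variants a strain carries

def predict(rule: Set[str], strain: Strain) -> bool:
    """Group-testing style OR rule: resistant if any selected variant is present."""
    return bool(rule & strain)

def greedy_or_rule(strains: List[Strain], resistant: List[bool],
                   max_variants: int = 3, max_false_positives: int = 0) -> Set[str]:
    """Greedy set cover: repeatedly add the variant that explains the most
    still-unexplained resistant strains within the false-positive budget."""
    pos = [s for s, y in zip(strains, resistant) if y]
    neg = [s for s, y in zip(strains, resistant) if not y]
    candidates = set().union(*strains)
    rule: Set[str] = set()
    uncovered = list(range(len(pos)))
    false_pos = 0
    while uncovered and len(rule) < max_variants:
        best, best_gain, best_fp = None, 0, 0
        for v in sorted(candidates - rule):  # sorted for deterministic tie-breaking
            new_fp = sum(1 for s in neg if v in s and not (rule & s))
            if false_pos + new_fp > max_false_positives:
                continue
            gain = sum(1 for i in uncovered if v in pos[i])
            if gain > best_gain:
                best, best_gain, best_fp = v, gain, new_fp
        if best is None:
            break
        rule.add(best)
        false_pos += best_fp
        uncovered = [i for i in uncovered if best not in pos[i]]
    return rule

strains = [{"snpA", "snpX"}, {"snpB"}, {"snpA"}, {"snpX"}, {"snpY"}]
labels = [True, True, True, False, False]
rule = greedy_or_rule(strains, labels)
print(sorted(rule))                         # ['snpA', 'snpB']
print([predict(rule, s) for s in strains])  # [True, True, True, False, False]
```

The learned rule is directly readable as a short list of variants, which is the interpretability property the abstract emphasizes.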
