Vertical intent prediction approach based on Doc2vec and convolutional neural networks for improving vertical selection in aggregated search

Author(s):  
Sanae Achsas ◽  
El Habib Nfaoui

Vertical selection is the task of selecting the verticals most relevant to a given query in order to improve the diversity and quality of web search results. The task requires not only predicting relevant verticals but also ensuring that these are the verticals the user expects to be relevant to his or her particular information need. Most existing work has used traditional machine learning techniques that combine multiple types of features to select several relevant verticals. Although these techniques are efficient, performing vertical selection with high accuracy remains a challenging research task. In this paper, we propose an approach that improves vertical selection in order to satisfy the user's vertical intent and reduce users' browsing time and effort. First, it generates query embedding vectors using the doc2vec algorithm, which preserves syntactic and semantic information within each query. Second, these vectors are used as input to a convolutional neural network that enriches the query representation with multiple levels of abstraction, including rich semantic information, and then creates a global summarization of the query features. We demonstrate the effectiveness of our approach through comprehensive experiments on various datasets. Our experimental findings show that our system achieves high accuracy and also makes accurate predictions on new, unseen data.
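To make the two-stage pipeline concrete, here is a minimal forward-pass sketch in plain Python. It assumes the doc2vec query vector has already been computed (for example with gensim's `Doc2Vec.infer_vector`) and treats it as a one-dimensional signal; the abstract does not say how the paper feeds the vector to the CNN. The filter weights, dense weights, and vertical labels are hypothetical and untrained, and a single convolutional layer stands in for the deeper model that "multiple levels of abstraction" implies.

```python
import math
from typing import Dict, List, Sequence

def conv1d(signal: Sequence[float], kernel: Sequence[float]) -> List[float]:
    """'Valid' 1-D convolution (cross-correlation, as CNN libraries implement it)."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

def relu(xs: Sequence[float]) -> List[float]:
    return [max(0.0, x) for x in xs]

def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict_verticals(query_vec: Sequence[float],
                      kernels: List[List[float]],
                      dense: List[List[float]],
                      verticals: List[str]) -> Dict[str, float]:
    """Conv filters -> ReLU -> global max pooling -> dense layer -> softmax."""
    # One pooled feature per filter: a fixed-size global summary of the query.
    features = [max(relu(conv1d(query_vec, k))) for k in kernels]
    logits = [sum(w * f for w, f in zip(row, features)) for row in dense]
    return dict(zip(verticals, softmax(logits)))

# Stand-in for a doc2vec query vector (e.g. from gensim's Doc2Vec.infer_vector).
query_vec = [0.2, -0.1, 0.4, 0.3, -0.5, 0.1, 0.0, 0.6]
kernels = [[1.0, -1.0, 0.5], [0.3, 0.3, 0.3]]   # hypothetical, untrained filters
dense = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]     # one row per vertical
verticals = ["news", "images", "shopping"]       # hypothetical vertical labels
print(predict_verticals(query_vec, kernels, dense, verticals))
```

Global max pooling is what turns variable-length feature maps into a fixed-size "global summarization"; in practice the weights would be learned from labelled query-vertical pairs.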

2021 ◽  
Vol 55 (1) ◽  
pp. 1-2
Author(s):  
Bhaskar Mitra

Neural networks with deep architectures have demonstrated significant performance improvements in computer vision, speech recognition, and natural language processing. The challenges in information retrieval (IR), however, are different from those in these other application areas. A common form of IR involves ranking documents (or short passages) in response to keyword-based queries. Effective IR systems must deal with the query-document vocabulary mismatch problem by modeling relationships between different query and document terms and how they indicate relevance. Models should also consider lexical matches when the query contains rare terms, such as a person's name or a product model number, that were not seen during training, and should avoid retrieving semantically related but irrelevant results. In many real-life IR tasks, retrieval involves extremely large collections, such as the document index of a commercial Web search engine, containing billions of documents. Efficient IR methods should take advantage of specialized IR data structures, such as the inverted index, to retrieve efficiently from large collections. Given an information need, the IR system also mediates how much exposure an information artifact receives by deciding whether it should be displayed and where it should be positioned among other results. Exposure-aware IR systems may optimize for objectives besides relevance, such as parity of exposure for retrieved items and content publishers. In this thesis, we present novel neural architectures and methods motivated by the specific needs and challenges of IR tasks. We ground our contributions in a detailed survey of the growing body of neural IR literature [Mitra and Craswell, 2018].
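The inverted index mentioned above maps each term to the set of documents that contain it, so candidate retrieval touches only the posting lists of the query terms rather than the whole collection. The sketch below is a minimal illustration under simple assumptions (lowercase whitespace tokenization, toy documents); it also shows why exact matching matters for a rare term such as a model number.

```python
from collections import defaultdict
from typing import Dict, List, Set

def tokenize(text: str) -> List[str]:
    # Assumption: lowercase whitespace tokenization; real systems normalize further.
    return text.lower().split()

def build_inverted_index(docs: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map every term to the ids of the documents that contain it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index

def candidates(index: Dict[str, Set[str]], query: str) -> Set[str]:
    """Union of posting lists: documents that exactly match at least one query term."""
    result: Set[str] = set()
    for term in tokenize(query):
        result |= index.get(term, set())
    return result

docs = {
    "d1": "the xk-900 camera review",
    "d2": "best cameras of the year",
    "d3": "camera lens cleaning guide",
}
index = build_inverted_index(docs)
print(sorted(candidates(index, "xk-900 camera")))  # ['d1', 'd3']
```

The rare term "xk-900" is found by exact match even though no model was trained on it, while "cameras" in d2 is missed: a small instance of the vocabulary mismatch problem that learned representations aim to address.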
Our key contribution towards improving the effectiveness of deep ranking models is the development of the Duet principle [Mitra et al., 2017], which emphasizes the importance of incorporating evidence based on both patterns of exact term matches and similarities between learned latent representations of query and document. To retrieve efficiently from large collections, we develop a framework that incorporates query term independence [Mitra et al., 2019] into arbitrary deep models, enabling large-scale precomputation and the use of an inverted index for fast retrieval. In the context of stochastic ranking, we further develop optimization strategies for exposure-based objectives [Diaz et al., 2020]. Finally, this dissertation also summarizes our contributions towards benchmarking neural IR models in the presence of large training datasets [Craswell et al., 2019] and explores the application of neural methods to other IR tasks, such as query auto-completion.
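Under query term independence, a model's score decomposes into a sum of per-term scores, score(q, d) = Σ_{t∈q} φ(t, d), so every φ(t, d) can be computed offline and stored in posting lists. The sketch below illustrates that idea only. `term_doc_score` is a hypothetical stand-in for a deep model's per-term output (here just log(1 + tf)), not the model from Mitra et al.; the vocabulary is enumerated explicitly because a learned model may give non-zero scores to terms absent from a document.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

def term_doc_score(term: str, doc_tokens: List[str]) -> float:
    # Hypothetical stand-in for a deep model scoring one query term against a
    # document; here simply log(1 + term frequency).
    return math.log1p(doc_tokens.count(term))

def precompute(docs: Dict[str, str], vocab: List[str]) -> Dict[str, List[Tuple[str, float]]]:
    """Offline: store phi(t, d) for every (term, doc) pair with a non-zero score."""
    postings: Dict[str, List[Tuple[str, float]]] = defaultdict(list)
    for doc_id, text in docs.items():
        tokens = text.lower().split()
        for term in vocab:
            s = term_doc_score(term, tokens)
            if s > 0:
                postings[term].append((doc_id, s))
    return postings

def search(postings, query: str, k: int = 10) -> List[Tuple[str, float]]:
    """Online: score(q, d) = sum over query terms of the precomputed phi(t, d)."""
    scores: Dict[str, float] = defaultdict(float)
    for term in query.lower().split():
        for doc_id, s in postings.get(term, []):
            scores[doc_id] += s
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:k]

docs = {"d1": "neural ranking neural models", "d2": "inverted index ranking"}
postings = precompute(docs, vocab=["neural", "ranking", "index"])
print(search(postings, "neural ranking"))
```

At query time no neural inference is needed: retrieval reduces to summing stored impacts from the posting lists, which is what makes large-scale precomputation possible.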


Author(s):  
Qiaozhu Mei ◽  
Dragomir Radev

This chapter is a basic introduction to text information retrieval. Information Retrieval (IR) refers to the activities of obtaining, from a much larger collection, information resources (usually in the form of textual documents) that are relevant to a user's information need (usually expressed as a query). Practical instances of IR systems include digital libraries and Web search engines. This chapter presents the typical architecture of an IR system, an overview of the methods used to design and implement each major component of an IR system, a discussion of evaluation methods for IR systems, and finally a summary of recent developments and research trends in the field of information retrieval.
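As an illustration of IR evaluation, the sketch below computes three standard metrics for a single ranked list: precision at k, recall, and average precision. The abstract does not say which metrics the chapter covers, so treat these as common examples rather than its exact content; document ids and relevance judgments are hypothetical.

```python
from typing import List, Set

def precision_at_k(ranked: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k results that are relevant."""
    return sum(1 for d in ranked[:k] if d in relevant) / k

def recall(ranked: List[str], relevant: Set[str]) -> float:
    """Fraction of all relevant documents that appear anywhere in the ranking."""
    if not relevant:
        return 0.0
    return sum(1 for d in ranked if d in relevant) / len(relevant)

def average_precision(ranked: List[str], relevant: Set[str]) -> float:
    """Sum of precision@i at each rank i holding a relevant document, divided by
    the total number of relevant documents (unretrieved ones contribute 0)."""
    hits, total = 0, 0.0
    for i, d in enumerate(ranked, start=1):
        if d in relevant:
            hits += 1
            total += hits / i
    return total / len(relevant) if relevant else 0.0

ranked = ["d3", "d1", "d7", "d2", "d5"]
relevant = {"d1", "d2", "d9"}
print(precision_at_k(ranked, relevant, 2))       # 0.5
print(round(average_precision(ranked, relevant), 4))  # 0.3333
```

Averaging `average_precision` over a set of queries gives mean average precision (MAP), a common single-number summary for comparing systems.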


2020 ◽  
Vol 34 (05) ◽  
pp. 8131-8138
Author(s):  
Anne Lauscher ◽  
Goran Glavaš ◽  
Simone Paolo Ponzetto ◽  
Ivan Vulić

Distributional word vectors have recently been shown to encode many of the human biases, most notably gender and racial biases, and models for attenuating such biases have consequently been proposed. However, existing models and studies (1) operate on under-specified and mutually differing bias definitions, (2) are tailored for a particular bias (e.g., gender bias) and (3) have been evaluated inconsistently and non-rigorously. In this work, we introduce a general framework for debiasing word embeddings. We operationalize the definition of a bias by discerning two types of bias specification: explicit and implicit. We then propose three debiasing models that operate on explicit or implicit bias specifications and that can be composed towards more robust debiasing. Finally, we devise a full-fledged evaluation framework in which we couple existing bias metrics with newly proposed ones. Experimental findings across three embedding methods suggest that the proposed debiasing models are robust and widely applicable: they often completely remove the bias both implicitly and explicitly without degradation of semantic information encoded in any of the input distributional spaces. Moreover, we successfully transfer debiasing models, by means of cross-lingual embedding spaces, and remove or attenuate biases in distributional word vector spaces of languages that lack readily available bias specifications.
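The abstract does not give the equations of the three proposed debiasing models. For illustration only, the sketch below shows a simple, widely used baseline that works from an explicit bias specification: estimate a bias direction from the differences between paired attribute words (e.g., "he"/"she") and remove each vector's component along it, similar to the "neutralize" step of Bolukbasi et al.'s hard debiasing. This is a swapped-in technique, not one of Lauscher et al.'s models, and the 3-dimensional vectors are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vec = List[float]

def sub(a: Sequence[float], b: Sequence[float]) -> Vec:
    return [x - y for x, y in zip(a, b)]

def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

def normalize(a: Sequence[float]) -> Vec:
    n = math.sqrt(dot(a, a))
    return [x / n for x in a]

def bias_direction(emb: Dict[str, Vec], pairs: Sequence[Tuple[str, str]]) -> Vec:
    """Average the difference vectors of explicit attribute pairs, then normalize."""
    diffs = [sub(emb[a], emb[b]) for a, b in pairs]
    mean = [sum(col) / len(diffs) for col in zip(*diffs)]
    return normalize(mean)

def remove_component(v: Vec, direction: Vec) -> Vec:
    """Project v onto the hyperplane orthogonal to the unit bias direction."""
    p = dot(v, direction)
    return [x - p * d for x, d in zip(v, direction)]

# Hypothetical toy embeddings; the first axis encodes the gender contrast.
emb = {
    "he": [1.0, 0.2, 0.0], "she": [-1.0, 0.2, 0.0],
    "man": [0.9, 0.1, 0.3], "woman": [-0.9, 0.1, 0.3],
    "engineer": [0.4, 0.8, 0.5],
}
d = bias_direction(emb, [("he", "she"), ("man", "woman")])
debiased = remove_component(emb["engineer"], d)
print(d, debiased)  # [1.0, 0.0, 0.0] [0.0, 0.8, 0.5]
```

A single linear projection removes only the explicit, direction-based component of bias; the implicit-bias settings and evaluation metrics described in the abstract go beyond this baseline.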


2016 ◽  
Vol 7 (1) ◽  
pp. 33-49 ◽  
Author(s):  
Suruchi Chawla

In this paper, a novel method is proposed that uses a hybrid of a Genetic Algorithm (GA) and a Back Propagation (BP) Artificial Neural Network (ANN) to learn to classify user queries into clusters for effective Personalized Web Search. The GA-BP ANN is trained offline to classify input queries and user query session profiles into a specific cluster based on clustered web query sessions. During online web search, the trained GA-BP ANN classifies new user queries into a cluster, and the selected cluster is used for web page recommendations. This process of classification and recommendation continues until the search is effectively personalized to the information need of the user. An experiment was conducted on a data set of web user query sessions to evaluate the effectiveness of Personalized Web Search using the GA-optimized BP ANN, and the results confirm an improvement in the precision of search results.
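A common way to hybridize a GA with backpropagation is to let the GA search for good initial weights and then refine them with gradient descent. The sketch below follows that pattern under loudly stated assumptions: queries are already feature vectors, the network is simplified to a single softmax layer (so backpropagation reduces to the cross-entropy gradient of that layer), and the GA settings (truncation selection, uniform crossover, Gaussian mutation, population size, learning rate) are hypothetical, not the paper's configuration.

```python
import math
import random
from typing import List, Tuple

Sample = Tuple[List[float], int]  # (query feature vector, cluster id)

def forward(w: List[float], x: List[float], k: int) -> List[float]:
    """Softmax over k clusters; w is flat: k rows of (len(x) weights + 1 bias)."""
    d = len(x) + 1
    xb = x + [1.0]
    z = [sum(w[c * d + j] * xb[j] for j in range(d)) for c in range(k)]
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def loss(w: List[float], data: List[Sample], k: int) -> float:
    """Mean cross-entropy of the network on the labelled query data."""
    return -sum(math.log(forward(w, x, k)[y] + 1e-12) for x, y in data) / len(data)

def ga_init(data, k, dim, pop=20, gens=30, seed=0) -> List[float]:
    """GA stage: evolve flat weight vectors; fitter means lower cross-entropy."""
    rng = random.Random(seed)
    n = k * (dim + 1)
    population = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(pop)]
    for _ in range(gens):
        population.sort(key=lambda w: loss(w, data, k))
        parents = population[: pop // 2]  # truncation selection
        children = []
        while len(parents) + len(children) < pop:
            a, b = rng.sample(parents, 2)
            child = [ai if rng.random() < 0.5 else bi for ai, bi in zip(a, b)]  # uniform crossover
            children.append([c + rng.gauss(0, 0.1) for c in child])           # Gaussian mutation
        population = parents + children
    return min(population, key=lambda w: loss(w, data, k))

def bp_finetune(w, data, k, lr=0.5, epochs=1000) -> List[float]:
    """BP stage: batch gradient descent on cross-entropy (one softmax layer)."""
    d = len(data[0][0]) + 1
    w = list(w)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in data:
            p = forward(w, x, k)
            xb = x + [1.0]
            for c in range(k):
                err = p[c] - (1.0 if c == y else 0.0)  # dLoss/dlogit_c
                for j in range(d):
                    grad[c * d + j] += err * xb[j]
        w = [wi - lr * g / len(data) for wi, g in zip(w, grad)]
    return w

def classify(w: List[float], x: List[float], k: int) -> int:
    p = forward(w, x, k)
    return max(range(k), key=lambda c: p[c])

# Hypothetical toy data: 2-D query features, three well-separated clusters.
data = [([0.0, 0.1], 0), ([0.1, 0.0], 0), ([1.0, 0.9], 1), ([0.9, 1.0], 1),
        ([0.0, 1.0], 2), ([0.1, 0.9], 2)]
w = bp_finetune(ga_init(data, k=3, dim=2), data, k=3)
print([classify(w, x, 3) for x, _ in data])
```

The GA explores the weight space globally without gradients, while the BP stage exploits the gradient locally; in the paper's online setting, `classify` corresponds to assigning a new query to the cluster used for recommendations.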


2013 ◽  
Vol 303-306 ◽  
pp. 1420-1425
Author(s):  
Qiang Pu ◽  
Ahmed Lbath ◽  
Da Qing He

Mobile personalized web search has been introduced to distinguish the different personal search interests of mobile users. We first take the user's location information into account to perform a geographic query expansion, and then present an approach to personalizing web search for mobile users within the language modeling framework. We estimate a mixed user model from both activated ontological topic model-based feedback and a user interest model, and use it to re-rank the results of the geographic query expansion. Experiments show that the language model-based re-ranking method is effective at placing more relevant documents among the top retrieved results for mobile users. The improvements come mainly from considering geographic information, ontological topic information, and user interests together to find more relevant documents that satisfy users' personal information needs.
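The abstract does not specify how the two component models are combined or how documents are scored, so the following is a sketch under common language-modeling assumptions: the mixed user model is a linear interpolation p(w|θ_u) = λ·p(w|θ_topic) + (1 − λ)·p(w|θ_interest), and documents are re-ranked by negative cross-entropy Σ_w p(w|θ_u)·log p(w|θ_d) against a Dirichlet-smoothed document model. The values of λ and μ and the toy models are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List

def mix(topic: Dict[str, float], interest: Dict[str, float], lam: float) -> Dict[str, float]:
    """Mixed user model: lam * topic feedback model + (1 - lam) * interest model."""
    vocab = set(topic) | set(interest)
    return {w: lam * topic.get(w, 0.0) + (1 - lam) * interest.get(w, 0.0) for w in vocab}

def doc_model(tokens: List[str], collection: Counter, coll_len: int, mu: float):
    """Dirichlet-smoothed document language model p(w | theta_d)."""
    tf = Counter(tokens)
    n = len(tokens)
    return lambda w: (tf[w] + mu * collection[w] / coll_len) / (n + mu)

def rerank(docs: Dict[str, str], user_model: Dict[str, float], mu: float = 10.0) -> List[str]:
    tokenized = {d: t.lower().split() for d, t in docs.items()}
    collection = Counter(w for toks in tokenized.values() for w in toks)
    coll_len = sum(collection.values())
    scores = {}
    for d, toks in tokenized.items():
        p = doc_model(toks, collection, coll_len, mu)
        # Negative cross-entropy (rank-equivalent to negative KL divergence).
        # Words unseen in the collection have p = 0 in every document, so they
        # are skipped; they cannot change the relative order.
        scores[d] = sum(pw * math.log(p(w)) for w, pw in user_model.items()
                        if pw > 0 and collection[w] > 0)
    return sorted(scores, key=scores.get, reverse=True)

docs = {
    "d1": "seafood restaurant near the harbour",
    "d2": "restaurant reviews and recipes",
    "d3": "harbour museum opening hours",
}
topic = {"restaurant": 0.6, "seafood": 0.4}   # e.g. from topic-model feedback
interest = {"harbour": 0.5, "seafood": 0.5}   # e.g. from the user's history
print(rerank(docs, mix(topic, interest, lam=0.5)))
```

In a mobile setting, the candidate `docs` would be the results returned for the geographically expanded query; only their order changes.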


2020 ◽  
pp. 004728752092124 ◽  
Author(s):  
Wolfram Höpken ◽  
Tobias Eberle ◽  
Matthias Fuchs ◽  
Maria Lexhagen

Because tourism demand fluctuates strongly, accurate predictions of tourist arrivals are highly important for tourism organizations. This study presents an approach that enhances autoregressive prediction models by including travelers’ web search traffic as an external input attribute for tourist arrival prediction. It proposes a novel method to identify relevant search terms and aggregate them into a compound web-search index, which is used as an additional input to an autoregressive prediction approach. As methods to predict tourist arrivals, the study compares autoregressive integrated moving average (ARIMA) models with artificial neural networks (ANNs), a machine learning technique. The results show that (1) Google Trends data, which mirror travelers’ online search behavior (i.e., a big data information source), significantly increase the performance of tourist arrival prediction compared to autoregressive approaches that use past arrivals alone, and (2) ANNs can outperform ARIMA models.
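The core idea, adding a compound search index as an exogenous input to an autoregressive model, can be sketched with an ARX(1) model fitted by ordinary least squares: y_t = c + a·y_{t−1} + b·x_{t−1}. This is a simplified stand-in for the study's ARIMA and ANN models. The term-selection method is not reproduced; equal-weight averaging of z-scored search series, the one-period lag, and all data below are hypothetical assumptions.

```python
import statistics
from typing import List, Sequence, Tuple

def zscore(xs: Sequence[float]) -> List[float]:
    mu, sd = statistics.fmean(xs), statistics.pstdev(xs)
    return [(x - mu) / sd for x in xs]

def compound_index(series: List[Sequence[float]]) -> List[float]:
    """Equal-weight average of z-scored search-volume series (one per search term)."""
    zs = [zscore(s) for s in series]
    return [statistics.fmean(col) for col in zip(*zs)]

def solve3(A: List[List[float]], b: List[float]) -> List[float]:
    """Gaussian elimination with partial pivoting for a 3x3 linear system."""
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for i in range(3):
        p = max(range(i, 3), key=lambda r: abs(M[r][i]))
        M[i], M[p] = M[p], M[i]
        for r in range(i + 1, 3):
            f = M[r][i] / M[i][i]
            for c in range(i, 4):
                M[r][c] -= f * M[i][c]
    x = [0.0] * 3
    for i in reversed(range(3)):
        x[i] = (M[i][3] - sum(M[i][j] * x[j] for j in range(i + 1, 3))) / M[i][i]
    return x

def fit_arx(y: Sequence[float], x: Sequence[float]) -> Tuple[float, float, float]:
    """OLS fit of y_t = c + a*y_{t-1} + b*x_{t-1} via the normal equations."""
    rows = [(1.0, y[t - 1], x[t - 1]) for t in range(1, len(y))]
    targets = y[1:]
    XtX = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    Xty = [sum(r[i] * yt for r, yt in zip(rows, targets)) for i in range(3)]
    c, a, b = solve3(XtX, Xty)
    return c, a, b

def forecast(y: Sequence[float], x: Sequence[float], coef: Tuple[float, float, float]) -> float:
    c, a, b = coef
    return c + a * y[-1] + b * x[-1]

# Hypothetical weekly search volumes for two selected terms.
terms = [[5, 7, 6, 9, 12, 8, 7, 10, 11, 6],
         [50, 65, 60, 80, 95, 70, 66, 85, 90, 58]]
x = compound_index(terms)
# Synthetic arrivals generated from a known ARX(1) process, for illustration.
y = [100.0]
for t in range(1, len(x)):
    y.append(10 + 0.5 * y[-1] + 3 * x[t - 1])
coef = fit_arx(y, x)
print(coef, forecast(y, x, coef))
```

Comparing this fit against one without the `x` column is a simple way to check whether the search index adds predictive value beyond past arrivals alone.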


10.28945/2570 ◽  
2002 ◽  
Author(s):  
Anthony Scime ◽  
Colleen Powderly

A method to create more effective Web search queries is to combine elements of a semantic approach with a template that requests specific details about the searcher’s information need. Fundamental to this process is the use of semantics. Nouns, key phrases, and verbs are scored according to their frequency of use, then ranked as keywords and used to create the query. Key phrases and words in the query accurately represent the concepts of the text, generating search results that are significantly more accurate than those available using current methods.
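The scoring step described above (nouns and verbs scored by frequency, ranked, and used as query keywords) can be sketched as follows. This assumes tokens have already been part-of-speech tagged by an external tagger using Penn Treebank-style tags (NN*, VB*); key-phrase detection and the information-need template are omitted, and the top-k cut-off is a hypothetical parameter.

```python
from collections import Counter
from typing import List, Tuple

def build_query(tagged: List[Tuple[str, str]], k: int = 3) -> str:
    """Score nouns and verbs by frequency and join the top-k into a query string."""
    counts = Counter(w.lower() for w, tag in tagged if tag.startswith(("NN", "VB")))
    # Ties are broken alphabetically so the result is deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return " ".join(w for w, _ in ranked[:k])

# Hypothetical pre-tagged text (tags as a POS tagger would assign them).
tagged = [("Solar", "JJ"), ("panels", "NNS"), ("convert", "VBP"), ("sunlight", "NN"),
          ("panels", "NNS"), ("need", "VBP"), ("sunlight", "NN"), ("panels", "NNS")]
print(build_query(tagged))  # panels sunlight convert
```

Adjectives such as "Solar" are ignored by design here; a fuller version would also filter common auxiliary verbs and merge multi-word key phrases before ranking.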


2021 ◽  
Vol 16 (1) ◽  
Author(s):  
Hooman Zabeti ◽  
Nick Dexter ◽  
Amir Hosein Safari ◽  
Nafiseh Sedaghat ◽  
Maxwell Libbrecht ◽  
...  

Motivation: Prediction of drug resistance and identification of its mechanisms in bacteria such as Mycobacterium tuberculosis, the etiological agent of tuberculosis, is a challenging problem. Solving this problem requires a transparent, accurate, and flexible predictive model. The methods currently used for this purpose rarely satisfy all of these criteria. On the one hand, approaches based on testing strains against a catalogue of previously identified mutations often yield poor predictive performance; on the other hand, machine learning techniques typically have higher predictive accuracy but often lack interpretability and may learn patterns that produce accurate predictions for the wrong reasons. Current interpretable methods either have lower accuracy or lack the flexibility needed to generalize to previously unseen data.
Contribution: In this paper, we propose a novel technique, inspired by group testing and Boolean compressed sensing, that yields highly accurate predictions and interpretable results and is flexible enough to be optimized for various evaluation metrics at the same time.
Results: We test the predictive accuracy of our approach on five first-line and seven second-line antibiotics used for treating tuberculosis. We find that its accuracy is higher than or comparable to that of commonly used machine learning models, and that it is able to identify variants in genes with previously reported associations with drug resistance. Our method is intrinsically interpretable and can be customized for different evaluation metrics. Our implementation is available at github.com/hoomanzabeti/INGOT_DR and can be installed from the Python Package Index (PyPI) under ingotdr. The package is also compatible with most of the tools in the scikit-learn machine learning library.
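In the group testing / Boolean compressed sensing view, an isolate is predicted resistant if it carries any variant from a small selected set, so the model is a Boolean OR over a few variants and can be read directly. The abstract does not describe how INGOT-DR selects that set; the sketch below swaps in a simple greedy heuristic (a hypothetical `fp_penalty` trades new true positives against new false positives) purely to illustrate the rule form. It does not use or mimic the `ingotdr` package API, and the variant matrix is hypothetical.

```python
from typing import List, Sequence, Set

def predict(rule: Set[int], sample: Sequence[int]) -> int:
    """Boolean OR rule: resistant (1) if any selected variant is present."""
    return int(any(sample[j] for j in rule))

def greedy_or_rule(X: List[Sequence[int]], y: Sequence[int],
                   max_terms: int = 3, fp_penalty: float = 2.0) -> Set[int]:
    """Greedily add the variant with the best (new true positives minus
    fp_penalty * new false positives) until nothing improves or max_terms is hit."""
    rule: Set[int] = set()
    uncovered = {i for i, label in enumerate(y) if label == 1}  # resistant, not yet predicted
    clean = {i for i, label in enumerate(y) if label == 0}      # susceptible, not yet flagged
    for _ in range(max_terms):
        best, best_gain = None, 0.0
        for j in range(len(X[0])):
            if j in rule:
                continue
            tp = sum(1 for i in uncovered if X[i][j])
            fp = sum(1 for i in clean if X[i][j])
            gain = tp - fp_penalty * fp
            if gain > best_gain:
                best, best_gain = j, gain
        if best is None:
            break
        rule.add(best)
        uncovered = {i for i in uncovered if not X[i][best]}
        clean = {i for i in clean if not X[i][best]}
    return rule

# Hypothetical binary matrix: rows = isolates, columns = variants (1 = present).
X = [[1, 0, 0, 0], [1, 0, 1, 0], [0, 1, 0, 0], [0, 1, 0, 1],
     [0, 0, 1, 0], [0, 0, 0, 1], [0, 0, 0, 0]]
y = [1, 1, 1, 1, 0, 0, 0]  # 1 = resistant, 0 = susceptible
rule = greedy_or_rule(X, y)
print(sorted(rule), [predict(rule, s) for s in X])  # [0, 1] [1, 1, 1, 1, 0, 0, 0]
```

Because the learned rule is just a list of variant indices, each prediction can be traced to the specific variant that triggered it, which is the kind of intrinsic interpretability the abstract emphasizes.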

