Word Embedding based Generalized Language Model for Information Retrieval

Author(s): Debasis Ganguly, Dwaipayan Roy, Mandar Mitra, Gareth J.F. Jones

Author(s): Jose Camacho-Collados, Luis Espinosa-Anke, Shoaib Jameel, Steven Schockaert

Recently, a number of unsupervised approaches have been proposed for learning vectors that capture the relationship between two words. Inspired by word embedding models, these approaches rely on co-occurrence statistics obtained from sentences in which the two target words both appear. However, the number of such sentences is often quite small, and most of the words that occur in them are not relevant for characterizing the considered relationship. As a result, standard co-occurrence statistics typically lead to noisy relation vectors. To address this issue, we propose a latent variable model that aims to explicitly determine which words from the given sentences best characterize the relationship between the two target words. Relation vectors then correspond to the parameters of a simple unigram language model estimated from these words.
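To make the baseline concrete, the following is a minimal sketch of the naive co-occurrence approach that the abstract describes as noisy: collect the context words from sentences containing both targets and estimate a smoothed unigram language model over them, whose parameter vector serves as the relation vector. This is an illustrative simplification, not the authors' latent variable model, which additionally infers how relevant each context word is to the relationship; the function name, vocabulary handling, and add-one smoothing are all assumptions made for the example.

```python
from collections import Counter

def naive_relation_vector(sentences, w1, w2):
    """Estimate a unigram language model over the context words of
    sentences that contain both target words w1 and w2.

    Returns a dict mapping each vocabulary word to its smoothed
    probability; this probability vector is the (naive) relation
    vector. NOTE: this is a hypothetical baseline sketch, not the
    paper's latent variable model.
    """
    counts = Counter()
    vocab = set()
    for sent in sentences:
        tokens = sent.lower().split()
        # Only sentences mentioning both targets contribute statistics.
        if w1 in tokens and w2 in tokens:
            for t in tokens:
                if t not in (w1, w2):
                    counts[t] += 1
                    vocab.add(t)
    total = sum(counts.values())
    # Add-one smoothing over the observed context vocabulary.
    return {t: (counts[t] + 1) / (total + len(vocab)) for t in vocab}

sentences = [
    "paris is the capital of france",
    "the capital city paris lies in france",
]
vec = naive_relation_vector(sentences, "paris", "france")
```

In this toy corpus, relation-relevant words such as "capital" receive higher probability than incidental ones such as "lies", but every word in the matching sentences contributes mass; with only a handful of sentences, that is exactly the noise the proposed latent variable model is designed to filter out.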

