Semantic Similarity Measurement Using Historical Google Search Patterns

2017
Author(s):  
Jorge Martinez-Gil ◽  
José F. Aldana Montes

Computing the semantic similarity between terms (or short text expressions) that have the same meaning but are not lexicographically similar is an important challenge in the information integration field. The problem is that techniques for textual semantic similarity measurement often fail to deal with words not covered by synonym dictionaries. In this paper, we try to solve this problem by determining the semantic similarity of terms using the knowledge inherent in the search history logs of the Google search engine. To do this, we have designed and evaluated four algorithmic methods for measuring the semantic similarity between terms using their associated search history patterns. These methods are: a) frequent co-occurrence of terms in search patterns, b) computation of the relationship between search patterns, c) outlier coincidence in search patterns, and d) forecasting comparisons. We have shown experimentally that some of these methods correlate well with human judgment on general-purpose benchmark datasets, and significantly outperform existing methods on datasets containing terms that do not usually appear in dictionaries.
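Method (b), for example, amounts to comparing the temporal shape of two terms' search-frequency series. The following minimal sketch (not the authors' implementation; the weekly search volumes below are hypothetical) scores two terms by the Pearson correlation of their search patterns, rescaled to [0, 1]:

```python
# Minimal sketch of method (b): semantic similarity estimated as the
# correlation between two terms' historical search-frequency series.
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def pattern_similarity(freq_a, freq_b):
    """Map correlation from [-1, 1] to a similarity score in [0, 1]."""
    return (pearson(freq_a, freq_b) + 1) / 2

# Hypothetical weekly search volumes for two near-synonymous terms.
car  = [55, 60, 58, 70, 65, 72, 68, 75]
auto = [50, 57, 55, 66, 60, 70, 64, 71]
print(pattern_similarity(car, auto))  # close to 1 for similar patterns
```

Terms with the same meaning tend to be searched for under the same external stimuli, so their frequency series rise and fall together even when the strings share no characters.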

Author(s):  
Bojan Furlan ◽  
Vladimir Sivački ◽  
Davor Jovanović ◽  
Boško Nikolić

This paper presents methods for measuring the semantic similarity of texts, evaluating different approaches based on existing similarity measures. On one side, word similarity was calculated by processing large text corpora; on the other, a commonsense knowledge base was used. Given that a large fraction of the information available today, on the Web and elsewhere, consists of short text snippets (e.g. abstracts of scientific documents, image captions, or product descriptions), where commonsense knowledge plays an important role, in this paper we focus on computing the similarity between two sentences or two short paragraphs by extending existing measures with information from the ConceptNet knowledge base. Since extensive research has also been done in the field of corpus-based semantic similarity, we evaluated existing solutions as well, with some modifications. Through experiments performed on a paraphrase dataset, we demonstrate that some of the proposed approaches can improve semantic similarity measurement for short texts.
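The abstract does not include code, but the core idea of enriching word-to-word similarity with ConceptNet and lifting it to sentence level can be sketched as follows. This illustration assumes the public ConceptNet 5 REST API at api.conceptnet.io and its /relatedness endpoint; the greedy word-matching heuristic is a common short-text strategy, not necessarily the exact aggregation used by the authors:

```python
# Illustrative sketch (not the paper's exact method): word relatedness from
# the ConceptNet knowledge base, aggregated over two short texts by greedy
# best-match averaging. Requires network access to api.conceptnet.io.
import requests

def conceptnet_relatedness(w1, w2):
    """Query ConceptNet's relatedness score (roughly in [-1, 1]) for two English words."""
    url = (f"http://api.conceptnet.io/relatedness"
           f"?node1=/c/en/{w1}&node2=/c/en/{w2}")
    return requests.get(url, timeout=10).json()["value"]

def sentence_similarity(s1, s2):
    """Greedy best-match average: each word in one text is matched to its
    most related word in the other, in both directions."""
    t1, t2 = s1.lower().split(), s2.lower().split()
    def directed(a, b):
        return sum(max(conceptnet_relatedness(w, v) for v in b)
                   for w in a) / len(a)
    return (directed(t1, t2) + directed(t2, t1)) / 2

print(sentence_similarity("a dog barks", "the puppy is loud"))
```

In practice a corpus-based measure (e.g. cosine similarity of distributional word vectors) would be combined with the knowledge-base score, which is precisely the hybrid the paper investigates.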


2021
pp. 1-12
Author(s):  
Fuqiang Zhao ◽  
Zhengyu Zhu ◽  
Ping Han

To measure semantic similarity between words, a novel model, DFRVec, which encodes multiple kinds of semantic information about a word in WordNet into a vector space, is presented in this paper. First, three sub-models are proposed: 1) DefVec, encoding the definitions of a word in WordNet; 2) FormVec, encoding the part-of-speech (POS) of a word in WordNet; and 3) RelVec, encoding the relations of a word in WordNet. Then the three sub-models are combined with an existing word embedding to generate the final vector for a word. Finally, based on DFRVec and the path information in WordNet, a new method, DFRVec+Path, for measuring semantic similarity between words is presented. Experiments on ten benchmark datasets show that DFRVec+Path outperforms many existing methods on semantic similarity measurement.
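A rough sketch of the general DFRVec+Path idea, combining an embedding-based score with WordNet path information, might look as follows. This is not the authors' code: the toy vectors stand in for a real embedding, and the weighting parameter alpha is a hypothetical placeholder.

```python
# Minimal sketch of combining embedding cosine similarity with WordNet path
# similarity. Uses NLTK's WordNet interface (requires nltk.download('wordnet')).
from nltk.corpus import wordnet as wn
import numpy as np

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def path_sim(w1, w2):
    """Best WordNet path similarity over all synset pairs (0 if none)."""
    scores = [s1.path_similarity(s2) or 0.0
              for s1 in wn.synsets(w1) for s2 in wn.synsets(w2)]
    return max(scores, default=0.0)

def combined_similarity(w1, w2, vec, alpha=0.5):
    """Weighted mix of embedding cosine and WordNet path similarity;
    alpha is a hypothetical interpolation weight."""
    return alpha * cosine(vec[w1], vec[w2]) + (1 - alpha) * path_sim(w1, w2)

# Toy embedding table standing in for vectors from a real model.
vec = {"car": np.array([0.9, 0.1]), "automobile": np.array([0.85, 0.15])}
print(combined_similarity("car", "automobile", vec))
```

The actual model additionally learns the DefVec, FormVec, and RelVec sub-vectors from WordNet content before concatenating them with a pre-trained embedding; the sketch only shows the final fusion step with path information.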


Author(s):  
Abhijit Adhikari ◽  
Biswanath Dutta ◽  
Animesh Dutta ◽  
Deepjyoti Mondal


2021
pp. 192-203
Author(s):  
Jorge Martinez-Gil ◽  
Riad Mokadem ◽  
Josef Küng ◽  
Abdelkader Hameurlain
