Embeddings Evaluation Using a Novel Measure of Semantic Similarity

AbstractIn the field of scientometrics, the principal purpose for author co-citation analysis (ACA) is to map knowledge domains by quantifying the relationship between co-cited author pairs. However, traditional ACA has been criticized since its input is insufficiently informative by simply counting authors’ co-citation frequencies. To address this issue, this paper introduces a new method that reconstructs the raw co-citation matrices by regarding document unit counts and keywords of references, named as Document- and Keyword-Based Author Co-Citation Analysis (DKACA). Based on the traditional ACA, DKACA counted co-citation pairs by document units instead of authors from the global network perspective. Moreover, by incorporating the information of keywords from cited papers, DKACA captured their semantic similarity between co-cited papers. In the method validation part, we implemented network visualization and MDS measurement to evaluate the effectiveness of DKACA. Results suggest that the proposed DKACA method not only reveals more insights that are previously unknown but also improves the performance and accuracy of knowledge domain mapping, representing a new basis for further studies.

Download Full-text

RTM-DCU: Referential Translation Machines for Semantic Similarity

10.3115/v1/s14-2085 ◽

2014 ◽

Cited By ~ 1

Author(s):

Ergun Bicici ◽

Andy Way

Keyword(s):

Semantic Similarity

Download Full-text

NTNU: Measuring Semantic Similarity with Sublexical Feature Representations and Soft Cardinality

10.3115/v1/s14-2078 ◽

2014 ◽

Cited By ~ 1

Author(s):

André Lynum ◽

Partha Pakray ◽

Björn Gambäck ◽

Sergio Jimenez

Keyword(s):

Semantic Similarity ◽

Feature Representations

Download Full-text

The Opposition of Surprisal and Semantic Similarity in the Prediction of Language Processing: Evidence from Eye-tracking Data

10.31234/osf.io/zypk9 ◽

2020 ◽

Author(s):

Kun Sun

Keyword(s):

Eye Tracking ◽

Semantic Similarity ◽

Cognitive Processing ◽

Language Processing ◽

Language Comprehension ◽

Word Processing ◽

Reading Time ◽

Computational Models ◽

Tracking Data ◽

Dynamic Approach

Expectations or predictions about upcoming content play an important role during language comprehension and processing. One important aspect of recent studies of language comprehension and processing concerns the estimation of the upcoming words in a sentence or discourse. Many studies have used eye-tracking data to explore computational and cognitive models for contextual word predictions and word processing. Eye-tracking data has previously been widely explored with a view to investigating the factors that influence word prediction. However, these studies are problematic on several levels, including the stimuli, corpora, statistical tools they applied. Although various computational models have been proposed for simulating contextual word predictions, past studies usually preferred to use a single computational model. The disadvantage of this is that it often cannot give an adequate account of cognitive processing in language comprehension. To avoid these problems, this study draws upon a massive natural and coherent discourse as stimuli in collecting the data on reading time. This study trains two state-of-art computational models (surprisal and semantic (dis)similarity from word vectors by linear discriminative learning (LDL)), measuring knowledge of both the syntagmatic and paradigmatic structure of language. We develop a `dynamic approach' to compute semantic (dis)similarity. It is the first time that these two computational models have been merged. Models are evaluated using advanced statistical methods. Meanwhile, in order to test the efficiency of our approach, one recently developed cosine method of computing semantic (dis)similarity based on word vectors data adopted is used to compare with our `dynamic' approach. The two computational and fixed-effect statistical models can be used to cross-verify the findings, thus ensuring that the result is reliable. All results support that surprisal and semantic similarity are opposed in the prediction of the reading time of words although both can make good predictions. Additionally, our `dynamic' approach performs better than the popular cosine method. The findings of this study are therefore of significance with regard to acquiring a better understanding how humans process words in a real-world context and how they make predictions in language cognition and processing.

Download Full-text

Pembobotan Berdasarkan Tingkat Kesamaan Semantik pada Metode Fuzzy Semi-Supervised Co-Clustering untuk Pengelompokkan Dokumen Teks

Jurnal ULTIMATICS ◽

10.31937/ti.v6i2.333 ◽

2014 ◽

Vol 6 (2) ◽

pp. 46-51

Author(s):

Galang Amanda Dwi P. ◽

Gregorius Edwadr ◽

Agus Zainal Arifin

Keyword(s):

Supervised Learning ◽

Semantic Similarity ◽

The Other ◽

Classification Result ◽

Document Similarity ◽

The Matrix ◽

Index Terms ◽

Membership Value ◽

Degree Of Similarity

Nowadays, a large number of information can not be reached by the reader because of the misclassification of text-based documents. The misclassified data can also make the readers obtain the wrong information. The method which is proposed by this paper is aiming to classify the documents into the correct group. Each document will have a membership value in several different classes. The method will be used to find the degree of similarity between the two documents is the semantic similarity. In fact, there is no document that doesn’t have a relationship with the other but their relationship might be close to 0. This method calculates the similarity between two documents by taking into account the level of similarity of words and their synonyms. After all inter-document similarity values obtained, a matrix will be created. The matrix is then used as a semi-supervised factor. The output of this method is the value of the membership of each document, which must be one of the greatest membership value for each document which indicates where the documents are grouped. Classification result computed by the method shows a good value which is 90 %. Index Terms - Fuzzy co-clustering, Heuristic, Semantica Similiarity, Semi-supervised learning.

Download Full-text