Semantic Sentence Similarity: Size does not Always Matter

This article describes a compositional model based on syntactic dependencies which has been designed to build contextualized word vectors, by following linguistic principles related to the concept of selectional preferences. The compositional strategy proposed in the current work has been evaluated on a syntactically controlled and multilingual dataset, and compared with Transformer BERT-like models, such as Sentence BERT, the state-of-the-art in sentence similarity. For this purpose, we created two new test datasets for Portuguese and Spanish on the basis of that defined for the English language, containing expressions with noun-verb-noun transitive constructions. The results we have obtained show that the linguistic-based compositional approach turns out to be competitive with Transformer models.

Download Full-text

Sentence similarity evaluation using Sent2Vec and siamese neural network with parallel structure

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-189593 ◽

2021 ◽

pp. 1-10

Author(s):

Hye-Jeong Song ◽

Tak-Sung Heo ◽

Jong-Dae Kim ◽

Chan-Young Park ◽

Yu-Seop Kim

Keyword(s):

Neural Network ◽

Language Processing ◽

Short Term Memory ◽

Parallel Structure ◽

Short Term ◽

Similarity Estimation ◽

Accurate Judgment ◽

Proposed Model ◽

Sentence Similarity ◽

Long Short Term Memory

Sentence similarity evaluation is a significant task used in machine translation, classification, and information extraction in the field of natural language processing. When two sentences are given, an accurate judgment should be made whether the meaning of the sentences is equivalent even if the words and contexts of the sentences are different. To this end, existing studies have measured the similarity of sentences by focusing on the analysis of words, morphemes, and letters. To measure sentence similarity, this study uses Sent2Vec, a sentence embedding, as well as morpheme word embedding. Vectors representing words are input to the 1-dimension convolutional neural network (1D-CNN) with various sizes of kernels and bidirectional long short-term memory (Bi-LSTM). Self-attention is applied to the features transformed through Bi-LSTM. Subsequently, vectors undergoing 1D-CNN and self-attention are converted through global max pooling and global average pooling to extract specific values, respectively. The vectors generated through the above process are concatenated to the vector generated through Sent2Vec and are represented as a single vector. The vector is input to softmax layer, and finally, the similarity between the two sentences is determined. The proposed model can improve the accuracy by up to 5.42% point compared with the conventional sentence similarity estimation models.

Download Full-text

A New Sentence Similarity Computing Technique Using Order and Semantic Similarity

10.1109/icses52305.2021.9633911 ◽

2021 ◽

Author(s):

Nityam Agarwal ◽

Poorvi Seth ◽

Merin Meleet

Keyword(s):

Semantic Similarity ◽

Computing Technique ◽

Sentence Similarity

Download Full-text

Combining Common Words and Semantic Features for Sentence Similarity

2018 9th International Conference on Computing, Communication and Networking Technologies (ICCCNT) ◽

10.1109/icccnt.2018.8493713 ◽

2018 ◽

Author(s):

M. KrishnaSiva Prasad ◽

Poonam Sharma

Keyword(s):

Semantic Features ◽

Sentence Similarity

Download Full-text

Generating Sentences by Editing Prototypes

Transactions of the Association for Computational Linguistics ◽

10.1162/tacl_a_00030 ◽

2018 ◽

Vol 6 ◽

pp. 437-450 ◽

Cited By ~ 10

Author(s):

Kelvin Guu ◽

Tatsunori B. Hashimoto ◽

Yonatan Oren ◽

Percy Liang

Keyword(s):

Language Model ◽

Language Modeling ◽

Language Models ◽

Training Corpus ◽

Human Evaluation ◽

Sentence Level ◽

Sentence Similarity ◽

Traditional Language ◽

Generative Language

We propose a new generative language model for sentences that first samples a prototype sentence from the training corpus and then edits it into a new sentence. Compared to traditional language models that generate from scratch either left-to-right or by first sampling a latent sentence vector, our prototype-then-edit model improves perplexity on language modeling and generates higher quality outputs according to human evaluation. Furthermore, the model gives rise to a latent edit vector that captures interpretable semantics such as sentence similarity and sentence-level analogies.

Download Full-text

Evaluation of Methods for Sentence Similarity for Use in Intelligent Tutoring System

Advances in Science Technology and Engineering Systems Journal ◽

10.25046/aj030501 ◽

2018 ◽

Vol 3 (5) ◽

pp. 01-05

Author(s):

Emil Brajković ◽

Daniel Vasić ◽

Tomislav Volarić

Keyword(s):

Intelligent Tutoring ◽

Intelligent Tutoring System ◽

Tutoring System ◽

Sentence Similarity ◽

Evaluation Of Methods

Download Full-text

Sentence Similarity Calculating Method Based on Word2Vec and Clustering

DEStech Transactions on Engineering and Technology Research ◽

10.12783/dtetr/amee2019/33485 ◽

2020 ◽

Author(s):

WAN-LI SONG

Keyword(s):

Calculating Method ◽

Sentence Similarity

Download Full-text

APLIKASI PENDETEKSI PLAGIARISME TUGAS DAN MAKALAH PADA SEKOLAH MENGGUNAKAN ALGORITMA RABIN KARP

Jurnal Algoritma, Logika dan Komputasi ◽

10.30813/j-alu.v1i1.1104 ◽

2018 ◽

Vol 1 (1) ◽

Author(s):

Danny Steveson ◽

Halim Agung ◽

Fendra Mulia

Keyword(s):

Search Algorithm ◽

Sentence Similarity ◽

String Search ◽

Frequent Problem ◽

Sørensen Index

Plagiarism is a very frequent problem in all aspects of one occurring in school. There is often plagiarism on the content of the papers or assignments collected by the students. This is to support the decreasing creativity of students in giving ideas and personal opinions on the task given. To answer the above problems then this research using Rabin-Karp algorithm. Rabin-Karp algorithm is a string search algorithm that uses hashing to find one of a series of string patterns in text. Using this application, the user can compare document 1 with another document, which gives results in sentence similarity, then spelled out per word, followed by per hashing and is calculated from the average number of percentages. The test in this research is done by taking samples 50 times and in comparison between percentage with Rabin Karp algorithm and percentage with manual taking. Testing is done by comparing one document with another document. Based on the result of the research, it can be concluded by using Rabin Karp Algorithm, which can be implemented in plagiarism application evidenced by the test using 50 test samples with 43 samples of success of 14.22%.<br />Keywords: document , Rabin Karp Algorithm, Dice Sorensen Index, Plagiarism, sentence, word

Download Full-text

FAST: A fuzzy semantic sentence similarity measure

2013 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE) ◽

10.1109/fuzz-ieee.2013.6622344 ◽

2013 ◽

Cited By ~ 6

Author(s):

David Chandran ◽

Keeley Crockett ◽

David Mclean ◽

Zuhair Bandar

Keyword(s):

Similarity Measure ◽

Sentence Similarity

Download Full-text