Word Embedding Based Document Similarity for the Inferring of Penalty

Author(s):  
Tieke He ◽  
Hao Lian ◽  
Zemin Qin ◽  
Zhipeng Zou ◽  
Bin Luo
2015 ◽  
Author(s):  
Oren Melamud ◽  
Omer Levy ◽  
Ido Dagan

2014 ◽  
Vol 6 (2) ◽  
pp. 46-51
Author(s):  
Galang Amanda Dwi P. ◽  
Gregorius Edwadr ◽  
Agus Zainal Arifin

A large amount of information never reaches readers because text-based documents are misclassified, and misclassified data can also lead readers to the wrong information. The method proposed in this paper aims to classify documents into the correct group. Each document has a membership value in several different classes. The degree of similarity between two documents is measured by their semantic similarity. In fact, no document is entirely unrelated to another, although the relationship may be close to 0. The method calculates the similarity between two documents by taking into account the similarity of their words and of those words' synonyms. Once all inter-document similarity values have been obtained, they are collected into a matrix, which is then used as a semi-supervised factor. The output of the method is the membership value of each document in each class; the greatest membership value for a document indicates the group to which it is assigned. The classification result computed by the method reaches a good value of 90%. Index Terms - Fuzzy co-clustering, Heuristic, Semantic Similarity, Semi-supervised learning.
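The similarity step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the synonym table, the 0.5 weight for synonym matches, and the best-match averaging rule are all assumptions made for illustration, since the abstract does not specify the lexical resource or the exact formula.

```python
# Hypothetical synonym table; the abstract does not name its lexical resource.
SYNONYMS = {
    "car": {"automobile"},
    "automobile": {"car"},
    "quick": {"fast"},
    "fast": {"quick"},
}


def word_similarity(a, b):
    """Return 1.0 for identical words, 0.5 for listed synonyms, else 0.0.

    The 0.5 synonym weight is an assumed value, not taken from the paper.
    """
    if a == b:
        return 1.0
    if b in SYNONYMS.get(a, set()):
        return 0.5
    return 0.0


def document_similarity(doc_a, doc_b):
    """Average each word's best match in the other document (both directions)."""
    words_a, words_b = doc_a.lower().split(), doc_b.lower().split()
    if not words_a or not words_b:
        return 0.0
    best_a = [max(word_similarity(w, v) for v in words_b) for w in words_a]
    best_b = [max(word_similarity(v, w) for w in words_a) for v in words_b]
    return (sum(best_a) + sum(best_b)) / (len(words_a) + len(words_b))


def similarity_matrix(docs):
    """Build the square matrix of pairwise inter-document similarities."""
    return [[document_similarity(a, b) for b in docs] for a in docs]


docs = ["quick car", "fast automobile", "green tree"]
matrix = similarity_matrix(docs)
```

In the abstract's pipeline, a matrix of this kind would then act as the semi-supervised factor for the fuzzy co-clustering step, which is not sketched here.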


Author(s):  
Sheng Zhang ◽  
Qi Luo ◽  
Yukun Feng ◽  
Ke Ding ◽  
Daniela Gifu ◽  
...  

Background: TextRank, a well-known key phrase extraction algorithm, is an analogue of the PageRank algorithm and relies heavily on term-frequency statistics obtained through co-occurrence analysis. Objective: This frequency-based characteristic has become a bottleneck for performance enhancement, and various improved TextRank algorithms have been proposed in recent years. Most of these improvements incorporate semantic information into the key phrase extraction algorithm and achieve better results. Method: In this research, taking both syntactic and semantic information into consideration, we integrated a syntactic tree algorithm with word embedding and put forward the Word Embedding and Syntactic Information Algorithm (WESIA), which improved the accuracy of the TextRank algorithm. Results: Applying our method to a self-made test set and a public test set, the results suggest that the proposed unsupervised key phrase extraction algorithm outperformed the other algorithms to some extent.
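For orientation, the co-occurrence baseline that this abstract builds on can be sketched as follows. This is a plain TextRank-style ranking, not WESIA: it uses no syntactic tree and no word embeddings, and the function name, window size, damping factor, and iteration count are illustrative assumptions.

```python
from collections import defaultdict


def textrank(tokens, window=2, damping=0.85, iterations=50):
    """Score words with PageRank over an undirected co-occurrence graph.

    `window` is the span in tokens within which two words count as
    co-occurring; a value of 2 links adjacent words only.
    """
    neighbors = defaultdict(set)
    for i, word in enumerate(tokens):
        for other in tokens[i + 1 : i + window]:
            if other != word:
                neighbors[word].add(other)
                neighbors[other].add(word)

    # Start every word at the same score, then iterate the PageRank update.
    scores = {w: 1.0 for w in neighbors}
    for _ in range(iterations):
        scores = {
            w: (1 - damping)
            + damping * sum(scores[n] / len(neighbors[n]) for n in neighbors[w])
            for w in neighbors
        }
    return scores


tokens = "a b a c a d".split()
scores = textrank(tokens)
top_word = max(scores, key=scores.get)
```

Because the graph is built purely from co-occurrence counts, a word's score depends only on how often and with what it appears, which is the frequency-based limitation the abstract refers to.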


Author(s):  
Abhijeet SANDEEP Bhardwaj ◽  
Akash Deep ◽  
Dharmaraj Veeramani ◽  
Shiyu Zhou
