Document Similarity Search Based on Manifold-Ranking of TextTiles

Author(s):  
Xiaojun Wan ◽  
Jianwu Yang ◽  
Jianguo Xiao
2008 ◽  
Vol 44 (3) ◽  
pp. 1032-1048 ◽  

2012 ◽  
Vol 601 ◽  
pp. 394-400
Author(s):  
Taeh Wan Kim ◽  
Ho Cheol Jeon ◽  
Joong Min Choi

Document similarity search retrieves a ranked list of documents similar to a query document from a text corpus or from the web. Most previous research on searching for similar documents, however, has focused on classifying documents by their content. To address this, we propose a novel retrieval approach that represents each document in the corpus as an undirected graph. The study also considers a unified graph in conjunction with multiple graphs to improve the quality of the search. Experimental results on the Reuters-21578 data set show that the proposed system performs better than the traditional approach.


Author(s):  
Mardi Siswo Utomo ◽  
Edi Winarko

Abstract: Document similarity can serve as a reference when searching for other, similar information, which reduces the time spent searching again for information related to a given document. Document similarity search is usually exposed as a 'related articles' feature. The similarity of documents can be measured with the cosine measure, after the documents to be compared have been preprocessed. Indexing and measurement take a relatively long execution time. A web-based application has only a limited execution time for indexing and similarity measurement, so these steps need their own programming techniques in such an application. The purpose of this research is to design and build software that lets a web-based database management system for Indonesian-language medical journals find other documents similar to the document currently being read. The result of this research is a mechanism that uses JavaScript auto-reload and session cookies to break the indexing and similarity measurement process into several small sections, so that the process can run in a web-based application on a relatively large number of documents. For the Indonesian-language medical journal “Media medika Indonesiana”, the cosine similarity measure achieves a fairly high accuracy of 90%. Keywords: document similarity, cosine measure, web-based application.
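
As an illustration of the cosine measure mentioned in this abstract, the following is a minimal sketch, not the authors' implementation. It assumes plain term-frequency vectors and a simple lowercase alphanumeric tokenizer; the abstract does not state which preprocessing or term weighting was used, and the function names `term_frequencies`, `cosine_similarity`, and `related_articles` are hypothetical.

```python
import math
import re
from collections import Counter


def term_frequencies(text):
    """Tokenize into lowercase alphanumeric terms and count them.

    Assumption: the source does not describe its preprocessing, so this
    tokenizer is only a stand-in.
    """
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as similar to nothing
    return dot / (norm_a * norm_b)


def related_articles(query_text, corpus):
    """Rank corpus documents (a dict of id -> text) by similarity to the query."""
    query_vec = term_frequencies(query_text)
    scored = [
        (cosine_similarity(query_vec, term_frequencies(text)), doc_id)
        for doc_id, text in corpus.items()
    ]
    return sorted(scored, reverse=True)
```

A call such as `related_articles(current_text, corpus)` returns `(score, id)` pairs with the most similar documents first. In a setting with a limited execution time, the corpus could be scored in smaller slices across several requests, which is the kind of splitting the abstract describes.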


2018 ◽  
Vol 164 ◽  
pp. 01048
Author(s):  
Yanuar Nurdiansyah ◽  
Fiqih Nur Muharrom ◽  
Firdaus

Plagiarism occurs when students have assignments and are pressed by a deadline; it is seen as the fastest way to finish the tasks. For this reason, the authors built a plagiarism detection system that uses the Winnowing algorithm as its document similarity search algorithm. The documents tested are Indonesian journals in .doc, .docx, and/or .txt format. The similarity calculation has two stages: first, a document fingerprint is created with the Winnowing algorithm; second, similarity is computed with the Jaccard coefficient. The system was developed with an iterative waterfall model. The main objective of this project is to determine the level of plagiarism. By displaying the percentage of similarity for a journal article, the system is expected to help prevent intentional or unintentional plagiarism before the article is published.
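
The two stages described in this abstract can be sketched as follows. This is an illustrative outline under stated assumptions, not the authors' system: the k-gram length `k`, the window size `w`, the use of CRC32 as the hash, and the text normalization are all assumptions, since the abstract does not give them. The function names are hypothetical.

```python
import zlib


def kgram_hashes(text, k):
    """Hash every k-gram of the normalized text.

    Assumption: normalization keeps only letters and digits, lowercased.
    CRC32 is used here only because it is deterministic across runs.
    """
    cleaned = "".join(ch.lower() for ch in text if ch.isalnum())
    return [
        zlib.crc32(cleaned[i:i + k].encode("utf-8"))
        for i in range(len(cleaned) - k + 1)
    ]


def winnow(hashes, w):
    """Stage 1: keep the minimum hash of each window of w consecutive hashes.

    Only hash values are stored (not positions), so tie-breaking between
    equal minima does not affect the resulting set.
    """
    if not hashes:
        return set()
    if len(hashes) < w:
        return {min(hashes)}  # text too short for a full window
    return {min(hashes[i:i + w]) for i in range(len(hashes) - w + 1)}


def fingerprint(text, k=5, w=4):
    """Document fingerprint: the set of winnowed k-gram hashes."""
    return winnow(kgram_hashes(text, k), w)


def jaccard(a, b):
    """Stage 2: Jaccard coefficient |A ∩ B| / |A ∪ B| of two fingerprints."""
    union = a | b
    if not union:
        return 0.0  # two empty fingerprints are treated as dissimilar
    return len(a & b) / len(union)


def similarity_percent(text_a, text_b, k=5, w=4):
    """Similarity of two texts as a percentage."""
    return 100.0 * jaccard(fingerprint(text_a, k, w), fingerprint(text_b, k, w))
```

Calling `similarity_percent(doc_a, doc_b)` gives the kind of percentage the abstract mentions displaying. Larger values of `k` and `w` produce smaller fingerprints and are less sensitive to short coincidental matches.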

