Document Similarity Search Based on Manifold-Ranking of TextTiles

Towards a unified approach to document similarity search using manifold-ranking of blocks

Information Processing & Management ◽

10.1016/j.ipm.2007.07.012 ◽

2008 ◽

Vol 44 (3) ◽

pp. 1032-1048 ◽

Cited By ~ 17

Author(s):

Xiaojun Wan ◽

Jianwu Yang ◽

Jianguo Xiao

Keyword(s):

Similarity Search ◽

Unified Approach ◽

Manifold Ranking ◽

Document Similarity

Applying the branch and bound technique to document similarity search

2001 IEEE Pacific Rim Conference on Communications, Computers and Signal Processing (IEEE Cat. No.01CH37233) ◽

10.1109/pacrim.2001.953590 ◽

2002 ◽

Author(s):

K. Furuse ◽

T. Miura ◽

M. Ishikawa ◽

H. Chen ◽

N. Ohbo

Keyword(s):

Branch And Bound ◽

Similarity Search ◽

Document Similarity ◽

Branch And Bound Technique

Comparing Two Models of Document Similarity Search over a Text Stream of Articles from Online News Sites

Advances in Intelligent Systems and Computing - Intelligent Computing and Optimization ◽

10.1007/978-3-030-33585-4_38 ◽

2019 ◽

pp. 379-388

Author(s):

Tham Vo Thi Hong ◽

Phuc Do

Keyword(s):

Similarity Search ◽

Online News ◽

Document Similarity ◽

Online News Sites

A New Document Representation Using a Unified Graph to Document Similarity Search

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.601.394 ◽

2012 ◽

Vol 601 ◽

pp. 394-400

Author(s):

Taeh Wan Kim ◽

Ho Cheol Jeon ◽

Joong Min Choi

Keyword(s):

Similarity Search ◽

Traditional Approach ◽

Experimental Results ◽

Document Representation ◽

Web Page ◽

Undirected Graphs ◽

Document Similarity ◽

Ranked List ◽

The Web

Document similarity search retrieves a ranked list of documents similar to a query document, either from a text corpus or, for a web page, from the web. Most previous research on finding similar documents, however, has focused on classifying documents by their content. To address this, we propose a novel retrieval approach that represents each document in the corpus as an undirected graph. The study also considers a unified graph in conjunction with multiple graphs to improve the quality of similar-document search. Experimental results on the Reuters-21578 data set show that the proposed system performs better than the traditional approach.

Accounting for Language Changes Over Time in Document Similarity Search

ACM Transactions on Information Systems ◽

10.1145/2934671 ◽

2016 ◽

Vol 35 (1) ◽

pp. 1-26 ◽

Cited By ~ 1

Author(s):

Sara Morsy ◽

George Karypis

Keyword(s):

Similarity Search ◽

Document Similarity ◽

Changes Over Time ◽

Over Time

A New Retrieval Model Based on TextTiling for Document Similarity Search

Journal of Computer Science and Technology ◽

10.1007/s11390-005-0552-9 ◽

2005 ◽

Vol 20 (4) ◽

pp. 552-558 ◽

Cited By ~ 8

Author(s):

Xiao-Jun Wan ◽

Yu-Xin Peng

Keyword(s):

Similarity Search ◽

Retrieval Model ◽

Document Similarity ◽

Model Based

Document Similarity Search Based on Generic Summaries

Information Retrieval Technology - Lecture Notes in Computer Science ◽

10.1007/11562382_60 ◽

2005 ◽

pp. 635-640

Author(s):

Xiaojun Wan ◽

Jianwu Yang

Keyword(s):

Similarity Search ◽

Document Similarity

Design And Implementation of Document Similarity Search System For WEB-Based Medical Journal Management

IJCCS (Indonesian Journal of Computing and Cybernetics Systems) ◽

10.22146/ijccs.2000 ◽

2013 ◽

Vol 5 (2) ◽

Author(s):

Mardi Siswo Utomo ◽

Edi Winarko

Keyword(s):

Medical Journal ◽

Similarity Measure ◽

Execution Time ◽

Similarity Search ◽

Similarity Index ◽

Database Management System ◽

Document Similarity ◽

Web Based ◽

Cosine Measure ◽

Other Information

Abstract: Document similarity can serve as a reference when searching for other, similar information, reducing the time needed to find information related to a given document. Document similarity search is usually implemented as a 'related articles' feature. The similarity of documents can be measured with the cosine measure, after the documents to be compared have been preprocessed. Indexing and measurement take a relatively long execution time. A web-based application has only a limited execution time in which to index documents and measure their similarity, so indexing and similarity measurement in a web-based application need their own programming techniques. The purpose of this research is to design and build software that lets a web-based database management system for Indonesian-language medical journals find other documents similar to the document currently being read. The result of this research is a mechanism based on JavaScript auto-reload and session cookies that breaks the indexing and similarity-measurement process into several small sections, so that the process can run in a web-based application on a relatively large number of documents. For the Indonesian-language medical journal “Media medika Indonesiana”, the cosine similarity measure achieves a fairly high accuracy of 90%. Keywords: document similarity, cosine measure, web-based application.
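The cosine measure mentioned in this abstract can be illustrated with a minimal Python sketch. It assumes raw term-frequency vectors and a simple lowercase alphanumeric tokenizer; the abstract does not specify the preprocessing or term weighting actually used, so the names `tokenize` and `cosine_similarity` and all of these choices are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens (assumed preprocessing)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(text_a, text_b):
    """Cosine of the angle between the term-frequency vectors of two texts."""
    vec_a = Counter(tokenize(text_a))
    vec_b = Counter(tokenize(text_b))
    # Only terms present in both documents contribute to the dot product.
    dot = sum(vec_a[t] * vec_b[t] for t in vec_a.keys() & vec_b.keys())
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # An empty document is treated as similar to nothing.
    return dot / (norm_a * norm_b)
```

A 'related articles' feature could rank candidate documents by this score against the document being read; a production system would more likely use a weighted scheme such as TF-IDF and a precomputed index.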

Block-Based Similarity Search on the Web Using Manifold-Ranking

Web Information Systems – WISE 2006 - Lecture Notes in Computer Science ◽

10.1007/11912873_9 ◽

2006 ◽

pp. 60-71 ◽

Cited By ~ 1

Author(s):

Xiaojun Wan ◽

Jianwu Yang ◽

Jianguo Xiao

Keyword(s):

Similarity Search ◽

Manifold Ranking ◽

Block Based ◽

The Web

Implementation of Winnowing Algorithm Based K-Gram to Identify Plagiarism on File Text-Based Document

MATEC Web of Conferences ◽

10.1051/matecconf/201816401048 ◽

2018 ◽

Vol 164 ◽

pp. 01048

Author(s):

Yanuar Nurdiansyah ◽

Fiqih Nur Muharrom ◽

Firdaus

Keyword(s):

Similarity Search ◽

Search Algorithm ◽

Detection System ◽

Jaccard Coefficient ◽

Plagiarism Detection ◽

Document Similarity ◽

Calculation Process ◽

Waterfall Model ◽

Two Stages ◽

Model Approach

Plagiarism often occurs when students have assignments and are pressed by deadlines, because it is seen as the fastest way to complete the tasks. This motivated the author to build a plagiarism detection system that uses the Winnowing algorithm as its document similarity search algorithm. The documents tested are Indonesian journals with the extensions .doc, .docx, and/or .txt. The similarity calculation proceeds in two stages: first, a document fingerprint is created using the Winnowing algorithm; second, similarity is computed with the Jaccard coefficient. The system was developed using an iterative waterfall model. The main objective of the project is to determine the level of plagiarism. By displaying the percentage of similarity for a journal article, the system is expected to help prevent intentional or unintentional plagiarism before publication.
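The two stages described in this abstract (k-gram fingerprinting with Winnowing, then the Jaccard coefficient) can be sketched in Python as follows. This is a simplified illustration, not the authors' implementation: the values of `k` and `window`, the CRC32 hash, and the text normalization are all assumptions, and the sketch keeps only the set of selected hash values rather than the hash positions that full Winnowing also records.

```python
import re
import zlib


def kgram_hashes(text, k):
    """Hash every k-gram of the normalized text (lowercase, alphanumerics only)."""
    cleaned = re.sub(r"[^a-z0-9]", "", text.lower())
    return [zlib.crc32(cleaned[i:i + k].encode("utf-8"))
            for i in range(len(cleaned) - k + 1)]


def winnow(hashes, window):
    """Stage 1: keep the minimum hash of each sliding window as the fingerprint."""
    if not hashes:
        return set()
    if len(hashes) < window:
        return {min(hashes)}  # Text too short for a full window.
    return {min(hashes[i:i + window])
            for i in range(len(hashes) - window + 1)}


def jaccard(set_a, set_b):
    """Stage 2: Jaccard coefficient |A ∩ B| / |A ∪ B| of two fingerprints."""
    if not set_a and not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


def similarity_percent(text_a, text_b, k=5, window=4):
    """Percentage of similarity between two texts (k and window are assumed values)."""
    fp_a = winnow(kgram_hashes(text_a, k), window)
    fp_b = winnow(kgram_hashes(text_b, k), window)
    return 100.0 * jaccard(fp_a, fp_b)
```

Calling `similarity_percent(doc_a, doc_b)` on the extracted text of two documents yields a percentage of the kind the system displays; extracting text from .doc or .docx files is outside the scope of this sketch.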