Rabin Karp And Winnowing Algorithm For Statistics Of Text Document Plagiarism Detection

2017 ◽

Vol 5 (2) ◽

pp. 462 ◽

Cited By ~ 3

Author(s):

Brinardi Leonardo ◽

Seng Hansun

Keyword(s):

Detection System ◽

String Matching ◽

Experimental Results ◽

Plagiarism Detection ◽

Text Documents ◽

Matching Algorithm ◽

Text Document ◽

Different Types ◽

The University

Plagiarism is regarded by the university as fraud: taking someone else's ideas or writing without citing the source and claiming them as one's own. Plagiarism detection systems generally apply a string matching algorithm to text documents to search for words that the documents have in common. Several algorithms are used for string matching, two of which are the Rabin-Karp and Jaro-Winkler Distance algorithms. The Rabin-Karp algorithm is well suited to matching multiple string patterns, while the Jaro-Winkler Distance algorithm has an advantage in running time. A plagiarism detection application was developed and tested on different document types: doc, docx, pdf and txt. The experimental results show that both algorithms can be used to detect plagiarism in these documents, but the Rabin-Karp algorithm is much more effective and faster when the document is larger than 1000 KB.
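To illustrate why Rabin-Karp suits multiple string patterns, the following is a minimal sketch of a rolling-hash search, not the implementation used in the paper. The function name `rabin_karp_multi`, the `base` and `mod` values, and the restriction that all patterns share one length are assumptions made for this example.

```python
def rabin_karp_multi(text, patterns, base=256, mod=1_000_000_007):
    """Return (index, pattern) pairs for every occurrence of any pattern.

    Assumption of this sketch: all patterns have the same length m.
    """
    if not patterns:
        return []
    m = len(next(iter(patterns)))
    if m == 0 or any(len(p) != m for p in patterns):
        raise ValueError("patterns must be non-empty and of equal length")
    if len(text) < m:
        return []

    def full_hash(s):
        h = 0
        for ch in s:
            h = (h * base + ord(ch)) % mod
        return h

    # Map each pattern hash to the patterns that produce it.
    targets = {}
    for p in set(patterns):
        targets.setdefault(full_hash(p), []).append(p)

    high = pow(base, m - 1, mod)  # weight of the character leaving the window
    h = full_hash(text[:m])
    hits = []
    for i in range(len(text) - m + 1):
        if h in targets:
            window = text[i:i + m]
            # Compare the actual strings to rule out hash collisions.
            for p in targets[h]:
                if p == window:
                    hits.append((i, p))
        if i + m < len(text):
            # Roll the hash: drop text[i], append text[i + m].
            h = ((h - ord(text[i]) * high) * base + ord(text[i + m])) % mod
    return hits
```

Because every window's hash is checked against a table of pattern hashes, adding more patterns of the same length does not add another pass over the text.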

PERANCANGAN DAN PENERAPAN ALGORITMA RIZKI TANJUNG 24 (RTG24) UNTUK KOMPARASI KATA PADA FILE TEXT

Compiler ◽

10.28989/compiler.v3i1.68 ◽

2014 ◽

Vol 3 (1) ◽

Author(s):

Rizki Tanjung ◽

Haruno Sajati ◽

Dwi Nugraheny

Keyword(s):

String Matching ◽

Plagiarism Detection ◽

Text Documents ◽

Text Document ◽

Basic Word ◽

Root Word

Plagiarism is the act of taking another person's essay or work and presenting it as one's own. Plagiarism of text is very common and difficult to avoid, so many systems have been created to assist in detecting plagiarism in text documents. At its core, detecting plagiarism in text documents means performing string matching. This led to the idea of building an algorithm to be implemented in the RTG24 Comparison file.txt application. A document to be compared must be a .txt (plain text) file, and every word in the document must appear in the Indonesian dictionary. The RTG24 algorithm works by determining the number of identical or similar words in the text of two documents. The algorithm has several stages: parsing, filtering, stemming and comparison. Parsing breaks every sentence in the document down into words, and filtering removes unimportant particles. Stemming then reduces every word to its basic word, or root word, which simplifies the comparison between the two documents. After parsing, filtering and stemming, each document is stored in an array for comparison, from which the percentage of similarity between the two documents is determined.
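The four stages described above can be sketched roughly as follows. This is not the RTG24 algorithm itself, whose details the abstract does not give: the stopword list, the suffix rules, and the use of a shared-stems-over-all-stems percentage are all placeholder assumptions, and the toy `stem` function is not a real Indonesian stemmer.

```python
import re

STOPWORDS = {"yang", "dan", "di", "ke", "dari", "ini", "itu"}  # toy list (assumed)
SUFFIXES = ("nya", "kan", "an")  # toy suffix rules (assumed), not a real stemmer

def parse(text):
    """Parsing: break the text into lowercase words."""
    return re.findall(r"[a-z]+", text.lower())

def filter_words(words):
    """Filtering: drop unimportant particles."""
    return [w for w in words if w not in STOPWORDS]

def stem(word):
    """Stemming: strip one known suffix if at least 4 letters remain."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 4:
            return word[:-len(suffix)]
    return word

def similarity_percent(doc_a, doc_b):
    """Comparison: shared distinct stems as a percentage of all distinct stems."""
    a = {stem(w) for w in filter_words(parse(doc_a))}
    b = {stem(w) for w in filter_words(parse(doc_b))}
    if not a and not b:
        return 0.0
    return 100.0 * len(a & b) / len(a | b)
```

For example, `similarity_percent("bukunya dan makanan", "buku makan minum")` compares the stem sets `{"buku", "makan"}` and `{"buku", "makan", "minum"}`, which share two of three distinct stems.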

Aplikasi Pendeteksi Tingkat Kesamaan Dokumen Teks: Algoritma Rabin Karp Vs. Winnowing

Digital Zone Jurnal Teknologi Informasi dan Komunikasi ◽

10.31849/digitalzone.v9i1.1242 ◽

2018 ◽

Vol 9 (1) ◽

pp. 82-93

Author(s):

Sugiono Sugiono ◽

Herwin Herwin ◽

Hamdani Hamdani ◽

Erlin Erlin

Keyword(s):

Word Processing ◽

Processing Time ◽

Code Of Conduct ◽

Scientific Writing ◽

Plagiarism Detection ◽

Text Similarity ◽

Text Documents ◽

Processing Application ◽

Text Document ◽

Copy And Paste

Copying and pasting text often occurs in scientific writing without giving credit to the owner of the text document. This violation of the code of conduct is made easy by the copy and paste facilities available in word processing applications. The purpose of this study is to build an application that can detect the degree of similarity between text documents by first comparing the reliability of two text similarity algorithms, Rabin-Karp and Winnowing. The comparison is based on two variables: detection capability and processing time. The results show that the Winnowing algorithm outperforms Rabin-Karp in terms of both accuracy and processing time. Keywords: Rabin-Karp, Winnowing, Plagiarism Detection, Text Similarity

Plagiarism Detection and Avoidance Consequences in Academic World

Journal of Advanced Research in Library and Information Science ◽

10.24321/2395.2288.201706 ◽

2017 ◽

Vol 04 (04) ◽

pp. 6-13

Author(s):

Akhandanand Shukla ◽

Keyword(s):

Plagiarism Detection ◽

Academic World

Analysis of Stylometric Features and Segmentation Strategies in Intrinsic Plagiarism Detection System

Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi) ◽

10.29207/resti.v4i5.2486 ◽

2020 ◽

Vol 4 (5) ◽

pp. 988-997

Author(s):

Sylvia Putri Gunawan ◽

Lucia Dwi Krisnawati ◽

Antonius Rachmat Chrismanto

Keyword(s):

Detection System ◽

Plagiarism Detection ◽

Development System ◽

Intrinsic Plagiarism Detection

Two different paradigms in the field of plagiarism detection have resulted in External Plagiarism Detection (EPD) and Intrinsic Plagiarism Detection (IPD) systems. The most commonly applied system is EPD, which requires its algorithm to make a heuristic comparison between a suspicious document and the documents in a corpus. In contrast, given only a suspicious document, an IPD algorithm should be able to find the plagiarized sections by looking for text segments with different writing styles. Previous research on Indonesian texts has fallen only within the development of EPD systems. Therefore, this research focuses on experimenting with and analyzing stylometric features and segmentation strategies to build an IPD system for Indonesian texts. The experimental results show that the paragraph segment performs better, scoring 0.92 for macro-averaged accuracy and 0.54 for macro-averaged F1. The stylometric features achieving the highest F1 and accuracy scores are the frequency of punctuation, the average paragraph length, and the type-token ratio.
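As a rough illustration of the three best-performing features, the sketch below computes them for one text segment. The exact definitions used in the paper are not stated in the abstract, so the ones here are assumptions: punctuation frequency is punctuation characters per character, paragraph length is counted in words with paragraphs separated by blank lines, and the function name `stylometric_features` is hypothetical.

```python
import re
import string

def stylometric_features(segment):
    """Compute three simple style features for a text segment (assumed definitions)."""
    tokens = re.findall(r"\w+", segment.lower())
    punctuation = sum(1 for ch in segment if ch in string.punctuation)
    # Paragraphs are assumed to be separated by one or more blank lines.
    paragraphs = [p for p in re.split(r"\n\s*\n", segment) if p.strip()]
    words_per_paragraph = [len(re.findall(r"\w+", p)) for p in paragraphs]
    return {
        # Punctuation characters divided by all characters.
        "punctuation_freq": punctuation / len(segment) if segment else 0.0,
        # Mean number of words per paragraph.
        "avg_paragraph_len": (
            sum(words_per_paragraph) / len(paragraphs) if paragraphs else 0.0
        ),
        # Distinct words (types) divided by total words (tokens).
        "type_token_ratio": len(set(tokens)) / len(tokens) if tokens else 0.0,
    }
```

An intrinsic detector would compute such a vector for each segment and flag segments whose values deviate from the rest of the document.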

Comparision of Different Distance Measure Methods in Text Document Clustering

INTERNATIONAL JOURNAL OF RESEARCH AND ENGINEERING ◽

10.21276/ijre.2018.5.7.2 ◽

2018 ◽

Vol 5 (7) ◽

Author(s):

Yin Min Tun ◽

Keyword(s):

Distance Measure ◽

Document Clustering ◽

Text Document ◽

Measure Methods

An Improved B-hill Climbing Optimization Technique for Solving the Text Documents Clustering Problem

Current Medical Imaging Formerly Current Medical Imaging Reviews ◽

10.2174/1573405614666180903112541 ◽

2020 ◽

Vol 16 (4) ◽

pp. 296-306 ◽

Cited By ~ 3

Author(s):

Laith Mohammad Abualigah ◽

Essam Said Hanandeh ◽

Ahamad Tajudin Khader ◽

Mohammed Abdallh Otair ◽

Shishir Kumar Shandilya

Keyword(s):

Optimization Technique ◽

Document Clustering ◽

Text Clustering ◽

Hill Climbing ◽

Text Documents ◽

Clustering Problem ◽

Text Document ◽

Text Information ◽

Amount Of Knowledge ◽

The Hill

Background: Given the increasing volume of text document information on Internet pages, dealing with such a tremendous amount of knowledge becomes highly complex. Text clustering is a common optimization problem in which a large amount of text information is organized into a set of comparable and coherent clusters. Aims: This paper presents a novel local clustering technique, namely β-hill climbing, to solve the text document clustering problem by modeling the β-hill climbing technique to partition similar documents into the same cluster. Methods: The β parameter is the primary innovation in the β-hill climbing technique. It was introduced to balance local and global search. Local search methods, such as the k-medoid and k-means techniques, have been applied successfully to the text document clustering problem. Results: Experiments were conducted on eight standard benchmark text datasets with different characteristics taken from the Laboratory of Computational Intelligence (LABIC). The results showed that the proposed β-hill climbing achieved better results than the original hill climbing technique in solving the text clustering problem. Conclusion: Adding the β operator to hill climbing improves text clustering performance.
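A minimal sketch of how a β operator can be added to hill climbing for clustering is given below. It is not the paper's implementation: the objective (sum of squared distances to cluster centroids), the single-reassignment neighbourhood move, and the parameter values are assumptions chosen to keep the example small.

```python
import random

def cost(points, labels, k):
    """Sum of squared distances from each point to its cluster centroid."""
    total = 0.0
    for c in range(k):
        members = [p for p, label in zip(points, labels) if label == c]
        if not members:
            continue
        dim = len(members[0])
        centroid = [sum(p[d] for p in members) / len(members) for d in range(dim)]
        total += sum((p[d] - centroid[d]) ** 2 for p in members for d in range(dim))
    return total

def beta_hill_climbing(points, k, beta=0.05, iterations=2000, seed=0):
    rng = random.Random(seed)
    current = [rng.randrange(k) for _ in points]
    current_cost = cost(points, current, k)
    for _ in range(iterations):
        candidate = current[:]
        # Neighbourhood move (local search): reassign one random document.
        candidate[rng.randrange(len(points))] = rng.randrange(k)
        # Beta operator (global search): with probability beta per document,
        # reassign it to a random cluster.
        for i in range(len(candidate)):
            if rng.random() < beta:
                candidate[i] = rng.randrange(k)
        candidate_cost = cost(points, candidate, k)
        # Accept only candidates that are no worse than the current solution.
        if candidate_cost <= current_cost:
            current, current_cost = candidate, candidate_cost
    return current, current_cost
```

With `beta=0` the loop reduces to plain hill climbing; larger values of `beta` trade local refinement for wider exploration.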

Revisiting the Challenges and Opportunities in Software Plagiarism Detection

2020 IEEE 27th International Conference on Software Analysis, Evolution and Reengineering (SANER) ◽

10.1109/saner48275.2020.9054847 ◽

2020 ◽

Author(s):

Xi Xu ◽

Ming Fan ◽

Ang Jia ◽

Yin Wang ◽

Zheng Yan ◽

...

Keyword(s):

Plagiarism Detection ◽

Challenges And Opportunities

Online Judging Platform Utilizing Dynamic Plagiarism Detection Facilities

Computers ◽

10.3390/computers10040047 ◽

2021 ◽

Vol 10 (4) ◽

pp. 47

Author(s):

Fariha Iffath ◽

A. S. M. Kayes ◽

Md. Tahsin Rahman ◽

Jannatul Ferdows ◽

Mohammad Shamsul Arefin ◽

...

Keyword(s):

Source Code ◽

Large Data ◽

Large Data Sets ◽

Detection Technique ◽

Data Sets ◽

Plagiarism Detection ◽

Source Codes ◽

Efficient Detection ◽

Mathematical Problems ◽

Automatic Scoring

A programming contest generally involves the host presenting a set of logical and mathematical problems to the contestants. The contestants are required to write computer programs that are capable of solving these problems. An online judge system automates the judging of the programs submitted by the users and is designed to evaluate the submitted source code reliably. Traditional online judging platforms are not ideally suited to programming labs, as they do not support partial scoring or efficient detection of plagiarized code. With this in mind, in this paper we present an online judging framework that is capable of scoring code automatically by efficiently detecting plagiarized content and the level of accuracy of the code. Our system detects plagiarism by extracting fingerprints of programs and comparing the fingerprints instead of the whole files. We used winnowing to select fingerprints from among the k-gram hash values of a source code file, which were generated by the Rabin–Karp algorithm. The proposed system is compared with existing online judging platforms to show its superiority in terms of time efficiency, correctness, and feature availability. In addition, we evaluated our system on large data sets and compared its run time with that of MOSS, a widely used plagiarism detection tool.
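The fingerprinting step can be sketched as follows: hash every k-gram with a Rabin-Karp style rolling hash, then keep the minimum hash of each window of w consecutive hashes, taking the rightmost minimum on ties. This is an illustrative sketch rather than the system's code. The function names, the `base` and `mod` constants, and the tie-breaking rule are assumptions of this example.

```python
def kgram_hashes(text, k, base=257, mod=(1 << 61) - 1):
    """Rolling hashes of all k-grams in text, in order of position."""
    if k <= 0 or len(text) < k:
        return []
    high = pow(base, k - 1, mod)  # weight of the character leaving the k-gram
    h = 0
    for ch in text[:k]:
        h = (h * base + ord(ch)) % mod
    hashes = [h]
    for i in range(k, len(text)):
        # Drop text[i - k], append text[i].
        h = ((h - ord(text[i - k]) * high) * base + ord(text[i])) % mod
        hashes.append(h)
    return hashes

def winnow(hashes, w):
    """Select (hash, position) fingerprints: the rightmost minimum per window."""
    picked = []
    last_pos = -1
    for start in range(len(hashes) - w + 1):
        window = hashes[start:start + w]
        smallest = min(window)
        # Rightmost occurrence of the minimum inside this window.
        pos = start + (w - 1 - window[::-1].index(smallest))
        if pos != last_pos:  # record each selected position only once
            picked.append((smallest, pos))
            last_pos = pos
    return picked

def fingerprint_similarity(text_a, text_b, k=5, w=4):
    """Jaccard similarity of the two documents' fingerprint hash sets."""
    fa = {h for h, _ in winnow(kgram_hashes(text_a, k), w)}
    fb = {h for h, _ in winnow(kgram_hashes(text_b, k), w)}
    if not fa and not fb:
        return 0.0
    return len(fa & fb) / len(fa | fb)
```

Comparing the selected fingerprints rather than every k-gram hash keeps the stored representation of each submission small, which is the reason the abstract gives for using fingerprints instead of whole files.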

Application of Java Relationship Graphs (JRG) to plagiarism detection in Java Projects: A Neo4j Graph Database Approach

2021 The 4th International Conference on Software Engineering and Information Management ◽

10.1145/3451471.3451479 ◽

2021 ◽

Author(s):

Ritu Arora ◽

Arun Motilal Maurya ◽

Yashvardhan Sharma

Keyword(s):

Graph Database ◽

Plagiarism Detection

Text Documents Plagiarism Detection using Rabin-Karp and Jaro-Winkler Distance Algorithms