Text Documents Plagiarism Detection using Rabin-Karp and Jaro-Winkler Distance Algorithms

Author(s):  
Brinardi Leonardo ◽  
Seng Hansun

Plagiarism is an act regarded by universities as fraud: taking someone else's ideas or writings without citing the source and claiming them as one's own. Plagiarism detection systems generally implement a string-matching algorithm on text documents to search for common words between documents. Several algorithms are used for string matching; two of them are the Rabin-Karp and Jaro-Winkler Distance algorithms. The Rabin-Karp algorithm is well suited to the problem of matching multiple string patterns, while the Jaro-Winkler Distance algorithm has advantages in terms of running time. A plagiarism detection application was developed and tested on different types of documents, i.e. doc, docx, pdf, and txt. The experimental results show that both algorithms can be used to detect plagiarism in those documents, but in terms of effectiveness, the Rabin-Karp algorithm is considerably more effective and faster when detecting documents larger than 1000 KB.
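The abstract does not include code; a minimal sketch of the Rabin-Karp idea it names, using a rolling hash so each window of the text is hashed in constant time, might look like this (the base and modulus are illustrative choices):

```python
def rabin_karp(text: str, pattern: str, base: int = 256, mod: int = 1_000_003) -> list[int]:
    """Return the start indices of all occurrences of pattern in text."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    # Precompute base^(m-1) mod mod, used to drop the window's leading character.
    high = pow(base, m - 1, mod)
    p_hash = t_hash = 0
    for i in range(m):
        p_hash = (p_hash * base + ord(pattern[i])) % mod
        t_hash = (t_hash * base + ord(text[i])) % mod
    matches = []
    for i in range(n - m + 1):
        # Hash collisions are possible, so verify each candidate match.
        if t_hash == p_hash and text[i:i + m] == pattern:
            matches.append(i)
        if i < n - m:
            # Slide the window: remove text[i], append text[i + m].
            t_hash = ((t_hash - ord(text[i]) * high) * base + ord(text[i + m])) % mod
    return matches
```

Because only the hash is updated per position, the same scan extends naturally to multiple patterns by keeping a set of pattern hashes, which is the multiple-pattern strength the abstract attributes to Rabin-Karp.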

Compiler ◽  
2014 ◽  
Vol 3 (1) ◽  
Author(s):  
Rizki Tanjung ◽  
Haruno Sajati ◽  
Dwi Nugraheny

Plagiarism is the act of taking the essay or work of others and claiming it as one's own. Plagiarism of text is very common and difficult to avoid; therefore, many systems have been created to assist in detecting plagiarism in text documents. At its core, detecting plagiarism in text documents means performing string matching. This led to the idea of building an algorithm to be implemented in the RTG24 file.txt comparison application. The documents to be compared must be .txt (plain text) files, and every word in the documents must appear in the Indonesian dictionary. The RTG24 algorithm works by counting the number of identical or similar words between the two documents. It has several stages: parsing, filtering, stemming, and comparison. Parsing breaks every sentence in the document into individual words; filtering removes unimportant particles. The next stage, stemming, reduces each word to its base or root word, which simplifies and facilitates the comparison between the two documents. After parsing, filtering, and stemming, the documents are loaded into arrays for comparison, from which the percentage of similarity between the two documents can be determined.
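A hedged sketch of the four stages described above. The stopword list and suffix-stripping rules here are simplified English stand-ins, not RTG24's actual rules (real Indonesian stemming against a dictionary is considerably more involved):

```python
# Illustrative pipeline: parsing, filtering, stemming, comparison.
STOPWORDS = {"the", "a", "is", "of", "and"}   # filtering: unimportant particles
SUFFIXES = ("ing", "ed", "s")                 # stemming: naive suffix stripping

def parse(text: str) -> list[str]:
    """Parsing: break sentences into lowercase words."""
    return [w.strip(".,;:!?").lower() for w in text.split()]

def filter_words(words: list[str]) -> list[str]:
    """Filtering: drop unimportant particles."""
    return [w for w in words if w and w not in STOPWORDS]

def stem(word: str) -> str:
    """Stemming: reduce a word to a crude root form."""
    for suf in SUFFIXES:
        if word.endswith(suf) and len(word) > len(suf) + 2:
            return word[: -len(suf)]
    return word

def similarity(doc_a: str, doc_b: str) -> float:
    """Comparison: percentage of shared stemmed words."""
    a = {stem(w) for w in filter_words(parse(doc_a))}
    b = {stem(w) for w in filter_words(parse(doc_b))}
    if not a and not b:
        return 0.0
    return 100.0 * len(a & b) / len(a | b)
```

The comparison stage here uses set overlap over the stemmed vocabularies, which is one straightforward way to turn "number of same or similar words" into a percentage.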


2020 ◽  
Vol 11 (2) ◽  
pp. 93
Author(s):  
Latius Hermawan ◽  
Maria Bellaniar Ismiati

Abstract. Website-Based Application for Checking Students’ Digital Assignments. Nowadays, technology is not only about computers; it has advanced to smartphones and other devices. At UKMC, technology has certainly helped with everyday work. However, the university has no application for checking plagiarism in students’ digital assignments, even though students sometimes plagiarize from online sources when working on assignments. Assignments can easily be completed by copying and pasting without citing the source, because students tend to think pragmatically when working on assignments. Plagiarism is strictly prohibited in education; therefore, a plagiarism detection application should be created. The application applies a string-matching algorithm to text documents to search for common words between documents. By matching a document against other documents, the application produces an output reporting how similar the text documents are. Testing shows that the application can help lecturers and students reduce the level of plagiarism.

Keywords: Application, Plagiarism, Digital, Assignment
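String matching for similarity, as described in this abstract, is often done with an edit-based measure; a minimal sketch of one such measure, the Jaro-Winkler similarity (also named in the first abstract above), might look like this:

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity between two strings, in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters match if equal and within this window of each other.
    window = max(len1, len2) // 2 - 1
    match1, match2 = [False] * len1, [False] * len2
    matches = 0
    for i, c in enumerate(s1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not match2[j] and s2[j] == c:
                match1[i] = match2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count transpositions among the matched characters.
    k = transpositions = 0
    for i in range(len1):
        if match1[i]:
            while not match2[k]:
                k += 1
            if s1[i] != s2[k]:
                transpositions += 1
            k += 1
    transpositions //= 2
    return (matches / len1 + matches / len2 +
            (matches - transpositions) / matches) / 3

def jaro_winkler(s1: str, s2: str, p: float = 0.1, max_prefix: int = 4) -> float:
    """Jaro-Winkler: boosts the Jaro score for a shared prefix."""
    j = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1, s2):
        if a != b or prefix == max_prefix:
            break
        prefix += 1
    return j + prefix * p * (1 - j)
```

The prefix boost rewards strings that agree at the start, which suits word-level comparison where copied terms often differ only in their endings.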


Proceedings ◽  
2019 ◽  
Vol 23 (1) ◽  
pp. 4 ◽  
Author(s):  
Hadi Ramin ◽  
Easwaran Krishnan ◽  
Carey J. Simonson

Air-to-air energy recovery ventilators (ERVs) can reduce the energy required to condition ventilation air in buildings. Among the different types of ERVs, fixed-bed regenerators (FBRs) have a higher ratio of heat-transfer area to volume. However, there is limited research on FBRs for HVAC applications. This paper presents preliminary experimental and numerical research on FBRs at the University of Saskatchewan. The numerical and experimental results for the effectiveness of the FBR agree within experimental uncertainty bounds, and both are consistent with empirical correlations available in the literature.


2014 ◽  
Vol 2014 ◽  
pp. 1-11 ◽  
Author(s):  
Hedong Xu ◽  
Jing Zheng ◽  
Ziwei Zhuang ◽  
Suohai Fan

The reconstruction of destroyed paper documents has attracted increasing interest in recent years. The topic is relevant to forensics, investigative sciences, and archeology. Previous research on the reconstruction of cross-cut shredded text documents (RCCSTD) is mainly based on likelihood methods and traditional heuristic algorithms. In this paper, a feature-matching algorithm based on character recognition over an established letter database is presented; it reconstructs the shredded document by row clustering, intrarow splicing, and interrow splicing. Row clustering groups fragments with a clustering algorithm according to their clustering vectors. Intrarow splicing, modeled as a travelling salesman problem, is solved by an improved genetic algorithm. Finally, the document is reconstructed by interrow splicing according to line spacing and fragment proximity. Computational experiments suggest that the presented algorithm achieves high precision and efficiency and may be useful for cross-cut shredded text documents of different sizes.
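The paper's pipeline relies on character recognition and a genetic algorithm; a much simpler sketch of the underlying splicing idea, scoring how well two fragments fit by comparing their touching edges, might look like this (the binary pixel representation and the agreement-fraction score are illustrative assumptions, not the paper's method):

```python
def edge_score(left_frag: list[list[int]], right_frag: list[list[int]]) -> float:
    """Fraction of rows where left_frag's right edge matches right_frag's left edge.

    Fragments are binary pixel matrices (1 = ink, 0 = background) of equal
    height; a high score suggests the fragments were horizontally adjacent.
    """
    right_edge = [row[-1] for row in left_frag]
    left_edge = [row[0] for row in right_frag]
    agreeing = sum(a == b for a, b in zip(right_edge, left_edge))
    return agreeing / len(right_edge)

def best_right_neighbor(frag, candidates) -> int:
    """Greedy stand-in for intrarow splicing: index of the best-scoring neighbor."""
    return max(range(len(candidates)), key=lambda i: edge_score(frag, candidates[i]))
```

Treating every such pairwise score as an edge weight is exactly what turns intrarow splicing into the travelling salesman problem the paper solves with a genetic algorithm.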


2017 ◽  
Vol 26 (2) ◽  
pp. 233-241
Author(s):  
Eman Ismail ◽  
Walaa Gad

Abstract. In this paper, we propose a novel approach called Classification Based on Enrichment Representation (CBER) for short text documents. The proposed approach extracts the concepts occurring in short text documents and uses them to calculate the weights of each concept's synonyms. Concepts with the same meaning increase the weights of their synonyms. Because the text documents are short and concepts are rarely repeated, we capture the semantic relationships among concepts and solve the disambiguation problem. The experimental results show that the proposed CBER is valuable in annotating short text documents with their best labels (classes). We used precision and recall measures to evaluate the proposed approach; CBER reached 93% precision and 94% recall.
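The enrichment step is described only at a high level; a hedged sketch of the core idea, where concepts sharing a meaning reinforce one another's weights, might look like this. The synonym dictionary here is a hypothetical stand-in (the real system derives these relationships from a lexical resource):

```python
from collections import Counter

# Hypothetical synonym dictionary standing in for a lexical resource.
SYNONYMS = {
    "car": {"automobile", "vehicle"},
    "automobile": {"car", "vehicle"},
    "film": {"movie"},
    "movie": {"film"},
}

def enrich(short_text: str) -> Counter:
    """Weight each concept by its own count plus the counts of its synonyms.

    Synonymous concepts reinforce one another, compensating for the low
    term frequencies typical of short documents.
    """
    counts = Counter(short_text.lower().split())
    weights = Counter()
    for concept, count in counts.items():
        weights[concept] += count
        for syn in SYNONYMS.get(concept, ()):
            # A synonym's occurrences also raise this concept's weight.
            weights[concept] += counts.get(syn, 0)
    return weights
```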


2015 ◽  
Vol 27 (2) ◽  
pp. 143-156 ◽  
Author(s):  
TANVER ATHAR ◽  
CARL BARTON ◽  
WIDMER BLAND ◽  
JIA GAO ◽  
COSTAS S. ILIOPOULOS ◽  
...  

Circular string matching is a problem which naturally arises in many contexts. It consists of finding all occurrences of the rotations of a pattern of length m in a text of length n. There exist optimal worst- and average-case algorithms for circular string matching. Here, we present a suboptimal average-case algorithm for circular string matching requiring time $\mathcal{O}$(n) and space $\mathcal{O}$(m). The importance of our contribution is underlined by the fact that the proposed algorithm can be easily adapted to deal with circular dictionary matching. In particular, we show how the circular dictionary-matching problem can be solved in average-case time $\mathcal{O}$(n + M) and space $\mathcal{O}$(M), where M is the total length of the dictionary patterns, assuming that the shortest pattern is sufficiently long. Moreover, the presented average-case algorithms and other worst-case approaches were also implemented. Experimental results, using real and synthetic data, demonstrate that the implementation of the presented algorithms can accelerate the computations by more than a factor of two compared to the corresponding implementations of other approaches.
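The paper's algorithm is more sophisticated; a straightforward baseline for the problem (quadratic, not the paper's average-case O(n)) exploits the fact that a length-m string is a rotation of the pattern exactly when it occurs in the pattern concatenated with itself:

```python
def circular_matches(text: str, pattern: str) -> list[int]:
    """Return start indices in text where some rotation of pattern occurs.

    A string w of length m is a rotation of pattern iff w is a substring
    of pattern + pattern. This naive baseline runs in O(n * m) time,
    unlike the average-case O(n) algorithm presented in the paper.
    """
    m = len(pattern)
    doubled = pattern + pattern
    return [i for i in range(len(text) - m + 1)
            if text[i:i + m] in doubled]
```

The doubling trick is also the usual way to reduce circular matching to ordinary substring search when benchmarking faster algorithms against a simple reference.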

