Text Documents Plagiarism Detection using Rabin-Karp and Jaro-Winkler Distance Algorithms

Author(s):  
Brinardi Leonardo ◽  
Seng Hansun

Plagiarism is an act regarded by universities as fraud: taking someone else's ideas or writings without citing the source and claiming them as one's own. Plagiarism detection systems generally implement a string-matching algorithm on text documents to search for common words between documents. Several algorithms are used for string matching; two of them are the Rabin-Karp and Jaro-Winkler Distance algorithms. The Rabin-Karp algorithm is well suited to the problem of matching multiple string patterns, while the Jaro-Winkler Distance algorithm has advantages in terms of running time. A plagiarism detection application was developed and tested on different types of documents, i.e. doc, docx, pdf, and txt. The experimental results show that both algorithms can be used to detect plagiarism in those documents, but in terms of effectiveness, the Rabin-Karp algorithm is considerably more effective and faster when detecting documents larger than 1000 KB.
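The abstract does not include code; a minimal sketch of the Rabin-Karp idea it names, using a rolling hash so each window of the text is hashed in constant time, might look like this (the base and modulus are illustrative choices):

```python
def rabin_karp(text: str, pattern: str, base: int = 256, mod: int = 1_000_003) -> list[int]:
    """Return the start indices of all occurrences of pattern in text."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    # Precompute base^(m-1) mod mod, used to drop the window's leading character.
    high = pow(base, m - 1, mod)
    p_hash = t_hash = 0
    for i in range(m):
        p_hash = (p_hash * base + ord(pattern[i])) % mod
        t_hash = (t_hash * base + ord(text[i])) % mod
    matches = []
    for i in range(n - m + 1):
        # Hash collisions are possible, so verify each candidate match.
        if t_hash == p_hash and text[i:i + m] == pattern:
            matches.append(i)
        if i < n - m:
            # Slide the window: remove text[i], append text[i + m].
            t_hash = ((t_hash - ord(text[i]) * high) * base + ord(text[i + m])) % mod
    return matches
```

Because only the hash is updated per position, the same scan extends naturally to multiple patterns by keeping a set of pattern hashes, which is the multiple-pattern strength the abstract attributes to Rabin-Karp.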

Compiler ◽  
2014 ◽  
Vol 3 (1) ◽  
Author(s):  
Rizki Tanjung ◽  
Haruno Sajati ◽  
Dwi Nugraheny

Plagiarism is the act of taking the essay or work of others and claiming it as one's own. Plagiarism of text is very common and difficult to avoid; therefore, many systems have been created to assist in detecting plagiarism in text documents. At its core, detecting plagiarism in text documents means performing string matching. This led to the idea of building an algorithm to be implemented in the RTG24 file.txt comparison application. The documents to be compared must be .txt (plain text) files, and every word in the documents must appear in the Indonesian dictionary. The RTG24 algorithm works by counting the number of identical or similar words between the two documents. It has several stages: parsing, filtering, stemming, and comparison. Parsing breaks every sentence in the document into individual words; filtering removes unimportant particles. The next stage, stemming, reduces each word to its base or root word, which simplifies and facilitates the comparison between the two documents. After parsing, filtering, and stemming, the documents are loaded into arrays for comparison, from which the percentage of similarity between the two documents can be determined.
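A hedged sketch of the four stages described above. The stopword list and suffix-stripping rules here are simplified English stand-ins, not RTG24's actual rules (real Indonesian stemming against a dictionary is considerably more involved):

```python
# Illustrative pipeline: parsing, filtering, stemming, comparison.
STOPWORDS = {"the", "a", "is", "of", "and"}   # filtering: unimportant particles
SUFFIXES = ("ing", "ed", "s")                 # stemming: naive suffix stripping

def parse(text: str) -> list[str]:
    """Parsing: break sentences into lowercase words."""
    return [w.strip(".,;:!?").lower() for w in text.split()]

def filter_words(words: list[str]) -> list[str]:
    """Filtering: drop unimportant particles."""
    return [w for w in words if w and w not in STOPWORDS]

def stem(word: str) -> str:
    """Stemming: reduce a word to a crude root form."""
    for suf in SUFFIXES:
        if word.endswith(suf) and len(word) > len(suf) + 2:
            return word[: -len(suf)]
    return word

def similarity(doc_a: str, doc_b: str) -> float:
    """Comparison: percentage of shared stemmed words."""
    a = {stem(w) for w in filter_words(parse(doc_a))}
    b = {stem(w) for w in filter_words(parse(doc_b))}
    if not a and not b:
        return 0.0
    return 100.0 * len(a & b) / len(a | b)
```

The comparison stage here uses set overlap over the stemmed vocabularies, which is one straightforward way to turn "number of same or similar words" into a percentage.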


2020 ◽  
Vol 11 (2) ◽  
pp. 93
Author(s):  
Latius Hermawan ◽  
Maria Bellaniar Ismiati

Abstract. Website-Based Application for Checking Students’ Digital Assignments. Nowadays, technology is not only about computers; it has advanced to smartphones and other devices. At UKMC, technology has certainly helped with everyday work. However, the university has no application for checking plagiarism in students’ digital assignments, even though students sometimes plagiarize from online sources when working on assignments. Assignments can easily be completed by copying and pasting without citing the source, because students tend to think pragmatically when working on assignments. Plagiarism is strictly prohibited in education; therefore, a plagiarism detection application should be created. The application applies a string-matching algorithm to text documents to search for common words between documents. By matching a document against other documents, the application produces an output reporting how similar the text documents are. Testing shows that the application can help lecturers and students reduce the level of plagiarism.

Keywords: Application, Plagiarism, Digital, Assignment
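String matching for similarity, as described in this abstract, is often done with an edit-based measure; a minimal sketch of one such measure, the Jaro-Winkler similarity (also named in the first abstract above), might look like this:

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity between two strings, in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters match if equal and within this window of each other.
    window = max(len1, len2) // 2 - 1
    match1, match2 = [False] * len1, [False] * len2
    matches = 0
    for i, c in enumerate(s1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not match2[j] and s2[j] == c:
                match1[i] = match2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count transpositions among the matched characters.
    k = transpositions = 0
    for i in range(len1):
        if match1[i]:
            while not match2[k]:
                k += 1
            if s1[i] != s2[k]:
                transpositions += 1
            k += 1
    transpositions //= 2
    return (matches / len1 + matches / len2 +
            (matches - transpositions) / matches) / 3

def jaro_winkler(s1: str, s2: str, p: float = 0.1, max_prefix: int = 4) -> float:
    """Jaro-Winkler: boosts the Jaro score for a shared prefix."""
    j = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1, s2):
        if a != b or prefix == max_prefix:
            break
        prefix += 1
    return j + prefix * p * (1 - j)
```

The prefix boost rewards strings that agree at the start, which suits word-level comparison where copied terms often differ only in their endings.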


Proceedings ◽  
2019 ◽  
Vol 23 (1) ◽  
pp. 4 ◽  
Author(s):  
Hadi Ramin ◽  
Easwaran Krishnan ◽  
Carey J. Simonson

Air-to-air energy recovery ventilators (ERVs) can reduce the energy required to condition ventilation air in buildings. Among the different types of ERVs, fixed-bed regenerators (FBRs) have a higher ratio of heat-transfer area to volume. However, there is limited research on FBRs for HVAC applications. This paper presents preliminary experimental and numerical research on FBRs at the University of Saskatchewan. The numerical and experimental results for the effectiveness of the FBR agree within experimental uncertainty bounds, and both are consistent with empirical correlations available in the literature.


2014 ◽  
Vol 2014 ◽  
pp. 1-11 ◽  
Author(s):  
Hedong Xu ◽  
Jing Zheng ◽  
Ziwei Zhuang ◽  
Suohai Fan

The reconstruction of destroyed paper documents has attracted increasing interest in recent years. The topic is relevant to forensics, investigative sciences, and archeology. Previous research on the reconstruction of cross-cut shredded text documents (RCCSTD) is mainly based on likelihood methods and traditional heuristic algorithms. In this paper, a feature-matching algorithm based on character recognition over an established letter database is presented; it reconstructs the shredded document by row clustering, intrarow splicing, and interrow splicing. Row clustering groups fragments with a clustering algorithm according to their clustering vectors. Intrarow splicing, modeled as a travelling salesman problem, is solved by an improved genetic algorithm. Finally, the document is reconstructed by interrow splicing according to line spacing and fragment proximity. Computational experiments suggest that the presented algorithm achieves high precision and efficiency and may be useful for cross-cut shredded text documents of different sizes.
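The paper's pipeline relies on character recognition and a genetic algorithm; a much simpler sketch of the underlying splicing idea, scoring how well two fragments fit by comparing their touching edges, might look like this (the binary pixel representation and the agreement-fraction score are illustrative assumptions, not the paper's method):

```python
def edge_score(left_frag: list[list[int]], right_frag: list[list[int]]) -> float:
    """Fraction of rows where left_frag's right edge matches right_frag's left edge.

    Fragments are binary pixel matrices (1 = ink, 0 = background) of equal
    height; a high score suggests the fragments were horizontally adjacent.
    """
    right_edge = [row[-1] for row in left_frag]
    left_edge = [row[0] for row in right_frag]
    agreeing = sum(a == b for a, b in zip(right_edge, left_edge))
    return agreeing / len(right_edge)

def best_right_neighbor(frag, candidates) -> int:
    """Greedy stand-in for intrarow splicing: index of the best-scoring neighbor."""
    return max(range(len(candidates)), key=lambda i: edge_score(frag, candidates[i]))
```

Treating every such pairwise score as an edge weight is exactly what turns intrarow splicing into the travelling salesman problem the paper solves with a genetic algorithm.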


2017 ◽  
Vol 26 (2) ◽  
pp. 233-241
Author(s):  
Eman Ismail ◽  
Walaa Gad

Abstract. In this paper, we propose a novel approach called Classification Based on Enrichment Representation (CBER) for short text documents. The proposed approach extracts the concepts occurring in short text documents and uses them to calculate the weights of each concept's synonyms. Concepts with the same meaning increase the weights of their synonyms. Because the text documents are short and concepts are rarely repeated, we capture the semantic relationships among concepts and solve the disambiguation problem. The experimental results show that the proposed CBER is valuable in annotating short text documents with their best labels (classes). We used precision and recall measures to evaluate the proposed approach; CBER reached 93% precision and 94% recall.
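The enrichment step is described only at a high level; a hedged sketch of the core idea, where concepts sharing a meaning reinforce one another's weights, might look like this. The synonym dictionary here is a hypothetical stand-in (the real system derives these relationships from a lexical resource):

```python
from collections import Counter

# Hypothetical synonym dictionary standing in for a lexical resource.
SYNONYMS = {
    "car": {"automobile", "vehicle"},
    "automobile": {"car", "vehicle"},
    "film": {"movie"},
    "movie": {"film"},
}

def enrich(short_text: str) -> Counter:
    """Weight each concept by its own count plus the counts of its synonyms.

    Synonymous concepts reinforce one another, compensating for the low
    term frequencies typical of short documents.
    """
    counts = Counter(short_text.lower().split())
    weights = Counter()
    for concept, count in counts.items():
        weights[concept] += count
        for syn in SYNONYMS.get(concept, ()):
            # A synonym's occurrences also raise this concept's weight.
            weights[concept] += counts.get(syn, 0)
    return weights
```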


2015 ◽  
Vol 27 (2) ◽  
pp. 143-156 ◽  
Author(s):  
TANVER ATHAR ◽  
CARL BARTON ◽  
WIDMER BLAND ◽  
JIA GAO ◽  
COSTAS S. ILIOPOULOS ◽  
...  

Circular string matching is a problem which naturally arises in many contexts. It consists of finding all occurrences of the rotations of a pattern of length m in a text of length n. There exist optimal worst- and average-case algorithms for circular string matching. Here, we present a suboptimal average-case algorithm for circular string matching requiring time $\mathcal{O}$(n) and space $\mathcal{O}$(m). The importance of our contribution is underlined by the fact that the proposed algorithm can be easily adapted to deal with circular dictionary matching. In particular, we show how the circular dictionary-matching problem can be solved in average-case time $\mathcal{O}$(n + M) and space $\mathcal{O}$(M), where M is the total length of the dictionary patterns, assuming that the shortest pattern is sufficiently long. Moreover, the presented average-case algorithms and other worst-case approaches were also implemented. Experimental results, using real and synthetic data, demonstrate that the implementation of the presented algorithms can accelerate the computations by more than a factor of two compared to the corresponding implementations of other approaches.
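The paper's algorithm is more sophisticated; a straightforward baseline for the problem (quadratic, not the paper's average-case O(n)) exploits the fact that a length-m string is a rotation of the pattern exactly when it occurs in the pattern concatenated with itself:

```python
def circular_matches(text: str, pattern: str) -> list[int]:
    """Return start indices in text where some rotation of pattern occurs.

    A string w of length m is a rotation of pattern iff w is a substring
    of pattern + pattern. This naive baseline runs in O(n * m) time,
    unlike the average-case O(n) algorithm presented in the paper.
    """
    m = len(pattern)
    doubled = pattern + pattern
    return [i for i in range(len(text) - m + 1)
            if text[i:i + m] in doubled]
```

The doubling trick is also the usual way to reduce circular matching to ordinary substring search when benchmarking faster algorithms against a simple reference.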

