Tool support for plagiarism detection in text documents

In this article we present a new semantic- and syntactic-based method for external plagiarism detection. In the proposed approach, Latent Dirichlet Allocation (LDA) and part-of-speech (POS) tags are used together to detect plagiarism between a sample document and a number of source documents. The basic hypothesis is that considering both semantic and syntactic information between two text documents may improve the performance of the plagiarism detection task. Our method consists of two steps: a pre-processing step, in which we detect the topics of the sentences in the documents using LDA and convert each sentence into an array of POS tags, and a post-processing step, in which the suspicious cases are verified purely on the basis of semantic rules. For two types of external plagiarism (copy and random obfuscation), we empirically compare our approach to the state-of-the-art N-gram based and stop-word N-gram based methods and observe significant improvements.


Text Documents Plagiarism Detection using Rabin-Karp and Jaro-Winkler Distance Algorithms

Indonesian Journal of Electrical Engineering and Computer Science ◽

10.11591/ijeecs.v5.i2.pp462-471 ◽

2017 ◽

Vol 5 (2) ◽

pp. 462 ◽

Cited By ~ 3

Author(s):

Brinardi Leonardo ◽

Seng Hansun

Keyword(s):

Detection System ◽

String Matching ◽

Experimental Results ◽

Plagiarism Detection ◽

Text Documents ◽

Matching Algorithm ◽

Text Document ◽

Different Types ◽

The University

Plagiarism is an act that the university considers fraud: taking someone else's ideas or writings without citing the source and claiming them as one's own. A plagiarism detection system generally implements a string matching algorithm on text documents to search for common words between documents. Several algorithms are used for string matching; two of them are the Rabin-Karp and Jaro-Winkler Distance algorithms. The Rabin-Karp algorithm is well suited to problems with multiple string patterns, while the Jaro-Winkler Distance algorithm has advantages in terms of time. A plagiarism detection application was developed and tested on different types of documents, i.e. doc, docx, pdf and txt. The experimental results show that both algorithms can be used to perform plagiarism detection on those documents, but in terms of effectiveness, the Rabin-Karp algorithm is much more effective and faster when processing documents larger than 1000 KB.
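For readers unfamiliar with the two algorithms named in this abstract, the following is a minimal textbook-style sketch, not the authors' implementation. The function names, the hash base and modulus, and the Winkler scaling factor of 0.1 with a prefix cap of 4 are conventional choices assumed here for illustration.

```python
def rabin_karp(text: str, pattern: str, base: int = 256,
               mod: int = 1_000_000_007) -> list[int]:
    """Return every start index where pattern occurs in text (rolling hash)."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    high = pow(base, m - 1, mod)  # weight of the character leaving the window
    p_hash = t_hash = 0
    for i in range(m):
        p_hash = (p_hash * base + ord(pattern[i])) % mod
        t_hash = (t_hash * base + ord(text[i])) % mod
    hits = []
    for i in range(n - m + 1):
        # Equal hashes can be a collision, so confirm with a direct comparison.
        if p_hash == t_hash and text[i:i + m] == pattern:
            hits.append(i)
        if i < n - m:
            # Slide the window: drop text[i], append text[i + m].
            t_hash = ((t_hash - ord(text[i]) * high) * base
                      + ord(text[i + m])) % mod
    return hits


def jaro(s1: str, s2: str) -> float:
    """Jaro similarity in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    t = out_of_order / 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def jaro_winkler(s1: str, s2: str, scaling: float = 0.1) -> float:
    """Jaro similarity boosted for a shared prefix of up to 4 characters."""
    base_score = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return base_score + prefix * scaling * (1 - base_score)
```

Rabin-Karp finds exact occurrences of a phrase, whereas Jaro-Winkler scores how close two short strings are, so a detector might plausibly use the first for copied passages and the second for near-identical words.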


Disguised plagiarism detection in Arabic text documents

2018 2nd International Conference on Natural Language and Speech Processing (ICNLSP) ◽

10.1109/icnlsp.2018.8374395 ◽

2018 ◽

Cited By ~ 3

Author(s):

El Moatez Billah Nagoudi ◽

Hadda Cherroun ◽

Ali Alshehri

Keyword(s):

Arabic Text ◽

Plagiarism Detection ◽

Text Documents


Plagiarism Detection of Paraphrases in Text Documents with Document Retrieval

Advances in Computing and Information Technology - Communications in Computer and Information Science ◽

10.1007/978-3-642-22555-0_34 ◽

2011 ◽

pp. 330-338

Author(s):

S. Sandhya ◽

S. Chitrakala

Keyword(s):

Document Retrieval ◽

Plagiarism Detection ◽

Text Documents


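PERANCANGAN DAN PENERAPAN ALGORITMA RIZKI TANJUNG 24 (RTG24) UNTUK KOMPARASI KATA PADA FILE TEXT [Design and Implementation of the Rizki Tanjung 24 (RTG24) Algorithm for Word Comparison in Text Files]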

Compiler ◽

10.28989/compiler.v3i1.68 ◽

2014 ◽

Vol 3 (1) ◽

Author(s):

Rizki Tanjung ◽

Haruno Sajati ◽

Dwi Nugraheny

Keyword(s):

String Matching ◽

Plagiarism Detection ◽

Text Documents ◽

Text Document ◽

Basic Word ◽

Root Word

Plagiarism is the act of taking the essay or work of others and claiming it as one's own. Plagiarism of text is very common and difficult to avoid, so many systems have been created to assist in detecting plagiarism in text documents. At its core, plagiarism detection in text documents is string matching. This led to the idea of building an algorithm, RTG24, to be implemented in the RTG24 Comparison file.txt application. Documents to be compared must be .txt (plain-text) files, and every word contained in a document must be in the Indonesian dictionary. The RTG24 algorithm works by determining the number of identical or similar words in the text of the two documents. The algorithm has several stages: parsing, filtering, stemming and comparison. In the parsing stage, every sentence in the document is broken down into individual words; the filtering stage removes unimportant particles. In the stemming stage, each word is reduced to its basic word or root word, which simplifies comparison between the two documents. After parsing, filtering and stemming, each document is stored in an array for the comparison between the two documents, from which the percentage of similarity between the two documents is determined.
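The four stages described above (parsing, filtering, stemming, comparison) can be sketched roughly as follows. This is an illustrative toy, not the RTG24 algorithm itself: the stop-word list, the root-word dictionary, the affix lists and the similarity formula (shared distinct stems divided by all distinct stems) are all assumptions made for the example.

```python
import re

# Hypothetical, tiny resources; a real system would use a full Indonesian dictionary.
STOP_WORDS = {"dan", "itu", "yang", "di", "dia"}
ROOT_WORDS = {"baca", "tulis", "buku"}
PREFIXES = ("mem", "men", "me", "di", "ber")
SUFFIXES = ("kan", "nya", "an", "i")


def stem(word: str) -> str:
    """Strip one known prefix and/or suffix if the result is a dictionary root."""
    if word in ROOT_WORDS:
        return word
    for prefix in ("",) + PREFIXES:
        for suffix in ("",) + SUFFIXES:
            if word.startswith(prefix) and word.endswith(suffix):
                core = word[len(prefix):len(word) - len(suffix)]
                if core in ROOT_WORDS:
                    return core
    return word  # unknown words are left unchanged


def preprocess(text: str) -> list[str]:
    words = re.findall(r"[a-z]+", text.lower())        # parsing
    kept = [w for w in words if w not in STOP_WORDS]   # filtering
    return [stem(w) for w in kept]                     # stemming


def similarity_percent(doc_a: str, doc_b: str) -> float:
    """Comparison: percentage of distinct stems shared by both documents."""
    stems_a, stems_b = set(preprocess(doc_a)), set(preprocess(doc_b))
    if not stems_a and not stems_b:
        return 0.0
    return 100.0 * len(stems_a & stems_b) / len(stems_a | stems_b)


score = similarity_percent("Dia membaca buku itu", "Buku itu dibaca dan ditulis")
print(f"{score:.2f}%")
```

In this toy example "membaca" and "dibaca" both reduce to "baca", which is why stemming lets the two sentences match on more than the literal word "buku".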


Performance Analysis and Evaluation of Feature Selection Techniques to Find Coherence Using Cosine Similarity

Journal of Computational and Theoretical Nanoscience ◽

10.1166/jctn.2020.9027 ◽

2020 ◽

Vol 17 (9) ◽

pp. 4106-4110

Author(s):

Mausumi Goswami ◽

B. S. Purkayastha

Keyword(s):

Feature Selection ◽

Vital Role ◽

Unstructured Data ◽

Sensor Data ◽

Computational Techniques ◽

Plagiarism Detection ◽

Feature Selection Technique ◽

Text Documents ◽

Selection Technique ◽

Feature Selection Techniques

Unstructured data is used in many major applications. It seems that 80% of the data generated by various business applications is unstructured. Unstructured data cannot be directly processed to generate information. A few major applications that use AI are recommendation systems, sentiment analysis of customers' emotions, finding duplicate content through plagiarism detection, and document organization based on requirements. Sources of such data include unstructured text on the World Wide Web, sensor data, digital images, videos, sound, results of scientific experiments and user profiles for marketing. Information retrieval from huge text datasets is quite challenging. This is caused by the various characteristics of natural languages and is a major concern in text mining. Before computational techniques are applied to documents, the documents must be made ready for processing. Document preprocessing is one such method applied to text documents, and it plays a vital role in document grouping. In this paper, four feature selection techniques are implemented and empirical investigation results are included. The grouping outcomes are evaluated to assess the effectiveness of each feature selection technique.
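The title of this entry refers to cosine similarity, which the abstract does not define. As a hedged illustration only, the sketch below computes cosine similarity between raw term-frequency vectors of two texts; the tokenization and the use of unweighted counts are assumptions, and the paper's own features may differ.

```python
import math
import re
from collections import Counter


def term_frequencies(text: str) -> Counter:
    """Lowercase the text and count alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(text_a: str, text_b: str) -> float:
    """Cosine of the angle between the two term-frequency vectors."""
    tf_a, tf_b = term_frequencies(text_a), term_frequencies(text_b)
    dot = sum(tf_a[t] * tf_b[t] for t in tf_a.keys() & tf_b.keys())
    norm_a = math.sqrt(sum(c * c for c in tf_a.values()))
    norm_b = math.sqrt(sum(c * c for c in tf_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document has no direction to compare
    return dot / (norm_a * norm_b)
```

Feature selection would, under this view, amount to choosing which terms are allowed into the vectors before the cosine is computed.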


Using word semantic concepts for plagiarism detection in text documents

Information Retrieval ◽

10.1007/s10791-021-09394-4 ◽

2021 ◽

Author(s):

Chia-Yang Chang ◽

Shie-Jue Lee ◽

Chih-Hung Wu ◽

Chih-Feng Liu ◽

Ching-Kuan Liu

Keyword(s):

Plagiarism Detection ◽

Text Documents ◽

Semantic Concepts


Taxonomy of academic plagiarism methods

Zbornik Veleučilišta u Rijeci ◽

10.31784/zvr.9.1.17 ◽

2021 ◽

Vol 9 (1) ◽

pp. 283-300

Author(s):

Tedo Vrbanec ◽

Ana Meštrović

Keyword(s):

Academic Community ◽

Plagiarism Detection ◽

New Classification ◽

Software Developers ◽

Text Documents ◽

Comprehensive Classification ◽

Academic Plagiarism

The article gives an overview of the plagiarism domain, with a focus on academic plagiarism. It defines plagiarism and explains the origin of the term as well as plagiarism-related terms. It identifies the extent of the plagiarism domain and then focuses on the plagiarism subdomain of text documents, for which it gives an overview of current classifications and taxonomies and then proposes a more comprehensive classification according to several criteria: origin and purpose, technical implementation, consequence, complexity of detection and the number of linguistic sources. The article proposes a new classification of academic plagiarism and describes kinds and methods of plagiarism, types and categories, approaches and phases of plagiarism detection, and the classification of methods and algorithms for plagiarism detection. The title of the article explicitly targets the academic community, but the article is sufficiently general and interdisciplinary to be useful for many other professionals, such as software developers, linguists and librarians.


Aplikasi Pendeteksi Tingkat Kesamaan Dokumen Teks: Algoritma Rabin Karp Vs. Winnowing [Text Document Similarity Detection Application: Rabin-Karp vs. Winnowing Algorithms]

Digital Zone Jurnal Teknologi Informasi dan Komunikasi ◽

10.31849/digitalzone.v9i1.1242 ◽

2018 ◽

Vol 9 (1) ◽

pp. 82-93

Author(s):

Sugiono Sugiono ◽

Herwin Herwin ◽

Hamdani Hamdani ◽

Erlin Erlin

Keyword(s):

Word Processing ◽

Processing Time ◽

Code Of Conduct ◽

Scientific Writing ◽

Plagiarism Detection ◽

Text Similarity ◽

Text Documents ◽

Processing Application ◽

Text Document ◽

Copy And Paste

Copying and pasting text documents often occurs in scientific writing without credit being given to the owner of the text. This violation of the code of conduct is caused by the availability of copy-and-paste facilities in word processing applications. The purpose of this study is to build an application that can detect the degree of similarity between text documents by first comparing the reliability of two text similarity algorithms, Rabin-Karp and Winnowing. The comparison is based on two variables: detection capability and processing time. The results show that the Winnowing algorithm outperforms Rabin-Karp in terms of both accuracy and processing time. Keywords: Rabin-Karp, Winnowing, Plagiarism Detection, Text Similarity
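As background for the Winnowing algorithm compared in this abstract, here is a minimal sketch of winnowing-style fingerprinting. It is not the application described in the paper: the k-gram length, window size, use of `zlib.crc32` as the hash, and Jaccard similarity over fingerprint sets are assumptions chosen for illustration, and positions of the selected hashes are not recorded.

```python
import re
import zlib


def kgram_hashes(text: str, k: int = 5) -> list[int]:
    """Hash every k-character substring of the normalized text."""
    cleaned = re.sub(r"[^a-z0-9]", "", text.lower())
    return [zlib.crc32(cleaned[i:i + k].encode("utf-8"))
            for i in range(len(cleaned) - k + 1)]


def winnow(text: str, k: int = 5, window: int = 4) -> set[int]:
    """Keep the minimum hash of each sliding window as a fingerprint."""
    hashes = kgram_hashes(text, k)
    if not hashes:
        return set()
    if len(hashes) < window:
        return {min(hashes)}
    return {min(hashes[start:start + window])
            for start in range(len(hashes) - window + 1)}


def fingerprint_similarity(text_a: str, text_b: str) -> float:
    """Jaccard similarity of the two fingerprint sets, in [0, 1]."""
    fp_a, fp_b = winnow(text_a), winnow(text_b)
    if not fp_a and not fp_b:
        return 0.0
    return len(fp_a & fp_b) / len(fp_a | fp_b)
```

Because only one hash per window is kept, the fingerprint set is much smaller than the full list of k-gram hashes, yet any shared passage at least `window + k - 1` normalized characters long still contributes a common fingerprint.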


Aplikasi Pengecekan Dokumen Digital Tugas Mahasiswa Berbasis Website [Website-Based Application for Checking Students' Digital Assignments]

Jurnal Buana Informatika ◽

10.24002/jbi.v11i2.3706 ◽

2020 ◽

Vol 11 (2) ◽

pp. 93

Author(s):

Latius Hermawan ◽

Maria Bellaniar Ismiati

Keyword(s):

String Matching ◽

Plagiarism Detection ◽

Matching Method ◽

Text Documents ◽

Matching Algorithm ◽

The Common ◽

Copy And Paste ◽

Online Sources

Abstract. Website-Based Application for Checking Students' Digital Assignments. Nowadays, technology is not only about computers; it has advanced to smartphones and other things. At UKMC, technology has certainly helped with work. However, the university has no application for checking students' digital assignments for plagiarism, even though students sometimes plagiarize from online sources when working on assignments. Assignments can easily be completed by copying and pasting without citing the reference, because students tend to think practically when working on assignments. Plagiarism is strictly prohibited in education. Therefore, a plagiarism detection application should be created. It applies a string-matching algorithm to text documents to search for the common words between documents. Applying the string-matching method to a document and matching it against other documents produces an output that indicates how similar the text documents are. Testing shows that this application can help lecturers and students reduce the level of plagiarism. Keywords: Application, Plagiarism, Digital, Assignment
