Methodology of Selecting the Hadoop Ecosystem Configuration in Order to Improve the Performance of a Plagiarism Detection System

Author(s):  
Andrzej Sobecki ◽  
Marcin Kepa
2020 ◽  
Vol 4 (5) ◽  
pp. 988-997
Author(s):  
Sylvia Putri Gunawan ◽  
Lucia Dwi Krisnawati ◽  
Antonius Rachmat Chrismanto

Two paradigms in the field of plagiarism detection have produced External Plagiarism Detection (EPD) and Intrinsic Plagiarism Detection (IPD) systems. The most commonly applied system is EPD, which requires its algorithm to make a heuristic comparison between a suspicious document and the documents in a corpus. In contrast, given only a suspicious document, an IPD algorithm should be able to find the plagiarized sections by looking for text segments with different writing styles. Previous research on Indonesian texts has addressed only the development of EPD systems. Therefore, this research focuses on experimenting with and analyzing stylometric features and segmentation strategies to build an IPD system for Indonesian texts. The experimental results show that the paragraph segment performs better, scoring 0.92 for Macro Averaged-Accuracy and 0.54 for Macro Averaged-F1. The stylometric features achieving the highest F1 and Accuracy scores are the frequency of punctuation, the average paragraph length, and the type-token ratio.
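The three stylometric features named in the abstract can be illustrated with a minimal sketch. The abstract does not define them precisely, so everything here is an assumption: paragraphs are split on blank lines, tokens are lowercase `\w+` matches, punctuation frequency is normalized by character count, and paragraph length is counted in tokens. The function names `segment_paragraphs` and `stylometric_features` are hypothetical and not taken from the paper.

```python
import re
import string


def segment_paragraphs(document):
    """Split a document into paragraph segments (assumed: blank-line separated)."""
    return [p.strip() for p in re.split(r"\n\s*\n", document) if p.strip()]


def tokenize(text):
    """Lowercase word tokens; a simple stand-in for a real Indonesian tokenizer."""
    return re.findall(r"\w+", text.lower())


def stylometric_features(paragraph):
    """Compute three illustrative stylometric features for one segment."""
    tokens = tokenize(paragraph)
    punct_count = sum(1 for ch in paragraph if ch in string.punctuation)
    return {
        # Punctuation marks per character (normalization is an assumption).
        "punctuation_frequency": punct_count / len(paragraph) if paragraph else 0.0,
        # Segment length in tokens (assumed reading of "paragraph length").
        "paragraph_length": len(tokens),
        # Distinct tokens divided by total tokens.
        "type_token_ratio": len(set(tokens)) / len(tokens) if tokens else 0.0,
    }


document = "Saya makan nasi. Saya minum air!\n\nDia pergi, lalu pulang."
features = [stylometric_features(p) for p in segment_paragraphs(document)]
```

An intrinsic detector along these lines would compare each segment's feature vector against the rest of the document and flag outliers; that classification step is not shown because the abstract does not describe it.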


Author(s):  
Maxim Mozgovoy ◽  
Kimmo Fredriksson ◽  
Daniel White ◽  
Mike Joy ◽  
Erkki Sutinen

Author(s):  
Brinardi Leonardo ◽  
Seng Hansun

Plagiarism is an act that universities consider fraud: taking someone else's ideas or writings without citing the source and claiming them as one's own. A plagiarism detection system generally implements a string matching algorithm on text documents to search for common words between documents. Several algorithms are used for string matching, two of which are the Rabin-Karp and Jaro-Winkler Distance algorithms. The Rabin-Karp algorithm is well suited to the problem of matching multiple string patterns, while the Jaro-Winkler Distance algorithm has advantages in terms of time. A plagiarism detection application was developed and tested on different types of documents, i.e. doc, docx, pdf and txt. The experimental results show that both algorithms can be used to perform plagiarism detection on those documents, but in terms of effectiveness, the Rabin-Karp algorithm is much more effective and faster when processing documents larger than 1000 KB.
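For readers unfamiliar with the two algorithms, the following is a minimal textbook-style sketch of each, not the authors' implementation. The hash base and modulus in `rabin_karp`, and the scaling factor `p=0.1` with a prefix cap of 4 in `jaro_winkler`, are conventional defaults assumed here.

```python
def rabin_karp(text, pattern, base=256, mod=1_000_000_007):
    """Return all start indices of pattern in text using a rolling hash."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    high = pow(base, m - 1, mod)  # weight of the leading character
    p_hash = t_hash = 0
    for i in range(m):
        p_hash = (p_hash * base + ord(pattern[i])) % mod
        t_hash = (t_hash * base + ord(text[i])) % mod
    matches = []
    for i in range(n - m + 1):
        # Compare the actual strings on a hash hit to rule out collisions.
        if p_hash == t_hash and text[i:i + m] == pattern:
            matches.append(i)
        if i < n - m:
            # Slide the window: drop text[i], append text[i + m].
            t_hash = ((t_hash - ord(text[i]) * high) * base + ord(text[i + m])) % mod
    return matches


def jaro(s1, s2):
    """Jaro similarity in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1, matched2 = [False] * len1, [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    k = out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    t = out_of_order // 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def jaro_winkler(s1, s2, p=0.1, max_prefix=4):
    """Jaro similarity boosted by the length of the common prefix."""
    j = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1, s2):
        if a != b or prefix == max_prefix:
            break
        prefix += 1
    return j + prefix * p * (1 - j)
```

Rabin-Karp reports exact occurrences of a pattern, while Jaro-Winkler returns a similarity score between two strings, so a detector built on them would use the two in different ways; how the paper combines or compares them is not stated in the abstract.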


1981 ◽  
Vol 13 (1) ◽  
pp. 21-25 ◽  
Author(s):  
John L. Donaldson ◽  
Ann-Marie Lancaster ◽  
Paula H. Sposato

Author(s):  
Maytham Alabbas ◽  
Raidah S. Khudeyer ◽  
Mustafa Radif ◽  
Hassan Khalid Hameed

Using someone else's work or ideas without attribution is plagiarism, whether you meant to do it or not. Unintended plagiarism of a snippet of text can have serious consequences and be a serious form of ethical misconduct. The current system is a web application that enables you to check a multilingual text, with a special focus on Arabic, for duplicate content on the World Wide Web. You simply type or paste your text into the online system, and each sentence in the text is submitted to the results pages (SERPs) of three popular search engines (Google, Bing, and Yandex) to find the top three results on the first page of each search engine where duplicate content already exists. The system retrieves its data through the custom search APIs of the three search engines. It then applies a text similarity technique between the suspicious sentence and the retrieved text snippet for all nine results. The reported result is the one that gives the highest similarity rate. The results were encouraging and will open doors for new and innovative techniques for researchers in this field.
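The final selection step, choosing the snippet with the highest similarity among the nine retrieved results, might look like the sketch below. The abstract does not name the similarity technique, so Python's `difflib.SequenceMatcher` is swapped in as a stand-in; the search API calls are omitted and the snippets are hard-coded, hypothetical data. The names `similarity` and `best_match` are also hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Character-level similarity in [0, 1]; a stand-in for the paper's measure."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def best_match(sentence, snippets):
    """Return (score, engine, snippet) for the most similar retrieved snippet.

    snippets is a list of (engine_name, snippet_text) pairs, e.g. the top
    three results from each of three search engines.
    """
    scored = [(similarity(sentence, text), engine, text) for engine, text in snippets]
    return max(scored, key=lambda item: item[0])


# Hypothetical retrieved snippets; a real system would fetch these via search APIs.
sentence = "Plagiarism is using someone else's work without attribution."
snippets = [
    ("engine_a", "Weather forecast for the coming week."),
    ("engine_b", "Plagiarism is using someone else's work without attribution."),
    ("engine_c", "Plagiarism means copying text."),
]
score, engine, text = best_match(sentence, snippets)
```

A character-level ratio is only one possible choice; for Arabic or other morphologically rich languages a token- or stem-based measure may be more appropriate.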

