Source Retrieval Model Focused on Aggregation for Plagiarism Detection

Abstract: Detection of local text reuse is central to a variety of applications, including plagiarism detection, origin detection, and information flow analysis. This paper evaluates and compares the effectiveness of fingerprint selection algorithms for the source retrieval stage of local text reuse detection. In total, six algorithms are compared: Every p-th, 0 mod p, Winnowing, Hailstorm, Frequency-biased Winnowing (FBW), and the proposed modified version of FBW (MFBW). Most of the previously published studies in local text reuse detection are based on datasets having either artificially generated, long-sized, or unobfuscated text reuse. In this study, to evaluate the performance of the algorithms, a new dataset has been built containing real text reuse cases from Bachelor's and Master's theses (written in English in the field of computer science), where about half of the cases involve less than 1% of the document text and about two-thirds of the cases involve paraphrasing. In the performed experiments, the overall best detection quality is reached by Winnowing, 0 mod p, and MFBW. The proposed MFBW algorithm is a considerable improvement over FBW and becomes one of the best-performing algorithms. The software developed for this study is freely available at the author’s website http://www.cs.rtu.lv/jekabsons/.
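
To make the idea of fingerprint selection concrete, here is a minimal Python sketch of three of the algorithms named in the abstract: Every p-th, 0 mod p, and Winnowing. It is an illustration under stated assumptions, not the software from the study: the character-level k-grams, the CRC32 hash, the text normalisation, and all function and parameter names (`kgram_hashes`, `every_pth`, `zero_mod_p`, `winnowing`, `k`, `p`, `w`) are choices made here for the example. Winnowing is sketched in its commonly described form, which keeps the minimum hash of each sliding window and breaks ties by taking the rightmost occurrence.

```python
import zlib


def kgram_hashes(text, k=5):
    """Hash every k-character substring of the normalised text.

    Assumption: normalisation keeps only lowercased alphanumeric characters.
    """
    norm = "".join(ch.lower() for ch in text if ch.isalnum())
    return [zlib.crc32(norm[i:i + k].encode("utf-8"))
            for i in range(len(norm) - k + 1)]


def every_pth(hashes, p):
    """Select the hash at every p-th position (0, p, 2p, ...)."""
    return {(h, i) for i, h in enumerate(hashes) if i % p == 0}


def zero_mod_p(hashes, p):
    """Select every hash whose value is divisible by p."""
    return {(h, i) for i, h in enumerate(hashes) if h % p == 0}


def winnowing(hashes, w):
    """Select the minimum hash of each window of w consecutive hashes.

    Ties are broken by taking the rightmost minimum. A sequence shorter
    than w yields no fingerprints in this simplified version.
    """
    selected = set()
    for start in range(len(hashes) - w + 1):
        window = hashes[start:start + w]
        smallest = min(window)
        # Index of the rightmost occurrence of the minimum in the window.
        offset = w - 1 - window[::-1].index(smallest)
        selected.add((smallest, start + offset))
    return selected


# Illustrative hash sequence (hypothetical values, not data from the study).
demo = [77, 74, 42, 17, 98, 50, 17, 98, 8, 88, 67, 39, 77, 74, 42, 17, 98]
print(sorted(winnowing(demo, 4), key=lambda pair: pair[1]))
# -> [(17, 3), (17, 6), (8, 8), (39, 11), (17, 15)]
```

Each function returns a set of `(hash, position)` pairs. Two documents can then be compared by intersecting the hash values of their fingerprints. Unlike 0 mod p, Winnowing selects at least one fingerprint from every window of `w` consecutive hashes.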

Source Retrieval for Plagiarism Detection

Journal of Advances in Information Technology, doi:10.12720/jait.6.1.18-26, 2015, pp. 18-26

Author(s): Šimon Suchomel, Michal Brandejs

Keyword(s): Plagiarism Detection, Source Retrieval

Helios-r2: A New Bayesian, Open-source Retrieval Model for Brown Dwarfs and Exoplanet Atmospheres

The Astrophysical Journal, doi:10.3847/1538-4357/ab6d71, 2020, Vol 890 (2), pp. 174

Cited By ~ 11

Author(s): Daniel Kitzmann, Kevin Heng, Maria Oreshenko, Simon L. Grimm, Dániel Apai, ...

Keyword(s): Open Source, Brown Dwarfs, Retrieval Model, Source Retrieval

Veridical memory in category-based spatial distortions: A retrieval model

PsycEXTRA Dataset, doi:10.1037/e527342012-420, 2007

Author(s): Cristina Sampaio, Ranxiao Frances Wang

Keyword(s): Retrieval Model

Plagiarism Detection and Avoidance Consequences in Academic World

Journal of Advanced Research in Library and Information Science, doi:10.24321/2395.2288.201706, 2017, Vol 04 (04), pp. 6-13

Author(s): Akhandanand Shukla

Keyword(s): Plagiarism Detection, Academic World

Analysis of Stylometric Features and Segmentation Strategies in Intrinsic Plagiarism Detection System

Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi), doi:10.29207/resti.v4i5.2486, 2020, Vol 4 (5), pp. 988-997

Author(s): Sylvia Putri Gunawan, Lucia Dwi Krisnawati, Antonius Rachmat Chrismanto

Keyword(s): Detection System, Plagiarism Detection, Development System, Intrinsic Plagiarism Detection

Two different paradigms in the field of plagiarism detection have resulted in External Plagiarism Detection (EPD) and Intrinsic Plagiarism Detection (IPD) systems. The most commonly applied system is EPD, which requires its algorithm to make a heuristic comparison between a suspicious document and the documents in a corpus. In contrast, given only a suspicious document, an IPD algorithm should be able to find the plagiarized sections by looking for text segments that have different writing styles. Previous research on Indonesian texts has addressed only the development of EPD systems. Therefore, this research focuses on experimenting with and analyzing stylometric features and segmentation strategies to build an IPD system for Indonesian texts. The experimental results show that paragraph-level segmentation performs better, scoring 0.92 for Macro-Averaged Accuracy and 0.54 for Macro-Averaged F1. The stylometric features achieving the highest F1 and Accuracy scores are the frequency of punctuation, the average paragraph length, and the type-token ratio.
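
The three best-scoring stylometric features named in the abstract can be sketched in a few lines of Python. The abstract does not give the exact definitions used in the paper, so everything below is an assumption made for illustration: paragraphs are separated by blank lines, tokens are runs of word characters, punctuation frequency is the share of non-whitespace characters found in `string.punctuation`, and the function names are hypothetical.

```python
import re
import string


def split_paragraphs(text):
    """Split on blank lines (assumed paragraph delimiter)."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def tokenize(text):
    """Lowercased runs of word characters (assumed tokenisation)."""
    return re.findall(r"\w+", text.lower())


def punctuation_frequency(text):
    """Share of non-whitespace characters that are ASCII punctuation."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    return sum(c in string.punctuation for c in chars) / len(chars)


def average_paragraph_length(text):
    """Mean number of tokens per paragraph."""
    paragraphs = split_paragraphs(text)
    if not paragraphs:
        return 0.0
    return sum(len(tokenize(p)) for p in paragraphs) / len(paragraphs)


def type_token_ratio(text):
    """Number of distinct tokens divided by the total number of tokens."""
    tokens = tokenize(text)
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def segment_features(text):
    """One feature vector per paragraph segment."""
    return [
        (punctuation_frequency(p), len(tokenize(p)), type_token_ratio(p))
        for p in split_paragraphs(text)
    ]
```

In an intrinsic setting, a segment whose feature vector deviates strongly from the rest of the same document would be a candidate for closer inspection. How that deviation is scored or classified is not stated in the abstract and is left out here.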
