Constructing an Academic Thai Plagiarism Corpus for Benchmarking Plagiarism Detection Systems

Abstract: The article analyses the phenomenon of copying (plagiarism) in higher education. The analyses are based on a quantitative questionnaire survey conducted in 2019 at a Polish university. Plagiarism is discussed here both as an element of the learning process and as a subject of public practices. The article presents students’ definitions of plagiarism, their strategies for unclear or difficult situations, their experiences with plagiarism, and their opinions on how serious and widespread the phenomenon is. Focusing on the non-plagiarism norm, that is, the rule that students must not plagiarize, and with the aim of redefining it, we identified two strategies adopted by students. The first is withdrawing for fear of making a mistake (omitting the norm): students do not cite sources at all in unclear situations, e.g. when information about the source is missing. The second is narrowing the scope in which the norm applies (limiting the norm): students distinguish areas where the non-plagiarism norm must be observed closely from areas where it matters less. For example, respondents classify works as credit-level or diploma-level texts and believe that in credit-level work they “can” sometimes plagiarize, since the detection rate is low and the consequences are not severe. The results are particularly significant for interpreting plagiarism in an international context, where there is no uniform definition of plagiarism, and for policies aimed at limiting the scale of the phenomenon, such as plagiarism detection systems.

Systems for the Production of Plagiarists? The Implications Arising from the Use of Plagiarism Detection Systems in UK Universities for Asian Learners

Journal of Academic Ethics, Vol 3 (1), pp. 55-73, 2005. DOI: 10.1007/s10805-006-9006-4. Cited by ~27.

Author(s): Niall Hayes, Lucas Introna

Keyword(s): Plagiarism Detection, Detection Systems

Preference comparison for plagiarism detection systems

2016 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE), 2016. DOI: 10.1109/fuzz-ieee.2016.7737903. Cited by ~1.

Author(s): Sarka Krizkova, Hana Tomaskova, Martin Gavalec

Keyword(s): Plagiarism Detection, Detection Systems

A Survey on Plagiarism Detection Systems

International Journal of Computer Theory and Engineering, pp. 185-188, 2012. DOI: 10.7763/ijcte.2012.v4.447. Cited by ~11.

Author(s): A. S. Bin-Habtoor, M. A. Zaher

Keyword(s): Plagiarism Detection, Detection Systems

iChecker

Scholarly Ethics and Publishing, pp. 232-247, 2019. DOI: 10.4018/978-1-5225-8057-7.ch011.

Author(s): Samuel P. M. Choi, Sze Sing Lam

Keyword(s): Information Retrieval, Chinese Text, False Negative, Detection Algorithm, Sequence Matching, Plagiarism Detection, Detection Systems, Negative Results, New Information, False Negative Results

Academic plagiarism is regarded as a serious offence, and much past effort has gone into building stand-alone plagiarism detection systems for a single language. This paper proposes a new information retrieval-based plagiarism detection algorithm that handles multilingual documents and integrates seamlessly with learning management systems. The proposed algorithm combines information retrieval and sequence matching techniques to identify suspected plagiarized sentences. It also provides parametric control to reduce both false-positive and false-negative results. The full-featured implementation, called iChecker, quickly identifies suspected plagiarized works. It also eases academics' effort in evaluating the severity of an offence by providing a quantified measure. iChecker is currently used in over 300 courses, some with several hundred students, and has obtained satisfactory results. From 2012 to 2016, iChecker processed and verified a total of 276,943 documents in English, Traditional Chinese, and Simplified Chinese.
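
The retrieve-then-match idea described in the iChecker abstract can be sketched roughly as follows. This is a minimal illustration under our own assumptions, not iChecker's published algorithm:

- An inverted index over source sentences stands in for the information retrieval step.
- `difflib.SequenceMatcher` stands in for sequence matching.
- The thresholds `min_overlap` and `min_ratio` (hypothetical names) play the role of the parametric control over false positives and false negatives.
- `severity` is a hypothetical quantified measure: the share of submitted sentences that were flagged.

The regex tokenizer treats an unsegmented run of Chinese characters as a single token, so Chinese text would need a word segmenter first.

```python
import difflib
import re
from collections import Counter


def split_sentences(text: str) -> list[str]:
    # Naive splitter (assumption): break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokens(sentence: str) -> list[str]:
    # Lower-cased word tokens; adequate for space-delimited languages only.
    return re.findall(r"\w+", sentence.lower())


def flag_suspect_sentences(submission: str, sources: list[str],
                           min_overlap: float = 0.3,
                           min_ratio: float = 0.8) -> list[tuple[str, str, float]]:
    """Return (submitted sentence, source sentence, similarity) triples."""
    source_sents = [s for doc in sources for s in split_sentences(doc)]

    # Retrieval step: inverted index from token to ids of source sentences.
    index: dict[str, set[int]] = {}
    for sid, sent in enumerate(source_sents):
        for tok in set(tokens(sent)):
            index.setdefault(tok, set()).add(sid)

    flagged = []
    for sent in split_sentences(submission):
        vocab = set(tokens(sent))
        if not vocab:
            continue
        # Count distinct tokens each source sentence shares with this sentence.
        shared = Counter(sid for tok in vocab for sid in index.get(tok, ()))
        for sid, count in shared.items():
            # Cheap filter: skip candidates with too little vocabulary overlap.
            if count / len(vocab) < min_overlap:
                continue
            # Matching step: order-sensitive similarity of the token sequences.
            ratio = difflib.SequenceMatcher(
                None, tokens(sent), tokens(source_sents[sid])).ratio()
            if ratio >= min_ratio:
                flagged.append((sent, source_sents[sid], round(ratio, 3)))
    return flagged


def severity(submission: str, flagged: list[tuple[str, str, float]]) -> float:
    """Hypothetical severity score: fraction of submitted sentences flagged."""
    total = len(split_sentences(submission))
    return len({f[0] for f in flagged}) / total if total else 0.0


if __name__ == "__main__":
    sub = "The cat sat on the mat. Completely original sentence here."
    src = ["A cat sat on the mat. Dogs bark loudly."]
    hits = flag_suspect_sentences(sub, src)
    print(hits)                 # [('The cat sat on the mat.', 'A cat sat on the mat.', 0.833)]
    print(severity(sub, hits))  # 0.5
```

Raising `min_ratio` trades recall for precision (fewer false positives, more false negatives). Lowering `min_overlap` sends more candidates to the slower matching step.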

Development of Youth Innovations by Means of Electronic Plagiarism Detection Systems

Человек. Общество. Наука (Human. Society. Science), pp. 57-60, 2020. DOI: 10.53015/2686-8172_2020_3_57.

Author(s): O.N. Blinnikova, N.I. Nesterova, G.R. Pachin

Keyword(s): Plagiarism Detection, Detection Systems

Evaluation of State-of-the-Art Paraphrase Identification and Its Application to Automatic Plagiarism Detection

International Journal of Pattern Recognition and Artificial Intelligence, Vol 34 (04), article no. 2053004, 2019. DOI: 10.1142/s0218001420530043. Cited by ~1.

Author(s): Alaa Altheneyan, Mohamed El Bachir Menai

Keyword(s): Language Processing, Question Answering, Research Direction, Support Vector, Critical Study, Plagiarism Detection, Detection Systems, Learning Techniques, Vector Machines, Performance Results

Paraphrase identification is a natural language processing (NLP) problem: deciding whether two text segments have the same meaning. Various NLP applications rely on a solution to this problem, including automatic plagiarism detection, text summarization, machine translation (MT), and question answering. The methods for identifying paraphrases found in the literature fall into two main classes: similarity-based methods and classification methods. This paper presents a critical study and evaluation of existing methods for paraphrase identification and of its application to automatic plagiarism detection. It presents the classes of paraphrase phenomena, the main methods, and the sets of features used by each method. All the methods and features are discussed and listed in a table for easy comparison. Their performance on benchmark corpora is also discussed and compared in tables. Automatic plagiarism detection is presented as an application of paraphrase identification. The performance of existing plagiarism detection systems able to detect paraphrases is compared and discussed on benchmark corpora. The main outcome of this study identifies three feature subsets that give support vector machines their best performance in both paraphrase identification and plagiarism detection: word overlap, structural representations, and MT measures. The results achieved by deep learning techniques highlight them as the most promising research direction in this field.
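
The abstract singles out word-overlap features. As a rough illustration, the sketch below computes unigram and bigram Jaccard and containment scores plus a length ratio for a sentence pair. The feature names and the 0.5 threshold in `is_paraphrase` are assumptions for illustration, not values from the study. Thresholding one score corresponds to the similarity-based class of methods; a classification method would instead feed the whole feature dictionary to a learner such as an SVM.

```python
import re


def _tokens(text: str) -> list[str]:
    return re.findall(r"\w+", text.lower())


def _ngrams(toks: list[str], n: int) -> set[tuple[str, ...]]:
    return {tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)}


def overlap_features(s1: str, s2: str) -> dict[str, float]:
    """Word-overlap features for a candidate paraphrase pair (illustrative)."""
    t1, t2 = _tokens(s1), _tokens(s2)
    feats: dict[str, float] = {}
    for n in (1, 2):
        a, b = _ngrams(t1, n), _ngrams(t2, n)
        common = len(a & b)
        # Jaccard: shared n-grams over all distinct n-grams of the pair.
        feats[f"jaccard_{n}"] = common / len(a | b) if a | b else 0.0
        # Containment: share of s1's n-grams that also occur in s2.
        feats[f"containment_{n}"] = common / len(a) if a else 0.0
    feats["length_ratio"] = (min(len(t1), len(t2)) / max(len(t1), len(t2))
                             if t1 and t2 else 0.0)
    return feats


def is_paraphrase(s1: str, s2: str, threshold: float = 0.5) -> bool:
    # Similarity-based decision: one feature compared against a threshold.
    return overlap_features(s1, s2)["jaccard_1"] >= threshold


if __name__ == "__main__":
    f = overlap_features("The cat sat on the mat", "The cat is sitting on the mat")
    print(round(f["jaccard_1"], 3), f["jaccard_2"])  # 0.571 0.375
    print(is_paraphrase("The cat sat on the mat", "Dogs bark loudly"))  # False
```

Pure word-overlap scores like these reward lexical copying. They weaken quickly once synonyms replace the original words, which is one reason the abstract points to richer feature sets and learned models.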

Boosting Algorithm and Meta-Heuristic Based on Genetic Algorithms for Textual Plagiarism Detection

International Journal of Cognitive Informatics and Natural Intelligence, Vol 9 (4), pp. 65-87, 2015. DOI: 10.4018/ijcini.2015100105. Cited by ~10.

Author(s): Hadj Ahmed Bouarara, Reda Mohamed Hamou, Amine Rahmani, Abdelmalek Amine

Keyword(s): Learning Algorithm, Fitness Function, Modern World, Initial Population, Plagiarism Detection, Detection Systems, Global View, Boosting Algorithm, Crossover And Mutation

Plagiarism cases increase day by day and have become a crucial problem in the modern world. The growth is driven by the quantity of textual information available on the web and by the development of communication means such as email. This paper presents two plagiarism detection systems. The first is a boosting system based on machine learning algorithms (the C4.5 decision tree and k-nearest neighbours) and composed of three steps: text pre-processing, first detection, and second detection. The second uses a genetic algorithm with an initial population generated from the dataset, a fixed fitness function, and reproduction rules (selection, crossover, and mutation). For their experiments, the authors used the PAN-09 benchmark and a set of validation measures (precision, recall, F-measure, FNR, FPR, and entropy), varying the configuration of each system. They compared their results with the performance of other approaches in the literature. Finally, they developed a visualisation service that presents the results graphically using two methods (a 3D cube and a cobweb), with zooming and rotation giving both a detailed and a global view. The authors aim to improve the quality of plagiarism detection systems and the preservation of copyright.
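
Most of the validation measures listed above follow from a confusion matrix over binary decisions (plagiarized or not). The minimal sketch below assumes that ground-truth labels and system predictions are given as parallel lists of booleans, for example one per suspicious document. Entropy is omitted because the abstract does not say how the authors define it.

```python
def validation_measures(y_true: list[bool], y_pred: list[bool]) -> dict[str, float]:
    """Precision, recall, F-measure, FNR and FPR from binary decisions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)          # plagiarism correctly flagged
    fp = sum(1 for t, p in pairs if not t and p)      # clean text wrongly flagged
    fn = sum(1 for t, p in pairs if t and not p)      # plagiarism missed
    tn = sum(1 for t, p in pairs if not t and not p)  # clean text correctly passed

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return {
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        # False negative rate; equals 1 - recall whenever positives exist.
        "fnr": fn / (tp + fn) if tp + fn else 0.0,
        # False positive rate over the truly clean items.
        "fpr": fp / (fp + tn) if fp + tn else 0.0,
    }


if __name__ == "__main__":
    truth = [True, True, True, False, False, False, False, True]
    pred = [True, True, False, False, True, False, False, True]
    print(validation_measures(truth, pred))
    # {'precision': 0.75, 'recall': 0.75, 'f_measure': 0.75, 'fnr': 0.25, 'fpr': 0.25}
```

Reporting FNR and FPR alongside precision and recall separates the two kinds of error. Missed plagiarism and wrongly accused students usually carry very different costs.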

An Analysis of Student Privacy Rights in the Use of Plagiarism Detection Systems

Science and Engineering Ethics, Vol 19 (3), pp. 1255-1266, 2012. DOI: 10.1007/s11948-012-9370-y. Cited by ~8.

Author(s): Bo Brinkman

Keyword(s): Privacy Rights, Plagiarism Detection, Detection Systems, Student Privacy

Plagiarism Meets Paraphrasing: Insights for the Next Generation in Automatic Plagiarism Detection

Computational Linguistics, Vol 39 (4), pp. 917-947, 2013. DOI: 10.1162/coli_a_00153. Cited by ~54.

Author(s): Alberto Barrón-Cedeño, Marta Vila, M. Martí, Paolo Rosso

Keyword(s): State Of The Art, High Density, International Competition, Next Generation, Plagiarism Detection, Detection Systems, Its Analysis, First Time, The Relationship, Second International

Although paraphrasing is the linguistic mechanism underlying many plagiarism cases, little attention has been paid to its analysis in the framework of automatic plagiarism detection. As a result, state-of-the-art plagiarism detectors find it difficult to detect cases of paraphrase plagiarism. In this article, we analyze the relationship between paraphrasing and plagiarism. We pay special attention to which paraphrase phenomena underlie acts of plagiarism and which of them plagiarism detection systems detect. To this end, we created the P4P corpus, a new resource that uses a paraphrase typology to annotate a subset of the PAN-PC-10 corpus for automatic plagiarism detection. The results of the Second International Competition on Plagiarism Detection were analyzed in the light of this annotation. The experiments show three things. (i) More complex paraphrase phenomena and a high density of paraphrase mechanisms make plagiarism detection more difficult. (ii) Lexical substitutions are the paraphrase mechanisms most often used when plagiarizing. (iii) Paraphrase mechanisms tend to shorten the plagiarized text. This is the first analysis of the paraphrase mechanisms behind plagiarism, and it provides critical insights for improving automatic plagiarism detection systems.
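
Findings (ii) and (iii) can be observed crudely on a single source/plagiarized fragment pair by aligning the two token sequences and counting edit operations. The sketch below is only a proxy under stated assumptions, and it does not implement the P4P paraphrase typology, which distinguishes many more mechanisms:

- It aligns tokens with `difflib.SequenceMatcher.get_opcodes`.
- It treats each `replace` opcode as a candidate lexical substitution.
- It reports the plagiarized-to-source length ratio.

```python
import difflib
import re


def _tokens(text: str) -> list[str]:
    return re.findall(r"\w+", text.lower())


def edit_profile(source: str, plagiarized: str) -> dict[str, float]:
    """Count token-level edit operations turning source into plagiarized text."""
    src, plg = _tokens(source), _tokens(plagiarized)
    profile: dict[str, float] = {"replace": 0, "delete": 0, "insert": 0}
    matcher = difflib.SequenceMatcher(None, src, plg)
    for tag, _i1, _i2, _j1, _j2 in matcher.get_opcodes():
        if tag != "equal":
            # 'replace' is read here as a (possibly multi-word) lexical substitution.
            profile[tag] += 1
    # Values below 1.0 mean the plagiarized fragment is shorter than its source.
    profile["length_ratio"] = len(plg) / len(src) if src else 0.0
    return profile


if __name__ == "__main__":
    print(edit_profile(
        "The committee approved the new policy after a long debate",
        "The board approved the new policy after debate",
    ))
    # {'replace': 1, 'delete': 1, 'insert': 0, 'length_ratio': 0.8}
```

A surface alignment like this cannot tell a synonym swap from an unrelated word change, or see syntactic and discourse-level paraphrases at all. Typology-based annotation such as P4P addresses exactly that gap.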
