Source code plagiarism detection: The Unix way

In a programming contest, the host presents a set of logical and mathematical problems, and the contestants write computer programs that solve them. An online judge automates the judging of the submitted programs and is designed to evaluate submitted source code reliably. Traditional online judging platforms are poorly suited to programming labs because they support neither partial scoring nor efficient detection of plagiarized code. In this paper, we therefore present an online judging framework that scores code automatically by efficiently detecting plagiarized content and measuring the accuracy of each submission. Our system detects plagiarism by computing fingerprints of programs and comparing the fingerprints rather than the whole files. We use winnowing to select fingerprints from the k-gram hash values of the source code, which are generated by the Rabin–Karp algorithm. We compare the proposed system with existing online judging platforms to show its superiority in time efficiency, correctness, and feature availability. In addition, we evaluate our system on large data sets and compare its run time with that of MOSS, a widely used plagiarism detection system.
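The fingerprinting step described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the normalization rule (lowercasing and dropping whitespace), the parameters `k=5` and `w=4`, and the Jaccard score are all assumptions made for the example.

```python
import re


def normalize(source):
    """Assumed normalization: drop all whitespace and lowercase the text."""
    return re.sub(r"\s+", "", source).lower()


def kgram_hashes(text, k, base=257, mod=(1 << 61) - 1):
    """Rabin-Karp rolling hash of every k-gram in text, in order."""
    if len(text) < k:
        return []
    high = pow(base, k - 1, mod)  # weight of the character leaving the window
    h = 0
    for ch in text[:k]:
        h = (h * base + ord(ch)) % mod
    hashes = [h]
    for i in range(k, len(text)):
        h = (h - ord(text[i - k]) * high) % mod  # remove the oldest character
        h = (h * base + ord(text[i])) % mod      # append the newest character
        hashes.append(h)
    return hashes


def winnow(hashes, w):
    """Keep the minimum hash of every window of w consecutive hashes."""
    if not hashes:
        return set()
    if len(hashes) < w:
        return {min(hashes)}
    return {min(hashes[i:i + w]) for i in range(len(hashes) - w + 1)}


def fingerprints(source, k=5, w=4):
    """Fingerprint set of a program: normalize, hash k-grams, winnow."""
    return winnow(kgram_hashes(normalize(source), k), w)


def similarity(source_a, source_b, k=5, w=4):
    """Jaccard overlap of the two fingerprint sets (an assumed score)."""
    fa, fb = fingerprints(source_a, k, w), fingerprints(source_b, k, w)
    union = fa | fb
    return len(fa & fb) / len(union) if union else 0.0
```

Comparing the small fingerprint sets instead of the full files is what keeps pairwise comparison cheap; larger `w` yields fewer fingerprints at the cost of missing shorter matches.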

Cross-Language Source Code Plagiarism Detection using Explicit Semantic Analysis and Scored Greedy String Tilling

Proceedings of the ACM/IEEE Joint Conference on Digital Libraries in 2020 ◽

10.1145/3383583.3398594 ◽

2020 ◽

Cited By ~ 1

Author(s):

Tomáš Foltýnek ◽

Richard Všianský ◽

Norman Meuschke ◽

Dita Dlabolová ◽

Bela Gipp

Keyword(s):

Semantic Analysis ◽

Source Code ◽

Plagiarism Detection ◽

Cross Language ◽

Explicit Semantic Analysis

Work in progress — A novel methodology to reduce instructors' and students' psychological burdens in source code plagiarism detection

2010 IEEE Frontiers in Education Conference (FIE) ◽

10.1109/fie.2010.5673650 ◽

2010 ◽

Cited By ~ 1

Author(s):

Asako Ohno ◽

Hajime Murao

Keyword(s):

Source Code ◽

Plagiarism Detection ◽

Work In Progress

Instructor-centric source code plagiarism detection and plagiarism corpus

Proceedings of the 17th ACM annual conference on Innovation and technology in computer science education - ITiCSE '12 ◽

10.1145/2325296.2325328 ◽

2012 ◽

Cited By ~ 10

Author(s):

Jonathan Y.H. Poon ◽

Kazunari Sugiyama ◽

Yee Fan Tan ◽

Min-Yen Kan

Keyword(s):

Source Code ◽

Plagiarism Detection

Java Source Code Plagiarism Detection System

International Journal for Research in Applied Science and Engineering Technology ◽

10.22214/ijraset.2018.3748 ◽

2018 ◽

Vol 6 (3) ◽

pp. 3596-3600

Author(s):

Mrs. Ghuge Madhuri Laxman

Keyword(s):

Detection System ◽

Source Code ◽

Plagiarism Detection

Source Code Representations for Plagiarism Detection

Communications in Computer and Information Science - Learning Technology for Education Challenges ◽

10.1007/978-3-319-95522-3_6 ◽

2018 ◽

pp. 61-69 ◽

Cited By ~ 1

Author(s):

Michal Ďuračík ◽

Emil Kršák ◽

Patrik Hrkút

Keyword(s):

Source Code ◽

Plagiarism Detection

Source Code Plagiarism Detection in an Educational Context: A Literature Mapping

10.1109/fie49875.2021.9637155 ◽

2021 ◽

Author(s):

Rodrigo C Aniceto ◽

Maristela Holanda ◽

Carla Castanho ◽

Dilma Da Silva

Keyword(s):

Source Code ◽

Educational Context ◽

Plagiarism Detection ◽

Literature Mapping

Plagiarism Detection Algorithm for Source Code in Computer Science Education

International Journal of Distance Education Technologies ◽

10.4018/ijdet.2015100102 ◽

2015 ◽

Vol 13 (4) ◽

pp. 29-39 ◽

Cited By ~ 5

Author(s):

Xin Liu ◽

Chan Xu ◽

Boyu Ouyang

Keyword(s):

Hash Function ◽

Source Code ◽

College Education ◽

Detection Algorithm ◽

Longest Common Subsequence ◽

The Other ◽

Plagiarism Detection ◽

Detection Algorithms ◽

Common Subsequence ◽

Basic Concepts

Computer programming is becoming increasingly important in program design courses in college education. However, some students plagiarize homework by copying code and making small modifications. It is not easy for teachers to judge whether source code has been plagiarized, and traditional detection algorithms do not fit this situation. The authors designed an effective and complete method for detecting source code plagiarism based on the common ways in which students plagiarize. The algorithm rests on two basic concepts. One is to standardize the source code by filtering, which removes most of the noise intentionally inserted by plagiarists. The other is an improved Longest Common Subsequence algorithm for text matching that uses the statement as the matching unit. The authors also designed an appropriate hash function to increase the efficiency of matching. Based on the algorithm, a system was designed and shown to be practical: it runs well and meets the practical requirements of the application.
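The two concepts in this abstract, filtering followed by statement-level matching, might look roughly like the sketch below. It uses the classic Longest Common Subsequence recurrence rather than the authors' improved variant, and the filtering rules (C-style comment removal, whitespace removal), the use of `zlib.crc32` as the statement hash, and the similarity formula are assumptions for illustration only.

```python
import re
import zlib


def normalize_statements(source):
    """Assumed filter for C-like code: strip comments and whitespace,
    then split on ';', '{' and '}' to obtain one string per statement."""
    source = re.sub(r"/\*.*?\*/", "", source, flags=re.S)  # block comments
    source = re.sub(r"//[^\n]*", "", source)                # line comments
    parts = re.split(r"[;{}]", source)
    return [re.sub(r"\s+", "", p) for p in parts if p.strip()]


def statement_hashes(source):
    """Hash each statement once so the matching step compares integers."""
    return [zlib.crc32(s.encode("utf-8")) for s in normalize_statements(source)]


def lcs_length(a, b):
    """Classic dynamic-programming LCS over two sequences, one row at a time."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def similarity(source_a, source_b):
    """Share of statements in the common subsequence (assumed formula)."""
    ha, hb = statement_hashes(source_a), statement_hashes(source_b)
    if not ha or not hb:
        return 0.0
    return 2 * lcs_length(ha, hb) / (len(ha) + len(hb))
```

Matching whole statements instead of characters means that inserting an unrelated statement between copied ones lowers the score only slightly, since the surrounding statements still align.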

A Comparison of Source Code Plagiarism Detection Engines

Computer Science Education ◽

10.1080/08993400412331363843 ◽

2004 ◽

Vol 14 (2) ◽

pp. 101-112 ◽

Cited By ~ 32

Author(s):

Thomas Lancaster ◽

Fintan Culwin

Keyword(s):

Source Code ◽

Plagiarism Detection

WASTK: A Weighted Abstract Syntax Tree Kernel Method for Source Code Plagiarism Detection

Scientific Programming ◽

10.1155/2017/7809047 ◽

2017 ◽

Vol 2017 ◽

pp. 1-8 ◽

Cited By ~ 12

Author(s):

Deqiang Fu ◽

Yanyan Xu ◽

Haoran Yu ◽

Boyang Yang

Keyword(s):

Kernel Method ◽

Source Code ◽

Detection Methods ◽

Abstract Syntax ◽

Plagiarism Detection ◽

Abstract Syntax Tree ◽

Syntax Tree ◽

Tree Kernel ◽

Document Frequency ◽

Abstract Syntax Trees

In this paper, we introduce a source code plagiarism detection method, named WASTK (Weighted Abstract Syntax Tree Kernel), for computer science education. Unlike other plagiarism detection methods, WASTK takes into account aspects other than the similarity between programs. WASTK first converts the source code of a program into an abstract syntax tree and then obtains the similarity by calculating the tree kernel of two abstract syntax trees. To avoid misjudgment caused by trivial code snippets or frameworks given by instructors, an idea similar to TF-IDF (Term Frequency-Inverse Document Frequency) from the field of information retrieval is applied: each node in an abstract syntax tree is assigned a weight by TF-IDF. WASTK is evaluated on different datasets and, as a result, performs much better than other popular methods such as Sim and JPlag.
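The idea of down-weighting syntax that appears in every submission can be illustrated with a deliberately simplified sketch. It is not WASTK and not a true tree kernel: it treats each program as a bag of AST productions (a node type together with the types of its children), weights them by inverse document frequency over the corpus, and compares programs by cosine similarity. The use of Python's `ast` module, the function names, and the choice of `log(n / df)` as the weight are all assumptions for the example.

```python
import ast
import math
from collections import Counter


def productions(source):
    """Count (node type, child types) productions in a Python program's AST."""
    counts = Counter()
    for node in ast.walk(ast.parse(source)):
        children = tuple(type(c).__name__ for c in ast.iter_child_nodes(node))
        if children:
            counts[(type(node).__name__, children)] += 1
    return counts


def idf_weights(corpus):
    """Inverse document frequency of each production over all submissions.
    A production found in every submission (e.g. instructor-provided code)
    gets weight log(1) = 0 and so cannot contribute to the similarity."""
    n = len(corpus)
    df = Counter()
    for source in corpus:
        df.update(productions(source).keys())
    return {p: math.log(n / d) for p, d in df.items()}


def weighted_similarity(source_a, source_b, idf):
    """Cosine similarity of the TF-IDF weighted production counts."""
    va = {p: c * idf.get(p, 0.0) for p, c in productions(source_a).items()}
    vb = {p: c * idf.get(p, 0.0) for p, c in productions(source_b).items()}
    dot = sum(w * vb.get(p, 0.0) for p, w in va.items())
    norm_a = math.sqrt(sum(w * w for w in va.values()))
    norm_b = math.sqrt(sum(w * w for w in vb.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)
```

Because identifiers are not part of a production, renaming variables or functions leaves the score unchanged, while structure shared by the whole class is discounted by the IDF weight.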
