scholarly journals Abstract Syntax Tree Based Source Code Antiplagiarism System for Large Projects Set

IEEE Access ◽  
2020 ◽  
Vol 8 ◽  
pp. 175347-175359
Author(s):  
Michal Duracik ◽  
Patrik Hrkut ◽  
Emil Krsak ◽  
Stefan Toth
2017 ◽  
Vol 2017 ◽  
pp. 1-8 ◽  
Author(s):  
Deqiang Fu ◽  
Yanyan Xu ◽  
Haoran Yu ◽  
Boyang Yang

In this paper, we introduce a source code plagiarism detection method, named WASTK (Weighted Abstract Syntax Tree Kernel), for computer science education. Different from other plagiarism detection methods, WASTK takes some aspects other than the similarity between programs into account. WASTK firstly transfers the source code of a program to an abstract syntax tree and then gets the similarity by calculating the tree kernel of two abstract syntax trees. To avoid misjudgment caused by trivial code snippets or frameworks given by instructors, an idea similar to TF-IDF (Term Frequency-Inverse Document Frequency) in the field of information retrieval is applied. Each node in an abstract syntax tree is assigned a weight by TF-IDF. WASTK is evaluated on different datasets and, as a result, performs much better than other popular methods like Sim and JPlag.


2021 ◽  
Author(s):  
Shreya R. Mehta ◽  
Sneha S. Patil ◽  
Nikita S. Shirguppi ◽  
Vahida Attar

Source Code Summarization refers to the task of creating understandable natural language summaries from a given code snippet. Good-quality and precise source code summaries are needed by numerous companies for a platitude of reasons - training for newly joined employees, understanding what a newly imported project does, in brief, maintaining precise summaries on the evolution of source code (using git history), categorizing the code, retrieving the code, automatically generating documents, etc. There is a considerable distinction between source code and natural language since source code is organized, has loops, conditions, structures, classes, and so on. Most of the models follow an encoder-decoder structure, we propose an alternative approach that uses UAST(Universal Abstract Syntax Tree) of the source code to generate tokens and then use the Transformer model for a self-attention mechanism which unlike the RNN method is helpful for capturing long-range dependencies. We have considered Java code snippets for generating code summaries.


2005 ◽  
Vol 30 (4) ◽  
pp. 1-5 ◽  
Author(s):  
Iulian Neamtiu ◽  
Jeffrey S. Foster ◽  
Michael Hicks

2019 ◽  
Vol 1 (1) ◽  
pp. 46-56 ◽  
Author(s):  
Victor R. L. Shen

Those students who major in computer science and/or engineering are required to design program codes in a variety of programming languages. However, many students submit their source codes they get from the Internet or friends with no or few modifications. Detecting the code plagiarisms done by students is very time-consuming and leads to the problems of unfair learning performance evaluation. This paper proposes a novel method to detect the source code plagiarisms by using a high-level fuzzy Petri net (HLFPN) based on abstract syntax tree (AST). First, the AST of each source code is generated after the lexical and syntactic analyses have been done. Second, token sequence is generated based on the AST. Using the AST can effectively detect the code plagiarism by changing the identifier or program statement order. Finally, the generated token sequences are compared with one another using an HLFPN to determine the code plagiarism. Furthermore, the experimental results have indicated that we can make better determination to detect the code plagiarism.


Sign in / Sign up

Export Citation Format

Share Document