Abstract Syntax Tree Based Source Code Antiplagiarism System for Large Projects Set

In this paper, we introduce a source code plagiarism detection method for computer science education, named WASTK (Weighted Abstract Syntax Tree Kernel). Unlike other plagiarism detection methods, WASTK takes into account aspects other than the raw similarity between programs. WASTK first converts the source code of a program into an abstract syntax tree and then obtains the similarity by computing the tree kernel of two abstract syntax trees. To avoid misjudgments caused by trivial code snippets or by frameworks supplied by instructors, an idea similar to TF-IDF (Term Frequency-Inverse Document Frequency) from information retrieval is applied: each node in an abstract syntax tree is assigned a TF-IDF weight. WASTK is evaluated on different datasets and performs much better than other popular methods such as Sim and JPlag.
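
The idea in this abstract can be illustrated with a small sketch. It is not the WASTK algorithm itself: Python's standard `ast` module stands in for the parser, a simple complete-subtree kernel is swapped in for the paper's tree kernel, and the smoothed IDF formula and all function names (`subtree_signatures`, `idf_weights`, `kernel`, `similarity`) are assumptions made for illustration. The point it shows is that subtrees present in every submission, such as an instructor-supplied helper, receive zero weight and so cannot inflate the similarity score.

```python
import ast
import math
from collections import Counter


def subtree_signatures(source):
    """Count a structural signature for every subtree of a program's AST."""
    counts = Counter()

    def visit(node):
        # Signature = node type plus child signatures; names and literals are ignored.
        children = [visit(child) for child in ast.iter_child_nodes(node)]
        sig = type(node).__name__ + "(" + ",".join(children) + ")"
        counts[sig] += 1
        return sig

    visit(ast.parse(source))
    return counts


def idf_weights(corpus_counts):
    """Assumed smoothed IDF: a subtree found in every program gets weight 0."""
    n = len(corpus_counts)
    df = Counter()
    for counts in corpus_counts:
        df.update(counts.keys())
    return {sig: math.log((1 + n) / (1 + d)) for sig, d in df.items()}


def kernel(a, b, w):
    """Weighted count of pairs of identical subtrees shared by two programs."""
    return sum(w.get(s, 0.0) ** 2 * a[s] * b[s] for s in a.keys() & b.keys())


def similarity(a, b, w):
    """Kernel normalized to the range [0, 1]."""
    denom = math.sqrt(kernel(a, a, w) * kernel(b, b, w))
    return kernel(a, b, w) / denom if denom else 0.0


# Hypothetical submissions: all three contain the same instructor-provided helper.
FRAMEWORK = "def helper(x):\n    return x\n\n"
prog_a = FRAMEWORK + "def solve(n):\n    total = 0\n    for i in range(n):\n        total += i\n    return total\n"
prog_b = FRAMEWORK + "def solve(m):\n    acc = 0\n    for k in range(m):\n        acc += k\n    return acc\n"
prog_c = FRAMEWORK + "def solve(n):\n    return n * (n - 1) // 2\n"

counts = [subtree_signatures(p) for p in (prog_a, prog_b, prog_c)]
weights = idf_weights(counts)
sim_ab = similarity(counts[0], counts[1], weights)  # renamed copy of prog_a
sim_ac = similarity(counts[0], counts[2], weights)  # shares only the framework
```

In this toy corpus, `prog_b` is `prog_a` with renamed identifiers, so the two have the same subtree signatures, while `prog_c` overlaps with `prog_a` only in subtrees that occur in all three programs and therefore carry no weight.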

Specifying and Detecting Behavioral Changes in Source Code Using Abstract Syntax Tree Differencing

Trustworthy Computing and Services - Communications in Computer and Information Science; DOI: 10.1007/978-3-642-35795-4_59; 2013; pp. 466-473

Author(s): Yuankui Li, Linzhang Wang

Keyword(s): Source Code, Behavioral Changes, Abstract Syntax, Abstract Syntax Tree, Syntax Tree

Code Summarization: Generating Summary of Code Snippets

DOI: 10.21467/proceedings.114.47; 2021

Author(s): Shreya R. Mehta, Sneha S. Patil, Nikita S. Shirguppi, Vahida Attar

Keyword(s): Natural Language, Long Range, Source Code, Attention Mechanism, Abstract Syntax, Abstract Syntax Tree, Syntax Tree, Alternative Approach, Transformer Model, Java Code

Source code summarization is the task of creating understandable natural language summaries from a given code snippet. Companies need precise, good-quality source code summaries for a multitude of reasons: training newly joined employees, understanding in brief what a newly imported project does, maintaining precise summaries of the evolution of source code (using git history), categorizing code, retrieving code, automatically generating documents, etc. There is a considerable distinction between source code and natural language, since source code is organized and has loops, conditions, structures, classes, and so on. Most models follow an encoder-decoder structure. We propose an alternative approach that uses the UAST (Universal Abstract Syntax Tree) of the source code to generate tokens and then applies the Transformer model, whose self-attention mechanism, unlike the RNN method, is helpful for capturing long-range dependencies. We have considered Java code snippets for generating code summaries.
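
The claim about long-range dependencies can be made concrete with a minimal sketch of scaled dot-product self-attention. It is not the authors' model: the learned query, key, and value projections are replaced by the identity, the two-dimensional token vectors are made-up toy values, and the names `softmax` and `self_attention` are illustrative. It only shows that every position attends to every other position in one step, so the distance between two tokens does not weaken their interaction.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(vectors):
    """Scaled dot-product self-attention with Q = K = V = vectors (an assumption)."""
    d = len(vectors[0])
    outputs, weights = [], []
    for q in vectors:
        # Score the query against every position, near or far, in the same way.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in vectors]
        attn = softmax(scores)
        weights.append(attn)
        # Output is the attention-weighted average of all value vectors.
        outputs.append([sum(a * v[j] for a, v in zip(attn, vectors)) for j in range(d)])
    return outputs, weights


# Toy sequence: the first and last tokens are identical, the middle ones differ.
vectors = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [0.0, 1.0], [0.0, 1.0], [1.0, 0.0]]
outputs, weights = self_attention(vectors)
```

Here the first token gives the last token (five positions away) the same attention weight it gives itself, and more than it gives its immediate neighbor, because the weights depend on content rather than distance.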

Understanding source code evolution using abstract syntax tree matching

ACM SIGSOFT Software Engineering Notes; DOI: 10.1145/1082983.1083143; 2005; Vol 30 (4); pp. 1-5; Cited By ~ 35

Author(s): Iulian Neamtiu, Jeffrey S. Foster, Michael Hicks

Keyword(s): Source Code, Abstract Syntax, Abstract Syntax Tree, Syntax Tree, Code Evolution

A source code plagiarism detecting method using alignment with abstract syntax tree elements

15th IEEE/ACIS International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing (SNPD); DOI: 10.1109/snpd.2014.6888733; 2014; Cited By ~ 6

Author(s): Hiroshi Kikuchi, Takaaki Goto, Mitsuo Wakatsuki, Tetsuro Nishino

Keyword(s): Source Code, Abstract Syntax, Abstract Syntax Tree, Syntax Tree, Detecting Method

Source code pattern as anchored abstract syntax tree

2014 IEEE 5th International Conference on Software Engineering and Service Science; DOI: 10.1109/icsess.2014.6933538; 2014; Cited By ~ 1

Author(s): Ken Nakayama, Eko Sakai

Keyword(s): Source Code, Abstract Syntax, Abstract Syntax Tree, Syntax Tree

A Novel Neural Source Code Representation Based on Abstract Syntax Tree

2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE); DOI: 10.1109/icse.2019.00086; 2019; Cited By ~ 14

Author(s): Jian Zhang, Xu Wang, Hongyu Zhang, Hailong Sun, Kaixuan Wang, ...

Keyword(s): Source Code, Abstract Syntax, Abstract Syntax Tree, Syntax Tree

Reverse engineering of source code to sequence diagram using abstract syntax tree

2016 International Conference on Data and Software Engineering (ICoDSE); DOI: 10.1109/icodse.2016.7936137; 2016; Cited By ~ 3

Author(s): Esa Fauzi, Bayu Hendradjaya, Wikan Danar Sunindyo

Keyword(s): Reverse Engineering, Source Code, Sequence Diagram, Abstract Syntax, Abstract Syntax Tree, Syntax Tree

Novel Code Plagiarism Detection Based on Abstract Syntax Tree and Fuzzy Petri Nets

International Journal of Engineering Education; DOI: 10.14710/ijee.1.1.46-56; 2019; Vol 1 (1); pp. 46-56; Cited By ~ 1

Author(s): Victor R. L. Shen

Keyword(s): Programming Languages, Source Code, Learning Performance, Abstract Syntax, Plagiarism Detection, Abstract Syntax Tree, Source Codes, Syntax Tree, Fuzzy Petri Nets, High Level

Students who major in computer science and/or engineering are required to write programs in a variety of programming languages. However, many students submit source code obtained from the Internet or from friends with few or no modifications. Detecting code plagiarism by students is very time-consuming, and plagiarism leads to unfair evaluation of learning performance. This paper proposes a novel method for detecting source code plagiarism that uses a high-level fuzzy Petri net (HLFPN) based on the abstract syntax tree (AST). First, the AST of each source code is generated after lexical and syntactic analysis. Second, a token sequence is generated from the AST. Using the AST makes it possible to detect plagiarism that is disguised by changing identifiers or the order of program statements. Finally, the generated token sequences are compared with one another using an HLFPN to determine whether code has been plagiarized. The experimental results indicate that the method supports better decisions when detecting code plagiarism.
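
The first two steps described above (build the AST, then derive a token sequence from it) can be sketched as follows. This is an illustration under stated assumptions, not the paper's method: Python's standard `ast` module stands in for the lexical and syntactic analysis, and `difflib.SequenceMatcher` is swapped in for the HLFPN comparison, which is not reproduced here. The function names are hypothetical. The sketch covers identifier renaming only, not statement reordering.

```python
import ast
from difflib import SequenceMatcher


def ast_token_sequence(source):
    """Pre-order sequence of AST node type names; identifier names are dropped."""
    tokens = []

    def visit(node):
        tokens.append(type(node).__name__)
        for child in ast.iter_child_nodes(node):
            visit(child)

    visit(ast.parse(source))
    return tokens


def sequence_similarity(src_a, src_b):
    """Similarity ratio in [0, 1] between the two AST token sequences."""
    seq_a = ast_token_sequence(src_a)
    seq_b = ast_token_sequence(src_b)
    # autojunk=False disables difflib's popularity heuristic for long sequences.
    return SequenceMatcher(None, seq_a, seq_b, autojunk=False).ratio()


# Hypothetical submissions: `copied` is `original` with every identifier renamed.
original = "def area(w, h):\n    result = w * h\n    return result\n"
copied = "def f(a, b):\n    tmp = a * b\n    return tmp\n"
different = "def area(w, h):\n    if w < 0:\n        return 0\n    return w + h\n"

ratio_copy = sequence_similarity(original, copied)
ratio_diff = sequence_similarity(original, different)
```

Because the token sequence records node types rather than names, the renamed copy produces the same sequence as the original, while the structurally different program does not.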
