Source-Code Similarity Measurement: Syntax Tree Fingerprinting for Automated Evaluation

In this paper, we introduce a source code plagiarism detection method for computer science education named WASTK (Weighted Abstract Syntax Tree Kernel). Unlike other plagiarism detection methods, WASTK takes into account aspects beyond the similarity between programs alone. WASTK first converts the source code of each program into an abstract syntax tree and then computes the similarity of two programs by calculating the tree kernel of their abstract syntax trees. To avoid misjudgments caused by trivial code snippets or by frameworks provided by instructors, an idea similar to TF-IDF (Term Frequency-Inverse Document Frequency) from the field of information retrieval is applied: each node in an abstract syntax tree is assigned a weight by TF-IDF. WASTK was evaluated on several datasets, on which it performed considerably better than other popular tools such as Sim and JPlag.
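The pipeline described in this abstract (parse to an AST, weight nodes by TF-IDF, compare trees with a kernel) can be illustrated with a small sketch. It is not WASTK itself and makes several assumptions: it uses Python's built-in `ast` module as the parser, treats each node's rooted subtree shape (with identifiers and literal values dropped) as the TF-IDF "term", and uses a simple subtree kernel (the dot product of TF-IDF-weighted subtree counts) in place of the paper's tree kernel. The function names `subtree_signatures`, `idf_weights`, `weighted_kernel`, and `similarity` are hypothetical.

```python
import ast
import math
from collections import Counter


def subtree_signatures(source: str) -> list[str]:
    """Parse `source` and return a structural signature for every AST node.

    Each signature encodes the node type and its children's signatures;
    identifiers and literal values are ignored, so renaming variables
    does not change the result.
    """
    sigs: list[str] = []

    def visit(node: ast.AST) -> str:
        children = [visit(child) for child in ast.iter_child_nodes(node)]
        sig = f"{type(node).__name__}({','.join(children)})"
        sigs.append(sig)
        return sig

    visit(ast.parse(source))
    return sigs


def idf_weights(corpus: list[list[str]]) -> dict[str, float]:
    """IDF per subtree signature; subtrees present in every submission get 0."""
    n_docs = len(corpus)
    df = Counter()
    for sigs in corpus:
        df.update(set(sigs))  # document frequency: count each subtree once per file
    return {sig: math.log(n_docs / count) for sig, count in df.items()}


def weighted_kernel(a: list[str], b: list[str], idf: dict[str, float]) -> float:
    """Dot product of the TF-IDF-weighted subtree-count vectors of a and b."""
    tf_a, tf_b = Counter(a), Counter(b)
    shared = tf_a.keys() & tf_b.keys()
    return sum(tf_a[s] * tf_b[s] * idf.get(s, 0.0) ** 2 for s in shared)


def similarity(a: list[str], b: list[str], idf: dict[str, float]) -> float:
    """Normalized kernel in [0, 1]; 0.0 if either tree has no weighted subtrees."""
    denom = math.sqrt(weighted_kernel(a, a, idf) * weighted_kernel(b, b, idf))
    return weighted_kernel(a, b, idf) / denom if denom else 0.0


# Toy corpus: b is a renamed copy of a; c solves the same task differently.
SUB_A = """
def total(xs):
    s = 0
    for x in xs:
        s += x
    return s
"""
SUB_B = """
def add_all(values):
    acc = 0
    for v in values:
        acc += v
    return acc
"""
SUB_C = """
def total(xs):
    return sum(xs)
"""

corpus = [subtree_signatures(src) for src in (SUB_A, SUB_B, SUB_C)]
idf = idf_weights(corpus)
sim_ab = similarity(corpus[0], corpus[1], idf)
sim_ac = similarity(corpus[0], corpus[2], idf)
print(f"a vs b: {sim_ab:.3f}")
print(f"a vs c: {sim_ac:.3f}")
```

Here `b` differs from `a` only in its identifiers, so it scores (approximately) 1.0, while `c` scores 0.0: every subtree it shares with `a` (such as a bare variable reference or a one-parameter argument list) occurs in all three submissions and therefore receives an IDF weight of zero. This zero weighting is the mechanism that lets common scaffolding, such as an instructor-provided skeleton, stop contributing to the score.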

Specifying and Detecting Behavioral Changes in Source Code Using Abstract Syntax Tree Differencing

Trustworthy Computing and Services - Communications in Computer and Information Science, 2013, pp. 466-473. DOI: 10.1007/978-3-642-35795-4_59

Author(s): Yuankui Li, Linzhang Wang

Keyword(s): Source Code, Behavioral Changes, Abstract Syntax, Abstract Syntax Tree, Syntax Tree

Abstract Syntax Tree Based Source Code Antiplagiarism System for Large Projects Set

IEEE Access, Vol. 8, 2020, pp. 175347-175359. DOI: 10.1109/access.2020.3026422

Author(s): Michal Duracik, Patrik Hrkut, Emil Krsak, Stefan Toth

Keyword(s): Source Code, Abstract Syntax, Abstract Syntax Tree, Syntax Tree

Syntax tree fingerprinting for source code similarity detection

2009 IEEE 17th International Conference on Program Comprehension, 2009. DOI: 10.1109/icpc.2009.5090050. Cited by ~30

Author(s): Michel Chilowicz, Etienne Duris, Gilles Roussel

Keyword(s): Source Code, Syntax Tree, Similarity Detection

Code Summarization: Generating Summary of Code Snippets

2021. DOI: 10.21467/proceedings.114.47

Author(s): Shreya R. Mehta, Sneha S. Patil, Nikita S. Shirguppi, Vahida Attar

Keyword(s): Natural Language, Long Range, Source Code, Attention Mechanism, Abstract Syntax, Abstract Syntax Tree, Syntax Tree, Alternative Approach, Transformer Model, Java Code

Source code summarization refers to the task of creating understandable natural-language summaries from a given code snippet. Good-quality, precise source code summaries are needed by numerous companies for a plethora of reasons: training newly joined employees, understanding what a newly imported project does, maintaining precise summaries of the evolution of source code (using git history), categorizing code, retrieving code, automatically generating documentation, and so on. Source code differs considerably from natural language, since it is structured and contains loops, conditions, structures, classes, and so on. Most existing models follow an encoder-decoder structure; we propose an alternative approach that uses the UAST (Universal Abstract Syntax Tree) of the source code to generate tokens and then uses a Transformer model, whose self-attention mechanism, unlike RNN-based methods, helps capture long-range dependencies. We consider Java code snippets for generating code summaries.
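The token-generation step can be sketched roughly as follows. This is an approximation under stated assumptions: the paper parses Java into a UAST, whereas this sketch uses Python's built-in `ast` module on Python source as a stand-in, and the pre-order linearization and identifier splitting shown here are common choices in code-summarization work, not necessarily the authors'. The helper names `split_identifier` and `ast_tokens` are hypothetical. The resulting token sequence would then be embedded and fed to a Transformer encoder, which is not shown.

```python
import ast
import re


def split_identifier(name: str) -> list[str]:
    """Split camelCase / snake_case / ACRONYM identifiers into lowercase subtokens."""
    parts = re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+", name)
    return [p.lower() for p in parts]


def ast_tokens(source: str) -> list[str]:
    """Linearize the AST of `source` in pre-order.

    Each node contributes its type name; nodes that carry an identifier
    (function/class names, variables, parameters, attributes) also
    contribute the identifier's subtokens. Load/Store/Del context nodes
    are skipped because they add little information.
    """
    tokens: list[str] = []

    def visit(node: ast.AST) -> None:
        if isinstance(node, ast.expr_context):
            return
        tokens.append(type(node).__name__)
        for field in ("name", "id", "arg", "attr"):
            value = getattr(node, field, None)
            if isinstance(value, str):
                tokens.extend(split_identifier(value))
        for child in ast.iter_child_nodes(node):
            visit(child)

    visit(ast.parse(source))
    return tokens


print(ast_tokens("def addNumbers(a, b):\n    return a + b\n"))
```

For `def addNumbers(a, b): return a + b`, the output interleaves structure (`FunctionDef`, `arguments`, `Return`, `BinOp`) with the subtokens `add` and `numbers`, so the model sees both syntax and naming. Because self-attention connects every pair of positions directly, a token deep in the body can attend to the function name at the start of the sequence without passing through a chain of intermediate states, which is the long-range advantage the abstract attributes to Transformers over RNNs.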