Source-Code Similarity Measurement: Syntax Tree Fingerprinting for Automated Evaluation

2021
Author(s): Arjun Verma, Prateksha Udhayanan, Rahul Murali Shankar, Nikhila KN, Sujit Kumar Chakrabarti

2017, Vol 2017, pp. 1-8
Author(s): Deqiang Fu, Yanyan Xu, Haoran Yu, Boyang Yang

In this paper, we introduce a source code plagiarism detection method for computer science education, named WASTK (Weighted Abstract Syntax Tree Kernel). Unlike other plagiarism detection methods, WASTK takes into account aspects beyond the raw similarity between programs. WASTK first converts the source code of a program into an abstract syntax tree and then obtains the similarity by calculating the tree kernel of two abstract syntax trees. To avoid misjudgment caused by trivial code snippets or frameworks given by instructors, an idea similar to TF-IDF (Term Frequency-Inverse Document Frequency) from the field of information retrieval is applied: each node in an abstract syntax tree is assigned a weight by TF-IDF. WASTK is evaluated on different datasets and performs much better than other popular methods such as Sim and JPlag.
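The idea in this abstract can be illustrated with a minimal sketch. It is not the WASTK algorithm itself: in place of the paper's tree kernel it uses a simpler weighted bag-of-subtrees cosine similarity, and it uses Python's built-in `ast` module as a stand-in parser. The function names (`subtree_signatures`, `idf_weights`, `weighted_similarity`) and the sample submissions are hypothetical. The IDF step gives weight zero to any subtree that appears in every submission, which is how instructor-provided template code stops contributing to the score.

```python
import ast
import math
from collections import Counter
from typing import Dict, List


def subtree_signatures(source: str) -> Counter:
    """Count a structural signature for every subtree in the AST of `source`."""
    counts: Counter = Counter()

    def visit(node: ast.AST) -> str:
        # A signature is the node type plus its children's signatures, so
        # identifier names and literal values do not affect it.
        children = [visit(child) for child in ast.iter_child_nodes(node)]
        signature = type(node).__name__ + "(" + ",".join(children) + ")"
        counts[signature] += 1
        return signature

    visit(ast.parse(source))
    return counts


def idf_weights(corpus: List[Counter]) -> Dict[str, float]:
    """Inverse document frequency per signature; 0.0 if it occurs everywhere."""
    doc_freq: Counter = Counter()
    for doc in corpus:
        doc_freq.update(doc.keys())
    n = len(corpus)
    return {sig: math.log(n / df) for sig, df in doc_freq.items()}


def weighted_similarity(a: Counter, b: Counter, idf: Dict[str, float]) -> float:
    """Cosine similarity of the TF-IDF weighted subtree counts of a and b."""
    def dot(x: Counter, y: Counter) -> float:
        shared = x.keys() & y.keys()
        return sum(x[s] * y[s] * idf.get(s, 0.0) ** 2 for s in shared)

    norm = math.sqrt(dot(a, a) * dot(b, b))
    return dot(a, b) / norm if norm else 0.0


# Hypothetical submissions: all three contain the same instructor template.
template = "def read_input():\n    return input()\n"
alice = template + (
    "def total(xs):\n    s = 0\n    for x in xs:\n        s += x\n    return s\n"
)
bob = template + (
    "def add_all(values):\n    acc = 0\n    for v in values:\n"
    "        acc += v\n    return acc\n"
)
carol = template + "def total(xs):\n    return sum(xs)\n"

corpus = [subtree_signatures(src) for src in (alice, bob, carol)]
idf = idf_weights(corpus)

# alice and bob differ only in identifier names, so their trees match exactly.
print(weighted_similarity(corpus[0], corpus[1], idf))
# carol shares only the template with alice, so this score is lower.
print(weighted_similarity(corpus[0], corpus[2], idf))
```

Because the signatures ignore identifiers, renaming variables does not lower the score, while the shared template is discounted by the IDF weights. A real tree kernel would also give partial credit for partially matching subtrees, which this sketch does not attempt.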


IEEE Access, 2020, Vol 8, pp. 175347-175359
Author(s): Michal Duracik, Patrik Hrkut, Emil Krsak, Stefan Toth

2021
Author(s): Shreya R. Mehta, Sneha S. Patil, Nikita S. Shirguppi, Vahida Attar

Source code summarization refers to the task of creating understandable natural language summaries from a given code snippet. Good-quality, precise source code summaries are needed by numerous companies for a plethora of reasons: training newly joined employees, understanding in brief what a newly imported project does, maintaining precise summaries of the evolution of source code (using git history), categorizing code, retrieving code, automatically generating documents, and so on. There is a considerable distinction between source code and natural language, since source code is organized and has loops, conditions, structures, classes, and so on. Most existing models follow an encoder-decoder structure. We propose an alternative approach that uses the UAST (Universal Abstract Syntax Tree) of the source code to generate tokens and then applies a Transformer model, whose self-attention mechanism, unlike RNN-based methods, helps capture long-range dependencies. We have considered Java code snippets for generating code summaries.
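The token-generation step described in this abstract can be sketched as a pre-order traversal that flattens a syntax tree into a sequence a Transformer could consume. This is an assumption-laden illustration only: the paper works on Java snippets through a UAST, whereas this sketch uses Python's built-in `ast` module as a stand-in, the function name `ast_tokens` is hypothetical, and the Transformer itself is not shown.

```python
import ast
from typing import List


def ast_tokens(source: str) -> List[str]:
    """Flatten the AST of `source` into a pre-order list of tokens."""
    tokens: List[str] = []

    def visit(node: ast.AST) -> None:
        # Emit the node type first, so structure precedes content.
        tokens.append(type(node).__name__)
        # Keep identifiers and literal values as tokens of their own.
        if isinstance(node, ast.Name):
            tokens.append(node.id)
        elif isinstance(node, ast.Constant):
            tokens.append(repr(node.value))
        elif isinstance(node, ast.FunctionDef):
            tokens.append(node.name)
        elif isinstance(node, ast.arg):
            tokens.append(node.arg)
        for child in ast.iter_child_nodes(node):
            visit(child)

    visit(ast.parse(source))
    return tokens


# Hypothetical snippet to summarize.
snippet = "def area(w, h):\n    return w * h\n"
print(ast_tokens(snippet))
```

The resulting sequence interleaves structural tokens (node types) with lexical ones (names and literals), so a sequence model sees both the shape of the code and its vocabulary.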


2005, Vol 30 (4), pp. 1-5
Author(s): Iulian Neamtiu, Jeffrey S. Foster, Michael Hicks
