Detecting Source Code Plagiarism on .NET Programming Languages using Low-level Representation and Adaptive Local Alignment

Even though there are various source code plagiarism detection approaches, only a few of them focus on low-level representation for deducing similarity. Most of them focus only on lexical token sequences extracted from source code. From our point of view, low-level representation is more beneficial than lexical tokens since its form is more compact than the source code itself: it only considers semantic-preserving instructions and ignores many source code delimiter tokens. This paper proposes a source code plagiarism detection approach that relies on low-level representation. As a case study, we focus on .NET programming languages, with Common Intermediate Language as their low-level representation. In addition, we incorporate Adaptive Local Alignment for detecting similarity. According to Lim et al., this algorithm outperforms the state-of-the-art code similarity algorithm (i.e., Greedy String Tiling) in terms of effectiveness. According to our evaluation, which involves various plagiarism attacks, our approach is more effective and efficient than the standard lexical-token approach.
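
To make the alignment step concrete, here is a minimal sketch of plain Smith-Waterman local alignment over opcode sequences. It is not Lim et al.'s Adaptive Local Alignment (which adapts its scoring), and it assumes the Common Intermediate Language opcodes have already been extracted from compiled assemblies as strings. The scoring values, the normalisation, and the sample opcode streams are illustrative assumptions, not taken from the paper.

```python
# Minimal Smith-Waterman local alignment over low-level opcode sequences.
# Assumptions (not from the paper): opcodes are already extracted as strings,
# and the match/mismatch/gap scores below are illustrative defaults.


def local_alignment_score(a, b, match=2, mismatch=-1, gap=-1):
    """Return the best local alignment score between sequences a and b."""
    # h[i][j] = best score of an alignment ending at a[i-1] and b[j-1];
    # clamping at 0 lets an alignment restart anywhere (the "local" part).
    h = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    best = 0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            step = match if a[i - 1] == b[j - 1] else mismatch
            h[i][j] = max(0,
                          h[i - 1][j - 1] + step,  # align a[i-1] with b[j-1]
                          h[i - 1][j] + gap,       # gap in b
                          h[i][j - 1] + gap)       # gap in a
            best = max(best, h[i][j])
    return best


def normalized_similarity(a, b, match=2, mismatch=-1, gap=-1):
    """Scale the score into [0, 1] by the best score the shorter sequence allows."""
    if not a or not b:
        return 0.0
    score = local_alignment_score(a, b, match, mismatch, gap)
    return score / (match * min(len(a), len(b)))


if __name__ == "__main__":
    # Hypothetical opcode streams for an `a + b` method; the suspect adds a nop.
    original = ["ldarg.0", "ldarg.1", "add", "stloc.0", "ldloc.0", "ret"]
    suspect = ["nop", "ldarg.0", "ldarg.1", "add", "stloc.0", "ldloc.0", "ret"]
    print(normalized_similarity(original, suspect))  # 1.0
```

Because the alignment is local, inserted instructions outside the shared region (here a single `nop`) do not lower the score, which is one reason local alignment suits plagiarism attacks that pad code with extra statements.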

Source Code Plagiarism Detection Using Biological String Similarity Algorithms

Journal of Information & Knowledge Management ◽

10.1142/s0219649214500282 ◽

2014 ◽

Vol 13 (03) ◽

pp. 1450028 ◽

Cited By ~ 3

Author(s):

Imad Rahal ◽

Colin Wielga

Keyword(s):

Programming Languages ◽

Source Code ◽

Visual Basic ◽

Biological Sequence ◽

Large Collection ◽

Plagiarism Detection ◽

String Similarity ◽

What Works ◽

String Comparison

Source code plagiarism is easy to commit but difficult to catch. Many approaches have been proposed in the literature to automate its detection; however, there is little consensus on what works best. In this paper, we propose two new measures for determining the accuracy of a given technique and describe an approach to convert code files into strings that can then be compared for similarity in order to detect plagiarism. We then compare several string comparison techniques heavily utilised in the area of biological sequence alignment, evaluating their performance on a large collection of student source code containing various types of plagiarism. Experimental results show that the compared techniques succeed in matching a plagiarised file to its original files upwards of 90% of the time. Finally, we propose a modification to these algorithms that drastically improves their runtimes with little or no effect on accuracy. Even though the ideas presented herein are applicable to most programming languages, we focus on a case study pertaining to an introductory-level Visual Basic programming course offered at our institution.
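
As an illustration of one of the biological alignment techniques such a comparison typically includes, the sketch below computes a Needleman-Wunsch global alignment score between two strings. It assumes the code-to-string conversion has already been done (the abstract does not specify it), and the +1/-1/-1 scoring is an illustrative choice. The two-row memory layout is a standard optimisation and is not the runtime modification the authors propose.

```python
# Minimal Needleman-Wunsch global alignment between two code-derived strings.
# Assumption: the code-to-string conversion has already happened; scores are
# illustrative (+1 match, -1 mismatch, -1 gap), not the paper's settings.


def global_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Return the optimal score for aligning all of a against all of b."""
    # Keep only the previous DP row: O(len(a) * len(b)) time, O(len(b)) memory.
    prev = [j * gap for j in range(len(b) + 1)]  # a is empty: b is all gaps
    for i in range(1, len(a) + 1):
        cur = [i * gap] + [0] * len(b)           # b is empty: a is all gaps
        for j in range(1, len(b) + 1):
            step = match if a[i - 1] == b[j - 1] else mismatch
            cur[j] = max(prev[j - 1] + step,  # match or substitution
                         prev[j] + gap,        # delete a[i-1]
                         cur[j - 1] + gap)     # insert b[j-1]
        prev = cur
    return prev[-1]


if __name__ == "__main__":
    print(global_alignment_score("abc", "abd"))  # 1
```

The same function works on lists of tokens as well as on strings, since it only compares elements for equality.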

Generating Adversarial Examples for Holding Robustness of Source Code Processing Models

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i01.5469 ◽

2020 ◽

Vol 34 (01) ◽

pp. 1169-1176

Author(s):

Huangzhao Zhang ◽

Zhuo Li ◽

Ge Li ◽

Lei Ma ◽

Yang Liu ◽

...

Keyword(s):

Programming Languages ◽

State Of The Art ◽

Source Code ◽

Natural Languages ◽

Current State ◽

Automated Processing ◽

Adversarial Examples ◽

Adversarial Training ◽

New Challenges ◽

And Performance

Automated processing, analysis, and generation of source code are among the key activities in the software and system lifecycle. While deep learning (DL) exhibits a certain level of capability in handling these tasks, current state-of-the-art DL models still lack robustness and can easily be fooled by adversarial attacks. Unlike adversarial attacks on images, audio, and natural language, the structured nature of programming languages brings new challenges. In this paper, we propose a Metropolis-Hastings sampling-based identifier renaming technique, named \fullmethod (\method), which generates adversarial examples for DL models specialized for source code processing. Our in-depth evaluation on a functionality classification benchmark demonstrates the effectiveness of \method in generating adversarial examples of source code. The higher robustness and performance achieved through our adversarial training with \method further confirm the usefulness of DL-model-based methods for future fully automated source code processing.
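
The core idea of Metropolis-Hastings identifier renaming can be sketched as follows. This is a minimal illustration, not the authors' implementation: the victim model (`true_prob`), the candidate name pool, the target density, the 0.5 success threshold, and the treatment of the proposal as symmetric are all assumptions made for the example.

```python
import random


def mh_rename(tokens, identifiers, candidates, true_prob, steps=100, seed=0):
    """Metropolis-Hastings search over identifier renamings (a sketch).

    tokens      -- program as a list of tokens
    identifiers -- identifiers in `tokens` that may be renamed
    candidates  -- pool of replacement names (hypothetical)
    true_prob   -- hypothetical victim model: tokens -> P(correct label)

    Target density: pi(x) = 1 - true_prob(x), so renamings that lower the
    victim's confidence are favoured. The proposal (uniform identifier,
    uniform candidate) is treated as symmetric, which is a simplification.
    """
    rng = random.Random(seed)
    current, names = list(tokens), list(identifiers)
    for _ in range(steps):
        if true_prob(current) < 0.5:      # hypothetical success threshold
            break
        k = rng.randrange(len(names))
        new = rng.choice(candidates)
        if new in current:                # skip renamings that would clash
            continue
        # Rename every occurrence so the program keeps its semantics.
        proposal = [new if t == names[k] else t for t in current]
        pi_cur, pi_new = 1 - true_prob(current), 1 - true_prob(proposal)
        accept = 1.0 if pi_cur == 0 else min(1.0, pi_new / pi_cur)
        if rng.random() < accept:         # Metropolis-Hastings acceptance
            current, names[k] = proposal, new
    return current


if __name__ == "__main__":
    # Toy victim: confident only while the name "total" is present.
    def victim(toks):
        return 0.9 if "total" in toks else 0.3

    adv = mh_rename(["total", "=", "total", "+", "x"], ["total"],
                    ["acc", "tmp", "v"], victim)
    print(adv, victim(adv))
```

Renaming every occurrence of an identifier preserves program behaviour, so any accepted proposal remains a valid program; the sampler only decides which semantically equivalent variants are worth keeping.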

POPLMark reloaded: Mechanizing proofs by logical relations

Journal of Functional Programming ◽

10.1017/s0956796819000170 ◽

2019 ◽

Vol 29 ◽

Cited By ~ 3

Author(s):

ANDREAS ABEL ◽

GUILLAUME ALLAIS ◽

ALIYA HAMEER ◽

BRIGITTE PIENTKA ◽

ALBERTO MOMIGLIANO ◽

...

Keyword(s):

Programming Languages ◽

State Of The Art ◽

General Purpose ◽

Higher Order ◽

Lessons Learned ◽

Benchmark Problems ◽

Proof Assistants ◽

Abstract Syntax ◽

The Masses

We propose a new collection of benchmark problems in mechanizing the metatheory of programming languages, in order to compare and push the state of the art of proof assistants. In particular, we focus on proofs using logical relations (LRs) and propose establishing strong normalization of a simply typed calculus with a proof by Kripke-style LRs as a benchmark. We give a modern view of this well-understood problem by formulating our LR on well-typed terms. Using this case study, we share some of the lessons learned tackling this problem in different dependently typed proof environments. In particular, we consider the mechanization in Beluga, a proof environment that supports higher-order abstract syntax encodings, and contrast it with the development and strategies used in general-purpose proof assistants such as Coq and Agda. The goal of this paper is to engage the community in discussions on what support in proof environments is needed to truly bring mechanized metatheory to the masses and to engage said community in the crafting of future benchmarks.
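
For orientation, the textbook Kripke-style logical relation for strong normalization of the simply typed lambda calculus can be written roughly as below, with base type o and weakenings ρ from an extended context Γ' back to Γ. This is a standard formulation given as a sketch, not the paper's own definition, which may differ (for example, in how SN itself is characterized).

```latex
\begin{aligned}
\Gamma \Vdash t \in \mathcal{R}_{o}
  &\iff \Gamma \vdash t : o \;\wedge\; t \in \mathsf{SN} \\
\Gamma \Vdash t \in \mathcal{R}_{A \to B}
  &\iff \Gamma \vdash t : A \to B \;\wedge\;
     \forall \Gamma'.\, \forall (\rho : \Gamma' \leq \Gamma).\, \forall s.\;
     \Gamma' \Vdash s \in \mathcal{R}_{A} \implies \Gamma' \Vdash (t[\rho]\; s) \in \mathcal{R}_{B}
\end{aligned}
```

The quantification over context extensions is what makes the relation Kripke-style. Strong normalization then follows from two lemmas: every term in the relation at type A is strongly normalizing, and every well-typed term belongs to the relation (the fundamental lemma, proved for terms under related substitutions).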

Flowchart-Based Cross-Language Source Code Similarity Detection

Scientific Programming ◽

10.1155/2020/8835310 ◽

2020 ◽

Vol 2020 ◽

pp. 1-15

Author(s):

Feng Zhang ◽

Guofan Li ◽

Cong Liu ◽

Qian Song

Keyword(s):

Programming Languages ◽

Programming Language ◽

Computer Programming ◽

Source Code ◽

Common Source ◽

Intermediate Structure ◽

Plagiarism Detection ◽

Code Obfuscation ◽

Similarity Detection ◽

Cross Language

Source code similarity detection has various applications in code plagiarism detection and software intellectual property protection. In computer programming teaching, students may convert source code written in one programming language into another language for their assignment submissions. Existing similarity measures for source code written in the same language are not applicable to cross-language code similarity detection because of syntactic differences among programming languages. Meanwhile, existing cross-language source code similarity detection approaches are susceptible to complex code obfuscation techniques, such as replacing equivalent control structures and adding redundant statements. To solve this problem, we propose a cross-language code similarity detection (CLCSD) approach based on code flowcharts. In general, two source code fragments written in different programming languages are transformed into standardized code flowcharts (SCFCs), and their similarity is obtained by measuring the similarity of the corresponding SCFCs. More specifically, we first introduce the SCFC model as a uniform flowchart representation of source code written in different languages. SCFC is language-independent and can therefore be used as the intermediate structure for source code similarity detection. Transformation techniques are also given to convert source code written in a specific programming language into an SCFC. Second, we propose the SCFC-SPGK algorithm, based on the shortest path graph kernel, to measure the similarity between two SCFCs. Thus, the similarity between two pieces of source code in different programming languages is given by the similarity between their SCFCs. Experimental results show that, compared with existing approaches, CLCSD achieves higher accuracy in cross-language source code similarity detection. Furthermore, CLCSD can not only handle common source code obfuscation techniques used by students in computer programming teaching but also achieve nearly 90% accuracy in dealing with some complex obfuscation techniques.
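
The shortest-path graph kernel underlying SCFC-SPGK can be sketched on small labelled graphs as below. This is the generic kernel with exact matching of endpoint labels and path lengths plus cosine normalisation; the abstract does not give SCFC-SPGK's exact node labels or weighting, so the flowchart encoding (labels such as "cond" and "body") is a hypothetical stand-in.

```python
import math
from itertools import product

INF = float("inf")


def shortest_path_lengths(n, edges):
    """Floyd-Warshall over an unweighted directed graph with nodes 0..n-1."""
    d = [[0 if i == j else INF for j in range(n)] for i in range(n)]
    for u, v in edges:
        d[u][v] = 1
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d


def path_features(graph):
    """Count (source label, target label, length) triples over reachable pairs."""
    labels, edges = graph
    d = shortest_path_lengths(len(labels), edges)
    feats = {}
    for i, j in product(range(len(labels)), repeat=2):
        if i != j and d[i][j] < INF:
            key = (labels[i], labels[j], d[i][j])
            feats[key] = feats.get(key, 0) + 1
    return feats


def sp_kernel(g1, g2):
    """Shortest-path kernel with exact (delta) matching of labels and lengths."""
    f1, f2 = path_features(g1), path_features(g2)
    return sum(count * f2.get(key, 0) for key, count in f1.items())


def sp_similarity(g1, g2):
    """Cosine-normalised kernel value in [0, 1]."""
    denom = math.sqrt(sp_kernel(g1, g1) * sp_kernel(g2, g2))
    return sp_kernel(g1, g2) / denom if denom else 0.0


if __name__ == "__main__":
    # Hypothetical flowcharts: a while-loop, the same loop with nodes renumbered,
    # and a straight-line program.
    loop = (["start", "cond", "body", "end"], [(0, 1), (1, 2), (2, 1), (1, 3)])
    relabelled = (["end", "cond", "start", "body"], [(2, 1), (1, 3), (3, 1), (1, 0)])
    straight = (["start", "end"], [(0, 1)])
    print(sp_similarity(loop, relabelled))  # 1.0
    print(sp_similarity(loop, straight))    # 0.0
```

Because the kernel compares multisets of labelled shortest paths rather than node positions, it is invariant to how the nodes are numbered, which is what lets two flowcharts produced from different languages be compared directly.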

Source code plagiarism detection with low-level structural representation and information retrieval

International Journal of Computers and Applications ◽

10.1080/1206212x.2019.1589944 ◽

2019 ◽

pp. 1-11 ◽

Cited By ~ 1

Author(s):

Oscar Karnalim

Keyword(s):

Information Retrieval ◽

Source Code ◽

Plagiarism Detection ◽

Structural Representation ◽

Low Level

Model-Driven Exception Management Case Study

Handbook of Research on Software Engineering and Productivity Technologies ◽

10.4018/978-1-60566-731-7.ch012 ◽

2010 ◽

pp. 153-173

Author(s):

Susan Entwisle ◽

Sita Ramakrishnan ◽

Elizabeth Kendall

Keyword(s):

Programming Languages ◽

Fault Tolerant ◽

Exception Handling ◽

Software Systems ◽

Management Framework ◽

Low Level ◽

Model Driven ◽

Level Of Abstraction ◽

Management Case Study

Programming languages provide exception handling mechanisms to structure fault-tolerant activities within software systems. However, the use of exceptions at this low level of abstraction can be error-prone and complex, potentially leading to new programming errors. To address this, we have developed a model-driven exception management framework (DOVE). This approach is a key enabler for supporting globally distributed solution delivery teams. The focus of this paper is evaluating the feasibility of this approach through a case study known as Project Tracker. The case study is used to demonstrate feasibility and to assess the DOVE framework based on quality and productivity metrics and testing. The results of the case study are presented to demonstrate the feasibility of our approach.

Novel Code Plagiarism Detection Based on Abstract Syntax Tree and Fuzzy Petri Nets

International Journal of Engineering Education ◽

10.14710/ijee.1.1.46-56 ◽

2019 ◽

Vol 1 (1) ◽

pp. 46-56 ◽

Cited By ~ 1

Author(s):

Victor R. L. Shen

Keyword(s):

Programming Languages ◽

Source Code ◽

Learning Performance ◽

Abstract Syntax ◽

Plagiarism Detection ◽

Abstract Syntax Tree ◽

Source Codes ◽

Syntax Tree ◽

Fuzzy Petri Nets ◽

High Level

Students who major in computer science and/or engineering are required to write programs in a variety of programming languages. However, many students submit source code obtained from the Internet or from friends with few or no modifications. Detecting such plagiarism is very time-consuming, and plagiarism leads to unfair evaluation of learning performance. This paper proposes a novel method to detect source code plagiarism using a high-level fuzzy Petri net (HLFPN) based on the abstract syntax tree (AST). First, the AST of each source code file is generated after lexical and syntactic analysis. Second, a token sequence is generated from the AST. Using the AST makes it possible to detect plagiarism that changes identifiers or the order of program statements. Finally, the generated token sequences are compared with one another using an HLFPN to determine whether plagiarism has occurred. The experimental results indicate that the method makes better plagiarism determinations.
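
The AST-to-token-sequence step can be sketched with Python's standard `ast` module, used here purely for illustration since the abstract does not name the target language or token scheme. Emitting only node-type names makes the sequence insensitive to identifier renaming. This sketch does not normalise statement order, and `difflib.SequenceMatcher` is a stand-in comparator, not the HLFPN used in the paper.

```python
import ast
import difflib


def ast_token_sequence(source):
    """Node-type names of the parsed AST, in the order ast.walk yields them.

    Identifier names and literal values are not emitted, so consistently
    renaming variables leaves the sequence unchanged.
    """
    return [type(node).__name__ for node in ast.walk(ast.parse(source))]


def token_similarity(src_a, src_b):
    """Stand-in comparator (not the paper's HLFPN): difflib ratio in [0, 1]."""
    return difflib.SequenceMatcher(
        None, ast_token_sequence(src_a), ast_token_sequence(src_b)
    ).ratio()


if __name__ == "__main__":
    original = "total = 0\nfor x in items:\n    total += x\n"
    renamed = "acc = 0\nfor v in data:\n    acc += v\n"
    print(token_similarity(original, renamed))  # 1.0
```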

Python Source Code Plagiarism Attacks in Object-Oriented Environment

Computer Engineering and Applications Journal ◽

10.18495/comengapp.v6i3.217 ◽

2017 ◽

Vol 6 (3) ◽

pp. 87-84

Author(s):

Oscar Karnalim ◽

Aldi Aldiansyah

Keyword(s):

Data Structure ◽

Computer Science ◽

Prior Knowledge ◽

Programming Language ◽

Detection System ◽

Source Code ◽

Object Oriented ◽

Plagiarism Detection ◽

Science Major

Since source code plagiarism is an emerging issue in Computer Science majors and Python is a newly popular programming language, this paper aims to empirically enumerate the plagiarism attacks that might occur in Python source code. As a case study, we focus on source code plagiarism in an object-oriented environment. The results of this work are expected to serve either as an evaluation baseline or as prior knowledge for developing Python-targeted plagiarism detection systems. Based on 280 plagiarism-suspected pairs extracted from four Basic Data Structure classes, four findings can be deduced. First, there are 20 distinct Python plagiarism attacks that might occur in an object-oriented environment. Second, plagiarism attack trends in object-oriented and procedural environments are considerably similar to each other. Third, there is no need to handle plagiarism attacks in object-oriented and procedural environments separately. Last, plagiarism attacks in an object-oriented environment are more monotonous than those in a procedural environment.

What is Open Data and How to Benefit from It

Zagadnienia Informacji Naukowej - Studia Informacyjne ◽

10.36702/zin.533 ◽

2014 ◽

Vol 52 (1(103)) ◽

pp. 43-51

Author(s):

Sebastian Grabowski

Keyword(s):

Service Providers ◽

Source Code ◽

Open Data ◽

Point Of View ◽

The Internet ◽

Communication Service ◽

End User ◽

Case Study Analysis ◽

Internet Environment

Purpose/Thesis: The aim of this paper is to introduce the concept that Open Data and Open APIs provided by Communication Service Providers integrated in one end-user-oriented application may considerably improve the process of communication between people and institutions. Approach/Methods: Open Data is one of the key elements of the broad Internet ecosystem; other elements, such as open interfaces, open source, API, etc., are the assets that make the Internet environment robust, scalable, and extendable. The paper, based on the case study analysis, presents selected applications integrating the communication enablers in the form of Open APIs and Open Data sources. Results and conclusion: The combination of Open Data and functions provided by telecommunications operators in the form of Open APIs significantly improves and facilitates the processes of communication between people and institutions. Originality/Value: The author proposes to integrate Open Data with real time communication functions provided by Communication Service Providers in the form of Open APIs. Open Data and Open APIs are effective tools to create user-made environments that are convergent and coherent from the application source code point of view.

A Case Study on the Change of God Representation from the Self-psychological point of view

The Gospel and Praxis ◽

10.25309/kept.2019.8.15.199 ◽

2019 ◽

Vol 52 ◽

pp. 199-231

Author(s):

Kyoung Sun Chae ◽

Keyword(s):

Point Of View ◽

The Self ◽

Psychological Point ◽

God Representation