Enhancing the Software Clone Detection in BigCloneBench

2021 ◽  
Vol 12 (3) ◽  
pp. 17-31
Author(s):  
Amandeep Kaur ◽  
Munish Saini

In the software system, the code snippets that are copied and pasted in the same software or another software result in cloning. The basic cause of cloning is either a programmer‘s constraint or language constraints. An increase in the maintenance cost of software is the major drawback of code clones. So, clone detection techniques are required to remove or refactor the code clone. Recent studies exhibit the abstract syntax tree (AST) captures the structural information of source code appropriately. Many researchers used tree-based convolution for identifying the clone, but this technique has certain drawbacks. Therefore, in this paper, the authors propose an approach that finds the semantic clone through square-based convolution by taking abstract syntax representation of source code. Experimental results show the effectiveness of the approach to the popular BigCloneBench benchmark.

Clone is the software code snippets that are similar to each other with little modifications. There is a 10-20 percent clone mostly present in the software. Many techniques are developed for detection. With the code clone detection, the software developer gets an idea of removing, refactoring the clone. Code clone has both advantages and disadvantages in the particular software. In this paper, we explore the types of code clones, advantages, and disadvantages, the reason for cloning. Typically, this paper describes various techniques by using several parameters. Lastly, we discuss gaps in the research.


2018 ◽  
Vol 7 (2.27) ◽  
pp. 144
Author(s):  
Gundeep Kaur ◽  
Sumit Sharma

Object-oriented programming today, is the main prototype in typical software development. Code Cloning defines generally, all through the designing and development of software systems. Detection can be based on Textual analysis, Lexical analysis, Syntax analysis, Semantic analysis, Hybrid analysis and Metric analysis. The major drawback of the present research is that it focuses more on fragments of copied code and does not focus on the aspect that the fragments of duplicated code are may be part of a larger replicated program structure. In this process, techniques take a lot of time and it creates complexity. In our research, a source code is then scanned for detecting various methods by adopting a “OPTIMIZED SVM ALGORITHM” and the method definitions are extracted and collected by means of a CLONE CODE and saved for further reference. To evaluate the performance parameters we calculate the LOC, the number of repetitions, and maximum and minimum LOC. To enhance the performance metrics precision recall, accuracy and reduce the error rate and time complexity  


2014 ◽  
Vol 3 (2) ◽  
pp. 143-152 ◽  
Author(s):  
Naresh Babu Bynagari

This article seeks to foray into the nitty-gritty of integrated reasoning for code clone detection and how it is effectively carried out, given the amount of analytics usually associated with such activities. Detection of codes requires high-pitch familiarity with cloning systems and their workings. Hence, discovering similar code segments that are often regarded and seen as code imitations (clone) is not an easy responsibility. More especially, this very detection process might possess key purposes in the context of susceptibility findings, refactoring, and imitation detecting. Through the voyage of discovery this article intends to expose you to, you will realize that identical code segments, more often than not described as code clones, appear to be a serious duty, especially for large code bases <1; 2; 3; 4>. There are certain approaches and deep technicalities that this sort of detection is known for. Still, from the avalanche of resources that formed the bedrock of this article, one would discover the easiest formula to adopt in maneuvering such strenuous issues.


2021 ◽  
Vol 9 (1) ◽  
pp. 20-36
Author(s):  
Mostefai Abdelkader

Software clone detection is a widely researched area over the last two decades. Code clones are fragments of code judged similar by some metric of similarity. This paper proposes an approach for code clone detection using dynamic time warping technique (i.e., DTW). DTW is a well-known algorithm for aligning and measuring similarity of time series and it has been found effective in many domains where similarity plays an important role such as speech and gesture recognition. The proposed approach finds clones in three steps. First software modules are extracted. Then, the extracted modules are turned to time series. Finally, the time series are compared using the DTW algorithm to find clones. The results of the experiment conducted on a well-known Benchmark show that the approach can detect clones effectively in software systems.


2017 ◽  
Vol 2017 ◽  
pp. 1-8 ◽  
Author(s):  
Deqiang Fu ◽  
Yanyan Xu ◽  
Haoran Yu ◽  
Boyang Yang

In this paper, we introduce a source code plagiarism detection method, named WASTK (Weighted Abstract Syntax Tree Kernel), for computer science education. Different from other plagiarism detection methods, WASTK takes some aspects other than the similarity between programs into account. WASTK firstly transfers the source code of a program to an abstract syntax tree and then gets the similarity by calculating the tree kernel of two abstract syntax trees. To avoid misjudgment caused by trivial code snippets or frameworks given by instructors, an idea similar to TF-IDF (Term Frequency-Inverse Document Frequency) in the field of information retrieval is applied. Each node in an abstract syntax tree is assigned a weight by TF-IDF. WASTK is evaluated on different datasets and, as a result, performs much better than other popular methods like Sim and JPlag.


IEEE Access ◽  
2020 ◽  
Vol 8 ◽  
pp. 175347-175359
Author(s):  
Michal Duracik ◽  
Patrik Hrkut ◽  
Emil Krsak ◽  
Stefan Toth

Sign in / Sign up

Export Citation Format

Share Document