Open-source tools and benchmarks for code-clone detection

2020 ◽  
Vol 19 (4) ◽  
pp. 28-39 ◽  
Author(s):  
Andrew Walker ◽  
Tomas Cerny ◽  
Eungee Song

2015 ◽  
pp. 1951-1965
Author(s):  
Al-Fahim Mubarak-Ali ◽  
Shahida Sulaiman ◽  
Sharifah Mashita Syed-Mohamad ◽  
Zhenchang Xing

A code clone is a portion of code that is similar to another portion in the same software, regardless of changes such as the removal of white space and comments, syntactic changes, and the addition or removal of code. Over the years, many approaches and tools for code clone detection have been proposed. Most of them can detect and analyze code clones in large software systems. In this chapter, the authors provide a comparative study of the current state of the art in code clone detection approaches and models, together with their corresponding tools. They then perform an empirical evaluation of a selected code clone detection tool and organize the large amount of information more systematically. The authors begin by explaining the background concepts of code clone terminology. A comparison identifies the strengths and weaknesses of existing approaches, models, and tools. Based on this comparison, they select a tool and evaluate it along two dimensions: the number of detected clones and the tool's run-time performance. The study shows that various terminologies are used for code clones. In addition, the empirical evaluation suggests that the selected tool (the enhanced generic pipeline model) gives better code clone output and run-time performance than its generic counterpart.
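
The idea that two fragments remain clones after white space and comments are removed can be illustrated with a small sketch. This is not the chapter's pipeline model; it is a naive text-based check, and the function names `normalize` and `are_exact_clones` are hypothetical. It assumes C-style comments and ignores the fact that comment markers may appear inside string literals.

```python
import re


def normalize(fragment: str) -> str:
    """Strip C-style comments and all white space from a code fragment.

    Assumption: comment markers never occur inside string literals.
    """
    # Remove /* ... */ block comments, including multi-line ones.
    no_block = re.sub(r"/\*.*?\*/", "", fragment, flags=re.DOTALL)
    # Remove // comments up to the end of the line.
    no_line = re.sub(r"//[^\n]*", "", no_block)
    # Drop every white-space character so layout differences vanish.
    return re.sub(r"\s+", "", no_line)


def are_exact_clones(a: str, b: str) -> bool:
    """Return True if the fragments are identical after normalization."""
    return normalize(a) == normalize(b)


if __name__ == "__main__":
    first = "int sum = a + b; // add\n"
    second = "int   sum=a+b;  /* add */"
    print(are_exact_clones(first, second))
```

Fragments that differ in identifiers or that have added or removed statements would not match under this check; detecting those requires lexical, syntactic, or semantic techniques of the kind the chapter compares.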


2021 ◽  
Vol 46 (3) ◽  
pp. 24-25
Author(s):  
Armijn Hemel ◽  
Karl Trygve Kalleberg ◽  
Rob Vermaas ◽  
Eelco Dolstra

Ten years ago, we published the article "Finding software license violations through binary code clone detection" at the MSR 2011 conference. Our paper was motivated by the tendency of embedded hardware vendors to only release binary blobs of their firmware, often violating the licensing terms of open-source software present inside those blobs. The techniques presented in our paper were designed to accurately identify open-source code hidden inside binary blobs. Here, we give our perspectives on the impact of our work, both industrially and academically, and revisit the original problem statement to see what has happened in the field of open-source compliance in the intervening decade.


2020 ◽  
Vol 9 (6) ◽  
pp. 3925-3931
Author(s):  
S. Sharma ◽  
D. Rattan ◽  
K. Singh

2018 ◽  
Vol 7 (2.27) ◽  
pp. 144
Author(s):  
Gundeep Kaur ◽  
Sumit Sharma

Object-oriented programming is today the main paradigm in typical software development. Code cloning occurs throughout the design and development of software systems. Detection can be based on textual, lexical, syntactic, semantic, hybrid, or metric analysis. The major drawback of present research is that it focuses on fragments of copied code and not on the possibility that duplicated fragments may be part of a larger replicated program structure. Such techniques also take a lot of time and add complexity. In our research, source code is scanned to detect its methods using an "OPTIMIZED SVM ALGORITHM"; the method definitions are extracted, collected by means of a CLONE CODE, and saved for further reference. To evaluate performance, we calculate the LOC, the number of repetitions, and the maximum and minimum LOC. The aim is to improve the performance metrics (precision, recall, and accuracy) and to reduce the error rate and time complexity.
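
The quantities named in this abstract (LOC, maximum and minimum LOC, precision, recall, accuracy) can be computed as in the sketch below. It uses the standard definitions of these metrics and is not the authors' implementation; the function names are hypothetical, LOC is assumed to mean non-blank lines, and the SVM step is not reproduced.

```python
def loc(fragment: str) -> int:
    """Count non-blank lines (an assumed definition of LOC)."""
    return sum(1 for line in fragment.splitlines() if line.strip())


def loc_summary(methods: list[str]) -> dict[str, int]:
    """Total, maximum, and minimum LOC over a non-empty list of methods."""
    counts = [loc(m) for m in methods]
    return {
        "total_loc": sum(counts),
        "max_loc": max(counts),
        "min_loc": min(counts),
    }


def precision_recall_accuracy(tp: int, fp: int, fn: int, tn: int):
    """Standard metrics from true/false positive and negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    return precision, recall, accuracy


if __name__ == "__main__":
    # Two made-up method bodies, used only to exercise the functions.
    extracted = ["int a = 1;\nreturn a;\n\n", "return 0;"]
    print(loc_summary(extracted))
    print(precision_recall_accuracy(tp=8, fp=2, fn=2, tn=8))
```

Here a true positive would be a reported clone pair that is a real clone; how the ground truth is obtained is left open, since the abstract does not say.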

