Open-source tools and benchmarks for code-clone detection


Finding Software License Violations Through Binary Code Clone Detection - A Retrospective

ACM SIGSOFT Software Engineering Notes ◽

10.1145/3468744.3468752 ◽

2021 ◽

Vol 46 (3) ◽

pp. 24-25

Author(s):

Armijn Hemel ◽

Karl Trygve Kalleberg ◽

Rob Vermaas ◽

Eelco Dolstra

Keyword(s):

Open Source ◽

Original Problem ◽

Binary Code ◽

Source Code ◽

Clone Detection ◽

Problem Statement ◽

Open Source Code ◽

Code Clone ◽

Software License ◽

The Impact

Ten years ago, we published the article "Finding software license violations through binary code clone detection" at the MSR 2011 conference. Our paper was motivated by the tendency of embedded hardware vendors to only release binary blobs of their firmware, often violating the licensing terms of open-source software present inside those blobs. The techniques presented in our paper were designed to accurately identify open-source code hidden inside binary blobs. Here, we give our perspectives on the impact of our work, both industrially and academically, and revisit the original problem statement to see what has happened in the field of open-source compliance in the intervening decade.


CCEyes: An Effective Tool for Code Clone Detection on Large-Scale Open Source Repositories

2021 IEEE International Conference on Information Communication and Software Engineering (ICICSE) ◽

10.1109/icicse52190.2021.9404141 ◽

2021 ◽

Author(s):

Yanzhi Zhang ◽

Tao Wang

Keyword(s):

Open Source ◽

Large Scale ◽

Clone Detection ◽

Code Clone


Code Clone Detection and Analysis in Open Source Applications

Computer Systems and Software Engineering ◽

10.4018/978-1-5225-3923-0.ch044 ◽

2017 ◽

pp. 1112-1127 ◽

Cited By ~ 2

Author(s):

Al-Fahim Mubarak-Ali ◽

Shahida Sulaiman ◽

Sharifah Mashita Syed-Mohamad ◽

Zhenchang Xing

Keyword(s):

Comparative Study ◽

Open Source ◽

Empirical Evaluation ◽

Two Dimensions ◽

Clone Detection ◽

Code Clones ◽

Time Performance ◽

Code Clone ◽

Current State ◽

Pipeline Model

A code clone is a portion of code that is similar to another portion in the same software, regardless of changes such as removal of white space and comments, syntactic changes, and addition or removal of code. Over the years, many approaches and tools for code clone detection have been proposed, and most of them can detect and analyze code clones in large software. In this chapter, the authors provide a comparative study of the current state of the art in code clone detection approaches and models, together with their corresponding tools. They then perform an empirical evaluation of a selected code clone detection tool and organize the large amount of information more systematically. The authors begin by explaining the background concepts of code clone terminology. A comparison identifies the strengths and weaknesses of existing approaches, models, and tools. Based on this comparison, they select a tool and evaluate it in two dimensions: the number of detected clones and the run-time performance of the tool. The results show that various terminologies are used for code clones. In addition, the empirical evaluation suggests that the selected tool (the enhanced generic pipeline model) gives better code clone output and run-time performance than its generic counterpart.
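The definition in this abstract (fragments that stay similar after white space and comments are removed) can be illustrated with a minimal Python sketch. It is not the chapter's pipeline model: the function names `normalize` and `find_clone_groups`, the C-style comment syntax, and the sample fragments are all assumptions made for illustration. The timing at the end mirrors the two evaluation dimensions named above, number of detected clones and run time.

```python
import re
import time
from collections import defaultdict
from typing import Dict, List


def normalize(fragment: str) -> str:
    """Strip C-style comments and all white space.

    Crude on purpose: this is not a lexer, so comment markers inside
    string literals are mishandled and adjacent tokens are fused.
    """
    no_block = re.sub(r"/\*.*?\*/", "", fragment, flags=re.DOTALL)
    no_line = re.sub(r"//[^\n]*", "", no_block)
    return re.sub(r"\s+", "", no_line)


def find_clone_groups(fragments: Dict[str, str]) -> List[List[str]]:
    """Group fragment names whose normalized text is identical."""
    buckets = defaultdict(list)
    for name, text in fragments.items():
        buckets[normalize(text)].append(name)
    return [sorted(names) for names in buckets.values() if len(names) > 1]


# Hypothetical fragments: "a" and "b" differ only in layout and a comment.
fragments = {
    "a": "int sum(int x, int y) { return x + y; }",
    "b": "int sum(int x, int y) {\n    // add the operands\n    return x + y;\n}",
    "c": "int diff(int x, int y) { return x - y; }",
}

start = time.perf_counter()
groups = find_clone_groups(fragments)
elapsed = time.perf_counter() - start
print(f"clone groups: {groups}, run time: {elapsed:.6f}s")
```

Exact matching after normalization only covers clones that differ in layout and comments. Clones with syntactic changes or added and removed code, which the definition also includes, would need token-level or tree-level comparison.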


Code Clone Detection and Analysis in Open Source Applications

Advances in Systems Analysis, Software Engineering, and High Performance Computing - Handbook of Research on Emerging Advancements and Technologies in Software Engineering ◽

10.4018/978-1-4666-6026-7.ch022 ◽

2014 ◽

pp. 494-509

Author(s):

Al-Fahim Mubarak-Ali ◽

Shahida Sulaiman ◽

Sharifah Mashita Syed-Mohamad ◽

Zhenchang Xing

Keyword(s):

Comparative Study ◽

Open Source ◽

Empirical Evaluation ◽

Two Dimensions ◽

Clone Detection ◽

Code Clones ◽

Time Performance ◽

Code Clone ◽

Current State ◽

Pipeline Model

A code clone is a portion of code that is similar to another portion in the same software, regardless of changes such as removal of white space and comments, syntactic changes, and addition or removal of code. Over the years, many approaches and tools for code clone detection have been proposed, and most of them can detect and analyze code clones in large software. In this chapter, the authors provide a comparative study of the current state of the art in code clone detection approaches and models, together with their corresponding tools. They then perform an empirical evaluation of a selected code clone detection tool and organize the large amount of information more systematically. The authors begin by explaining the background concepts of code clone terminology. A comparison identifies the strengths and weaknesses of existing approaches, models, and tools. Based on this comparison, they select a tool and evaluate it in two dimensions: the number of detected clones and the run-time performance of the tool. The results show that various terminologies are used for code clones. In addition, the empirical evaluation suggests that the selected tool (the enhanced generic pipeline model) gives better code clone output and run-time performance than its generic counterpart.


A Survey on Software Code Clone Detection to Improve the Maintenance Effort and Maintenance Cost of the Software

International Journal of Computer Sciences and Engineering ◽

10.26438/ijcse/v6si3.188192 ◽

2018 ◽

Vol 06 (03) ◽

pp. 188-192

Author(s):

V. Guna ◽

M. Sunil Kumar

Keyword(s):

Maintenance Cost ◽

Clone Detection ◽

Code Clone ◽

Software Code


Code Clone Detection Using Object Oriented Metrics

Advances in Mathematics: Scientific Journal ◽

10.37418/amsj.9.6.72 ◽

2020 ◽

Vol 9 (6) ◽

pp. 3925-3931

Author(s):

S. Sharma ◽

D. Rattan ◽

K. Singh

Keyword(s):

Object Oriented ◽

Clone Detection ◽

Code Clone ◽

Object Oriented Metrics


VGRAPH: A Robust Vulnerable Code Clone Detection System Using Code Property Triplets

2020 IEEE European Symposium on Security and Privacy (EuroS&P) ◽

10.1109/eurosp48549.2020.00012 ◽

2020 ◽

Author(s):

Benjamin Bowman ◽

H. Howie Huang

Keyword(s):

Detection System ◽

Clone Detection ◽

Code Clone ◽

Code Property


Multi-threshold token-based code clone detection

2021 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER) ◽

10.1109/saner50967.2021.00053 ◽

2021 ◽

Author(s):

Yaroslav Golubev ◽

Viktor Poletansky ◽

Nikita Povarov ◽

Timofey Bryksin

Keyword(s):

Clone Detection ◽

Code Clone


Metric level based code clone detection using optimized code manager

International Journal of Engineering & Technology ◽

10.14419/ijet.v7i2.27.13763 ◽

2018 ◽

Vol 7 (2.27) ◽

pp. 144

Author(s):

Gundeep Kaur ◽

Sumit Sharma

Keyword(s):

Semantic Analysis ◽

Performance Metrics ◽

Object Oriented Programming ◽

Software Systems ◽

Clone Detection ◽

Major Drawback ◽

Syntax Analysis ◽

Code Clone ◽

SVM Algorithm ◽

Optimized Code

Object-oriented programming is today the main paradigm in typical software development. Code cloning occurs commonly throughout the design and development of software systems. Detection can be based on textual, lexical, syntax, semantic, hybrid, or metric analysis. The major drawback of present research is that it focuses on fragments of copied code and ignores the possibility that those fragments may be part of a larger replicated program structure. As a result, existing techniques take a lot of time and add complexity. In our research, source code is scanned to detect methods using an "optimized SVM algorithm"; the method definitions are extracted, collected as clone code, and saved for further reference. To evaluate performance, we calculate the LOC, the number of repetitions, and the maximum and minimum LOC. The goal is to improve the performance metrics (precision, recall, and accuracy) and to reduce the error rate and time complexity.
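The size measures named in this abstract (LOC, number of repetitions, maximum and minimum LOC) can be sketched in Python as follows. This is only an illustration under stated assumptions: it does not reproduce the paper's SVM step, it assumes the method bodies have already been extracted, and the names `loc`, `clone_size_metrics`, and the sample methods are hypothetical. Two bodies are treated as repetitions when they match after trimming each line and dropping blank lines, which is an assumed criterion, not the authors'.

```python
from collections import Counter
from typing import Dict


def loc(body: str) -> int:
    """Count the non-blank lines of a method body."""
    return sum(1 for line in body.splitlines() if line.strip())


def clone_size_metrics(methods: Dict[str, str]) -> Dict[str, int]:
    """Summarize LOC and repetition counts for extracted method bodies.

    Assumes at least one method is supplied.
    """
    # Canonical form: each line trimmed, blank lines dropped.
    canonical = {
        name: "\n".join(l.strip() for l in body.splitlines() if l.strip())
        for name, body in methods.items()
    }
    counts = Counter(canonical.values())
    sizes = [loc(body) for body in methods.values()]
    return {
        "total_loc": sum(sizes),
        "max_loc": max(sizes),
        "min_loc": min(sizes),
        # Number of methods whose body occurs more than once.
        "repetitions": sum(n for n in counts.values() if n > 1),
    }


# Hypothetical extracted methods: the two `area` bodies differ only in layout.
methods = {
    "Foo.area": "w = self.w\nh = self.h\nreturn w * h",
    "Bar.area": "  w = self.w\n  h = self.h\n\n  return w * h",
    "Baz.name": "return self._name",
}

print(clone_size_metrics(methods))
```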
