Code Clone Detection and Analysis in Open Source Applications

A code clone is a portion of code that is similar to another portion in the same software, even after changes such as the removal of white space and comments, syntactic modifications, and the addition or removal of code. Over the years, many approaches and tools for code clone detection have been proposed, and most of them can detect and analyze code clones in large software systems. In this chapter, the authors present a comparative study of state-of-the-art code clone detection approaches and models, together with their corresponding tools. They then perform an empirical evaluation of a selected code clone detection tool and organize the large amount of information more systematically. The authors begin by explaining the background concepts of code clone terminology. They compare existing approaches, models, and tools to identify their strengths and weaknesses. Based on this comparison, they select a tool and evaluate it in two dimensions: the number of detected clones and the runtime performance of the tool. The study shows that various terminologies are used for code clones. In addition, the empirical evaluation suggests that the selected tool (the enhanced generic pipeline model) gives better code clone output and runtime performance than its generic counterpart.
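The definition above says that clones stay similar even when white space and comments change. A minimal sketch of that idea follows, assuming Python-style `#` comments and exact matching after normalization. It is not the chapter's tool or the pipeline model it evaluates, and the names `normalize` and `find_exact_clones` are hypothetical.

```python
import hashlib
import re
from collections import defaultdict


def normalize(fragment: str) -> str:
    """Drop '#' comments, blank lines, indentation, and repeated spaces."""
    lines = []
    for line in fragment.splitlines():
        line = re.sub(r"#.*", "", line)   # naive: also cuts '#' inside strings
        line = " ".join(line.split())     # collapse runs of white space
        if line:
            lines.append(line)
    return "\n".join(lines)


def find_exact_clones(fragments: dict[str, str]) -> list[list[str]]:
    """Group fragment names whose normalized text is identical."""
    groups = defaultdict(list)
    for name, text in fragments.items():
        digest = hashlib.sha1(normalize(text).encode("utf-8")).hexdigest()
        groups[digest].append(name)
    return [sorted(names) for names in groups.values() if len(names) > 1]
```

This only catches copies that differ in layout and comments. Clones with renamed identifiers or added and removed statements, which the definition also covers, would need token-based or tree-based comparison instead of exact hashing.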

Advances in Systems Analysis, Software Engineering, and High Performance Computing - Handbook of Research on Emerging Advancements and Technologies in Software Engineering ◽

10.4018/978-1-4666-6026-7.ch022 ◽

2014 ◽

pp. 494-509

Author(s):

Al-Fahim Mubarak-Ali ◽

Shahida Sulaiman ◽

Sharifah Mashita Syed-Mohamad ◽

Zhenchang Xing

Keyword(s):

Comparative Study ◽

Open Source ◽

Empirical Evaluation ◽

Two Dimensions ◽

Clone Detection ◽

Code Clones ◽

Time Performance ◽

Code Clone ◽

Current State ◽

Pipeline Model

Open-source tools and benchmarks for code-clone detection

ACM SIGAPP Applied Computing Review ◽

10.1145/3381307.3381310 ◽

2020 ◽

Vol 19 (4) ◽

pp. 28-39 ◽

Cited By ~ 2

Author(s):

Andrew Walker ◽

Tomas Cerny ◽

Eungee Song

Keyword(s):

Open Source ◽

Clone Detection ◽

Code Clone

Integrated Reasoning Engine for Code Clone Detection

ABC Journal of Advanced Research ◽

10.18034/abcjar.v3i2.575 ◽

2014 ◽

Vol 3 (2) ◽

pp. 143-152 ◽

Cited By ~ 5

Author(s):

Naresh Babu Bynagari

Keyword(s):

Clone Detection ◽

Code Clones ◽

High Pitch ◽

Detection Process ◽

Code Clone ◽

Similar Code ◽

Reasoning Engine

This article examines the details of integrated reasoning for code clone detection and how it can be carried out effectively, given the amount of analysis usually associated with such activities. Detecting clones requires deep familiarity with cloning systems and how they work. Hence, discovering similar code segments, often regarded as code imitations (clones), is not an easy task. In particular, this detection process can serve key purposes in vulnerability finding, refactoring, and imitation detection. The article shows that identifying identical code segments, usually described as code clones, is a serious undertaking, especially for large code bases <1; 2; 3; 4>. This sort of detection is known for certain approaches and deep technicalities. Still, drawing on the resources that underpin this article, the reader can find the simplest approach to adopt in handling such demanding issues.

Using Dynamic Time Warping to Detect Clones in Software Systems

International Journal of Software Innovation ◽

10.4018/ijsi.2021010103 ◽

2021 ◽

Vol 9 (1) ◽

pp. 20-36

Author(s):

Mostefai Abdelkader

Keyword(s):

Time Series ◽

Dynamic Time Warping ◽

Software Systems ◽

Clone Detection ◽

Code Clones ◽

Time Warping ◽

Code Clone ◽

Software Modules ◽

Software Clone Detection ◽

Dynamic Time

Software clone detection has been a widely researched area over the last two decades. Code clones are fragments of code judged similar by some metric of similarity. This paper proposes an approach for code clone detection using the dynamic time warping (DTW) technique. DTW is a well-known algorithm for aligning time series and measuring their similarity, and it has been found effective in many domains where similarity plays an important role, such as speech and gesture recognition. The proposed approach finds clones in three steps. First, software modules are extracted. Then, the extracted modules are converted into time series. Finally, the time series are compared using the DTW algorithm to find clones. The results of an experiment conducted on a well-known benchmark show that the approach can detect clones effectively in software systems.
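The three steps above can be sketched as follows. The abstract does not say how modules are converted into time series, so the encoding here (one value per non-blank line, equal to the length of the stripped line) is purely an assumption for illustration, as are the names `to_series` and `dtw_distance`. The distance function is the textbook dynamic-programming form of DTW, not necessarily the variant used in the paper.

```python
def to_series(module_source: str) -> list[int]:
    """Hypothetical encoding: one value per non-blank line (its stripped length)."""
    return [len(line.strip()) for line in module_source.splitlines() if line.strip()]


def dtw_distance(a: list[float], b: list[float]) -> float:
    """Classic O(len(a) * len(b)) dynamic time warping distance."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] is the best alignment cost of a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j],      # a[i-1] aligned with an already used b element
                cost[i][j - 1],      # b[j-1] aligned with an already used a element
                cost[i - 1][j - 1],  # both sequences advance together
            )
    return cost[n][m]
```

A small distance between two module series would mark the pair as a clone candidate. Because DTW lets one point align with several points in the other series, a module with a repeated or inserted line can still score close to the original.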

An enhanced generic pipeline model for code clone detection

2011 Malaysian Conference in Software Engineering ◽

10.1109/mysec.2011.6140712 ◽

2011 ◽

Author(s):

Al-Fahim Mubarak Ali ◽

Shahida Sulaiman ◽

Sharifah Mashita Syed-Mohamad

Keyword(s):

Clone Detection ◽

Code Clone ◽

Pipeline Model

Finding Software License Violations Through Binary Code Clone Detection - A Retrospective

ACM SIGSOFT Software Engineering Notes ◽

10.1145/3468744.3468752 ◽

2021 ◽

Vol 46 (3) ◽

pp. 24-25

Author(s):

Armijn Hemel ◽

Karl Trygve Kalleberg ◽

Rob Vermaas ◽

Eelco Dolstra

Keyword(s):

Open Source ◽

Original Problem ◽

Binary Code ◽

Source Code ◽

Clone Detection ◽

Problem Statement ◽

Open Source Code ◽

Code Clone ◽

Software License ◽

The Impact

Ten years ago, we published the article Finding software license violations through binary code clone detection at the MSR 2011 conference. Our paper was motivated by the tendency of embedded hardware vendors to only release binary blobs of their firmware, often violating the licensing terms of open-source software present inside those blobs. The techniques presented in our paper were designed to accurately identify open-source code hidden inside binary blobs. Here, we give our perspectives on the impact of our work, both industrially and academically, and revisit the original problem statement to see what has happened in the field of open-source compliance in the intervening decade.

CCEyes: An Effective Tool for Code Clone Detection on Large-Scale Open Source Repositories

2021 IEEE International Conference on Information Communication and Software Engineering (ICICSE) ◽

10.1109/icicse52190.2021.9404141 ◽

2021 ◽

Author(s):

Yanzhi Zhang ◽

Tao Wang

Keyword(s):

Open Source ◽

Large Scale ◽

Clone Detection ◽

Code Clone

Find Me if You Can: Deep Software Clone Detection by Exploiting the Contest between the Plagiarist and the Detector

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.33015813 ◽

2019 ◽

Vol 33 ◽

pp. 5813-5820

Author(s):

Yan-Ya Zhang ◽

Ming Li

Keyword(s):

Software Development ◽

Copyright Infringement ◽

Detection Methods ◽

Clone Detection ◽

Code Clones ◽

Detection Systems ◽

Code Clone ◽

Detection Approach ◽

Software Clone Detection ◽

Significant Attention

Code clones are common in software development and usually lead to software defects or copyright infringement. Researchers have paid significant attention to code clone detection, and many methods have been proposed. However, the patterns for generating code clones do not always remain the same. In order to fool clone detection systems, plagiarists, known as clone creators, usually conduct a series of tricky modifications on the code fragments to make the clones difficult to detect. Existing clone detection approaches, which neglect the dynamics of the “contest” between the plagiarist and the detector, are not robust to adversarial revision of the code. In this paper, we propose a novel clone detection approach, namely ACD, to mimic the adversarial process between the plagiarist and the detector, which enables us not only to build a strong clone detector but also to model the behavior of plagiarists. Such a plagiarist model may in turn help to understand the vulnerability of current software clone detection tools. Experiments show that the learned plagiarist policy can help us build a stronger clone detector, which outperforms existing clone detection methods.

Structural Code Clone Detection Methodology Using Software Metrics

International Journal of Software Engineering and Knowledge Engineering ◽

10.1142/s0218194016500133 ◽

2016 ◽

Vol 26 (02) ◽

pp. 307-332 ◽

Cited By ~ 5

Author(s):

Mehmet S. Aktas ◽

Mustafa Kapdan

Keyword(s):

Software Quality ◽

Software Maintenance ◽

Software Metrics ◽

User Study ◽

Quality Analysis ◽

Clone Detection ◽

Code Clones ◽

Software Projects ◽

Code Clone ◽

Structural Code

Unnecessarily repeated code, also known as code clones, is often poorly documented and difficult to maintain. Code clones may become an important problem in the software development cycle, since any detected error must be fixed in all occurrences. This significantly increases software maintenance costs and the effort and time required to understand the code. This research introduces a novel methodology to minimize or prevent the code cloning problem in software projects. In particular, this manuscript focuses on the detection of structural code clones, which are defined as similarities in software structure such as design patterns. Our proposed methodology provides a solution to the class-level structural code clone detection problem. We introduce a novel software architecture that unifies different software quality analysis tools that measure software metrics for structural code clone detection. We present an empirical evaluation of our approach and investigate its practical usefulness. We conduct a user study using human judges to detect structural code clones in three different open-source software projects. We apply our methodology to the same projects and compare the results. The results show that our proposed solution is highly consistent with the results reached by the human judges. The outcome of this study also indicates that a uniform structural code clone detection system can be built on top of different software quality tools, where each tool measures different object-oriented software metrics.
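To illustrate what class-level detection on top of software metrics might look like, here is a rough sketch that compares per-class metric vectors. The abstract names neither the metrics nor the comparison rule, so the metric names, the relative-difference similarity, and the 0.9 threshold are all assumptions made for illustration and are not the authors' methodology.

```python
from itertools import combinations


def similarity(m1: dict[str, float], m2: dict[str, float]) -> float:
    """1 minus the mean relative difference over metrics both classes report.

    Assumes non-negative metric values, so the result lies in [0, 1].
    """
    keys = sorted(set(m1) & set(m2))
    if not keys:
        return 0.0
    total = 0.0
    for key in keys:
        largest = max(abs(m1[key]), abs(m2[key]))
        total += 0.0 if largest == 0 else abs(m1[key] - m2[key]) / largest
    return 1.0 - total / len(keys)


def candidate_pairs(classes: dict[str, dict[str, float]],
                    threshold: float = 0.9) -> list[tuple[str, str]]:
    """Return class pairs whose metric vectors are at least `threshold` similar."""
    return [
        (a, b)
        for a, b in combinations(sorted(classes), 2)
        if similarity(classes[a], classes[b]) >= threshold
    ]
```

In a setup like the one described, each metric value would come from a separate quality analysis tool, and the flagged pairs would be candidates for review against human judgment rather than confirmed clones.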
