Code Clone Detection and Analysis in Open Source Applications

2015 ◽  
pp. 1951-1965
Author(s):  
Al-Fahim Mubarak-Ali ◽  
Shahida Sulaiman ◽  
Sharifah Mashita Syed-Mohamad ◽  
Zhenchang Xing

Code clone is a portion of codes that contains some similarities in the same software regardless of changes made to the specific code such as removal of white spaces and comments, changes in code syntactic, and addition or removal of code. Over the years, many approaches and tools for code clone detection have been proposed. Most of these approaches and tools have managed to detect and analyze code clones that occur in large software. In this chapter, the authors aim to provide a comparative study on current state-of-the-art in code clone detection approaches and models together with their corresponding tools. They then perform an empirical evaluation on the selected code clone detection tool and organize the large amount of information in a more systematic way. The authors begin with explaining background concepts of code clone terminology. A comparison is done to find out strengths and weaknesses of existing approaches, models, and tools. Based on the comparison done, they then select a tool to be evaluated in two dimensions, which are the amount of detected clones and run time performance of the tool. The result of the study shows that there are various terminologies used for code clone. In addition, the empirical evaluation implies that the selected tool (enhanced generic pipeline model) gives a better code clone output and runtime performance as compared to its generic counterpart.

Author(s):  
Al-Fahim Mubarak-Ali ◽  
Shahida Sulaiman ◽  
Sharifah Mashita Syed-Mohamad ◽  
Zhenchang Xing

Code clone is a portion of codes that contains some similarities in the same software regardless of changes made to the specific code such as removal of white spaces and comments, changes in code syntactic, and addition or removal of code. Over the years, many approaches and tools for code clone detection have been proposed. Most of these approaches and tools have managed to detect and analyze code clones that occur in large software. In this chapter, the authors aim to provide a comparative study on current state-of-the-art in code clone detection approaches and models together with their corresponding tools. They then perform an empirical evaluation on the selected code clone detection tool and organize the large amount of information in a more systematic way. The authors begin with explaining background concepts of code clone terminology. A comparison is done to find out strengths and weaknesses of existing approaches, models, and tools. Based on the comparison done, they then select a tool to be evaluated in two dimensions, which are the amount of detected clones and run time performance of the tool. The result of the study shows that there are various terminologies used for code clone. In addition, the empirical evaluation implies that the selected tool (enhanced generic pipeline model) gives a better code clone output and runtime performance as compared to its generic counterpart.


Author(s):  
Al-Fahim Mubarak-Ali ◽  
Shahida Sulaiman ◽  
Sharifah Mashita Syed-Mohamad ◽  
Zhenchang Xing

Code clone is a portion of codes that contains some similarities in the same software regardless of changes made to the specific code such as removal of white spaces and comments, changes in code syntactic, and addition or removal of code. Over the years, many approaches and tools for code clone detection have been proposed. Most of these approaches and tools have managed to detect and analyze code clones that occur in large software. In this chapter, the authors aim to provide a comparative study on current state-of-the-art in code clone detection approaches and models together with their corresponding tools. They then perform an empirical evaluation on the selected code clone detection tool and organize the large amount of information in a more systematic way. The authors begin with explaining background concepts of code clone terminology. A comparison is done to find out strengths and weaknesses of existing approaches, models, and tools. Based on the comparison done, they then select a tool to be evaluated in two dimensions, which are the amount of detected clones and run time performance of the tool. The result of the study shows that there are various terminologies used for code clone. In addition, the empirical evaluation implies that the selected tool (enhanced generic pipeline model) gives a better code clone output and runtime performance as compared to its generic counterpart.


2020 ◽  
Vol 19 (4) ◽  
pp. 28-39 ◽  
Author(s):  
Andrew Walker ◽  
Tomas Cerny ◽  
Eungee Song

2014 ◽  
Vol 3 (2) ◽  
pp. 143-152 ◽  
Author(s):  
Naresh Babu Bynagari

This article seeks to foray into the nitty-gritty of integrated reasoning for code clone detection and how it is effectively carried out, given the amount of analytics usually associated with such activities. Detection of codes requires high-pitch familiarity with cloning systems and their workings. Hence, discovering similar code segments that are often regarded and seen as code imitations (clone) is not an easy responsibility. More especially, this very detection process might possess key purposes in the context of susceptibility findings, refactoring, and imitation detecting. Through the voyage of discovery this article intends to expose you to, you will realize that identical code segments, more often than not described as code clones, appear to be a serious duty, especially for large code bases <1; 2; 3; 4>. There are certain approaches and deep technicalities that this sort of detection is known for. Still, from the avalanche of resources that formed the bedrock of this article, one would discover the easiest formula to adopt in maneuvering such strenuous issues.


2021 ◽  
Vol 9 (1) ◽  
pp. 20-36
Author(s):  
Mostefai Abdelkader

Software clone detection is a widely researched area over the last two decades. Code clones are fragments of code judged similar by some metric of similarity. This paper proposes an approach for code clone detection using dynamic time warping technique (i.e., DTW). DTW is a well-known algorithm for aligning and measuring similarity of time series and it has been found effective in many domains where similarity plays an important role such as speech and gesture recognition. The proposed approach finds clones in three steps. First software modules are extracted. Then, the extracted modules are turned to time series. Finally, the time series are compared using the DTW algorithm to find clones. The results of the experiment conducted on a well-known Benchmark show that the approach can detect clones effectively in software systems.


Author(s):  
Al-Fahim Mubarak Ali ◽  
Shahida Sulaiman ◽  
Sharifah Mashita Syed-Mohamad

2021 ◽  
Vol 46 (3) ◽  
pp. 24-25
Author(s):  
Armijn Hemel ◽  
Karl Trygve Kalleberg ◽  
Rob Vermaas ◽  
Eelco Dolstra

Ten years ago, we published the article Finding software license violations through binary code clone detection at the MSR 2011 conference. Our paper was motivated by the tendency of em- bedded hardware vendors to only release binary blobs of their rmware, often violating the licensing terms of open-source soft- ware present inside those blobs. The techniques presented in our paper were designed to accurately identify open-source code hid- den inside binary blobs. Here, we give our perspectives on the impact of our work, both industrially and academically, and re- visit the original problem statement to see what has happened in the eld of open-source compliance in the intervening decade.


Author(s):  
Yan-Ya Zhang ◽  
Ming Li

Code clone is common in software development, which usually leads to software defects or copyright infringement. Researchers have paid significant attention to code clone detection, and many methods have been proposed. However, the patterns for generating the code clones do not always remain the same. In order to fool the clone detection systems, the plagiarists, known as the clone creator, usually conduct a series of tricky modifications on the code fragments to make the clone difficult to detect. The existing clone detection approaches, which neglects the dynamics of the “contest” between the plagiarist and the detectors, is doomed to be not robust to adversarial revision of the code. In this paper, we propose a novel clone detection approach, namely ACD, to mimic the adversarial process between the plagiarist and the detector, which enables us to not only build strong a clone detector but also model the behavior of the plagiarists. Such a plagiarist model may in turn help to understand the vulnerability of the current software clone detection tools. Experiments show that the learned policy of plagiarist can help us build stronger clone detector, which outperforms the existing clone detection methods.


Author(s):  
Mehmet S. Aktas ◽  
Mustafa Kapdan

Unnecessary repeated codes, also known as code clones, have not been well documented and are difficult to maintain. Code clones may become an important problem in the software development cycle, since any detected error must be fixed in all occurrences. This condition significantly increases software maintenance costs and requires effort/duration for understanding the code. This research introduces a novel methodology to minimize or prevent the code cloning problem in software projects. In particular, this manuscript is focused on the detection of structural code clones, which are defined as similarity in software structure such as design patterns. Our proposed methodology provides a solution to the class-level structural code clone detection problem. We introduce a novel software architecture that provides unification of different software quality analysis tools that take measurements for software metrics for structural code clone detection. We present an empirical evaluation of our approach and investigate its practical usefulness. We conduct a user study using human judges to detect structural code clones in three different open-source software projects. We apply our methodology to the same projects and compare results. The results show that our proposed solution is able to show high consistency compared with the results reached by the human judges. The outcome of this study also indicates that a uniform structural code clone detection system can be built on top of different software quality tools, where each tool takes measurements of different object-oriented software metrics.


Sign in / Sign up

Export Citation Format

Share Document