Inter-Project Functional Clone Detection Toward Building Libraries - An Empirical Study on 13,000 Projects

Background. Today, redundancy in source code, so-called “clones”, caused by copy&paste can be found reliably using clone detection tools. Redundancy can, however, also arise independently, without copy&paste. At present, it is not clear how only functionally similar clones (FSC) differ from clones created by copy&paste. Our aim is to understand and categorise the syntactic differences in FSCs that distinguish them from copy&paste clones in a way that helps clone detection research. Methods. We conducted an experiment using known functionally similar programs in Java and C from coding contests. We analysed syntactic similarity with traditional detection tools and explored whether concolic clone detection can go beyond syntax. We ran all tools on 2,800 programs and manually categorised the differences in a random sample of 70 program pairs. Results. We found no FSCs where complete files were syntactically similar. We detected syntactic similarity in parts of the files in fewer than 16% of the program pairs. Concolic detection found 1 of the FSCs. The differences between program pairs fell into the categories algorithm, data structure, OO design, I/O, and libraries. We selected 58 pairs for an openly accessible benchmark representing these categories. Discussion. The majority of differences between functionally similar clones are beyond the capabilities of current clone detection approaches. Yet, our benchmark can help to drive further clone detection research.

An empirical study of clone detection in MATLAB/Simulink models

International Journal of Information and Communication Technology ◽

10.1504/ijict.2018.10010451 ◽

2018 ◽

Vol 13 (1) ◽

pp. 20

Author(s):

Maninder Singh ◽

Dhavleesh Rattan ◽

Rajesh Bhatia

Keyword(s):

Empirical Study ◽

Clone Detection ◽

Matlab Simulink

An empirical study of clone detection in MATLAB/Simulink models

International Journal of Information and Communication Technology ◽

10.1504/ijict.2018.090434 ◽

2018 ◽

Vol 13 (1) ◽

pp. 20

Author(s):

Dhavleesh Rattan ◽

Rajesh Bhatia ◽

Maninder Singh

Keyword(s):

Empirical Study ◽

Clone Detection ◽

Matlab Simulink

How are functionally similar code clones syntactically different? An empirical study and a benchmark

10.7287/peerj.preprints.1516v2 ◽

2016 ◽

Author(s):

Stefan Wagner ◽

Asim Abdulkhaleq ◽

Ivan Bogicevic ◽

Jan-Peter Ostberg ◽

Jasmin Ramadani

Keyword(s):

Data Structure ◽

Empirical Study ◽

Random Sample ◽

Source Code ◽

Clone Detection ◽

Code Clones ◽

Syntactic Similarity ◽

Syntactic Differences ◽

Similar Code

Background. Today, redundancy in source code, so-called “clones”, caused by copy&paste can be found reliably using clone detection tools. Redundancy can, however, also arise independently, without copy&paste. At present, it is not clear how only functionally similar clones (FSC) differ from clones created by copy&paste. Our aim is to understand and categorise the syntactic differences in FSCs that distinguish them from copy&paste clones in a way that helps clone detection research. Methods. We conducted an experiment using known functionally similar programs in Java and C from coding contests. We analysed syntactic similarity with traditional detection tools and explored whether concolic clone detection can go beyond syntax. We ran all tools on 2,800 programs and manually categorised the differences in a random sample of 70 program pairs. Results. We found no FSCs where complete files were syntactically similar. We detected syntactic similarity in parts of the files in fewer than 16% of the program pairs. Concolic detection found 1 of the FSCs. The differences between program pairs fell into the categories algorithm, data structure, OO design, I/O, and libraries. We selected 58 pairs for an openly accessible benchmark representing these categories. Discussion. The majority of differences between functionally similar clones are beyond the capabilities of current clone detection approaches. Yet, our benchmark can help to drive further clone detection research.
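
To make the distinction between copy&paste clones and only functionally similar clones concrete, here is a small sketch that compares three hypothetical Java methods with a simple token-sequence similarity. It is a generic illustration, not one of the detection tools used in the study, and it assumes a crude regex lexer, a small hypothetical keyword list, and Python's `difflib.SequenceMatcher` as the similarity measure. Normalizing identifiers lets a renamed copy score as identical, while a method that computes the same result with a different algorithm stays dissimilar.

```python
import difflib
import re

# Hypothetical, deliberately small subset of Java keywords; a real lexer would use the full list.
KEYWORDS = {"int", "for", "while", "if", "else", "return", "void"}
IDENT = re.compile(r"[A-Za-z_]\w*")


def lex(source: str) -> list[str]:
    """Crude lexer: identifiers/keywords, integer literals, and single punctuation characters."""
    return re.findall(r"[A-Za-z_]\w*|\d+|[^\s\w]", source)


def normalize(tokens: list[str]) -> list[str]:
    """Replace every non-keyword identifier with the placeholder ID, as rename-tolerant detectors do."""
    return ["ID" if IDENT.fullmatch(t) and t not in KEYWORDS else t for t in tokens]


def similarity(a: str, b: str, rename_blind: bool = True) -> float:
    """Token-sequence similarity in [0, 1] based on difflib's matching-block ratio."""
    ta, tb = lex(a), lex(b)
    if rename_blind:
        ta, tb = normalize(ta), normalize(tb)
    return difflib.SequenceMatcher(None, ta, tb).ratio()


# Hypothetical example: three Java methods that all return 1 + 2 + ... + n.
LOOP = """
int sum(int n) {
    int total = 0;
    for (int i = 1; i <= n; i++) { total += i; }
    return total;
}
"""
RENAMED = """
int sum(int n) {
    int acc = 0;
    for (int k = 1; k <= n; k++) { acc += k; }
    return acc;
}
"""
FORMULA = """
int sum(int n) {
    return n * (n + 1) / 2;
}
"""

if __name__ == "__main__":
    print("copy with renamed identifiers:", similarity(LOOP, RENAMED))
    print("same copy without normalization:", round(similarity(LOOP, RENAMED, rename_blind=False), 2))
    print("different algorithm, same function:", round(similarity(LOOP, FORMULA), 2))
```

Real detectors use proper parsers and more robust matching; the point is only that FORMULA shares the function of LOOP but little of its token structure, which corresponds to the "algorithm" category of differences described in the abstract.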

A Novel Method of Clone Detection by Neural Networks

European Journal of Engineering Research and Science ◽

10.24018/ejers.2019.4.12.1642 ◽

2019 ◽

Vol 4 (12) ◽

pp. 9-15

Author(s):

Pallavi Sharma ◽

Chetanpal Singh

Keyword(s):

Neural Network ◽

Small Scale ◽

Clone Detection ◽

Plagiarism Detection ◽

Algorithm Performance ◽

Code Fragment ◽

Code Clone ◽

Pattern Similarity ◽

Functional Clone ◽

Novel Method

Code clone detection is a technique that helps find duplicate code patterns within an entire codebase. Over the past few years, programmers have increasingly reused code to reduce development time. Code reuse can be done via replication or simply by copy-paste: instead of writing code from scratch, developers copy and paste the useful parts of existing code. Plagiarism detection also works well for finding duplicated code fragments or text, but it is not applicable to finding functional clones in large systems, and it is time-consuming even at small scale, which makes it an inappropriate detection method. In this paper, we propose pattern similarity conditions based on textual similarity for finding code or text clones in large content, using SVM, Neural Network using Java coding, Neural Network, and Sim Cad. This approach detects code or text clones of the original. The simulation was carried out in the MATLAB environment, and it shows that the approach provides better results. The performance of the proposed algorithm is measured using the false rejection rate (FRR), the false acceptance rate (FAR), and accuracy.
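
The abstract does not define its metrics. This sketch assumes the usual definitions from biometric verification, applied to code pairs: FAR is the share of non-clone pairs wrongly accepted as clones, FRR is the share of true clone pairs wrongly rejected, and accuracy is the share of all pairs classified correctly. The paper may define them differently, and the counts in the example are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PairCounts:
    """Confusion counts for a clone detector evaluated on labeled code pairs."""
    tp: int  # clone pairs reported as clones
    fn: int  # clone pairs missed (falsely rejected)
    fp: int  # non-clone pairs reported as clones (falsely accepted)
    tn: int  # non-clone pairs correctly rejected


def far(c: PairCounts) -> float:
    """False acceptance rate: FP / (FP + TN)."""
    return c.fp / (c.fp + c.tn)


def frr(c: PairCounts) -> float:
    """False rejection rate: FN / (FN + TP)."""
    return c.fn / (c.fn + c.tp)


def accuracy(c: PairCounts) -> float:
    """Fraction of all pairs classified correctly."""
    return (c.tp + c.tn) / (c.tp + c.fn + c.fp + c.tn)


if __name__ == "__main__":
    counts = PairCounts(tp=40, fn=10, fp=5, tn=45)  # hypothetical evaluation counts
    print(f"FAR={far(counts):.2f} FRR={frr(counts):.2f} accuracy={accuracy(counts):.2f}")
    # prints FAR=0.10 FRR=0.20 accuracy=0.85
```

FAR and FRR pull in opposite directions as a detector's similarity threshold moves, which is why they are usually reported together rather than accuracy alone.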

FCCA: Hybrid Code Representation for Functional Clone Detection Using Attention Networks

IEEE Transactions on Reliability ◽

10.1109/tr.2020.3001918 ◽

2020 ◽

pp. 1-15

Author(s):

Wei Hua ◽

Yulei Sui ◽

Yao Wan ◽

Guangzhong Liu ◽

Guandong Xu

Keyword(s):

Clone Detection ◽

Attention Networks ◽

Hybrid Code ◽

Functional Clone

How are functionally similar code clones syntactically different? An empirical study and a benchmark

10.7287/peerj.preprints.1516 ◽

2016 ◽

Author(s):

Stefan Wagner ◽

Asim Abdulkhaleq ◽

Ivan Bogicevic ◽

Jan-Peter Ostberg ◽

Jasmin Ramadani

Keyword(s):

Data Structure ◽

Empirical Study ◽

Random Sample ◽

Source Code ◽

Clone Detection ◽

Code Clones ◽

Syntactic Similarity ◽

Syntactic Differences ◽

Similar Code

Background. Today, redundancy in source code, so-called “clones”, caused by copy&paste can be found reliably using clone detection tools. Redundancy can, however, also arise independently, without copy&paste. At present, it is not clear how only functionally similar clones (FSC) differ from clones created by copy&paste. Our aim is to understand and categorise the syntactic differences in FSCs that distinguish them from copy&paste clones in a way that helps clone detection research. Methods. We conducted an experiment using known functionally similar programs in Java and C from coding contests. We analysed syntactic similarity with traditional detection tools and explored whether concolic clone detection can go beyond syntax. We ran all tools on 2,800 programs and manually categorised the differences in a random sample of 70 program pairs. Results. We found no FSCs where complete files were syntactically similar. We detected syntactic similarity in parts of the files in fewer than 16% of the program pairs. Concolic detection found 1 of the FSCs. The differences between program pairs fell into the categories algorithm, data structure, OO design, I/O, and libraries. We selected 58 pairs for an openly accessible benchmark representing these categories. Discussion. The majority of differences between functionally similar clones are beyond the capabilities of current clone detection approaches. Yet, our benchmark can help to drive further clone detection research.

Positive and Unlabeled Learning for Detecting Software Functional Clones with Adversarial Training

Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2018/394 ◽

2018 ◽

Cited By ~ 2

Author(s):

Hui-Hui Wei ◽

Ming Li

Keyword(s):

Software Maintenance ◽

State Of The Art ◽

Clone Detection ◽

Learning Problem ◽

Huge Number ◽

Functional Clone ◽

Positive And Unlabeled Learning ◽

Software Maintenance And Evolution ◽

Adversarial Training ◽

Software Clone Detection

Software clone detection is an important problem for software maintenance and evolution, and it has attracted much attention. However, existing approaches ignore the fact that people label pairs of code fragments as “clone” only if they happen to discover the clones, while a huge number of undiscovered clone pairs and non-clone pairs are left unlabeled. In this paper, we argue that real-world clone detection should be formalized as a Positive-Unlabeled (PU) learning problem. We address this problem by proposing a novel positive and unlabeled learning approach, namely CDPU, to effectively detect software functional clones, i.e., pieces of code with similar functionality that differ at both the syntactic and lexical levels. Adversarial training is employed to improve the robustness of the learned model to non-clone pairs that look extremely similar but behave differently. Experiments on software clone detection benchmarks indicate that the proposed approach, together with adversarial training, outperforms state-of-the-art approaches for software functional clone detection.
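
CDPU's own training procedure is not described in the abstract. To illustrate the general PU-learning idea only, the sketch below shows a different and much simpler technique, the classic calibration of Elkan and Noto (2008): a classifier g is trained to separate labeled clone pairs from unlabeled pairs, so g(x) estimates P(labeled | x). Assuming the labeled clones are a random sample of all clones, P(clone | x) = g(x) / c, where c is the mean of g over labeled clones. The classifier is not shown, and all scores are hypothetical.

```python
from statistics import fmean


def estimate_label_frequency(scores_on_labeled: list[float]) -> float:
    """Estimate c = P(labeled | clone) as the mean of g(x) over labeled clone pairs."""
    c = fmean(scores_on_labeled)
    if c <= 0.0:
        raise ValueError("label frequency must be positive")
    return c


def clone_probability(score: float, c: float) -> float:
    """Rescale g(x) ~ P(labeled | x) into P(clone | x) = g(x) / c, clipped to [0, 1]."""
    return min(1.0, score / c)


if __name__ == "__main__":
    # Hypothetical outputs of a labeled-vs-unlabeled classifier g.
    labeled_scores = [0.5, 0.7, 0.6]
    unlabeled_scores = [0.05, 0.3, 0.55]
    c = estimate_label_frequency(labeled_scores)
    for s in unlabeled_scores:
        print(f"g(x)={s:.2f} -> P(clone | x) ~ {clone_probability(s, c):.2f}")
```

The rescaling shows why treating unlabeled pairs as non-clones is biased: an unlabeled pair with a moderate score can still be a likely clone once the label frequency c is taken into account.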

Supervised Deep Features for Software Functional Clone Detection by Exploiting Lexical and Syntactical Information in Source Code

Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2017/423 ◽

2017 ◽

Cited By ~ 25

Author(s):

Huihui Wei ◽

Ming Li

Keyword(s):

Software Maintenance ◽

Feature Learning ◽

Clone Detection ◽

Source Codes ◽

Learning Framework ◽

Deep Feature ◽

Functional Clone ◽

Software Maintenance And Evolution ◽

Software Clone Detection ◽

Learning To Hash

Software clone detection, which aims to identify code fragments with similar functionality, plays an important role in software maintenance and evolution. Many clone detection approaches have been proposed. However, most of them represent source code with hand-crafted features using lexical or syntactic information, or with unsupervised deep features, which makes it difficult to detect functional clone pairs, i.e., pieces of code with similar functionality that differ at both the syntactic and lexical levels. In this paper, we address the software functional clone detection problem by learning supervised deep features. We formulate clone detection as a supervised learning-to-hash problem and propose an end-to-end deep feature learning framework called CDLH for functional clone detection. This framework learns hash codes by exploiting lexical and syntactic information, enabling fast computation of functional similarity between code fragments. Experiments on software clone detection benchmarks indicate that CDLH is effective and outperforms state-of-the-art approaches in software functional clone detection.
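
The speed argument for learning to hash rests on comparing short binary codes instead of the code fragments themselves. This sketch assumes hypothetical 8-bit codes have already been produced for each fragment (by CDLH or any other hashing model; the learning step is not shown) and ranks candidates by Hamming distance, which needs only an XOR and a bit count per pair.

```python
def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hash codes stored as integers."""
    return bin(a ^ b).count("1")


def hash_similarity(a: int, b: int, bits: int) -> float:
    """1.0 for identical codes, 0.0 for codes that differ in every bit."""
    return 1.0 - hamming(a, b) / bits


def nearest(query: int, codes: dict[str, int], k: int) -> list[tuple[str, int]]:
    """Return the k fragment names whose codes are closest to the query in Hamming distance."""
    ranked = sorted(codes.items(), key=lambda item: hamming(query, item[1]))
    return [(name, hamming(query, code)) for name, code in ranked[:k]]


if __name__ == "__main__":
    # Hypothetical 8-bit codes for three code fragments.
    codes = {"fragA": 0b10110010, "fragB": 0b10110011, "fragC": 0b01001101}
    print(nearest(0b10110010, codes, k=2))  # prints [('fragA', 0), ('fragB', 1)]
    print(hash_similarity(codes["fragA"], codes["fragC"], bits=8))  # prints 0.0
```

Because the comparison cost does not depend on fragment length, all pairwise similarities in a large corpus can be screened cheaply; the quality of the result then depends entirely on how well the learned codes reflect functional similarity.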

Individuals without jobs: An empirical study of job-seeking behavior and reemployment.

Journal of Applied Psychology ◽

10.1037/0021-9010.81.1.76 ◽

1996 ◽

Vol 81 (1) ◽

pp. 76-87 ◽

Cited By ~ 135

Author(s):

Connie R. Wanberg ◽

John D. Watt ◽

Deborah J. Rumsey

Keyword(s):

Empirical Study ◽

Job Seeking ◽

Seeking Behavior
