Supervised Deep Features for Software Functional Clone Detection by Exploiting Lexical and Syntactical Information in Source Code

Software clone detection, which aims to identify code fragments with similar functionality, plays an important role in software maintenance and evolution. Many clone detection approaches have been proposed. However, most of them represent source code with hand-crafted features based on lexical or syntactical information, or with unsupervised deep features, which makes it difficult to detect functional clone pairs, i.e., pieces of code with similar functionality that differ at both the syntactical and lexical levels. In this paper, we address the software functional clone detection problem by learning supervised deep features. We formulate clone detection as a supervised learning-to-hash problem and propose an end-to-end deep feature learning framework called CDLH for functional clone detection. The framework learns hash codes by exploiting lexical and syntactical information, enabling fast computation of functional similarity between code fragments. Experiments on software clone detection benchmarks indicate that CDLH is effective and outperforms state-of-the-art approaches in software functional clone detection.
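To illustrate why hash codes allow fast similarity computation, here is a minimal sketch that compares binary codes by Hamming distance. It does not reproduce CDLH itself: the function names, the 8-bit codes, and the threshold are all assumptions made for illustration, and the learned network that would produce the codes is omitted.

```python
def hamming_distance(code_a: int, code_b: int) -> int:
    """Count the bits in which two hash codes (stored as integers) differ."""
    return bin(code_a ^ code_b).count("1")


def is_clone_candidate(code_a: int, code_b: int, threshold: int) -> bool:
    """Flag a pair as a candidate clone when the codes differ in at most `threshold` bits."""
    return hamming_distance(code_a, code_b) <= threshold


# Hypothetical 8-bit codes for three code fragments (not taken from the paper).
frag_a = 0b10110010
frag_b = 0b10110110  # differs from frag_a in a single bit
frag_c = 0b01001101  # differs from frag_a in every bit

print(hamming_distance(frag_a, frag_b))
print(hamming_distance(frag_a, frag_c))
print(is_clone_candidate(frag_a, frag_b, threshold=2))
```

Because the comparison reduces to an XOR and a bit count, scoring a pair costs a handful of machine operations regardless of how long the original fragments are.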

Positive and Unlabeled Learning for Detecting Software Functional Clones with Adversarial Training

Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2018/394 ◽

2018 ◽

Cited By ~ 2

Author(s):

Hui-Hui Wei ◽

Ming Li

Keyword(s):

Software Maintenance ◽

State Of The Art ◽

Clone Detection ◽

Learning Problem ◽

Huge Number ◽

Functional Clone ◽

Positive And Unlabeled Learning ◽

Software Maintenance And Evolution ◽

Adversarial Training ◽

Software Clone Detection

Software clone detection is an important problem in software maintenance and evolution and has attracted considerable attention. However, existing approaches ignore the fact that people label a pair of code fragments as a clone only if they happen to discover it, while a huge number of undiscovered clone pairs and non-clone pairs are left unlabeled. In this paper, we argue that the real-world clone detection task should be formalized as a Positive-Unlabeled (PU) learning problem. We address it with a novel positive and unlabeled learning approach, CDPU, that detects software functional clones, i.e., pieces of code with similar functionality that differ at both the syntactical and lexical levels. Adversarial training is employed to make the learned model more robust to non-clone pairs that look extremely similar but behave differently. Experiments on software clone detection benchmarks indicate that the proposed approach, together with adversarial training, outperforms state-of-the-art approaches for software functional clone detection.
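As a rough illustration of what training from positive and unlabeled pairs can look like, the sketch below computes a non-negative PU risk, a commonly used general-purpose PU objective. This is not claimed to be the CDPU objective: the abstract does not specify its loss, and the function names, the sigmoid loss, and the class prior `prior` are assumptions made for this example.

```python
import math


def sigmoid_loss(score: float, label: int) -> float:
    """Sigmoid loss: small when the sign of `score` agrees with `label` (+1 or -1)."""
    return 1.0 / (1.0 + math.exp(label * score))


def nn_pu_risk(pos_scores, unl_scores, prior):
    """Non-negative PU risk from scores on labeled-positive and unlabeled pairs.

    `prior` is the assumed fraction of true clones among all pairs.
    """
    # Risk of positives when treated as positive and as negative.
    risk_p_pos = sum(sigmoid_loss(s, +1) for s in pos_scores) / len(pos_scores)
    risk_p_neg = sum(sigmoid_loss(s, -1) for s in pos_scores) / len(pos_scores)
    # Risk of unlabeled pairs when treated as negative.
    risk_u_neg = sum(sigmoid_loss(s, -1) for s in unl_scores) / len(unl_scores)
    # Estimated negative-class risk, clipped at zero to avoid a negative estimate.
    neg_risk = risk_u_neg - prior * risk_p_neg
    return prior * risk_p_pos + max(0.0, neg_risk)


# Hypothetical classifier scores: higher means "more likely a clone".
labeled_clone_scores = [2.1, 1.4, 0.3]
unlabeled_scores = [1.8, -0.7, -1.5, -2.2]
print(nn_pu_risk(labeled_clone_scores, unlabeled_scores, prior=0.3))
```

The key idea is that unlabeled pairs are not treated as confirmed non-clones: the contribution of hidden positives is subtracted out using the positive sample and the class prior.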

Software Clone Detection and Refactoring

ISRN Software Engineering ◽

10.1155/2013/129437 ◽

2013 ◽

Vol 2013 ◽

pp. 1-8 ◽

Cited By ~ 7

Author(s):

Francesca Arcelli Fontana ◽

Marco Zanoni ◽

Andrea Ranchetti ◽

Davide Ranchetti

Keyword(s):

Software Quality ◽

Software Maintenance ◽

Quality Metrics ◽

Clone Detection ◽

Software Maintenance And Evolution ◽

Software Clones ◽

Software Clone Detection ◽

Points Of View ◽

Software Quality Metrics ◽

The Impact

Several studies on software clones have been proposed in the literature, from different points of view and covering many correlated features and areas that are particularly relevant to software maintenance and evolution. In this paper, we describe our experience with clone detection using three different tools and investigate the impact of clone refactoring on different software quality metrics.

Local deep feature learning framework for 3D shape

Computers & Graphics ◽

10.1016/j.cag.2014.09.007 ◽

2015 ◽

Vol 46 ◽

pp. 117-129 ◽

Cited By ~ 16

Author(s):

Shuhui Bu ◽

Pengcheng Han ◽

Zhenbao Liu ◽

Junwei Han ◽

Hongwei Lin

Keyword(s):

Feature Learning ◽

3D Shape ◽

Learning Framework ◽

Deep Feature ◽

Deep Feature Learning

Cam-softmax for discriminative deep feature learning

2020 25th International Conference on Pattern Recognition (ICPR) ◽

10.1109/icpr48806.2021.9412895 ◽

2021 ◽

Author(s):

Tamas Suveges ◽

Stephen McKenna

Keyword(s):

Feature Learning ◽

Deep Feature ◽

Deep Feature Learning

Multiview Deep Feature Learning Network for SAR Automatic Target Recognition

Remote Sensing ◽

10.3390/rs13081455 ◽

2021 ◽

Vol 13 (8) ◽

pp. 1455

Author(s):

Jifang Pei ◽

Weibo Huo ◽

Chenwei Wang ◽

Yulin Huang ◽

Yin Zhang ◽

...

Keyword(s):

Target Recognition ◽

Feature Learning ◽

Automatic Target Recognition ◽

Operating Conditions ◽

Sar Images ◽

Single View ◽

Learning Network ◽

Deep Feature ◽

Deep Feature Learning ◽

Classification Information

Multiview synthetic aperture radar (SAR) images contain much richer information for automatic target recognition (ATR) than a single-view image. It is desirable to establish a reasonable multiview ATR scheme and design an effective ATR algorithm that thoroughly learns and extracts this classification information, so that superior SAR ATR performance can be achieved. Hence, this paper first gives a general processing framework applicable to multiview SAR ATR, which can provide an effective approach to ATR system design. Then, a new ATR method using a multiview deep feature learning network is designed based on the proposed multiview ATR framework. The proposed neural network has a multiple-input parallel topology and several distinct deep feature learning modules, with which the significant classification features, namely the intra-view and inter-view features in the input multiview SAR images, are learned simultaneously and thoroughly. As a result, the proposed multiview deep feature learning network can achieve excellent SAR ATR performance. Experimental results show the superiority of the proposed multiview SAR ATR method under various operating conditions.

Augmenting Bug Localization with Part-of-Speech and Invocation

International Journal of Software Engineering and Knowledge Engineering ◽

10.1142/s0218194017500346 ◽

2017 ◽

Vol 27 (06) ◽

pp. 925-949 ◽

Cited By ~ 5

Author(s):

Yu Zhou ◽

Yanxiang Tong ◽

Taolue Chen ◽

Jin Han

Keyword(s):

Software Maintenance ◽

Large Scale ◽

Bug Localization ◽

Bug Reports ◽

Part Of Speech ◽

Adaptive Technique ◽

Bug Report ◽

Software Maintenance And Evolution ◽

Speech Features ◽

Localization Approach

Bug localization is one of the most expensive and time-consuming activities during software maintenance and evolution. To alleviate the workload of developers, numerous methods have been proposed to automate this process and narrow down the scope of reviewing buggy files. In this paper, we present a novel buggy source-file localization approach that uses information from both the bug reports and the source files. We leverage the part-of-speech features of bug reports and the invocation relationship among source files. We also integrate an adaptive technique to further optimize the performance of the approach. The adaptive technique discriminates between Top 1 and Top N recommendations for a given bug report and consists of two modules. One module maximizes the accuracy of the first recommended file, and the other aims at improving the accuracy of the fixed defect file list. We evaluate our approach on six large-scale open source projects: AspectJ, Eclipse, SWT, ZXing, Birt and Tomcat. Compared to previous work, empirical results show that our approach can improve the overall prediction performance in all of these cases. In particular, in terms of the Top 1 recommendation accuracy, our approach achieves an enhancement from 22.73% to 39.86% for AspectJ, from 24.36% to 30.76% for Eclipse, from 31.63% to 46.94% for SWT, from 40% to 55% for ZXing, from 7.97% to 21.99% for Birt, and from 33.37% to 38.90% for Tomcat.
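For readers unfamiliar with the Top 1 and Top N metrics mentioned above, the following sketch shows one common way to compute them: a bug report counts as a hit if at least one truly buggy file appears among the first N recommended files. The function name, the report identifiers, and the file names are hypothetical, and the paper's exact evaluation protocol may differ.

```python
def top_n_accuracy(rankings, buggy_files, n):
    """Fraction of bug reports with at least one buggy file in the top `n` recommendations.

    rankings: maps a report id to its ranked list of recommended files.
    buggy_files: maps a report id to the set of files actually fixed for that report.
    """
    hits = 0
    for report_id, ranked in rankings.items():
        if any(path in buggy_files[report_id] for path in ranked[:n]):
            hits += 1
    return hits / len(rankings)


# Hypothetical recommendations and ground truth for two bug reports.
rankings = {
    "bug-1": ["a.java", "b.java", "c.java"],
    "bug-2": ["d.java", "e.java", "f.java"],
}
buggy_files = {
    "bug-1": {"b.java"},
    "bug-2": {"z.java"},
}

print(top_n_accuracy(rankings, buggy_files, n=1))
print(top_n_accuracy(rankings, buggy_files, n=2))
```

Top 1 accuracy is the strictest case of this measure, since only the first recommended file can produce a hit.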
