Finding Code-Clone Snippets in Large Source-Code Collection by ccgrep

One pass preprocessing for token-based source code clone detection

2014 IEEE 6th International Conference on Awareness Science and Technology (iCAST) ◽

10.1109/icawst.2014.6981824 ◽

2014 ◽

Cited By ~ 1

Author(s):

Dingkun Li ◽

Minghao Piao ◽

Ho Sun Shon ◽

Keun Ho Ryu ◽

Incheon Paik

Keyword(s):

Source Code ◽

Clone Detection ◽

Code Clone


An efficient new multi-language clone detection approach from large source code

2012 IEEE International Conference on Systems, Man, and Cybernetics (SMC) ◽

10.1109/icsmc.2012.6377848 ◽

2012 ◽

Cited By ~ 1

Author(s):

Saif Ur Rehman ◽

Kamran Khan ◽

Simon Fong ◽

Robert Biuk-Aghai

Keyword(s):

Source Code ◽

Clone Detection ◽

Large Source ◽

Detection Approach


ExPort: Detecting and visualizing API usages in large source code repositories

2013 28th IEEE/ACM International Conference on Automated Software Engineering (ASE) ◽

10.1109/ase.2013.6693127 ◽

2013 ◽

Cited By ~ 27

Author(s):

Evan Moritz ◽

Mario Linares-Vasquez ◽

Denys Poshyvanyk ◽

Mark Grechanik ◽

Collin McMillan ◽

...

Keyword(s):

Source Code ◽

Large Source


An Algorithm for Reducing the Size of Finite Element Closed-Form Source Code Files

Volume 13: New Developments in Simulation Methods and Software for Engineering Applications; Safety Engineering, Risk Analysis and Reliability Methods; Transportation Systems ◽

10.1115/imece2009-10391 ◽

2009 ◽

Author(s):

Sara McCaslin ◽

Kent Lawrence

Keyword(s):

Finite Element Analysis ◽

Finite Element ◽

Closed Form ◽

Source Code ◽

Simple Algorithm ◽

Higher Order ◽

Error Estimator ◽

Element Analysis ◽

Closed Form Solutions ◽

Large Source

Closed-form solutions, as opposed to numerically integrated solutions, can now be obtained for many problems in engineering. In the area of finite element analysis, researchers have been able to demonstrate the efficiency of closed-form solutions when compared to numerical integration for elements such as straight-sided triangular [1] and tetrahedral elements [2, 3]. With higher order elements, however, the length of the resulting expressions is excessive. When these expressions are implemented in finite element applications as source code files, the files can become very large, causing the compiler to hit line-length and line-continuation limits. This paper discusses a simple algorithm for the reduction of large source code files in which duplicate terms are replaced through the use of an adaptive dictionary. The importance of this algorithm lies in its ability to produce manageable source code files that can be used to improve efficiency in the element generation step of higher order finite element analysis. The algorithm is applied to Fortran files developed for the implementation of closed-form element stiffness and error estimator expressions for straight-sided tetrahedral finite elements through the fourth order. Reductions in individual source code file size by as much as 83% are demonstrated.
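The idea of replacing duplicate terms through a dictionary that grows as terms are found can be illustrated with a minimal sketch. This is not the authors' algorithm, which the abstract does not specify: the function name `compress`, the `t1, t2, ...` naming scheme, and the choice of innermost parenthesised groups as the unit of duplication are all assumptions made for illustration, and the sketch works on plain expression strings rather than Fortran files.

```python
import re
from collections import Counter

# Innermost parenthesised group: contains no nested parentheses.
INNER = re.compile(r"\([^()]+\)")

def compress(expr, prefix="t"):
    """Replace repeated innermost parenthesised terms with short names.

    Returns (definitions, reduced_expr). `definitions` maps each new name
    to the term it stands for, in order of creation.
    Assumption: `expr` does not already use names such as t1, t2, ...
    """
    definitions = {}
    while True:
        counts = Counter(INNER.findall(expr))
        repeated = [term for term, n in counts.items() if n > 1]
        if not repeated:
            return definitions, expr
        for term in repeated:
            # The dictionary grows as new duplicate terms are discovered.
            name = f"{prefix}{len(definitions) + 1}"
            definitions[name] = term
            expr = expr.replace(term, name)

defs, reduced = compress("(x1-x2)*(y1-y2) + (x1-x2)*(z1-z2)")
print(defs)     # {'t1': '(x1-x2)'}
print(reduced)  # t1*(y1-y2) + t1*(z1-z2)
```

Because each pass removes at least one pair of parentheses, the loop terminates, and terms that become duplicates only after an earlier substitution are picked up on a later pass.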


Semantically Enhanced Code Clone Refinement Algorithm Based on Analysis of Multiple Detection Reports

Journal of Advanced Computational Intelligence and Intelligent Informatics ◽

10.20965/jaciii.2011.p0322 ◽

2011 ◽

Vol 15 (3) ◽

pp. 322-328

Author(s):

Ricardo Sotolongo ◽

Fangyan Dong ◽

Kaoru Hirota

Keyword(s):

Semantic Information ◽

Semantic Analysis ◽

Source Code ◽

Size Reduction ◽

Code Clones ◽

Detection Algorithms ◽

Code Clone ◽

Multiple Detection ◽

Semantically Enhanced ◽

Refinement Algorithm

An algorithm for refining code clones is proposed, based on semantic analysis of multiple detection tools' reports using WordNet. It parses the reports of different detection tools looking for new clone specifications, and refines the location of existing ones using semantic information contained in the source code. Applied to a real and complex software system and compared with three other well-known detection algorithms, it discovers 4888 more clone pairs than the average detected by the other tools and makes the code clones 3 lines longer (for a subset of the same system the results are proportional to the size reduction). The objective is to provide a larger number of code clones with more appropriate localization for use in refactoring processes.


Finding Software License Violations Through Binary Code Clone Detection - A Retrospective

ACM SIGSOFT Software Engineering Notes ◽

10.1145/3468744.3468752 ◽

2021 ◽

Vol 46 (3) ◽

pp. 24-25

Author(s):

Armijn Hemel ◽

Karl Trygve Kalleberg ◽

Rob Vermaas ◽

Eelco Dolstra

Keyword(s):

Open Source ◽

Original Problem ◽

Binary Code ◽

Source Code ◽

Clone Detection ◽

Problem Statement ◽

Open Source Code ◽

Code Clone ◽

Software License ◽

The Impact

Ten years ago, we published the article "Finding software license violations through binary code clone detection" at the MSR 2011 conference. Our paper was motivated by the tendency of embedded hardware vendors to only release binary blobs of their firmware, often violating the licensing terms of open-source software present inside those blobs. The techniques presented in our paper were designed to accurately identify open-source code hidden inside binary blobs. Here, we give our perspectives on the impact of our work, both industrially and academically, and revisit the original problem statement to see what has happened in the field of open-source compliance in the intervening decade.


Intelligent token-based code clone detection system for large scale source code

Proceedings of the Conference on Research in Adaptive and Convergent Systems - RACS '19 ◽

10.1145/3338840.3355654 ◽

2019 ◽

Cited By ~ 1

Author(s):

Abdulrahman Abu Elkhail ◽

Jan Svacina ◽

Tomas Cerny

Keyword(s):

Large Scale ◽

Detection System ◽

Source Code ◽

Clone Detection ◽

Code Clone


A Method for Detecting Bad Smells and its Application to Software Engineering Education

International Journal of Software Innovation ◽

10.4018/ijsi.2015040102 ◽

2015 ◽

Vol 3 (2) ◽

pp. 13-23

Author(s):

Yuki Ito ◽

Atsuo Hazeyama ◽

Yasuhiko Morimoto ◽

Hiroaki Kaminaga ◽

Shoichi Nakamura ◽

...

Keyword(s):

Software Engineering ◽

Software Development ◽

Engineering Education ◽

Source Code ◽

Software Systems ◽

Large Source ◽

Software Engineering Education ◽

Meta Programming

In order to extend and maintain software systems, it is necessary to remove the factors behind bad smells from source code through refactoring. However, detecting and removing these factors manually from large source code is a time-consuming process. Moreover, learning how to refactor bad smells can be difficult for students because they are not yet software development experts. Therefore, the authors propose a method for detecting bad smells using declarative meta programming that can be applied to software development training. In this manner, software development training is facilitated.


A Deep Learning Approach for a Source Code Detection Model Using Self-Attention

Complexity ◽

10.1155/2020/5027198 ◽

2020 ◽

Vol 2020 ◽

pp. 1-15

Author(s):

Yao Meng ◽

Long Liu

Keyword(s):

Deep Learning ◽

Source Code ◽

Classification Performance ◽

Detection Model ◽

Learning Framework ◽

Code Clone ◽

Representation Model ◽

Discriminative Model ◽

Bidirectional Lstm ◽

Lstm Network

With the development of deep learning, many approaches based on neural networks have been proposed for code clone detection. In this paper, we propose a novel source code detection model, At-biLSTM, based on a bidirectional LSTM network with a self-attention layer. At-biLSTM is composed of a representation model and a discriminative model. The representation model first transforms the source code into an abstract syntax tree and splits it into a sequence of statement trees; then, it encodes each of the statement trees with a depth-first traversal algorithm. Finally, the representation model encodes the sequence of statement vectors via a bidirectional LSTM network, which is a classical deep learning framework, with a self-attention layer and outputs a vector representing the given source code. The discriminative model identifies code clones based on the vectors generated by the representation model. Our proposed model retains both the syntax and semantics of the source code in the process of encoding, and the self-attention algorithm makes the classifier concentrate on the effect of key statements and improves the classification performance. Comparative experiments on the benchmarks OJClone and BigCloneBench indicate that At-biLSTM is effective and outperforms the state-of-the-art approaches in source code clone detection.
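The first stage of the representation model, splitting a syntax tree into statement trees and traversing each one depth-first, can be sketched with Python's standard `ast` module. This is only an illustration under stated assumptions: it treats each top-level statement as one statement tree, emits node type names rather than learned vectors, and the helper names `preorder` and `statement_sequences` are hypothetical. The paper's actual splitting rules, target languages, and encoder are not reproduced here.

```python
import ast

def preorder(node):
    """Yield node type names in depth-first (pre-order) order."""
    yield type(node).__name__
    for child in ast.iter_child_nodes(node):
        yield from preorder(child)

def statement_sequences(source):
    """Return one type-name sequence per top-level statement.

    Assumption: a "statement tree" is simply a top-level statement of the
    parsed module; nested statements stay inside their parent's sequence.
    """
    tree = ast.parse(source)
    return [list(preorder(stmt)) for stmt in tree.body]

for seq in statement_sequences("x = 1\nprint(x)\n"):
    print(seq)
```

In a full model, each such sequence would be mapped to a statement vector, and the resulting sequence of vectors would be passed to the bidirectional LSTM with self-attention.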
