Finding Code-Clone Snippets in Large Source-Code Collection by ccgrep

Author(s):  
Katsuro Inoue ◽  
Yuya Miyamoto ◽  
Daniel M. German ◽  
Takashi Ishio
Author(s):  
Evan Moritz ◽  
Mario Linares-Vasquez ◽  
Denys Poshyvanyk ◽  
Mark Grechanik ◽  
Collin McMillan ◽  
...  

Author(s):  
Sara McCaslin ◽  
Kent Lawrence

Closed-form solutions, as opposed to numerically integrated solutions, can now be obtained for many problems in engineering. In the area of finite element analysis, researchers have been able to demonstrate the efficiency of closed-form solutions when compared to numerical integration for elements such as straight-sided triangular [1] and tetrahedral elements [2, 3]. With higher order elements, however, the length of the resulting expressions is excessive. When these expressions are implemented in finite element applications as source code, the generated files can become very large, causing the compiler to hit line-length and line-continuation limits. This paper discusses a simple algorithm for the reduction of large source code files in which duplicate terms are replaced through the use of an adaptive dictionary. The importance of this algorithm lies in its ability to produce manageable source code files that can be used to improve efficiency in the element generation step of higher order finite element analysis. The algorithm is applied to Fortran files developed for the implementation of closed-form element stiffness and error estimator expressions for straight-sided tetrahedral finite elements through the fourth order. Reductions in individual source code file size by as much as 83% are demonstrated.
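The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea of replacing duplicate terms through a dictionary built from the input itself. It is written in Python rather than Fortran, and the function name `reduce_expression`, the temporary names `t1`, `t2`, ... and the choice of innermost parenthesised groups as the candidate "terms" are all assumptions made for illustration, not the authors' method.

```python
import re
from collections import Counter

# Hypothetical helper: candidate terms are innermost parenthesised groups.
_TERM = re.compile(r"\([^()]+\)")


def reduce_expression(expr, min_count=2):
    """Replace repeated parenthesised terms with short temporaries.

    Returns (definitions, reduced_expr), where definitions is an ordered
    list of "name = term" strings that must be evaluated before the
    reduced expression.
    """
    definitions = []
    while True:
        counts = Counter(_TERM.findall(expr))
        repeated = [t for t, c in counts.items() if c >= min_count]
        if not repeated:
            break
        # Greedy choice: the term whose replacement removes the most text.
        # Ties are broken by string order so the result is deterministic.
        term = max(repeated, key=lambda t: (counts[t] * len(t), t))
        name = f"t{len(definitions) + 1}"
        definitions.append(f"{name} = {term}")
        expr = expr.replace(term, name)
    return definitions, expr


if __name__ == "__main__":
    source = "(x1-x2)*(y1-y2) + (x1-x2)*(z1-z2) + (y1-y2)*(z1-z2)"
    defs, reduced = reduce_expression(source)
    for line in defs:
        print(line)
    print("result =", reduced)
```

Because each pass re-scans the already reduced expression, a group that only becomes innermost after an earlier substitution can itself be picked up later, which is one plausible reading of "adaptive" here. A real Fortran generator would also need to respect operator precedence and declare the temporaries.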


Author(s):  
Ricardo Sotolongo ◽  
Fangyan Dong ◽  
Kaoru Hirota

An algorithm for refining code clones is proposed, based on semantic analysis of the reports of multiple detection tools using WordNet. It parses the reports of different detection tools looking for new clone specifications, and refines the location of existing ones using semantic information contained in the source code. It is applied to a real, complex software system and compared with three other well-known detection algorithms; it discovers 4888 more clone pairs than the average detected by the other tools and makes the code clones 3 lines longer (for a subset of the same system, the results are proportional to the size reduction). The objective is to provide a larger number of code clones with more appropriate localization for use in refactoring processes.


2021 ◽  
Vol 46 (3) ◽  
pp. 24-25
Author(s):  
Armijn Hemel ◽  
Karl Trygve Kalleberg ◽  
Rob Vermaas ◽  
Eelco Dolstra

Ten years ago, we published the article Finding software license violations through binary code clone detection at the MSR 2011 conference. Our paper was motivated by the tendency of embedded hardware vendors to only release binary blobs of their firmware, often violating the licensing terms of open-source software present inside those blobs. The techniques presented in our paper were designed to accurately identify open-source code hidden inside binary blobs. Here, we give our perspectives on the impact of our work, both industrially and academically, and revisit the original problem statement to see what has happened in the field of open-source compliance in the intervening decade.


2015 ◽  
Vol 3 (2) ◽  
pp. 13-23
Author(s):  
Yuki Ito ◽  
Atsuo Hazeyama ◽  
Yasuhiko Morimoto ◽  
Hiroaki Kaminaga ◽  
Shoichi Nakamura ◽  
...  

To extend and maintain software systems, it is necessary to remove the factors behind bad smells from source code through refactoring. However, detecting and removing these factors manually from large source code is a time-consuming process. Moreover, learning how to refactor bad smells can be difficult for students because they are not yet software development experts. Therefore, the authors propose a method for detecting bad smells using declarative meta programming that can be applied to software development training. In this manner, software development training is facilitated.


Complexity ◽  
2020 ◽  
Vol 2020 ◽  
pp. 1-15
Author(s):  
Yao Meng ◽  
Long Liu

With the development of deep learning, many approaches based on neural networks have been proposed for code clone detection. In this paper, we propose a novel source code detection model, At-biLSTM, based on a bidirectional LSTM network with a self-attention layer. At-biLSTM is composed of a representation model and a discriminative model. The representation model first transforms the source code into an abstract syntax tree and splits it into a sequence of statement trees; then, it encodes each of the statement trees with a depth-first traversal algorithm. Finally, the representation model encodes the sequence of statement vectors via a bidirectional LSTM network, which is a classical deep learning framework, with a self-attention layer and outputs a vector representing the given source code. The discriminative model identifies code clones based on the vectors generated by the representation model. Our proposed model retains both the syntax and the semantics of the source code in the process of encoding, and the self-attention algorithm makes the classifier concentrate on the effect of key statements and improves the classification performance. Comparative experiments on the benchmarks OJClone and BigCloneBench indicate that At-biLSTM is effective and outperforms the state-of-the-art approaches in source code clone detection.
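The first two steps of the representation model (splitting an abstract syntax tree into statement trees, then traversing each one depth-first) can be sketched as below. This is an illustration under stated assumptions only: it uses Python's standard `ast` module on Python source as a stand-in for whatever parser and language the authors used, it treats each top-level statement as one statement tree, and the names `preorder` and `statement_sequences` are hypothetical. The bidirectional LSTM, the self-attention layer and the discriminative model are not shown.

```python
import ast


def preorder(node):
    """Yield AST node type names in depth-first (pre-order) order."""
    yield type(node).__name__
    for child in ast.iter_child_nodes(node):
        yield from preorder(child)


def statement_sequences(source):
    """Split source into top-level statement trees and linearise each one.

    Returns one list of node type names per statement; in a full model
    each list would be embedded into a statement vector.
    """
    tree = ast.parse(source)
    return [list(preorder(stmt)) for stmt in tree.body]


if __name__ == "__main__":
    code = "x = 1\ny = x + 2\n"
    for sequence in statement_sequences(code):
        print(sequence)
```

Each printed list corresponds to one statement, so the output is a sequence of token sequences, which is the shape of input a sequence encoder such as a bidirectional LSTM would consume after embedding.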

