Retrieval on source code: a neural code search

Author(s): Saksham Sachdev, Hongyu Li, Sifei Luan, Seohyun Kim, Koushik Sen, ...
2021, Vol 2021, pp. 1-16
Author(s): Yao Meng

Intelligent code search with natural language queries has become an important research area in software engineering. In this paper, we propose At-CodeSM, a novel deep learning framework for source code search. Its code encoder combines a Tree-LSTM over the abstract syntax tree with token-level encoders, so that code vectorization preserves both the lexical and the structural features of source code. Both the representative and discriminative models are implemented with deep neural networks. Our experiments on the CodeSearchNet dataset show that At-CodeSM outperforms previous approaches on intelligent code search.
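
The abstract does not say which Tree-LSTM variant At-CodeSM uses or how it is configured. As a rough illustration of how a Tree-LSTM turns an abstract syntax tree into a single vector, the sketch below implements the Child-Sum Tree-LSTM of Tai et al. (2015) in plain Python over trees produced by Python's `ast` module. The hidden size, the random per-node-type embeddings and the untrained random weights are all assumptions for illustration; a real encoder would learn these parameters and would also run the token-level encoders mentioned above.

```python
import ast
import math
import random

HIDDEN = 8  # assumed hidden size, kept small for illustration
GATES = ("i", "f", "o", "u")  # input, forget, output, candidate update


def _matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def _add(*vectors):
    return [sum(xs) for xs in zip(*vectors)]


def _mul(a, b):
    return [x * y for x, y in zip(a, b)]


def _sigmoid(v):
    return [1.0 / (1.0 + math.exp(-x)) for x in v]


def _tanh(v):
    return [math.tanh(x) for x in v]


class ChildSumTreeLSTM:
    """Untrained Child-Sum Tree-LSTM over Python AST nodes (illustrative only)."""

    def __init__(self, hidden=HIDDEN, seed=0):
        self.hidden = hidden
        self._rng = random.Random(seed)
        # Input (W) and recurrent (U) weights plus a bias for each gate.
        self.W = {g: self._matrix() for g in GATES}
        self.U = {g: self._matrix() for g in GATES}
        self.b = {g: [0.0] * hidden for g in GATES}
        self._embeddings = {}  # hypothetical node-type embedding table

    def _matrix(self):
        return [[self._rng.uniform(-0.5, 0.5) for _ in range(self.hidden)]
                for _ in range(self.hidden)]

    def _embed(self, node):
        # Each AST node type (Module, FunctionDef, BinOp, ...) gets a fixed random vector.
        name = type(node).__name__
        if name not in self._embeddings:
            self._embeddings[name] = [self._rng.uniform(-1.0, 1.0)
                                      for _ in range(self.hidden)]
        return self._embeddings[name]

    def _gate(self, g, x, h):
        return _add(_matvec(self.W[g], x), _matvec(self.U[g], h), self.b[g])

    def encode(self, node):
        """Return (h, c) for `node` after encoding all of its children bottom-up."""
        children = [self.encode(child) for child in ast.iter_child_nodes(node)]
        x = self._embed(node)
        # Child-Sum: the i, o and u gates see the sum of the children's hidden states.
        h_sum = _add([0.0] * self.hidden, *(h for h, _ in children))
        i = _sigmoid(self._gate("i", x, h_sum))
        o = _sigmoid(self._gate("o", x, h_sum))
        u = _tanh(self._gate("u", x, h_sum))
        c = _mul(i, u)
        for h_k, c_k in children:
            # A separate forget gate per child decides how much of its memory to keep.
            f_k = _sigmoid(self._gate("f", x, h_k))
            c = _add(c, _mul(f_k, c_k))
        h = _mul(o, _tanh(c))
        return h, c


tree = ast.parse("def add(a, b):\n    return a + b\n")
code_vector, _ = ChildSumTreeLSTM().encode(tree)
print(len(code_vector))  # 8
```

The root's hidden state serves as the structural code vector. Because the weights here are random, the vector carries no meaning until the parameters are trained jointly with the rest of the search model.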


Author(s): Fuqi Cai, Changjing Wang, Qing Huang, Zhengkang Zuo, Yunyan Liao

Third-party libraries continually evolve and release multiple versions. Lucene, for example, released ten new versions (from 7.7.0 to 8.4.0) in 2019. These versions confuse existing code search methods, causing them to retrieve source code that is incompatible with the local programming language. To solve this issue, we propose DCSE, a deep code search model based on evolution information (i.e., evolved code tokens and evolution descriptions). DCSE first extracts the evolved code tokens and evolution descriptions from the code evolution process; it then uses the evolved code tokens as an additional feature of the source code and the evolution descriptions as an additional feature of the code description. With this richer representation, DCSE embeds source code and its description into a shared high-dimensional vector space and pulls the vectors of matching pairs closer in cosine distance. For continually evolving third-party libraries such as Lucene, the experimental results show that DCSE retrieves source code that is compatible with the local programming language and outperforms state-of-the-art methods (e.g., CODEnn) by 56.9–60.9% in RFVersion. For rarely evolving third-party libraries, DCSE outperforms state-of-the-art methods (e.g., CODEnn) by 4–11% in Precision.
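
The abstract describes the shared vector space only at a high level. The sketch below assumes, as in CODEnn, a margin ranking loss that pushes the cosine similarity between a code vector and its own description above its similarity to an unrelated description, and then retrieves code by cosine similarity to a query vector. The margin of 0.05, the toy three-dimensional vectors and the snippet names are hypothetical; in DCSE the vectors would come from encoders whose inputs include the evolved code tokens and evolution descriptions.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def ranking_loss(code, good_desc, bad_desc, margin=0.05):
    """Hinge loss: zero once cos(code, good_desc) exceeds cos(code, bad_desc) by `margin`."""
    return max(0.0, margin - cosine(code, good_desc) + cosine(code, bad_desc))


def search(query, code_vectors, k=2):
    """Return the names of the k code vectors most cosine-similar to the query."""
    ranked = sorted(code_vectors, key=lambda name: cosine(query, code_vectors[name]),
                    reverse=True)
    return ranked[:k]


# Hypothetical embeddings already projected into the shared space.
code_vectors = {
    "open_index": [0.9, 0.1, 0.0],
    "close_reader": [0.1, 0.9, 0.1],
    "parse_query": [0.0, 0.2, 0.9],
}
query = [1.0, 0.0, 0.0]
print(search(query, code_vectors))  # ['open_index', 'close_reader']
print(ranking_loss([1.0, 0.0], [1.0, 0.0], [0.0, 1.0]))  # 0.0
```

Training minimizes the ranking loss over many (code, correct description, wrong description) triples; at query time only the cosine ranking in `search` is needed.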


2012, Vol 38 (5), pp. 1069-1087
Author(s): Collin McMillan, Mark Grechanik, Denys Poshyvanyk, Chen Fu, Qing Xie

2021
Author(s): Jian Gu, Zimin Chen, Martin Monperrus

2021, Vol 9 (1), pp. 69-110
Author(s): Shailesh Kumar Shivakumar

In this paper, the authors introduce the novel concept of intent-based code search, which organizes code search goals into a hierarchy. They survey state-of-the-art source code search, covering the related tools, techniques, and algorithms, and examine the field through its core use cases, such as code reuse, code understanding, and code repair. They propose a taxonomy of code search goals based on user intent; the taxonomy is derived from an in-depth literature survey and validated through a developer survey conducted as part of this paper. It groups code search goals into logical categories and records the characteristics shared within each category, such as query type and expected response. The paper also details recent trends, surveys code search tools, and discusses their implications for tool design.

