Retrieval on source code: a neural code search

Intelligent code search with natural language queries has become an important research area in software engineering. In this paper, we propose At-CodeSM, a novel deep learning framework for source code search. The code encoder in At-CodeSM, which combines an abstract syntax tree encoder (Tree-LSTM) with token-level encoders, preserves both the lexical and the structural features of source code during code vectorization. Both the representative and the discriminative models are implemented with deep neural networks. Our experiments on the CodeSearchNet dataset show that At-CodeSM outperforms previous approaches on the intelligent code search task.

Search for Compatible Source Code

International Journal of Software Engineering and Knowledge Engineering, 2021, Vol 31 (03), pp. 477-502. DOI: 10.1142/s0218194021500169

Author(s): Fuqi Cai, Changjing Wang, Qing Huang, Zhengkang Zuo, Yunyan Liao

Keyword(s): Programming Language, State Of The Art, Source Code, The State, Third Party, Search Model, Code Search, Art Methods, Local Programming, Cosine Distance

Third-party libraries continually evolve and produce multiple versions. Lucene, for example, released ten new versions (from version 7.7.0 to 8.4.0) in 2019. This multiplicity of versions leads existing code search methods to retrieve source code that is not compatible with the local programming language. To solve this issue, we propose DCSE, a deep code search model based on evolving information (i.e. evolved code tokens and evolution descriptions). DCSE first mines evolved code tokens and evolution descriptions from the code evolution process; it then treats the evolved code tokens as one feature of the source code and the evolution description as one feature of the code description. With this fuller representation, DCSE embeds source code and its code description into a high-dimensional shared vector space and reduces the cosine distance between their vectors. For ever-evolving third-party libraries such as Lucene, the experimental results show that DCSE retrieves source code that is compatible with the local programming language and outperforms state-of-the-art methods (e.g. CODEnn) by 56.9–60.9% in RFVersion. For rarely-evolving third-party libraries, DCSE outperforms state-of-the-art methods (e.g. CODEnn) by 4–11% in Precision.
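The retrieval step described in this abstract, ranking code by how close its vector lies to the query's vector in a shared space, can be illustrated with a minimal sketch. This is not the DCSE implementation: the function names, the snippet labels, and the three-dimensional vectors below are all hypothetical, and in a real system the vectors would come from trained encoders rather than being written by hand.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)


def rank_snippets(query_vec, snippet_vecs):
    """Return (name, similarity) pairs, most similar snippet first."""
    scored = [
        (name, cosine_similarity(query_vec, vec))
        for name, vec in snippet_vecs.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical, hand-written embeddings (assumption: a trained model
# would produce these from the query text and the code snippets).
query = [1.0, 0.0, 1.0]
snippets = {
    "open_index_v8": [0.9, 0.1, 0.8],
    "open_index_v7": [0.1, 0.9, 0.2],
}

ranking = rank_snippets(query, snippets)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Cosine distance is one minus this similarity, so a model that shrinks the distance between a description and its matching code moves that code toward the top of such a ranking.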

Experiences and Lessons Learned with the Development of a Source Code Search Engine

Finding Source Code on the Web for Remix and Reuse, 2013, pp. 121-134. DOI: 10.1007/978-1-4614-6596-6_7

Author(s): Eduardo Santana de Almeida

Keyword(s): Search Engine, Source Code, Lessons Learned, Code Search, Source Code Search

Exemplar: A Source Code Search Engine for Finding Highly Relevant Applications

IEEE Transactions on Software Engineering, 2012, Vol 38 (5), pp. 1069-1087. DOI: 10.1109/tse.2011.84. Cited By ~ 62

Author(s): Collin McMillan, Mark Grechanik, Denys Poshyvanyk, Chen Fu, Qing Xie

Keyword(s): Search Engine, Source Code, Code Search, Source Code Search

Using Structured Queries for Source Code Search

2014 IEEE International Conference on Software Maintenance and Evolution, 2014. DOI: 10.1109/icsme.2014.68. Cited By ~ 4

Author(s): Brian P. Eddy, Nicholas A. Kraft

Keyword(s): Source Code, Code Search, Source Code Search

Multimodal Representation for Neural Code Search

2021. DOI: 10.1109/icsme52107.2021.00049

Author(s): Jian Gu, Zimin Chen, Martin Monperrus

Keyword(s): Neural Code, Code Search, Multimodal Representation

A Survey and Taxonomy of Intent-Based Code Search

International Journal of Software Innovation, 2021, Vol 9 (1), pp. 69-110. DOI: 10.4018/ijsi.2021010106

Author(s): Shailesh Kumar Shivakumar

Keyword(s): State Of The Art, Source Code, The Novel, User Intent, Code Search, Search Field, Query Type, Novel Concept, Code Understanding, Source Code Search

In this paper, the authors introduce the novel concept of intent-based code search, which categorizes code search goals into a hierarchy. They explore state-of-the-art work in source code search, covering the relevant tools, techniques, and algorithms. They survey the field through the core use cases of code search, such as code reusability, code understanding, and code repair, and propose a user intent-based taxonomy built on code search goals. The taxonomy is derived from an in-depth analysis of the code search literature and is validated by a developer survey conducted as part of this paper. It rests on a logical categorization of code search goals and on the characteristics shared within each category (query type, expected response, and so on). The paper also details the latest trends, surveys code search tools, and discusses the implications for tool design.
