Code Search: Latest Research Papers

Opportunities and Challenges in Code Search Tools

Code search is a core software engineering task. Effective code search tools can help developers substantially improve their software development efficiency and effectiveness. In recent years, many code search studies have leveraged different techniques, such as deep learning and information retrieval approaches, to retrieve expected code from a large-scale codebase. However, a comprehensive comparative summary of existing code search approaches is lacking. To understand the research trends in existing code search studies, we systematically reviewed 81 relevant studies. We investigated the publication trends of code search studies; analyzed the key components used to build code search tools, such as codebase, query, and modeling technique; and classified existing tools according to which of seven different search tasks they support. Based on our findings, we identified a set of outstanding challenges in existing studies and a research roadmap for future code search research.

CodeMatcher: Searching Code Based on Sequential Semantics of Important Query Words

ACM Transactions on Software Engineering and Methodology ◽ 10.1145/3465403 ◽ 2022 ◽ Vol 31 (1) ◽ pp. 1-37

Author(s): Chao Liu ◽ Xin Xia ◽ David Lo ◽ Zhiwei Liu ◽ Ahmed E. Hassan ◽ ...

Keyword(s): Large Scale ◽ Fuzzy Search ◽ Code Search ◽ Proposed Model ◽ Accuracy Measure ◽ Indexing Technique ◽ Google Search ◽ The Relationship ◽ Sequential Semantics

To accelerate software development, developers frequently search and reuse existing code snippets from a large-scale codebase, e.g., GitHub. Over the years, researchers have proposed many information retrieval (IR)-based models for code search, but these models fail to bridge the semantic gap between query and code. An early successful deep learning (DL)-based model, DeepCS, addressed this issue by learning the relationship between pairs of code methods and their corresponding natural language descriptions. Two major advantages of DeepCS are its ability to understand irrelevant/noisy keywords and to capture sequential relationships between words in query and code. In this article, we propose an IR-based model, CodeMatcher, that inherits the advantages of DeepCS (i.e., the ability to understand the sequential semantics of important query words) while leveraging the indexing technique of IR-based models to substantially shorten search response time. CodeMatcher first collects metadata for query words to identify irrelevant/noisy ones, then iteratively performs fuzzy search with the important query words on a codebase indexed by Elasticsearch, and finally reranks the returned candidate code according to how well the tokens in each candidate snippet sequentially match the important words in the query. We verified its effectiveness on a large-scale codebase with ~41K repositories. Experimental results showed that CodeMatcher achieves an MRR (a widely used accuracy measure for code search) of 0.60, outperforming DeepCS, CodeHow, and UNIF by 82%, 62%, and 46%, respectively. Our proposed model is over 1.2K times faster than DeepCS. Moreover, CodeMatcher outperforms two existing online search engines (GitHub and Google search) by 46% and 33%, respectively, in terms of MRR. We also observed that fusing the advantages of IR-based and DL-based models is promising, and that improving the quality of method naming helps code search, since the method name plays an important role in connecting query and code.
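
For readers unfamiliar with the accuracy measure quoted in this abstract, MRR (mean reciprocal rank) averages, over all queries, the reciprocal of the rank at which the first relevant result appears. The following is a minimal sketch in Python, not code from the CodeMatcher paper: the function name `mean_reciprocal_rank`, the convention of scoring a query with no relevant result as 0, and the toy ranks are all assumptions made for illustration.

```python
from typing import Optional, Sequence


def mean_reciprocal_rank(first_relevant_ranks: Sequence[Optional[int]]) -> float:
    """Return the mean reciprocal rank over a set of queries.

    Each entry is the 1-based rank of the first relevant result for one
    query, or None if no relevant result was returned (assumed here to
    contribute a reciprocal rank of 0).
    """
    if not first_relevant_ranks:
        raise ValueError("at least one query is required")

    total = 0.0
    for rank in first_relevant_ranks:
        if rank is None:
            continue  # no relevant result: contributes 0
        if rank < 1:
            raise ValueError("ranks are 1-based and must be >= 1")
        total += 1.0 / rank

    # Divide by the number of queries, including those with no hit.
    return total / len(first_relevant_ranks)


# Hypothetical ranks for four queries (illustrative data only).
toy_ranks = [1, 2, None, 4]
print(mean_reciprocal_rank(toy_ranks))  # (1 + 0.5 + 0 + 0.25) / 4 = 0.4375
```

Under this definition, an MRR of 0.60 means that the average of the reciprocal first-hit ranks is 0.60. This is better than having the first relevant result at rank 2 for every query, which would give an MRR of 0.5.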

Is a Single Model Enough? MuCoS: A Multi-Model Ensemble Learning Approach for Semantic Code Search

10.1145/3459637.3482127 ◽ 2021

Author(s): Lun Du ◽ Xiaozhou Shi ◽ Yanlin Wang ◽ Ensheng Shi ◽ Shi Han ◽ ...

Keyword(s): Ensemble Learning ◽ Learning Approach ◽ Model Ensemble ◽ Single Model ◽ Semantic Code ◽ Code Search ◽ Semantic Code Search

Enriching query semantics for code search with reinforcement learning

Neural Networks ◽ 10.1016/j.neunet.2021.09.025 ◽ 2021

Author(s): Chaozheng Wang ◽ Zhenhao Nong ◽ Cuiyun Gao ◽ Zongjie Li ◽ Jichuan Zeng ◽ ...

Keyword(s): Reinforcement Learning ◽ Code Search ◽ Query Semantics

Readability and Understandability Scores for Snippet Assessment: an Exploratory Study

10.5753/vem.2021.17217 ◽ 2021

Author(s): Carlos Eduardo C. Dantas ◽ Marcelo A. Maia

Keyword(s): Search Engines ◽ Exploratory Study ◽ Subjective Perception ◽ Principal Characteristic ◽ Readability Score ◽ Code Search ◽ Nested Loops ◽ Comprehension Score ◽ Code Search Engines

Code search engines usually use a readability feature to rank code snippets. There are several metrics to calculate this feature, but developers may have different perceptions of readability. A correlation between readability and understandability features has already been proposed, i.e., developers need to read and comprehend the code snippet's syntax, but also understand its semantics. This work investigates scores for understandability and readability features from the perspective of developers' possibly subjective perception of code snippet comprehension. We find that code snippets with a higher readability score are better comprehended than those with a lower one. The understandability score corresponds to better comprehension in specific situations, e.g., nested loops or if-else chains. The developers also mentioned writability aspects as the principal characteristic for evaluating code snippet comprehension. These results provide insights for future work on code comprehension score optimization.

Autism Spectrum Disorder in Pediatric Idiopathic Intracranial Hypertension

Life ◽ 10.3390/life11090972 ◽ 2021 ◽ Vol 11 (9) ◽ pp. 972

Author(s): Anne K. Jensen ◽ Claire A. Sheldon ◽ Grace L. Paley ◽ Christina L. Szperka ◽ Geraldine W. Liu ◽ ...

Keyword(s): Autism Spectrum Disorder ◽ Intracranial Hypertension ◽ Idiopathic Intracranial Hypertension ◽ Pediatric Patients ◽ Case Series ◽ Autism Spectrum ◽ Spectrum Disorder ◽ Diagnosis Code ◽ Medical Comorbidities ◽ Code Search

In recent years, the substantial burden of medical comorbidities in autism spectrum disorder (ASD) populations has been described. We report a retrospective observational case series of pediatric patients with suspected idiopathic intracranial hypertension (IIH) and concurrent ASD. Pediatric subjects with suspected IIH aged 2–18 years were identified by review of a pediatric neuro-ophthalmologist’s database spanning from July 1993 to April 2013. ASD diagnoses were identified within this cohort by an ICD-9 diagnosis code search and database review. Three subjects had concurrent ASD diagnoses; all were non-obese males. Since the retrospective observational case series was performed in April 2013, we have identified three additional IIH cases in boys with ASD. Our experience suggests that IIH may be a comorbidity of ASD, particularly in non-obese boys.

Multimodal Representation for Neural Code Search

10.1109/icsme52107.2021.00049 ◽ 2021

Author(s): Jian Gu ◽ Zimin Chen ◽ Martin Monperrus

Keyword(s): Neural Code ◽ Code Search ◽ Multimodal Representation

Cross-language code search using static and dynamic analyses

10.1145/3468264.3468538 ◽ 2021

Author(s): George Mathew ◽ Kathryn T. Stolee

Keyword(s): Code Search ◽ Dynamic Analyses ◽ Cross Language

Guided pattern mining for API misuse detection by change-based code analysis

Automated Software Engineering ◽ 10.1007/s10515-021-00294-x ◽ 2021 ◽ Vol 28 (2)

Author(s): Sebastian Nielebock ◽ Robert Heumüller ◽ Kevin Michael Schott ◽ Frank Ortmeier

Keyword(s): Pattern Mining ◽ Third Party ◽ Just In Time ◽ False Alarms ◽ Misuse Detection ◽ Code Search ◽ Usage Patterns ◽ Development Processes ◽ Code Changes ◽ Api Usage

Lack of experience, inadequate documentation, and sub-optimal API design frequently cause developers to make mistakes when re-using third-party implementations. Such API misuses can result in unintended behavior, performance losses, or software crashes. Therefore, current research aims to automatically detect such misuses by comparing the way a developer used an API to previously inferred patterns of the correct API usage. While research has made significant progress, these techniques have not yet been adopted in practice. In part, this is due to the lack of a process capable of seamlessly integrating with software development processes. Particularly, existing approaches do not consider how to collect relevant source code samples from which to infer patterns. In fact, an inadequate collection can cause API usage pattern miners to infer irrelevant patterns which leads to false alarms instead of finding true API misuses. In this paper, we target this problem (a) by providing a method that increases the likelihood of finding relevant and true-positive patterns concerning a given set of code changes and agnostic to a concrete static, intra-procedural mining technique and (b) by introducing a concept for just-in-time API misuse detection which analyzes changes at the time of commit. Particularly, we introduce different, lightweight code search and filtering strategies and evaluate them on two real-world API misuse datasets to determine their usefulness in finding relevant intra-procedural API usage patterns. Our main results are (1) commit-based search with subsequent filtering effectively decreases the amount of code to be analyzed, (2) in particular method-level filtering is superior to file-level filtering, (3) project-internal and project-external code search find solutions for different types of misuses and thus are complementary, (4) incorporating prior knowledge of the misused API into the search has a negligible effect.

MACA: A Residual Network with Multi-Attention and Core Attributes for Code Search (S)

10.18293/seke2021-079 ◽ 2021

Author(s): Lian Gu

Keyword(s): Residual Network ◽ Code Search
