code search
Recently Published Documents


TOTAL DOCUMENTS

180
(FIVE YEARS 55)

H-INDEX

19
(FIVE YEARS 3)

2022 ◽  
Vol 54 (9) ◽  
pp. 1-40
Author(s):  
Chao Liu ◽  
Xin Xia ◽  
David Lo ◽  
Cuiyun Gao ◽  
Xiaohu Yang ◽  
...  

Code search is a core software engineering task. Effective code search tools can help developers substantially improve their software development efficiency and effectiveness. In recent years, many code search studies have leveraged different techniques, such as deep learning and information retrieval approaches, to retrieve expected code from a large-scale codebase. However, a comprehensive comparative summary of existing code search approaches is lacking. To understand the research trends in existing code search studies, we systematically reviewed 81 relevant studies. We investigated the publication trends of code search studies, analyzed the key components used to build code search tools (such as the codebase, query, and modeling technique), and classified existing tools by the seven search tasks they support. Based on our findings, we identified a set of outstanding challenges in existing studies and outlined a research roadmap for future code search research.


2022 ◽  
Vol 31 (1) ◽  
pp. 1-37
Author(s):  
Chao Liu ◽  
Xin Xia ◽  
David Lo ◽  
Zhiwei Liu ◽  
Ahmed E. Hassan ◽  
...  

To accelerate software development, developers frequently search and reuse existing code snippets from a large-scale codebase, e.g., GitHub. Over the years, researchers have proposed many information retrieval (IR)-based models for code search, but these models fail to bridge the semantic gap between query and code. An early successful deep learning (DL)-based model, DeepCS, addressed this issue by learning the relationship between pairs of code methods and their corresponding natural language descriptions. Two major advantages of DeepCS are its ability to handle irrelevant/noisy keywords and to capture sequential relationships between words in the query and the code. In this article, we propose an IR-based model, CodeMatcher, that inherits the advantages of DeepCS (i.e., the capability of understanding the sequential semantics in important query words) while leveraging the indexing technique of IR-based models to substantially shorten the search response time. CodeMatcher first collects metadata for query words to identify irrelevant/noisy ones, then iteratively performs fuzzy search with the important query words on a codebase indexed by the Elasticsearch tool, and finally reranks the returned candidate code snippets according to how well their tokens sequentially match the important words in the query. We verified its effectiveness on a large-scale codebase with ~41K repositories. Experimental results showed that CodeMatcher achieves an MRR (a widely used accuracy measure for code search) of 0.60, outperforming DeepCS, CodeHow, and UNIF by 82%, 62%, and 46%, respectively. Our proposed model is over 1.2K times faster than DeepCS. Moreover, CodeMatcher outperforms two existing online search engines (GitHub and Google search) by 46% and 33%, respectively, in terms of MRR. We also observed that fusing the advantages of IR-based and DL-based models is promising, and that improving the quality of method naming helps code search, since the method name plays an important role in connecting query and code.
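For readers unfamiliar with the MRR figure quoted above, the following is a minimal sketch of how mean reciprocal rank is commonly computed. It is not taken from the paper: the function name and the input convention (one 1-based rank per query, `None` when no relevant result was returned) are assumptions made for illustration.

```python
from typing import Optional, Sequence


def mean_reciprocal_rank(first_relevant_ranks: Sequence[Optional[int]]) -> float:
    """Average of 1/rank over all queries.

    Each entry is the 1-based rank of the first relevant result for one
    query, or None if no relevant result was returned (contributes 0).
    This input convention is an assumption for illustration.
    """
    if not first_relevant_ranks:
        raise ValueError("at least one query is required")
    total = sum(1.0 / rank for rank in first_relevant_ranks if rank is not None)
    return total / len(first_relevant_ranks)


# Four hypothetical queries: hits at ranks 1, 2 and 4, and one miss.
print(mean_reciprocal_rank([1, 2, None, 4]))  # (1 + 0.5 + 0 + 0.25) / 4 = 0.4375
```

Under this definition, an MRR of 0.60 means that the reciprocal rank of the first relevant result, averaged over all queries, is 0.60.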


2021 ◽  
Author(s):  
Chaozheng Wang ◽  
Zhenhao Nong ◽  
Cuiyun Gao ◽  
Zongjie Li ◽  
Jichuan Zeng ◽  
...  

2021 ◽  
Author(s):  
Carlos Eduardo C. Dantas ◽  
Marcelo A. Maia

Code search engines usually use a readability feature to rank code snippets. Several metrics exist to calculate this feature, but developers may perceive readability differently. A correlation between readability and understandability features has already been proposed, i.e., developers need to read and comprehend the code snippet syntax, but also understand its semantics. This work investigates scores for understandability and readability features from the perspective of developers' possibly subjective perception of code snippet comprehension. We find that code snippets with higher readability scores are comprehended better than those with lower scores. The understandability score indicates better comprehension in specific situations, e.g., nested loops or if-else chains. The developers also mentioned writability aspects as the principal characteristic for evaluating code snippet comprehension. These results provide insights for future work on code comprehension score optimization.


Life ◽  
2021 ◽  
Vol 11 (9) ◽  
pp. 972
Author(s):  
Anne K. Jensen ◽  
Claire A. Sheldon ◽  
Grace L. Paley ◽  
Christina L. Szperka ◽  
Geraldine W. Liu ◽  
...  

In recent years, the substantial burden of medical comorbidities in autism spectrum disorder (ASD) populations has been described. We report a retrospective observational case series of pediatric patients with suspected idiopathic intracranial hypertension (IIH) and concurrent ASD. Pediatric subjects with suspected IIH aged 2–18 years were identified by review of a pediatric neuro-ophthalmologist’s database spanning from July 1993 to April 2013. ASD diagnoses were identified within this cohort by an ICD-9 diagnosis code search and database review. Three subjects had concurrent ASD diagnoses; all were non-obese males. Since the retrospective observational case series was performed in April 2013, we have identified three additional IIH cases in boys with ASD. Our experience suggests that IIH may be a comorbidity of ASD, particularly in non-obese boys.


2021 ◽  
Author(s):  
Jian Gu ◽  
Zimin Chen ◽  
Martin Monperrus

2021 ◽  
Vol 28 (2) ◽  
Author(s):  
Sebastian Nielebock ◽  
Robert Heumüller ◽  
Kevin Michael Schott ◽  
Frank Ortmeier

Lack of experience, inadequate documentation, and sub-optimal API design frequently cause developers to make mistakes when re-using third-party implementations. Such API misuses can result in unintended behavior, performance losses, or software crashes. Therefore, current research aims to automatically detect such misuses by comparing the way a developer used an API to previously inferred patterns of the correct API usage. While research has made significant progress, these techniques have not yet been adopted in practice. In part, this is due to the lack of a process capable of seamlessly integrating with software development processes. Particularly, existing approaches do not consider how to collect relevant source code samples from which to infer patterns. In fact, an inadequate collection can cause API usage pattern miners to infer irrelevant patterns which leads to false alarms instead of finding true API misuses. In this paper, we target this problem (a) by providing a method that increases the likelihood of finding relevant and true-positive patterns concerning a given set of code changes and agnostic to a concrete static, intra-procedural mining technique and (b) by introducing a concept for just-in-time API misuse detection which analyzes changes at the time of commit. Particularly, we introduce different, lightweight code search and filtering strategies and evaluate them on two real-world API misuse datasets to determine their usefulness in finding relevant intra-procedural API usage patterns. Our main results are (1) commit-based search with subsequent filtering effectively decreases the amount of code to be analyzed, (2) in particular method-level filtering is superior to file-level filtering, (3) project-internal and project-external code search find solutions for different types of misuses and thus are complementary, (4) incorporating prior knowledge of the misused API into the search has a negligible effect.
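The difference between file-level and method-level filtering mentioned in result (2) can be illustrated with a small sketch. This is not the authors' implementation: the `Method` record, the `api_calls` field, and both filter functions are hypothetical names, and the sketch assumes each candidate method has already been reduced to the set of API calls it contains.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Set


@dataclass(frozen=True)
class Method:
    """Hypothetical record for one method found by code search."""
    file: str
    name: str
    api_calls: FrozenSet[str]


def file_level_filter(methods: List[Method], changed_apis: Set[str]) -> List[Method]:
    # Keep every method of any file in which at least one method uses a changed API.
    relevant_files = {m.file for m in methods if m.api_calls & changed_apis}
    return [m for m in methods if m.file in relevant_files]


def method_level_filter(methods: List[Method], changed_apis: Set[str]) -> List[Method]:
    # Keep only the methods that themselves use a changed API.
    return [m for m in methods if m.api_calls & changed_apis]


# Hypothetical search results for a commit that touches Iterator.next.
candidates = [
    Method("A.java", "read", frozenset({"Iterator.next"})),
    Method("A.java", "log", frozenset({"Logger.info"})),
    Method("B.java", "run", frozenset({"Thread.start"})),
]
changed = {"Iterator.next"}

print([m.name for m in file_level_filter(candidates, changed)])    # ['read', 'log']
print([m.name for m in method_level_filter(candidates, changed)])  # ['read']
```

In this toy example the method-level filter passes fewer unrelated methods on to the pattern miner, which is one plausible reading of why finer-grained filtering could reduce irrelevant patterns.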

