Text Retrieval Approaches for Concept Location in Source Code

Author(s):  
Andrian Marcus ◽  
Sonia Haiduc
2017 ◽  
Author(s):  
Mohammad Masudur Rahman ◽  
Chanchal Roy

During software maintenance, developers usually deal with a significant number of software change requests. As a part of this, they often formulate an initial query from the request texts, and then attempt to map the concepts discussed in the request to relevant source code locations in the software system (a.k.a., concept location). Unfortunately, studies suggest that they often perform poorly in choosing the right search terms for a change task. In this paper, we propose a novel technique, ACER, that takes an initial query, identifies appropriate search terms from the source code using a novel term weight, CodeRank, and then suggests an effective reformulation of the initial query by exploiting source document structures, query quality analysis, and machine learning. Experiments with 1,675 baseline queries from eight subject systems report that our technique can improve 71% of the baseline queries, which is highly promising. Comparison with five closely related existing techniques in query reformulation not only validates our empirical findings but also demonstrates the superiority of our technique.


2013 ◽  
Vol 22 (01) ◽  
pp. 1350005
Author(s):  
RICARDO PÉREZ-CASTILLO ◽  
MARIO PIATTINI ◽  
BARBARA WEBER

Concept location is a key activity during software modernization since it allows maintainers to determine exactly which pieces of source code support a specific concept. Real-world business processes and the information systems providing operational IT support for those processes can become misaligned as a consequence of uncontrolled maintenance over time. When the concepts supported by an information system become outdated or misaligned, concept location becomes a time-consuming and error-prone task. Moreover, enterprise information systems (which implement business processes) embed significant business knowledge over time that is neither present nor documented anywhere else. To support the evolution of existing information systems, the embedded knowledge must first be retrieved and depicted in up-to-date business process models and then be mapped to the source code. This paper addresses this issue through a concept location approach that considers business activities as the key concept to be located and discovers different partial business process views for each piece of source code. Thus, the concept location problem becomes the problem of extracting such views. This approach follows model-driven development principles, and an automatic model transformation is implemented to facilitate its adoption. Moreover, a case study involving two real-life information systems demonstrates its feasibility.


2012 ◽  
Vol 15 (2) ◽  
Author(s):  
Laura Moreno ◽  
Jairo Aponte

Within the software engineering field, researchers have investigated whether it is possible and useful to summarize software artifacts, in order to provide developers with concise representations of the content of the original artifacts. As an initial step towards automatic summarization of source code, we conducted an empirical study where a group of Java developers provided manually written summaries for a variety of source code elements. Such summaries were analyzed and used to evaluate some summarization techniques based on Text Retrieval. This paper describes the main features of the summaries written by developers, what kind of information should (ideally) be included in automatically generated summaries, and the internal quality of the summaries generated by some automatic methods.

