Text Retrieval Approaches for Concept Location in Source Code

Improved query reformulation for concept location using CodeRank and document structures

10.7287/peerj.preprints.3186v1 ◽

2017 ◽

Author(s):

Mohammad Masudur Rahman ◽

Chanchal Roy

Keyword(s):

Software Maintenance ◽

Source Code ◽

Quality Analysis ◽

Query Reformulation ◽

Search Terms ◽

Concept Location ◽

Document Structures ◽

The Right ◽

Change Requests ◽

Change Task

During software maintenance, developers usually deal with a significant number of software change requests. As a part of this, they often formulate an initial query from the request texts, and then attempt to map the concepts discussed in the request to relevant source code locations in the software system (a.k.a., concept location). Unfortunately, studies suggest that they often perform poorly in choosing the right search terms for a change task. In this paper, we propose a novel technique, ACER, that takes an initial query, identifies appropriate search terms from the source code using a novel term weight, CodeRank, and then suggests effective reformulations of the initial query by exploiting the source document structures, query quality analysis and machine learning. Experiments with 1,675 baseline queries from eight subject systems report that our technique can improve 71% of the baseline queries, which is highly promising. Comparison with five closely related existing techniques in query reformulation not only validates our empirical findings but also demonstrates the superiority of our technique.
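The abstract above does not define CodeRank, so the following is only a rough illustration of the general idea of graph-based term weighting for query expansion. It is a minimal sketch that assumes a plain PageRank-style iteration over an undirected term co-occurrence graph. The function names `term_rank` and `expand_query`, the window size, and the damping factor are all hypothetical choices, not details taken from the paper.

```python
from collections import defaultdict


def term_rank(token_sequences, window=2, damping=0.85, iterations=50):
    """Score terms with a PageRank-style iteration (an assumption, not CodeRank itself)."""
    # Link terms that co-occur within `window` positions of each other.
    neighbors = defaultdict(set)
    for tokens in token_sequences:
        for i, term in enumerate(tokens):
            for other in tokens[i + 1 : i + window]:
                if other != term:
                    neighbors[term].add(other)
                    neighbors[other].add(term)

    terms = sorted(neighbors)
    if not terms:
        return {}

    # Start from a uniform distribution and redistribute scores along edges.
    scores = {t: 1.0 / len(terms) for t in terms}
    for _ in range(iterations):
        updated = {}
        for t in terms:
            incoming = sum(scores[n] / len(neighbors[n]) for n in neighbors[t])
            updated[t] = (1 - damping) / len(terms) + damping * incoming
        scores = updated
    return scores


def expand_query(query_terms, token_sequences, k=3):
    """Append the k highest-scoring terms that are not already in the query."""
    scores = term_rank(token_sequences)
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    candidates = [t for t in ranked if t not in query_terms]
    return list(query_terms) + candidates[:k]


if __name__ == "__main__":
    # Hypothetical token sequences, e.g. split identifiers from method bodies.
    sequences = [
        ["file", "open", "stream"],
        ["file", "close", "stream"],
        ["file", "read", "buffer"],
    ]
    print(expand_query(["open"], sequences, k=2))
```

The sketch covers only the term-weighting and expansion steps. The document-structure analysis, query quality analysis and machine learning mentioned in the abstract are not modeled here.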


Improved query reformulation for concept location using CodeRank and document structures

10.7287/peerj.preprints.3186 ◽

2017 ◽

Cited By ~ 1

Author(s):

Mohammad Masudur Rahman ◽

Chanchal Roy

Keyword(s):

Software Maintenance ◽

Source Code ◽

Quality Analysis ◽

Query Reformulation ◽

Search Terms ◽

Concept Location ◽

Document Structures ◽

The Right ◽

Change Requests ◽

Change Task

During software maintenance, developers usually deal with a significant number of software change requests. As a part of this, they often formulate an initial query from the request texts, and then attempt to map the concepts discussed in the request to relevant source code locations in the software system (a.k.a., concept location). Unfortunately, studies suggest that they often perform poorly in choosing the right search terms for a change task. In this paper, we propose a novel technique, ACER, that takes an initial query, identifies appropriate search terms from the source code using a novel term weight, CodeRank, and then suggests effective reformulations of the initial query by exploiting the source document structures, query quality analysis and machine learning. Experiments with 1,675 baseline queries from eight subject systems report that our technique can improve 71% of the baseline queries, which is highly promising. Comparison with five closely related existing techniques in query reformulation not only validates our empirical findings but also demonstrates the superiority of our technique.


CONCEPT LOCATION MODELING THROUGH BUSINESS PROCESS VIEWS

International Journal of Cooperative Information Systems ◽

10.1142/s0218843013500056 ◽

2013 ◽

Vol 22 (01) ◽

pp. 1350005

Author(s):

RICARDO PÉREZ-CASTILLO ◽

MARIO PIATTINI ◽

BARBARA WEBER

Keyword(s):

Information System ◽

Information Systems ◽

Business Process ◽

Business Processes ◽

Source Code ◽

Real Life ◽

Process Models ◽

Concept Location ◽

Process Views ◽

Over Time

Concept location is a key activity during software modernization since it allows maintainers to determine exactly which pieces of source code support a specific concept. Real-world business processes and the information systems providing operational IT support for those processes can become misaligned as a consequence of uncontrolled maintenance over time. When concepts supported by an information system become outdated or misaligned, concept location becomes a time-consuming and error-prone task. Moreover, enterprise information systems (which implement business processes) embed significant business knowledge over time that is neither present nor documented anywhere else. To support the evolution of existing information systems, the embedded knowledge must first be retrieved and depicted in up-to-date business process models and then be mapped to the source code. This paper addresses this issue through a concept location approach that considers business activities as the key concept to be located and discovers different partial business process views for each piece of source code. Thus, the concept location problem becomes the problem of extracting such views. This approach follows model-driven development principles, and an automatic model transformation is implemented to facilitate its adoption. Moreover, a case study involving two real-life information systems demonstrates its feasibility.


Combining Formal Concept Analysis with Information Retrieval for Concept Location in Source Code

15th IEEE International Conference on Program Comprehension (ICPC '07) ◽

10.1109/icpc.2007.13 ◽

2007 ◽

Cited By ~ 111

Author(s):

D. Poshyvanyk ◽

A. Marcus

Keyword(s):

Information Retrieval ◽

Formal Concept Analysis ◽

Source Code ◽

Concept Analysis ◽

Formal Concept ◽

Concept Location


Improving software text retrieval using conceptual knowledge in source code

2017 32nd IEEE/ACM International Conference on Automated Software Engineering (ASE) ◽

10.1109/ase.2017.8115625 ◽

2017 ◽

Cited By ~ 4

Author(s):

Zeqi Lin ◽

Yanzhen Zou ◽

Junfeng Zhao ◽

Bing Xie

Keyword(s):

Conceptual Knowledge ◽

Source Code ◽

Text Retrieval


Clustering Support for Static Concept Location in Source Code

2011 IEEE 19th International Conference on Program Comprehension ◽

10.1109/icpc.2011.13 ◽

2011 ◽

Cited By ~ 40

Author(s):

Giuseppe Scanniello ◽

Andrian Marcus

Keyword(s):

Source Code ◽

Concept Location ◽

Static Concept


Searching program source code with a structured text retrieval system (poster abstract)

Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval - SIGIR '99 ◽

10.1145/312624.312739 ◽

1999 ◽

Cited By ~ 1

Author(s):

Charles Clarke ◽

Anthony Cox ◽

Susan Sim

Keyword(s):

Retrieval System ◽

Source Code ◽

Text Retrieval ◽

Poster Abstract


On the Analysis of Human and Automatic Summaries of Source Code

CLEI electronic journal ◽

10.19153/cleiej.15.2.6 ◽

2012 ◽

Vol 15 (2) ◽

Cited By ~ 7

Author(s):

Laura Moreno ◽

Jairo Aponte

Keyword(s):

Software Engineering ◽

Empirical Study ◽

Source Code ◽

Initial Step ◽

Text Retrieval ◽

Internal Quality ◽

Automatic Summarization ◽

Software Artifacts ◽

Automatic Methods

Within the software engineering field, researchers have investigated whether it is possible and useful to summarize software artifacts, in order to provide developers with concise representations of the content of the original artifacts. As an initial step towards automatic summarization of source code, we conducted an empirical study where a group of Java developers provided manually written summaries for a variety of source code elements. Such summaries were analyzed and used to evaluate some summarization techniques based on Text Retrieval. This paper describes the main features of the summaries written by developers, what kind of information should (ideally) be included in automatically generated summaries, and the internal quality of the summaries generated by some automatic methods.
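The abstract above does not say which Text Retrieval techniques were evaluated. As an illustration of what a term-based summary of a source code element can look like, the following is a minimal sketch that assumes a simple TF-IDF ranking over split identifiers. The helper names `split_identifiers` and `summarize`, the smoothed IDF formula, and the sample snippets are hypothetical and are not taken from the study.

```python
import math
import re
from collections import Counter


def split_identifiers(source):
    """Split source text into lowercase word parts, breaking camelCase identifiers."""
    parts = []
    for word in re.findall(r"[A-Za-z]+", source):
        pieces = re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])", word)
        parts.extend(p.lower() for p in pieces)
    return parts


def summarize(target, corpus, k=3):
    """Return the k terms of `target` with the highest TF-IDF score against `corpus`."""
    docs = [set(split_identifiers(d)) for d in corpus]
    tf = Counter(split_identifiers(target))
    n = len(docs)

    def idf(term):
        # Smoothed inverse document frequency (one common variant, assumed here).
        df = sum(1 for d in docs if term in d)
        return math.log((1 + n) / (1 + df)) + 1

    # Ties are broken alphabetically so the output is deterministic.
    return sorted(tf, key=lambda t: (-tf[t] * idf(t), t))[:k]


if __name__ == "__main__":
    # Hypothetical Java-like snippets standing in for source code elements.
    corpus = [
        "void readFile(String path) { open(path); }",
        "void writeFile(String path) { open(path); }",
        "int parseHeader(String line) { return line.length(); }",
    ]
    print(summarize(corpus[2], corpus, k=3))
```

A real pipeline would typically also filter programming-language keywords and stop words before ranking. That step is omitted here to keep the sketch short.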
