Mining Change Logs and Release Notes to Understand Software Maintenance and Evolution

2009 ◽  
Vol 12 (2) ◽  
Author(s):  
Liguo Yu

Software change logs and release notes are documents released together with new versions of a software product. They describe the changes made to the previous version and the new features introduced in the new version. In this paper, we present a keyword-based approach to mining and analyzing non-source-code documents and define a mathematical framework to represent the data. This approach is applied in a study of the change logs of Linux and the release notes of FreeBSD. The results show that the software maintenance process and evolution process share some common properties, and that the keyword-based text mining technique can be used as a systematic method to study software maintenance and evolution.
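The keyword-based mining idea can be illustrated with a minimal sketch: count occurrences of maintenance-related keywords across change-log entries. The keyword set and log lines below are illustrative, not the paper's actual keyword list or data.

```python
from collections import Counter
import re

# Hypothetical maintenance-related keywords; the paper's actual set may differ.
KEYWORDS = {"fix", "bug", "add", "remove", "update", "support"}

def keyword_profile(log_entries):
    """Count occurrences of maintenance keywords across change-log entries."""
    counts = Counter()
    for entry in log_entries:
        for token in re.findall(r"[a-z]+", entry.lower()):
            if token in KEYWORDS:
                counts[token] += 1
    return counts

logs = [
    "Fix memory leak in scheduler",
    "Add support for new driver; fix typo",
    "Update documentation",
]
profile = keyword_profile(logs)
```

A profile like this, computed per release, gives the frequency data that a mathematical framework can then compare across versions.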


Author(s):  
Xing Hu ◽  
Ge Li ◽  
Xin Xia ◽  
David Lo ◽  
Shuai Lu ◽  
...  

Code summarization, which aims to generate a succinct natural language description of source code, is extremely useful for code search and code comprehension, and it has played an important role in software maintenance and evolution. Previous approaches generate summaries by retrieving them from similar code snippets. However, these approaches heavily rely on whether similar code snippets can be retrieved and on how similar the snippets are, and they fail to capture the API knowledge in the source code, which carries vital information about its functionality. In this paper, we propose a novel approach, named TL-CodeSum, which successfully applies API knowledge learned in a different but related task to code summarization. Experiments on large-scale real-world industry Java projects indicate that our approach is effective and outperforms the state-of-the-art in code summarization.
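The retrieval-based baseline that TL-CodeSum is contrasted against can be sketched in a few lines: pick the summary of the most lexically similar snippet in a corpus. The corpus and similarity measure (token-set Jaccard) here are illustrative stand-ins, not the baselines evaluated in the paper.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two token sets."""
    return len(a & b) / len(a | b) if a | b else 0.0

def retrieve_summary(query_code: str, corpus: list) -> str:
    """Return the summary paired with the most similar code snippet.
    corpus: list of (code, summary) pairs."""
    q = set(query_code.split())
    best = max(corpus, key=lambda pair: jaccard(q, set(pair[0].split())))
    return best[1]

corpus = [
    ("public int add ( int a , int b ) { return a + b ; }", "adds two integers"),
    ("public void close ( ) { stream . close ( ) ; }", "closes the stream"),
]
summary = retrieve_summary("int sum ( int a , int b ) { return a + b ; }", corpus)
```

The sketch makes the abstract's criticism concrete: if no sufficiently similar snippet exists in the corpus, the retrieved summary is arbitrary, and lexical overlap ignores API semantics entirely.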


Author(s):  
Shu-Ting Shi ◽  
Ming Li ◽  
David Lo ◽  
Ferdian Thung ◽  
Xuan Huo

Code review is the process of manually inspecting a revision of the source code to determine whether the revised code meets the revision requirements. However, manual code review is time-consuming, and automating the code review process would alleviate the burden on code reviewers and speed up the software maintenance process. To construct a model for automatic code review, the characteristics of source code revisions (i.e., the difference between two pieces of source code) should be properly captured and modeled. Unfortunately, most existing techniques can easily model the overall correlation between two pieces of source code, but not the "difference" between them. In this paper, we propose a novel deep model named DACE for automatic code review. The model learns revision features by contrasting the revised hunks from the original and revised source code with respect to the code context containing the hunks. Experimental results on six open source software projects indicate that by learning the revision features, DACE can outperform competing approaches in automatic code review.
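The "revised hunks" that such a model contrasts can be extracted with standard diffing. A minimal sketch using Python's `difflib` is below; DACE itself is a deep model, so this only illustrates the hunk-extraction input step, not the paper's method.

```python
import difflib

def revised_hunks(original: str, revised: str):
    """Extract the changed hunks between two versions of a source file."""
    old_lines = original.splitlines()
    new_lines = revised.splitlines()
    matcher = difflib.SequenceMatcher(None, old_lines, new_lines)
    hunks = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":  # 'replace', 'delete', or 'insert'
            hunks.append({
                "op": tag,
                "original": old_lines[i1:i2],
                "revised": new_lines[j1:j2],
            })
    return hunks

old = "def f(x):\n    return x\n"
new = "def f(x):\n    return x + 1\n"
hunks = revised_hunks(old, new)
```

Each hunk pairs the removed lines with the added lines, which is exactly the "difference" signal the abstract argues correlation-based models fail to capture.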


2017 ◽  
Vol 8 (2) ◽  
pp. 17-26
Author(s):  
Liguo Yu

In C-alike programs, the source code is separated into header files and source files. During the software evolution process, both these two kinds of files need to adapt to changing requirement and changing environment. This paper studies the coevolution of header files and source files of C-alike programs. Using normalized compression distance that is derived from Kolmogorov complexity, we measure the header file difference and source file difference between versions of an evolving software product. Header files distance and source files distance are compared to understand their difference in pace of evolution. Mantel tests are performed to investigate the correlation of header file evolution and source file evolution. The study is performed on the source code of Apache HTTP web server.
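The normalized compression distance used here has a compact standard definition: NCD(x, y) = (C(xy) − min(C(x), C(y))) / max(C(x), C(y)), where C(s) is the compressed length of s. A minimal sketch with zlib as the compressor follows; the paper does not specify zlib, so the compressor choice and sample inputs are assumptions.

```python
import zlib

def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance:
    NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)),
    where C(s) is the compressed length of s."""
    cx = len(zlib.compress(x))
    cy = len(zlib.compress(y))
    cxy = len(zlib.compress(x + y))
    return (cxy - min(cx, cy)) / max(cx, cy)

# Illustrative inputs: two near-identical C files and one unrelated file.
a = b"int main() { return 0; }\n" * 20
b_ = b"int main() { return 1; }\n" * 20
c = b"def main():\n    pass\n" * 20
```

Because similar files compress well when concatenated, NCD is small for near-identical versions and larger for unrelated ones, which is what makes it usable as a version-to-version distance for header and source files.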


2018 ◽  
Vol 2 (1) ◽  
pp. 10-15
Author(s):  
Rozita Kadar ◽  
Sharifah Mashita Syed-Mohamad ◽  
Putra Sumari ◽  
Nur 'Aini Abdul Rashid

Program comprehension is an important and effort-intensive process in software maintenance. A key challenge for developers in program comprehension is understanding the source code. Software systems have grown in size, increasing the burden on developers of exploring and understanding millions of lines of source code. Meanwhile, source code is a crucial resource for developers becoming familiar with a software system, since system documentation is often unavailable or outdated. However, understanding source code is difficult: programming styles vary, and comments are often insufficient. Although many researchers have discussed strategies and techniques to overcome the program comprehension problem, only shallow knowledge exists about the challenges of understanding a software system by reading its source code. This study therefore attempts to overcome the problems of source code comprehension by suggesting a suitable comprehension technique based on an ontology approach to knowledge representation. This approach can readily express the concepts and relationships of the program domain. Thus, the proposed work creates a better way to improve program comprehension.
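An ontology of the program domain can be represented, at its simplest, as subject–relation–object triples that link code concepts. The concept and relation names below are illustrative, not taken from the study.

```python
# A toy program-domain ontology as subject-relation-object triples.
# Names are hypothetical, for illustration only.
triples = [
    ("Parser", "is_a", "Class"),
    ("parse", "member_of", "Parser"),
    ("parse", "calls", "tokenize"),
    ("tokenize", "member_of", "Lexer"),
]

def related(entity, relation):
    """All objects linked to `entity` by `relation`."""
    return [o for s, r, o in triples if s == entity and r == relation]
```

Queries over such triples (e.g., "what does `parse` call?") are the kind of concept-and-relationship questions the ontology approach is meant to answer for a developer reading unfamiliar code.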


Technologies ◽  
2020 ◽  
Vol 9 (1) ◽  
pp. 3
Author(s):  
Gábor Antal ◽  
Zoltán Tóth ◽  
Péter Hegedűs ◽  
Rudolf Ferenc

Bug prediction aims at finding source code elements in a software system that are likely to contain defects. Being aware of the most error-prone parts of the program, one can efficiently allocate the limited amount of testing and code review resources. Therefore, bug prediction can support software maintenance and evolution to a great extent. In this paper, we propose a function-level JavaScript bug prediction model based on static source code metrics, with the addition of hybrid (static and dynamic) code-analysis-based metrics of the number of incoming and outgoing function calls (HNII and HNOI). Our motivation is that JavaScript is a highly dynamic scripting language for which static code analysis might be very imprecise; therefore, using purely static source code features for bug prediction might not be enough. Based on a study where we extracted 824 buggy and 1943 non-buggy functions from the publicly available BugsJS dataset for the ESLint JavaScript project, we can confirm the positive impact of hybrid code metrics on the prediction performance of the ML models. Depending on the ML algorithm, applied hyper-parameters, and target measures we consider, hybrid invocation metrics bring a 2–10% increase in model performance (i.e., precision, recall, F-measure). Interestingly, replacing the static NOI and NII metrics with their hybrid counterparts HNOI and HNII in itself improves model performance; however, using them all together yields the best results.
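The target measures named in the abstract (precision, recall, F-measure) have standard definitions for a bug prediction model: compare the set of functions predicted buggy against the set actually buggy. A minimal sketch with illustrative function names follows.

```python
def prf(predicted: set, actual: set):
    """Precision, recall, and F-measure for bug prediction:
    precision = TP / |predicted|, recall = TP / |actual|,
    F = harmonic mean of the two."""
    tp = len(predicted & actual)  # true positives
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(actual) if actual else 0.0
    f = (2 * precision * recall / (precision + recall)
         if precision + recall else 0.0)
    return precision, recall, f

# Hypothetical example: 4 functions flagged, 3 actually buggy, 2 overlap.
predicted_buggy = {"f1", "f2", "f3", "f4"}
actually_buggy = {"f2", "f3", "f5"}
p, r, f = prf(predicted_buggy, actually_buggy)
```

A "2–10% increase" in these measures means, for example, a precision of 0.50 rising to somewhere between 0.52 and 0.60 when the hybrid HNOI/HNII metrics are added.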


Author(s):  
Yu Zhou ◽  
Yanxiang Tong ◽  
Taolue Chen ◽  
Jin Han

Bug localization represents one of the most expensive and time-consuming activities during software maintenance and evolution. To alleviate the workload of developers, numerous methods have been proposed to automate this process and narrow down the scope of reviewing buggy files. In this paper, we present a novel buggy source-file localization approach that uses information from both bug reports and source files. We leverage the part-of-speech features of bug reports and the invocation relationships among source files. We also integrate an adaptive technique to further optimize the performance of the approach. The adaptive technique discriminates between Top 1 and Top N recommendations for a given bug report and consists of two modules: one maximizes the accuracy of the first recommended file, and the other aims at improving the accuracy of the fixed-defect file list. We evaluate our approach on six large-scale open source projects, i.e., AspectJ, Eclipse, SWT, ZXing, Birt, and Tomcat. Compared to previous work, empirical results show that our approach improves the overall prediction performance in all of these cases. In particular, in terms of Top 1 recommendation accuracy, our approach achieves an enhancement from 22.73% to 39.86% for AspectJ, from 24.36% to 30.76% for Eclipse, from 31.63% to 46.94% for SWT, from 40% to 55% for ZXing, from 7.97% to 21.99% for Birt, and from 33.37% to 38.90% for Tomcat.
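The Top 1 / Top N accuracy the abstract reports has a simple definition: the fraction of bug reports whose actually fixed file appears among the first n recommended files. A minimal sketch with hypothetical rankings follows.

```python
def top_n_accuracy(rankings, fixed_files, n):
    """Fraction of bug reports whose truly fixed file appears in the
    top n of its ranked recommendation list."""
    hits = sum(1 for ranked, fixed in zip(rankings, fixed_files)
               if fixed in ranked[:n])
    return hits / len(rankings)

# Hypothetical data: one ranked file list per bug report,
# plus the file that was actually fixed for that report.
rankings = [
    ["a.java", "b.java", "c.java"],
    ["d.java", "e.java", "f.java"],
    ["g.java", "h.java", "i.java"],
]
fixed = ["a.java", "e.java", "i.java"]
```

In this toy data only the first report is hit at n = 1, so Top 1 accuracy is 1/3, while Top 2 accuracy rises to 2/3; the adaptive technique in the paper tunes the two cases separately.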

