Text mining and software engineering: an integrated source code and document analysis approach

IET Software ◽  
2008 ◽  
Vol 2 (1) ◽  
pp. 3 ◽  
Author(s):  
R. Witte ◽  
Q. Li ◽  
Y. Zhang ◽  
J. Rilling


2020 ◽
Author(s):  
Willian N. Oizumi ◽  
Alessandro F. Garcia

Design problems affect most software projects and make their maintenance expensive and difficult. Thus, the identification of potential design problems in the source code – which is very often the only available and up-to-date artifact in a project – becomes essential in long-living software systems. This identification task is challenging, as the reification of design problems in the source code tends to be scattered across several code elements. However, state-of-the-art techniques do not provide enough information to effectively help developers in this task. In this work, we address this challenge by proposing a new technique to support developers in revealing design problems. This technique synthesizes information about potential design problems, which are materialized in the implementation in the form of syntactic and semantic anomaly agglomerations. Our evaluation shows that the proposed synthesis technique helps to reveal more than 1200 design problems across 7 industry-strength systems, with a median precision of 71% and a median recall of 78%. The relevance of our work has been widely recognized by the software engineering community through 2 awards and 7 publications in international and national venues.
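The agglomeration idea can be illustrated with a minimal sketch. The data structures and threshold below are hypothetical and not the authors' implementation; the point is only that smell instances affecting related code elements are grouped, and sufficiently dense groups are flagged as design-problem candidates.

    from collections import defaultdict

    # Hypothetical sketch of syntactic smell agglomeration: smell instances
    # that affect classes in the same module are grouped, and groups above
    # a size threshold are reported as potential design problems.
    def agglomerate(smells, min_size=3):
        """smells: list of (smell_kind, module, class_name) tuples."""
        groups = defaultdict(list)
        for kind, module, cls in smells:
            groups[module].append((kind, cls))
        return {m: g for m, g in groups.items() if len(g) >= min_size}

    smells = [
        ("GodClass", "billing", "Invoice"),
        ("FeatureEnvy", "billing", "InvoiceLine"),
        ("LongMethod", "billing", "TaxRules"),
        ("LongMethod", "ui", "MainWindow"),
    ]
    print(agglomerate(smells))  # only the dense 'billing' group is reported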


Author(s):  
Minh Ngoc Ngo

Due to the need to reengineer and migrate aging software and legacy systems, reverse engineering has started to receive some attention. It has now been established as an area in software engineering concerned with understanding software structure and recovering or extracting design and features from programs, mainly from source code. The inference of design and features from code closely resembles data mining, which extracts and infers information from data. In view of this similarity, reverse engineering from program code can be called program mining. Traditionally, program mining has been based mainly on invariant properties and heuristic rules. Recently, empirical properties have been introduced to augment the existing methods. This article summarizes some of the work in this area.


Author(s):  
Adrián Hernández-López ◽  
Ricardo Colomo-Palacios ◽  
Pedro Soto-Acosta ◽  
Cristina Casado Lumberas

Productivity measurement is constructed from the measurement of three categories of elements: inputs, outputs, and factors. This concept, which originated in the manufacturing industry, has also been a research topic within Software Engineering (SE). In this area, the most commonly used inputs are time and effort, and the most commonly used outputs are source code and functionality. Despite their known limitations, many of the most widely used productivity measures remain in use because of the information they provide for management goals. In order to enable the construction of new productivity measures for SE practitioners, this paper analyzes the existence of inputs other than time and effort, and of outputs other than source code and functionality. Moreover, differences in the usage of inputs and the production of outputs among several SE job positions are analyzed and explained.
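As a minimal illustration of how such measures are constructed from an output and an input (the figures below are invented, not data from the study), a classical ratio such as source lines of code or function points per person-hour can be computed directly:

    # Illustrative output/input productivity ratios; all figures are invented.
    def productivity(output, input_amount):
        return output / input_amount

    loc_per_hour = productivity(output=1200, input_amount=160)  # LOC per person-hour
    fp_per_hour = productivity(output=35, input_amount=160)     # function points per person-hour
    print(f"{loc_per_hour:.1f} LOC/h, {fp_per_hour:.2f} FP/h")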


2020 ◽  
Vol 10 (20) ◽  
pp. 7088
Author(s):  
Luka Pavlič ◽  
Marjan Heričko ◽  
Tina Beranič

In scientific research, evidence is often based on empirical data. Scholars tend to rely on students as participants in experiments in order to validate their theses. Students are an obvious choice for scientific research: they are usually willing to participate and are often themselves pursuing an education in the experiment’s domain. The software engineering domain is no exception. However, readers, authors, and reviewers do sometimes question the validity of experimental data gathered from students in controlled experiments. This is why we address this difficult-to-answer question: are students a proper substitute for experienced professional engineers in a typical software engineering experiment? As we demonstrate in this paper, there is no simple “yes or no” answer. In some aspects, students were not outperformed by professionals, but in others, students would not only give different answers than professionals, their answers would also diverge more. In this paper we show and analyze the results of a controlled experiment in the source code quality domain, comparing student and professional responses. We show that authors have to be careful when employing students in experiments, especially when complex and advanced domains are addressed. However, students may be a proper substitute in cases where only non-advanced aspects are required.
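The divergence described above can be quantified in a simple way. The following sketch uses invented Likert-scale answers, not the experiment's data, and compares the central tendency and spread of the two groups using the sample standard deviation:

    import statistics

    # Invented 1-5 Likert answers for a single question; not the experiment's data.
    students = [2, 5, 1, 4, 3, 5, 2]
    professionals = [4, 4, 3, 4, 4, 3, 4]

    for name, answers in [("students", students), ("professionals", professionals)]:
        print(name, "median:", statistics.median(answers),
              "stdev:", round(statistics.stdev(answers), 2))
    # A larger standard deviation signals the kind of divergence reported for students.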


2018 ◽  
Vol 1 ◽  
pp. 1-5
Author(s):  
Florian Ledermann

Following Aristotle, F. P. Brooks (1987) emphasizes the distinction between “essential difficulties” and “accidental difficulties” as a key challenge in software engineering. From the point of view of cartography, it would be desirable to identify the cartographic essence of a program and subject it to additional scrutiny, while its accidental properties, again from the point of view of cartography, are usually of lesser relevance to cartographic analysis. In this paper, two methods that facilitate extracting the cartographic essence of programs are presented: close reading of their source code, and automated analysis of their runtime behavior. The advantages and shortcomings of both methods are discussed, followed by an outlook on future developments and potential applications.
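The second method, automated analysis of runtime behavior, can be approximated in a few lines. The sketch below is a toy example with hypothetical draw_* routine names; it uses Python's sys.settrace to record which drawing routines a program actually invokes, separating them from accidental computation:

    import sys

    # Toy runtime-analysis sketch: record calls to (hypothetical) drawing
    # routines to approximate the cartographic essence of a program.
    calls = []

    def tracer(frame, event, arg):
        if event == "call" and frame.f_code.co_name.startswith("draw_"):
            calls.append(frame.f_code.co_name)
        return tracer

    def draw_coastline(): pass
    def draw_labels(): pass
    def compute_projection(): pass  # accidental detail; not recorded

    def render_map():
        compute_projection()
        draw_coastline()
        draw_labels()

    sys.settrace(tracer)
    render_map()
    sys.settrace(None)
    print(calls)  # ['draw_coastline', 'draw_labels']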


Author(s):  
Bello Muriana ◽  
Ogba Paul Onuh

Measures of software complexity are an essential part of software engineering. Complexity metrics can be used to forecast key information regarding the testability, reliability, and manageability of software systems from a study of the source code. This paper presents the results of three distinct software complexity metrics applied to two searching algorithms (linear and binary search). The goal is to compare the complexity of linear and binary search algorithms implemented in Python, Java, and C++, measuring the sample algorithms using the lines-of-code, McCabe, and Halstead metrics. The findings indicate that the Halstead program difficulty has its minimal value for both linear and binary search when implemented in Python. Analysis of Variance (ANOVA) was adopted to determine whether there are any statistically significant differences between the search algorithms when implemented in the three programming languages. It revealed that the three languages do not vary considerably for either linear or binary search, which implies that any of the three languages is suitable for coding linear and binary search algorithms.
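Two of the three metrics are easy to reproduce in miniature. The sketch below is illustrative rather than the paper's tooling: it counts non-blank source lines of a binary search and applies the Halstead difficulty formula D = (n1 / 2) * (N2 / n2), where n1 is the number of distinct operators, n2 the number of distinct operands, and N2 the total operand occurrences. The counts are entered by hand for illustration, not extracted by a parser.

    import inspect

    def binary_search(arr, target):
        lo, hi = 0, len(arr) - 1
        while lo <= hi:
            mid = (lo + hi) // 2
            if arr[mid] == target:
                return mid
            if arr[mid] < target:
                lo = mid + 1
            else:
                hi = mid - 1
        return -1

    # Lines of code: the simplest of the three metrics.
    loc = len([l for l in inspect.getsource(binary_search).splitlines() if l.strip()])

    # Halstead difficulty D = (n1 / 2) * (N2 / n2); the counts below are
    # hand-entered for illustration rather than extracted by a parser.
    n1, n2, N2 = 12, 7, 25  # distinct operators, distinct operands, total operands
    difficulty = (n1 / 2) * (N2 / n2)
    print(f"LOC = {loc}, Halstead difficulty = {difficulty:.1f}")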

