A Practical Software Quality Classification Model Using Genetic Programming

Author(s):  
Yi Liu ◽  
Taghi M. Khoshgoftaar

A software quality estimation model is an important tool for a given software quality assurance initiative. Software quality classification models can be used to indicate which program modules are fault-prone (FP) and not fault-prone (NFP). Such models assume that enough resources are available for quality improvement of all the modules predicted as FP. In conjunction with a software quality classification model, a quality-based ranking of program modules has practical benefits since priority can be given to modules that are more FP. However, such a ranking cannot be achieved by traditional classification techniques. We present a novel software quality classification model based on multi-objective optimization with genetic programming (GP). More specifically, the GP-based model provides both a classification (FP or NFP) and a quality-based ranking for the program modules. The quality factor used to rank the modules is typically the number of faults or defects associated with a module. Genetic programming is ideally suited for optimizing multiple criteria simultaneously. In our study, three performance criteria are used to evolve a GP-based software quality model: classification performance, module ranking, and size of the GP tree. The third criterion addresses a commonly observed phenomena in GP,that is, bloating. The proposed model is investigated with case studies of software measurement data obtained from two industrial software systems.

Author(s):  
TAGHI M. KHOSHGOFTAAR ◽  
LOFTON A. BULLARD ◽  
KEHAN GAO

A rule-based classification model is presented to identify high-risk software modules. It utilizes the power of rough set theory to reduce the number of attributes, and the equal frequency binning algorithm to partition the values of the attributes. As a result, a set of conjuncted Boolean predicates are formed. The model is inherently influenced by the practical needs of the system being modeled, thus allowing the analyst to determine which rules are to be used for classifying the fault-prone and not fault-prone modules. The proposed model also enables the analyst to control the number of rules that constitute the model. Empirical validation of the model is accomplished through a case study of a large legacy telecommunications system. The ease of rule interpretation and the transparency of the functional aspects of the model are clearly demonstrated. It is concluded that the new model is effective in achieving the software quality classification.


Author(s):  
TAGHI M. KHOSHGOFTAAR ◽  
KEHAN GAO ◽  
AMRI NAPOLITANO

The primary goal of software quality engineering is to produce a high quality software product through the use of some specific techniques and processes. One strategy is applying data mining techniques to software metric and defect data collected during the software development process to identify potential low-quality program modules. In this paper, we investigate the use of feature selection in the context of software quality estimation (also referred to as software defect prediction), where a classification model is used to predict whether program modules (instances) are fault-prone or not-fault-prone. Seven filter-based feature ranking techniques are examined. Among them, six are commonly used, and the other one, named signal to noise ratio (SNR), is rarely employed. The objective of the paper is to compare these seven techniques for various software data sets and assess their effectiveness for software quality modeling. A case study is performed on 16 software data sets, and classification models are built with five different learners and evaluated with two performance metrics. Our experimental results are summarized based on statistical tests for significance. The main conclusion is that the SNR technique performs as well as the best performer of the six commonly used techniques.


Sign in / Sign up

Export Citation Format

Share Document