Building Decision Tree Software Quality Classification Models Using Genetic Programming

Author(s):  
Yi Liu ◽  
Taghi M. Khoshgoftaar
Author(s):  
Yi Liu ◽  
Taghi M. Khoshgoftaar

A software quality estimation model is an important tool for a given software quality assurance initiative. Software quality classification models can be used to indicate which program modules are fault-prone (FP) and not fault-prone (NFP). Such models assume that enough resources are available for quality improvement of all the modules predicted as FP. In conjunction with a software quality classification model, a quality-based ranking of program modules has practical benefits since priority can be given to modules that are more FP. However, such a ranking cannot be achieved by traditional classification techniques. We present a novel software quality classification model based on multi-objective optimization with genetic programming (GP). More specifically, the GP-based model provides both a classification (FP or NFP) and a quality-based ranking for the program modules. The quality factor used to rank the modules is typically the number of faults or defects associated with a module. Genetic programming is ideally suited for optimizing multiple criteria simultaneously. In our study, three performance criteria are used to evolve a GP-based software quality model: classification performance, module ranking, and size of the GP tree. The third criterion addresses a commonly observed phenomena in GP,that is, bloating. The proposed model is investigated with case studies of software measurement data obtained from two industrial software systems.


2005 ◽  
Vol 76 (2) ◽  
pp. 111-126 ◽  
Author(s):  
Taghi M. Khoshgoftaar ◽  
Naeem Seliya ◽  
Angela Herzberg

2003 ◽  
Vol 12 (03) ◽  
pp. 207-225 ◽  
Author(s):  
Taghi M. Khoshgoftaar ◽  
Naeem Seliya

Predicting the quality of system modules prior to software testing and operations can benefit the software development team. Such a timely reliability estimation can be used to direct cost-effective quality improvement efforts to the high-risk modules. Tree-based software quality classification models based on software metrics are used to predict whether a software module is fault-prone or not fault-prone. They are white box quality estimation models with good accuracy, and are simple and easy to interpret. An in-depth study of calibrating classification trees for software quality estimation using the SPRINT decision tree algorithm is presented. Many classification algorithms have memory limitations including the requirement that datasets be memory resident. SPRINT removes all of these limitations and provides a fast and scalable analysis. It is an extension of a commonly used decision tree algorithm, CART, and provides a unique tree pruning technique based on the Minimum Description Length (MDL) principle. Combining the MDL pruning technique and the modified classification algorithm, SPRINT yields classification trees with useful accuracy. The case study used consists of software metrics collected from a very large telecommunications system. It is observed that classification trees built by SPRINT are more balanced and demonstrate better stability than those built by CART.


Sign in / Sign up

Export Citation Format

Share Document