An Empirical Study on the Procedure to Derive Software Quality Estimation Models

In this work, we present a genetic algorithm to optimize predictive models used to estimate software quality characteristics. Software quality assessment is crucial in software development because it helps reduce cost, time, and effort. Software quality characteristics cannot be measured directly, but they can be estimated from other measurable software attributes (such as coupling, size, and complexity). Software quality estimation models establish a relationship between the unmeasurable characteristics and the measurable attributes. However, these models are hard to generalize and reuse on new, unseen software because their accuracy deteriorates significantly. The genetic algorithm we present adapts such models to new data. We give empirical evidence that our approach outperforms both the machine learning algorithm C4.5 and random guessing.

Count Models for Software Quality Estimation

Encyclopedia of Data Warehousing and Mining, Second Edition, 2011, pp. 346-352. DOI: 10.4018/978-1-60566-010-3.ch055

Author(s): Kehan Gao, Taghi M. Khoshgoftaar

Keyword(s): Regression Model, Software Quality, Software Reliability, Software Metrics, Quantitative Prediction, Reliability Engineering, Quality Estimation, Count Models, Software Modules, Estimation Models

Timely and accurate prediction of the quality of software modules in the early stages of the software development life cycle is very important in the field of software reliability engineering. With such predictions, a software quality assurance team can assign its limited quality improvement resources to the areas that need them and prevent problems from occurring during system operation. Software metrics-based quality estimation models are tools that can achieve such predictions. They are generally of two types: a classification model that predicts the membership of modules in two or more quality-based classes (Khoshgoftaar et al., 2005b), and a quantitative prediction model that estimates the number of faults (or some other quality factor) likely to occur in software modules (Ohlsson et al., 1998). In recent years, a variety of techniques have been developed for software quality estimation (Briand et al., 2002; Khoshgoftaar et al., 2002; Ohlsson et al., 1998; Ping et al., 2002), most of which are suited for either prediction or classification, but not for both. For example, logistic regression (Khoshgoftaar & Allen, 1999) can only be used for classification, whereas multiple linear regression (Ohlsson et al., 1998) can only be used for prediction. Some software quality estimation techniques, such as case-based reasoning (Khoshgoftaar & Seliya, 2003), can be used to calibrate both prediction and classification models; however, they require a distinct modeling approach for each type of model. In contrast to such software quality estimation methods, count models such as the Poisson regression model (PRM) and the zero-inflated Poisson (ZIP) regression model (Khoshgoftaar et al., 2001) can yield both a prediction model and a classification model with a single modeling approach. Moreover, count models are capable of providing the probability that a module has a given number of faults.
Despite the attractiveness of calibrating software quality estimation models with count modeling techniques, we feel that their application in software reliability engineering has been very limited (Khoshgoftaar et al., 2001). This study can be used as a basis for assessing the usefulness of count models for predicting the number of faults and the quality-based class of software modules.
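
The idea that one count model can serve both prediction and classification can be sketched as follows. This is a minimal illustration, assuming a log link for the Poisson mean (the usual choice in Poisson regression) and a single fixed zero-inflation probability. The function names, the coefficient values, the two example metrics, and the 0.5 cutoff are all hypothetical and are not taken from the study described above.

```python
import math


def fault_rate(intercept, coefs, metrics):
    """Poisson mean for one module under a log link: exp(b0 + sum(b_i * x_i))."""
    return math.exp(intercept + sum(c * x for c, x in zip(coefs, metrics)))


def poisson_pmf(k, lam):
    """P(Y = k) for a Poisson count with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def zip_pmf(k, lam, pi):
    """P(Y = k) under a zero-inflated Poisson model.

    With probability pi the module is structurally fault-free;
    otherwise its fault count follows Poisson(lam).
    Setting pi = 0 recovers the plain Poisson model.
    """
    base = (1.0 - pi) * poisson_pmf(k, lam)
    return pi + base if k == 0 else base


def expected_faults(lam, pi=0.0):
    """Quantitative prediction: the expected number of faults."""
    return (1.0 - pi) * lam


def prob_at_least(lam, pi=0.0, threshold=1):
    """P(Y >= threshold), computed from the same fitted distribution."""
    return 1.0 - sum(zip_pmf(k, lam, pi) for k in range(threshold))


def classify(lam, pi=0.0, threshold=1, cutoff=0.5):
    """Classification: label a module by its probability of reaching the threshold."""
    if prob_at_least(lam, pi, threshold) >= cutoff:
        return "fault-prone"
    return "not fault-prone"


if __name__ == "__main__":
    # Hypothetical coefficients and metrics (e.g. scaled size and coupling).
    lam = fault_rate(intercept=-1.0, coefs=[0.8, 0.3], metrics=[1.5, 2.0])
    pi = 0.2  # hypothetical zero-inflation probability
    print("expected faults:", expected_faults(lam, pi))
    print("P(exactly 2 faults):", zip_pmf(2, lam, pi))
    print("class:", classify(lam, pi))
```

Both outputs come from one fitted distribution: the expected count gives the quantitative prediction, and a tail probability compared against a cutoff gives the class label. In practice the coefficients and the zero-inflation probability would be estimated from historical module data rather than set by hand.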
