Count Models for Software Quality Estimation

Author(s):
Kehan Gao, Taghi M. Khoshgoftaar

Timely and accurate prediction of the quality of software modules in the early stages of the software development life cycle is very important in the field of software reliability engineering. With such predictions, a software quality assurance team can allocate limited quality-improvement resources to the areas that need them most and prevent problems from occurring during system operation. Software metrics-based quality estimation models are tools that can achieve such predictions. They are generally of two types: classification models, which predict the membership of modules in two or more quality-based classes (Khoshgoftaar et al., 2005b), and quantitative prediction models, which estimate the number of faults (or some other quality factor) likely to occur in software modules (Ohlsson et al., 1998). In recent years, a variety of techniques have been developed for software quality estimation (Briand et al., 2002; Khoshgoftaar et al., 2002; Ohlsson et al., 1998; Ping et al., 2002), most of which are suited to either prediction or classification, but not both. For example, logistic regression (Khoshgoftaar & Allen, 1999) can only be used for classification, whereas multiple linear regression (Ohlsson et al., 1998) can only be used for prediction. Some software quality estimation techniques, such as case-based reasoning (Khoshgoftaar & Seliya, 2003), can be used to calibrate both prediction and classification models; however, they require a distinct modeling approach for each type of model. In contrast to such methods, count models such as the Poisson regression model (PRM) and the zero-inflated Poisson (ZIP) regression model (Khoshgoftaar et al., 2001) can yield both with a single modeling approach. Moreover, count models are capable of providing the probability that a module has a given number of faults. Despite the attractiveness of calibrating software quality estimation models with count modeling techniques, we feel that their application in software reliability engineering has been very limited (Khoshgoftaar et al., 2001). This study can be used as a basis for assessing the usefulness of count models for predicting both the number of faults in software modules and their quality-based class.
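
To make the dual use concrete, below is a minimal sketch of a Poisson regression model fitted with the statsmodels library; the metric values, fault counts, and the 0.5 classification threshold are illustrative assumptions rather than details from the study (statsmodels also offers a ZeroInflatedPoisson class for the ZIP variant).

```python
# Minimal sketch: one Poisson regression model (PRM) used for both
# fault-count prediction and fault-proneness classification.
import numpy as np
import statsmodels.api as sm

# Hypothetical module-level metrics and fault counts from a past release.
X = np.array([[12, 3], [45, 9], [7, 1], [88, 15], [30, 5]])  # e.g. LOC/10, complexity
y = np.array([0, 2, 0, 6, 1])                                # faults per module

X = sm.add_constant(X)
prm = sm.GLM(y, X, family=sm.families.Poisson()).fit()

# Prediction: expected number of faults per module.
mu = prm.predict(X)

# The same fitted model yields P(Y = k) = exp(-mu) * mu**k / k!, so
# classification falls out of it: call a module fault-prone when
# P(at least one fault) exceeds an (assumed) threshold of 0.5.
p_ge_one = 1.0 - np.exp(-mu)          # P(Y >= 1) under the Poisson model
fault_prone = p_ge_one > 0.5
print(mu, fault_prone)
```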

2009, pp. 3142-3159
Author(s):
Witold Pedrycz, Giancarlo Succi

Learning ability and high transparency are two important and highly desirable features of any model of software quality. The transparency and user-centricity of quantitative models in software engineering are of paramount relevance, as they help us gain a better and more comprehensive insight into the relationships characteristic of software quality and software processes. In this study, we are concerned with logic-driven architectures of logic models based on fuzzy multiplexers (fMUXs). These constructs exhibit a clear and modular topology whose interpretation gives rise to a collection of straightforward logic expressions. The design of the logic models is based on genetic optimization, and genetic algorithms in particular. Through prudent use of this optimization framework, we address the issues of structural and parametric optimization of the logic models. The experimental studies use software data that relate software metrics (measures) to the number of modifications made to software modules.
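
As a rough illustration of the idea, the sketch below implements a single two-input fMUX with a product t-norm and probabilistic-sum s-norm and optimizes it with a tiny genetic loop; the norm choices, fitness function, and data are assumptions made for illustration, not the paper's actual design.

```python
# Minimal sketch: a one-level fuzzy multiplexer (fMUX) whose structure
# (which metric drives the select line) and parameters (a, b) are tuned
# by a small genetic algorithm.
import random

def fmux(select, a, b):
    """y = (select AND a) OR (NOT select AND b), using a product t-norm
    and probabilistic-sum s-norm; all values lie in [0, 1]."""
    t1 = select * a
    t2 = (1.0 - select) * b
    return t1 + t2 - t1 * t2

# Hypothetical normalized metric vectors and a [0, 1] quality target
# (e.g. scaled number of modifications) per module.
data = [([0.2, 0.7], 0.3), ([0.9, 0.4], 0.8), ([0.5, 0.5], 0.5)]

def fitness(genome):
    sel_idx, a, b = genome            # structure (select input) + parameters
    err = sum((fmux(x[sel_idx], a, b) - y) ** 2 for x, y in data)
    return -err                       # higher fitness = lower squared error

def mutate(genome):
    sel_idx, a, b = genome
    if random.random() < 0.2:
        sel_idx = random.randrange(2)                     # structural mutation
    a = min(1.0, max(0.0, a + random.gauss(0, 0.1)))      # parametric mutation
    b = min(1.0, max(0.0, b + random.gauss(0, 0.1)))
    return (sel_idx, a, b)

pop = [(random.randrange(2), random.random(), random.random()) for _ in range(20)]
for _ in range(100):
    pop.sort(key=fitness, reverse=True)
    pop = pop[:10] + [mutate(random.choice(pop[:10])) for _ in range(10)]

print("best fMUX genome:", max(pop, key=fitness))
```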


Author(s):
Taghi M. Khoshgoftaar, Edward B. Allen

Embedded-computer systems have become essential to life in modern society. For example, telecommunications forms the backbone of society's information infrastructure. Embedded systems must have highly reliable software, so that we avoid the severe consequences of failures, intolerable downtime, and expensive repairs in remote locations. Moreover, today's fast-moving technology marketplace mandates that embedded systems evolve, resulting in multiple software releases embedded in multiple products. Software quality models can be valuable tools for software engineering of embedded systems, because some software-enhancement techniques are so expensive or time-consuming that it is not practical to apply them to all modules. Targeting such enhancement techniques is an effective way to reduce the likelihood of faults being discovered in the field. Research has shown software metrics to be useful predictors of software faults. A software quality model is developed using measurements and fault data from a past release. The calibrated model is then applied to modules currently under development. Such models yield predictions on a module-by-module basis. This paper examines the Classification And Regression Trees (CART) algorithm for building tree-based models that predict which software modules are at high risk of faults being discovered during operations. CART is attractive because it emphasizes pruning to achieve robust models. This paper presents details of the CART algorithm in the context of software engineering of embedded systems. We illustrate this approach with a case study of four consecutive releases of software embedded in a large telecommunications system. The level of accuracy achieved in the case study would be useful to developers of an embedded system. The case study indicated that this model would continue to be useful over several releases as the system evolves.
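
A minimal sketch of the release-to-release workflow follows, with scikit-learn's DecisionTreeClassifier standing in for the paper's CART implementation and cost-complexity pruning (ccp_alpha) standing in for its pruning step; the metrics and data are invented.

```python
# Minimal sketch: fit a pruned CART-style tree on a past release,
# then apply it module-by-module to the current release.
from sklearn.tree import DecisionTreeClassifier

# Hypothetical (metrics, label) pairs from a past release:
# columns might be LOC, cyclomatic complexity, fan-out; label 1 = fault-prone.
X_past = [[120, 14, 6], [40, 3, 2], [300, 25, 11], [75, 7, 4], [15, 1, 1], [210, 19, 9]]
y_past = [1, 0, 1, 0, 0, 1]

# Pruning keeps the model robust across releases, as the paper emphasizes.
tree = DecisionTreeClassifier(ccp_alpha=0.01, random_state=0).fit(X_past, y_past)

# Apply the calibrated model to modules currently under development.
X_current = [[95, 10, 5], [20, 2, 1]]
print(tree.predict(X_current))   # e.g. [1, 0]: target enhancement at module 0
```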


Author(s):  
Danielle Azar

In this work, we present a genetic algorithm to optimize predictive models used to estimate software quality characteristics. Software quality assessment is crucial in software development, since it helps reduce cost, time, and effort. However, software quality characteristics cannot be measured directly; they can only be estimated from other, measurable software attributes (such as coupling, size, and complexity). Software quality estimation models establish a relationship between the unmeasurable characteristics and the measurable attributes. Such models are hard to generalize and reuse on new, unseen software, as their accuracy deteriorates significantly. In this paper, we present a genetic algorithm that adapts such models to new data. We give empirical evidence that our approach outperforms the machine learning algorithm C4.5 and random guessing.
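
The sketch below conveys the adaptation idea under simplified assumptions: an existing rule-based model's thresholds are treated as the genome, and a small genetic loop refits them to new data. The rule shape, data, and GA settings are illustrative, not the paper's actual encoding.

```python
# Minimal sketch: adapt an existing rule-based quality model to new,
# unseen data by evolving its decision thresholds.
import random

# New data: (coupling, complexity) metrics with known quality labels.
new_data = [((3, 10), 0), ((9, 42), 1), ((5, 18), 0), ((12, 55), 1)]

def classify(thresholds, metrics):
    t_coupling, t_complexity = thresholds
    # The original model's rule form is kept; only its thresholds adapt.
    return 1 if metrics[0] > t_coupling or metrics[1] > t_complexity else 0

def accuracy(thresholds):
    return sum(classify(thresholds, m) == label for m, label in new_data) / len(new_data)

# Seed the population around the old model's thresholds (assumed (4, 15))
# and let mutation search nearby values that fit the new data better.
pop = [(4 + random.gauss(0, 2), 15 + random.gauss(0, 5)) for _ in range(30)]
for _ in range(50):
    pop.sort(key=accuracy, reverse=True)
    survivors = pop[:10]
    pop = survivors + [(t1 + random.gauss(0, 1), t2 + random.gauss(0, 3))
                       for t1, t2 in random.choices(survivors, k=20)]

best = max(pop, key=accuracy)
print("adapted thresholds:", best, "accuracy:", accuracy(best))
```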


Author(s):  
Idrees S. Kocher

The reliability of software is founded on the development, testing, evaluation, and maintenance of software systems. In recent years, researchers have come to see software reliability as a major focus, because reliability is central to all software quality concepts. System reliability engineering is the study of the processes and results of software systems in relation to the basic requirements of users. This paper provides an overview (roadmap) of current developments in software reliability metrics, modeling, and operational profiles. It outlines several software engineering methods for achieving reasonable system reliability. Finally, failure metrics are considered, based on feedback collected from users after releasing the software and on case studies of detected failures. The numbers and types of failures are recorded from this user feedback.
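
For example, the basic post-release failure metrics the paper refers to can be computed directly from user-reported failure data, as in the minimal sketch below (the report data are invented):

```python
# Minimal sketch: failure intensity and mean time between failures (MTBF)
# computed from failure reports collected from users after release.
failure_times_hours = [120.0, 340.0, 410.0, 900.0, 1500.0]  # hours since release

observation_hours = 2000.0
n_failures = len(failure_times_hours)

failure_intensity = n_failures / observation_hours   # failures per hour
mtbf = observation_hours / n_failures                # mean time between failures

print(f"failure intensity: {failure_intensity:.4f} failures/hour")
print(f"MTBF: {mtbf:.1f} hours")
```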


2020, Vol 2020, pp. 1-10
Author(s):
Hao Zhang, Jie Zhang, Ke Shi, Hui Wang

Structural modeling is an important branch of software reliability modeling. It is applied early in reliability engineering to optimize the architecture design and guide later testing. Compared with traditional models that use test data, structural models are often difficult to apply because of a lack of actual data. A software metrics-based method is presented here for empirical studies. A recurrent neural network (RNN) is used to process the metric data and identify defect-prone code blocks, and a specified aggregation scheme is used to calculate module reliability. On this basis, a framework is proposed to evaluate overall reliability for actual projects, in which algebraic tools are introduced to build the structural reliability model automatically and accurately. Studies of two open-source projects show that early evaluation results based on this framework are effective and that the related methods have good applicability.
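
The structural side of such a framework can be illustrated with classical series/parallel reliability algebra: once module-level reliabilities are available (in the paper they are derived from the RNN step), they aggregate into a system-level estimate. The sketch below uses an invented architecture and reliability values.

```python
# Minimal sketch: aggregate module reliabilities into a system-level
# estimate with series/parallel reliability algebra.
from math import prod

def series(reliabilities):
    # All components must work: R = product of R_i.
    return prod(reliabilities)

def parallel(reliabilities):
    # At least one component must work: R = 1 - product of (1 - R_i).
    return 1.0 - prod(1.0 - r for r in reliabilities)

# Hypothetical module reliabilities from the defect-proneness step.
r_parser, r_core, r_backup_core, r_logger = 0.99, 0.95, 0.93, 0.98

# parser -> (core in parallel with a backup core) -> logger, in series.
system_r = series([r_parser, parallel([r_core, r_backup_core]), r_logger])
print(f"system reliability: {system_r:.4f}")
```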


2003, Vol 12 (03), pp. 207-225
Author(s):
Taghi M. Khoshgoftaar, Naeem Seliya

Predicting the quality of system modules prior to software testing and operations can benefit the software development team. Such a timely reliability estimation can be used to direct cost-effective quality improvement efforts to the high-risk modules. Tree-based software quality classification models based on software metrics are used to predict whether a software module is fault-prone or not fault-prone. They are white-box quality estimation models with good accuracy, and they are simple and easy to interpret. An in-depth study of calibrating classification trees for software quality estimation using the SPRINT decision tree algorithm is presented. Many classification algorithms have memory limitations, including the requirement that datasets be memory resident. SPRINT removes all of these limitations and provides fast and scalable analysis. It is an extension of a commonly used decision tree algorithm, CART, and provides a unique tree-pruning technique based on the Minimum Description Length (MDL) principle. Combining the MDL pruning technique with the modified classification algorithm, SPRINT yields classification trees with useful accuracy. The case study consists of software metrics collected from a very large telecommunications system. It is observed that classification trees built by SPRINT are more balanced and demonstrate better stability than those built by CART.
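
The MDL intuition behind that pruning can be sketched as a comparison of description lengths: keep a subtree only while encoding its structure plus its errors is cheaper than encoding a single leaf plus the extra errors the leaf makes. The cost terms below are simplified assumptions, not SPRINT's exact coding scheme.

```python
# Minimal sketch: an MDL-style prune-or-keep decision for one node.
from math import log2

def leaf_cost(n_records, n_errors):
    # 1 bit to say "leaf", plus roughly log2(n) bits per misclassified record.
    return 1.0 + n_errors * log2(max(n_records, 2))

def subtree_cost(n_records, n_errors, n_nodes, n_attributes):
    # 1 bit per node to say "split", log2(#attributes) bits to name each
    # split attribute, plus the same per-error cost as a leaf.
    return n_nodes * (1.0 + log2(n_attributes)) + n_errors * log2(max(n_records, 2))

# Prune when a leaf describes the data at least as cheaply as the subtree.
n_records, n_attrs = 500, 10
as_leaf = leaf_cost(n_records, n_errors=30)
as_subtree = subtree_cost(n_records, n_errors=22, n_nodes=5, n_attributes=n_attrs)
print("prune" if as_leaf <= as_subtree else "keep")
```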

