The necessity of assuring quality in software measurement data

10th International Symposium on Software Metrics, 2004. Proceedings.

10.1109/metric.2004.1357896

2004

Author(s):

T.M. Khoshgoftaar

Keyword(s):

Measurement Data

Software Measurement

k-Attractors: A Clustering Algorithm for Software Measurement Data Analysis

19th IEEE International Conference on Tools with Artificial Intelligence (ICTAI 2007)

10.1109/ictai.2007.31

2007

Author(s):

Yiannis Kanellopoulos

Panos Antonellis

Christos Tjortjis

Christos Makris

Keyword(s):

Data Analysis

Clustering Algorithm

Measurement Data

Software Measurement

Rule-based noise detection for software measurement data

Proceedings of the 2004 IEEE International Conference on Information Reuse and Integration, 2004. IRI 2004.

10.1109/iri.2004.1431478

2005

Author(s):

T.M. Khoshgoftaar

Keyword(s):

Measurement Data

Noise Detection

Software Measurement

Incomplete-Case Nearest Neighbor Imputation in Software Measurement Data

2007 IEEE International Conference on Information Reuse and Integration

10.1109/iri.2007.4296691

2007

Author(s):

Jason Van Hulse

Taghi M. Khoshgoftaar

Keyword(s):

Nearest Neighbor

Measurement Data

Software Measurement

Nearest Neighbor Imputation

Quality Problem in Software Measurement Data

Advances in Computers - Quality Software Development

10.1016/s0065-2458(05)66002-0

2006

pp. 43-77

Author(s):

Pierre Rebours

Taghi M. Khoshgoftaar

Keyword(s):

Measurement Data

Software Measurement

Quality Problem

Assessments of Feature Selection Techniques with Respect to Data Sampling for Highly Imbalanced Software Measurement Data

International Journal of Reliability, Quality and Safety Engineering

10.1142/s0218539315500102

2015

Vol 22 (02)

Article 1550010

Author(s):

Kehan Gao

Taghi M. Khoshgoftaar

Keyword(s):

Feature Selection

Measurement Data

Classification Performance

Training Data

Classification Model

Sampling Techniques

Data Sampling

Software Measurement

Data Set

In the process of software defect prediction, a classification model is first built using software metrics and fault data gathered from a past software development project; the model is then applied to data from a similar project or a new release of the same project to classify new program modules as either fault-prone (fp) or not-fault-prone (nfp). The benefit of such a model is that it helps make the best use of limited financial and human resources for software testing and inspection. The predictive power of a classification model constructed from a given data set is affected by many factors. In this paper, we focus on two problems that often arise in software measurement data: high dimensionality and class imbalance, i.e., unequal numbers of examples of the two types of modules (e.g., many more nfp modules than fp modules in a data set). These problems lead to longer learning times and a decline in the predictive performance of classification models. We consider using data sampling followed by feature selection (FS) to deal with these problems. Six data sampling strategies (three sampling techniques, each applied with two post-sampling proportion ratios) and six commonly used feature ranking approaches are employed in this study. We evaluate the FS techniques in two ways: (1) a general method, i.e., assessing classification performance after the training data is modified, and (2) studying the stability of an FS method, specifically to understand how data sampling techniques affect the stability of FS when it is applied to the sampled data. The experiments were performed on nine data sets from a real-world software project. The results demonstrate that the FS techniques that most enhance the models' classification performance do not also show the best stability, and vice versa.
In addition, classification performance is affected more by the choice of sampling technique than by the post-sampling proportion, whereas the opposite holds for stability.
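
The sampling-then-ranking pipeline and the stability evaluation described in this abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' code: random undersampling stands in for one of the three sampling techniques, a 0.65 target share of nfp modules stands in for one post-sampling proportion, the mean-difference ranker is a hypothetical stand-in for the six feature rankers, and stability is measured with Kuncheva's consistency index, which the abstract does not name. The helper names (random_undersample, rank_features, kuncheva_index, stability) and the toy data are hypothetical.

```python
import random


def random_undersample(rows, labels, minority, majority_share, seed=0):
    """Assumed sampling technique: randomly discard majority-class (nfp) rows
    until they make up `majority_share` of the result, e.g. 0.5 for 50:50 or
    0.65 for 65:35. All minority-class (fp) rows are kept."""
    rng = random.Random(seed)
    minority_idx = [i for i, y in enumerate(labels) if y == minority]
    majority_idx = [i for i, y in enumerate(labels) if y != minority]
    # Solve maj / (maj + min) == majority_share for maj.
    keep = round(len(minority_idx) * majority_share / (1 - majority_share))
    keep = min(keep, len(majority_idx))
    chosen = sorted(minority_idx + rng.sample(majority_idx, keep))
    return [rows[i] for i in chosen], [labels[i] for i in chosen]


def rank_features(rows, labels, positive, k):
    """Hypothetical stand-in ranker: score each feature by the absolute
    difference between its mean over fp and nfp modules; return the top-k
    feature indices as a set."""
    pos = [r for r, y in zip(rows, labels) if y == positive]
    neg = [r for r, y in zip(rows, labels) if y != positive]

    def score(j):
        mean_pos = sum(r[j] for r in pos) / len(pos)
        mean_neg = sum(r[j] for r in neg) / len(neg)
        return abs(mean_pos - mean_neg)

    ranked = sorted(range(len(rows[0])), key=score, reverse=True)
    return set(ranked[:k])


def kuncheva_index(a, b, n):
    """Kuncheva's consistency index for two feature subsets of equal size k
    chosen from n features: (r*n - k^2) / (k*(n - k)), with r = |a & b|."""
    k = len(a)
    if len(b) != k or not 0 < k < n:
        raise ValueError("subsets must have equal size k with 0 < k < n")
    r = len(a & b)
    return (r * n - k * k) / (k * (n - k))


def stability(subsets, n):
    """Average pairwise consistency index over a list of feature subsets."""
    pairs = [(i, j) for i in range(len(subsets)) for j in range(i + 1, len(subsets))]
    return sum(kuncheva_index(subsets[i], subsets[j], n) for i, j in pairs) / len(pairs)


if __name__ == "__main__":
    rng = random.Random(42)
    rows, labels = [], []
    for i in range(200):
        fp = i < 20  # toy data: 10% fault-prone modules, i.e. imbalanced
        # Features 0 and 1 are shifted for fp modules; features 2-4 are noise.
        rows.append([rng.gauss(3 if fp and j < 2 else 0, 1) for j in range(5)])
        labels.append("fp" if fp else "nfp")

    # Sample repeatedly, rank features on each sample, then measure how
    # consistent the selected top-2 subsets are across samples.
    subsets = []
    for seed in range(10):
        sampled_rows, sampled_labels = random_undersample(rows, labels, "fp", 0.65, seed)
        subsets.append(rank_features(sampled_rows, sampled_labels, "fp", k=2))
    print("stability of top-2 subsets:", stability(subsets, n=5))
```

Each sampled copy of the training data yields its own top-k feature subset; averaging the pairwise consistency index over those subsets gives one stability score per sampling strategy and ranker. An index of 1 means every sample selected the same features, while values near 0 or below indicate agreement no better than chance.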

Imputation techniques for multivariate missingness in software measurement data

Software Quality Journal

10.1007/s11219-008-9054-7

2008

Vol 16 (4)

pp. 563-600

Author(s):

Taghi M. Khoshgoftaar

Jason Van Hulse

Keyword(s):

Measurement Data

Software Measurement

Incomplete-case nearest neighbor imputation in software measurement data

Information Sciences

10.1016/j.ins.2010.12.017

2014

Vol 259

pp. 596-610

Author(s):

Jason Van Hulse

Taghi M. Khoshgoftaar

Keyword(s):

Nearest Neighbor

Measurement Data

Software Measurement

Nearest Neighbor Imputation

Analyzing software measurement data with clustering techniques

IEEE Intelligent Systems

10.1109/mis.2004.1274907

2004

Vol 19 (2)

pp. 20-27

Author(s):

S. Zhong

T.M. Khoshgoftaar

N. Seliya

Keyword(s):

Measurement Data

Software Measurement

Clustering Techniques

Modeling software measurement data

IEEE Transactions on Software Engineering

10.1109/32.950316

2001

Vol 27 (9)

pp. 788-804

Author(s):

B.A. Kitchenham

R.T. Hughes

S.G. Linkman

Keyword(s):

Measurement Data

Software Measurement

Modeling Software

Impact of Data Sampling on Stability of Feature Selection for Software Measurement Data

2011 IEEE 23rd International Conference on Tools with Artificial Intelligence

10.1109/ictai.2011.172

2011

Author(s):

Kehan Gao

Taghi M. Khoshgoftaar

Amri Napolitano

Keyword(s):

Feature Selection

Measurement Data

Data Sampling

Software Measurement