AUTOMATICALLY EXPLORING HYPOTHESES ABOUT FAULT PREDICTION: A COMPARATIVE STUDY OF INDUCTIVE LOGIC PROGRAMMING METHODS
We evaluate a class of learning algorithms known as inductive logic programming (ILP) methods on the task of predicting fault density in C++ classes. Using these methods, a large space of possible hypotheses is searched in an automated fashion; further, the hypotheses are based directly on an abstract logical representation of the software, eliminating the need to manually propose numerical metrics that predict fault density. We compare two ILP systems, FOIL and FLIPPER, and conclude that FLIPPER generally outperforms FOIL on this problem. We analyze the reasons for the differing performance of these two systems, and based on the analysis, propose two extensions to FLIPPER: a user-directed bias towards easy-to-evaluate clauses, and an extension that allows FLIPPER to learn "counting clauses". Counting clauses augment logic programs with a variation of the "number restrictions" used in description logics, and significantly improve performance on this problem when prior knowledge is used. We also evaluate the use of ILP techniques for automatic generation of Boolean indicators and numeric metrics from the calling tree representation.