Explainable analytics: Understanding causes, correcting errors, and increasingly achieving perfect accuracy from nature of distinguishable patterns
Abstract In addition to pursuing accurate analytics, it is invaluable to clarify how and why inaccuracy arises. We propose a transparent classification method (TC). In training, we discover patterns separately from positive and negative observations; patterns that appear in both types are then excluded. In testing, observations are scored by the remaining pure patterns and connected to one another like nodes in a social network. Grounded in set theory, pure patterns have explanatory power for disentangling the relationships between negative and positive observations. Experimental results demonstrate that TC can identify all positive (e.g., malignant) observations at low ratios of training to testing data, e.g., 1:9 on the Breast Cancer Wisconsin (Original) dataset and 3:7 on the Contraceptive Method Choice dataset. Because TC requires neither fine-tuned parameters nor random selection, it eliminates uncertainty arising from the methodology itself. TC can visualize causes, so prediction errors are traceable and can be corrected. Further, TC shows potential for identifying cases where the ground truth itself is incorrect (e.g., diagnostic errors).
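The training and scoring steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not define what a pattern is, so the sketch assumes a pattern is a set of up to `max_size` (feature index, value) pairs, and the function names `patterns`, `pure_patterns`, and `score` are hypothetical. The score is assumed to be a simple count of matching pure patterns of each type.

```python
from itertools import combinations


def patterns(observation, max_size=2):
    """Enumerate patterns of one observation.

    Assumption: a pattern is a frozenset of (feature_index, value) pairs
    containing between 1 and max_size pairs.
    """
    items = list(enumerate(observation))
    found = set()
    for size in range(1, max_size + 1):
        for combo in combinations(items, size):
            found.add(frozenset(combo))
    return found


def pure_patterns(positives, negatives, max_size=2):
    """Training: collect patterns per class, then drop those seen in both."""
    pos = set().union(*(patterns(o, max_size) for o in positives))
    neg = set().union(*(patterns(o, max_size) for o in negatives))
    # Set difference keeps only patterns exclusive to one class.
    return pos - neg, neg - pos


def score(observation, pure_pos, pure_neg, max_size=2):
    """Testing: count how many pure patterns of each type the observation has."""
    found = patterns(observation, max_size)
    return len(found & pure_pos), len(found & pure_neg)


# Toy data (illustrative only): feature 0 separates the two classes,
# feature 1 takes both values in both classes and is therefore not pure.
positives = [(1, 1), (1, 0)]
negatives = [(0, 0), (0, 1)]
pure_pos, pure_neg = pure_patterns(positives, negatives)

print(score((1, 1), pure_pos, pure_neg))  # (2, 0)
print(score((0, 1), pure_pos, pure_neg))  # (0, 2)
```

Because the pure patterns matched by each observation are explicit sets, a score can be traced back to the feature values that produced it, which is the kind of traceability the abstract refers to.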