Requirements for the Empirical Assessment of Human-AI Work Systems: A Contribution to AI Measurement Science
The development of AI systems represents a significant investment. But to realize the promise of that investment, performance assessment is necessary. Empirical evaluation of Human-AI work systems must adduce convincing empirical evidence that the work method and its AI technology are learnable, usable, and useful. The theme to this Report is the notion that AI assessment must be effective but must also be efficient. Bench testing of a prototype of an AI system cannot require extensive series of experiments with complex designs. Thus, the empirical requirements that are presented in this Report involve escaping some of the constraints that are imposed in traditional laboratory research. Also, there is a recognition of new constraints that are unique to AI evaluation contexts. Empirical requirements are presented covering study design, research methods, statistical analyses, and online experimentation. The 15 requirements presented in this Report should be applicable to all research intended to evaluate the effectivity of AI systems.