The Multi-Relationship Evaluation Design Framework: Designing Testing Plans to Comprehensively Assess Advanced and Intelligent Technologies
As new technologies develop and mature, it becomes critical to provide both formative and summative assessments of their performance. Performance assessment events range from a few simple tests of a technology's key elements to highly complex and extensive evaluation exercises targeting specific levels and capabilities of the system under scrutiny. Typically, the more advanced the system, the more often performance evaluations are warranted and the more complex the evaluation planning becomes. Numerous evaluation frameworks have been developed to generate evaluation designs aimed at characterizing the performance of intelligent systems. Many of these frameworks enable the design of extensive evaluations, but each has its own focused objectives within an inherent set of known boundaries. This paper introduces the Multi-Relationship Evaluation Design (MRED) framework, whose ultimate goal is to automatically generate an evaluation design from multiple inputs. MRED takes goal data as input and outputs an evaluation blueprint complete with specific evaluation elements, including the level of technology to be tested, metric type, user type, and evaluation environment. Among MRED's unique features is that it characterizes the relationships among these elements and manages their uncertainties, along with those associated with the evaluation inputs. The authors introduce MRED by first presenting the relationships between four main evaluation design elements. These elements are defined and the relationships between them established, including the connections between evaluation personnel (not just the users), their level of knowledge, and their decision-making authority. This is further supported through the definition of key terms. An example is then presented in which these terms and relationships are applied to the evaluation design of an automobile technology.
An initial validation step follows, in which MRED is applied to a speech translation technology whose evaluation design was inspired by the successful use of a pre-existing evaluation framework. It is important to note that MRED is still in the early stages of development; this paper presents a number of its outputs. Future publications will present the remaining outputs, the uncertain inputs, and the implementation steps through which MRED produces detailed evaluation blueprints.
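As a rough illustration of the goal-to-blueprint mapping described above, the blueprint's four evaluation elements can be sketched as a simple data type. This is a minimal sketch under stated assumptions: all names (`EvaluationBlueprint`, `design_evaluation`, the field values) are hypothetical and are not drawn from the MRED framework itself, whose actual implementation steps are deferred to future publications.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical shape of the evaluation blueprint MRED is described as
# producing; field names and example values are illustrative only.
@dataclass
class EvaluationBlueprint:
    technology_level: str   # maturity level of the technology under test
    metric_type: str        # kind of metric used to score performance
    user_type: str          # category of evaluation participant
    environment: str        # setting in which the evaluation is conducted

def design_evaluation(goals: List[str]) -> EvaluationBlueprint:
    """Placeholder for MRED's goal-to-blueprint mapping; the real
    framework derives these elements (and their uncertainties) from
    the input goal data. Here we return a fixed example blueprint."""
    return EvaluationBlueprint(
        technology_level="prototype",
        metric_type="task-success rate",
        user_type="end user",
        environment="field trial",
    )

bp = design_evaluation(["assess translation adequacy"])
print(bp.metric_type)  # → task-success rate
```

The point of the sketch is only that an evaluation design is a structured object over the four elements, not free-form text; how MRED populates and relates those fields is the subject of the framework itself.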