Key Concepts in AI Safety: Interpretability in Machine Learning
This paper is the third installment in a series on “AI safety,” an area of machine learning research that aims to identify causes of unintended behavior in machine learning systems and develop tools to ensure these systems work safely and reliably. The first paper in the series, “Key Concepts in AI Safety: An Overview,” described three categories of AI safety issues: problems of robustness, assurance, and specification. This paper introduces interpretability as a means to enable assurance in modern machine learning systems.
2021