Safety frameworks have become standard practice amongst frontier AI developers. In them, developers outline key risks, how they’ll measure them and steps to mitigate them. But this is no easy task, particularly when the risks are novel, fast-changing, and hard to pin down.
More than 11 frameworks have now been released, and counting, spurring a wave of research on how to write, implement, and refine them. Two of our latest papers contribute to that conversation. The first provides developers with an overview of emerging practices in safety frameworks. The second proposes a method to help implement them: safety cases.
In this blog, we explain what safety cases are and how they can assist AI developers in determining whether an AI system meets the safety thresholds outlined in their safety framework.
Effectively implementing safety frameworks means demonstrating that an AI system is safe. To do that, three things are needed:
Safety cases are a widely used technique that brings all three of these together into a single clear, assessable argument (Favaro et al., 2023; Sujan et al., 2016; Bloomfield et al., 2012; Inge, 2007).
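To make that concrete, here is a minimal, purely illustrative sketch (not taken from the paper, with hypothetical claim and evidence names) of a safety case as a tree of claims, each backed by an argument and evidence:

```python
from dataclasses import dataclass, field

@dataclass
class Claim:
    """One node of a safety case: a claim, the argument supporting it,
    the evidence cited, and any sub-claims it rests on."""
    statement: str
    argument: str = ""
    evidence: list[str] = field(default_factory=list)
    subclaims: list["Claim"] = field(default_factory=list)

# Hypothetical top-level claim for a frontier AI system.
safety_case = Claim(
    statement="Deploying this system does not pose unacceptable cyber-misuse risk",
    argument="Capability evaluations plus deployment safeguards keep residual risk low",
    subclaims=[
        Claim(
            statement="The system lacks capabilities that meaningfully uplift attackers",
            evidence=["capability evaluation report"],
        ),
        Claim(
            statement="Safeguards block the misuse attempts the system could enable",
            evidence=["red-team report", "safeguard testing results"],
        ),
    ],
)
```

Real safety cases are far richer than this, but the basic shape of claims linked to arguments and evidence is the same.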
We’ve previously written about why we’re working on safety cases at AISI, and what safety cases for frontier AI systems might look like. Our new paper looks at how safety cases can be used for frontier AI and why developers might find them useful.
Safety cases can be used to inform organisational decision-making on the safety of frontier AI systems:
They are broadly useful whenever decisions about safety are being made. At the moment, they are likely most useful for internal company decision-making. In the future, we can imagine safety cases being shared with third parties and published in some form, much like model cards and capability evaluations are today.
Let’s look at some examples:
Safety frameworks typically specify conditions for the safe development and deployment of frontier AI systems, based on a system’s capabilities and the safety measures in place. Safety cases can help developers test a particular system against those conditions. They thereby complement safety frameworks: the framework sets out broad policies and principles that apply across systems at the organisational level, while the safety case provides the system-specific analysis. In the paper, we look in more detail at how safety cases can contribute to fulfilling the commitments made in safety frameworks. This builds on work outlined in our paper on emerging practices in safety frameworks.
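As a toy illustration of what that system-specific analysis might record (all risk names and thresholds below are invented, not drawn from any actual framework):

```python
# Toy sketch: compare one system's evaluation results against capability
# thresholds set at the organisational level in a safety framework.
# All names and numbers are hypothetical.

framework_thresholds = {          # maximum acceptable evaluation score
    "autonomous_replication": 0.20,
    "cyber_offense_uplift": 0.15,
}

system_eval_results = {           # this system's measured scores
    "autonomous_replication": 0.08,
    "cyber_offense_uplift": 0.22,
}

for risk, threshold in framework_thresholds.items():
    score = system_eval_results[risk]
    if score <= threshold:
        print(f"{risk}: {score:.2f} <= {threshold:.2f} - within framework threshold")
    else:
        print(f"{risk}: {score:.2f} > {threshold:.2f} - mitigations and a stronger argument needed")
```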
We don’t yet know how to write robust arguments that frontier AI systems are safe – and this means that we can’t yet write full and correct safety cases.
There’s a whole host of open problems to solve before we reach that stage, both on methodology and on substance. For example, we currently don’t know:
There are also technical machine learning questions that come up time and time again when sketching safety cases. For example, how much can we rely on capability evaluations? Are we correctly eliciting capabilities? Could future models sandbag evaluations?
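The elicitation question, for instance, is often probed by comparing scores obtained with and without extra elicitation effort. A purely illustrative sketch (all numbers and task names invented):

```python
# Toy "elicitation gap" check: compare evaluation scores with and without
# extra elicitation effort (better prompting, tool access, fine-tuning).
# A large gap suggests the baseline evaluation under-reports capability.

baseline_scores = {"bio_task_suite": 0.31, "cyber_task_suite": 0.27}
elicited_scores = {"bio_task_suite": 0.52, "cyber_task_suite": 0.30}

GAP_TOLERANCE = 0.10  # hypothetical threshold for flagging under-elicitation

for task, base in baseline_scores.items():
    gap = elicited_scores[task] - base
    flag = "possible under-elicitation" if gap > GAP_TOLERANCE else "gap within tolerance"
    print(f"{task}: baseline {base:.2f}, elicited {elicited_scores[task]:.2f}, gap {gap:+.2f} ({flag})")
```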
We’re optimistic that we can solve many of these problems by writing safety case sketches (our best guesses about how to write an argument for a particular system) and safety case templates (rough arguments that can be filled in for a particular system).
To learn more – including more details about how safety cases can be used, why we’re excited about them, and a more detailed list of open problems – take a look at the paper.