The AI Safety Institute reflects on its first year
Remember where we were a year ago.
World leaders gathered for the world’s first AI Safety Summit and concluded that powerful AI could have the “potential for serious, even catastrophic, harm.”
The UK’s cyber agency warned that “AI will almost certainly increase the volume and heighten the impact of cyber-attacks over the next two years.”
Demis Hassabis, at this point yet to receive his Nobel Prize, called for government to treat “the risks of AI as seriously as other major global challenges, like climate change.”
And the UK Government responded by launching the world’s first AI Safety Institute.
But there was no playbook for what an AI Safety Institute should be.
A year later, we have built a world-leading government organisation for understanding AI threats. We run state-of-the-art evaluations to measure potentially dangerous AI capabilities. We use these insights to help create an international approach to defining and measuring these risks. And we conduct world-leading research to understand how these harms will affect our society.
On our first anniversary, we explain what we have built and where we are going next.
AI safety isn’t sci-fi. If we’re concerned that powerful AI models might be used for cyber-attacks, or might resist human control, or might be used to persuade and manipulate populations, then let’s measure that risk. Let’s introduce empiricism into debates about AI safety.
That’s the goal of the AI Safety Institute.
Our mission is to equip governments with an empirical understanding of the safety of advanced AI systems.
We work with some of the most innovative, fast-moving tech companies in the world, and we focus in particular on the gravest risks, which are the sober responsibility of governments. We have combined the agility, innovation and ambition of the former with the expertise and gravitas of the latter to create a cutting-edge start-up in government.
Over the last year, we’ve channelled our efforts into three lines of work: becoming the leading authority on the science of assessing model safety; galvanising the AI safety research field; and working with other governments and AI companies to develop a set of global standards.
Are the most powerful AI models dangerous?
We wanted to build the tools to answer that question, and to refresh that answer as the models get more powerful.
So we’ve built evaluation suites: sets of tests we can run on the most powerful AI models to understand their capabilities and safety. Our evaluations target the areas that would cause us most concern: cyber-attack capabilities, chemical and biological misuse, autonomous agent capabilities, the robustness of safeguards, and impacts on society.
Building these evaluation suites has taken thousands of hours from a dedicated and brilliant team. It has paid off: we are now world-leading. We have the expertise, proprietary tools and infrastructure in place to run multiple testing exercises at once, quickly and rigorously, across a range of frontier model families.
That’s why OpenAI, Google DeepMind and Anthropic are among the companies that have worked with AISI to test their most advanced models. And we use these tools to regularly evaluate publicly available models to give us a snapshot of the frontier of AI capabilities.
In our first year, we’ve conducted evaluations on 16 models.
We call these “suites” because there is no single, definitive test. This is still very much an open science, and we’re finding that we need to combine a range of methods to get a comprehensive view of a given model’s risk profile. Our automated benchmarks fire hundreds of questions at a model to get a quick read on its capabilities. Our red-teaming puts threat experts in front of the model for in-depth, scenario-based assessments. And we run human uplift studies, akin to randomised controlled trials, in which ‘representative users’ (e.g. novices or computer science undergraduates) use AI and we measure whether this gives them a boost in completing potentially harmful tasks.
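To make the uplift framing concrete, here is a simplified, illustrative sketch of the comparison at the heart of such a study. It is an illustration only: the participant counts are made up, and the helper function is a generic pooled two-proportion z-test rather than our actual analysis pipeline.

```python
# Illustrative only: compare task-completion rates for an AI-assisted group
# against a control group, as a human uplift study does. All numbers are made up.
from math import sqrt, erf


def two_proportion_ztest(successes_a: int, n_a: int, successes_b: int, n_b: int):
    """Pooled two-proportion z-test for the difference in completion rates."""
    p_a, p_b = successes_a / n_a, successes_b / n_b
    p_pool = (successes_a + successes_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal CDF.
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return p_a - p_b, z, p_value


# Hypothetical result: 24 of 60 AI-assisted participants completed the task,
# versus 12 of 60 in the control group.
uplift, z, p = two_proportion_ztest(24, 60, 12, 60)
print(f"uplift = {uplift:.1%}, z = {z:.2f}, p = {p:.3f}")
```

In this made-up example the AI-assisted group completes the task at double the control rate, and the test suggests the gap is unlikely to be chance. A real study also has to account for participant background, task design and time on task.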
These tests are not “government safety certificates.” We never endorse the safety of a particular AI model. Nor can we release all our results into the public domain, since they often contain sensitive material.
But these evaluations give governments an insight into the risks developing at the frontier of AI, and an empirical basis to decide if, when and how to intervene.
Our work here is not done. Evaluations are a new science. We’ve published some of the lessons from our first year; in the next year, we want to hone our skills, add new threats to our evaluation suites, and work with more companies to test the world’s most powerful AI models.
Will the world be ready to respond if the risks from frontier AI increase?
AI systems may be developed in individual countries, but once released they’re used all over the world. There’s no route to safe AI from one country acting alone.
It’s crucial that, if AI risks increase, the international community has a well-rehearsed reflex for responding. That reflex would rely on agreeing, in advance, which harms we are concerned about and which methods of detecting them are valid.
That’s what we, with other government partners, are building.
We started by launching an independent, international review of the science of AI safety. Yoshua Bengio chairs a drafting team which draws on the expertise of 30 countries alongside the UN and EU. The UK AISI provides the Secretariat. The aim is to create a shared, international baseline for understanding AI safety. We published the interim report in May. The final report will be published before the Third AI Safety Summit in Paris next year.
That gives us today’s baseline. Next, we need to define tomorrow’s red lines. What don’t we want AI models to be able to do?
The UK co-hosted the second AI Safety Summit in Seoul in May, at which 28 international partners agreed that there are some capabilities that frontier AI models should not exhibit without appropriate mitigations. The negotiated text specifically called out offensive capabilities relating to chemical and biological weapons, and the potential ability of models to evade human oversight. Now we are working with industry, academia and international partners to flesh out specific thresholds.
If we understand the baseline, and we can define the red lines, then the remaining gap is to agree how we track the moving frontier, and how we respond as we approach or cross these red lines.
A lot of this response is about building the muscle of working together as an international community – particularly among governments who are taking these risks seriously.
While the UK was first to establish an AI Safety Institute, we were not the last. There are now 10 AISI equivalents, coordinated through the International Network of AISIs, which will meet for the first time next week. We’ll discuss how we measure and mitigate AI risks and start building the institutional muscle of coordination and cooperation on the science of AI safety.
We’ve already shown that it is possible for governments to cooperate on technical safety work. In April, we signed an agreement with the US AISI to create an ‘interoperable’ capability. We’ve since conducted several joint testing exercises with the US government and are working together to taxonomise risks and define best practices for managing them.
These international efforts require contributions from AI developers. It is these companies, after all, who are ultimately responsible for the safety of their models. At the summit in Seoul, we secured commitments from 16 AI companies to identify, assess and manage AI risks. We’re advancing these Frontier AI Safety Commitments at our upcoming conference, at which AI developers can compare their approaches to developing safety policies.
There’s a lot of raw science left to be done to guarantee the safety of AI models. So, our third and final effort focuses on galvanising orders of magnitude more research by others on urgent and important questions.
Some of this is about working directly with talented researchers. We recently launched an open call for collaborators on key research areas such as safeguards, the science of evaluations, and safety cases. We’ve also launched a bounty to develop novel evaluations and agent scaffolding. And we’re funding research into the systemic risks associated with AI.
But we also want to put tools in the hands of AI safety researchers everywhere. That’s why we open-sourced our evaluations platform, Inspect. Inspect helps us run evaluations quickly and easily, and it is now used by other governments and some of the leading AI developers.
Today we’re going further, and releasing dozens of evaluations in Inspect. If we can make AI safety research easier, we can make the science go faster.
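To give a flavour of what an evaluation looks like in Inspect, here is a deliberately toy task: a tiny question-answer dataset, a single generation step, and a scorer that checks whether the expected answer appears in the output. The questions and model name are placeholders, and import paths or parameter names may differ slightly between Inspect versions, so treat it as a sketch rather than a reference.

```python
# A toy Inspect task: placeholder questions, a single generation step, and a
# simple scorer. Real safety evaluations are far more involved than this.
from inspect_ai import Task, task, eval
from inspect_ai.dataset import Sample
from inspect_ai.solver import generate
from inspect_ai.scorer import includes


@task
def toy_capability_eval():
    return Task(
        dataset=[
            Sample(input="What is the capital of France?", target="Paris"),
            Sample(input="What is 12 * 12?", target="144"),
        ],
        solver=generate(),   # ask the model to answer each prompt
        scorer=includes(),   # mark correct if the target string appears in the answer
    )


if __name__ == "__main__":
    # Model name is illustrative; any provider/model Inspect supports can be used.
    eval(toy_capability_eval(), model="openai/gpt-4o-mini")
```

The same task can also be run from Inspect’s command-line interface via inspect eval, which makes it straightforward to fold evaluations like this into existing workflows.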
We’ve achieved a lot in a year. More than anything else, this is a testament to the extraordinary people who have come to work at AISI.
We’ve had to move fast because AI moves fast. Our evaluations show AI models getting more powerful quarter by quarter.
By creating a technical authority on AI safety, driving global AI safety standards, and supporting wider research efforts, we're working to ensure that AI’s harms don’t foreclose its potential.
The government has now set out its long-term vision for AISI’s future and its plans to move the organisation onto a statutory footing. This will give AI developers long-term clarity on how they should work with us. It is a critical step towards giving us the power and independence required to enhance the safety of AI over the long term.
We are only a year into this journey. There is so much more work to be done.
If you want to do that work with us, we’re hiring.