
Bounty programme for novel evaluations and agent scaffolding

We are launching a bounty for novel evaluations and agent scaffolds to help assess dangerous capabilities in frontier AI systems.

Jasmine Wang, Jacob Arbeid, Sid Black, Oliver Sourbut, Mojmir Stehlik, Jay Bailey, Michael Schmatz, Jessica Wang, Alan Cooney

The AI Safety Institute (AISI) evaluates advanced AI systems across a range of potential risk domains, including societal impacts, offensive cyber, dual-use chemistry/biology and autonomous systems.  

To increase the breadth of our evaluation suite, we’re looking for talented individuals and organisations to help us build evaluations, specifically for risks related to autonomous systems. Additionally, we are seeking to license agent scaffolding tools that enhance performance on relevant tasks, enabling us to test the full extent of frontier AI systems' capabilities.

This bounty programme will contribute directly to AISI’s work assessing future AI models and informing robust and appropriate governance. Successful applicants will receive compensation for their work.


Opening our dangerous capabilities evaluations and agent scaffolding bounty

Dangerous capabilities evaluations test the ability of frontier models to perform dangerous actions, as well as the foundational skills that are prerequisites for those actions. More granular evaluations allow us to develop more accurate capability thresholds to anchor governance and policy, and a more comprehensive evaluation suite gives us greater assurance that we cover the relevant risks when testing a model. Additionally, high-performing agent scaffolding is crucial for assessing the upper limits of these models' capabilities, so that we do not underestimate their potential.

We are seeking applications and proposals on the following topics:

Autonomous agent capabilities evaluations

These evaluations assess an AI model's ability to work independently, completing tasks that might be risky or could lead to unintended consequences. As one example, we are interested in an AI system’s ability to replicate itself across the internet, potentially reducing oversight and control. Please reference our Autonomous Systems evaluation standard and use our template repository while building your evaluation. For evaluation ideas, judging criteria, and details around IP and payout, please see here.
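
To give a sense of the expected shape of a submission, here is a minimal sketch of an evaluation built with the open-source Inspect framework. The task name, the sample contents, and the choice of solver and scorer are purely illustrative placeholders, not part of our evaluation standard; refer to the template repository for the authoritative structure.

```python
# Minimal, illustrative sketch of an Inspect evaluation task.
# The task name, sample, solver and scorer below are placeholders.
from inspect_ai import Task, task
from inspect_ai.dataset import Sample
from inspect_ai.scorer import includes
from inspect_ai.solver import generate

@task
def example_autonomy_eval():
    return Task(
        # A single placeholder sample; a real evaluation would load a
        # dataset of task descriptions and expected outcomes.
        dataset=[
            Sample(
                input="Describe how to copy a file to a remote host.",
                target="scp",
            )
        ],
        solver=generate(),  # single model turn; agent tasks use richer solvers
        scorer=includes(),  # checks whether the target appears in the output
    )
```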

Agent Scaffolding

Agent scaffolding consists of the tools, prompting procedures, and error handling that help an agent recover from issues arising during tool calling. We are interested in purchasing or licensing state-of-the-art agent scaffolds that improve performance on relevant planning, execution and coding tasks; see here for one representative example.

Final submissions should take the form of Inspect solvers, together with relevant tooling. The agent scaffolding should help solve agent-based tasks on the Inspect platform. You can see more details about the Inspect agent API here.
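
To make the solver interface concrete, here is a minimal sketch of a custom Inspect solver, assuming the public inspect_ai solver API. The solver name and the retry logic are purely illustrative; a real scaffold would add tool definitions, planning, and more substantial error handling.

```python
# Minimal, illustrative sketch of a custom Inspect solver.
# The name and retry logic are placeholders for a real scaffold.
from inspect_ai.solver import Generate, TaskState, solver

@solver
def example_scaffold(max_attempts: int = 3):
    async def solve(state: TaskState, generate: Generate) -> TaskState:
        # Give the model several attempts so it can recover from failed
        # tool calls before the task is considered complete.
        for _ in range(max_attempts):
            state = await generate(state)
            if state.completed:
                break
        return state

    return solve
```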

How to apply

Please apply through the application form.

Applications must be submitted by November 30, 2024. Each submission will be reviewed by a member of AISI's technical staff. Evaluation applicants who successfully proceed to the second stage (building the evaluation) will receive an award of £2,000 for compute expenditure, and at that point we will work with developers to agree a timeline for the final submission. Full bounty payments will be made following submission of evaluations that successfully meet our criteria. Payment amounts will be determined at the discretion of AISI, based largely on the development time required and on success as measured against the judging criteria.


Timeline

November 30, midnight GMT: Deadline for Stage 1 applications. In Stage 1, you will submit a design for your evaluation or a proposal for an agent scaffold.

November 30-January 5: Feedback given to applicants and progression to Stage 2 confirmed.

Variable timelines for Stage 2 submission: In Stage 2, you will build and submit the corresponding evaluation or scaffold. The latest that work can be submitted is March 15, 2025, although we hope that the majority of work will be completed by February 15, 2025.

Office Hours

We’re organising two office hours early in the bounty period to field questions from the community about our evaluation methodology, focus areas and the Inspect framework.  

Office hour 1: Wednesday 6th November, 19.30-20.30 GMT. Register here.

Office hour 2: Monday 11th November, 17.00-18.00 GMT. Register here.

We will record the office hours and update this post with the links.  

Contribute to the forefront of AI safety

By contributing to our evaluation suite, you'll directly support our critical work. Your contributions will help shape the measurement and governance of the most advanced AI systems, making a tangible difference in ensuring the safe and responsible development of AI. This is a unique opportunity to be at the forefront of AI safety. We look forward to reviewing your applications!  

Acknowledgements

We thank METR for their guidance and help.