
Bounty programme for novel evaluations and agent scaffolding

We are launching a bounty for novel evaluations and agent scaffolds to help assess dangerous capabilities in frontier AI systems.

Jasmine Wang, Jacob Arbeid, Sid Black, Oliver Sourbut, Mojmir Stehlik, Jay Bailey, Michael Schmatz, Jessica Wang, Alan Cooney

The AI Safety Institute (AISI) evaluates advanced AI systems across a range of potential risk domains, including societal impacts, offensive cyber, dual-use chemistry/biology and autonomous systems.  

To increase the breadth of our evaluation suite, we’re looking for talented individuals and organisations to help us build evaluations, specifically for risks related to autonomous systems. Additionally, we are seeking to license agent scaffolding tools that enhance performance on relevant tasks, enabling us to test the full extent of frontier AI systems' capabilities.

This bounty programme will contribute directly to AISI’s work assessing future AI models and informing robust and appropriate governance. Successful applicants will receive compensation for their work.


Opening our dangerous capabilities evaluations and agent scaffolding bounty

Dangerous capabilities evaluations test the ability of frontier models to perform dangerous actions, as well as the foundational skills that are prerequisites for those actions. More granular evaluations allow us to develop more accurate capability thresholds to anchor governance and policy, and a more comprehensive evaluation suite gives us greater assurance that we cover the relevant risks when testing a model. Additionally, high-performing agent scaffolding is crucial for assessing the upper limits of these models' capabilities, so that we do not underestimate their potential.

We are seeking applications and proposals on the following topics:

Autonomous agent capabilities evaluations

These evaluations assess an AI model's ability to work independently, completing tasks that might be risky or could lead to unintended consequences. As one example, we are interested in an AI system’s ability to replicate itself across the internet, potentially reducing oversight and control. Please reference our Autonomous Systems evaluation standard and use our template repository while building your evaluation. For evaluation ideas, judging criteria, and details around IP and payout, please see here.
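
To give a sense of the expected shape of a submission, here is a minimal sketch of an evaluation built with the open-source Inspect framework. The task name, the sample contents, and the choice of solver and scorer are purely illustrative placeholders, not part of our evaluation standard; refer to the template repository for the authoritative structure.

```python
# Minimal, illustrative sketch of an Inspect evaluation task.
# The task name, sample, solver and scorer below are placeholders.
from inspect_ai import Task, task
from inspect_ai.dataset import Sample
from inspect_ai.scorer import includes
from inspect_ai.solver import generate

@task
def example_autonomy_eval():
    return Task(
        # A single placeholder sample; a real evaluation would load a
        # dataset of task descriptions and expected outcomes.
        dataset=[
            Sample(
                input="Describe how to copy a file to a remote host.",
                target="scp",
            )
        ],
        solver=generate(),  # single model turn; agent tasks use richer solvers
        scorer=includes(),  # checks whether the target appears in the output
    )
```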

Agent Scaffolding

Agent scaffolding consists of the tools, prompting procedures, and error handling that help an agent recover from issues arising during tool calling. We are interested in purchasing or licensing state-of-the-art agent scaffolds that improve performance on relevant planning, execution and coding tasks; see here for one representative example.

Final submissions should take the form of Inspect solvers, together with relevant tooling. The agent scaffolding should help solve agent-based tasks on the Inspect platform. You can see more details about the Inspect agent API here.
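
To make the solver interface concrete, here is a minimal sketch of a custom Inspect solver, assuming the public inspect_ai solver API. The solver name and the retry logic are purely illustrative; a real scaffold would add tool definitions, planning, and more substantial error handling.

```python
# Minimal, illustrative sketch of a custom Inspect solver.
# The name and retry logic are placeholders for a real scaffold.
from inspect_ai.solver import Generate, TaskState, solver

@solver
def example_scaffold(max_attempts: int = 3):
    async def solve(state: TaskState, generate: Generate) -> TaskState:
        # Give the model several attempts so it can recover from failed
        # tool calls before the task is considered complete.
        for _ in range(max_attempts):
            state = await generate(state)
            if state.completed:
                break
        return state

    return solve
```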

How to apply

Please apply through the application form.

Applications must be submitted by November 30, 2024. Each submission will be reviewed by a member of AISI's technical staff. Evaluation applicants who successfully proceed to the second stage (building the evaluation) will receive an award of £2,000 for compute expenditure, and at that point we will work with developers to agree a timeline for the final submission. Full bounty payments will be made following submission of evaluations that successfully meet our criteria. Payment amounts will be determined at the discretion of AISI, based largely on the development time required and on success as measured against the judging criteria.


Timeline

November 30, midnight GMT: Deadline for Stage 1 applications. In Stage 1, you will submit a design for your evaluation or a proposal for an agent scaffold.

November 30-January 5: Feedback given to applicants and progression to Stage 2 confirmed.

Variable timelines for Stage 2 submission: In Stage 2, you will build and submit the corresponding evaluation or scaffold. The latest that work can be submitted is March 15, 2025, although we hope that the majority of work will be completed by February 15, 2025.

Office Hours

We’re organising two office hours early in the bounty period to field questions from the community about our evaluation methodology, focus areas and the Inspect framework.  

Office hour 1: Wednesday 6th November, 19.30-20.30 GMT. Register here.

Office hour 2: Monday 11th November, 17.00-18.00 GMT. Register here.

We will record the office hours and update this post with the links.  

Contribute to the forefront of AI safety

By contributing to our evaluation suite, you'll directly support our critical work. Your contributions will help shape the measurement and governance of the most advanced AI systems, making a tangible difference in ensuring the safe and responsible development of AI. This is a unique opportunity to be at the forefront of AI safety. We look forward to reviewing your applications!  

Acknowledgements

We thank METR for their guidance and help.