Summary: Mechanisms to Verify International Agreements about AI Development
If world leaders agree to halt or limit AI development, they will need to verify that other nations are keeping their commitments. To this end, it helps to know where AI chips are, how they’re used, and what the AIs trained on them can do.
In this post, we informally summarize “Mechanisms to Verify International Agreements About AI Development”, written by the Technical Governance Team’s Aaron Scher and Lisa Thiergart, and originally published in November 2024. For a more technical overview, we recommend the executive summary.
Here, we’ll cover three illustrative policy goals explored in the paper, along with potential verification methods for each. The three goals involve:
1. Tracking the location of AI compute (chips).
2. Verifying that tracked compute is not doing large-scale training.
3. Certifying model evaluations.
We focus primarily on verification for international governance, and discuss some promising methods useful for a range of future verification needs.[1]
Goal One: Tracking AI compute
Modern AIs are trained using highly advanced and specialized chips, usually (but not necessarily) clustered in large datacenters for efficiency. International governance thus requires some way to track these high-end chips. There are low-tech and high-tech ways to do this.
Low tech: In-person inspections. International inspectors might physically visit datacenters (perhaps via an arrangement like the START nuclear weapons treaty) to count chips, examine chips for tampering, audit security, and set up cameras for continuous monitoring. These methods require physical access to the datacenters, and thus a tighter, more detailed agreement among participants, but they could be implemented almost immediately with current technology.
High tech: Secure chip governance. Chips can be designed or modified to remotely confirm their location, or retrofitted with hardware that does this. For instance, regularly pinging multiple external servers and measuring the time delay between pings can approximate a chip’s general location on Earth. Regulators can require chip manufacturers to install such mechanisms, bypassing the need for ongoing physical access to datacenters. This method requires developing better tamper-proofing; current chip security measures aren’t designed to stop a well-resourced attacker from simply extracting a chip’s private key and using it to spoof the location.
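The ping-delay idea above can be sketched in a few lines. This is a minimal illustration, not the paper's protocol: it uses only the fact that signals cannot travel faster than light, so a round-trip time to a known landmark server gives an upper bound on the chip's distance from that server. The server names and round-trip times below are hypothetical.

```python
# Bound a chip's possible location from round-trip ping times to
# landmark servers at known positions. A signal covers at most c * t
# total distance in t seconds, so one-way distance <= c * t / 2.

SPEED_OF_LIGHT_KM_S = 299_792.458

def max_distance_km(rtt_seconds: float) -> float:
    """Upper bound on the chip's distance from the pinged server."""
    return SPEED_OF_LIGHT_KM_S * rtt_seconds / 2

# Hypothetical measured round-trip times (seconds) to three servers:
rtts = {"server_a": 0.020, "server_b": 0.085, "server_c": 0.140}

for name, rtt in rtts.items():
    print(f"{name}: chip is within {max_distance_km(rtt):,.0f} km")
```

Intersecting the resulting distance bounds from several servers narrows the chip's possible region; signals in real fiber travel slower than c, so the bound is conservative.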
The paper recommends primarily tracking chips rather than datacenters: many datacenters don’t train AIs, and the highly bottlenecked chip supply chain is easier to secure.
Goal Two: Verifying lack of large-scale training
The most capable AIs tend to be those trained at the largest scale, so training compute is a useful, though imperfect, proxy for capabilities in AI governance. Most chips that are used to run AI can also be used for training; if states choose to ban large-scale training runs, and someone insists “I’m only using these chips for inference,” states need a way to verify that claim.
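To make the compute proxy concrete, here is a small sketch using the common rule of thumb that dense training costs roughly 6 FLOPs per parameter per token. The model size, token count, and treaty threshold below are illustrative values, not figures from the paper.

```python
# Estimate training compute with the standard approximation:
# training FLOPs ~= 6 * (parameter count) * (training tokens).

def training_flops(n_params: float, n_tokens: float) -> float:
    return 6 * n_params * n_tokens

THRESHOLD = 1e26  # hypothetical treaty limit on training compute

# Illustrative run: 70B parameters trained on 15T tokens.
run = training_flops(n_params=7e10, n_tokens=1.5e13)
print(f"{run:.2e} FLOPs; exceeds threshold: {run > THRESHOLD}")
```

A verification regime could compare such estimates, derived from chip counts and runtimes, against an agreed compute ceiling, while recognizing that algorithmic progress erodes the proxy over time.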
Currently, very large datacenters are used to train frontier AIs. Physical inspections and continuous monitoring might provide confirmation that the quality and type of chips in a datacenter don’t permit large training runs. But an international agreement needs to be robust against attempts to cheat, and monitoring large datacenters incentivizes would-be cheaters to find ways to distribute training runs across many smaller datacenters. A better method would involve chips designed solely for inference rather than merely specialized for it.[2] For now, the majority of CPUs and non-datacenter GPUs are not powerful enough to require monitoring, though states will eventually need to adjust what qualifies as AI compute as algorithms improve.
Once datacenters with the capacity for large training runs are identified, states need to verify that such training isn’t taking place. There are a few ways one might do so, for example:
- Require datacenters to log their chip activities for later auditing.
- Re-run some fraction of those activities on mutually trusted hardware.
- Classify workloads based on power draw, bandwidth, and other externally detectable metrics.
Logging and verifying chip workloads can provide strong evidence that the chips weren’t used for illegal training. But since the actual content of the training runs might be sensitive or secret, it helps to have verification options that don’t expose the code, weights, or training data.
Fortunately, large-scale training has characteristics that make it stand out. For example, it needs high interconnect bandwidth for communication among clusters of chips, and enforcing limits on this bandwidth is one way to limit large training runs. If granted temporary access to sample workloads from a given datacenter, verifiers can figure out which patterns of power draw and network activity correspond to innocuous chip use, making external monitoring much more effective.
Securing chips against tampering, even by well-resourced state actors, opens up additional options. For instance, the chips themselves could log and report their activity. In one proposed design, a second computer attached to the chip performs monitoring and other governance functions. The monitor itself would need to be secure, but with a less demanding workload, it could be easier to make than high-end AI chips.
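One way a self-reporting chip could make its activity logs tamper-evident is to chain and authenticate each entry with a key held in secure hardware. The sketch below is a simplified illustration under that assumption (real key management and attestation would be far more involved): each entry's tag depends on the previous entry, so deleting or altering any record breaks verification.

```python
# Tamper-evident activity log: each entry is chained to its
# predecessor and authenticated with a device key assumed to live
# inside tamper-proof chip hardware (simplified for illustration).

import hashlib
import hmac
import json

DEVICE_KEY = b"key-held-in-secure-chip-hardware"  # hypothetical

def append_entry(log: list, record: dict) -> None:
    prev_tag = log[-1]["tag"] if log else "genesis"
    payload = json.dumps({"record": record, "prev": prev_tag},
                         sort_keys=True)
    tag = hmac.new(DEVICE_KEY, payload.encode(), hashlib.sha256).hexdigest()
    log.append({"record": record, "prev": prev_tag, "tag": tag})

def verify_log(log: list) -> bool:
    prev_tag = "genesis"
    for entry in log:
        payload = json.dumps({"record": entry["record"], "prev": prev_tag},
                             sort_keys=True)
        expected = hmac.new(DEVICE_KEY, payload.encode(),
                            hashlib.sha256).hexdigest()
        if entry["prev"] != prev_tag or entry["tag"] != expected:
            return False
        prev_tag = entry["tag"]
    return True

log = []
append_entry(log, {"hour": 1, "util": 0.92, "interconnect_gbps": 380})
append_entry(log, {"hour": 2, "util": 0.15, "interconnect_gbps": 12})
print(verify_log(log))           # intact log verifies
log[0]["record"]["util"] = 0.15  # attempt to hide heavy chip use
print(verify_log(log))           # tampering is detected
```

In the proposed two-computer design, the attached monitor would hold the key and produce these records, so auditors can later check the chain without trusting the datacenter operator.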
A drawback of indirect methods is that they are difficult to make adversarially robust, especially given likely advances in algorithmic efficiency and distributed training. Secure chips and trusted compute infrastructure thus make good targets for proactive research.
Goal Three: Certifying model evaluations
Separately or in addition to limits on training, countries may want to evaluate each other’s AI models. The main difficulties here include:
- Securing the evaluation process.
- Evaluating the correct model.
- Running effective evaluations.
Evaluations might be made more secure by running only mutually approved code in Trusted Execution Environments (TEEs).[3] For instance, a verifier might require that the model code and weights running in a TEE be verifiably the same during evaluation as during training or deployment, which could catch attempts to sneak a weaker model into the eval.
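The model-identity check can be illustrated with a plain hash comparison. This is a toy sketch: in a real TEE the digest of the loaded weights would be reported inside a signed remote-attestation quote, and the weight strings below are stand-ins for multi-gigabyte weight files.

```python
# Verify the evaluated model matches the committed one by comparing
# digests of the weights. A TEE would report the loaded digest in a
# signed attestation; here we compare hashes directly.

import hashlib

def weights_digest(weight_bytes: bytes) -> str:
    return hashlib.sha256(weight_bytes).hexdigest()

# At training/deployment time, the developer commits to a digest:
committed = weights_digest(b"frontier-model-weights-v1")

# At evaluation time, the verifier hashes whatever was loaded:
loaded_same = weights_digest(b"frontier-model-weights-v1")
loaded_swap = weights_digest(b"weaker-decoy-model-weights")

print(loaded_same == committed)  # genuine model passes
print(loaded_swap == committed)  # swapped-in weaker model is caught
```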
While minimal versions of these techniques will likely be available soon, existing AI chips may not be sufficiently secure against well-resourced nation-state actors. Research into secure evaluations should begin well before it is urgently needed.
Even if security and model identification are solved, the science of evaluations is still in its early stages. As of 2026, AI capabilities are still outpacing evaluators’ ability to test them. If frontier AI progress is slowed by international agreement, evaluations might be able to catch up, but for now, evaluating the capacities of frontier AIs remains a difficult problem.
Other verification mechanisms
Some mechanisms could be useful for a wide range of policy goals. Whistleblower programs have been useful in reporting breaches of conduct in sensitive industries, and can help provide additional verification.[4] With the right capabilities and scaffolding, AI-enabled methods could classify workloads (Goal Two) or conduct evaluations (Goal Three) without leaking sensitive data.
Another option is to require a safety case for AI deployments: a structured and well-evidenced argument that a particular application of AI is safe. Such requirements are common in mature industries such as airlines, oil and gas, and medical devices.
Many policy goals may require monitoring the behavior of every copy of a deployed AI. This is a difficult challenge that may nevertheless be doable if model weights can be secured from theft in a small number of datacenters.
It has become common practice for AI developers to train their AIs on a behavior specification. These “model specs” are far from reliable; despite instructions, AIs can still badly misbehave, and to date there are no defenses that can stop a determined attacker with access to model weights from bypassing specs. But to the extent that model specs can influence AI behavior at all, it makes sense to include agreed-upon principles like “don’t violate international agreements” and to look for better ways to make them stick.
Final thoughts
MIRI has gone to great lengths to communicate that building powerful artificial intelligence under current conditions most likely kills everyone on Earth. Preventing this outcome will likely require a concerted (but not unprecedented) diplomatic effort, backed by robust technical solutions.
If states are able to pull together and commit to an international agreement, it seems possible to set up a verification regime and avert global catastrophe. But verification will be far more secure and robust, with less costly tradeoffs, if we do the hard work of figuring out how to implement the most promising methods now. Some core challenges are the increasing pace of algorithmic progress and distributed training, the difficulty of classifying AI chip activities in an adversarial setting, and the novel threat landscape from highly capable AI systems. Proactive research into solving these challenges will better position us for international and domestic regulation alike.
Footnotes
1. Many of these tools can also help domestic regulators enforce laws around AI development, but some will be overkill because domestic companies tend to be less adversarial and less willing to break laws.
2. Since November 2024, there’s been a growing market for chips specialized for inference, like Sohu or chips from Groq. These would probably be inefficient at pre-training, if they work at all, but a determined state actor might be able to use them anyway. The specialized chips we’re proposing would need to be explicitly designed to prevent that.
3. Existing TEE implementations include NVIDIA’s Confidential Computing and Apple’s Secure Enclave.
4. Prearranged insider interviews may also be helpful.
https://intelligence.org/2026/03/18/mechanisms-to-verify-international-agreements-about-ai-development/