Agent runtime fallbacks: configuring backend failover for ACP turns

Agent runtime fallbacks are the difference between a personal AI assistant that quietly fails when its primary model is rate-limited and one that keeps answering through a brief outage. The 2026.5.12 release shipped acp.fallbacks, a config that lets an ACP turn try one or more backup runtime backends before any output is emitted, so a single unhealthy provider does not end the turn.

This post explains what changed in that release, when fallbacks help, when they do not, and how to set them up without making your debugging story worse.

What acp.fallbacks actually does

ACP (Agent Communication Protocol) is the layer that brokers turns between a frontend (Claude Code, an IDE pane, a chat client) and an agent runtime. Before this release, a turn had a single bound backend: if that backend errored or stalled before emitting output, the user saw the failure.

The 2026.5.12 release notes describe the new behavior plainly: “add acp.fallbacks so ACP turns can try configured backup runtime backends when the primary backend is unavailable before any output is emitted.” Two things in that sentence matter.

First, fallbacks fire only when the primary is unavailable. This is not load balancing. It is not retry-with-different-temperature. It is a discrete escalation path for connection refused, provider down, rate-limit-with-no-stream-started style failures.

Second, fallbacks only apply before any output is emitted. Once the primary has started streaming a reply, the system does not silently swap mid-stream. This avoids the worst kind of bug, where half a paragraph comes from Claude Opus and the rest from a smaller open-weight model and nobody can tell.
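The contract those two sentences describe can be sketched as a small routing loop. This is an illustrative sketch, not the actual host code; the names (`Backend`, `TurnError`, `start`) are hypothetical:

```python
class TurnError(Exception):
    """Raised when a backend fails before emitting any output."""

def run_turn(primary, fallbacks, prompt):
    """Try the primary, then each fallback in order, until one starts streaming.

    Once a backend has yielded its first chunk, no further swapping happens:
    a mid-stream failure surfaces to the user as-is.
    """
    for backend in [primary, *fallbacks]:
        try:
            stream = backend.start(prompt)   # may raise before any token
            first = next(stream)             # first output chunk
        except TurnError:
            continue                         # pre-output failure: escalate
        yield first
        yield from stream                    # committed: no more fallback
        return
    raise TurnError("all backends unavailable")
```

The key design point is the position of the `yield first`: the fallback decision is made entirely before the first chunk reaches the caller, which is exactly why a mid-stream stall cannot trigger a swap.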

When fallbacks are worth configuring

Not every failure benefits from a fallback. A bad fallback chain hides real problems and makes incidents harder to diagnose. Use this matrix to decide:

| Failure mode | Fallback helps? | Why |
| --- | --- | --- |
| Provider rate limit before stream starts | Yes | A different provider key or backend can serve the turn |
| Provider 5xx, transient | Yes | Backup absorbs the blip while primary recovers |
| Auth refresh wedged | Sometimes | Only if the backup uses a different auth profile |
| Model returns bad output | No | Fallback runs the same prompt; same result likely |
| Tool call rejected | No | Tool topology is shared; backup hits the same wall |
| Mid-stream stall | No | Fallbacks only fire before any output |

The honest read is: fallbacks save you from boring infrastructure failures, not from prompt or tooling problems. That is enough. Boring infrastructure failures are the most common reason an always-on agent goes quiet.
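The matrix above collapses to a two-part test: was the failure an availability problem, and did it happen before any output? A hypothetical classifier (the error taxonomy here is illustrative; the real host's may differ):

```python
# Failure kinds that a backup backend can plausibly absorb.
FALLBACK_ELIGIBLE = {
    "connection_refused",  # provider down
    "rate_limited",        # limit hit before the stream started
    "server_error",        # transient 5xx
}

def should_fall_back(error_kind: str, output_started: bool) -> bool:
    # Never swap once output has been emitted, regardless of error kind.
    if output_started:
        return False
    return error_kind in FALLBACK_ELIGIBLE
```

Note that quality problems ("bad output") never appear in the eligible set: from the host's perspective a wrong answer is still a successful turn.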

A minimal config

acp.fallbacks lives in .json (the master config). The backends listed are runtime IDs the host already knows about, ordered from first attempted backup to last. A practical small example:

```json
{
  "acp": {
    "fallbacks": [
      "anthropic-vertex",
      "anthropic-cli"
    ]
  }
}
```

If the primary backend cannot accept the turn, the host tries anthropic-vertex next, then anthropic-cli. Each backup goes through its own credential resolution and its own health check before being attempted.
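The per-backup pre-flight described above can be sketched as a filter over the configured chain. All names here are illustrative, not the host's real API; the point is the order of operations — credentials first, then a cheap health probe, then the actual attempt:

```python
def viable_backends(fallbacks, resolve_creds, health_check):
    """Yield (backend_id, creds) pairs that pass pre-flight, in configured order."""
    for backend_id in fallbacks:
        creds = resolve_creds(backend_id)        # per-backend profile, not a global env var
        if creds is None:
            continue                             # no usable credentials: skip to the next
        if not health_check(backend_id, creds):  # cheap liveness probe before the turn
            continue
        yield backend_id, creds
```

A backup that fails pre-flight costs almost nothing; a backup that passes pre-flight and then fails the turn costs a full timeout, which is the scenario the ordering advice below is about.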

A few practical rules:

  • List backups that use different credentials from the primary. A second provider that reads the same expired token is not really a fallback.
  • Order by latency, not by preference. A slow backup that consistently answers beats a fast one that frequently times out.
  • Keep the chain short. Two backends is usually enough. Three is the practical maximum before the time-to-first-token in the worst case starts annoying users.
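The "keep it short" rule is just arithmetic on worst-case time-to-first-token. Assuming each backend gets a fixed timeout before the host moves on (the 10-second figure below is illustrative, not a documented default):

```python
def worst_case_ttft(n_backends: int, per_backend_timeout_s: float = 10.0) -> float:
    """Upper bound on time-to-first-token when every backend but the last times out."""
    # Every backend before the one that finally answers burns its full timeout.
    return n_backends * per_backend_timeout_s

# Primary plus two backups at 10 s each: up to 30 s of silence
# before the user sees anything, which is why chains stay short.
```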

How fallbacks interact with auth profiles

Auth profile changes shipped earlier in the 4.x cycle moved per-backend credentials into structured secrets.providers[id] entries. Fallbacks lean on that work: each listed backend resolves its credentials through its own profile, not through a global env var. That has a useful side effect — if an OAuth refresh wedges on the primary, a backup with its own profile is not blocked by the same lock.
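A per-backend entry might look something like the fragment below. The `secrets.providers[id]` shape comes from the release notes; the inner field names are an assumption for illustration, so check your release's config reference before copying:

```json
{
  "secrets": {
    "providers": {
      "anthropic-vertex": { "authProfile": "vertex-sa" },
      "anthropic-cli": { "authProfile": "cli-oauth" }
    }
  }
}
```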

There is also a quieter improvement in this release: the auth layer now reclaims dead-owner stale file locks before retrying locked writes. Combined with fallbacks, that means a crashed OAuth refresh on the primary does not wedge the whole turn while also blocking the backup from saving a fresh token.
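Dead-owner reclamation is a classic pattern, and a minimal sketch makes the release note concrete. This assumes a lock file that records the owning PID, which is a common convention but an assumption about this runtime's actual format:

```python
import os

def try_reclaim(lock_path: str) -> bool:
    """Remove a file lock whose recorded owner process is no longer alive."""
    try:
        with open(lock_path) as f:
            pid = int(f.read().strip())
    except (FileNotFoundError, ValueError):
        return True  # no lock file, or unreadable contents: nothing to respect
    try:
        os.kill(pid, 0)        # signal 0 probes existence without sending anything
    except ProcessLookupError:
        os.remove(lock_path)   # owner died without cleaning up: reclaim the lock
        return True
    except PermissionError:
        return False           # a live process owned by another user holds it
    return False               # owner is alive: leave the lock alone
```

The payoff in the fallback context: a crashed refresh process no longer holds the token file hostage, so the backup's own credential write can proceed.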

What you still have to think about yourself

Fallbacks do not paper over the harder reliability questions. A few worth keeping in mind:

  • Cost shape. A primary on a flat-rate plan with a metered backup will silently spike your bill on a bad week. If cost matters, pick backups that fail in similar pricing categories.
  • Capability skew. A backup that lacks image input or tool calling will accept the turn but produce a worse reply. ACP does not currently filter the fallback list by capability; that is your job in config.
  • Quietness. A turn that succeeds on backup 2 still succeeded, so the host emits a normal reply. If you want to know how often the primary fails, you have to look at runtime logs or telemetry. Do that periodically. A backup that carries 30% of traffic is a primary you have not noticed.
  • Test the chain. Stop the primary on purpose, in dev, and watch a turn fall through. If you have not done this, you do not know whether the fallback works. Most production fallback configs that have never been exercised in test fail the first time they are needed.
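The "quietness" point lends itself to a periodic check. Assuming you can export turn records with a status and the index of the backend that served them (the record shape here is hypothetical), the backup share is a one-liner worth alerting on:

```python
def backup_share(turn_records: list[dict]) -> float:
    """Fraction of successful turns served by a non-primary backend."""
    served = [t for t in turn_records if t.get("status") == "ok"]
    if not served:
        return 0.0
    on_backup = sum(1 for t in served if t.get("fallback_index", 0) > 0)
    return on_backup / len(served)

# A share creeping toward 0.3 means your "backup" is quietly your primary.
```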

Where this fits in a self-hosted agent

For self-hosted operators, agent runtime fallbacks matter most when the agent lives in a channel users do not refresh on their own — Telegram, Discord, Slack, email. Those surfaces look the same whether the agent is busy or dead. A fallback chain converts a class of obvious-to-the-host failures into invisible-to-the-user successes.

If you run on a single machine with a single provider, a fallback adds limited value: when the network drops, nothing routes. The pattern earns its keep when the primary and backups share no failure domain — different providers, different auth profiles, ideally one cloud-hosted and one local.

For the broader picture of how the runtime layers fit together, and for where ACP sits relative to channels and skills, see the companion architecture posts. The Telegram reliability deep-dive covers another angle on the same problem: keeping channel ingress alive when other parts of the system stall.

Verifying it works

Once configured, three checks are worth running:

  1. Kill the primary backend in dev (drop its credentials or block its hostname) and send a turn. The host should log a fallback attempt and the reply should come through.
  2. Watch first-token latency on healthy primary turns. Fallbacks add a small bookkeeping cost; if you see a regression, the chain may be doing pre-flight checks too aggressively.
  3. Look at runtime logs for acp.fallbacks activations over a week. Zero activations means either you have a very stable primary or your fallback chain is broken in a way that never triggers. Both are worth investigating.

Putting agent runtime fallbacks together

Agent runtime fallbacks earn their place in a self-hosted assistant when the cost of silent failure is high. Configure two backups with independent credentials, order them by realistic latency, exercise the chain in dev, and re-check activations on a regular cadence. That gets you most of the benefit without the hidden-cost or hidden-quality traps.

The 2026.5.12 release ships fallbacks as a config field, not a default. That is the right call: bad fallbacks are worse than none, because they hide the signal that the primary is broken. Treat the chain as part of your runtime contract, and revisit it whenever you change a provider, an auth profile, or a model.

FAQ

What are agent runtime fallbacks?

Agent runtime fallbacks are a configured chain of backup model or runtime backends that an ACP turn can try when its primary backend cannot accept the turn. The host attempts each backup in order, before any output reaches the user.

Do fallbacks switch mid-reply?

No. The 2026.5.12 release notes are explicit that fallbacks only fire before any output is emitted. Once the primary has started streaming, the host does not swap to a backup mid-stream.

How many fallbacks should I list?

Two is usually enough. Three is the practical maximum before worst-case time-to-first-token gets noticeable. List backups that use different credentials from the primary so a single auth failure does not knock out the whole chain.

Will a fallback fix bad model output?

No. Fallbacks address availability, not quality. A model that returns a wrong answer is not unavailable from the host’s perspective, so the fallback chain never fires. Treat output quality as a prompt and tooling problem.

Are fallbacks worth configuring on a single-machine self-hosted setup?

It depends on whether the primary and backups share a failure domain. If both depend on the same machine and the same network egress, a fallback adds limited value. The pattern earns its keep when the chain spans independent providers or mixes a cloud-hosted backend with a local one.

Sources: v2026.5.12 release notes, Anthropic on production reliability for AI agents, Google: long-running agents that pause and resume, OpenTelemetry semantic conventions for GenAI