Token budget per agent (runaway prevention) #106

Open

opened 2026-05-23 16:34:04 -04:00 by jasoncouture · 1 comment

jasoncouture commented

2026-05-23 16:34:04 -04:00

(Migrated from github.com)

Cap how many tokens an agent (plus all its sub-agents) can spend over a configurable rolling window. When the budget is exhausted, the agent gets back-pressure / pause / hard-stop depending on policy. This prevents a runaway loop (or an adversarial prompt) from burning through quota / money / local GPU time unchecked.

Surface:

- Per-agent config: `tokenBudget: { window: "1h", limit: 500000, action: "pause|stop|warn" }`.
- Aggregation includes sub-agents spawned by the agent (transients, heartbeat, future spawn-tool).
- Enforcement point: before the next provider call, check budget; if exceeded, take the configured action and emit an event.
- Operator-visible: dashboard surface + event bus signal (`agent:budget-exceeded`).

Considerations:

- Window shape: sliding vs. fixed-bucket (sliding is more accurate, fixed-bucket is cheaper).
- Reset on operator command.
- Defaults: should be unset / unlimited unless operator opts in.

Depends on accurate token usage (sibling): budgets that key on estimates aren't budgets.
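To make the enforcement point concrete, here is a minimal sketch of a sliding-window budget checked before each provider call. It is not the project's implementation: the class `TokenBudget`, its methods, the event payload fields, and the choice to treat "at or over the limit" as exhausted are all assumptions made for illustration. Only the config fields (`window`, `limit`, `action`) and the event name `agent:budget-exceeded` come from the proposal. Sub-agent aggregation is modeled by having sub-agents record into the parent's budget object, and `limit=None` stands in for the unset / unlimited default.

```python
import time
from collections import deque


class TokenBudget:
    """Hypothetical sliding-window budget shared by an agent and its sub-agents."""

    def __init__(self, window_seconds, limit, action="warn",
                 emit=None, clock=time.monotonic):
        self.window_seconds = window_seconds  # e.g. 3600 for window: "1h"
        self.limit = limit                    # None means unlimited (assumed default)
        self.action = action                  # "pause", "stop", or "warn"
        self.emit = emit or (lambda name, payload: None)
        self.clock = clock                    # injectable so time can be controlled
        self._events = deque()                # (timestamp, tokens), oldest first
        self._total = 0

    def _evict(self):
        # Drop usage records that have aged out of the window.
        cutoff = self.clock() - self.window_seconds
        while self._events and self._events[0][0] <= cutoff:
            _, tokens = self._events.popleft()
            self._total -= tokens

    def record(self, tokens):
        # Call with provider-reported usage for the agent or any of its sub-agents.
        self._events.append((self.clock(), tokens))
        self._total += tokens

    def used(self):
        self._evict()
        return self._total

    def reset(self):
        # Operator command: forget all recorded usage.
        self._events.clear()
        self._total = 0

    def check(self, agent_id):
        """Run before each provider call.

        Returns None if the call may proceed; otherwise emits an event and
        returns the configured action.
        """
        if self.limit is None:
            return None
        used = self.used()
        if used < self.limit:
            return None
        self.emit("agent:budget-exceeded",
                  {"agent": agent_id, "used": used,
                   "limit": self.limit, "action": self.action})
        return self.action


# Usage with a fake clock so the window can be advanced by hand.
fake_now = [0.0]
events = []
budget = TokenBudget(window_seconds=3600, limit=500_000, action="pause",
                     emit=lambda name, payload: events.append((name, payload)),
                     clock=lambda: fake_now[0])
budget.record(300_000)               # usage from the agent itself
budget.record(250_000)               # usage from a sub-agent it spawned
decision = budget.check("agent-1")   # over the limit: returns "pause"
fake_now[0] = 3601.0                 # both records have aged out of the window
later = budget.check("agent-1")      # back under budget: returns None
```

The deque keeps one entry per provider call, which is the cost the "sliding is more accurate, fixed-bucket is cheaper" trade-off refers to. A fixed-bucket variant would keep a single counter and a bucket start time instead.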

Keesan12 commented

2026-06-05 14:20:47 -04:00

(Migrated from github.com)

Per-agent budgets are a lot more useful than one global spend ceiling because they fail in a way you can reason about.

The part I'd strongly consider pairing with this is a progress test, not just a spend test. A run can stay under budget and still waste hours if it keeps making the same tool call with the same evidence.

The pattern that has held up best for me is:

- budget cap per agent
- cap on identical failure streaks
- one required verifier/proof before another attempt
- explicit receipt when the run stops

That last part matters because people can only trust the stop if they can see why it stopped.
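As a rough illustration of two items in that pattern (the cap on identical failure streaks and the explicit stop receipt), here is a small sketch. Everything in it is hypothetical: `FailureStreakGuard`, `StopReceipt`, the fingerprinting scheme, and the default cap of 3 are assumptions for illustration, not anything taken from MartinLoop or this project. The verifier/proof step is not modeled.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass
class StopReceipt:
    """Hypothetical record explaining why a run was stopped."""
    agent: str
    reason: str
    streak: int
    fingerprint: str


class FailureStreakGuard:
    """Stops a run once the same failing tool call repeats max_streak times in a row."""

    def __init__(self, max_streak=3):
        self.max_streak = max_streak
        self._last = None
        self._streak = 0

    @staticmethod
    def fingerprint(tool, args, error):
        # Same tool, same arguments, same error => same fingerprint.
        blob = json.dumps({"tool": tool, "args": args, "error": error},
                          sort_keys=True, default=str)
        return hashlib.sha256(blob.encode("utf-8")).hexdigest()[:16]

    def observe(self, agent, tool, args, error):
        """Record one tool call; error=None means it succeeded.

        Returns a StopReceipt once the streak reaches the cap, else None.
        """
        if error is None:
            # Any success counts as progress and clears the streak.
            self._last, self._streak = None, 0
            return None
        fp = self.fingerprint(tool, args, error)
        self._streak = self._streak + 1 if fp == self._last else 1
        self._last = fp
        if self._streak >= self.max_streak:
            return StopReceipt(agent, "identical-failure-streak", self._streak, fp)
        return None


# Usage: the third identical failure produces a receipt.
guard = FailureStreakGuard(max_streak=3)
receipts = [guard.observe("agent-1", "read_file", {"path": "a.txt"}, "ENOENT")
            for _ in range(3)]
```

The receipt carries the reason and the fingerprint of the repeated call, so an operator can see what the agent was stuck on, not only that it was stopped.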

We've been seeing the same need in MartinLoop from the control-plane side.
