Accurate token usage accounting #103

New issue

Open

opened 2026-05-23 16:33:28 -04:00 by jasoncouture · 0 comments

jasoncouture commented

2026-05-23 16:33:28 -04:00

(Migrated from github.com)

Why: compaction heuristics (#11, follow-up), budget enforcement (sibling issue, runaway prevention), UI dashboards, and cost reporting all depend on this. Estimates drift, especially across providers and tokenizers.

Scope:

Capture per-turn usage from every provider (OpenAI, Ollama, future).
Persist usage on the context entry so it survives restarts.
Aggregate per-session and per-agent.
Expose via OTEL (`gen_ai.usage.input_tokens` / `gen_ai.usage.output_tokens` already shipped on spans — wire these from real values instead of estimates).

Pairs with: smarter compaction (#11), compactor tool (sibling), token budget / runaway prevention (sibling).

Today token counts come from rough estimation. Replace with actual usage as reported by the provider (per request: prompt tokens, completion tokens, total) and surface it everywhere the framework reasons about context size. Why: compaction heuristics (#11, follow-up), budget enforcement (sibling issue, runaway prevention), UI dashboards, and cost reporting all depend on this. Estimates drift, especially across providers and tokenizers. Scope: - Capture per-turn usage from every provider (OpenAI, Ollama, future). - Persist usage on the context entry so it survives restarts. - Aggregate per-session and per-agent. - Expose via OTEL (\`gen_ai.usage.input_tokens\` / \`gen_ai.usage.output_tokens\` already shipped on spans — wire these from real values instead of estimates). Pairs with: smarter compaction (#11), compactor tool (sibling), token budget / runaway prevention (sibling).