Accurate token usage accounting #103

Open
opened 2026-05-23 16:33:28 -04:00 by jasoncouture · 0 comments
jasoncouture commented 2026-05-23 16:33:28 -04:00 (Migrated from github.com)

Today, token counts come from rough estimation. Replace these estimates with the actual usage each provider reports (per request: prompt tokens, completion tokens, total), and surface it everywhere the framework reasons about context size.
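As a rough illustration of "actual usage as reported by the provider", here is a minimal sketch that normalizes two response shapes into one record. `TokenUsage`, `usage_from_openai`, and `usage_from_ollama` are hypothetical names, not existing framework APIs. The field names assume the OpenAI chat completions `usage` object (`prompt_tokens`, `completion_tokens`) and the Ollama response counters (`prompt_eval_count`, `eval_count`); treating a missing Ollama counter as 0 is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Any, Mapping


@dataclass(frozen=True)
class TokenUsage:
    """Provider-reported usage for a single request (hypothetical type)."""

    prompt_tokens: int
    completion_tokens: int

    @property
    def total_tokens(self) -> int:
        # Derived here so the total stays consistent across providers.
        return self.prompt_tokens + self.completion_tokens


def usage_from_openai(response: Mapping[str, Any]) -> TokenUsage:
    """Read the `usage` object from an OpenAI-style response payload."""
    usage = response["usage"]
    return TokenUsage(usage["prompt_tokens"], usage["completion_tokens"])


def usage_from_ollama(response: Mapping[str, Any]) -> TokenUsage:
    """Read the eval counters from an Ollama-style response payload.

    Assumption: a counter missing from the payload is treated as 0.
    """
    return TokenUsage(
        response.get("prompt_eval_count", 0),
        response.get("eval_count", 0),
    )
```

Keeping the normalized record provider-agnostic would let compaction and budget code consume one shape regardless of where the numbers came from.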

Why: compaction heuristics (#11, follow-up), budget enforcement (sibling issue, runaway prevention), UI dashboards, and cost reporting all depend on this. Estimates drift, especially across providers and tokenizers.

Scope:

  • Capture per-turn usage from every provider (OpenAI, Ollama, future).
  • Persist usage on the context entry so it survives restarts.
  • Aggregate per-session and per-agent.
  • Expose via OTEL (`gen_ai.usage.input_tokens` / `gen_ai.usage.output_tokens` are already shipped on spans; populate them from real values instead of estimates).
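The scope items above could be sketched roughly as follows. This is an illustration under assumptions, not the framework's design: `ContextEntry`, its fields, and the helper functions are hypothetical, persistence is shown as plain JSON, and the OTEL part only builds an attribute dictionary with the two keys named above rather than calling any OpenTelemetry SDK.

```python
import json
from collections import defaultdict
from dataclasses import asdict, dataclass
from typing import Dict, List


@dataclass
class ContextEntry:
    """One turn's usage stored with the context entry (hypothetical shape)."""

    session_id: str
    agent_id: str
    input_tokens: int
    output_tokens: int


def to_json(entries: List[ContextEntry]) -> str:
    """Serialize entries so usage survives a restart."""
    return json.dumps([asdict(e) for e in entries])


def from_json(payload: str) -> List[ContextEntry]:
    """Restore entries from the serialized form."""
    return [ContextEntry(**item) for item in json.loads(payload)]


def aggregate(entries: List[ContextEntry], key: str) -> Dict[str, Dict[str, int]]:
    """Sum usage grouped by `key`, e.g. "session_id" or "agent_id"."""
    totals: Dict[str, Dict[str, int]] = defaultdict(
        lambda: {"input_tokens": 0, "output_tokens": 0}
    )
    for entry in entries:
        bucket = totals[getattr(entry, key)]
        bucket["input_tokens"] += entry.input_tokens
        bucket["output_tokens"] += entry.output_tokens
    return dict(totals)


def otel_attributes(entry: ContextEntry) -> Dict[str, int]:
    """Span attributes populated from real values rather than estimates."""
    return {
        "gen_ai.usage.input_tokens": entry.input_tokens,
        "gen_ai.usage.output_tokens": entry.output_tokens,
    }
```

For example, `aggregate(entries, "session_id")` and `aggregate(entries, "agent_id")` give the per-session and per-agent totals from the same stored entries.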

Pairs with: smarter compaction (#11), compactor tool (sibling), token budget / runaway prevention (sibling).
