feat(provider): OpenAI-compatible chat provider with extras pass-through #43

Merged
jasoncouture merged 3 commits from feat/provider-openai into main 2026-05-07 18:47:37 -04:00
jasoncouture commented 2026-05-07 17:51:15 -04:00 (Migrated from github.com)

Summary

A single OpenAI-shaped provider that works against api.openai.com, llama-server, vLLM, LM Studio, TabbyAPI — anything that speaks /v1/chat/completions. The two-birds-one-stone trick: OpenAIProviderOptions.ExtraRequestParams (a JsonObject) gets deep-merged into every request body, so vendor knobs (cache_prompt, slot_id, samplers, n_probs, min_p, vLLM's guided_choice, etc.) round-trip without forking the provider per backend. Per-agent overrides layer on top via the existing AgentProviderOptions.Resolve merge.

Replaces the originally planned standalone "llama.cpp native" provider, since llama-server's /v1/chat/completions endpoint accepts both the OpenAI-spec fields and llama-server's native extras in the same call. Cuts a project in half and keeps tool calls working out of the box.

Surface

  • OpenAILanguageModel : ILanguageModel. Hand-rolled SSE reader against /v1/chat/completions — no SDK dependency, so we can ship arbitrary extras without fighting a typed schema.
  • Tool-call streaming deltas accumulate per index and flush on finish_reason: tool_calls, with an end-of-stream drain for backends that don't emit a finish reason.
  • Same source__name flatten convention as the Ollama provider so multi-MCP routing is identical across backends.
  • Multimodal user turns map to OpenAI's vision shape (content: [{type:text}, {type:image_url}]).
  • Reasoning content from delta.reasoning_content (or the older reasoning field) becomes IModelThoughtResponse fragments.
  • usage block on the trailer chunk produces a single IModelCompletionResponse fragment.
  • Wired into AddApi() next to AddOllamaProvider() so a host can use either via the agent's model.id.provider name.
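As a rough illustration of the hand-rolled SSE reading described in the list above, here is a minimal sketch. It is not the PR's `OpenAILanguageModel` code: the `SseSketch` class and `ReadChunks` method are hypothetical names, the reader is synchronous for brevity, and it assumes each event carries one JSON object on a single `data:` line, terminated by the `[DONE]` sentinel that OpenAI-style streams send.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.IO;
using System.Text.Json.Nodes;

// Hypothetical helper for illustration only; not a type from this PR.
public static class SseSketch
{
    // Yields one parsed JSON chunk per "data:" line until the "[DONE]" sentinel.
    public static IEnumerable<JsonNode> ReadChunks(TextReader reader)
    {
        string? line;
        while ((line = reader.ReadLine()) is not null)
        {
            // Blank separators, comments (": ...") and other SSE fields are ignored.
            if (!line.StartsWith("data:", StringComparison.Ordinal))
            {
                continue;
            }

            var payload = line.Substring("data:".Length).Trim();
            if (payload.Length == 0)
            {
                continue;
            }

            if (payload == "[DONE]")
            {
                yield break;
            }

            var node = JsonNode.Parse(payload);
            if (node is not null)
            {
                yield return node;
            }
        }
    }
}
```

Each yielded chunk would then be inspected for `choices[].delta` content, reasoning, tool-call deltas, or a trailing `usage` block. A production reader would presumably be asynchronous and honor cancellation.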

Tests

Eight new unit tests under tests/LlamaShears.UnitTests/Provider/OpenAI/ cover request shape, extras merge, tool flatten, streaming text fragments, reasoning routing, accumulated tool-call assembly, and usage → completion fragment. A stub HttpMessageHandler captures the request and returns canned SSE.

405/405 tests pass.

Out of scope

  • Embedding model — separate task; not blocking chat usage.
  • The standalone "llama.cpp native /completion" provider — the design discussion landed on "use /v1/chat/completions with extras instead" and the TASKS entry will be revised post-merge.

Test plan

  • [x] dotnet test: green.
  • [ ] Manual: agent on this host's gemma4 retargeted to OPENAI/gemma-4-26B-A4B-it-MXFP4_MOE.gguf, host pointed at crapple.alertr.info:8080, end-to-end chat through the existing UI.

🤖 Generated with Claude Code

github-actions[bot] commented 2026-05-07 17:52:57 -04:00 (Migrated from github.com)
| Package | Line Rate | Branch Rate | Complexity | Health |
| ------- | --------- | ----------- | ---------- | ------ |
| LlamaShears.Core.Abstractions.Context | 100% | 100% | 4 | ✔ |
| LlamaShears.Provider.Ollama | 3% | 1% | 166 | ❌ |
| LlamaShears.IntegrationTests | 87% | 73% | 71 | ✔ |
| LlamaShears.Core | 46% | 31% | 879 | ❌ |
| LlamaShears.Core.Abstractions.Content | 0% | 100% | 1 | ❌ |
| LlamaShears.Core.Abstractions.Caching | 100% | 100% | 1 | ✔ |
| LlamaShears.Core.Eventing | 90% | 73% | 53 | ✔ |
| StrangeSoft.Plugins.Host | 20% | 21% | 87 | ❌ |
| LlamaShears.Core.Abstractions.Commands | 64% | 100% | 3 | ➖ |
| LlamaShears.Core.Abstractions.Provider | 32% | 20% | 66 | ❌ |
| LlamaShears.Core.Abstractions.Memory | 0% | 100% | 3 | ❌ |
| LlamaShears.Api.Web | 37% | 20% | 348 | ❌ |
| LlamaShears.Hosting | 26% | 8% | 27 | ❌ |
| LlamaShears.Core.Abstractions.Events | 21% | 6% | 79 | ❌ |
| LlamaShears.Core.Abstractions.SystemPrompt | 100% | 100% | 2 | ✔ |
| LlamaShears | 65% | 25% | 11 | ➖ |
| LlamaShears.Core.Abstractions.PromptContext | 89% | 100% | 2 | ✔ |
| LlamaShears.Provider.Onnx.Embeddings | 4% | 0% | 68 | ❌ |
| LlamaShears.Plugins.Host | 34% | 24% | 36 | ❌ |
| LlamaShears.Core.Abstractions.Agent | 73% | 100% | 11 | ➖ |
| LlamaShears.Api | 11% | 3% | 344 | ❌ |
| LlamaShears.Plugins | 0% | 100% | 1 | ❌ |
| LlamaShears.Provider.OpenAI | 2% | 0% | 209 | ❌ |
| LlamaShears.Core.Eventing.Extensions | 100% | 100% | 1 | ✔ |
| LlamaShears.Core.Abstractions.Context | 100% | 100% | 4 | ✔ |
| LlamaShears.Provider.Ollama | 3% | 1% | 166 | ❌ |
| LlamaShears.Core | 45% | 31% | 879 | ❌ |
| LlamaShears.Core.Abstractions.Content | 0% | 100% | 1 | ❌ |
| LlamaShears.Core.Abstractions.Caching | 100% | 100% | 1 | ✔ |
| LlamaShears.Core.Eventing | 90% | 73% | 53 | ✔ |
| StrangeSoft.Plugins.Host | 20% | 21% | 87 | ❌ |
| LlamaShears.Core.Abstractions.Commands | 64% | 100% | 3 | ➖ |
| LlamaShears.Core.Abstractions.Provider | 32% | 20% | 66 | ❌ |
| LlamaShears.Core.Abstractions.Memory | 0% | 100% | 3 | ❌ |
| LlamaShears.Api.Web | 25% | 12% | 348 | ❌ |
| LlamaShears.Hosting | 26% | 8% | 27 | ❌ |
| LlamaShears.Core.Abstractions.Events | 21% | 6% | 79 | ❌ |
| LlamaShears.Core.Abstractions.SystemPrompt | 100% | 100% | 2 | ✔ |
| LlamaShears | 65% | 25% | 11 | ➖ |
| LlamaShears.Core.Abstractions.PromptContext | 89% | 100% | 2 | ✔ |
| LlamaShears.Provider.Onnx.Embeddings | 4% | 0% | 68 | ❌ |
| LlamaShears.Plugins.Host | 34% | 24% | 36 | ❌ |
| LlamaShears.Core.Abstractions.Agent | 73% | 100% | 11 | ➖ |
| LlamaShears.Api | 9% | 1% | 344 | ❌ |
| LlamaShears.Plugins | 0% | 100% | 1 | ❌ |
| LlamaShears.Provider.OpenAI | 2% | 0% | 209 | ❌ |
| LlamaShears.Core.Eventing.Extensions | 100% | 100% | 1 | ✔ |
| LlamaShears.Core.Abstractions.Context | 100% | 100% | 4 | ✔ |
| LlamaShears.Provider.Ollama | 44% | 24% | 166 | ❌ |
| LlamaShears.Core | 47% | 44% | 879 | ❌ |
| LlamaShears.Core.Abstractions.Content | 0% | 100% | 1 | ❌ |
| LlamaShears.Core.Abstractions.Caching | 100% | 100% | 1 | ✔ |
| LlamaShears.Core.Eventing | 91% | 84% | 53 | ✔ |
| LlamaShears.Core.Abstractions.Commands | 0% | 100% | 3 | ❌ |
| LlamaShears.Core.Abstractions.Provider | 78% | 64% | 66 | ✔ |
| LlamaShears.Core.Abstractions.Memory | 100% | 100% | 3 | ✔ |
| LlamaShears.Api.Web | 1% | 1% | 348 | ❌ |
| LlamaShears.Hosting | 33% | 21% | 27 | ❌ |
| LlamaShears.Core.Abstractions.Events | 15% | 3% | 79 | ❌ |
| LlamaShears.Core.Abstractions.SystemPrompt | 100% | 100% | 2 | ✔ |
| LlamaShears.Core.Abstractions.PromptContext | 89% | 100% | 2 | ✔ |
| LlamaShears.Provider.Onnx.Embeddings | 33% | 36% | 68 | ❌ |
| LlamaShears.Core.Abstractions.Agent | 86% | 100% | 11 | ✔ |
| LlamaShears.Api | 27% | 29% | 344 | ❌ |
| LlamaShears.Provider.OpenAI | 60% | 57% | 209 | ➖ |
| LlamaShears.Core.Eventing.Extensions | 100% | 100% | 1 | ✔ |
| LlamaShears.Analyzers | 89% | 76% | 199 | ✔ |
| LlamaShears.Analyzers.CodeFixes | 85% | 69% | 60 | ✔ |
| **Summary** | **48%** (8724 / 24055) | **36%** (1729 / 6790) | **7401** | ❌ |
copilot-pull-request-reviewer[bot] (Migrated from github.com) reviewed 2026-05-07 17:55:37 -04:00
copilot-pull-request-reviewer[bot] (Migrated from github.com) left a comment

Pull request overview

Adds a new OpenAI-compatible chat provider (/v1/chat/completions) intended to work across multiple backends by allowing pass-through “extras” to be merged into request bodies, and wires it into the API host.

Changes:

  • Introduces LlamaShears.Provider.OpenAI with an SSE-based streaming implementation and a provider factory that lists models via /v1/models.
  • Adds DI registration (AddOpenAIProvider) and wires it into AddApi().
  • Adds unit tests covering request shape, extras injection, tool flattening, streaming fragments, tool-call accumulation, and usage → completion.

Reviewed changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 4 comments.

Summary per file

| File | Description |
| ---- | ----------- |
| tests/LlamaShears.UnitTests/Provider/OpenAI/OpenAILanguageModelTests.cs | New unit tests for OpenAI-compatible request/stream behavior. |
| src/LlamaShears/appsettings.json | Adds `Providers:OpenAI:BaseUri` configuration. |
| src/LlamaShears.Provider.OpenAI/OpenAIToolCallFragment.cs | Tool-call fragment type for provider streaming. |
| src/LlamaShears.Provider.OpenAI/OpenAIThoughtFragment.cs | Thought/reasoning fragment type for provider streaming. |
| src/LlamaShears.Provider.OpenAI/OpenAIServiceCollectionExtensions.cs | DI registration for the OpenAI provider + named HttpClient. |
| src/LlamaShears.Provider.OpenAI/OpenAIResponseFragment.cs | Text fragment type for provider streaming. |
| src/LlamaShears.Provider.OpenAI/OpenAIProviderOptions.cs | Host/agent-configurable provider options, including extras blob. |
| src/LlamaShears.Provider.OpenAI/OpenAIProviderFactory.cs | Provider factory + `/v1/models` model listing implementation. |
| src/LlamaShears.Provider.OpenAI/OpenAILanguageModel.cs | Core `/v1/chat/completions` request building + SSE streaming parser and tool-call accumulation. |
| src/LlamaShears.Provider.OpenAI/OpenAICompletionFragment.cs | Completion/usage fragment type for provider streaming. |
| src/LlamaShears.Provider.OpenAI/LlamaShears.Provider.OpenAI.csproj | New provider project definition and dependencies. |
| src/LlamaShears.Api/WebApplicationBuilderExtensions.cs | Registers OpenAI provider in the API host. |
| src/LlamaShears.Api/LlamaShears.Api.csproj | Adds project reference to the new OpenAI provider. |
| LlamaShears.slnx | Includes the new provider project in the solution. |

@ -0,0 +82,4 @@
// when finish_reason hits.
var toolCallAccumulator = new Dictionary<int, ToolCallAccumulator>();
int? totalTokens = null;
copilot-pull-request-reviewer[bot] (Migrated from github.com) commented 2026-05-07 17:55:36 -04:00

Tool-call delta accumulation is keyed only by tool_calls[].index and ignores the OpenAI choices[].index. If a backend ever streams multiple choices (or a caller sets n > 1 via extras), tool-call deltas from different choices can collide (both commonly use tool index 0), producing corrupted tool calls. Consider tracking accumulators per choice index (e.g., Dictionary<int /*choice*/, Dictionary<int /*tool*/,...>>) or otherwise rejecting/forcing single-choice responses.
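One way the per-choice tracking suggested above could look is sketched below, using a composite `(choice, tool)` key instead of nested dictionaries. `ToolCallBuffer` and `ChoiceScopedToolCalls` are hypothetical names; the PR's real `ToolCallAccumulator` type is not shown in this thread, so its shape here is an assumption.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical stand-in for the PR's ToolCallAccumulator.
public sealed class ToolCallBuffer
{
    public string? Id { get; set; }
    public string? Name { get; set; }
    public StringBuilder Arguments { get; } = new();
}

// Hypothetical container keyed by (choices[].index, tool_calls[].index).
public sealed class ChoiceScopedToolCalls
{
    private readonly Dictionary<(int Choice, int Tool), ToolCallBuffer> _buffers = new();

    // Appends one streamed delta; two choices that both use tool index 0
    // land in separate buffers because the choice index is part of the key.
    public void Append(int choiceIndex, int toolIndex, string? id, string? name, string? argumentsDelta)
    {
        var key = (choiceIndex, toolIndex);
        if (!_buffers.TryGetValue(key, out var buffer))
        {
            buffer = new ToolCallBuffer();
            _buffers[key] = buffer;
        }

        if (!string.IsNullOrEmpty(id)) buffer.Id = id;
        if (!string.IsNullOrEmpty(name)) buffer.Name = name;
        if (argumentsDelta is not null) buffer.Arguments.Append(argumentsDelta);
    }

    // Removes and returns the buffers of one choice, ordered by tool index,
    // e.g. when that choice reports finish_reason: tool_calls.
    public List<ToolCallBuffer> Drain(int choiceIndex)
    {
        var keys = new List<(int Choice, int Tool)>();
        foreach (var key in _buffers.Keys)
        {
            if (key.Choice == choiceIndex) keys.Add(key);
        }
        keys.Sort((a, b) => a.Tool.CompareTo(b.Tool));

        var drained = new List<ToolCallBuffer>();
        foreach (var key in keys)
        {
            drained.Add(_buffers[key]);
            _buffers.Remove(key);
        }
        return drained;
    }
}
```

The simpler alternative named in the comment, rejecting any chunk whose choice index is not 0, avoids the extra bookkeeping if the provider never intends to support `n > 1`.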

@ -0,0 +316,4 @@
// becomes part of an outgoing request; sharing nodes risks
// double-parenting on the next call.
target[key] = value?.DeepClone();
}
copilot-pull-request-reviewer[bot] (Migrated from github.com) commented 2026-05-07 17:55:36 -04:00

MergeExtras currently performs a shallow overwrite (target[key] = ...) rather than a deep merge. This means extras that target nested objects (e.g., partially overriding stream_options or response_format) will replace the entire object, which conflicts with the documented/expected deep-merge behavior for option layering. Consider implementing a recursive merge when both existing and extra values are JsonObject (reusing the same semantics as AgentProviderOptions.Merge).
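A recursive merge along the lines suggested above might look like the following sketch. `JsonMerge.DeepMerge` is a hypothetical name, and the semantics shown (recurse when both sides are objects, otherwise overwrite with a clone) are an assumption; the actual behavior of `AgentProviderOptions.Merge` is not visible in this thread and may differ, for example in how arrays or nulls are treated.

```csharp
#nullable enable
using System.Collections.Generic;
using System.Text.Json.Nodes;

// Hypothetical helper for illustration only; not a type from this PR.
public static class JsonMerge
{
    // Merges extras into target. Nested objects are merged key by key;
    // every other value (scalars, arrays, null) replaces the existing one.
    public static void DeepMerge(JsonObject target, JsonObject extras)
    {
        foreach (KeyValuePair<string, JsonNode?> pair in extras)
        {
            if (pair.Value is JsonObject extraChild && target[pair.Key] is JsonObject targetChild)
            {
                DeepMerge(targetChild, extraChild);
            }
            else
            {
                // Clone so the extras tree is never re-parented into a request body.
                target[pair.Key] = pair.Value?.DeepClone();
            }
        }
    }
}
```

With this shape, an extras blob of `{"stream_options": {"foo": 1}}` would add `foo` next to an existing `include_usage` key rather than replacing the whole `stream_options` object.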

@ -0,0 +331,4 @@
if (turn.ToolCall is { CallId: { Length: > 0 } callId })
{
message["tool_call_id"] = callId;
}
copilot-pull-request-reviewer[bot] (Migrated from github.com) commented 2026-05-07 17:55:36 -04:00

For ModelRole.Tool turns, OpenAI’s chat-completions schema requires tool_call_id. This code omits tool_call_id when turn.ToolCall (or its CallId) is missing, which can make requests invalid when replaying older persisted tool turns (noted elsewhere in the codebase as possible). Consider skipping such tool turns when building messages, or throwing/logging clearly rather than sending an invalid role: tool message.
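The "skip and log" option mentioned above could be sketched as follows. The `Role` enum, `Turn` record, and `MessageBuilder` class are hypothetical stand-ins, since the PR's real turn types (`ModelRole`, `turn.ToolCall.CallId`) are only partially visible here; the warning callback is likewise an assumption about how logging might be wired.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Text.Json.Nodes;

// Hypothetical stand-ins for the PR's turn model.
public enum Role { System, User, Assistant, Tool }

public sealed record Turn(Role Role, string Content, string? ToolCallId = null);

public static class MessageBuilder
{
    // Builds the "messages" array, dropping tool turns that cannot carry
    // the tool_call_id the chat-completions schema requires.
    public static JsonArray BuildMessages(IEnumerable<Turn> turns, Action<string>? warn = null)
    {
        var messages = new JsonArray();
        foreach (var turn in turns)
        {
            if (turn.Role == Role.Tool && string.IsNullOrEmpty(turn.ToolCallId))
            {
                warn?.Invoke("Skipping tool turn without a tool_call_id.");
                continue;
            }

            var message = new JsonObject
            {
                ["role"] = turn.Role.ToString().ToLowerInvariant(),
                ["content"] = turn.Content,
            };
            if (turn.Role == Role.Tool)
            {
                message["tool_call_id"] = turn.ToolCallId;
            }
            messages.Add(message);
        }
        return messages;
    }
}
```

One caveat with skipping: if the preceding assistant turn still carries `tool_calls`, some backends may also reject a history in which those calls have no matching tool response, so the assistant side might need the same treatment.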

copilot-pull-request-reviewer[bot] (Migrated from github.com) commented 2026-05-07 17:55:37 -04:00

This appsettings entry hard-codes a non-localhost OpenAI-compatible endpoint. If appsettings.json is intended to be a safe default/sample configuration, consider using http://localhost:8080/ (or leaving it unset) to avoid unexpected outbound traffic in default deployments.
