feat(provider): OpenAI-compatible chat provider with extras pass-through #43
Reference
jasoncouture/llama-shears!43
Summary
A single OpenAI-shaped provider that works against api.openai.com, llama-server, vLLM, LM Studio, TabbyAPI: anything that speaks `/v1/chat/completions`. The two-birds-one-stone trick: `OpenAIProviderOptions.ExtraRequestParams` (a `JsonObject`) gets deep-merged into every request body, so vendor knobs (`cache_prompt`, `slot_id`, `samplers`, `n_probs`, `min_p`, vLLM's `guided_choice`, etc.) round-trip without forking the provider per backend. Per-agent overrides layer on top via the existing `AgentProviderOptions.Resolve` merge.

Replaces the originally-planned standalone "llama.cpp native" provider, since llama-server's `/v1/chat/completions` endpoint accepts both the OpenAI-spec fields and llama-server's native extras in the same call. Cuts a project in half and keeps tool calls working out of the box.

Surface
- `OpenAILanguageModel : ILanguageModel`. Hand-rolled SSE reader against `/v1/chat/completions`, with no SDK dependency, so we can ship arbitrary extras without fighting a typed schema.
- Tool-call deltas accumulate by `index` and flush on `finish_reason: tool_calls`, with an end-of-stream drain for backends that don't emit a finish reason.
- Tools use the same `source__name` flatten convention as the Ollama provider, so multi-MCP routing is identical across backends.
- […] (`content: [{type:text}, {type:image_url}]`).
- `delta.reasoning_content` (or the older `reasoning` field) becomes `IModelThoughtResponse` fragments.
- The `usage` block on the trailer chunk produces a single `IModelCompletionResponse` fragment.
- Wired into `AddApi()` next to `AddOllamaProvider()`, so a host can use either via the agent's model.id.providername.

Tests
8 new unit tests under `tests/LlamaShears.UnitTests/Provider/OpenAI/` cover request shape, extras merge, tool flatten, streaming text fragments, reasoning routing, accumulated tool-call assembly, and usage → completion fragment. A stub `HttpMessageHandler` captures the request and returns canned SSE.

405/405 tests pass.
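The stub-handler approach mentioned above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual test helper: `CannedSseHandler` and `CannedSse.TextOnly` are hypothetical names, and the canned payload is an assumed minimal OpenAI-style stream.

```csharp
#nullable enable
using System.Net;
using System.Net.Http;
using System.Text;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stub (assumption): records the outgoing request and answers
// every call with a fixed text/event-stream body.
public sealed class CannedSseHandler : HttpMessageHandler
{
    private readonly string _sseBody;

    public CannedSseHandler(string sseBody) => _sseBody = sseBody;

    // Captured so a test can assert on the request shape afterwards.
    public HttpRequestMessage? LastRequest { get; private set; }
    public string? LastRequestBody { get; private set; }

    protected override async Task<HttpResponseMessage> SendAsync(
        HttpRequestMessage request, CancellationToken cancellationToken)
    {
        LastRequest = request;
        LastRequestBody = request.Content is null
            ? null
            : await request.Content.ReadAsStringAsync(cancellationToken);

        return new HttpResponseMessage(HttpStatusCode.OK)
        {
            Content = new StringContent(_sseBody, Encoding.UTF8, "text/event-stream"),
        };
    }
}

// Assumed sample stream: one text delta followed by the [DONE] sentinel.
public static class CannedSse
{
    public const string TextOnly =
        "data: {\"choices\":[{\"index\":0,\"delta\":{\"content\":\"Hi\"}}]}\n\n" +
        "data: [DONE]\n\n";
}
```

A test would pass `new HttpClient(new CannedSseHandler(CannedSse.TextOnly))` to the code under test, then inspect `LastRequestBody` to check that extras were merged into the request JSON.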
Out of scope
- The standalone "llama.cpp native `/completion`" provider: the design discussion landed on "use `/v1/chat/completions` with extras instead", and the TASKS entry will be revised post-merge.

Test plan
- `dotnet test`: green.
- […] `OPENAI/gemma-4-26B-A4B-it-MXFP4_MOE.gguf`, host pointed at `crapple.alertr.info:8080`, end-to-end chat through the existing UI.

🤖 Generated with Claude Code
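For reference, the deep merge of `ExtraRequestParams` into the request body described in the Summary could look roughly like this. It is a sketch under stated assumptions, not the PR's `MergeExtras`: `JsonExtrasMerge` and `DeepMerge` are hypothetical names, nested objects are assumed to merge recursively, and everything else (scalars, arrays, nulls) is assumed to overwrite. `JsonNode.DeepClone()` requires .NET 8 or later.

```csharp
#nullable enable
using System.Text.Json.Nodes;

// Hypothetical helper (assumption): recursive merge of vendor extras
// into an outgoing request body.
public static class JsonExtrasMerge
{
    // Merges `extras` into `target`, mutating `target` only.
    public static void DeepMerge(JsonObject target, JsonObject extras)
    {
        foreach (var (key, value) in extras)
        {
            if (value is JsonObject extraChild &&
                target[key] is JsonObject targetChild)
            {
                // Both sides hold objects: descend instead of replacing.
                DeepMerge(targetChild, extraChild);
            }
            else
            {
                // A JsonNode can have only one parent, so clone before
                // attaching; `extras` stays reusable for the next request.
                target[key] = value?.DeepClone();
            }
        }
    }
}
```

With a body of `{"stream_options":{"include_usage":true}}` and extras of `{"cache_prompt":true,"stream_options":{"x":1}}`, this keeps `include_usage`, adds `x`, and adds `cache_prompt` at the top level.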
Pull request overview
Adds a new OpenAI-compatible chat provider (`/v1/chat/completions`) intended to work across multiple backends by allowing pass-through "extras" to be merged into request bodies, and wires it into the API host.

Changes:
- Adds `LlamaShears.Provider.OpenAI` with an SSE-based streaming implementation and a provider factory that lists models via `/v1/models`.
- […] (`AddOpenAIProvider`) and wires it into `AddApi()`.

Reviewed changes
Copilot reviewed 14 out of 14 changed files in this pull request and generated 4 comments.
Summary per file:
- […] `Providers:OpenAI:BaseUri` configuration.
- `/v1/models` model listing implementation.
- `/v1/chat/completions` request building + SSE streaming parser and tool-call accumulation.
@ -0,0 +82,4 @@
// when finish_reason hits.
var toolCallAccumulator = new Dictionary<int, ToolCallAccumulator>();
int? totalTokens = null;

Tool-call delta accumulation is keyed only by `tool_calls[].index` and ignores the OpenAI `choices[].index`. If a backend ever streams multiple choices (or a caller sets `n > 1` via extras), tool-call deltas from different choices can collide (both commonly use tool index 0), producing corrupted tool calls. Consider tracking accumulators per choice index (e.g., `Dictionary<int /*choice*/, Dictionary<int /*tool*/, ...>>`) or otherwise rejecting/forcing single-choice responses.

@ -0,0 +316,4 @@
// becomes part of an outgoing request; sharing nodes risks
// double-parenting on the next call.
target[key] = value?.DeepClone();
}

`MergeExtras` currently performs a shallow overwrite (`target[key] = ...`) rather than a deep merge. This means extras that target nested objects (e.g., partially overriding `stream_options` or `response_format`) will replace the entire object, which conflicts with the documented/expected deep-merge behavior for option layering. Consider implementing a recursive merge when both existing and extra values are `JsonObject` (reusing the same semantics as `AgentProviderOptions.Merge`).

@ -0,0 +331,4 @@
if (turn.ToolCall is { CallId: { Length: > 0 } callId })
{
message["tool_call_id"] = callId;
}

For `ModelRole.Tool` turns, OpenAI's chat-completions schema requires `tool_call_id`. This code omits `tool_call_id` when `turn.ToolCall` (or its `CallId`) is missing, which can make requests invalid when replaying older persisted tool turns (noted elsewhere in the codebase as possible). Consider skipping such tool turns when building messages, or throwing/logging clearly rather than sending an invalid `role: tool` message.

This appsettings entry hard-codes a non-localhost OpenAI-compatible endpoint. If `appsettings.json` is intended to be a safe default/sample configuration, consider using `http://localhost:8080/` (or leaving it unset) to avoid unexpected outbound traffic in default deployments.
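One possible shape for the per-choice tracking suggested in the first comment is sketched below. It uses a composite `(choice, tool)` tuple key instead of nested dictionaries, which is a swapped-in variation on the reviewer's suggestion. `ToolCallBuffer` and `PerChoiceToolCalls` are hypothetical types; the PR's own `ToolCallAccumulator` is not shown in the review, so its fields are assumed here.

```csharp
#nullable enable
using System.Collections.Generic;
using System.Text;

// Hypothetical buffer (assumption): id and name arrive once, arguments
// arrive as string fragments across several deltas.
public sealed class ToolCallBuffer
{
    public string? Id { get; set; }
    public string? Name { get; set; }
    public StringBuilder Arguments { get; } = new();
}

public sealed class PerChoiceToolCalls
{
    // Composite key: (choices[].index, tool_calls[].index).
    private readonly Dictionary<(int Choice, int Tool), ToolCallBuffer> _buffers = new();

    public void Append(int choiceIndex, int toolIndex,
                       string? id, string? name, string? argumentsFragment)
    {
        var key = (choiceIndex, toolIndex);
        if (!_buffers.TryGetValue(key, out var buffer))
        {
            buffer = new ToolCallBuffer();
            _buffers[key] = buffer;
        }
        if (!string.IsNullOrEmpty(id)) buffer.Id = id;
        if (!string.IsNullOrEmpty(name)) buffer.Name = name;
        if (argumentsFragment is not null) buffer.Arguments.Append(argumentsFragment);
    }

    // Removes and returns the buffered calls for one choice, ordered by
    // tool index. Intended for finish_reason or end-of-stream handling.
    public List<ToolCallBuffer> Flush(int choiceIndex)
    {
        var keys = new List<(int Choice, int Tool)>();
        foreach (var key in _buffers.Keys)
        {
            if (key.Choice == choiceIndex) keys.Add(key);
        }
        keys.Sort((a, b) => a.Tool.CompareTo(b.Tool));

        var result = new List<ToolCallBuffer>(keys.Count);
        foreach (var key in keys)
        {
            result.Add(_buffers[key]);
            _buffers.Remove(key);
        }
        return result;
    }
}
```

With this layout, two choices that both stream tool index 0 land in separate buffers, and flushing one choice leaves the other untouched.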