
API Compatibility

GenAI Smart Router exposes OpenAI-compatible and Anthropic-compatible HTTP surfaces so clients can keep familiar SDKs while routing, provider credentials, policy, quotas, and accounting stay server-side.

The router endpoint is deployment-specific. Use the base URL and model groups issued by your administrator or Metrum-managed instance.

Supported Surfaces

| Endpoint | Compatibility target | Typical clients |
| --- | --- | --- |
| /v1/chat/completions | OpenAI Chat Completions-style requests | OpenAI SDK chat clients, Warp-style OpenAI-compatible agents |
| /v1/responses | OpenAI Responses-style requests | Codex CLI, Responses-compatible agent frameworks |
| /v1/messages | Anthropic Messages-style requests | Claude Code CLI, Anthropic-compatible clients |
| /v1/models | OpenAI-style model discovery | Client setup and allow-list discovery |
| /v1/usage | Router usage lookup | Caller quota and usage checks |
| /readyz, /healthz, /version | Router operational endpoints | Load balancers and operators |
| /admin/auth/check | Browser-admin Basic Auth validation stub | Operators enabling browser-admin surfaces |
| /admin/auth/login, /admin/auth/callback, /admin/auth/me, /admin/auth/logout | Browser-admin OIDC session routes | Operators enabling OIDC browser admin surfaces |
| /admin/reports/* | Router-specific admin reports | Authorized administrators |

/metrics is an operator telemetry API. It requires a caller token whose subject is authorized for metrics read; existing metrics_admin: true caller entries receive compatible Casbin grants at startup.

Caller tokens are checked by SHA-256 hash. Unknown or missing tokens return 401 unauthorized. Configured inactive keys return safe status-specific 403 errors after token match, including key-disabled, key-suspended, key-expired, and key-rotated. Config validation requires every enabled key to reference an active owner_user, active project, and active project membership, so inactive users/projects/memberships are caught before startup.
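The token check described above can be sketched as follows. This is a minimal illustration, not the router's implementation: the in-memory KEYS table, the check_token helper, and the sample tokens are hypothetical, and only the status codes and error names come from the text.

```python
import hashlib

# Hypothetical key table: SHA-256 hex digest of the token -> key status.
# Only the digest is stored, so the raw token never sits in configuration.
KEYS = {
    hashlib.sha256(b"tok-active").hexdigest(): "active",
    hashlib.sha256(b"tok-expired").hexdigest(): "expired",
    hashlib.sha256(b"tok-disabled").hexdigest(): "disabled",
}


def check_token(token):
    """Return (http_status, error_code); error_code is None for a usable key."""
    if not token:
        return 401, "unauthorized"
    digest = hashlib.sha256(token.encode("utf-8")).hexdigest()
    status = KEYS.get(digest)
    if status is None:
        # Unknown tokens get the same answer as missing ones.
        return 401, "unauthorized"
    if status != "active":
        # The token matched a configured key, so a status-specific 403 is safe.
        return 403, "key-" + status
    return 200, None
```

A matched but inactive key yields a 403 such as key-expired, while a token that matches nothing yields the same 401 as a missing token.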

Content-capture maintenance endpoints are administrative APIs, not model APIs. DELETE /v1/content-captures/<request_id> requires content:capture delete authorization in the captured row's caller project/environment domain; POST /v1/content-captures/purge-expired requires content:capture purge. Existing content_admin: true caller entries receive compatible Casbin grants for their own domain. These endpoints never return captured content.

/admin/auth/check is not a model API. It is available only when server.admin_auth.basic.enabled: true; missing or invalid HTTP Basic credentials return 401, valid credentials without the route permission return 403 admin-forbidden, and valid credentials with admin:auth:read return safe subject metadata.

OIDC admin auth routes are not model APIs. They are available only when server.admin_auth.oidc.enabled: true. Login redirects to the IdP, callback creates a server-side session after OIDC verification, /admin/auth/me returns safe subject metadata, and logout invalidates the session. A client that accumulates too many pending login starts receives 429 oidc-login-rate-limited.

/admin/reports/* is not a model API. It is disabled unless server.admin_reports.enabled: true, uses Basic Auth or OIDC sessions for browser-admin identity, and uses Casbin policy decisions for read/export access. Report data is scoped to the admin's Casbin domain unless an explicit * policy domain grants deployment-wide access. Ordinary caller tokens receive 403 reports-forbidden.

Conformance Test Matrix

Router API compatibility is protected by a deterministic conformance suite in addition to live provider smokes. Run the focused suite before changing request parsing, upstream encoding, tool routing, structured outputs, reasoning controls, streaming behavior, max-token handling, or provider-hosted tool policy:

go test ./internal/router -run 'TestAPIDialectConformanceMatrix|TestOpenAIChatConformance|TestResponsesConformance'

| Surface | What the conformance suite proves |
| --- | --- |
| OpenAI Chat Completions | Plain text, caller streaming normalization, max_tokens, max_completion_tokens, same-dialect tool passthrough, tool_choice, JSON-schema response_format, and reasoning_effort forwarding when the selected target supports reasoning. |
| OpenAI Responses | max_output_tokens, same-dialect function/namespace tool passthrough, JSON-schema text.format, generic hosted search/image descriptor stripping, and remote provider-hosted tool rejection before upstream. |
| Anthropic Messages | Message payload encoding, caller max_tokens, and thinking forwarding when the selected target supports Anthropic token-budget reasoning. |

This suite uses mock upstreams and does not prove a real provider/model is entitled, fast, accurate, or compatible with every workload. Activating an upstream still requires direct provider smokes and router-level smokes for the exact provider, model, dialect, tools, images, structured-output, reasoning, and max-token behavior being advertised.

Agent compatibility should also be validated with realistic synthetic request shapes. The production-derived smoke matrix exercises Codex Responses reasoning/tools, Cursor Chat tools and bridge shapes, Claude Code Messages thinking/tools, opencode/aider Chat flows, large tool schemas, provider-skin mismatch, no-eligible diagnostics, and upstream error classification. Run it against a dedicated smoke group, for example reasoning-bridge-smoke, with a caller token that is explicitly allowed to use that group:

python3 scripts/prod_smoke_regressions.py --mode prod --fixture all --model-group reasoning-bridge-smoke

The smoke emits safe scalar proof only: request IDs, API surface, status, selected provider/model/dialect, bridge direction when recorded, translated reasoning control when recorded, and request-shape buckets.

Compatibility Matrix

| Capability | Chat Completions | Responses | Messages |
| --- | --- | --- | --- |
| Text input/output | Supported | Supported | Supported |
| Streaming | Supported when the selected target supports the provider path | Supported when the selected target supports the provider path | Supported when the selected target supports the provider path |
| Tool calls | Requires tool_support.openai_chat | Requires tool_support.openai_responses | Requires tool_support.anthropic_messages |
| Structured outputs | response_format requires tool_support.openai_chat: [structured_outputs] | text.format requires tool_support.openai_responses: [structured_outputs] | No OpenAI structured-output equivalent |
| Reasoning/thinking | reasoning_effort requires target reasoning metadata | reasoning requires target reasoning metadata | thinking requires target reasoning metadata or validated target default thinking |
| Image input | Requires image in target input_modalities | Requires image in target input_modalities | Requires image in target input_modalities |
| Caller max-token caps | max_tokens and max_completion_tokens are enforced against configured target metadata | max_output_tokens is enforced against configured target metadata | max_tokens is enforced against configured target metadata |
| Cache eligibility | Eligible only for deterministic non-tool, non-image requests | Eligible only for deterministic non-tool, non-image requests | Eligible only for deterministic non-tool, non-image requests |
| Usage and cost rows | Recorded | Recorded | Recorded |

If a request includes tools, structured-output fields, images, or an explicit max-token cap, the router filters the model group's target list before policy selection. Targets that do not satisfy the request shape are skipped. If no compatible target remains, the router returns 502 no-eligible-target before sending an upstream request.
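The filtering step can be pictured with a small sketch. The target dictionaries, the shape dictionary, the max_output_tokens field, and both helper names are assumptions made for illustration; only tool_support, structured_outputs, input_modalities, and the 502 no-eligible-target result come from this page. Picking the first survivor stands in for the real policy selection.

```python
def filter_targets(targets, shape, dialect_key):
    """Keep only targets whose metadata satisfies the request shape."""
    kept = []
    for target in targets:
        support = target.get("tool_support", {}).get(dialect_key)
        if shape.get("tools") and support is None:
            continue  # no tool support declared for this dialect
        if shape.get("structured") and "structured_outputs" not in (support or []):
            continue
        if shape.get("image") and "image" not in target.get("input_modalities", []):
            continue
        cap = shape.get("output_cap")
        limit = target.get("max_output_tokens")  # hypothetical metadata field
        if cap is not None and limit is not None and cap > limit:
            continue
        kept.append(target)
    return kept


def select_target(targets, shape, dialect_key):
    """Return (status, target name or error code) before any upstream call."""
    kept = filter_targets(targets, shape, dialect_key)
    if not kept:
        return 502, "no-eligible-target"
    return 200, kept[0]["name"]  # placeholder for policy selection
```

Under these assumptions, an image request against a group that mixes text-only and vision targets keeps only the vision targets, and a tool request in a dialect that no target declares fails with 502 before anything is sent upstream.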

API Bridges

Deployments can expose a validated Chat Completions upstream to /v1/responses callers through an explicit stateless Responses-to-Chat bridge. This is useful for Codex or Responses-compatible clients when a model is only validated through OpenAI Chat Completions.

The bridge is never automatic. A target must keep dialect openai-chat and opt in with responses_to_chat metadata. The first supported slice is non-streaming text and basic function tools. The router maps Responses input and instructions to Chat messages, function tools to Chat tools, tool_choice only when validated, and max_output_tokens to the Chat output cap field configured for the target. If responses_to_chat.reasoning: true and the target's reasoning metadata is compatible, Responses reasoning.effort maps to Chat reasoning_effort; otherwise the target is skipped before upstream. The Chat response is returned to the caller as a Responses-shaped object.

Unsupported Responses features are rejected or skipped before upstream for Chat-bridged targets. Stateless bridge targets do not support previous_response_id; provider-hosted tools such as file search, code interpreter, computer use, MCP/SSE, hosted search, and image generation are not sent through the bridge. Images, reasoning, structured output, and streaming require separate bridge flags and validation before use.
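As a rough sketch of the first bridge slice, the function below maps a simple Responses request to a Chat Completions request. The function name, the None return standing in for "skipped before upstream", and the plain-text-only handling of input items are assumptions; the flat Responses function-tool shape and the nested Chat tool shape follow the public OpenAI request formats.

```python
def responses_to_chat(req, output_token_field="max_tokens"):
    """Translate a simple Responses request into a Chat Completions request.

    Returns None when the request uses a feature outside the first bridge
    slice (streaming, previous_response_id, non-function tools).
    """
    if req.get("stream") or req.get("previous_response_id"):
        return None

    messages = []
    if req.get("instructions"):
        messages.append({"role": "system", "content": req["instructions"]})
    inp = req.get("input", "")
    if isinstance(inp, str):
        messages.append({"role": "user", "content": inp})
    else:
        for item in inp:
            # Assumes plain-text message items only.
            messages.append({"role": item["role"], "content": item["content"]})

    chat = {"model": req["model"], "messages": messages}

    tools = []
    for tool in req.get("tools", []):
        if tool.get("type") != "function":
            return None  # provider-hosted tools never cross the bridge
        tools.append({
            "type": "function",
            "function": {
                "name": tool["name"],
                "description": tool.get("description", ""),
                "parameters": tool.get("parameters", {}),
            },
        })
    if tools:
        chat["tools"] = tools

    # The output cap lands in whichever field the target is configured to use.
    if req.get("max_output_tokens") is not None:
        chat[output_token_field] = req["max_output_tokens"]
    return chat
```

The sketch leaves out tool_choice, reasoning, and the reverse mapping of the Chat response into a Responses-shaped object.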

Usage and diagnostics show both sides: inbound_dialect = openai-responses, target_dialect = openai-chat, and request_translation_shapes.bridge_direction = responses_to_chat. Reasoning-preserving attempts also record a safe translated_reasoning_control value such as reasoning_effort.

For OpenAI Chat Completions requests, both max_tokens and max_completion_tokens are treated as explicit output caps. If a Chat request sends both fields, max_tokens takes precedence for router eligibility and normalized upstream forwarding.
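The precedence rule is small enough to state as code. The helper name chat_output_cap is hypothetical; the behavior shown is only what the paragraph above describes.

```python
def chat_output_cap(request):
    """Return the explicit output cap of a Chat request, or None if absent."""
    if request.get("max_tokens") is not None:
        return request["max_tokens"]  # wins when both fields are present
    return request.get("max_completion_tokens")
```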

The same model group can therefore expose different effective upstream pools to different API surfaces. A Chat client can use only active Chat-compatible targets, a Responses client can use only active Responses-compatible targets, and a Messages client can use only active Anthropic-compatible targets unless the deployment has configured and documented an explicit bridge. Provider catalog metadata for another skin is not enough by itself; the active target's resolved skin controls eligibility.

Chat To Responses Bridge

Deployments can opt a target into an OpenAI Chat Completions to OpenAI Responses bridge. This lets a caller keep using POST /v1/chat/completions while the router calls a selected openai-responses upstream target. The bridge is config-driven and target-specific; Chat requests never route to Responses targets unless bridges.chat_to_responses.enabled: true is present on the resolved target metadata.

The first supported bridge slice covers non-streaming text and basic function-tool requests. The router maps Chat messages into Responses input, system/developer messages into instructions, Chat function tools into Responses function tools, max_tokens or max_completion_tokens into max_output_tokens, and Responses text/function-call output back into Chat completion shape. If bridges.chat_to_responses.reasoning: true and the target's reasoning metadata is compatible, Chat reasoning_effort maps to Responses reasoning.effort; otherwise the target is skipped before upstream with a bounded filter reason. Usage rows keep inbound_dialect = openai-chat and target_dialect = openai-responses, and translation diagnostics record bridge_direction = chat_to_responses.
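The request half of this mapping might look like the sketch below. The function name, the None return for unsupported shapes, and joining several system/developer messages with blank lines are assumptions; field names follow the public Chat Completions and Responses request formats.

```python
def chat_to_responses(chat):
    """Translate a simple Chat Completions request into a Responses request.

    Returns None for streaming requests, which this first slice rejects.
    """
    if chat.get("stream"):
        return None

    instructions = []
    items = []
    for message in chat["messages"]:
        if message["role"] in ("system", "developer"):
            instructions.append(message["content"])
        else:
            items.append({"role": message["role"], "content": message["content"]})

    out = {"model": chat["model"], "input": items}
    if instructions:
        # Assumption: multiple system/developer messages are joined in order.
        out["instructions"] = "\n\n".join(instructions)

    tools = []
    for tool in chat.get("tools", []):
        fn = tool["function"]
        tools.append({
            "type": "function",
            "name": fn["name"],
            "description": fn.get("description", ""),
            "parameters": fn.get("parameters", {}),
        })
    if tools:
        out["tools"] = tools

    # Either Chat cap field becomes max_output_tokens.
    cap = chat.get("max_tokens")
    if cap is None:
        cap = chat.get("max_completion_tokens")
    if cap is not None:
        out["max_output_tokens"] = cap
    return out
```

The response direction (Responses text and function-call output back into a Chat completion) and reasoning_effort handling are left out of the sketch.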

By default the bridge is stateless: every Chat request is translated as a complete Responses request. Deployments can opt a target into in-memory stateful sessions with bridges.chat_to_responses.stateful_sessions.enabled: true. When the caller sends the configured session header, for example X-Router-Session: case-123, the router stores the successful upstream Responses id in memory and injects it as previous_response_id on the next request for the same caller, model group, target, and session header value. If the upstream returns a compatible stale-state 4xx for that injected ID, the router deletes the hashed mapping and retries that same request once without previous_response_id. Requests without the header remain stateless. The router does not expose or persist the raw session header value, and response caching is bypassed for stateful bridge requests.
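The session behavior can be sketched with an in-memory store. The class, the function names, the key layout, and the rule that any 4xx counts as stale state are assumptions for illustration; the upstream argument is a caller-supplied function that takes a request dict and returns (status, body).

```python
import hashlib


class SessionStore:
    """In-memory map from a hashed session key to the last upstream response id."""

    def __init__(self):
        self._ids = {}

    @staticmethod
    def key(caller, group, target, session_value):
        # Hash the tuple so the raw session header value is never stored.
        raw = "\x00".join([caller, group, target, session_value])
        return hashlib.sha256(raw.encode("utf-8")).hexdigest()

    def get(self, key):
        return self._ids.get(key)

    def put(self, key, response_id):
        self._ids[key] = response_id

    def delete(self, key):
        self._ids.pop(key, None)


def is_stale_state(status, body):
    # Assumption: any 4xx on an injected id counts as stale state here;
    # a real check would also inspect the upstream error body.
    return 400 <= status < 500


def send_with_session(store, upstream, request, caller, group, target, session_value):
    if session_value is None:
        return upstream(dict(request))  # no header: stay stateless
    key = store.key(caller, group, target, session_value)
    previous = store.get(key)
    attempt = dict(request)
    if previous is not None:
        attempt["previous_response_id"] = previous
    status, body = upstream(attempt)
    if previous is not None and is_stale_state(status, body):
        store.delete(key)
        status, body = upstream(dict(request))  # single retry without the id
    if status == 200:
        store.put(key, body["id"])
    return status, body
```

Because the store lives in process memory, two replicas would hold different mappings for the same session.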

Streaming bridge requests are rejected before upstream unless the target explicitly validates and enables bridge streaming. Images, structured outputs, reasoning controls, forced or parallel tool modes, and other advanced fields require matching bridge metadata and target capability metadata. Stateful sessions currently use the memory backend and are intended for single-process deployments; multi-replica deployments need sticky routing or a future shared backend before enabling this feature. The inverse Responses-to-Chat bridge has separate metadata and behavior.

Common safe filter reasons include chat-to-responses-bridge-disabled, chat-to-responses-streaming-unsupported, chat-to-responses-tools-unsupported, chat-to-responses-tool-choice-unsupported, chat-to-responses-structured-output-unsupported, and chat-to-responses-image-unsupported.

Reasoning And Bridge Compatibility

Caller endpointNative reasoning fieldSame-dialect targetBridge target
/v1/chat/completionsreasoning_effortRequires active OpenAI Chat target reasoning metadata.Chat-to-Responses reasoning requires bridges.chat_to_responses.reasoning: true plus Responses target reasoning metadata.
/v1/responsesreasoningRequires active OpenAI Responses target reasoning metadata.Responses-to-Chat reasoning is unsupported unless responses_to_chat.reasoning is explicitly validated for the target.
/v1/messagesthinkingRequires active Anthropic Messages target reasoning/default-thinking metadata.No general Messages bridge is implied by Chat or Responses bridge metadata.

Tools and reasoning are filtered together. A request with tools and reasoning needs a target that supports both the caller's tool dialect and the requested reasoning control, or an explicitly validated bridge for both features. If no candidate remains, the router returns 502 no-eligible-target; candidate/filter diagnostics should show bounded reasons such as reasoning, chat-to-responses-reasoning-unsupported, or responses-to-chat-reasoning rather than an upstream attempt with the reasoning field stripped.

Bridge requests remain stateless unless a Chat-to-Responses target enables stateful sessions. A stateless bridge does not synthesize previous_response_id continuity for reasoning workflows. OpenAI Responses callers that send previous_response_id to a Chat-bridged target should expect a bridge filter reason unless that exact stateful behavior is documented for the target.

Quotas And Output Caps

Before an upstream call, the router reserves the estimated input tokens plus the requested output budget for token-based admission. Chat Completions requests use max_tokens or max_completion_tokens, Responses requests use max_output_tokens, and Messages requests use max_tokens. Messages requests without a caller cap reserve the router default output cap when the router injects one.

TPM, daily token, monthly token, and lifetime key budgets include in-flight reservations. This prevents several concurrent large-cap requests from collectively exceeding a caller's budget. When a request completes, the reservation is reconciled to the actual usage reported by the upstream. Failed or canceled upstream requests release the reservation, and cache hits do not consume persisted token quota.
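A minimal sketch of reservation-based admission, assuming a single budget counter. The TokenBudget class and its method names are hypothetical; the reserve, reconcile, and release steps mirror the lifecycle described above.

```python
class TokenBudget:
    """One token budget with in-flight reservations counted against the limit."""

    def __init__(self, limit):
        self.limit = limit
        self.used = 0      # tokens from completed requests
        self.reserved = 0  # tokens held by in-flight requests

    def reserve(self, estimated_input, output_cap):
        """Hold input estimate plus output cap; return the amount or None if denied."""
        amount = estimated_input + output_cap
        if self.used + self.reserved + amount > self.limit:
            return None
        self.reserved += amount
        return amount

    def reconcile(self, amount, actual_usage):
        """On success, swap the reservation for the usage the upstream reported."""
        self.reserved -= amount
        self.used += actual_usage

    def release(self, amount):
        """On failure or cancellation, drop the reservation without charging."""
        self.reserved -= amount
```

With a limit of 1000, two concurrent requests that each reserve 100 input tokens plus a 500-token cap cannot both be admitted, even if each would have used far less in practice.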

Optional caller traffic shaping can also smooth short bursts before upstream calls. It is separate from hard TPM/RPM/quota admission and can return 429 traffic-shaped with Retry-After and a safe bucket label when request-start or token-reservation throughput is exceeded.

Use realistic output caps in examples and clients. A small prompt with a very large output cap can be rejected near a token budget because the caller asked the router to reserve that much possible output.

Model Names

The model field is a router model group, not necessarily a provider model ID. Model group names are deployment-defined. Names shown in examples are examples only.

If a request to a compatible API omits model, the router uses server.default_model_group when configured. If no default is configured, the router returns 400 missing-model.

Discover Allowed Model Groups

Call /v1/models with the same router token that the client will use for completions. The response is filtered to that token's allow list, so it shows the deployment-defined model groups the caller can request.

curl "$ROUTER_BASE_URL/v1/models" \
  -H "Authorization: Bearer $ROUTER_TOKEN"

Example response:

{
  "object": "list",
  "data": [
    {
      "id": "default",
      "object": "model",
      "owned_by": "smart-llmrouter"
    },
    {
      "id": "vision",
      "object": "model",
      "owned_by": "smart-llmrouter"
    }
  ]
}

Use one of the returned id values as the model field in /v1/chat/completions, /v1/responses, or /v1/messages. If a group is not listed, that token is not allowed to use it. Requests for unlisted groups fail with 403 model-not-allowed before any upstream provider is called.

The returned IDs are router model groups, not a full inventory of every upstream provider model. Platform teams can change the upstream provider/model mix behind a group without changing the caller-facing group name.
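The model-resolution rules in this section can be sketched as two helpers: one reads group IDs from a /v1/models response body, the other applies the default and the allow list. Both function names are hypothetical, and the order of the two checks is an assumption.

```python
import json


def allowed_groups(models_response_text):
    """Extract the model group IDs from a /v1/models response body."""
    body = json.loads(models_response_text)
    return [entry["id"] for entry in body.get("data", [])]


def resolve_model_group(request, allowed, default_group=None):
    """Return (status, group or error code) for a caller request."""
    group = request.get("model") or default_group
    if not group:
        return 400, "missing-model"
    if group not in allowed:
        return 403, "model-not-allowed"
    return 200, group
```

A client can run the same check locally against the discovered list to fail fast before sending a request that the router would reject.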

For caller-facing troubleshooting and administrator handoff guidance, see Available Models And Access.

Tool Calls

Tool requests only route to upstream targets that explicitly advertise support for the caller's API dialect and tool mode.

| Caller shape | Required target metadata |
| --- | --- |
| OpenAI Chat tools | tool_support.openai_chat |
| OpenAI Responses function tools | tool_support.openai_responses |
| Anthropic Messages client tools | tool_support.anthropic_messages |

Provider skins are part of this contract. A target configured through an OpenAI Chat provider is not a Responses target unless the resolved provider or target dialect is openai-responses. For deployments that validate the same upstream model through multiple APIs, use separate provider skins such as minimax, minimax_responses, and minimax_anthropic rather than relying on the upstream model name alone.

Tool-bearing requests bypass response caching because tool results depend on external shell, filesystem, browser, or client tool state.

OpenAI Responses provider-hosted tools such as Fireworks-documented mcp and sse tools are not the same as client-executed function or namespace tools. By default, the router rejects caller-supplied remote provider-hosted entries such as mcp, sse, file-search, code-interpreter, and computer-use tools with 400 provider-hosted-tools-forbidden before any upstream call. Generic hosted search or image-generation descriptors from compatible clients are stripped unless the deployment explicitly exposes those hosted services. Deployments should expose provider-hosted tools only after a separate security design covers allowlisted hosts, timeouts, network egress, and data-retention expectations.
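A sketch of the default policy for provider-hosted tool entries. The exact type strings in both sets, the function name, and the expose_generic_hosted switch are assumptions; the page names the tool families but not their wire-level identifiers.

```python
# Assumed type strings for remote provider-hosted tools (rejected by default).
REMOTE_HOSTED_TYPES = {"mcp", "sse", "file_search", "code_interpreter", "computer_use"}
# Assumed type strings for generic hosted descriptors (stripped by default).
GENERIC_HOSTED_TYPES = {"web_search", "image_generation"}


def apply_hosted_tool_policy(tools, expose_generic_hosted=False):
    """Return (status, error_code, tools_to_forward) for a caller tool list."""
    kept = []
    for tool in tools:
        kind = tool.get("type")
        if kind in REMOTE_HOSTED_TYPES:
            # Rejected outright; nothing is sent upstream.
            return 400, "provider-hosted-tools-forbidden", []
        if kind in GENERIC_HOSTED_TYPES and not expose_generic_hosted:
            continue  # silently stripped
        kept.append(tool)
    return 200, None, kept
```

Client-executed function tools pass through unchanged, which is the distinction this paragraph draws.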

The router controls upstream persistence policy for OpenAI-compatible requests. Same-dialect Chat Completions and Responses passthrough strip caller-supplied provider metadata; they send store:false upstream only when the resolved target sets force_store_false: true. Translated Responses calls use the same flag. Chat passthrough also honors target encoding metadata such as output_token_field: max_completion_tokens for upstreams that require max_completion_tokens instead of max_tokens.
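The passthrough encoding might be sketched as below. Treating metadata and store as the caller-supplied fields to strip is an assumption, as is the function name; force_store_false and output_token_field are the target settings named above.

```python
def encode_chat_upstream(request, target):
    """Build the upstream Chat body from a caller request and target metadata."""
    stripped = ("metadata", "store")  # assumed set of caller-supplied fields
    out = {k: v for k, v in request.items() if k not in stripped}

    # store:false goes upstream only when the target asks for it.
    if target.get("force_store_false"):
        out["store"] = False

    # Normalize the output cap into the field the upstream expects.
    cap = out.pop("max_tokens", None)
    alt = out.pop("max_completion_tokens", None)
    if cap is None:
        cap = alt
    if cap is not None:
        out[target.get("output_token_field", "max_tokens")] = cap
    return out
```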

Large OpenAI Chat coding-agent payload compatibility is a separate claim from ordinary text or tool support. Before a provider/model/dialect joins broad IDE or agent routes, validate representative request bytes, message count, serialized tool-schema size, explicit output cap, token scale, and router translation path with a synthetic fixture. If the target is not validated for that shape, keep it in a smoke group or configure request_shape_support limits so incompatible large requests skip it before upstream.

Reasoning And Thinking

The router detects explicit reasoning requests in all supported caller dialects:

  • OpenAI Chat Completions: reasoning_effort.
  • OpenAI Responses: reasoning.
  • Anthropic Messages: thinking.

Reasoning is handled inside the requested model group. The router does not switch callers to another group and does not silently drop explicit reasoning controls. If no configured target in that group can satisfy the requested reasoning shape together with tools, images, structured outputs, and max-token cap behavior, the response is 502 no-eligible-target.

For compatible targets, the router translates safe controls where configured. For example, an Anthropic budget can map to an OpenAI effort level, and an OpenAI effort can map to an Anthropic token budget. Targets that reject max_tokens for reasoning traffic can be configured so the router sends max_completion_tokens instead.
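To make the two-way translation concrete, here is a sketch with invented thresholds. The numeric budgets and both function names are hypothetical; a real deployment would take these values from target configuration.

```python
# Hypothetical mapping; actual values are deployment configuration.
EFFORT_TO_BUDGET = {"low": 1024, "medium": 4096, "high": 16384}


def effort_to_budget(effort):
    """Map an OpenAI-style effort level to an Anthropic-style token budget."""
    return EFFORT_TO_BUDGET.get(effort)  # None means "no safe translation"


def budget_to_effort(budget_tokens):
    """Map an Anthropic-style token budget to an OpenAI-style effort level."""
    if budget_tokens <= EFFORT_TO_BUDGET["low"]:
        return "low"
    if budget_tokens <= EFFORT_TO_BUDGET["medium"]:
        return "medium"
    return "high"
```

Returning None for an unknown effort level matches the rule that explicit reasoning controls are not silently dropped: the caller of this helper would skip the target rather than forward the request without the control.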

For /v1/models, reasoning metadata is effective group metadata, not a catalog dump. The router should expose supported_reasoning_levels only when at least one active target in the requested group can actually serve that reasoning shape for the caller's API surface. Some upstreams also enforce minimum output budgets for reasoning requests; a tiny cap failure does not invalidate the reasoning capability, but it must be documented and tested with realistic budgets.

For OpenAI Chat, OpenAI Responses, and Anthropic Messages reasoning examples, see Reasoning Routing.

Structured Outputs

Structured-output requests are routing contracts, not router-side schema execution. The router detects OpenAI Chat response_format and OpenAI Responses text.format, selects only targets with explicit dialect-matching structured_outputs metadata, and forwards the schema payload to the selected upstream. It does not validate arbitrary JSON Schema subsets or repair provider output unless a separate implementation adds that behavior. Unsupported schemas, strictness settings, or provider-specific JSON Schema subsets may produce upstream/provider errors.

Structured-output support is dialect-specific. Passing Chat Completions response_format does not prove Responses text.format, and Anthropic Messages has no OpenAI structured-output equivalent unless a deployment adds and documents an explicit compatible behavior.

Chat Completions JSON Schema example:

curl "$ROUTER_BASE_URL/v1/chat/completions" \
  -H "Authorization: Bearer $ROUTER_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "example-structured",
    "messages": [{"role": "user", "content": "Extract the ticket id and priority from: INC-1234 high"}],
    "response_format": {
      "type": "json_schema",
      "json_schema": {
        "name": "ticket_extract",
        "strict": true,
        "schema": {
          "type": "object",
          "properties": {
            "ticket_id": {"type": "string"},
            "priority": {"type": "string", "enum": ["low", "medium", "high"]}
          },
          "required": ["ticket_id", "priority"],
          "additionalProperties": false
        }
      }
    }
  }'

Responses JSON Schema example:

curl "$ROUTER_BASE_URL/v1/responses" \
  -H "Authorization: Bearer $ROUTER_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "example-structured",
    "input": "Extract the ticket id and priority from: INC-1234 high",
    "text": {
      "format": {
        "type": "json_schema",
        "name": "ticket_extract",
        "strict": true,
        "schema": {
          "type": "object",
          "properties": {
            "ticket_id": {"type": "string"},
            "priority": {"type": "string", "enum": ["low", "medium", "high"]}
          },
          "required": ["ticket_id", "priority"],
          "additionalProperties": false
        }
      }
    }
  }'

Use a deployment-defined model group returned by /v1/models; example-structured is only a placeholder group name.

Image Inputs

Image-bearing requests are accepted through Chat Completions, Responses, and Messages shapes. The router selects only targets with image in input_modalities.
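Detecting an image-bearing request could look like the sketch below. The has_image helper is hypothetical and only inspects top-level content parts; the part type names (image_url for Chat, input_image for Responses, image for Messages) follow the public request formats of those APIs.

```python
IMAGE_PART_TYPES = {"image_url", "input_image", "image"}


def has_image(request):
    """Return True if any message or input item carries an image content part."""
    items = request.get("messages") or request.get("input") or []
    if isinstance(items, str):
        return False  # a plain string input is text-only
    for item in items:
        content = item.get("content") if isinstance(item, dict) else None
        if not isinstance(content, list):
            continue  # string content or items without content
        for part in content:
            if isinstance(part, dict) and part.get("type") in IMAGE_PART_TYPES:
                return True
    return False
```

A result of True is what would make image in input_modalities a requirement for the request.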

Text-only and image-bearing requests do not need separate user workflows. A deployment can put text-capable and vision-capable upstreams behind the same model group, as long as each request is routed only to targets that satisfy its actual requirements.

Router-Only Endpoints

Router-only endpoints are not part of OpenAI or Anthropic compatibility:

  • /readyz and /healthz report service health and runtime build metadata for operational checks.
  • /version returns the running binary version, build timestamp, Go runtime version, OS, architecture, and internal build identifiers for administrators.
  • /v1/usage returns usage/quota information for the authenticated caller.
  • /metrics returns Prometheus telemetry only to caller tokens authorized for metrics read.

Use SDKs for the compatible provider-style APIs they support. Router-only endpoints such as /readyz, /version, and /v1/usage are best called with ordinary HTTP clients.
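For example, a router-only endpoint can be called with Python's standard library alone. This sketch only builds the request; it assumes /v1/usage accepts the same Bearer token header as /v1/models, and the helper name and base URL are placeholders.

```python
import urllib.request


def build_usage_request(base_url, token):
    """Build a GET request for /v1/usage without sending it."""
    return urllib.request.Request(
        base_url.rstrip("/") + "/v1/usage",
        headers={"Authorization": "Bearer " + token},  # assumed auth scheme
        method="GET",
    )
```

Passing the result to urllib.request.urlopen sends the call. The response format is not described on this page, so the sketch leaves parsing to the caller.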