Model Group Contracts

Model group contracts let an operator publish stable, deployment-defined model groups while changing the provider/model mix behind them. A contract describes the workloads, API shapes, minimum capabilities, validation quality floor, and operational targets that a group is intended to satisfy.

Callers still request a model group they are allowed to use. /v1/models remains the caller-facing source of truth for allowed groups. Contract metadata does not let a caller route into another group.

Example

models:
  support-chat:
    strategy: weighted
    contract:
      display_name: Support chat
      caller_visible_notes: Low-latency support-answering group.
      intended_workloads: [support_chat]
      supported_api_shapes: [openai_chat]
      required_capabilities:
        input_modalities: [text]
        output_modalities: [text]
        honors_max_tokens_when_caller_capped: true
      quality_floor:
        require_tags: [validated]
        min_eval_quality_score: 0.90
        min_eval_pass_rate: 0.95
        max_eval_age_days: 30
        allowed_validation_status: [passed]
      operational_targets:
        max_p95_latency_ms: 10000
        max_error_rate: 0.03
        max_timeout_rate: 0.02
      reporting:
        expose_workload_labels: true
        expose_quality_floor_bucket: true
    targets:
      - provider: private-gpu
        model_ref: support-balanced
        weight: 70
        tags: [validated, low_latency]
        validation:
          status: passed
          workload: support_chat
          validated_at: "2026-06-25"
          quality_score: 0.94
          pass_rate: 0.98
          harness: golden-support-set
      - provider: hosted
        model_ref: support-fallback
        weight: 30
        tags: [validated, fallback]
        validation:
          status: passed
          workload: support_chat
          validated_at: "2026-06-25"
          quality_score: 0.92
          pass_rate: 0.96
          harness: golden-support-set

Names such as support-chat, private-gpu, and hosted are examples. Use names that match your deployment policy.

Enforcement Order

The router enforces contracts after caller authorization and ordinary request eligibility:

1. Authenticate the caller.
2. Confirm the requested model group is allowed for that caller.
3. Filter only that group's targets for API dialect, tools, structured outputs, input modalities, and explicit output-cap safety.
4. Apply the optional group contract and target validation quality floor.
5. Run the group's configured strategy (static, weighted, failover, dynamic_score, script, or external) on the remaining targets.

Contract enforcement is group-local. A cheaper, faster, or more capable target in another group is never selected unless the caller requested that other allowed group.
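The ordering above can be sketched as a small pipeline. This is an illustrative model, not the router's implementation: the Target record, the passes_contract flag (a stand-in for the whole contract check), the route function, and the 401 and 403 statuses are all assumptions made for the example. Only 502 no-eligible-target comes from this page.

```python
from dataclasses import dataclass, field


@dataclass
class Target:
    # Hypothetical, simplified target record used only for this sketch.
    provider: str
    model_ref: str
    api_shapes: set = field(default_factory=set)
    passes_contract: bool = True  # stand-in for the full contract check


class RoutingError(Exception):
    """Carries an HTTP-like status and a short reason string."""

    def __init__(self, status, reason):
        super().__init__(f"{status} {reason}")
        self.status = status
        self.reason = reason


def route(caller_groups, groups, requested_group, api_shape, strategy):
    # 1. Authenticate the caller (modeled as "the caller's groups are known").
    if caller_groups is None:
        raise RoutingError(401, "unauthenticated")  # assumed status and reason
    # 2. Confirm the requested group is allowed for this caller.
    if requested_group not in caller_groups:
        raise RoutingError(403, "group-not-allowed")  # assumed status and reason
    # 3. Ordinary request eligibility, looking only at the requested group.
    candidates = [t for t in groups[requested_group] if api_shape in t.api_shapes]
    # 4. Contract and quality floor, still restricted to the same group.
    candidates = [t for t in candidates if t.passes_contract]
    if not candidates:
        raise RoutingError(502, "no-eligible-target")
    # 5. The configured strategy only ever sees the surviving targets.
    return strategy(candidates)
```

Because steps 3 and 4 read only groups[requested_group], a target in another group cannot be selected, whatever the strategy does.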

Contract Fields

supported_api_shapes accepts openai_chat, openai_responses, and anthropic_messages.

required_capabilities can require tools, forced tool choice, structured outputs, input/output modalities, minimum context tokens, and safe handling of explicit caller output caps.

quality_floor can require target tags, target validation status, minimum quality score, minimum pass rate, and a maximum validation age.
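As a rough illustration of how a quality floor might be evaluated against one target, the sketch below uses the field names from the example configuration. The function name quality_floor_failure is hypothetical, and the mapping from each failed check to a bucket name is an assumption; the router may assign buckets differently.

```python
from datetime import date


def quality_floor_failure(floor, target, today):
    """Return an assumed failure bucket name, or None if the target passes."""
    validation = target.get("validation")
    allowed = floor.get("allowed_validation_status")
    # This sketch assumes a quality floor always requires validation metadata.
    if validation is None or (allowed and validation.get("status") not in allowed):
        return "contract-no-validated-target"

    # Every tag in require_tags must be present on the target.
    missing_tags = set(floor.get("require_tags", [])) - set(target.get("tags", []))
    if missing_tags:
        return "contract-quality-floor"

    # Validation age is measured in whole days from validated_at to today.
    max_age = floor.get("max_eval_age_days")
    if max_age is not None:
        validated_at = date.fromisoformat(validation["validated_at"])
        if (today - validated_at).days > max_age:
            return "contract-validation-expired"

    # Missing scores are treated as 0.0, so they fail any positive minimum.
    if validation.get("quality_score", 0.0) < floor.get("min_eval_quality_score", 0.0):
        return "contract-quality-floor"
    if validation.get("pass_rate", 0.0) < floor.get("min_eval_pass_rate", 0.0):
        return "contract-quality-floor"
    return None
```

With the example values, a target validated on 2026-06-25 passes a 30-day age limit on 2026-07-10 and would be treated as expired on 2026-08-01 (37 days later).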

operational_targets can enforce limits on observed error rate, timeout rate, p95 latency, and output-token throughput. These limits apply when the router has observations for a target.
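A minimal sketch of the observation-dependent check follows. The limit keys come from the example configuration. The observed-metric key names, the exceeded_limits helper, and the choice to skip targets with no observations are assumptions for illustration. Output-token throughput is left out because its configuration key is not shown on this page.

```python
# Maps each configured limit to a hypothetical observed-metric key.
LIMITS = {
    "max_p95_latency_ms": "p95_latency_ms",
    "max_error_rate": "error_rate",
    "max_timeout_rate": "timeout_rate",
}


def exceeded_limits(operational_targets, observed):
    """Return the configured limit keys that the observations exceed."""
    # Without observations there is nothing to compare, so nothing is enforced.
    if not observed:
        return []
    exceeded = []
    for limit_key, observed_key in LIMITS.items():
        limit = operational_targets.get(limit_key)
        value = observed.get(observed_key)
        # Only compare when both the limit and the observation are present.
        if limit is not None and value is not None and value > limit:
            exceeded.append(limit_key)
    return exceeded
```

A target whose returned list is non-empty would be dropped from the eligible set before the strategy runs.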

Target validation metadata is safe scalar metadata about how a target was tested. Keep validation notes free of prompts, images, tool outputs, bearer tokens, provider keys, token hashes, private headers, and full config snippets.

Strategy Interaction

Contracts are not a separate strategy. They filter the target list before the configured strategy runs.

dynamic_score can use target tags and target validation quality/pass-rate metadata as scoring hints after hard contract floors have passed.

TypeScript and external policy strategies receive safe contract and validation metadata plus the contract-filtered eligible target list. Returned decisions are validated against that list and fail closed if they choose an ineligible target.
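The fail-closed check on a returned decision can be pictured as follows. This is a sketch in Python rather than the router's own code, and the decision shape (a mapping with provider and model_ref), the IneligibleDecision exception, and the validate_decision helper are hypothetical.

```python
class IneligibleDecision(Exception):
    """Raised when a policy picks a target outside the eligible list."""


def validate_decision(decision, eligible):
    """Return the chosen eligible target, or raise instead of falling back."""
    if not isinstance(decision, dict):
        raise IneligibleDecision("decision is not a mapping")
    key = (decision.get("provider"), decision.get("model_ref"))
    for target in eligible:
        if (target["provider"], target["model_ref"]) == key:
            return target
    # Fail closed: do not substitute another target for a bad decision.
    raise IneligibleDecision(f"target {key!r} is not in the eligible list")
```

Raising here, rather than silently picking a different target, means a faulty policy cannot route around the contract.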

Errors And Reports

If no target satisfies the request and contract, callers receive 502 no-eligible-target. The error details include safe requirement buckets such as:

contract-required-api-shape
contract-required-modality
contract-required-tools
contract-quality-floor
contract-validation-expired
contract-no-validated-target

Usage and admin reports expose safe scalar buckets: contract present, pass/fail bucket, failure reason, optional workload label, validation status, validation workload, and validation age bucket. They do not include raw prompts, raw images, tool outputs, bearer tokens, provider keys, token hashes, private upstream headers, or full config values.
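One way to think about the safe scalar buckets is as an allowlist applied before anything is written to a report. The sketch below is illustrative only: the field names are hypothetical labels for the buckets listed above, not the router's actual report schema.

```python
# Hypothetical names for the report buckets described above.
SAFE_REPORT_FIELDS = (
    "contract_present",
    "contract_result",
    "failure_reason",
    "workload_label",
    "validation_status",
    "validation_workload",
    "validation_age_bucket",
)


def safe_report_row(record):
    """Copy only allowlisted scalar fields; drop everything else."""
    row = {}
    for name in SAFE_REPORT_FIELDS:
        value = record.get(name)
        # Nested structures and missing values are never copied.
        if isinstance(value, (str, int, float, bool)):
            row[name] = value
    return row
```

An allowlist drops unknown fields by default, so a prompt or header attached to the record later does not reach a report unless someone adds it to the list.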

Rollout

Start with a deployment-defined test group. Add validation metadata to each intended target. Then run representative requests that match the declared contract (text, tool-agent, VLM/OCR, structured-output, and low-token-cap), and confirm that the selected targets stay inside the requested group.

Rollback is a config change: remove or relax the contract block, loosen a specific quality floor or operational threshold, refresh stale validation metadata after retesting, or restore the previous strategy and weights.
