Customer-Controlled Routing

GenAI Smart Router is not a black-box global model chooser. A caller requests one allowed, deployment-defined model group, and the deployment owner controls which providers, private upstreams, capabilities, policies, validation gates, and fallback paths exist inside that group.

Use this page as the routing-policy ownership map. Start with the Routing Strategy Decision Tree when choosing between static, failover, weighted, dynamic score, TypeScript, external, or contract-backed routing. The detailed reference pages remain the source of truth for router config, model metadata, dynamic score routing, TypeScript routing, external policy services, model-group contracts, PII filtering, usage reporting, and deployment readiness.

Model group names are deployment-defined. Names such as regulated-review, agent-coding, adaptive-support, and team-local-coding appear on this page only as illustrations. Callers should discover their allowed groups from /v1/models.

Routing Contract

Every request follows the same ownership boundary:

  1. The caller authenticates with a router token.
  2. Token, user, project, and environment policy determine which model groups the caller may request.
  3. The caller requests one model group.
  4. The router loads only that group's configured targets.
  5. Request-shape filters remove ineligible targets for API shape, tools, modalities, structured outputs, reasoning or thinking controls, explicit output-cap safety, and any model-group contract.
  6. The group's configured strategy chooses from the remaining targets.
  7. Usage and reporting record request-time provider, model, cost, latency, throughput, attempt, fallback, and policy evidence without exposing raw provider keys or router tokens.

The router does not route across unauthorized groups and does not activate provider catalog entries unless they appear under models.<group>.targets[].
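
The ownership boundary above can be sketched as a small eligibility function. This is a minimal illustration, not the router's implementation: the type names, the `supportsTools` flag, and the error messages are hypothetical, and only one request-shape filter (tools) is modeled.

```typescript
// Hypothetical shapes for illustration; the router's real types are not shown on this page.
interface GroupTarget {
  provider: string;
  modelRef: string;
  supportsTools: boolean;
}

interface CallerPolicy {
  allow: string[]; // model groups this caller may request
}

type GroupConfig = Record<string, GroupTarget[]>;

// Returns the targets a strategy may choose from, or throws when the
// request falls outside the ownership boundary.
function eligibleTargets(
  caller: CallerPolicy,
  groups: GroupConfig,
  requestedGroup: string,
  needsTools: boolean,
): GroupTarget[] {
  // Steps 2-3: the caller may request only an allowed group.
  if (!caller.allow.includes(requestedGroup)) {
    throw new Error(`forbidden group: ${requestedGroup}`);
  }
  // Step 4: load only that group's configured targets.
  const configured = groups[requestedGroup] ?? [];
  // Step 5: request-shape filtering (only a tools check is modeled here).
  const remaining = configured.filter((t) => !needsTools || t.supportsTools);
  if (remaining.length === 0) {
    throw new Error(`no eligible target in group: ${requestedGroup}`);
  }
  // Step 6 would hand `remaining` to the group's configured strategy.
  return remaining;
}

const demoGroups: GroupConfig = {
  "agent-coding": [
    { provider: "hosted_openai_compatible", modelRef: "coding-balanced", supportsTools: false },
    { provider: "hosted_responses_skin", modelRef: "tool-capable-model", supportsTools: true },
  ],
};
const demoCaller: CallerPolicy = { allow: ["agent-coding"] };
```

In this sketch a request for a group outside `demoCaller.allow` is rejected before any target is loaded, which mirrors the rule that the router never routes across unauthorized groups.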

Strategy Selection

| Need | Strategy / mechanism | Good fit | Caution |
| --- | --- | --- | --- |
| Always use one target | static | Regulated workload, fixed-model evaluation, smoke group | No provider diversity |
| Ordered fallback | failover | Reliability with a preferred provider first | Watch total timeout and retry budget |
| Weighted mix | weighted | Gradual rollout, cost mix, provider diversity | Needs monitoring and rollback criteria |
| Fast configurable scoring | dynamic_score | Cost, latency, reliability, request-shape, and evaluation scoring with scalar terms | Keep terms lightweight and group-local |
| Custom logic in config | TypeScript script | Deployment-specific routing using safe context | Scripts are trusted deployment code |
| Separate policy service | external | Enterprise policy engine, sidecar, or ML scorer | Secure network, auth, timeout, and fail-closed behavior are required |
| Hard capability promise | Model-group contract | Quality and capability floor per group | Stale validation or strict floors can remove all targets |

These mechanisms can be combined carefully. For example, a group can use dynamic_score after a model-group contract filters stale validation, or an external policy service can receive only contract-filtered eligible targets.

Customer-Owned Controls

| Control | Where it lives | Ownership decision |
| --- | --- | --- |
| Group membership and weights | models.<group>.targets[] | Which validated provider/model targets can serve the group and at what relative weight |
| Provider keys and private upstreams | providers, deployment secrets, private base URLs | Which external or enterprise-hosted services are reachable |
| Caller/user/project access | callers[].allow, users, projects, memberships, admin policy | Which teams can see and request each group through /v1/models |
| Quotas and rate limits | Caller and token budget config | How much traffic a key, owner, project, or environment may send |
| Capabilities and validation metadata | Provider catalog, target overrides, validation, contract | Which request shapes and quality gates a target may satisfy |
| PII filtering | models.<group>.pii_filter | Whether text is redacted, restored, or blocked before routing policy and upstream calls |
| Cache behavior | Server cache config and request cache controls | Which responses may be reused and which requests bypass cache |
| Retention policy | Usage DB, logs, rollups, retention settings | How long safe scalar telemetry and reports are kept |
| Reports and evaluation labels | Usage reporting, decision telemetry, validation harness labels | Which evidence supports promotion, rollback, and chargeback |
| License and feature boundaries | Signed license plus runtime config | Which product capabilities the deployment may enable |

Examples

The snippets below use real config field names but are intentionally abbreviated. Add the corresponding providers, caller access, pricing, metadata, validation details, and secrets in the full deployment config.

Fixed-Control Group

Use static when a team wants one upstream target for evaluation, regulated review, or a known-good rollback path.

models:
  regulated-review:
    strategy: static
    targets:
      - provider: enterprise_openai_compatible
        model_ref: reviewed-model

Failover Group

Use failover when target order is the policy. The first eligible target is tried first; later targets are retry fallbacks for retryable upstream failures.

models:
  support-primary:
    strategy: failover
    targets:
      - provider: private_vllm
        model_ref: support-model
      - provider: hosted_openai_compatible
        model_ref: support-fallback
      - provider: backup_openai_compatible
        model_ref: support-backup
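
The ordering rule can be illustrated with a short sketch. It assumes a hypothetical `callUpstream` callback and a `retryable` flag standing in for whatever the router classifies as a retryable upstream failure; stopping on a non-retryable failure is an assumption drawn from the description above, not confirmed router behavior.

```typescript
interface FailoverTarget {
  provider: string;
  modelRef: string;
}

// Hypothetical upstream result used only for this illustration.
type AttemptResult =
  | { ok: true; body: string }
  | { ok: false; retryable: boolean };

// Tries targets in configured order and records every attempt.
function runFailover(
  orderedTargets: FailoverTarget[],
  callUpstream: (t: FailoverTarget) => AttemptResult,
): { servedBy: string | null; attempts: string[] } {
  const attempts: string[] = [];
  for (const t of orderedTargets) {
    attempts.push(t.provider);
    const outcome = callUpstream(t);
    if (outcome.ok) return { servedBy: t.provider, attempts };
    // Assumption: only retryable failures fall through to the next target.
    if (!outcome.retryable) break;
  }
  return { servedBy: null, attempts };
}

const supportPrimary: FailoverTarget[] = [
  { provider: "private_vllm", modelRef: "support-model" },
  { provider: "hosted_openai_compatible", modelRef: "support-fallback" },
  { provider: "backup_openai_compatible", modelRef: "support-backup" },
];

// Simulated upstream: the first target fails with a retryable error.
const demoRun = runFailover(supportPrimary, (t) =>
  t.provider === "private_vllm"
    ? { ok: false, retryable: true }
    : { ok: true, body: "ok" },
);
```

Because each extra attempt adds latency, the total timeout and retry budget for the group should cover the worst case in which every listed target is tried.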

Weighted Coding-Agent Group

Use weighted when every active target has passed the workload gate and the deployment wants a controlled provider mix. Tool-only target overrides let ordinary text and agent tool traffic use different eligible targets inside the same group.

models:
  agent-coding:
    strategy: weighted
    targets:
      - provider: hosted_openai_compatible
        model_ref: coding-balanced
        weight: 65
      - provider: private_gpu
        model_ref: coding-private
        weight: 25
      - provider: hosted_responses_skin
        model_ref: tool-capable-model
        weight: 10
        dialect: openai-responses
        tool_only: true
      - provider: hosted_anthropic_skin
        model_ref: tool-capable-model
        weight: 10
        tool_only: true
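
A weighted pick over this group might look like the sketch below. Two things here are assumptions rather than documented behavior: that tool_only targets are skipped when a request carries no tools, and that weights are treated as relative shares of the remaining candidates. The function names are hypothetical.

```typescript
interface WeightedTarget {
  provider: string;
  modelRef: string;
  weight: number;
  toolOnly?: boolean;
}

const agentCoding: WeightedTarget[] = [
  { provider: "hosted_openai_compatible", modelRef: "coding-balanced", weight: 65 },
  { provider: "private_gpu", modelRef: "coding-private", weight: 25 },
  { provider: "hosted_responses_skin", modelRef: "tool-capable-model", weight: 10, toolOnly: true },
  { provider: "hosted_anthropic_skin", modelRef: "tool-capable-model", weight: 10, toolOnly: true },
];

// Assumption: tool_only targets are skipped when the request carries no tools.
function candidatesFor(targets: WeightedTarget[], hasTools: boolean): WeightedTarget[] {
  return targets.filter((t) => hasTools || !t.toolOnly);
}

// `roll` is a number in [0, 1), for example Math.random().
function pickWeighted(candidates: WeightedTarget[], roll: number): WeightedTarget {
  const total = candidates.reduce((sum, t) => sum + t.weight, 0);
  if (total <= 0) throw new Error("no weighted target available");
  // Walk the cumulative weights until the roll lands inside a target's share.
  let remaining = roll * total;
  for (const t of candidates) {
    remaining -= t.weight;
    if (remaining < 0) return t;
  }
  // Guards against floating-point edge cases when roll is very close to 1.
  return candidates[candidates.length - 1];
}

const textCandidates = candidatesFor(agentCoding, false);
const textPick = pickWeighted(textCandidates, Math.random());
```

Under these assumptions, text-only traffic sees only the 65 and 25 weights, which normalize to roughly 72% and 28% of requests.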

Dynamic-Score Group

Use dynamic_score when a group should adapt within its own eligible target list using configured scalar signals. Keep scoring terms understandable and tied to the group contract.

models:
  adaptive-support:
    strategy: dynamic_score
    targets:
      - provider: hosted_openai_compatible
        model_ref: support-balanced
        weight: 60
        tags: [validated, support]
        validation:
          status: passed
          workload: support_chat
          validated_at: "2026-06-25"
          quality_score: 0.94
          pass_rate: 0.98
          harness: golden-support-set
      - provider: private_vllm
        model_ref: support-low-latency
        weight: 40
        tags: [validated, low_latency]
        validation:
          status: passed
          workload: support_chat
          validated_at: "2026-06-25"
          quality_score: 0.92
          pass_rate: 0.97
          harness: golden-support-set
    routing_policy:
      dynamic_score:
        cold_start_policy: configured_weight
        min_observations: 20
        observation_window_seconds: 600
        max_score_adjustment_percent: 50
        signals:
          request_shape: { enabled: true }
          observed_performance: { enabled: true }
          cost: { enabled: true }
          evaluation_metadata: { enabled: true }
        score_terms:
          - name: cheapest_fast_enough
            expression: "0.40 * cost_score + 0.30 * latency_score + 0.20 * reliability_score + 0.10 * eval_quality_score"
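
To make the cheapest_fast_enough term concrete, the sketch below evaluates the same weighted sum for two targets. It assumes each signal is already normalized to the range 0 to 1 with higher meaning better; the input values are invented for illustration, and how the router derives the signals or applies max_score_adjustment_percent is not modeled.

```typescript
// Scalar signals named after the variables in the score_terms expression.
interface ScoreInputs {
  cost_score: number;
  latency_score: number;
  reliability_score: number;
  eval_quality_score: number;
}

// Direct transcription of the configured expression.
function cheapestFastEnough(s: ScoreInputs): number {
  return (
    0.40 * s.cost_score +
    0.30 * s.latency_score +
    0.20 * s.reliability_score +
    0.10 * s.eval_quality_score
  );
}

// Hypothetical signal values; only eval_quality_score echoes the config above.
const supportBalanced: ScoreInputs = {
  cost_score: 0.5,
  latency_score: 0.6,
  reliability_score: 0.9,
  eval_quality_score: 0.94,
};
const supportLowLatency: ScoreInputs = {
  cost_score: 0.8,
  latency_score: 0.9,
  reliability_score: 0.85,
  eval_quality_score: 0.92,
};

const balancedScore = cheapestFastEnough(supportBalanced); // about 0.654
const lowLatencyScore = cheapestFastEnough(supportLowLatency); // about 0.852
```

With these invented inputs the cheaper, faster target scores higher even though its evaluation quality is slightly lower, which is the trade-off the 0.10 quality weight allows. A group that must not trade quality for cost would raise that weight or enforce a floor through a model-group contract.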

TypeScript Script Policy

Use strategy: script when policy should run inside the router process and can be packaged with deployment config. This tested pattern routes large normalized request text to a heavier target inside the requested group.

models:
  script-sized:
    strategy: script
    script: scripts/router.ts
    targets:
      - { provider: hosted_openai_compatible, model_ref: compact, tier: compact, weight: 70 }
      - { provider: hosted_openai_compatible, model_ref: long-context, tier: long_context, weight: 30 }

// scripts/router.ts
export function route(ctx) {
  // Prefer the long-context tier when the normalized request text is large.
  const preferredTier = ctx.text.length > 8000 ? "long_context" : "compact";
  const entries = ctx.targets.map((target, index) => ({ target, index }));
  // Fall back to the first eligible target if none carries the preferred tier.
  const preferred = entries.find((entry) => entry.target.tier === preferredTier) || entries[0];
  return {
    targetIndex: preferred.index,
    // Every other eligible target stays available as a fallback, in group order.
    fallbackIndexes: entries.filter((entry) => entry.index !== preferred.index).map((entry) => entry.index),
    classLabel: `prompt-size:${preferredTier}`,
  };
}

Scripts receive safe caller and target metadata, not raw provider keys, raw router tokens, or token hashes. If a group enables pii_filter, the script context is built from the redacted request.

External Policy Service

Use strategy: external when policy belongs in a trusted standalone service. The service receives already eligible targets and returns a target decision; the router validates that decision before any upstream call.

models:
  policy-service-group:
    strategy: external
    external_policy:
      url: https://routing-policy.internal.example/route
      allow_hosts: [routing-policy.internal.example]
      timeout_ms: 500
      max_response_bytes: 65536
      headers:
        Authorization: ${ROUTING_POLICY_AUTH_HEADER}
      on_error: fail_closed
      include_request: false
    targets:
      - { provider: hosted_openai_compatible, model_ref: compact, tier: compact, weight: 70 }
      - { provider: private_vllm, model_ref: long-context, tier: long_context, weight: 30 }

Abbreviated policy request shape:

{
  "group": "policy-service-group",
  "context": {
    "estimatedTokens": 2200,
    "textChars": 9200,
    "imageCount": 0,
    "toolCount": 1,
    "maxTokens": 512,
    "stream": false
  },
  "caller": {
    "id": "team-prod",
    "project": "product",
    "environment": "prod",
    "allow": ["policy-service-group"]
  },
  "targets": [
    {"provider": "hosted_openai_compatible", "modelRef": "compact", "tier": "compact", "weight": 70},
    {"provider": "private_vllm", "modelRef": "long-context", "tier": "long_context", "weight": 30}
  ]
}

Response shape:

{
  "targetIndex": 1,
  "fallbackIndexes": [0],
  "classLabel": "prompt-size:long-context"
}

By default, the policy request does not include prompt text, messages, image URLs/data, tool schemas, tool outputs, or request.raw. Enable external_policy.include_request: true only for a trusted service that is allowed to receive request content.
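
The check a router performs on a policy decision before any upstream call could resemble the sketch below. The field names come from the response shape shown earlier; the specific rules (integer indexes within the eligible target list, no fallback that repeats the chosen target) and the function name are assumptions for illustration. Throwing here corresponds to the fail_closed setting: an invalid decision rejects the request instead of falling back to an unvetted target.

```typescript
// Field names mirror the documented response shape; the rules are assumptions.
interface PolicyDecision {
  targetIndex: number;
  fallbackIndexes: number[];
  classLabel?: string;
}

function validateDecision(raw: unknown, targetCount: number): PolicyDecision {
  if (typeof raw !== "object" || raw === null) {
    throw new Error("policy decision must be a JSON object");
  }
  const fields = raw as Record<string, unknown>;
  // An index is valid only if it points at one of the eligible targets that were sent.
  const isTargetIndex = (v: unknown): v is number =>
    typeof v === "number" && Number.isInteger(v) && v >= 0 && v < targetCount;

  const targetIndex = fields["targetIndex"];
  if (!isTargetIndex(targetIndex)) {
    throw new Error("targetIndex is missing or out of range");
  }
  const fallbackRaw = fields["fallbackIndexes"] ?? [];
  if (!Array.isArray(fallbackRaw)) {
    throw new Error("fallbackIndexes must be an array");
  }
  const fallbackIndexes: number[] = [];
  for (const v of fallbackRaw as unknown[]) {
    if (!isTargetIndex(v) || v === targetIndex) {
      throw new Error("invalid fallback index");
    }
    fallbackIndexes.push(v);
  }
  const classLabel = fields["classLabel"];
  return {
    targetIndex,
    fallbackIndexes,
    ...(typeof classLabel === "string" ? { classLabel } : {}),
  };
}

const sampleDecision = validateDecision(
  JSON.parse('{"targetIndex": 1, "fallbackIndexes": [0], "classLabel": "prompt-size:long-context"}'),
  2, // two eligible targets were sent to the policy service
);
```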

Hierarchical Topology

Some enterprises separate central governance from team-local strategy by running more than one router instance. This is topology guidance, not a one-command feature.

application or agent
  -> team router group: team-local-coding
       - local private GPU target
       - central enterprise router as an OpenAI-compatible upstream
           -> central router group: enterprise-approved-coding
                - centrally approved hosted and private targets

The team router still exposes deployment-defined local groups to its callers. The central router still enforces its own caller tokens, allowed groups, provider keys, routing strategy, telemetry, and model-group contracts.

Proof Workflow

Policy changes should be evidence-driven:

| Step | Evidence |
| --- | --- |
| Pre-change baseline | Current selected providers/models, cost/request, p95 latency, fallback rate, error rate, and outcome score |
| Safe config summary | Group-local diff showing changed targets, weights, strategies, contracts, policy URL, or script path without secrets |
| Representative evaluation | Workload-specific tests such as unit tests, extraction accuracy, OCR targets, tool-call correctness, browser-control tasks, golden datasets, product acceptance tests, or agent benchmarks |
| Direct upstream smokes | Text, tool, image, reasoning, streaming, and output-cap checks for the exact provider/model/dialect/skin being claimed |
| Router smokes | Authenticated requests through each caller API shape plus negative no-eligible-target and forbidden-group checks |
| Report window | Usage report or admin report showing selected target mix, latency, throughput, cost, attempts, fallbacks, and safe policy labels |
| Rollback trigger | Concrete threshold for disabling a target, reducing weight, relaxing a contract, or reverting to a simpler strategy |
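
A rollback trigger is easiest to enforce when it is written down as explicit thresholds that a report window either meets or breaches. The sketch below is one hypothetical way to express that check; the metric names, threshold values, and function name are illustrative and do not correspond to a documented report schema.

```typescript
// Metrics taken from one report window (hypothetical field names).
interface WindowStats {
  errorRate: number; // fraction of requests that failed
  fallbackRate: number; // fraction of requests served by a fallback target
  p95LatencyMs: number;
}

// Thresholds agreed before the policy change ships.
interface RollbackThresholds {
  maxErrorRate: number;
  maxFallbackRate: number;
  maxP95LatencyMs: number;
}

// Returns the list of breached thresholds; an empty list means no rollback.
function rollbackReasons(observed: WindowStats, limits: RollbackThresholds): string[] {
  const reasons: string[] = [];
  if (observed.errorRate > limits.maxErrorRate) reasons.push("error_rate");
  if (observed.fallbackRate > limits.maxFallbackRate) reasons.push("fallback_rate");
  if (observed.p95LatencyMs > limits.maxP95LatencyMs) reasons.push("p95_latency");
  return reasons;
}

// Illustrative values only.
const agreedLimits: RollbackThresholds = {
  maxErrorRate: 0.02,
  maxFallbackRate: 0.1,
  maxP95LatencyMs: 4000,
};
const postChangeWindow: WindowStats = {
  errorRate: 0.01,
  fallbackRate: 0.18,
  p95LatencyMs: 3200,
};
const breached = rollbackReasons(postChangeWindow, agreedLimits);
```

Comparing the same metrics against the pre-change baseline keeps the decision tied to the evidence collected in the first step of the workflow.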

See Model Group Quality Criteria, Evaluate Smart Router, Deployment Readiness, Operational Acceptance, and Cost Governance for rollout and acceptance planning.

What Not To Do

  • Do not treat provider catalog entries as active routing weights. Catalog metadata becomes traffic only when a target is listed under models.<group>.targets[].
  • Do not route across unauthorized groups. Caller allow lists and /v1/models are the caller-facing boundary.
  • Do not claim tool, image, reasoning, structured-output, or max-token behavior from provider marketing copy alone. Validate the exact upstream and router skin before adding active traffic.
  • Do not hardcode reference names such as default, fast, small, medium, high, big-coder, or vision as product-required group names in public docs or client code.
  • Do not store raw prompts, responses, tool outputs, images, bearer tokens, provider keys, token hashes, full policy payloads, or full deployment config as diagnostics.
  • Do not optimize only for cost. Promote cheaper routes only when the group still meets the workload outcome, latency, reliability, and security criteria.