Hosted Quickstart

Use the router endpoint for your deployment with a router-issued caller token. The token authenticates the caller to Metrum GenAI Smart Router; provider keys stay server-side. The product can run on prem, in an enterprise cloud account, or as a Metrum-managed instance.

If you came here from Getting Started looking for the CLI guides, see Codex CLI or Claude Code CLI.

These docs are built into the hosted GenAI Smart Router server delivered for your deployment. Examples that show the router base URL use this browser origin, so on this deployment they render as https://your-router.example.com and https://your-router.example.com/v1.

Need an on-prem deployment, enterprise cloud deployment, or Metrum-managed instance? Email contact@metrum.ai.

Environment

export ROUTER_BASE_URL="https://your-router.example.com"
export ROUTER_TOKEN="rtr_metrum_<user>_<project>_<env>_<key>_<secret>"
export ROUTER_MODEL="<allowed-model-group>"

Model group names are deployment-defined. If this hosted deployment exposes example names such as default, fast, small, medium, high, big-coder, or vision, treat them as deployment policy names, not product-required names.
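As a minimal sketch, a Python script can read the same three variables and fail early when one is unset, rather than sending a request with an empty token or model. The RouterConfig type and the load_router_config helper are illustrative names, not part of the router or any SDK.

```python
import os
from typing import Mapping, NamedTuple


class RouterConfig(NamedTuple):
    """Hypothetical container for the three quickstart variables."""
    base_url: str
    token: str
    model: str


def load_router_config(env: Mapping[str, str] = os.environ) -> RouterConfig:
    """Read ROUTER_BASE_URL, ROUTER_TOKEN, and ROUTER_MODEL; raise if any is unset."""
    names = ("ROUTER_BASE_URL", "ROUTER_TOKEN", "ROUTER_MODEL")
    missing = [name for name in names if not env.get(name, "").strip()]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return RouterConfig(
        # Drop a trailing slash so "/v1/..." can be appended safely.
        base_url=env["ROUTER_BASE_URL"].strip().rstrip("/"),
        token=env["ROUTER_TOKEN"].strip(),
        model=env["ROUTER_MODEL"].strip(),
    )
```

Calling load_router_config() with no arguments reads the current process environment; passing a plain dict is convenient in tests.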

Model Discovery

If you are not sure which ROUTER_MODEL value to use, call /v1/models with your router token first. The response is filtered by that token's allow list and returns the deployment-defined model groups you can request.

curl "$ROUTER_BASE_URL/v1/models" -H "Authorization: Bearer $ROUTER_TOKEN"

Use one returned id as the model value in chat, Responses, Messages, or CLI requests. If a caller token is limited to a smaller set of groups, other groups will not be listed and cannot be requested. See Available Models And Access for response examples and troubleshooting.

Successful output is a JSON list whose data[].id values are the model groups you may request.
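The discovery call can also be sketched in Python with only the standard library. The sketch assumes the response body has the data[].id layout described above; extract_model_ids and list_model_groups are hypothetical helper names, and other response fields are ignored.

```python
import json
import urllib.request


def extract_model_ids(payload: dict) -> list[str]:
    """Return the data[].id values from a /v1/models response body."""
    return [item["id"] for item in payload.get("data", []) if "id" in item]


def list_model_groups(base_url: str, token: str) -> list[str]:
    """Call /v1/models with the router token and return the listed group ids."""
    request = urllib.request.Request(
        base_url.rstrip("/") + "/v1/models",
        headers={"Authorization": f"Bearer {token}"},
    )
    # urlopen raises urllib.error.HTTPError on 4xx/5xx, e.g. a 401 for a bad token.
    with urllib.request.urlopen(request, timeout=30) as response:
        return extract_model_ids(json.load(response))
```

Pass the values of ROUTER_BASE_URL and ROUTER_TOKEN to list_model_groups, then set ROUTER_MODEL to one of the returned ids.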

OpenAI-Compatible Chat

curl "$ROUTER_BASE_URL/v1/chat/completions" -H "Authorization: Bearer $ROUTER_TOKEN" -H "Content-Type: application/json" -d '{
  "model": "'"$ROUTER_MODEL"'",
  "messages": [
    {"role": "user", "content": "Reply with exactly: router ok"}
  ]
}'

Successful output includes one assistant message with router ok.

OpenAI Responses

Use /v1/responses for Responses-compatible clients and agent frameworks.

curl "$ROUTER_BASE_URL/v1/responses" -H "Authorization: Bearer $ROUTER_TOKEN" -H "Content-Type: application/json" -d '{
  "model": "'"$ROUTER_MODEL"'",
  "input": "Reply with exactly: router ok",
  "max_output_tokens": 32
}'

Successful output includes text equivalent to router ok.
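For clients that do not use an SDK, the same request can be assembled with the Python standard library. This is a sketch of the curl call above, not a reference client; build_responses_request is an illustrative name, and the function only builds the request so it can be inspected before sending.

```python
import json
import urllib.request


def build_responses_request(
    base_url: str,
    token: str,
    model: str,
    prompt: str,
    max_output_tokens: int = 32,
) -> urllib.request.Request:
    """Build (but do not send) the POST /v1/responses request from the curl example."""
    body = {
        "model": model,
        "input": prompt,
        "max_output_tokens": max_output_tokens,
    }
    return urllib.request.Request(
        base_url.rstrip("/") + "/v1/responses",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

To send it, pass the returned object to urllib.request.urlopen with a timeout and decode the body with json.load.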

Anthropic-Compatible Messages

curl "$ROUTER_BASE_URL/v1/messages" -H "Authorization: Bearer $ROUTER_TOKEN" -H "Content-Type: application/json" -d '{
  "model": "'"$ROUTER_MODEL"'",
  "max_tokens": 32,
  "messages": [
    {"role": "user", "content": "Reply with exactly: router ok"}
  ]
}'

Successful output includes one content block with router ok.
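A smoke test can check that content block programmatically. The sketch below assumes the response body mirrors the Anthropic Messages layout, a top-level content list whose text blocks carry type and text fields; confirm the exact shape against your deployment. message_text is a hypothetical helper name.

```python
def message_text(response_body: dict) -> str:
    """Join the text of every text-type content block in a Messages-style response."""
    blocks = response_body.get("content", [])
    return "".join(
        block.get("text", "")
        for block in blocks
        if block.get("type") == "text"
    )
```

With the prompt above, a passing check would compare message_text(body).strip() against "router ok".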

Image Input

Image requests use the same deployment-defined router model-group names. The router filters image-bearing requests to targets that have passed vision validation and advertise image in input_modalities.

The examples below use a realistic VLM response budget because image acceptance tests should not be judged with a tiny output cap. Use smaller caps only when explicitly testing cap enforcement.

OpenAI-compatible chat:

curl "$ROUTER_BASE_URL/v1/chat/completions" -H "Authorization: Bearer $ROUTER_TOKEN" -H "Content-Type: application/json" -d '{
  "model": "'"$ROUTER_MODEL"'",
  "messages": [{
    "role": "user",
    "content": [
      {"type": "text", "text": "Read the receipt. Reply with only the merchant name."},
      {"type": "image_url", "image_url": {"url": "https://cdn.learnopencv.com/wp-content/uploads/2018/06/04100007/receipt.png"}}
    ]
  }],
  "max_tokens": 512,
  "stream": false
}'

OpenAI Responses:

curl "$ROUTER_BASE_URL/v1/responses" -H "Authorization: Bearer $ROUTER_TOKEN" -H "Content-Type: application/json" -d '{
  "model": "'"$ROUTER_MODEL"'",
  "input": [{
    "role": "user",
    "content": [
      {"type": "input_text", "text": "Read the receipt. Reply with only the merchant name."},
      {"type": "input_image", "image_url": "https://cdn.learnopencv.com/wp-content/uploads/2018/06/04100007/receipt.png"}
    ]
  }],
  "max_output_tokens": 512,
  "stream": false
}'

Anthropic-compatible Messages:

curl "$ROUTER_BASE_URL/v1/messages" -H "Authorization: Bearer $ROUTER_TOKEN" -H "Content-Type: application/json" -d '{
  "model": "'"$ROUTER_MODEL"'",
  "max_tokens": 512,
  "messages": [{
    "role": "user",
    "content": [
      {"type": "text", "text": "Read the receipt. Reply with only the merchant name."},
      {"type": "image", "source": {"type": "url", "url": "https://cdn.learnopencv.com/wp-content/uploads/2018/06/04100007/receipt.png"}}
    ]
  }]
}'

Image-bearing requests bypass response caching. Usage logs include image count, image-token counts when the upstream reports them, calculated VLM cost, and upstream-reported billed cost when available.

For dedicated vision route configuration, pricing metadata, and CLI image smoke tests, see Image Analysis And VLM Routing.
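The three image requests above differ only in how the user content parts are spelled. The following sketch builds each variant from one prompt and one image URL so the differences can be compared side by side; image_content_parts and its dictionary keys are illustrative names, and the part layouts are copied from the curl examples.

```python
def image_content_parts(prompt: str, image_url: str) -> dict:
    """Return the user-message content list for each of the three API shapes."""
    return {
        # POST /v1/chat/completions: the image URL is nested under "image_url".
        "chat_completions": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": image_url}},
        ],
        # POST /v1/responses: "input_text" / "input_image" with a flat URL string.
        "responses": [
            {"type": "input_text", "text": prompt},
            {"type": "input_image", "image_url": image_url},
        ],
        # POST /v1/messages: the image is a block with a URL "source".
        "messages": [
            {"type": "text", "text": prompt},
            {"type": "image", "source": {"type": "url", "url": image_url}},
        ],
    }
```

Each list goes in the content field of a single user message; the surrounding body (model, token cap, and stream flag) stays as shown in the matching curl example.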

Python Client

The router accepts OpenAI-compatible SDK traffic. This example creates an isolated uv project and calls the hosted router through /v1/chat/completions. Install uv first if it is not available in your shell.

mkdir smart-router-python-example
cd smart-router-python-example
uv init --bare
uv add openai

cat > smoke.py <<'PY'
import os
from openai import OpenAI

router_base_url = os.environ["ROUTER_BASE_URL"].rstrip("/")
router_token = os.environ["ROUTER_TOKEN"]
router_model = os.environ["ROUTER_MODEL"]

client = OpenAI(
    base_url=f"{router_base_url}/v1",
    api_key=router_token,
)

response = client.chat.completions.create(
    model=router_model,
    messages=[
        {"role": "user", "content": "Reply with exactly: router ok"},
    ],
    max_tokens=32,
)

print(response.choices[0].message.content)
PY

export ROUTER_BASE_URL="https://your-router.example.com"
export ROUTER_TOKEN="rtr_metrum_<user>_<project>_<env>_<key>_<secret>"
export ROUTER_MODEL="<allowed-model-group>"
uv run python smoke.py

The same token and model-group rules apply to SDK calls: /v1/models shows only the groups that the token is allowed to use, and provider credentials remain server-side.

Common First-Run Issues

Symptom	Meaning	Next step
`401` or `403` before model discovery	Token or endpoint mismatch	Confirm `ROUTER_BASE_URL` and `ROUTER_TOKEN` with the deployment administrator.
Expected group missing from `/v1/models`	The token allow list does not include that group	Use a listed group or request access.
`403 model-not-allowed`	Requested model group is not allowed for the token	Use one of the returned `/v1/models` IDs.
`502 no-eligible-target` for tools or images	The group is allowed but lacks a validated target for that request shape	Ask for a compatible group or target.
`503 upstream-quota-exhausted`	All eligible upstream providers reported exhausted balance, credits, quota, billing, or payment state	Retry after the administrator restores provider quota, or include the request ID when escalating.
`503 upstream-rate-limited`	All eligible upstream providers were rate limited	Retry with backoff, or include the request ID when escalating.
`503 upstream-capacity-throttled`	Shared shaping or adaptive backoff at the provider, model, or target level temporarily removed all otherwise eligible targets	Retry after the `Retry-After` window when present, or include the request ID when escalating.
`504 upstream-timeout`	The selected upstream did not finish before timeout	Retry a smaller task or ask the administrator to inspect attempts.

See Error Reference for the full structured error list.
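The retry guidance in the table can be sketched as a small delay calculator. It treats only the two 503 codes whose next step is an immediate retry as retryable, honors a Retry-After header given in whole seconds, and otherwise falls back to exponential backoff with jitter. The function name, the base and cap defaults, and the way the caller obtains the error code from the response body are assumptions; Retry-After values given as HTTP dates are not handled here.

```python
import random
from typing import Mapping, Optional

# Codes from the table whose suggested next step is a retry without admin action.
RETRYABLE_CODES = {"upstream-rate-limited", "upstream-capacity-throttled"}


def retry_delay(
    status: int,
    error_code: str,
    headers: Mapping[str, str],
    attempt: int,
    base: float = 1.0,
    cap: float = 30.0,
) -> Optional[float]:
    """Return seconds to wait before the next attempt, or None if not retryable."""
    if status != 503 or error_code not in RETRYABLE_CODES:
        return None
    retry_after = headers.get("Retry-After", "").strip()
    if retry_after.isdigit():
        # Prefer the window the router reported.
        return float(retry_after)
    # Fallback: exponential backoff with full jitter, bounded by cap.
    return random.uniform(0, min(cap, base * 2 ** attempt))
```

A caller would sleep for the returned number of seconds and stop after a fixed number of attempts; a None result means the error should be surfaced, with the request ID, instead of retried.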
