Hosted Quickstart
Use the router endpoint for your deployment with a router-issued caller token. The token authenticates the caller to Metrum GenAI Smart Router; provider keys stay server-side. The product can run on-prem, in an enterprise cloud account, or as a Metrum-managed instance.
If you came from Getting Started looking for the CLI guides, see Codex CLI or Claude Code CLI.
These docs are built into the hosted GenAI Smart Router server delivered for your deployment. Examples that show the router base URL use this browser origin, so on this deployment they render as https://your-router.example.com and https://your-router.example.com/v1.
Need an on-prem deployment, enterprise cloud deployment, or Metrum-managed instance? Email contact@metrum.ai.
Environment
export ROUTER_BASE_URL="https://your-router.example.com"
export ROUTER_TOKEN="rtr_metrum_<user>_<project>_<env>_<key>_<secret>"
export ROUTER_MODEL="<allowed-model-group>"
Model group names are deployment-defined. If this hosted deployment exposes example names such as default, fast, small, medium, high, big-coder, or vision, treat them as deployment policy names, not product-required names.
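As a minimal sketch of how a script might read the three variables above before making any request: the `RouterConfig` type and `load_router_config` helper are hypothetical names introduced here for illustration, not part of the router or any SDK. The check for angle brackets only catches template values copied verbatim from this page.

```python
import os
from typing import Mapping, NamedTuple


class RouterConfig(NamedTuple):
    """Hypothetical container for the three quickstart variables."""
    base_url: str
    token: str
    model: str


def load_router_config(env: Mapping[str, str] = os.environ) -> RouterConfig:
    """Read the quickstart variables and reject unset or placeholder values."""
    values = {}
    for name in ("ROUTER_BASE_URL", "ROUTER_TOKEN", "ROUTER_MODEL"):
        value = env.get(name, "").strip()
        if not value:
            raise ValueError(f"{name} is not set")
        # Angle brackets suggest a template such as <allowed-model-group>
        # was copied without being filled in. The value is not echoed so a
        # real token never ends up in an error message.
        if "<" in value or ">" in value:
            raise ValueError(f"{name} still contains a placeholder")
        values[name] = value
    return RouterConfig(
        base_url=values["ROUTER_BASE_URL"].rstrip("/"),
        token=values["ROUTER_TOKEN"],
        model=values["ROUTER_MODEL"],
    )
```

Calling `load_router_config()` with no argument reads from the process environment; passing a plain dict makes the helper easy to exercise in tests.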
Model Discovery
If you are not sure which ROUTER_MODEL value to use, call /v1/models with your router token first. The response is filtered by that token's allow list and returns the deployment-defined model groups you can request.
curl "$ROUTER_BASE_URL/v1/models" -H "Authorization: Bearer $ROUTER_TOKEN"
Use one returned id as the model value in chat, Responses, Messages, or CLI requests. If a caller token is limited to a smaller set of groups, other groups will not be listed and cannot be requested. See Available Models And Access for response examples and troubleshooting.
Successful output is a JSON list whose data[].id values are the model groups you may request.
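The same discovery call can be sketched in Python with only the standard library. This assumes the response body is a JSON object whose `data` array holds entries with an `id` field, as described above; the helper names `extract_model_ids` and `list_model_groups` are hypothetical.

```python
import json
import urllib.request


def extract_model_ids(body: str) -> list[str]:
    """Return the data[].id values from a /v1/models response body."""
    payload = json.loads(body)
    return [entry["id"] for entry in payload.get("data", []) if "id" in entry]


def list_model_groups(base_url: str, token: str) -> list[str]:
    """GET /v1/models with the caller token and return the listed group names."""
    request = urllib.request.Request(
        f"{base_url.rstrip('/')}/v1/models",
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return extract_model_ids(response.read().decode("utf-8"))
```

With `ROUTER_BASE_URL` and `ROUTER_TOKEN` exported, `list_model_groups(os.environ["ROUTER_BASE_URL"], os.environ["ROUTER_TOKEN"])` should return the names that are valid for `ROUTER_MODEL`.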
OpenAI-Compatible Chat
curl "$ROUTER_BASE_URL/v1/chat/completions" -H "Authorization: Bearer $ROUTER_TOKEN" -H "Content-Type: application/json" -d '{
  "model": "'"$ROUTER_MODEL"'",
  "messages": [
    {"role": "user", "content": "Reply with exactly: router ok"}
  ]
}'
Successful output includes one assistant message with router ok.
OpenAI Responses
Use /v1/responses for Responses-compatible clients and agent frameworks.
curl "$ROUTER_BASE_URL/v1/responses" -H "Authorization: Bearer $ROUTER_TOKEN" -H "Content-Type: application/json" -d '{
  "model": "'"$ROUTER_MODEL"'",
  "input": "Reply with exactly: router ok",
  "max_output_tokens": 32
}'
Successful output includes text equivalent to router ok.
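For clients that call `/v1/responses` without an SDK, the sketch below builds the same request body as the curl example and pulls the text out of a reply. It assumes the router passes through the standard OpenAI Responses shape, where `output` is a list of items and message items carry `output_text` parts; both helper names are hypothetical.

```python
import json


def build_responses_payload(model: str, prompt: str, max_output_tokens: int = 32) -> bytes:
    """Serialize the same body the curl example sends to /v1/responses."""
    return json.dumps(
        {"model": model, "input": prompt, "max_output_tokens": max_output_tokens}
    ).encode("utf-8")


def extract_responses_text(payload: dict) -> str:
    """Join every output_text part found in the message items of a response."""
    parts = []
    for item in payload.get("output", []):
        if item.get("type") != "message":
            continue  # skip non-message items such as reasoning or tool calls
        for part in item.get("content", []):
            if part.get("type") == "output_text":
                parts.append(part.get("text", ""))
    return "".join(parts)
```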
Anthropic-Compatible Messages
curl "$ROUTER_BASE_URL/v1/messages" -H "Authorization: Bearer $ROUTER_TOKEN" -H "Content-Type: application/json" -d '{
  "model": "'"$ROUTER_MODEL"'",
  "max_tokens": 32,
  "messages": [
    {"role": "user", "content": "Reply with exactly: router ok"}
  ]
}'
Successful output includes one content block with router ok.
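To check that result programmatically, a small helper can collect the text blocks from a Messages reply. This is a sketch that assumes the router returns the usual Anthropic Messages shape, a top-level `content` list of blocks where text blocks have `type` set to `text`; `extract_messages_text` is a hypothetical name.

```python
def extract_messages_text(payload: dict) -> str:
    """Concatenate the text of every text block in a /v1/messages reply."""
    return "".join(
        block.get("text", "")
        for block in payload.get("content", [])
        if block.get("type") == "text"
    )
```

A smoke test would then compare `extract_messages_text(reply).strip()` against `router ok`.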
Image Input
Image requests use the same deployment-defined router model-group names. The router filters image-bearing requests to targets that have passed vision validation and advertise image in input_modalities.
The examples below use a realistic VLM response budget because image acceptance tests should not be judged with a tiny output cap. Use smaller caps only when explicitly testing cap enforcement.
OpenAI-compatible chat:
curl "$ROUTER_BASE_URL/v1/chat/completions" -H "Authorization: Bearer $ROUTER_TOKEN" -H "Content-Type: application/json" -d '{
  "model": "'"$ROUTER_MODEL"'",
  "messages": [{
    "role": "user",
    "content": [
      {"type": "text", "text": "Read the receipt. Reply with only the merchant name."},
      {"type": "image_url", "image_url": {"url": "https://cdn.learnopencv.com/wp-content/uploads/2018/06/04100007/receipt.png"}}
    ]
  }],
  "max_tokens": 512,
  "stream": false
}'
OpenAI Responses:
curl "$ROUTER_BASE_URL/v1/responses" -H "Authorization: Bearer $ROUTER_TOKEN" -H "Content-Type: application/json" -d '{
  "model": "'"$ROUTER_MODEL"'",
  "input": [{
    "role": "user",
    "content": [
      {"type": "input_text", "text": "Read the receipt. Reply with only the merchant name."},
      {"type": "input_image", "image_url": "https://cdn.learnopencv.com/wp-content/uploads/2018/06/04100007/receipt.png"}
    ]
  }],
  "max_output_tokens": 512,
  "stream": false
}'
Anthropic-compatible Messages:
curl "$ROUTER_BASE_URL/v1/messages" -H "Authorization: Bearer $ROUTER_TOKEN" -H "Content-Type: application/json" -d '{
  "model": "'"$ROUTER_MODEL"'",
  "max_tokens": 512,
  "messages": [{
    "role": "user",
    "content": [
      {"type": "text", "text": "Read the receipt. Reply with only the merchant name."},
      {"type": "image", "source": {"type": "url", "url": "https://cdn.learnopencv.com/wp-content/uploads/2018/06/04100007/receipt.png"}}
    ]
  }]
}'
Image-bearing requests bypass response caching. Usage logs include image count, image-token counts when the upstream reports them, calculated VLM cost, and upstream-reported billed cost when available.
For dedicated vision route configuration, pricing metadata, and CLI image smokes, see Image Analysis And VLM Routing.
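The three image requests in this section differ mainly in how the text and image parts are spelled. The sketch below puts the three content shapes side by side, copied from the curl bodies; the function names are hypothetical and exist only for illustration.

```python
def chat_image_content(prompt: str, image_url: str) -> list[dict]:
    """Content parts for /v1/chat/completions (OpenAI-compatible chat)."""
    return [
        {"type": "text", "text": prompt},
        {"type": "image_url", "image_url": {"url": image_url}},
    ]


def responses_image_content(prompt: str, image_url: str) -> list[dict]:
    """Content parts for /v1/responses (OpenAI Responses)."""
    return [
        {"type": "input_text", "text": prompt},
        {"type": "input_image", "image_url": image_url},
    ]


def messages_image_content(prompt: str, image_url: str) -> list[dict]:
    """Content blocks for /v1/messages (Anthropic-compatible Messages)."""
    return [
        {"type": "text", "text": prompt},
        {"type": "image", "source": {"type": "url", "url": image_url}},
    ]
```

Each function returns the value of the `content` field for a single user message; the surrounding `model`, token-limit, and `stream` fields stay as shown in the curl examples.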
Python Client
The router accepts OpenAI-compatible SDK traffic. This example creates an isolated uv project and calls the hosted router through /v1/chat/completions. Install uv first if it is not available in your shell.
mkdir smart-router-python-example
cd smart-router-python-example
uv init --bare
uv add openai
cat > smoke.py <<'PY'
import os
from openai import OpenAI
router_base_url = os.environ["ROUTER_BASE_URL"].rstrip("/")
router_token = os.environ["ROUTER_TOKEN"]
router_model = os.environ["ROUTER_MODEL"]
client = OpenAI(
    base_url=f"{router_base_url}/v1",
    api_key=router_token,
)
response = client.chat.completions.create(
    model=router_model,
    messages=[
        {"role": "user", "content": "Reply with exactly: router ok"},
    ],
    max_tokens=32,
)
print(response.choices[0].message.content)
PY
export ROUTER_BASE_URL="https://your-router.example.com"
export ROUTER_TOKEN="rtr_metrum_<user>_<project>_<env>_<key>_<secret>"
export ROUTER_MODEL="<allowed-model-group>"
uv run python smoke.py
The same token and model-group rules apply to SDK calls: /v1/models shows only the groups that the token is allowed to use, and provider credentials remain server-side.
Common First-Run Issues
| Symptom | Meaning | Next step |
|---|---|---|
| 401 or 403 before model discovery | Token or endpoint mismatch | Confirm ROUTER_BASE_URL and ROUTER_TOKEN with the deployment administrator. |
| Expected group missing from /v1/models | The token allow list does not include that group | Use a listed group or request access. |
| 403 model-not-allowed | Requested model group is not allowed for the token | Use one of the returned /v1/models IDs. |
| 502 no-eligible-target for tools or images | The group is allowed but lacks a validated target for that request shape | Ask for a compatible group or target. |
| 503 upstream-quota-exhausted | All eligible upstream providers reported exhausted balance, credits, quota, billing, or payment state | Retry after the administrator restores provider quota or include the request ID when escalating. |
| 503 upstream-rate-limited | All eligible upstream providers were rate limited | Retry with backoff or include the request ID when escalating. |
| 503 upstream-capacity-throttled | Provider/model/target shared shaping or adaptive backoff temporarily removed all otherwise eligible targets | Retry after the Retry-After window when present or include the request ID when escalating. |
| 504 upstream-timeout | The selected upstream did not finish before timeout | Retry a smaller task or ask the administrator to inspect attempts. |
See Error Reference for the full structured error list.
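The retry guidance in the table can be turned into a small client-side policy. The sketch below is one possible reading, not the router's own behavior: it retries only the two error codes the table marks as worth retrying automatically, honors a numeric `Retry-After` value when present, and otherwise falls back to capped exponential backoff. How the error code is carried in the response body is not specified on this page, so the caller is assumed to extract it and pass it in; the function name, `base`, and `cap` are hypothetical.

```python
from typing import Optional

# Codes from the table where waiting and retrying is the suggested next step.
RETRY_WITH_BACKOFF = {"upstream-rate-limited", "upstream-capacity-throttled"}


def retry_delay_seconds(
    error_code: str,
    retry_after: Optional[str],
    attempt: int,
    base: float = 1.0,
    cap: float = 30.0,
) -> Optional[float]:
    """Return seconds to wait before retrying, or None when a retry will not help."""
    if error_code not in RETRY_WITH_BACKOFF:
        # Token, allow-list, quota, and target problems need an administrator
        # or a different request, not an automatic retry.
        return None
    if retry_after is not None:
        try:
            return max(0.0, float(retry_after))
        except ValueError:
            pass  # Retry-After can also be an HTTP date; fall back to backoff
    # Exponential backoff: base, 2*base, 4*base, ... capped at `cap` seconds.
    return min(cap, base * 2 ** attempt)
```

A caller would loop over attempts, sleep for the returned delay, and stop as soon as the function returns `None` or a request succeeds. Adding random jitter to the backoff is a common refinement when many clients share one router.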