Observability

GenAI Smart Router exposes operational signals for health, readiness, metrics, logs, usage reporting, and request-level diagnostics. These surfaces are designed to help operators answer two questions: whether callers are receiving reliable service, and which upstream provider/model paths explain latency, cost, errors, or fallback pressure.

Surfaces

Surface	Path or tool	Access model	Use
Health	`/healthz`	Deployment-controlled network access	Process liveness.
Readiness	`/readyz`	Deployment-controlled network access	License, config, and dependency readiness.
Version	`/version`	Deployment-controlled network access	Safe release version and build timestamp.
Metrics	`/metrics`	Caller subject with metrics authorization	Prometheus-style operational telemetry.
Usage reports	`router-usage-report`	Secure admin shell with DB access	Markdown usage, cost, latency, throughput, and rollup reports.
Browser reports	`/admin/reports/`	Admin authentication plus authorization	Authenticated cost, performance, cache, fallback, security, and drilldown views.
Request diagnostics	usage DB child tables	Admin/report authorization or DB access	Request attempts, trace events, terminal errors, request-shape and translation-shape telemetry, traffic-shaping events, and optional decision telemetry.

Ordinary application caller tokens must not receive /metrics or admin report data. Requests from such callers should fail with 403 metrics-forbidden or 403 reports-forbidden, respectively.
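After a deployment or auth change, you can confirm this boundary by requesting each operator-only surface with an ordinary caller token and checking for 403. The sketch below assumes `ROUTER_BASE_URL` and `ROUTER_TOKEN` are set as in the other examples on this page. The function names are illustrative. It also assumes `/admin/reports/` answers bearer-token requests with 403 rather than redirecting to a browser login page.

```shell
# Sketch: verify that an ordinary caller token is refused on operator-only
# surfaces. Function names are illustrative, not part of the router.

# Compare an observed HTTP status with the expected one and report it.
verdict() {
  expected=$1
  actual=$2
  path=$3
  if [ "$actual" = "$expected" ]; then
    echo "PASS $path -> $actual"
  else
    echo "FAIL $path -> $actual (expected $expected)"
    return 1
  fi
}

# Print only the status code; curl reports 000 when no response arrives.
status_of() {
  curl -s -o /dev/null -w '%{http_code}' --max-time 10 \
    -H "Authorization: Bearer $ROUTER_TOKEN" \
    "$ROUTER_BASE_URL$1"
}

# Assumption: /admin/reports/ answers bearer-token requests with 403. A
# deployment that redirects to a login page will report FAIL for that path.
check_caller_is_forbidden() {
  rc=0
  for path in /metrics /admin/reports/; do
    verdict 403 "$(status_of "$path")" "$path" || rc=1
  done
  return "$rc"
}
```

For example, `check_caller_is_forbidden || echo "ordinary caller reached an operator surface"` prints one PASS or FAIL line per surface and returns non-zero if any check failed.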

Request IDs

Every response includes X-Request-Id. Structured error bodies also include the request ID. Use it as the join key across logs, usage rows, attempts, trace events, terminal errors, and admin browser drilldown.

curl -i "$ROUTER_BASE_URL/v1/chat/completions" \
  -H "Authorization: Bearer $ROUTER_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "replace-with-allowed-model-group",
    "messages": [{"role": "user", "content": "Reply OK only."}],
    "max_tokens": 16
  }'
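To capture the request ID in a script rather than reading headers by eye, you can dump the response headers to a file and extract the value. This is a sketch: `request_id_from_headers` is a hypothetical helper name, and it relies only on standard `tr` and `awk`. It matches header names case-insensitively because HTTP header names are case-insensitive.

```shell
# Hypothetical helper: print the first X-Request-Id value in an HTTP header
# dump read from stdin. Header lines end in CRLF, so carriage returns are
# stripped first, and the header name is compared case-insensitively.
request_id_from_headers() {
  tr -d '\r' | awk '
    tolower($0) ~ /^x-request-id:/ {
      sub(/^[^:]*:[ \t]*/, "")   # drop the header name and leading spaces
      print
      exit
    }'
}
```

For example, run the request above with `-D headers.txt -o response.json` in place of `-i`. Then `request_id_from_headers < headers.txt` prints the ID to record alongside logs, report drilldowns, or a support ticket.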

Preserve the X-Request-Id value when opening support tickets. Do not include raw prompts, images, tokens, provider keys, or full config files in tickets.

Metrics

Configure a dedicated metrics-admin caller and restrict /metrics to that subject. Metrics are global operational telemetry, not a tenant-scoped caller endpoint. Rejected or unknown caller-supplied model names are collapsed to bounded labels such as rejected_model; authorized model groups keep their configured group labels.

Example check with the metrics-admin token, which should return the metrics payload:

curl -i -H "Authorization: Bearer $METRICS_ADMIN_TOKEN" \
  "$ROUTER_BASE_URL/metrics"

The same request with an ordinary application caller token:

curl -i -H "Authorization: Bearer $ROUTER_TOKEN" \
  "$ROUTER_BASE_URL/metrics"

Expected response: 403 metrics-forbidden.
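Because rejected or unknown model names collapse into bounded labels, the number of distinct model labels in a scrape should stay bounded by the configured model groups plus a few sentinel values such as rejected_model. The sketch below lists the distinct values of one label in Prometheus text-format output so you can spot unexpected growth. The label name `model_group` and the metric name in the test data are assumptions; check a real scrape for the names your deployment emits.

```shell
# Print the distinct values of one label in Prometheus text-format input.
# Sketch only: it skips comment lines and ignores escaped quotes in values.
# Example label name (an assumption): model_group
label_values() {
  label=$1
  grep -v '^#' \
    | grep -o "[{,]$label=\"[^\"]*\"" \
    | sed -e 's/^[{,][^=]*="//' -e 's/"$//' \
    | sort -u
}
```

For example, `curl -s -H "Authorization: Bearer $METRICS_ADMIN_TOKEN" "$ROUTER_BASE_URL/metrics" | label_values model_group | wc -l` counts the distinct values currently exposed for that label.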

Logs

Router logs should be collected by the deployment logging system and retained under the customer's operational policy. Logs are for metadata and diagnostics; they must not contain raw bearer tokens, provider API keys, token hashes, raw prompts, raw images, raw tool outputs, full upstream headers, or full config files.

Useful log dimensions include:

request ID;
caller-safe identity fields;
requested model group;
selected provider/model/dialect;
status code and error class;
latency and upstream attempt count;
fallback state;
cache state;
safe license status;
safe quota or budget outcome.
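A periodic scan of collected logs can catch regressions in the rule that logs must not contain credentials. The sketch below flags lines that look like bearer credentials or provider API keys and prints only their line numbers, so the scan does not re-expose a leaked value. The patterns, including the `sk-` key prefix, are illustrative assumptions rather than a complete secret detector; extend them for the providers you use.

```shell
# Print the line numbers of stdin lines that look like they carry credentials.
# Only line numbers are printed so the scan does not echo a leaked secret.
# Patterns are illustrative assumptions, not a complete secret detector.
find_secret_like_lines() {
  # grep exits 1 when nothing matches; treat that clean case as success.
  { grep -n -i -E \
      -e 'bearer[[:space:]]+[A-Za-z0-9._~+/=-]{16,}' \
      -e 'sk-[A-Za-z0-9_-]{16,}' || true; } | cut -d: -f1
}
```

For example, with a hypothetical exported log file, `lines=$(find_secret_like_lines < router.log); [ -z "$lines" ] || echo "review log lines: $lines"`.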

Usage And Performance Reports

Use router-usage-report or /admin/reports/ to separate downstream caller experience from upstream provider behavior.

router-usage-report \
  --driver postgres \
  --dsn "$ROUTER_USAGE_DB_DSN" \
  --since 24h \
  --out usage-24h.md
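For a daily schedule, a small wrapper can write each report to a dated file so earlier reports are not overwritten. This sketch uses only the flags shown above and assumes `ROUTER_USAGE_DB_DSN` is set in the environment. The function name and file naming scheme are illustrative.

```shell
# Write the last 24 hours of usage to a dated Markdown file and print its path.
# Uses only the router-usage-report flags shown above; the function name and
# file naming scheme are illustrative choices.
daily_usage_report() {
  out_dir=${1:-.}
  out="$out_dir/usage-24h-$(date -u +%Y-%m-%d).md"
  router-usage-report \
    --driver postgres \
    --dsn "$ROUTER_USAGE_DB_DSN" \
    --since 24h \
    --out "$out" || return 1
  echo "$out"
}
```

For example, `daily_usage_report /var/reports` writes a file such as `/var/reports/usage-24h-<UTC date>.md` and returns non-zero if the report command fails.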

Start investigations with:

errors and fallbacks;
downstream latency and throughput by caller, project, model group, and client;
upstream duration, TTFB, throughput, and error rate by provider/model/dialect;
quota, TPM/RPM, context, and max-token troubleshooting buckets;
recent request drilldown by request ID.

For report details, see Usage Reporting and Admin Browser Reports.

Alerting Checklist

At minimum, alert on:

/readyz failure;
elevated 5xx responses;
elevated router-side 429 quota/rate-limit responses;
elevated upstream provider 429/5xx attempts;
fallback rate above the deployment baseline;
request latency above the deployment service target;
metrics scrape failures;
license status entering grace, denied, expired, missing, or invalid states;
usage database write failures.

Tune thresholds by workload. Developer tools and agentic clients can produce large-context bursts that look different from chat or extraction workloads.
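Several of these alerts reduce to simple checks. Ratio alerts such as fallback rate or 5xx rate compare a count against a total and a deployment baseline, and the readiness alert is a probe of /readyz. The sketch below shows both. The counts would come from your metrics or usage reports. The function names, the example baseline, and the 5-second timeout are illustrative assumptions.

```shell
# Succeed (exit 0) when count/total, as a percentage, is above baseline_pct.
# A zero total never alerts. Example: 60 fallbacks in 1000 requests is 6%.
# Function names and thresholds here are illustrative, not router settings.
exceeds_baseline() {
  awk -v c="$1" -v t="$2" -v b="$3" \
    'BEGIN { if (t + 0 <= 0) exit 1; exit !(100 * c / t > b + 0) }'
}

# Succeed only when /readyz answers with a status below 400 within 5 seconds;
# curl -f treats 4xx/5xx responses as failures.
readyz_ok() {
  curl -fsS -o /dev/null --max-time 5 "$1/readyz"
}
```

For example, `exceeds_baseline "$fallbacks" "$requests" 5 && echo "fallback rate above 5% baseline"` and `readyz_ok "$ROUTER_BASE_URL" || echo "readyz failing"`.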

Troubleshooting Entry Points

Request Troubleshooting for one failed or slow request.
Licensing Troubleshooting for license-* errors or readiness failures.
Operational Troubleshooting for install, routing, quota, metrics, and provider issues.
