Request Troubleshooting

Every response includes an X-Request-Id header. Use that ID to join caller symptoms to logs, usage rows, upstream attempts, trace events, terminal errors, and the browser report drilldown. The Diagnostics Schema documents the safe columns available for request-level triage.

1. Capture The Caller View

Record these safe fields:

UTC timestamp;
request ID;
HTTP status;
router error code, if present;
requested model group;
API shape, such as Chat Completions, Responses, or Anthropic Messages;
client name, such as Codex CLI, Claude Code, Cursor, or an internal service;
whether the request used streaming, tools, images, large input context, or a large output cap.

Do not record raw prompts, image payloads, bearer tokens, provider keys, tool outputs, or full request bodies unless a governed content-capture process is explicitly enabled for the deployment.
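One way to keep captures consistent is a small record type that only has room for the safe fields listed above. This is a minimal sketch: the `CallerView` class, its field names, and the `capture` helper are hypothetical and are not part of the router.

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class CallerView:
    """Safe caller-side fields only; no prompts, payloads, tokens, or keys."""
    timestamp_utc: str
    request_id: str
    http_status: int
    model_group: str
    api_shape: str      # e.g. "Chat Completions", "Responses", "Anthropic Messages"
    client_name: str    # e.g. "Codex CLI", "Claude Code", "Cursor"
    router_error_code: Optional[str] = None
    streaming: bool = False
    tools: bool = False
    images: bool = False
    large_input_context: bool = False
    large_output_cap: bool = False


def capture(request_id, http_status, model_group, api_shape, client_name, **flags):
    """Stamp the record with the current UTC time; flags are the optional fields."""
    now = datetime.now(timezone.utc).isoformat(timespec="seconds")
    return CallerView(now, request_id, http_status, model_group,
                      api_shape, client_name, **flags)


view = capture("req-123", 429, "coding", "Responses", "Codex CLI",
               router_error_code="tpm-exceeded", tools=True)
print(asdict(view))
```

Because the record has no free-form body field, there is nowhere for a raw prompt or bearer token to be pasted by accident.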

2. Check Caller Access

curl -i -H "Authorization: Bearer $ROUTER_TOKEN" \
  "$ROUTER_BASE_URL/v1/models"

Expected: the requested model group appears in the response. If it is absent, the caller token is not allowed to use that group or the group is not configured.
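The same check can be scripted against a saved response body. The sketch below assumes the response follows the OpenAI-style list shape, `{"data": [{"id": "<group>"}, ...]}`; that shape and the helper names are assumptions, so adjust them to what the deployment actually returns.

```python
import json


def visible_model_groups(models_body: str) -> set:
    """Return the group IDs found in a /v1/models response body (assumed shape)."""
    payload = json.loads(models_body)
    return {entry["id"] for entry in payload.get("data", []) if "id" in entry}


def check_group(models_body: str, group: str) -> str:
    if group in visible_model_groups(models_body):
        return f"{group}: visible to this caller token"
    return f"{group}: absent; token not allowed to use it, or group not configured"


# Hypothetical response body for illustration only.
sample = '{"object": "list", "data": [{"id": "coding"}, {"id": "chat-small"}]}'
print(check_group(sample, "coding"))
print(check_group(sample, "vision"))
```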

Common access outcomes:

Status	Meaning	Operator action
`401`	Missing, malformed, expired, or invalid caller token.	Issue or rotate the caller token.
`403`	Caller is authenticated but not authorized for the surface or model group.	Review caller access, admin policy, or metrics/report role.
`403 metrics-forbidden`	Ordinary caller attempted `/metrics`.	Use a metrics-admin token only for metrics scraping.
`403 reports-forbidden`	Ordinary caller attempted admin reports.	Use an authorized admin report identity.

3. Open Request Evidence

With an authorized admin report identity, open the safe evidence bundle:

curl -u 'admin:<password>' \
  "$ROUTER_BASE_URL/admin/reports/api/request-evidence?request_id=<request_id>"

The bundle shows what the router safely knew and recorded: caller/project/client labels, requested model group, resolved group, selected target, stored request-time token and cost fields, latency/throughput, quota/key/cache state, traffic-shaping state, target candidate/filter summaries, attempts, sanitized upstream errors, and trace rows when those sections exist.

Use diagnosticCompleteness and evidenceSections to interpret gaps. On a failed request, a section marked missing was expected for that phase but was not recorded; a section marked not_applicable belongs to a phase the request path did not reach or to a feature that was disabled.
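To separate real gaps from expected ones, the section states can be grouped. This sketch assumes evidenceSections is an object that maps each section name to a state string; the actual bundle layout may differ, and the section names and the diagnosticCompleteness value below are hypothetical.

```python
def summarize_gaps(bundle: dict) -> dict:
    """Group evidence sections by the two gap states described above."""
    sections = bundle.get("evidenceSections", {})
    gaps = {"missing": [], "not_applicable": []}
    for name, state in sorted(sections.items()):
        if state in gaps:
            gaps[state].append(name)
    return gaps


# Hypothetical bundle excerpt; values are illustrative only.
bundle = {
    "diagnosticCompleteness": "partial",
    "evidenceSections": {
        "attempts": "missing",
        "target_candidates": "present",
        "trace": "not_applicable",
    },
}
gaps = summarize_gaps(bundle)
print("investigate (expected but not recorded):", gaps["missing"])
print("skip (phase not reached or feature disabled):", gaps["not_applicable"])
```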

Evidence bundles do not expose raw prompts, image URLs or payloads, tool schemas, tool outputs, provider API keys, router bearer tokens, token hashes, full upstream headers, unsanitized upstream bodies, cookies, OIDC tokens, or full config.

4. Check Quota And Token Admission

Router-side quota, traffic-shaping, and admission failures usually return 429. Distinguish them from upstream provider 429 attempts:

Router hard-limit failures such as rpm-exceeded, tpm-exceeded, concurrency-exceeded, and quota-exhausted appear as the terminal caller response before any upstream attempt.
traffic-shaped appears as a terminal caller response with a safe bucket and Retry-After when a configured caller/server shaping bucket limits the burst.
Upstream 429 attempts may be followed by fallback to another target.
Terminal upstream failures include safe X-Router-Error-Class, X-Upstream-Status, error.details.error_class, and error.details.upstream_status fields. Use them to separate upstream 400/429 provider responses from router-side 429 policy responses.
Large-context developer tools can exhaust TPM through in-flight reservations even when daily or monthly budget remains available.
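The distinctions above can be applied mechanically to a terminal caller response. This is a rough sketch: it assumes the router error code sits at `error.code` in the JSON body, which this guide does not confirm, and the function name is hypothetical. The header and `error.details` field names are the ones listed above.

```python
ROUTER_LIMIT_CODES = {"rpm-exceeded", "tpm-exceeded",
                      "concurrency-exceeded", "quota-exhausted"}


def classify_terminal_error(headers: dict, body: dict) -> str:
    """Separate router-side policy responses from terminal upstream failures."""
    headers = {k.lower(): v for k, v in headers.items()}
    error = body.get("error", {})
    details = error.get("details", {})
    upstream_status = headers.get("x-upstream-status")
    if upstream_status is None:
        upstream_status = details.get("upstream_status")
    if upstream_status is not None:
        return f"upstream:{upstream_status}"
    code = error.get("code")  # assumed location of the router error code
    if code in ROUTER_LIMIT_CODES:
        return "router-hard-limit"
    if code == "traffic-shaped":
        return "router-traffic-shaped"
    return "unclassified"


print(classify_terminal_error({"Retry-After": "2"},
                              {"error": {"code": "traffic-shaped"}}))
print(classify_terminal_error({"X-Upstream-Status": "429"},
                              {"error": {"details": {"upstream_status": 429}}}))
```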

Use usage reports or admin browser troubleshooting buckets for quota, TPM/RPM, concurrency, traffic-shaping bucket, input-token, and max-token signals. For exact field names and retention classes, see the Diagnostics Schema.
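A worked example helps explain how in-flight reservations exhaust TPM while budget remains. The model below is deliberately simplified and is an assumption, not the router's documented accounting: each request reserves its estimated input tokens plus its output cap, and the reservation is held until the request finishes.

```python
def admit(tpm_limit, in_flight_reservations, est_input_tokens, output_cap):
    """Return (admitted, reserved) under a simple sum-of-reservations model."""
    reserved = est_input_tokens + output_cap
    held = sum(in_flight_reservations)
    return held + reserved <= tpm_limit, reserved


tpm_limit = 200_000          # illustrative limit
in_flight = []
for n in range(1, 4):
    ok, reserved = admit(tpm_limit, in_flight,
                         est_input_tokens=60_000, output_cap=32_000)
    if ok:
        in_flight.append(reserved)
    print(f"request {n}: reserve={reserved} admitted={ok} held={sum(in_flight)}")
```

With these numbers each request reserves 92,000 tokens, so two concurrent requests hold 184,000 and a third is rejected, even though none of them has consumed any daily or monthly budget yet.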

Use the Traffic tuning advisor when the question is whether to increase burst, change queueing, slow a caller, tune provider capacity, or route around an incompatible target:

router-usage-report \
  --driver postgres \
  --dsn "$ROUTER_USAGE_DB_DSN" \
  --since 24h \
  --traffic-tuning-advisor \
  --caller-user '<owner-user>'

Examples:

User sees errors but all shaping buckets were admitted or absent: treat route_around_incompatible_target as a request-shape/provider compatibility issue. Inspect upstream failures and request-shape failures instead of increasing burst or queue depth.
User is being queued and cancellations increased: treat disable_queue_for_latency_sensitive_client as a signal to lower queue wait or fail fast for that client.
Provider 429s affect multiple users: treat investigate_provider_429_capacity as shared capacity or entitlement work. Tune provider/model shaping, adaptive backoff, route weights, or upstream account limits before increasing one caller's burst.
Provider 401/403/404 access failures are different from caller-token errors. Invalid provider credentials are terminal for the request; target-specific entitlement, region/project, policy, or model-access failures may route around to another eligible target. If all attempts fail this way, callers receive 503 upstream-access-denied with a request ID.
Large Cursor, Codex, Claude Code, or opencode payloads fail on selected upstreams: compare request-shape failure buckets, output-cap buckets, tool/modality metadata, and provider/model/dialect rows, then route around targets that cannot handle that shape.

5. Check Upstream Attempts

For a slow or failed request, inspect:

selected provider/model/dialect;
upstream status code;
upstream duration and TTFB;
timeout or cancellation flags;
retryability;
fallback transitions;
sanitized terminal error class.

If only one provider/model/dialect is failing, isolate that upstream before changing the broader model group. If all targets are failing, inspect shared config, network, license, database, or caller request shape.
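The isolate-or-broaden decision can be sketched as a grouping over attempt rows. The row keys used here (`provider`, `model`, `dialect`, `upstream_status`, `timeout`) are hypothetical stand-ins for whatever the attempt rows in the deployment expose.

```python
from collections import defaultdict


def failing_targets(attempts):
    """Return (scope, targets) where targets failed on every recorded attempt."""
    stats = defaultdict(lambda: [0, 0])  # target -> [failures, total]
    for a in attempts:
        key = (a["provider"], a["model"], a["dialect"])
        status = a.get("upstream_status")
        failed = bool(a.get("timeout")) or status is None or status >= 400
        stats[key][1] += 1
        if failed:
            stats[key][0] += 1
    failing = {k for k, (bad, total) in stats.items() if bad == total}
    if failing and failing == set(stats):
        scope = "all-targets"   # look at shared config, network, or request shape
    elif failing:
        scope = "isolated"      # isolate that upstream first
    else:
        scope = "none"
    return scope, sorted(failing)


attempts = [
    {"provider": "p1", "model": "m1", "dialect": "openai-chat", "upstream_status": 429},
    {"provider": "p1", "model": "m1", "dialect": "openai-chat",
     "upstream_status": None, "timeout": True},
    {"provider": "p2", "model": "m2", "dialect": "openai-responses", "upstream_status": 200},
]
print(failing_targets(attempts))
```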

6. Check Request Shape

Common request-shape causes:

requested model group does not support the API skin used by the client;
tool calls are sent to a target without validated tool support;
image input is sent to a text-only target;
estimated input plus output cap exceeds target context limits;
request bytes or tool schema bytes exceed configured target request-shape limits;
a forced tool-choice shape is unsupported by the selected upstream;
streaming behavior differs from the caller expectation.

Model-group contracts and provider catalog metadata should describe validated modalities, tools, dialects, pricing, and max-token behavior.

For missing reasoning controls, first call /v1/models with the same caller token. A reasoning-enabled group advertises supported_reasoning_levels and default_reasoning_level. If those fields are absent, check whether the caller is allowed to use the group, whether the reasoning target is active under models.<group>.targets[], and whether the active target skin matches the client surface: OpenAI Chat reasoning_effort, OpenAI Responses reasoning, or Anthropic Messages thinking. Catalog-only metadata does not make a group reasoning-capable.

For “small requests work but large Cursor/Codex/Claude Code requests fail,” inspect the request drilldown or usage DB rows for request_token_estimates, request_target_candidates, and request_target_filter_reasons. Safe fields to compare are estimated total input tokens, requested output cap, total reserved tokens, request bytes, target context_tokens, context headroom, request_bytes_fit, tool_schema_fit, and bounded reasons such as request-shape-context-exceeded, request-shape-max-request-bytes, or request-shape-tool-schema-bytes. Operators can replay a sanitized production-derived shape with scripts/prod_smoke_regressions.py when the deployment has a safe smoke caller and report DB access. The reference config uses large-openai-chat-tools-smoke for this validation path; grant authorized validation callers access to that smoke group for production or staging reruns instead of changing a broad production coding group. These diagnostics intentionally do not contain raw prompts, raw tool schemas, images, bearer tokens, token hashes, provider keys, or full config.
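When comparing those safe fields by hand, the arithmetic looks roughly like the sketch below. It is an approximation for triage, not the router's filter logic: the target keys `max_request_bytes` and `max_tool_schema_bytes` are hypothetical names for the configured request-shape limits, and total reserved tokens is assumed to be input estimate plus output cap.

```python
def shape_fit(est_input_tokens, output_cap, request_bytes, tool_schema_bytes, target):
    """Compute context headroom and the bounded reasons a target would be skipped."""
    reserved = est_input_tokens + output_cap
    headroom = target["context_tokens"] - reserved
    reasons = []
    if headroom < 0:
        reasons.append("request-shape-context-exceeded")
    max_bytes = target.get("max_request_bytes")          # hypothetical key
    if max_bytes is not None and request_bytes > max_bytes:
        reasons.append("request-shape-max-request-bytes")
    max_schema = target.get("max_tool_schema_bytes")     # hypothetical key
    if max_schema is not None and tool_schema_bytes > max_schema:
        reasons.append("request-shape-tool-schema-bytes")
    return {"total_reserved_tokens": reserved,
            "context_headroom": headroom,
            "reasons": reasons}


# Illustrative numbers only.
target = {"context_tokens": 128_000,
          "max_request_bytes": 1_000_000,
          "max_tool_schema_bytes": 65_536}
result = shape_fit(110_000, 32_000, 400_000, 90_000, target)
print(result)
```

A negative context headroom with small request bytes points at the output cap or input size, while a byte-limit reason with ample headroom points at tool schemas or payload encoding.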

For agent compatibility regressions, run the production-derived fixture matrix against a dedicated smoke group that the smoke caller is allowed to use:

python3 scripts/prod_smoke_regressions.py \
  --mode prod \
  --fixture all \
  --model-group reasoning-bridge-smoke

The fixture matrix covers Codex Responses reasoning/tools, Cursor Chat tools and bridge shapes, Claude Code Messages thinking/tools, opencode/aider Chat flows, large tool schemas, provider-skin mismatch, no-eligible diagnostics, upstream entitlement/fallback, and upstream error classification. Use the emitted request IDs to compare selected target, bridge direction, translated reasoning control, attempts, fallback, and sanitized error class in reports. If the deployment lacks a smoke group, caller access, or report DB access, record that as the blocker rather than changing an active production group solely for the test.

For Cursor-style OpenAI Chat requests with tools and an image, check whether any target in the requested group supports both OpenAI Chat tools and image input. If not, the router should return 502 no-eligible-target with zero upstream attempts. Candidate/filter rows should show safe reasons such as input-modality-image or dialect-tool-passthrough.
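The tools-plus-image check amounts to intersecting two capability requirements across the group's targets. In this sketch the target keys (`name`, `input_modalities`, `chat_tools`) are hypothetical, and pairing each reason string with a specific missing capability is an assumption made for illustration.

```python
def filter_targets(targets, needs_tools, needs_image):
    """Return (outcome, eligible target names, filter reasons per skipped target)."""
    eligible, filtered = [], {}
    for t in targets:
        reasons = []
        if needs_image and "image" not in t["input_modalities"]:
            reasons.append("input-modality-image")
        if needs_tools and not t["chat_tools"]:
            reasons.append("dialect-tool-passthrough")
        if reasons:
            filtered[t["name"]] = reasons
        else:
            eligible.append(t["name"])
    outcome = "route" if eligible else "502 no-eligible-target"
    return outcome, eligible, filtered


# Hypothetical group: one text-only target with tools, one vision target without.
targets = [
    {"name": "text-tools", "input_modalities": ["text"], "chat_tools": True},
    {"name": "vision-no-tools", "input_modalities": ["text", "image"], "chat_tools": False},
]
print(filter_targets(targets, needs_tools=True, needs_image=True))
```

Each target supports one of the two requirements, but none supports both, so the group has no eligible target for this request shape.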

For OpenAI Chat requests that are intended to use a Responses-only target, confirm the selected group has a target with dialect: openai-responses and bridges.chat_to_responses.enabled: true. Text requests need only the bridge opt-in. Tool requests also need bridge tools: true plus target tool_support.openai_responses; tool_choice, parallel tool calls, structured output, images, reasoning, and streaming each need matching bridge metadata. Reasoning requests also need compatible target reasoning metadata. If stateful sessions are enabled, the caller must send the configured session header and the deployment should be single-process or sticky-routed because the current backend is in-memory. Common filter reasons include chat-to-responses-bridge-disabled, chat-to-responses-streaming-unsupported, chat-to-responses-tools-unsupported, chat-to-responses-tool-choice-unsupported, chat-to-responses-reasoning-unsupported, and chat-to-responses-image-unsupported.
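The bridge-side part of that check can be sketched as a lookup from requested feature to filter reason. Only `enabled`, `tools`, and `reasoning` are bridge keys named in this guide; the `streaming`, `tool_choice`, and `images` key names are assumptions. The sketch also ignores target-side metadata such as tool_support.openai_responses.

```python
# Bridge flag -> filter reason when the request needs the feature but the flag is off.
BRIDGE_REASONS = {
    "streaming": "chat-to-responses-streaming-unsupported",
    "tools": "chat-to-responses-tools-unsupported",
    "tool_choice": "chat-to-responses-tool-choice-unsupported",
    "reasoning": "chat-to-responses-reasoning-unsupported",
    "images": "chat-to-responses-image-unsupported",
}


def bridge_filter_reasons(bridge_cfg: dict, request_features: set) -> list:
    """Return the filter reasons a Chat request would hit on this bridge config."""
    if not bridge_cfg.get("enabled", False):
        return ["chat-to-responses-bridge-disabled"]
    return [reason for feature, reason in BRIDGE_REASONS.items()
            if feature in request_features and not bridge_cfg.get(feature, False)]


cfg = {"enabled": True, "tools": True}   # stands in for bridges.chat_to_responses
print(bridge_filter_reasons(cfg, {"tools", "streaming"}))
```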

When the bridge is selected, request evidence should show inbound_dialect as openai-chat, target/attempt dialect as openai-responses, bridge_direction = chat_to_responses, and a translation-shape row for /v1/responses. Stateful session validation should show a first upstream request without previous_response_id and a second same-session upstream request with the prior Responses id; operational traces use safe event names such as bridge_session_requested and bridge_session_previous_response_applied. If an injected continuation is stale, expected trace events are bridge_session_previous_response_stale_purged and bridge_session_stateless_retry. Use translation field events for safe field names and actions only; they intentionally do not store prompts, tool schemas, tool outputs, images, session header values, bearer tokens, token hashes, provider keys, or full config.
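For stateful session validation, the continuation rule can be checked over an ordered list of upstream requests from one session. This sketch assumes a validation harness can observe, for each upstream request, the `previous_response_id` that was sent and the `response_id` that came back; those key names are hypothetical.

```python
def check_session_chain(upstream_requests):
    """Return a list of problems; an empty list means the chain looks correct."""
    if not upstream_requests:
        return ["no upstream requests recorded"]
    problems = []
    if upstream_requests[0].get("previous_response_id") is not None:
        problems.append("first request already carries previous_response_id")
    # Every later request should continue from the response just before it.
    for prev, cur in zip(upstream_requests, upstream_requests[1:]):
        if cur.get("previous_response_id") != prev.get("response_id"):
            problems.append(f"continuation mismatch after {prev.get('response_id')}")
    return problems


good_chain = [
    {"previous_response_id": None, "response_id": "resp_1"},
    {"previous_response_id": "resp_1", "response_id": "resp_2"},
]
print(check_session_chain(good_chain))
```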

If every target is skipped, callers receive 502 no-eligible-target with a request ID before any upstream attempt is made. Recovery is usually a config change: add accurate context_tokens or request_shape_support, remove a too-small target from the affected group, or keep the target in a smoke group until a large-payload validation passes.

For /v1/responses requests that should be able to use Chat-only upstreams, check whether the target has explicit responses_to_chat metadata for the requested shape. Chat-bridged targets skip unsupported fields before upstream. Common safe filter reasons include responses-to-chat-bridge-disabled, responses-to-chat-previous-response-id, responses-to-chat-hosted-tools, responses-to-chat-tool-choice, responses-to-chat-image, responses-to-chat-reasoning, responses-to-chat-structured-output, and responses-to-chat-streaming. A successful bridge keeps the caller-facing response in Responses format while usage shows inbound openai-responses, target openai-chat, and bridge direction responses_to_chat; reasoning-preserving bridge attempts also show translated_reasoning_control = reasoning_effort.

For reasoning bridge failures, use the request ID to compare four safe fields: inbound dialect, selected or skipped target dialect, bridge_direction, and translated_reasoning_control. A native Chat reasoning pass should show reasoning_effort; native Responses should show reasoning; native Messages should show thinking. Chat-to-Responses reasoning should additionally show bridge_direction = chat_to_responses and requires bridges.chat_to_responses.reasoning: true. Responses-to-Chat reasoning should fail with responses-to-chat-reasoning unless the target explicitly validates responses_to_chat.reasoning.
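The four-field comparison can be written as a lookup of documented expectations. Several details here are assumptions: the `anthropic-messages` dialect name, the idea that native passes record no bridge direction, and the row key names. Only combinations whose expected values are stated in this guide are listed; anything else is reported as undocumented instead of guessed.

```python
# (inbound dialect, target dialect) -> (bridge_direction, translated_reasoning_control)
EXPECTED = {
    ("openai-chat", "openai-chat"): (None, "reasoning_effort"),
    ("openai-responses", "openai-responses"): (None, "reasoning"),
    ("anthropic-messages", "anthropic-messages"): (None, "thinking"),  # name assumed
    ("openai-responses", "openai-chat"): ("responses_to_chat", "reasoning_effort"),
}


def check_reasoning_row(row: dict) -> list:
    """Compare one evidence row against the documented expectation."""
    key = (row["inbound_dialect"], row["target_dialect"])
    if key not in EXPECTED:
        return [f"no documented expectation for {key[0]} -> {key[1]}"]
    direction, control = EXPECTED[key]
    problems = []
    if row.get("bridge_direction") != direction:
        problems.append(f"bridge_direction should be {direction}")
    if row.get("translated_reasoning_control") != control:
        problems.append(f"translated_reasoning_control should be {control}")
    return problems


row = {"inbound_dialect": "openai-responses", "target_dialect": "openai-chat",
       "bridge_direction": "responses_to_chat",
       "translated_reasoning_control": "reasoning"}
print(check_reasoning_row(row))
```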

Examples of request-shape confusion:

Codex usually sends /v1/responses; a group with only Chat targets needs a validated Responses-to-Chat bridge, and previous_response_id is not supported by the stateless bridge.
Claude Code sends /v1/messages; a Chat or Responses reasoning target does not satisfy Anthropic thinking unless an Anthropic-compatible target is active.
Cursor, opencode, and aider may use OpenAI Chat or Anthropic-compatible shapes depending on client configuration. If a client sends a Responses-style reasoning object to /v1/chat/completions, troubleshoot the stored Chat shape and bridge metadata rather than assuming a Responses target was used.

7. Verify Recovery

After a config, credential, quota, or upstream fix:

curl -fsS "$ROUTER_BASE_URL/readyz"

curl -fsS "$ROUTER_BASE_URL/v1/chat/completions" \
  -H "Authorization: Bearer $ROUTER_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "replace-with-allowed-model-group",
    "messages": [{"role": "user", "content": "Reply OK only."}],
    "max_tokens": 16
  }'

Then confirm the request appears in usage reports with the expected provider/model, status, latency, cost, and fallback state.
