Deploy To Kubernetes
Use Kubernetes when GenAI Smart Router needs to run inside a customer-managed cluster with cluster-native ingress, Secrets, external Postgres, and operational controls. This is the canonical Kubernetes installation page; post-deployment topology guidance lives in Enterprise Deployment Patterns.
Metrum maintains raw Kustomize-friendly manifests under deploy/kubernetes/ as a production-oriented starting point. The manifests are examples. Review them against your cluster's ingress controller, network policy engine, storage class, registry, and secret-management process before production rollout. If you are installing from a package that does not include deploy/kubernetes/, obtain the matching Kubernetes manifest bundle from Metrum or from the reviewed source artifact for that release.
Prerequisites
- A Kubernetes cluster with an ingress controller and TLS automation or a separate TLS termination plan.
- A private registry image tag such as registry.example.com/smart-llmrouter:<version>-linux-amd64.
- External Postgres for the usage database.
- A Metrum-issued license.json.
- Provider credentials stored in a Kubernetes Secret or external secret manager.
- A router config reviewed for the deployment's model groups, callers, admin auth, and reporting settings.
Image And Architecture
Release Docker packages include per-architecture image tarballs:
| Node architecture | Package image tag example |
|---|---|
| amd64 / x86_64 | <version>-linux-amd64 |
| arm64 / Graviton | <version>-linux-arm64 |
For private clusters, load the image tar into nodes or push it to a private registry:
docker load -i images/smart-llmrouter-<version>-linux-amd64.tar
docker tag smart-llmrouter:<version>-linux-amd64 registry.example.com/smart-llmrouter:<version>-linux-amd64
docker push registry.example.com/smart-llmrouter:<version>-linux-amd64
Mixed-architecture clusters need separate per-architecture tags or a registry-managed multi-architecture manifest. Do not use a per-architecture tarball as if it were a multi-architecture image.
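As a minimal sketch of picking the right per-architecture tag, the helper below maps an architecture name to the tag suffix shown in the table above. The function name arch_suffix is hypothetical and not part of the release package. It accepts both uname -m style names and the names Kubernetes reports in the kubernetes.io/arch node label.

```shell
# Hypothetical helper: map an architecture name to the package tag suffix.
# Accepts `uname -m` style names (x86_64, aarch64) and
# Kubernetes-style names (amd64, arm64).
arch_suffix() {
  case "$1" in
    x86_64|amd64)  echo "linux-amd64" ;;
    aarch64|arm64) echo "linux-arm64" ;;
    *)
      # Anything else has no matching image tarball in the table above.
      echo "unsupported architecture: $1" >&2
      return 1
      ;;
  esac
}
```

For example, arch_suffix "$(uname -m)" prints the suffix for the local machine, and kubectl get nodes -L kubernetes.io/arch lists the architecture of each node so mixed clusters are visible before an image is pushed.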
Runtime Contract
A Kubernetes deployment needs:
| Area | Required design |
|---|---|
| Namespace | A deployment-owned namespace with least-privilege RBAC. |
| Router config | A reviewed ConfigMap or mounted config artifact for config.yaml, depending on the customer's config-handling policy. |
| Provider keys | A Secret or external secret integration that injects provider credentials as env vars or env.json. |
| License | A Secret containing the Metrum-issued license.json, mounted at the path configured in server.license.path. |
| State | Durable license state and router state when the deployment design requires file-backed state. |
| Usage database | External Postgres is recommended for production Kubernetes deployments. |
| Workload | A Deployment for stateless router pods unless the state design requires a different controller. |
| Network | Service, Ingress or Gateway, TLS, and NetworkPolicy for clients, admin surfaces, database, and upstream providers or private model services. |
| Health | Readiness on /readyz and liveness on /healthz. |
| Resources | Requests and limits sized for request concurrency, streaming traffic, and admin report queries. |
Do not put provider keys, raw router tokens, token hashes, private signing material, or full production configs into public manifests, public docs, issue comments, or screenshots.
Manifests
When the release includes Kubernetes examples, the base manifests live in:
deploy/kubernetes/base/
deploy/kubernetes/overlays/example/
They include:
- Namespace and ServiceAccount with service account token mounting disabled;
- ConfigMap for non-secret router config;
- placeholder Secret example for provider env, license JSON, and Postgres DSN;
- Deployment with /readyz readiness/startup probes and /healthz liveness probe;
- Service, example Ingress, NetworkPolicy, PersistentVolumeClaim, and PodDisruptionBudget;
- an example overlay for image and ingress replacement.
The suggested layout is:
namespace/
router-config ConfigMap or mounted config artifact
provider-keys Secret or external secret reference
license Secret
router Deployment
router Service
router Ingress or Gateway
NetworkPolicy
The router container should run the packaged image tag for the target release, not latest. The config should point to mounted paths such as:
server:
  listen: ":8080"
  license:
    enabled: true
    path: /app/config/license.json
    state_path: /app/state/license-state.json
  usage_db:
    enabled: true
    driver: postgres
    dsn: ${ROUTER_USAGE_DB_DSN}
  state_path: /app/state/router-state.json
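The rule that the router container runs a release tag rather than latest can be checked mechanically before anything is applied. The sketch below is one way to do that against a rendered manifest file. The function name check_no_latest is hypothetical, and the check only looks for an explicit latest tag; it does not detect untagged images.

```shell
# Hypothetical pre-apply check: fail if any image reference in a rendered
# manifest uses the mutable "latest" tag. Untagged images are not detected.
check_no_latest() {
  manifest="$1"
  # Matches lines such as `image: registry.example.com/app:latest`,
  # with or without quotes around the image reference.
  pattern="image:[[:space:]]*[\"']?[^[:space:]\"']+:latest[\"']?[[:space:]]*\$"
  if grep -nE "$pattern" "$manifest" >&2; then
    echo "error: image pinned to latest in $manifest" >&2
    return 1
  fi
  return 0
}
```

A typical call would be check_no_latest /tmp/smart-llmrouter.yaml against the file rendered in the Deploy section, with the apply step skipped when the function returns non-zero.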
Secrets And Config
Create deployment-owned secrets before applying the router workload. Do not commit real secret files.
kubectl create namespace smart-llmrouter
kubectl -n smart-llmrouter create secret generic smart-llmrouter-secrets \
--from-literal=ROUTER_USAGE_DB_DSN='postgres://llmrouter:replace-with-password@postgres.example.internal:5432/llmrouter?sslmode=require' \
--from-file=env.json=./env.json \
--from-file=license.json=./license.json
The router config is mounted at /app/config/config.yaml. Provider keys are mounted at /app/config/env.json. The signed license is mounted read-only at /app/config/license.json. Durable license and router state are written under /app/state.
For production, use an external secret manager or sealed-secret workflow if that is the cluster standard. Keep raw provider keys, router tokens, token hashes, license files, and DSNs out of tickets, screenshots, and public docs.
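A small pre-flight check can catch a missing or empty env.json or license.json before the Secret is created. The sketch below is illustrative only: check_secret_inputs is a hypothetical name, and it verifies presence and non-zero size, not JSON validity or license signature.

```shell
# Hypothetical pre-flight check: every listed file must exist and be non-empty.
# Reports all problem files, then returns 1 if any were found.
check_secret_inputs() {
  missing=0
  for f in "$@"; do
    if [ ! -s "$f" ]; then
      echo "missing or empty secret input: $f" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, running check_secret_inputs ./env.json ./license.json before the kubectl create secret command stops the workflow early when an input file is absent.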
Usage Database And State
Use external Postgres for production usage reporting:
server:
  usage_db:
    enabled: true
    driver: postgres
    dsn: ${ROUTER_USAGE_DB_DSN}
The example uses a PVC for file-backed router and license state. Back up the PVC or move state to a deployment-approved durable store if that is supported by your router version. Keep the initial deployment at one replica unless the state, quota, and license behavior has been validated for the chosen scaling design.
Deploy
The base kustomization intentionally does not apply secret.example.yaml; create smart-llmrouter-secrets through your secret-management process first, then render and inspect the manifests before applying them:
kubectl kustomize deploy/kubernetes/overlays/example > /tmp/smart-llmrouter.yaml
kubectl apply --dry-run=server -f /tmp/smart-llmrouter.yaml
kubectl apply -f /tmp/smart-llmrouter.yaml
If server-side dry-run is unavailable, use client-side dry-run as a syntax check:
kubectl apply --dry-run=client -f /tmp/smart-llmrouter.yaml
Check rollout:
kubectl -n smart-llmrouter rollout status deploy/smart-llmrouter
kubectl -n smart-llmrouter get pods,svc,ingress
Network Policy
The example policy allows ingress from a placeholder ingress controller namespace, HTTPS egress for external model providers, DNS egress, and Postgres egress to an example private CIDR. Update it for:
- the actual ingress controller namespace labels;
- approved provider endpoints or private upstream ranges;
- external Postgres address ranges;
- internal observability endpoints if required.
Network policy enforcement depends on the cluster CNI. Validate both allowed and denied flows in staging.
Smoke Tests
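Before exposing ingress, run an internal smoke test through a port-forward. kubectl port-forward runs in the foreground, so start it in one terminal and run the curl checks in another: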
kubectl -n smart-llmrouter port-forward svc/smart-llmrouter 18080:80
curl -fsS http://127.0.0.1:18080/readyz
curl -fsS http://127.0.0.1:18080/docs/
curl -fsS http://127.0.0.1:18080/version
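The first curl can fail if it runs before the port-forward is established or before the pod reports ready. One way to handle that, sketched here under the assumption that a short fixed delay between attempts is acceptable, is a generic retry wrapper. The name retry and its argument order are hypothetical.

```shell
# Hypothetical helper: run a command until it succeeds or attempts run out.
# Usage: retry <attempts> <delay-seconds> <command> [args...]
retry() {
  attempts="$1"
  delay="$2"
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    # Sleep only when another attempt will follow.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  echo "gave up after $attempts attempts: $*" >&2
  return 1
}
```

For example, retry 30 2 curl -fsS http://127.0.0.1:18080/readyz polls the readiness endpoint for roughly a minute before giving up.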
Then test with a deployment-issued caller token:
kubectl -n <namespace> rollout status deploy/<router-deployment>
kubectl -n <namespace> get pods,svc,ingress
export ROUTER_BASE_URL="https://router.example.com"
export ROUTER_TOKEN="replace-with-router-token"
curl -fsS "$ROUTER_BASE_URL/readyz"
curl -fsS "$ROUTER_BASE_URL/docs/"
curl -fsS -H "Authorization: Bearer $ROUTER_TOKEN" \
"$ROUTER_BASE_URL/v1/models"
Run one small request through each client API shape that callers use:
- OpenAI Chat for /v1/chat/completions;
- OpenAI Responses or Codex CLI when enabled;
- Anthropic Messages or Claude Code when enabled;
- tool-call and image smokes for groups that advertise those capabilities.
When admin reports are enabled, verify browser-admin authentication and authorization separately. Ordinary caller tokens must not access /admin/reports/ or /metrics.
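The negative check for caller tokens can be scripted as well. The sketch below assumes that a rejected request returns HTTP 401 or 403; that mapping is an assumption to confirm against your router version, and both function names are hypothetical. ROUTER_BASE_URL and ROUTER_TOKEN are the variables exported in the smoke test above.

```shell
# Hypothetical helper: succeed only when a status code means "rejected".
# Treating 401 and 403 as denials is an assumption, not documented behavior.
is_denied() {
  case "$1" in
    401|403) return 0 ;;
    *)       return 1 ;;
  esac
}

# Request a path with the caller token, keep only the HTTP status code,
# and fail if the request was not rejected. A connection failure makes
# curl report 000, which also counts as a failed check.
check_caller_denied() {
  status="$(curl -s -o /dev/null -w '%{http_code}' \
    -H "Authorization: Bearer $ROUTER_TOKEN" "$ROUTER_BASE_URL$1")"
  if ! is_denied "$status"; then
    echo "caller token was not denied on $1 (HTTP $status)" >&2
    return 1
  fi
}
```

Calling check_caller_denied /admin/reports/ and check_caller_denied /metrics after the positive smoke tests covers both paths named above.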
Upgrade And Rollback
Use immutable image tags and reviewed config changes. A typical upgrade sequence:
- Back up the router ConfigMap, Secret references, PVC or state backup, and usage database.
- Push the new per-architecture image tag to the private registry.
- Update the overlay image patch.
- Run kubectl apply --dry-run=server.
- Apply and wait for rollout.
- Smoke /readyz, /docs/, /v1/models, one chat request, and admin reports if enabled.
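Rollback uses the previous image tag and previous ConfigMap/Secret versions. kubectl rollout undo restores only the previous Deployment revision, including its image tag; if ConfigMap or Secret contents changed, re-apply the backed-up versions as well: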
kubectl -n smart-llmrouter rollout undo deploy/smart-llmrouter
kubectl -n smart-llmrouter rollout status deploy/smart-llmrouter
If a database migration or config change caused the failure, restore from the pre-upgrade backup before resuming traffic.
Troubleshooting
- Pod not ready: check license path, Postgres DSN, provider env file mount, and /readyz logs.
- CrashLoopBackOff: run kubectl -n smart-llmrouter logs deploy/smart-llmrouter and verify the mounted config parses.
- /v1/models empty: confirm the caller token allow-list and model group config.
- Provider errors: validate cluster egress, provider keys, and direct upstream smokes.
- Admin reports unavailable: confirm server.admin_reports.enabled, browser-admin auth, Casbin grants, and usage DB connectivity.
Support requests never need raw provider keys or router tokens, in screenshots or anywhere else. Share request IDs, status codes, sanitized logs, and safe configuration summaries instead.
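When a configuration summary has to mention the usage database, the password portion of the DSN can be masked first. The sketch below assumes a URL-style DSN of the form scheme://user:password@host/... like the example earlier on this page; redact_dsn is a hypothetical name, and other DSN formats would need a different pattern.

```shell
# Hypothetical helper: mask the password in a URL-style DSN so the value
# is safer to paste into a ticket. A DSN without a password is unchanged.
redact_dsn() {
  # Keep "://user", replace everything between the colon and the "@".
  printf '%s\n' "$1" | sed -E 's#(://[^:/@]+):[^@]*@#\1:***@#'
}
```

For example, redact_dsn "$ROUTER_USAGE_DB_DSN" prints the DSN with the password replaced by three asterisks while leaving the host, port, database, and query parameters visible.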