Report Examples
This page shows anonymized production-derived report excerpts. Names, public token IDs, request IDs, IP addresses, provider names, upstream endpoint names, model names, and deployment-specific group names have been replaced with generic labels. The examples preserve the shape of the reporting output so platform teams can see how GenAI Smart Router supports cost governance and performance triage.
The savings examples compare stored router cost against a documented reference baseline:
| Baseline | Input price | Output price |
|---|---|---|
| Reference baseline A | $5.00 / 1M input tokens | $30.00 / 1M output tokens |
Savings are calculated as:
reference_cost = input_tokens / 1,000,000 * reference_input_price
+ output_tokens / 1,000,000 * reference_output_price
savings = reference_cost - stored_router_cost
Router cost is the stored request-time cost from usage rows. It is not recalculated from current provider metadata.
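The savings formula can be sketched as a small Python function. This sketch assumes prices are quoted in US dollars per one million tokens and that `stored_router_cost` is taken as-is from the usage rows rather than recalculated. The function and constant names are illustrative, not part of the router or the report tool. Feeding in the 2026-06-17 row from the daily table reproduces that row's reference cost and savings.

```python
from decimal import Decimal, ROUND_HALF_UP

# Reference baseline A, in USD per 1M tokens (from the baseline table).
REFERENCE_INPUT_PRICE = Decimal("5.00")
REFERENCE_OUTPUT_PRICE = Decimal("30.00")
ONE_MILLION = Decimal(1_000_000)
CENT = Decimal("0.01")


def reference_cost(input_tokens: int, output_tokens: int) -> Decimal:
    """Cost of the same token volume under the reference baseline."""
    return (
        Decimal(input_tokens) / ONE_MILLION * REFERENCE_INPUT_PRICE
        + Decimal(output_tokens) / ONE_MILLION * REFERENCE_OUTPUT_PRICE
    )


def savings(input_tokens: int, output_tokens: int, stored_router_cost: Decimal) -> Decimal:
    """Reference cost minus the stored request-time router cost (not recalculated)."""
    return reference_cost(input_tokens, output_tokens) - stored_router_cost


def to_cents(amount: Decimal) -> Decimal:
    """Round to whole cents for display, as in the report tables."""
    return amount.quantize(CENT, rounding=ROUND_HALF_UP)


if __name__ == "__main__":
    # 2026-06-17: 78,767,512 input tokens, 1,058,468 output tokens, $14.62 router cost.
    print(to_cents(reference_cost(78_767_512, 1_058_468)))                # 425.59
    print(to_cents(savings(78_767_512, 1_058_468, Decimal("14.62"))))    # 410.97
```

`Decimal` is used instead of `float` so that summing many small per-request costs does not accumulate binary rounding error before the final cent rounding.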
Graphical Summary
Chart panels: Traffic (daily request volume); Savings (router cost vs reference baseline); User Experience (average latency by day); Caller Cohorts (savings by anonymized team, average latency by anonymized team); Upstream Performance (slowest anonymized upstream endpoints).
Daily Usage And Savings
This excerpt answers: how much traffic ran each day, what did it cost through the router, what would it have cost under the reference baseline, and what user-facing latency did the deployment see?
| Day (UTC) | Calls | Errors | Input tokens | Output tokens | Router cost | Reference cost | Savings | Avg latency (ms) | Max latency (ms) | Avg upstream output (tok/s) |
|---|---|---|---|---|---|---|---|---|---|---|
| 2026-06-17 | 2,277 | 236 | 78,767,512 | 1,058,468 | $14.62 | $425.59 | $410.97 | 13,390 | 297,537 | 38.57 |
| 2026-06-18 | 2,773 | 372 | 61,897,631 | 1,520,263 | $45.20 | $355.10 | $309.90 | 11,782 | 303,098 | 50.52 |
| 2026-06-19 | 2,782 | 134 | 137,590,878 | 1,991,097 | $56.54 | $747.69 | $691.15 | 11,966 | 267,865 | 56.89 |
| 2026-06-20 | 1,725 | 71 | 86,558,339 | 708,877 | $31.75 | $454.06 | $422.31 | 10,665 | 121,148 | 52.59 |
| 2026-06-21 | 1,333 | 28 | 200,204,234 | 420,117 | $70.47 | $1,013.62 | $943.15 | 7,765 | 120,462 | 43.60 |
| 2026-06-22 | 3,142 | 349 | 116,476,313 | 1,037,342 | $135.54 | $613.50 | $477.97 | 8,224 | 259,202 | 47.22 |
| Total | 14,032 | 1,190 | 681,494,907 | 6,736,164 | $354.12 | $3,609.56 | $3,255.44 | 10,632 | 303,098 | 48.23 |
The daily view is useful for spend reviews, quota planning, rollout comparisons, and incident timelines. A rising error count or max latency spike can be investigated with the upstream endpoint and per-request sections in the generated report.
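As a starting point for incident timelines, a daily excerpt can be screened for days worth investigating. This is a minimal sketch: the `DailyRow` type, the 10% error-rate threshold, and the 250,000 ms latency-spike threshold are hypothetical choices for illustration, not values the report applies. The rows are copied from the daily table above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DailyRow:
    day: str
    calls: int
    errors: int
    max_latency_ms: int


# Values copied from the daily usage excerpt.
DAILY = [
    DailyRow("2026-06-17", 2_277, 236, 297_537),
    DailyRow("2026-06-18", 2_773, 372, 303_098),
    DailyRow("2026-06-19", 2_782, 134, 267_865),
    DailyRow("2026-06-20", 1_725, 71, 121_148),
    DailyRow("2026-06-21", 1_333, 28, 120_462),
    DailyRow("2026-06-22", 3_142, 349, 259_202),
]

# Hypothetical triage thresholds; tune them to the deployment's own targets.
MAX_ERROR_RATE = 0.10
MAX_LATENCY_SPIKE_MS = 250_000


def days_to_investigate(rows: list[DailyRow]) -> list[tuple[str, list[str]]]:
    """Return (day, reasons) for every day that crosses at least one threshold."""
    flagged = []
    for row in rows:
        reasons = []
        error_rate = row.errors / row.calls if row.calls else 0.0
        if error_rate > MAX_ERROR_RATE:
            reasons.append(f"error rate {error_rate:.1%}")
        if row.max_latency_ms > MAX_LATENCY_SPIKE_MS:
            reasons.append(f"max latency {row.max_latency_ms:,} ms")
        if reasons:
            flagged.append((row.day, reasons))
    return flagged


if __name__ == "__main__":
    for day, reasons in days_to_investigate(DAILY):
        print(day, "; ".join(reasons))
```

Flagged days would then be cross-checked against the upstream endpoint and per-request sections for the same window.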
Caller Usage And Savings
This excerpt answers: which caller cohorts are driving spend, savings, token volume, and latency?
| Caller cohort | Project | Client | Calls | Errors | Input tokens | Output tokens | Router cost | Reference cost | Savings | Avg latency (ms) | Max latency (ms) | Avg downstream write output (tok/s) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Team A | Project A | Client A | 1,634 | 65 | 187,520,679 | 802,818 | $64.50 | $961.69 | $897.19 | 13,442 | 121,148 | 509,579.56 |
| Team B | Project B | Client B | 1,014 | 21 | 175,094,108 | 292,520 | $63.80 | $884.25 | $820.44 | 7,849 | 120,462 | 291,263.34 |
| Team C | Project C | Client C | 1,595 | 124 | 50,312,249 | 900,358 | $41.48 | $278.57 | $237.09 | 16,988 | 303,098 | 696,380.87 |
| Team D | Project D | Client D | 470 | 51 | 42,551,897 | 222,225 | $12.96 | $219.43 | $206.47 | 10,643 | 121,009 | 528,595.47 |
| Team E | Project E | Client E | 498 | 27 | 36,619,786 | 146,357 | $86.28 | $187.49 | $101.21 | 5,363 | 61,406 | 78,517.37 |
| Team F | Project F | Client F | 784 | 64 | 32,061,835 | 330,830 | $5.30 | $170.23 | $164.93 | 14,971 | 295,382 | 460,312.09 |
The caller view supports chargeback, quota reviews, and support triage. For example, a team with moderate cost but high average latency may need a different model-group policy, a higher timeout, or a provider mix with stronger streaming behavior.
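The "moderate cost but high average latency" case can be picked out of a caller excerpt with a simple relative rule. This sketch assumes a hypothetical heuristic (latency above the cohort median and router cost at or below the cohort median); it is an illustration, not a feature of the generated report. The figures are copied from the caller table above, where the medians work out to $52.64 and 12,042.5 ms.

```python
from statistics import median

# (cohort, router cost in USD, average latency in ms) from the caller excerpt.
COHORTS = [
    ("Team A", 64.50, 13_442),
    ("Team B", 63.80, 7_849),
    ("Team C", 41.48, 16_988),
    ("Team D", 12.96, 10_643),
    ("Team E", 86.28, 5_363),
    ("Team F", 5.30, 14_971),
]


def slow_but_cheap(cohorts: list[tuple[str, float, int]]) -> list[str]:
    """Cohorts with above-median latency and at-or-below-median router cost.

    The median-based rule is an illustrative assumption for policy reviews.
    """
    cost_median = median(cost for _, cost, _ in cohorts)
    latency_median = median(latency for _, _, latency in cohorts)
    return [
        name
        for name, cost, latency in cohorts
        if latency > latency_median and cost <= cost_median
    ]


if __name__ == "__main__":
    print(slow_but_cheap(COHORTS))  # ['Team C', 'Team F']
```

Relative thresholds keep the rule usable across windows of different size; an absolute latency target would be the alternative when a service-level objective already exists.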
Upstream Endpoint Performance
This excerpt answers: which anonymized upstream endpoint/API combinations are slow, error-prone, or fallback-heavy?
| Endpoint | API shape | Calls | Errors | Attempts | Fallbacks | Total tokens | Router cost | Avg upstream (ms) | Max upstream (ms) | Avg upstream output (tok/s) | Avg latency (ms) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Endpoint A | API A | 5 | 1 | 11 | 1 | 6,742 | $0.01 | 42,969 | 132,688 | 46.75 | 42,972 |
| Endpoint B | API B | 561 | 57 | 700 | 47 | 33,287,316 | $1.31 | 36,077 | 303,067 | 22.66 | 36,208 |
| Endpoint C | API C | 221 | 6 | 232 | 3 | 19,435,221 | $4.04 | 25,410 | 119,132 | 76.69 | 25,908 |
| Endpoint D | API D | 10 | 0 | 10 | 0 | 212,619 | $0.09 | 22,855 | 122,641 | 38.75 | 22,935 |
| Endpoint E | API E | 52 | 0 | 94 | 30 | 2,201,353 | $0.24 | 21,015 | 86,371 | 122.02 | 21,019 |
| Endpoint F | API F | 262 | 3 | 375 | 111 | 1,870,726 | $1.98 | 20,581 | 205,667 | 55.28 | 19,419 |
| Endpoint G | API G | 37 | 0 | 36 | 0 | 1,938,639 | $2.65 | 20,572 | 116,179 | 20.43 | 20,441 |
| Endpoint H | API H | 12 | 0 | 12 | 0 | 288,474 | $0.10 | 19,272 | 64,730 | 34.41 | 19,459 |
The endpoint view is the fastest place to identify upstreams that degrade user experience. A high average upstream duration, low output-token throughput, attempt counts well above call counts, or frequent fallbacks can justify lowering a target weight, changing the timeout policy, isolating a provider to lower-priority groups, or opening a provider incident.
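The four signals named above can be computed per endpoint from the excerpt's columns. This is a sketch under assumptions: the `EndpointRow` type and all four thresholds (30,000 ms, 25 tok/s, 1.25 attempts per call, 10% fallbacks per call) are hypothetical values chosen for illustration, not limits the router enforces. The rows are copied from the endpoint table above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EndpointRow:
    name: str
    calls: int
    errors: int
    attempts: int
    fallbacks: int
    avg_upstream_ms: int
    avg_output_tok_s: float


# Subset of columns from the upstream endpoint excerpt.
ENDPOINTS = [
    EndpointRow("Endpoint A", 5, 1, 11, 1, 42_969, 46.75),
    EndpointRow("Endpoint B", 561, 57, 700, 47, 36_077, 22.66),
    EndpointRow("Endpoint C", 221, 6, 232, 3, 25_410, 76.69),
    EndpointRow("Endpoint D", 10, 0, 10, 0, 22_855, 38.75),
    EndpointRow("Endpoint E", 52, 0, 94, 30, 21_015, 122.02),
    EndpointRow("Endpoint F", 262, 3, 375, 111, 20_581, 55.28),
    EndpointRow("Endpoint G", 37, 0, 36, 0, 20_572, 20.43),
    EndpointRow("Endpoint H", 12, 0, 12, 0, 19_272, 34.41),
]

# Hypothetical thresholds for the four signals; adjust per deployment.
SLOW_UPSTREAM_MS = 30_000
LOW_TOK_S = 25.0
HIGH_ATTEMPTS_PER_CALL = 1.25
HIGH_FALLBACK_RATE = 0.10


def endpoint_signals(row: EndpointRow) -> list[str]:
    """Return the names of the signals this endpoint trips, in a fixed order."""
    signals = []
    if row.avg_upstream_ms > SLOW_UPSTREAM_MS:
        signals.append("slow upstream")
    if row.avg_output_tok_s < LOW_TOK_S:
        signals.append("low throughput")
    if row.calls and row.attempts / row.calls > HIGH_ATTEMPTS_PER_CALL:
        signals.append("elevated attempts")
    if row.calls and row.fallbacks / row.calls > HIGH_FALLBACK_RATE:
        signals.append("fallback pressure")
    return signals


if __name__ == "__main__":
    for row in ENDPOINTS:
        signals = endpoint_signals(row)
        if signals:
            print(row.name, ", ".join(signals))
```

Ratios are taken per call rather than as raw counts so that a low-volume endpoint such as Endpoint A, with 11 attempts for 5 calls, still stands out.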
Report Generation Commands
Generate a usage report for a recent window:
router-usage-report \
--driver postgres \
--dsn "$ROUTER_USAGE_DB_DSN" \
--since 7d \
--out usage-7d.md
Generate a report for one caller cohort or rollout:
router-usage-report \
--driver postgres \
--dsn "$ROUTER_USAGE_DB_DSN" \
--caller-user <owner-user> \
--caller-project <project> \
--caller-environment <environment> \
--out usage-project.md
Generated markdown reports include usage by internal key, caller, project, environment, caller IP, model group, provider/model, status, cache behavior, hour, day, downstream user performance, upstream endpoint performance, and per-request throughput. Savings tables are produced by comparing the generated report totals with a separately documented reference baseline.
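When a rollout spans several projects, the per-cohort command can be scripted. This is a hedged sketch: it uses only the flags shown in the commands above, while the helper names, the project list, and the `usage-<project>.md` output naming are hypothetical. The DSN is read from `ROUTER_USAGE_DB_DSN`, as in the shell examples.

```python
import os
import subprocess


def cohort_report_argv(dsn: str, user: str, project: str, environment: str, out: str) -> list[str]:
    """Argument list mirroring the per-cohort router-usage-report command."""
    return [
        "router-usage-report",
        "--driver", "postgres",
        "--dsn", dsn,
        "--caller-user", user,
        "--caller-project", project,
        "--caller-environment", environment,
        "--out", out,
    ]


def generate_cohort_reports(user: str, environment: str, projects: list[str]) -> None:
    """Run one report per project; output file naming is a hypothetical choice."""
    dsn = os.environ["ROUTER_USAGE_DB_DSN"]
    for project in projects:
        argv = cohort_report_argv(dsn, user, project, environment, f"usage-{project}.md")
        # A list argv (no shell) avoids quoting issues and keeps the DSN out of shell history.
        subprocess.run(argv, check=True)
```

`check=True` stops the batch at the first failing report, which keeps a partial set of reports from being mistaken for a complete rollout comparison.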