The Cost Complexity Problem
LLM pricing is not simple. OpenAI charges per token with different rates for input and output. Anthropic has tiered pricing based on model size. Google Vertex AI offers both per-token and provisioned throughput pricing. AWS Bedrock has on-demand, provisioned, and batch pricing modes. Some providers charge per request, others per minute of audio, others per image generated.
When your application routes to models across a dozen providers, building a unified cost tracking system from scratch is a significant engineering effort. Pricing changes frequently, new models launch weekly, and edge cases abound (cached tokens, batched requests, fine-tuned model multipliers).
NemoRouter solves this by computing the exact cost on every request and reporting it back to you, so your application, your dashboard, and your credit balance all agree on one number.
One Authoritative Cost Per Request
NemoRouter maintains a comprehensive pricing database that covers every supported model across every supported provider. After each request, the gateway computes the exact cost based on the model, token counts, and current pricing, and reports that cost back on the response.
This is a deliberate design decision: NemoRouter computes cost in one place and everything reads from that single value. Your application, your credit ledger, and your dashboard all see the same number.
Why a single source of truth? Because parallel cost calculation creates reconciliation nightmares. When provider pricing changes, every system that recomputes cost would have to update simultaneously, and any drift between them means disagreements between what your application reports and what your credit balance is debited — a billing dispute waiting to happen.
The Credit System
Users purchase credits in dollar amounts. Credits map 1:1 to USD for simplicity. When you buy $100 in credits, you get $100 of LLM usage. The platform fee is charged on top at purchase time, not deducted from the credit amount.
Tier-Based Platform Fees
NemoRouter offers three pricing tiers, each with a different platform fee charged at credit purchase:
| Tier | Platform Fee | Minimum |
|---|---|---|
| Tier 1 (Pay As You Go) | 4% | $0/mo |
| Tier 2 | 2% | $100/mo |
| Tier 3 | 0% | $1,200/yr |
When a Tier 1 user buys $100 in credits, they pay $104 and receive $100 in their balance. A Tier 3 user pays exactly $100 for the same $100 in credits. This keeps per-request cost deduction simple — every request deducts the exact cost reported by the gateway, with no additional fee calculation at request time.
Platform fee tiers are set automatically by Stripe webhooks when a subscription is created or changed. There is no manual configuration.
Reserve and Settle
The critical challenge in a credit system is preventing overspend when multiple requests execute concurrently. If a user has $1.00 remaining and sends ten requests simultaneously, naive balance checking would approve all ten before any deduction occurs.
NemoRouter uses a reserve-and-settle pattern:
- Reserve — Before forwarding to the provider, estimate the maximum cost and reserve that amount from the credit balance in a single atomic operation, so concurrent requests can't all spend the same headroom.
- Forward — Send the request through the gateway to the chosen provider. The reservation ensures the balance cannot be spent by concurrent requests.
- Settle — When the call completes with its exact cost known, settle the reservation: deduct the actual cost and release the unused portion.
- Release on failure — If the request fails, release the full reservation. Every error path releases — a leaked reservation is frozen credits.
This pattern guarantees that credit balances never go negative, even under high concurrency. (See reserve-and-settle for the full pattern.)
Cost Attribution and Observability
Every LLM response from NemoRouter reports the exact cost of that call back to your application, alongside the organization and key it was attributed to — so you can track spend at whatever granularity you need without building your own accounting layer.
NemoRouter also tags each upstream call so that provider dashboards (OpenAI Usage, Google Cloud Console) show spend broken down by NemoRouter organization. This lets you reconcile what NemoRouter reports against what the providers charge.
The dashboard provides multiple views into cost data: per-model breakdowns, per-key spend, daily/weekly/monthly trends, and exportable CSV reports. All cost data flows from the same authoritative source — the per-request cost the gateway computed and reported.
Handling Edge Cases
Several scenarios require special attention:
Streaming responses — Cost is only known after the full response completes. The reservation holds an estimated amount until the stream finishes and the final cost is reported.
Cached responses — NemoRouter can cache responses to reduce cost. Cached hits report zero or reduced cost, and the settlement reflects that.
Failed requests — Provider errors (rate limits, content filters, timeouts) must release the reservation immediately. Any non-2xx outcome is a trigger to release.
Provisioned throughput — Models running on Azure PTU or GCP GSU have different cost characteristics. The gateway handles this in its pricing calculation; the credit settlement treats the reported cost identically regardless of the underlying pricing model.
The principle is consistent: one authoritative cost per request, one credit ledger reading from it. That separation of concerns keeps both systems reliable and reconcilable.


