Cost Honesty: We Read the Number, We Don't Invent It
The most important number in an LLM gateway is the cost, and the easiest one to fudge. Our principle: read the provider's authoritative cost, never compute a convenient one. Here is why that shapes everything.
Every number a customer sees on NemoRouter eventually traces back to one value: what a request actually cost. It's the most important number in the product and the easiest one for a gateway to quietly get wrong in its own favor. So we made a principle of it: we read the provider's authoritative cost and never compute our own. This sounds like an implementation detail. It's actually the foundation everything else stands on, and it's worth saying out loud.
The temptation to invent the number
A gateway could compute cost itself: count tokens, multiply by a price table, done. It's easy, and it's quietly corruptible in three ways:
- Stale prices drift from reality — and a stale-low price under-charges, a stale-high over-charges. Either way the number is fiction.
- Rounding "conveniences" that always round one direction add up to a margin nobody agreed to.
- Hidden markup baked into a computed rate is invisible by construction — you can't see a cut you can't decompose.
The thing is, a computed cost is plausible. It looks right. That's exactly what makes inventing the number dangerous: the customer can't tell a computed estimate from a real charge, so the incentive to fudge is enormous and the detection is near zero.
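To make the rounding point concrete, here is a minimal sketch of how a one-directional rounding "convenience" compounds. The per-call cost, the call count, and the rounding quantum are made-up figures chosen only for illustration, and `total_billed` is a hypothetical helper, not anything in NemoRouter.

```python
from decimal import Decimal, ROUND_CEILING
from typing import Optional


def total_billed(per_call_cost: Decimal, calls: int,
                 quantum: Optional[Decimal] = None) -> Decimal:
    """Total billed over `calls` identical requests.

    If `quantum` is given, each call is first rounded UP to that step,
    which is the one-directional rounding described above.
    """
    if quantum is not None:
        per_call_cost = per_call_cost.quantize(quantum, rounding=ROUND_CEILING)
    return per_call_cost * calls


# Illustrative numbers only (assumptions, not real prices).
real = total_billed(Decimal("0.0012345"), 100_000)
rounded_up = total_billed(Decimal("0.0012345"), 100_000,
                          quantum=Decimal("0.0001"))
hidden_margin = rounded_up - real

print(real, rounded_up, hidden_margin)
```

With these assumed inputs, every individual charge looks plausible, yet the rounded total is several dollars above the real one. Nobody could spot that from a single call.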
The principle: read, don't compute
request → provider serves it → provider reports the real cost
        │
we read x-…-response-cost ← the only number we trust
        │
settle credits to THAT, exactly
The inference layer reports the settled cost of the call it actually served. We read that header and bill it, full stop. We don't have a price table to go stale, because we don't price the call: the provider does, and we pass it through. (The mechanics: reading the cost header, even while streaming.)
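The read-then-settle flow can be sketched roughly as follows. This is an illustration under stated assumptions, not NemoRouter's actual code: the header name is elided in the text above, so the sketch takes it as a parameter and the usage example passes an obviously hypothetical stand-in (`x-example-response-cost`). `read_provider_cost`, `settle`, and `MissingProviderCost` are invented names.

```python
from decimal import Decimal, InvalidOperation
from typing import Mapping


class MissingProviderCost(Exception):
    """Raised when the response carries no usable provider cost."""


def read_provider_cost(headers: Mapping[str, str], cost_header: str) -> Decimal:
    """Return the provider-reported cost, or raise. Never estimate."""
    # HTTP header names are case-insensitive, so normalise before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    raw = lowered.get(cost_header.lower())
    if raw is None:
        raise MissingProviderCost(f"no {cost_header!r} header on response")
    try:
        cost = Decimal(raw.strip())
    except InvalidOperation as exc:
        raise MissingProviderCost(f"unparseable cost value {raw!r}") from exc
    if not cost.is_finite() or cost < 0:
        raise MissingProviderCost(f"implausible cost value {raw!r}")
    return cost


def settle(balance: Decimal, provider_cost: Decimal) -> Decimal:
    """Deduct exactly the provider-reported cost: no rounding, no markup."""
    return balance - provider_cost


# Usage with a hypothetical header name and made-up values.
headers = {"X-Example-Response-Cost": "0.00042"}
cost = read_provider_cost(headers, "x-example-response-cost")
new_balance = settle(Decimal("10"), cost)
print(cost, new_balance)
```

The sketch uses `Decimal` rather than `float` so that the settled amount is the reported value digit for digit. It raises on a missing header instead of falling back to a computed estimate, which is the point of the principle.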
The honest number is also the simpler one
Reading the provider's cost isn't just more honest than computing it — it's less code. No price table to maintain, no per-model rate map to update when a provider changes pricing, no rounding policy to argue about. The honest design and the simple design are the same design, which is the happiest kind of principle.
Why honesty here makes everything else trustworthy
Cost honesty isn't one feature — it's the precondition for the rest of them:
- Budgets cap real spend, so a cap means what it says.
- Cost attribution sums to the actual bill, so per-customer economics reconcile.
- Markup-free credits are only credible if the per-call cost has no hidden cut — which it can't, because we don't set it.
- Multimodal floors exist precisely so a missing price reads as a conservative charge, not a dishonest $0.
Pull the honest-cost foundation out and every one of those becomes a number you'd have to take on faith. Keep it, and they're all just arithmetic on a real value.
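As a rough sketch of "arithmetic on a real value", the snippet below attributes provider-reported costs per customer, checks that the attribution sums to the total bill, applies a conservative floor when a cost is missing, and tests a budget cap. The floor amount, the customer names, and every function name here are hypothetical; the post does not specify how floors or caps are actually implemented.

```python
from collections import defaultdict
from decimal import Decimal
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical floor charged when a call has no reported cost (assumption).
MULTIMODAL_FLOOR = Decimal("0.01")

Call = Tuple[str, Optional[Decimal]]  # (customer, provider-reported cost or None)


def effective_cost(provider_cost: Optional[Decimal],
                   floor: Decimal = MULTIMODAL_FLOOR) -> Decimal:
    """A missing cost becomes a conservative floor, never a silent $0."""
    return floor if provider_cost is None else provider_cost


def attribute(calls: Iterable[Call]) -> Dict[str, Decimal]:
    """Sum effective cost per customer."""
    totals: Dict[str, Decimal] = defaultdict(Decimal)
    for customer, cost in calls:
        totals[customer] += effective_cost(cost)
    return dict(totals)


def within_budget(spent: Decimal, next_cost: Decimal, cap: Decimal) -> bool:
    """A cap on real spend: allow the call only if it stays at or under the cap."""
    return spent + next_cost <= cap


# Made-up calls for illustration; the last one has no reported cost.
calls = [
    ("acme", Decimal("0.0031")),
    ("acme", Decimal("0.0009")),
    ("globex", Decimal("0.0125")),
    ("globex", None),
]
per_customer = attribute(calls)
bill = sum(effective_cost(cost) for _, cost in calls)

print(per_customer, bill)
print(within_budget(Decimal("0.99"), Decimal("0.02"), Decimal("1.00")))
```

Because each figure is the same provider-reported value, only summed differently, the per-customer totals reconcile with the bill exactly, with no tolerance needed.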
The same principle, everywhere
Cost honesty is one instance of a broader stance you'll find across the product: no mocks or demo mode (don't fake the system), real tenants and real RLS (don't fake isolation), the ledger as source of truth (don't fake the balance). The through-line is a refusal to build a convenient fiction next to the truth. Cost is just where it matters most, because cost is where the incentive to fudge is strongest.
The takeaway
The most important number in an LLM gateway is the cost, and the most important decision is whether to read it or invent it. We read it — the provider's authoritative value, passed through exactly, never a computed estimate we could quietly tilt. That choice is more honest and simpler, and it's what makes budgets, attribution, and markup-free credits trustworthy instead of faith-based. When the number is real, everything built on it can be too.