Markup-Free LLM Credits: You Keep 100%
Most gateways quietly take a cut of every token. NemoRouter charges a platform fee on top at purchase and gives you 100% of your credits. Here is why that pricing model is more honest — and cheaper at scale.
There are two ways a gateway can make money, and they are not equally honest. One adds a markup to every token you spend, forever, invisibly. The other charges a clear platform fee once, at purchase, and then gets out of the way. NemoRouter does the second: you pay a fee on top when you buy credits, and 100% of your purchase becomes spendable credits. This post explains why that distinction matters more than it first appears.
Two pricing models
PER-TOKEN MARKUP (typical aggregator)
you spend $1.00 of model cost → gateway bills $1.10 → markup hidden in every call
FEE-ON-TOP (NemoRouter)
you buy $100 of credits → pay e.g. $104 at checkout (4% fee on top)
→ $100 of credits, every cent spent at real provider cost, 0% per-call markup

In the markup model, the gateway's cut scales with your usage forever: the more you succeed, the more you pay them, on every single call. In the fee-on-top model, the platform fee is a one-time, visible line at purchase, and your actual inference is billed at the provider's real cost header with nothing added.
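The arithmetic behind the two models can be sketched in a few lines of Python. This is an illustration only: the function names are hypothetical, the 4% fee is the "e.g." figure from the checkout example, and the 10% markup is simply what the $1.00 → $1.10 example implies, not a rate quoted by any particular aggregator.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")

def fee_on_top_checkout(credits: Decimal, fee_rate: Decimal) -> Decimal:
    """Amount paid at checkout: the credits plus a one-time platform fee."""
    return (credits * (1 + fee_rate)).quantize(CENT, rounding=ROUND_HALF_UP)

def per_token_markup_bill(provider_cost: Decimal, markup_rate: Decimal) -> Decimal:
    """Amount billed for usage when a markup is applied to every call."""
    return (provider_cost * (1 + markup_rate)).quantize(CENT, rounding=ROUND_HALF_UP)

# Fee-on-top: buy $100 of credits at an assumed 4% fee.
print(fee_on_top_checkout(Decimal("100"), Decimal("0.04")))     # 104.00

# Per-token markup: $1.00 of model cost at an assumed 10% markup.
print(per_token_markup_bill(Decimal("1.00"), Decimal("0.10")))  # 1.10
```

In the first case the extra $4 appears once, at checkout, and the full $100 is spendable. In the second, the extra $0.10 is repeated on every dollar of usage.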
Why fee-on-top is more honest
The problem with per-token markup isn't just the money — it's that it's invisible. You can't see it in any single call; it's baked into a rate you can't decompose. You never quite know what the model cost versus what the gateway took. Fee-on-top makes the gateway's cut a number you see at checkout and never again. Your usage analytics show real provider costs, your budgets cap real spend, and your per-customer economics reconcile to real numbers — because there's no hidden markup distorting them.
Cost tracking only works if the cost is real
Every cost feature on this blog — exact tracking, budgets, attribution, margins — assumes the number you see is the provider's actual cost. A per-token markup breaks that assumption silently: your "cost" reports include someone's cut. Fee-on-top keeps the cost number pure, which is what makes everything built on top of it trustworthy.
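To make the "someone's cut" point concrete, here is a minimal sketch of what it takes to recover the provider's cost from a marked-up bill. The function name is hypothetical and the 10% rate is an assumption for illustration. In practice a gateway using per-token markup may not publish its rate at all, which is exactly why the decomposition is hard.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")

def split_billed_amount(billed: Decimal, markup_rate: Decimal):
    """Split a billed amount into (provider cost, gateway cut).

    Only possible if markup_rate is known; with a 0% per-call markup
    the billed amount already is the provider cost.
    """
    provider_cost = (billed / (1 + markup_rate)).quantize(CENT, rounding=ROUND_HALF_UP)
    gateway_cut = billed - provider_cost
    return provider_cost, gateway_cut

# A $110.00 "cost" line under an assumed 10% markup hides a $10.00 cut.
print(split_billed_amount(Decimal("110.00"), Decimal("0.10")))

# With 0% per-call markup, the reported cost needs no correction.
print(split_billed_amount(Decimal("110.00"), Decimal("0")))
```

Any budget, attribution, or margin calculation fed the uncorrected figure inherits the gateway's cut as if it were model cost.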
It's cheaper exactly when you scale
The two models diverge as you grow. A per-token markup is a percentage of usage with no ceiling: at 10× the volume, you pay 10× the markup. A platform fee is charged once on each purchase, and the tiers are designed so the fee rate drops as you commit more:
| Tier | Platform fee | Per-call markup |
|---|---|---|
| Tier 1 | 4% (on top, at purchase) | 0% |
| Tier 2 | 2% | 0% |
| Tier 3 | 0% | 0% |
At Tier 3 the platform takes nothing on top and nothing per call. The heavier your usage, the worse a per-token markup gets and the better fee-on-top gets — the models diverge in your favor exactly when it matters.
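As a rough sketch of how the gap opens up, the snippet below compares the gateway's cut under each tier in the table with the cut under a per-token markup, for the same provider spend. The fee rates come from the table. The 10% markup is an assumed figure for illustration, and the helper names are hypothetical. The commitment level needed to reach each tier is not modeled here, since it is not stated in this post.

```python
from decimal import Decimal

CENT = Decimal("0.01")

# Platform fee rates from the tier table (charged on top, at purchase).
TIER_FEES = {
    "Tier 1": Decimal("0.04"),
    "Tier 2": Decimal("0.02"),
    "Tier 3": Decimal("0.00"),
}

ASSUMED_MARKUP = Decimal("0.10")  # hypothetical per-token markup for comparison

def fee_on_top_cut(provider_spend: Decimal, tier: str) -> Decimal:
    """Platform fee paid to buy `provider_spend` worth of credits at a tier."""
    return (provider_spend * TIER_FEES[tier]).quantize(CENT)

def markup_cut(provider_spend: Decimal, markup_rate: Decimal = ASSUMED_MARKUP) -> Decimal:
    """Gateway's cut when every call carries a per-token markup."""
    return (provider_spend * markup_rate).quantize(CENT)

for spend in (Decimal("1000"), Decimal("10000")):
    print(f"provider spend ${spend}: markup cut ${markup_cut(spend)}")
    for tier in TIER_FEES:
        print(f"  {tier}: platform fee ${fee_on_top_cut(spend, tier)}")
```

Under these assumptions, $10,000 of provider spend carries a $1,000.00 cut with the markup, versus $400.00, $200.00, or $0.00 across the three tiers.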
No feature gating, either
The fee is the only thing the tiers change. Every feature — guardrails, budgets, RBAC, observability, fallback, A/B testing — is unlocked on every tier. You're never paying a higher price to access a capability; you're choosing a fee level based on commitment. (We made the case against feature gating separately.) Safety and governance aren't upsells.
The takeaway
How a gateway makes money shapes everything downstream. A per-token markup is an invisible, uncapped cut that quietly distorts every cost number you look at. A platform fee on top — with 100% of your purchase becoming credits and 0% per-call markup — is visible once, honest thereafter, and cheaper precisely as you scale. You keep all your credits, your cost reports show real provider costs, and the tiers compete on fee, not features. See the math on the pricing page.