There are two ways a gateway can make money, and they are not equally honest. One adds a markup to every token you spend, forever, invisibly. The other charges a clear platform fee once, at purchase, and then gets out of the way. NemoRouter does the second: you pay a fee on top when you buy credits, and 100% of your purchase becomes spendable credits. This post explains why that distinction matters more than it first appears.

Two pricing models

PER-TOKEN MARKUP (typical aggregator)
  you spend $1.00 of model cost → gateway bills $1.10 → markup hidden in every call

FEE-ON-TOP (NemoRouter)
  you buy $100 of credits → pay e.g. $104 at checkout (4% fee on top)
  → $100 of credits, every cent spent at real provider cost, 0% per-call markup

In the markup model, the gateway's cut scales with your usage forever — the more you succeed, the more you pay them, on every single call. In the fee-on-top model, the platform fee is a one-time, visible line at purchase, and your actual inference is billed at the provider's real cost with nothing added.
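To make the two models concrete, here is a minimal sketch of the arithmetic, using the 10% markup and 4% fee from the example above. The function names are hypothetical and exist only for illustration; they are not part of any NemoRouter API.

```python
from decimal import Decimal


def markup_total(model_cost: Decimal, markup_rate: Decimal) -> Decimal:
    """Total billed when the gateway adds a percentage to every call."""
    return model_cost * (1 + markup_rate)


def fee_on_top_total(credits: Decimal, fee_rate: Decimal) -> Decimal:
    """Total paid at checkout when the fee is added on top of the credits bought."""
    return credits * (1 + fee_rate)


usage = Decimal("100")  # $100 of real provider cost, or $100 of credits

# Per-token markup: the extra $10 is spread invisibly across every call.
print(markup_total(usage, Decimal("0.10")))      # 110.00

# Fee on top: the extra $4 is one visible line at checkout.
print(fee_on_top_total(usage, Decimal("0.04")))  # 104.00
```

In both cases you get $100 of inference. The difference is where the gateway's cut shows up: inside the rate for every call, or as a single line item at purchase.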

Why fee-on-top is more honest

The problem with per-token markup isn't just the money — it's that it's invisible. You can't see it in any single call; it's baked into a rate you can't decompose. You never quite know what the model cost versus what the gateway took. Fee-on-top makes the gateway's cut a number you see at checkout and never again. Your usage analytics show real provider costs, your budgets cap real spend, and your per-customer economics reconcile to real numbers — because there's no hidden markup distorting them.

Cost tracking only works if the cost is real

Every cost feature on this blog — exact tracking, budgets, attribution, margins — assumes the number you see is the provider's actual cost. A per-token markup breaks that assumption silently: your "cost" reports include someone's cut. Fee-on-top keeps the cost number pure, which is what makes everything built on top of it trustworthy.
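A small sketch of how a hidden markup skews a per-customer margin. The revenue and cost figures are made up for illustration, the 10% markup is the one from the earlier example, and `gross_margin` is a hypothetical helper rather than a product feature.

```python
from decimal import Decimal


def gross_margin(revenue: Decimal, reported_cost: Decimal) -> Decimal:
    """Fraction of revenue left after the cost your reports show."""
    return (revenue - reported_cost) / revenue


revenue = Decimal("50")        # what one customer pays you (illustrative)
provider_cost = Decimal("30")  # what the model provider actually charged

# With a 10% per-token markup, reports show the marked-up figure instead.
marked_up_cost = provider_cost * Decimal("1.10")

print(gross_margin(revenue, provider_cost))   # 0.4
print(gross_margin(revenue, marked_up_cost))  # 0.34
```

The six-point gap is the gateway's cut, but nothing in the marked-up report says so. You would have to know the markup rate independently to back it out.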

It's cheaper exactly when you scale

The two models diverge as you grow. A per-token markup is a percentage of usage with no ceiling: at 10× the volume, you pay 10× the markup. A platform fee is charged once, on each purchase, and Pro drops that fee to zero once your spend is steady:

Plan	Platform fee	Per-call markup
Pay as you go	4% (on top, at purchase)	0%
Pro ($50/mo or $500/yr)	0%	0%

On Pro the platform takes nothing on top and nothing per call. The heavier your usage, the worse a per-token markup gets and the better a flat-fee/0% model gets — the models diverge in your favor exactly when it matters.
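Here is a rough way to see where Pro starts to pay for itself, using the rates in the table. It assumes the Pro subscription price is the only overhead on Pro and that you buy credits roughly equal to your monthly spend. The 10% markup is the illustrative figure from earlier, not a quote of any specific aggregator, and all names are hypothetical.

```python
from decimal import Decimal

PAYG_FEE_RATE = Decimal("0.04")   # Pay as you go: 4% on top, at purchase
PRO_MONTHLY = Decimal("50")       # Pro billed monthly
PRO_YEARLY = Decimal("500")       # Pro billed yearly
MARKUP_RATE = Decimal("0.10")     # illustrative per-token markup
CENT = Decimal("0.01")


def payg_overhead(monthly_spend: Decimal) -> Decimal:
    """Platform fee paid in a month on Pay as you go."""
    return (monthly_spend * PAYG_FEE_RATE).quantize(CENT)


def markup_overhead(monthly_spend: Decimal) -> Decimal:
    """Gateway cut paid in a month under a per-token markup."""
    return (monthly_spend * MARKUP_RATE).quantize(CENT)


def pro_break_even(monthly_price: Decimal) -> Decimal:
    """Monthly spend at which the 4% fee equals the Pro subscription."""
    return (monthly_price / PAYG_FEE_RATE).quantize(CENT)


print(pro_break_even(PRO_MONTHLY))      # 1250.00
print(pro_break_even(PRO_YEARLY / 12))  # 1041.67

for spend in (Decimal("500"), Decimal("1250"), Decimal("5000")):
    print(spend, payg_overhead(spend), PRO_MONTHLY, markup_overhead(spend))
```

Under these assumptions, Pay as you go is the cheaper plan below roughly $1,250 of monthly spend (about $1,042 on annual billing), and Pro is cheaper above it. The markup column keeps growing with spend, while the Pro column stays flat.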

No feature gating, either

The fee is the only thing the plan changes. Every feature — guardrails, budgets, RBAC, observability, fallback, A/B testing — is unlocked on every plan. You're never paying a higher price to access a capability; you're choosing Pay as you go or Pro based on your spend. Safety and governance aren't upsells.

The takeaway

How a gateway makes money shapes everything downstream. A per-token markup is an invisible, uncapped cut that quietly distorts every cost number you look at. A platform fee on top — with 100% of your purchase becoming credits and 0% per-call markup — is visible once, honest thereafter, and cheaper precisely as you scale. You keep all your credits, your cost reports show real provider costs, and the tiers compete on fee, not features. See the math on the pricing page.

Markup-Free LLM Credits: You Keep 100%
