
Per-Customer LLM Billing for AI Apps

If you sell an AI product, you need to know what each customer costs to serve. Here is how request tags, team scoping, and exact cost tracking turn one gateway bill into per-customer economics.

If you're building an AI product, "the model bill" is the wrong unit. The question that decides whether your business works is per customer: what does it cost to serve this account, is that customer profitable, and can I bill them for what they actually used? A gateway that only gives you one monthly total can't answer any of those. Here's how to get per-customer economics out of your LLM spend without building a metering system yourself.

The unit-economics problem

Your AI feature calls models on behalf of many customers. At month end you have one provider bill and no idea how it splits. That's fine until someone asks:

  • Which customers are unprofitable? (heavy users on a flat plan)
  • Can we offer usage-based pricing? (you'd need per-customer usage)
  • What's our gross margin per account? (cost to serve vs revenue)

Without attribution, you guess. With it, you query. The gateway already sees every call and its exact cost — the trick is tagging each call with who it was for.

Attribute every call to a customer

Tag each request with the customer it serves (and the feature, while you're at it):

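# `client` is an OpenAI-compatible SDK client pointed at the gateway.
client.chat.completions.create(
    model="claude-sonnet-4-6",
    messages=[...],
    # Tags are plain strings; the customer: tag is what the Cost report filters on.
    extra_body={"metadata": {"tags": ["customer:acme", "feature:assistant"]}},
)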

Now every call's exact cost is indexed by customer. "What did Acme cost us this month" is a filter on the Cost report, summing to the real provider number — not an estimate, not a guess. (Full mechanics in cost attribution by tag.)
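To keep tags consistent across a codebase, it can help to build the tag payload in one place. The helper below is a minimal sketch, not part of any SDK: the name `attribution_body` and the empty-ID check are assumptions for illustration. Only the `metadata.tags` shape comes from the example above.

```python
from typing import Dict, List, Optional


def attribution_body(customer_id: str, feature: Optional[str] = None) -> Dict[str, Dict[str, List[str]]]:
    """Build the extra_body payload that tags one request with its customer.

    Hypothetical helper: it only mirrors the metadata.tags shape shown above.
    """
    if not customer_id:
        # An empty ID would produce a useless "customer:" tag, so fail loudly.
        raise ValueError("customer_id must be a non-empty string")
    tags = [f"customer:{customer_id}"]
    if feature:
        tags.append(f"feature:{feature}")
    return {"metadata": {"tags": tags}}


body = attribution_body("acme", "assistant")
```

The result can then be passed as `extra_body=body` on each call, so a typo in one call site cannot silently split a customer's spend across two tags.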

Attribution is metadata, authorization is not

Tag-based attribution tells you what a customer cost. It does not decide whose budget gets charged — that's tied to the authenticated key. Keep "measuring spend" and "authorizing spend" separate: a tag is for your books, not for access control.

Two models: shared key vs key-per-customer

There are two ways to structure this, and the right one depends on isolation needs:

Model | How | Best when
Shared key + customer tag | One key, customer: tag per call | Many small customers; you just need attribution
Key (or team) per customer | A virtual key / team per customer | You need hard per-customer caps + isolation

Tagging is the lightest path and gives you the breakdown. Promoting big or sensitive customers to their own key/team adds hard budget caps and rate limits per customer — so one customer's runaway can't eat another's headroom, and a leaked key is scoped to one account.
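One way to mix the two models is a small lookup that returns a dedicated key for promoted customers and falls back to the shared key for everyone else. This is a sketch under assumptions: `KeyRouter`, its fields, and the placeholder key strings are hypothetical, and how keys or teams are actually provisioned is outside its scope.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class KeyRouter:
    """Chooses which gateway key a customer's calls should use (hypothetical)."""

    shared_key: str
    # Customers promoted to their own capped key: customer_id -> key.
    dedicated_keys: Dict[str, str] = field(default_factory=dict)

    def credentials_for(self, customer_id: str) -> Tuple[str, List[str]]:
        # Promoted customers get their own key; everyone else shares one.
        api_key = self.dedicated_keys.get(customer_id, self.shared_key)
        # Tag in both cases so reports look the same for every customer.
        return api_key, [f"customer:{customer_id}"]


router = KeyRouter(shared_key="sk-shared-placeholder",
                   dedicated_keys={"acme": "sk-acme-placeholder"})
acme_key, acme_tags = router.credentials_for("acme")
small_key, small_tags = router.credentials_for("smallco")
```

Keeping the customer: tag even on dedicated keys is a design choice in this sketch: it means promoting a customer changes their isolation, not how their spend is reported.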

From attribution to invoices

Once spend is attributed, usage-based billing is a reporting step, not an engineering project:

1. tag every call with customer:<id>
2. at period end, sum cost per customer (Cost report / export)
3. apply your markup / plan → invoice line per customer
4. reconcile: sum of customer costs == your gateway bill

Because the costs are the provider's settled numbers, step 4 reconciles exactly, provided every call carries a customer tag (untagged spend shows up as a gap between the per-customer sum and the bill). There is no drift between "what we billed customers for" and "what we actually paid." That reconciliation is what makes usage-based pricing safe to offer: you're never billing on an estimate that could be wrong in either direction.
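The four steps can be sketched in a few lines. The row format here (a `tags` list and a `cost` string per call) is an assumption about what a cost export might look like, not a documented schema, and the 20% markup is purely illustrative. Calls without a customer tag are collected into an "untagged" bucket so the reconciliation still adds up.

```python
from collections import defaultdict
from decimal import Decimal
from typing import Dict, Iterable, List


def cost_per_customer(rows: Iterable[dict]) -> Dict[str, Decimal]:
    """Step 2: sum cost per customer from exported rows (assumed format)."""
    totals: Dict[str, Decimal] = defaultdict(Decimal)
    for row in rows:
        # Take the first customer:<id> tag; calls without one go to "untagged".
        customer = next(
            (t.split(":", 1)[1] for t in row["tags"] if t.startswith("customer:")),
            "untagged",
        )
        totals[customer] += Decimal(row["cost"])
    return dict(totals)


def invoice_lines(totals: Dict[str, Decimal], markup: Decimal) -> Dict[str, Decimal]:
    """Step 3: apply a flat markup and round to cents; skip untagged spend."""
    return {
        customer: (cost * (1 + markup)).quantize(Decimal("0.01"))
        for customer, cost in totals.items()
        if customer != "untagged"
    }


def reconciles(totals: Dict[str, Decimal], gateway_bill: Decimal) -> bool:
    """Step 4: every bucket, including untagged, must sum to the bill."""
    return sum(totals.values(), Decimal("0")) == gateway_bill


rows: List[dict] = [
    {"tags": ["customer:acme", "feature:assistant"], "cost": "1.25"},
    {"tags": ["customer:acme"], "cost": "0.75"},
    {"tags": ["customer:globex"], "cost": "0.50"},
]
totals = cost_per_customer(rows)
invoices = invoice_lines(totals, markup=Decimal("0.20"))
ok = reconciles(totals, gateway_bill=Decimal("2.50"))
```

`Decimal` is used instead of floats so that sums of many small per-call costs compare exactly against the bill.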

Margin, not just cost

The real payoff is margin visibility. With cost-per-customer in hand and revenue-per-customer in your billing system, gross margin per account becomes a join. You find the flat-plan customer whose usage makes them unprofitable, the enterprise account with huge headroom, and the feature that's quietly expensive across everyone (overlay cost-vs-usage to spot it). Those are the inputs to pricing decisions you currently make blind.
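As a rough illustration of that join, the sketch below combines a cost-per-customer mapping with a revenue-per-customer mapping. Both inputs and the function name `gross_margins` are hypothetical; in practice the revenue side would come from your billing system, and the figures here are made up.

```python
from decimal import Decimal
from typing import Dict, Optional


def gross_margins(
    cost: Dict[str, Decimal], revenue: Dict[str, Decimal]
) -> Dict[str, Optional[Decimal]]:
    """Gross margin per account: (revenue - cost) / revenue."""
    margins: Dict[str, Optional[Decimal]] = {}
    for customer, rev in revenue.items():
        if rev == 0:
            # Margin is undefined without revenue (e.g. a free-tier account).
            margins[customer] = None
            continue
        margins[customer] = (rev - cost.get(customer, Decimal("0"))) / rev
    return margins


cost = {"acme": Decimal("40"), "globex": Decimal("120")}
revenue = {"acme": Decimal("100"), "globex": Decimal("100")}
margins = gross_margins(cost, revenue)
# Accounts that cost more to serve than they pay.
unprofitable = [c for c, m in margins.items() if m is not None and m < 0]
```

With these sample figures, one account has a positive margin and the other a negative one; the negative one is the flat-plan customer the pricing conversation should start with.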

The takeaway

Selling AI means caring about per-customer economics, and that starts with attribution. Tag every call with the customer, let the gateway sum exact costs per account, promote your heavy/sensitive customers to their own capped key, and reconcile to the bill. You go from "the model costs went up" to "Acme's margin is 60%, and here's the customer dragging it down" — the difference between running an AI business and hoping one works. Start in the Cost reports.

Written by Nemo Team. Engineering, product, and company posts from the Nemo Router team: code-first, cost-honest, no vendor-marketing fluff.
