
Routing LLM Traffic by Cost vs Quality

Not every request needs your most expensive model. Here is a decision framework for routing LLM traffic by cost and quality — which tasks to send cheap, which to send premium, and how to prove the split works.

The most common LLM cost mistake is sending every request to your best model "to be safe." It feels prudent and it's quietly expensive — most tasks don't need the flagship, and the ones that do are a minority you can identify. Routing by cost vs quality means matching each task to the cheapest model that's good enough for that task, and proving the match instead of guessing. Here's the framework.

The core idea: quality is per-task, not per-model

There's no such thing as a model that's "better" in the abstract, only better at a task. A small model can be indistinguishable from a flagship at classification, extraction, or short rewrites, while falling apart on multi-step reasoning. So you route per task type rather than by splitting overall request volume:

task                         quality need   route to
────────────────────────────────────────────────────
classify / tag / extract     low            cheap model
short rewrite / format       low            cheap model
summarize (short)            medium         mid model
multi-step reasoning         high           flagship
code generation (complex)    high           flagship

The savings come from the top rows: they're usually high volume and low need, so moving them off the flagship is the single biggest lever on your bill (and exactly the "whale" the cost-vs-usage chart reveals).
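As a rough sketch, the table above can live in code as a plain lookup from task type to model tier. The task labels and tier names here are illustrative assumptions, not identifiers from any particular router, and sending unknown tasks to the flagship is one conservative choice among several.

```python
from enum import Enum


class Tier(Enum):
    CHEAP = "cheap"
    MID = "mid"
    FLAGSHIP = "flagship"


# Hypothetical task labels mirroring the rows of the table above.
ROUTES = {
    "classify": Tier.CHEAP,
    "tag": Tier.CHEAP,
    "extract": Tier.CHEAP,
    "short_rewrite": Tier.CHEAP,
    "format": Tier.CHEAP,
    "summarize_short": Tier.MID,
    "multi_step_reasoning": Tier.FLAGSHIP,
    "complex_codegen": Tier.FLAGSHIP,
}


def route(task: str) -> Tier:
    # Assumption: a task that has not been measured yet stays on the
    # flagship until a test shows a cheaper tier is good enough.
    return ROUTES.get(task, Tier.FLAGSHIP)
```

Keeping the mapping as data rather than scattered conditionals makes each downgrade a one-line change that is easy to review and revert.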

A four-question routing decision

For any task, route it down to a cheaper model unless one of these is true:

  1. Does it require multi-step reasoning or planning? → premium.
  2. Is the output hard to verify, so errors slip through? → premium (you can't catch the cheap model's mistakes).
  3. Is it customer-visible and brand-critical? → premium or test carefully.
  4. Is it long-context or multimodal in a way small models handle poorly? → premium.

If all four are "no," it's a cheap-model candidate. Most extraction, classification, and formatting tasks answer "no" four times.
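The four questions translate fairly directly into a small check. This is a minimal sketch: the `TaskProfile` fields and the function names are made up for illustration, and question 3 is read conservatively as "premium" even though careful testing is also an option.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskProfile:
    # One boolean per question in the list above (hypothetical field names).
    multi_step_reasoning: bool
    hard_to_verify: bool
    brand_critical: bool
    hard_context_or_modality: bool


def premium_reasons(task: TaskProfile) -> list[str]:
    """Return every reason this task should stay on the premium model."""
    checks = [
        (task.multi_step_reasoning, "needs multi-step reasoning or planning"),
        (task.hard_to_verify, "output is hard to verify"),
        (task.brand_critical, "customer-visible and brand-critical"),
        (task.hard_context_or_modality, "long-context or multimodal"),
    ]
    return [reason for flag, reason in checks if flag]


def route_tier(task: TaskProfile) -> str:
    # Any single "yes" keeps the task on premium; four "no" answers
    # make it a cheap-model candidate.
    return "premium" if premium_reasons(task) else "cheap"


# Example: structured extraction answers "no" four times.
invoice_extraction = TaskProfile(False, False, False, False)
contract_review = TaskProfile(True, True, False, True)
```

Returning the reasons, not just the verdict, gives you something to log when someone asks why a task is still on the flagship.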

Don't route on price tables alone

Two models with similar per-token prices can differ wildly in quality on your task, and a cheap model that needs three retries to get it right isn't cheap. Route on measured quality-per-dollar for your workload, not on the headline rate. The only way to know is to test — see below.
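One way to make the retry point concrete is to compare cost per successful result rather than cost per attempt. The sketch below assumes each attempt succeeds independently with the same probability, so the expected number of attempts is 1 / success_rate. The prices and success rates are invented for illustration and are not measurements of any real model.

```python
def cost_per_success(cost_per_attempt: float, success_rate: float) -> float:
    """Expected spend to get one acceptable output.

    Assumes independent attempts with a fixed success probability,
    which gives an expected 1 / success_rate attempts per success.
    """
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    return cost_per_attempt / success_rate


# Illustrative numbers only: a model that looks cheaper per attempt
# but usually needs several tries, versus a pricier one that rarely does.
looks_cheap = cost_per_success(cost_per_attempt=0.002, success_rate=0.30)
looks_pricey = cost_per_success(cost_per_attempt=0.005, success_rate=0.95)
```

With these made-up inputs the model with the lower headline price ends up costing more per usable answer, which is the effect a price table alone cannot show.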

Prove the split, don't assume it

A routing decision is a hypothesis: "task X is fine on the cheaper model." Validate it with a deterministic A/B test before committing all of task X's traffic:

1. pick a task currently on the flagship
2. A/B it: flagship (A) vs cheaper model (B), deterministic split
3. compare quality signal + cost-per-request across cohorts
4. if B holds quality at lower cost → route task X to B
5. re-check the cost-vs-usage chart next period to confirm savings

This turns "I think we can save money" into a measured statement such as "we moved task X to model B, quality held, and the bill dropped 22%." It also protects you from the opposite error: discovering in production that the cheaper model wasn't good enough, when an experiment would have caught it on a controlled slice.
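Steps 2 to 4 can be sketched as follows. A deterministic split here means hashing a stable key so the same user or request always lands in the same cohort. The function names, the percentage parameter, and the quality tolerance are assumptions for illustration, and the comparison is a plain difference of means, not a significance test.

```python
import hashlib
from statistics import mean


def assign_cohort(key: str, experiment: str, b_percent: int = 50) -> str:
    """Deterministically map a stable key (e.g. a user id) to cohort A or B."""
    digest = hashlib.sha256(f"{experiment}:{key}".encode("utf-8")).digest()
    # First 4 bytes as an integer, reduced to a bucket in 0..99.
    bucket = int.from_bytes(digest[:4], "big") % 100
    return "B" if bucket < b_percent else "A"


def b_holds(a, b, max_quality_drop=0.01):
    """Compare cohorts given lists of (quality_score, cost) per request.

    Returns True when B's mean quality is within max_quality_drop of A's
    and B's mean cost per request is lower.
    """
    quality_a = mean(q for q, _ in a)
    quality_b = mean(q for q, _ in b)
    cost_a = mean(c for _, c in a)
    cost_b = mean(c for _, c in b)
    return quality_b >= quality_a - max_quality_drop and cost_b < cost_a
```

Including the experiment name in the hash keeps cohorts independent across experiments, so a user who lands in B for one test is not automatically in B for the next. On small samples a raw difference of means can mislead, so treat `b_holds` as a first look rather than a final verdict.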

Reliability is a separate axis

Don't conflate cost/quality routing with fallback chains. Cost/quality routing chooses which model should normally serve a task; fallback chooses what to do when that choice fails. Keep them separate: pick the right model for the task, and have a fallback for when its provider has a bad day. Mixing the two (e.g. making the cheapest flaky model your default and your fallback) gives you the worst of both.
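A minimal sketch of keeping the two axes separate: one table answers "which model normally serves this task," and a second answers "what to try when that model fails." The model names, the `ProviderError` class, and the `call_model` callable are all hypothetical stand-ins for whatever client you actually use.

```python
class ProviderError(Exception):
    """Hypothetical error raised when a provider call fails."""


# Axis 1: cost/quality routing. Which model normally serves each task.
PRIMARY = {
    "classify": "cheap-model",
    "multi_step_reasoning": "flagship-model",
}

# Axis 2: reliability. Same-tier alternatives to try if the primary fails.
FALLBACKS = {
    "cheap-model": ["cheap-model-alt-provider"],
    "flagship-model": ["flagship-model-alt-provider"],
}


def call_with_fallback(task, prompt, call_model):
    """Try the task's primary model, then its fallbacks, in order.

    call_model(model, prompt) is supplied by the caller and is assumed
    to raise ProviderError on failure.
    """
    primary = PRIMARY[task]
    last_error = None
    for model in [primary, *FALLBACKS.get(primary, [])]:
        try:
            return model, call_model(model, prompt)
        except ProviderError as exc:
            last_error = exc
    raise RuntimeError(f"all models failed for task {task!r}") from last_error
```

Because the fallbacks sit in the same tier as the primary, an outage changes which provider answers but not the cost/quality decision you already validated for the task.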

The takeaway

Stop sending everything to the flagship. Route by task, send the low-need high-volume work to a cheaper model, reserve premium for genuine reasoning/verification/brand-critical cases, and prove each downgrade with a deterministic A/B test before committing. Quality is per-task, not per-model — and once you route that way, the flagship earns its cost on the requests that actually need it. The models page and Router Settings are where you wire it up.

Written by Nemo Team
Engineering, product, and company posts from the Nemo Router team: code-first, cost-honest, no vendor-marketing fluff.