LLM gateway buyer's guide 2026: routing, guardrails, evals, prompt management
Buyer's Guide

Eight axes that actually matter when picking an LLM gateway in 2026. Shortlist matrix across OpenRouter, Portkey, LiteLLM, Helicone, NemoRouter. Decision tree by buyer profile, 90-minute evaluation.

Nemo Router team · 14 min read
Tags: llm-gateway, buyers-guide, llm-gateway-comparison, llm-routing, llm-guardrails

The wedge claim: NemoRouter is the only LLM gateway that gives every customer all enterprise features — guardrails, A/B tests, prompt management, evals, budgets — free for life, with every major LLM provider behind one API key. Tiers vary the platform fee (4% / 2% / 0%); they never lock features.

If you are evaluating an LLM gateway in 2026, it is probably because the direct OpenAI / Anthropic / Vertex / Bedrock path has stopped scaling for you: on cost, attribution, guardrails, or the speed of swapping models without redeploying half your stack. This guide names the eight evaluation axes that matter and applies them across the five gateways most teams shortlist, with vendor pricing pages cited by URL with a 2026-05-16 audit timestamp.

We are unambiguous about where NemoRouter sits — that is the wedge: every gateway feature you would otherwise pay an enterprise contract for is free on every NemoRouter tier (Tier 1, 2, 3, Enterprise), with the tier varying the platform fee (4% / 2% / 0%) rather than the feature set.

Five-minute path: eight axes, shortlist matrix, decision tree.


What an LLM gateway actually is (and isn't)

An LLM gateway is a single API endpoint between your application and N upstream model providers. It does three things: routes requests by your rules (cost, latency, fallback, A/B), observes them with attribution metadata (team, customer, feature), and governs them with budgets, guardrails (PII / jailbreak / regex), rate limits, and prompt versioning.

A gateway is not a vector DB, a RAG framework (LangChain, LlamaIndex), a fine-tuning platform, or an evaluation harness in isolation. It is the chokepoint where governance, cost, and routing decisions live regardless of upstream model. Swapping model A for model B is now a quarterly event; re-plumbing each application does not scale. Anthropic, OpenAI, Azure OpenAI, and Vertex publish broadly similar chat APIs, but request shape, key rotation, and quota handling differ per provider. That is exactly the gap a gateway closes.


The eight evaluation axes

These are the eight axes that determine whether a gateway will hold up in production for the next 18 months. They are ordered by how often the buyers we talk to discover, after switching, that they got them wrong.

1. Routing & fallback (table stakes)

Expect: model alias resolution, weighted A/B splits, failover chains by latency or 5xx, per-request model override, streaming. The gateway should expose an OpenAI-shaped /chat/completions and an Anthropic-shaped /messages endpoint so existing SDK code works with a 2-line change. Ask: show me a UI that shifts 5% of gpt-4o traffic to claude-3.5-sonnet without a redeploy.
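
To make the routing vocabulary concrete, here is a minimal sketch of a weighted split plus a failover chain. It is not any vendor's implementation: the `ROUTES` table, the `chat-default` alias, `pick_model`, and `call_with_fallback` are hypothetical names, and the 95/5 weights simply mirror the "shift 5% of traffic" question above.

```python
import random
from typing import Callable, Sequence

# Hypothetical routing table: alias -> weighted upstream models.
# Weights are illustrative and mirror the 5% shift described in the text.
ROUTES = {
    "chat-default": [("gpt-4o", 0.95), ("claude-3.5-sonnet", 0.05)],
}


class UpstreamError(Exception):
    """Stands in for a 5xx response or timeout from a provider."""


def pick_model(alias: str, rng: random.Random) -> str:
    """Resolve an alias to one upstream model by weighted random choice."""
    models, weights = zip(*ROUTES[alias])
    return rng.choices(models, weights=weights, k=1)[0]


def call_with_fallback(chain: Sequence[str], send: Callable[[str], str]) -> tuple[str, str]:
    """Try each model in order; return (model, response) from the first success."""
    last_error = None
    for model in chain:
        try:
            return model, send(model)
        except UpstreamError as exc:
            last_error = exc  # move on to the next model in the chain
    raise RuntimeError("all upstreams failed") from last_error
```

In a real gateway the weights live in configuration the UI can change, which is what makes the "without a redeploy" question answerable.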

2. Guardrails (PII / jailbreak / regex)

PII redaction, jailbreak detection, custom regex denylists, prompt-injection mitigation. Often the first feature gated behind a paid tier — note where each vendor places this on its pricing page. Ask: can I write a custom regex guardrail today on the tier I am evaluating, without a sales call?
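
As a rough illustration of what a "custom regex guardrail" amounts to, the sketch below redacts two PII shapes and blocks one denylisted phrase. The patterns and the names `redact_pii`, `check_prompt`, and `DENYLIST` are assumptions for this example; production PII and jailbreak detection needs much more than a couple of regexes.

```python
import re

# Illustrative patterns only; real PII detection covers far more cases.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
US_SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

# Hypothetical custom denylist a team might configure.
DENYLIST = [re.compile(r"ignore (all )?previous instructions", re.IGNORECASE)]


def redact_pii(text: str) -> str:
    """Replace email addresses and SSN-shaped numbers with placeholders."""
    text = EMAIL.sub("[EMAIL]", text)
    return US_SSN.sub("[SSN]", text)


def check_prompt(text: str) -> tuple[bool, str]:
    """Return (allowed, text). Blocked prompts come back unchanged; allowed ones are redacted."""
    for pattern in DENYLIST:
        if pattern.search(text):
            return False, text
    return True, redact_pii(text)
```

The question to put to a vendor is whether you can add an entry like the one in `DENYLIST` yourself, on the tier you are trialing.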

3. Per-team / per-customer budgets + virtual keys

Scoped API keys (per team, per environment, per customer) with hard or soft monthly caps; alert at 80%, enforced cutoff at 100%. Critical for multi-tenant SaaS — without it, one runaway tenant eats the month's budget. Ask: can I issue 50 virtual keys, each with a $200/month cap, and see real-time spend per key?
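
The cap behaviour described above (alert at 80%, cutoff at 100%) can be sketched in a few lines. `VirtualKey` and `record_spend` are hypothetical names, and a real gateway would persist spend and handle concurrent requests, which this in-memory version ignores.

```python
from dataclasses import dataclass


@dataclass
class VirtualKey:
    key_id: str
    monthly_cap_usd: float
    spent_usd: float = 0.0
    alert_ratio: float = 0.80  # soft threshold from the text


def record_spend(key: VirtualKey, cost_usd: float) -> str:
    """Apply one request's cost and return 'ok', 'alert', or 'blocked'."""
    if key.spent_usd + cost_usd > key.monthly_cap_usd:
        return "blocked"  # hard cap: reject instead of overspending
    key.spent_usd += cost_usd
    if key.spent_usd >= key.alert_ratio * key.monthly_cap_usd:
        return "alert"
    return "ok"


# 50 virtual keys, each with a $200/month cap, as in the question above.
keys = [VirtualKey(f"vk-{i}", 200.0) for i in range(50)]
```

Because each key carries its own counter, one runaway tenant hits its own cap without touching the other 49.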

4. Eval pipelines

Run a fixed prompt set across N models on a schedule, compare output quality (LLM-as-judge, exact match, semantic similarity), gate upgrades on regressions. Often enterprise-only across competitors. Ask: show me an eval comparing gpt-4o vs claude-3.5-sonnet vs gemini-2.5 on my own dataset, with last-run delta visible to engineers.
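
A minimal sketch of the exact-match variant, assuming a tiny hand-written dataset and models represented as plain callables. `DATASET`, `exact_match_score`, `gate_upgrade`, and the 2-point regression tolerance are all illustrative choices, not a vendor feature.

```python
from typing import Callable

# Hypothetical fixed prompt set with expected answers.
DATASET = [
    ("2+2=", "4"),
    ("Capital of France?", "Paris"),
    ("Opposite of hot?", "cold"),
]


def exact_match_score(model: Callable[[str], str]) -> float:
    """Fraction of prompts whose normalized output equals the expected answer."""
    hits = sum(model(prompt).strip().lower() == want.lower() for prompt, want in DATASET)
    return hits / len(DATASET)


def gate_upgrade(baseline: float, candidate: float, max_drop: float = 0.02) -> bool:
    """Allow the upgrade only if the candidate regresses by at most max_drop."""
    return candidate >= baseline - max_drop
```

LLM-as-judge and semantic-similarity scoring would slot in as alternative scoring functions with the same gate on top.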

5. Prompt management & versioning

Centralized prompt registry, versioned, with diffs, environment promotion (dev → staging → prod), and rollback without a code change. Ask: can a non-engineer (PM, CS) edit a production prompt safely, with audit trail and one-click rollback?
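
One way to picture the registry is as append-only versions plus a per-environment pointer, so that rollback is just moving the pointer back. The `PromptRegistry` class below is a hypothetical in-memory sketch under that assumption, not a description of any gateway's storage model.

```python
import difflib


class PromptRegistry:
    """In-memory sketch: append-only versions plus a per-environment pointer."""

    def __init__(self) -> None:
        self.versions: dict[str, list[str]] = {}
        self.live: dict[tuple[str, str], int] = {}  # (name, env) -> version index
        self.audit: list[tuple[str, str, str, int]] = []  # (actor, name, env, version)

    def save(self, name: str, text: str) -> int:
        """Store a new version and return its index."""
        self.versions.setdefault(name, []).append(text)
        return len(self.versions[name]) - 1

    def promote(self, actor: str, name: str, env: str, version: int) -> None:
        """Point an environment at a version; rollback is promoting an older one."""
        if not 0 <= version < len(self.versions[name]):
            raise ValueError("unknown version")
        self.live[(name, env)] = version
        self.audit.append((actor, name, env, version))

    def get(self, name: str, env: str) -> str:
        return self.versions[name][self.live[(name, env)]]

    def diff(self, name: str, old: int, new: int) -> list[str]:
        """Unified diff between two stored versions."""
        a = self.versions[name][old].splitlines()
        b = self.versions[name][new].splitlines()
        return list(difflib.unified_diff(a, b, lineterm=""))
```

Application code only ever calls something like `get("support", "prod")`, which is why a prompt change or rollback needs no code change.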

6. Observability & attribution

Per-request, per-team, per-customer cost and latency rollups; export to your warehouse (BigQuery, Snowflake, S3) or to LangSmith / Helicone / your own SIEM. Ask: show me $/customer for the last 30 days, broken down by feature.
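
The "$/customer by feature" question is an aggregation over per-request attribution metadata. A small sketch, assuming each logged request carries `customer`, `feature`, and `cost_usd` fields (these field names are illustrative, not a vendor schema):

```python
from collections import defaultdict

# Hypothetical request log; field names are illustrative, not a vendor schema.
REQUEST_LOG = [
    {"customer": "acme", "feature": "search", "cost_usd": 0.50},
    {"customer": "acme", "feature": "summarize", "cost_usd": 1.25},
    {"customer": "acme", "feature": "search", "cost_usd": 0.25},
    {"customer": "globex", "feature": "search", "cost_usd": 2.00},
]


def cost_rollup(rows: list[dict]) -> dict[tuple[str, str], float]:
    """Sum cost per (customer, feature) pair."""
    totals: dict[tuple[str, str], float] = defaultdict(float)
    for row in rows:
        totals[(row["customer"], row["feature"])] += row["cost_usd"]
    return dict(totals)


def cost_per_customer(rows: list[dict]) -> dict[str, float]:
    """Collapse the (customer, feature) rollup to a per-customer total."""
    totals: dict[str, float] = defaultdict(float)
    for (customer, _feature), cost in cost_rollup(rows).items():
        totals[customer] += cost
    return dict(totals)
```

If the gateway does not record the customer and feature on every request, no dashboard or warehouse export can reconstruct this view later.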

7. Cost controls + provider economics

The lever buyers most underestimate. The gateway can pool customer traffic into provider reservations — Azure OpenAI Provisioned Throughput Units (PTU), Google Vertex Provisioned Throughput / GSU, AWS Bedrock Provisioned Throughput — that no single mid-market customer can sustain alone. Annual reservations on Azure PTU are documented at up to 70% off PAYG; monthly up to 30%. Full math: the cost teardown post. Ask: do you pool reservations across customers, and how does that show up on my invoice?
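
Why pooling matters follows from simple arithmetic: reserved capacity is paid for whether or not it is used. The sketch below assumes a simplified model in which a reservation costs a flat fraction (1 minus the discount) of the PAYG price of its full capacity; real PTU / GSU pricing is more involved, and the function names are hypothetical.

```python
def reservation_saving(payg_value_usd: float, discount: float, utilization: float) -> float:
    """Saving versus PAYG for the traffic actually served.

    payg_value_usd: what the reserved capacity would cost at PAYG if fully used.
    discount: reservation discount, e.g. 0.70 for "up to 70% off".
    utilization: fraction of the reserved capacity actually used (0..1).
    A negative result means the reservation costs more than staying on PAYG.
    """
    reserved_cost = payg_value_usd * (1 - discount)
    payg_cost = payg_value_usd * utilization
    return payg_cost - reserved_cost


def breakeven_utilization(discount: float) -> float:
    """Utilization at which reserved and PAYG cost are equal in this model."""
    return 1 - discount
```

Under this model a 70% discount pays off above 30% utilization, while a 30% monthly discount needs 70% utilization. A single mid-market customer with spiky traffic often cannot hold that level alone; pooled traffic from many customers is smoother.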

8. Compliance posture (RLS, RBAC, residency, audit)

Multi-tenant isolation at the database (Postgres RLS, not app-layer only), SSO/SAML, regional residency (US / EU / customer-private), per-request audit log, SOC 2 / GDPR. Ask: show me the RLS policy that prevents tenant A from reading tenant B's prompt history; show me the residency knob.


The 2026 shortlist matrix

Five gateways come up most in 2026 buyer conversations: OpenRouter, Portkey, LiteLLM, Helicone, and NemoRouter. All have public pricing pages; the rows below are sourced from those pages, with audit timestamps in the Sources section. We are explicit about where features sit on the pricing ladder — that is the whole point of this guide.

| Axis | OpenRouter | Portkey | LiteLLM | Helicone | NemoRouter |
|---|---|---|---|---|---|
| 1. Routing & fallback | Routing-first product | Yes, Pro+ | Yes, OSS + Cloud | Limited; observability-first | Yes, every tier |
| 2. Guardrails (PII/jailbreak/regex) | Not offered as a product feature | Pro / Enterprise | Self-host only | Pro / Enterprise | Free, every tier |
| 3. Per-team / per-customer budgets | Not offered | Enterprise | Self-host only | Not offered | Free, every tier |
| 4. Eval pipelines | Not offered | Enterprise | Self-host only | Pro / Enterprise | Free, every tier |
| 5. Prompt management / versioning | Not offered | Pro / Enterprise | Partial, self-host | Limited | Free, every tier |
| 6. Observability & attribution | Spend dashboard only | Yes, Pro+ | Yes, OSS + Cloud | Yes, primary product | Yes, every tier |
| 7. Annual prepay → 0% platform fee | 5% credit fee | Annual contract / sales | Cloud Enterprise / self-host free | Annual contract / sales | Tier 3, $1,200/yr |
| 8. Provider reservation pooling (PTU/GSU/Bedrock PT) | Not documented | Not documented | Not at the gateway layer | Not at the gateway layer | Yes, post-$10k ARR |

OpenRouter, Portkey, LiteLLM, and Helicone are trademarks of their respective owners. NemoRouter is not affiliated with or endorsed by any of these vendors. Every row is sourced from each vendor's public pricing or documentation page on 2026-05-16; if any has changed, email us and we'll update.

The pattern across the matrix is the same one we wrote up in the OpenRouter alternative comparison: governance features (rows 2–5) are gated by every competitor and free on every NemoRouter tier. The competing wedge is feature-gating — ours is platform-fee-gating, with all features unlocked.


Decision tree: which gateway for which buyer

Three buyer profiles cover roughly 90% of evaluations we see. Map yourself to one and use the decision below as a starting point — then validate with your own list of must-haves.

Buyer A — Indie / hobbyist / single-developer prototype

LLM spend under $500/month, no compliance, no team. Any of the five works; NemoRouter Tier 1 (4% PAYG, free $5 credit) is a strict superset of OpenRouter Pro (5% credits) — same OpenAI shape, lower fee, with the option to grow into per-team budgets without re-platforming. Stop reading; go to /signup.

Buyer B — Mid-market SaaS adding AI (ICP 2)

$5k–$50k/month spend, 50–500 engineers, compliance asking for per-team / per-customer attribution and annual budget predictability. NemoRouter Tier 3 ($1,200/yr prepay, 0% platform fee, all features) dominates. Breakeven math in the cost teardown: above $5k/month, Tier 3 wins on absolute dollars while keeping the same feature set as every other tier. Portkey and Helicone solve observability, but their Pro / Enterprise gating on guardrails, evals, and budgets means a sales-led contract negotiation that NemoRouter does not require.

Buyer C — Platform / infra eng at AI-native startup (ICP 3)

$10k–$200k+/month, multiple product lines, multiple regions; internal platform team needs OSS for self-host or air-gap. Evaluate LiteLLM (OSS) for the air-gap path and NemoRouter for the managed path side-by-side. The reservation-pooling lever is one LiteLLM cannot replicate without aggregating customers across companies: it is an economy of scale at the gateway provider, not a feature flag. Self-host: LiteLLM. Annual prepay → reservations: NemoRouter Tier 3 / Enterprise.


The 90-minute evaluation

You do not need a 2-week vendor cycle. Run this:

  1. (15 min) Mark the eight axes above as MUST, SHOULD, or DEFER. Be honest — SOC 2 is a MUST for healthcare; for a 4-person startup it is a DEFER.
  2. (20 min) For each gateway, find the tier where every MUST first appears (pricing page or docs). If unclear inside 5 minutes, mark "unknown" and assume worst case.
  3. (20 min) For each gateway, write annual cost at the lowest tier covering all MUSTs (12 × monthly minimum + per-seat + platform-fee-on-$X). That is the real annual cost in your scenario, not the headline.
  4. (15 min) Run a 5-prompt smoke test on the two cheapest gateways that cover the MUSTs. Same prompt, model, temperature. Compare latency p50/p95, error rate, and any guardrail wired up in the trial.
  5. (20 min) Read the "What this is not" section of the cost teardown. Decide whether the 60% / 70% reservation lever is actually available to you (spend averaging more than $10k/month over the year, willing to prepay) or whether you should stay PAYG. Either answer is fine; the wrong answer is pretending you are somewhere on the curve you aren't.
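
For the smoke test in step 4, the comparison reduces to a few summary statistics per gateway. A minimal sketch, assuming you have collected one `(latency_ms, ok)` pair per request; `percentile` uses the nearest-rank method and both function names are hypothetical.

```python
import math


def percentile(values: list[float], pct: float) -> float:
    """Nearest-rank percentile of a non-empty list."""
    if not values:
        raise ValueError("no values to summarize")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def summarize(results: list[tuple[float, bool]]) -> dict[str, float]:
    """Summarize (latency_ms, ok) pairs; latency stats cover successful calls only."""
    latencies = [ms for ms, ok in results if ok]
    errors = sum(1 for _, ok in results if not ok)
    return {
        "p50_ms": percentile(latencies, 50),
        "p95_ms": percentile(latencies, 95),
        "error_rate": errors / len(results),
    }
```

With only five prompts, p95 is simply the slowest successful call, so repeat each prompt a few times per gateway if you want the tail number to mean anything.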

If you want help running this on your own numbers, that is what the 30-minute walk-through on /community is for.


Switching cost: the two-line diff

The migration into NemoRouter (or any OpenAI-shaped gateway) is two lines for an existing OpenAI integration:

import os
from openai import OpenAI

# Before: direct to OpenAI
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# After: through NemoRouter (only the key and base_url change)
client = OpenAI(
    api_key=os.environ["NEMOROUTER_API_KEY"],
    base_url="https://api.nemorouter.ai/v1",
)

The Anthropic SDK accepts the same shape via its base_url parameter. Typical scoping: half a day for one engineer plus 24–72 hours of dual-run. Add it to your spreadsheet alongside the annual platform fee in step 3 above.


What this is not

A few buyers should not switch to a gateway at all in 2026:

  • Single-provider committed accounts with negotiated direct-deal discounts that beat reservation pricing — rare above $1M/yr but possible.
  • FedRAMP High / IL5+ workloads where the gateway provider is not on the relevant authorization boundary. Direct endpoints inside the boundary, gateway outside.
  • Sub-$500/month spend with zero compliance load — any of the five works; pick the cleanest signup and move on.

We would rather you skip the switch than switch on a math error.


How NemoRouter is different in one paragraph

Every other gateway in the matrix uses feature gating as its monetization wedge: guardrails on Pro, evals on Enterprise, budgets on call-sales. We use platform-fee gating instead — the same complete feature set on every tier (Tier 1 free PAYG, Tier 2 $100/mo, Tier 3 $1,200/yr, Enterprise custom), with only the percentage we take on top of provider cost varying (4% / 2% / 0% / 0%). We then pool customer traffic into provider reservations (Azure PTU, Vertex GSU, Bedrock PT) once aggregated revenue clears the reservation minimum, and pass the savings back through Tier 3 / Enterprise rather than as a promo. That is the entire wedge. It is in the first sentence of this post and the first sentence of every post we write — because if it stops being true, we have failed.
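
The tier numbers in this paragraph can be turned into a quick comparison. This sketch assumes the platform fee is charged as a straight percentage of provider spend and that the fixed prices are exactly as quoted above; it covers gateway cost only, not the provider bill itself, and leaves out Enterprise because its price is custom.

```python
# (fixed annual price in USD, platform fee as a fraction of provider spend),
# taken from the tier description in the paragraph above.
TIERS = {
    "Tier 1": (0.0, 0.04),
    "Tier 2": (100.0 * 12, 0.02),
    "Tier 3": (1200.0, 0.00),
}


def annual_gateway_cost(tier: str, monthly_provider_spend_usd: float) -> float:
    """Fixed price plus platform fee on a year of provider spend."""
    fixed, fee = TIERS[tier]
    return fixed + fee * monthly_provider_spend_usd * 12


def cheapest_tier(monthly_provider_spend_usd: float) -> str:
    """Tier with the lowest annual gateway cost at this spend level."""
    return min(TIERS, key=lambda t: annual_gateway_cost(t, monthly_provider_spend_usd))
```

Under these assumptions, Tier 1 and Tier 3 cross at $2,500/month of provider spend ($1,200 divided by 4% of a year's spend); at $5k/month Tier 3 costs $1,200 a year against $2,400 on Tier 1. Plug in your own spend before trusting any headline number.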


Try it on your own numbers

We auto-grant $5 in API credits on signup, no card required. Enough to route 5–10 production prompts across Tier 1, exercise the guardrails + per-team budgets UI on your real traffic, and decide whether the Tier 3 math holds before you sign anything.

Start free at nemorouter.ai/signup — Tier 1, $5 credit, no card. Mid-market SaaS or larger? Bring your last 90 days of LLM invoices to a 30-min walk-through (book through /community) and we'll redo the cost teardown against your actual spend.


Sources

Vendor pricing/docs pages, verified 2026-05-16:

Written by Nemo Router team
Engineering, product, and company posts from the NemoRouter team: code-first, cost-honest, no vendor-marketing fluff.