Use Cases

Build it on one managed gateway.

RAG pipelines, AI agents, support bots, coding assistants, document processors, voice apps — they all need routing, guardrails, cost control, and observability. Nemo Router gives you all four behind one key across 97+ models.

Get started Browse models

nemo · one key, every workload

Same gateway, different jobs

RAG pipelineembed + chat

Agent runtimefallback chains

Support botPII redacted

Doc processingvision · batch

Voice assistantlatency-routed

Provider confignone

one keyone billno BYOK

One key: 97+
Every feature: All tiers
Gateway overhead: 95 ms
Platform fee: 0%

Pick your workload

Six use cases, one gateway

Each page below is tailored to how that workload actually runs — which Nemo capabilities matter, and how the request flow looks end to end.

RAG pipelines

Retrieval-augmented generation, one key for embed + chat

A RAG pipeline calls two model families — embeddings to index and query, a chat model to synthesize the answer. Route both through one endpoint, cache the repeats, and track cost per pipeline stage.

Embed + chat routing
Response caching
Per-pipeline cost tracking

Explore rag pipelines

AI agents

Autonomous agents that stay up and stay in budget

Agents fan out dozens of LLM calls per task. Auto-failover keeps a long run alive when a provider degrades, per-agent budgets cap spend, and every tool call lands in the request log for audit.

Fallback chains
Per-agent budgets
Tool-call observability

Explore ai agents

Customer support bots

Support bots with PII redaction and injection defense built in

Support traffic carries real customer data and hostile inputs. Guardrails redact PII and block prompt injection on every request; versioned prompt templates keep tone consistent across the fleet.

PII redaction
Injection defense
Versioned prompt templates

Explore customer support bots

Code generation

Coding assistants with model choice and per-seat budgets

Coding tools mix fast autocomplete with deep reasoning. Pick the model per task from the catalog, route latency-sensitive calls to the quickest endpoint, and cap spend with per-key, per-seat budgets.

Model catalog choice
Latency routing
Per-seat budgets

Explore code generation

Document processing

Extract, classify, and summarize documents at volume

Document workloads run vision-capable models over scans and PDFs in large batches. Tag-filtered routing keeps every request on a vision model, and cost tracking attributes spend per batch job.

Vision-capable models
Batch-friendly routing
Per-job cost tracking

Explore document processing

Voice & realtime AI

Low-latency routing for voice and realtime assistants

Voice apps live or die on latency. Latency-based routing steers each turn to the quickest healthy endpoint, fallbacks keep the conversation alive, and every turn is logged for observability.

Latency-based routing
Turn-level observability
Model catalog

Explore voice & realtime ai

The common foundation

What every use case shares

The workloads differ, but the gateway underneath is the same. These three guarantees hold whether you ship a RAG bot or a voice agent.

One key, no provider config

Every use case below starts the same way: one Nemo Router key, the OpenAI-compatible endpoint, and the model catalog. No provider accounts, no key vault, no per-model SDK.

Guardrails and budgets, always on

PII redaction, prompt-injection detection, and per-org / per-team / per-key budgets apply to every workload — RAG, agents, or voice. You opt out, never in.

Cost and logs you can attribute

Nemo Router reports the real cost of every call; Nemo logs the strategy, model, latency, and result. Slice spend by pipeline, agent, batch job, or seat.

One key. One bill. Every workload.

Start with the use case that fits your build

Sign up, paste your virtual key, change the base URL. Routing, guardrails, budgets, and observability are unlocked on every plan.

Get started See pricing