LiteLLM
Use Nemo Router with LiteLLM's completion() — direct, no proxy
LiteLLM's completion() works out of the box with Nemo Router. Set api_base to https://api.nemorouter.ai/v1; guardrails, caching, and rate limits are applied automatically server-side.
Nemo Router itself runs LiteLLM in its request path, so this pattern is a thin pass-through. If you're using LiteLLM purely as a routing layer, you can drop it and call Nemo Router directly with the Python SDK.
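Every call in this guide passes the same api_base and api_key, so it can help to build those two keyword arguments in one place. The following is a minimal sketch using only the standard library; `nemo_kwargs` and `NEMO_API_BASE` are hypothetical names chosen for illustration and are not part of LiteLLM or Nemo Router.

```python
import os

# Endpoint taken from this guide; the constant name is an illustrative assumption.
NEMO_API_BASE = "https://api.nemorouter.ai/v1"


def nemo_kwargs(env_var: str = "NEMOROUTER_API_KEY") -> dict:
    """Return the api_base/api_key keyword arguments shared by every call."""
    api_key = os.environ.get(env_var)
    if not api_key:
        # Fail early with a clear message instead of a bare KeyError later.
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return {"api_base": NEMO_API_BASE, "api_key": api_key}
```

With this helper, a call can be written as completion(model="openai/gpt-4o", messages=[...], **nemo_kwargs()), which keeps the endpoint and key lookup out of each call site.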
Installation
pip install litellm
Setup
import os
from litellm import completion
response = completion(
    # The "openai/" prefix tells LiteLLM to treat api_base as an OpenAI-compatible endpoint.
    model="openai/claude-sonnet-4-20250514",
    messages=[{"role": "user", "content": "Hello!"}],
    api_base="https://api.nemorouter.ai/v1",
    api_key=os.environ["NEMOROUTER_API_KEY"],
)
print(response.choices[0].message.content)
Async
import asyncio
import os
from litellm import acompletion
async def main():
    # acompletion() takes the same arguments as completion() and must be awaited.
    response = await acompletion(
        model="openai/gpt-4o",
        messages=[{"role": "user", "content": "Hello!"}],
        api_base="https://api.nemorouter.ai/v1",
        api_key=os.environ["NEMOROUTER_API_KEY"],
    )
    print(response.choices[0].message.content)
asyncio.run(main())
Per-Request Overrides
completion() forwards unknown kwargs into the request body, so nemo_* fields work inline:
response = completion(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Summarize Q1 earnings..."}],
    api_base="https://api.nemorouter.ai/v1",
    api_key=os.environ["NEMOROUTER_API_KEY"],
    nemo_prompt_template_id="your-summarizer-id",
    nemo_prompt_variables={"language": "Spanish"},
    nemo_guardrail_ids=["guardrail-uuid-1"],
    nemo_cache=False,
)
Streaming
stream = completion(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Write a short poem."}],
    api_base="https://api.nemorouter.ai/v1",
    api_key=os.environ["NEMOROUTER_API_KEY"],
    stream=True,
)
for chunk in stream:
    # delta.content can be None (for example on the final chunk), so skip empty deltas.
    content = chunk.choices[0].delta.content
    if content:
        print(content, end="", flush=True)
Response Headers
LiteLLM exposes raw response headers via response._response_headers:
# _response_headers may be None if no headers were captured, so fall back to an empty dict.
headers = response._response_headers or {}
cost = headers.get("x-nemo-request-cost")
guardrails = headers.get("x-nemo-guardrails-applied")
print(f"Cost: ${cost} | Guardrails: {guardrails}")
Next Steps
- Python SDK — Native Nemo Router SDK (no LiteLLM dependency)
- LangChain — For chain-style orchestration