
LiteLLM

Use Nemo Router with LiteLLM's completion() directly, with no proxy

LiteLLM's completion() works out of the box with Nemo Router. Set api_base to https://api.nemorouter.ai/v1; guardrails, caching, and rate limits are then applied automatically on the server side.

Nemo Router itself uses LiteLLM in its request path, so this pattern is a thin pass-through. If you're using LiteLLM purely as a routing layer, you can drop it and call Nemo Router directly with the Python SDK.

Installation

pip install litellm

Setup

import os
from litellm import completion

response = completion(
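    # The "openai/" prefix selects LiteLLM's OpenAI-compatible client;
    # the rest of the name is the model requested from Nemo Router.
    model="openai/claude-sonnet-4-20250514",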
    messages=[{"role": "user", "content": "Hello!"}],
    api_base="https://api.nemorouter.ai/v1",
    api_key=os.environ["NEMOROUTER_API_KEY"],
)
print(response.choices[0].message.content)
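
Every call on this page repeats the same api_base and api_key. One way to avoid that repetition is a small helper that builds both kwargs once. The sketch below is an illustration only: nemo_kwargs and NEMO_API_BASE are hypothetical names, not part of LiteLLM or Nemo Router.

```python
import os

# Base URL taken from the Setup example above.
NEMO_API_BASE = "https://api.nemorouter.ai/v1"


def nemo_kwargs(env=os.environ):
    """Return the api_base/api_key kwargs shared by every call (hypothetical helper)."""
    key = env.get("NEMOROUTER_API_KEY")
    if not key:
        # Fail early with a clear message instead of sending an empty key.
        raise RuntimeError("NEMOROUTER_API_KEY is not set")
    return {"api_base": NEMO_API_BASE, "api_key": key}
```

With this in place, a call could be written as completion(model="openai/gpt-4o", messages=[...], **nemo_kwargs()).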

Async

import asyncio
import os

from litellm import acompletion

async def main():
    response = await acompletion(
        model="openai/gpt-4o",
        messages=[{"role": "user", "content": "Hello!"}],
        api_base="https://api.nemorouter.ai/v1",
        api_key=os.environ["NEMOROUTER_API_KEY"],
    )
    print(response.choices[0].message.content)

asyncio.run(main())
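
The async form is most useful when several requests run at once. The following is a minimal sketch of bounded concurrency using only the standard library. run_all and its call parameter are hypothetical names: call stands for any coroutine function you supply, for example one that awaits acompletion() with the api_base and api_key shown above.

```python
import asyncio


async def run_all(prompts, call, limit=5):
    """Await call(prompt) for every prompt, with at most `limit` in flight."""
    gate = asyncio.Semaphore(limit)

    async def one(prompt):
        # The semaphore blocks here once `limit` calls are already running.
        async with gate:
            return await call(prompt)

    # gather returns results in the same order as the input prompts.
    return await asyncio.gather(*(one(p) for p in prompts))
```

Keeping the limit small is a conservative default, since rate limits are enforced server-side.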

Per-Request Overrides

completion() forwards unknown kwargs into the request body, so nemo_* fields work inline:

response = completion(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Summarize Q1 earnings..."}],
    api_base="https://api.nemorouter.ai/v1",
    api_key=os.environ["NEMOROUTER_API_KEY"],
    nemo_prompt_template_id="your-summarizer-id",
    nemo_prompt_variables={"language": "Spanish"},
    nemo_guardrail_ids=["guardrail-uuid-1"],
    nemo_cache=False,
)
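
If the overrides vary per request, it can help to build them in one place and pass only the ones that are set. The sketch below assumes the four nemo_* field names from the example above. nemo_overrides itself is a hypothetical helper, not part of LiteLLM or Nemo Router.

```python
def nemo_overrides(template_id=None, variables=None, guardrail_ids=None, cache=None):
    """Build nemo_* kwargs, omitting any field left as None (hypothetical helper)."""
    candidates = {
        "nemo_prompt_template_id": template_id,
        "nemo_prompt_variables": variables,
        "nemo_guardrail_ids": guardrail_ids,
        "nemo_cache": cache,
    }
    # Keep False (an explicit cache opt-out) but drop fields that were never set.
    return {name: value for name, value in candidates.items() if value is not None}
```

The result can be spread into the call, as in completion(..., **nemo_overrides(cache=False)).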

Streaming

stream = completion(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Write a short poem."}],
    api_base="https://api.nemorouter.ai/v1",
    api_key=os.environ["NEMOROUTER_API_KEY"],
    stream=True,
)

for chunk in stream:
    content = chunk.choices[0].delta.content
    if content:
        print(content, end="", flush=True)
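
When the full text is needed after streaming finishes, the chunks can be accumulated instead of printed. This is a sketch under the assumption that each chunk has the choices[0].delta.content shape used in the loop above. collect_text is a hypothetical helper name.

```python
def collect_text(stream):
    """Join the text deltas of a streamed response into one string."""
    parts = []
    for chunk in stream:
        # Skip chunks without choices rather than raising IndexError.
        if not chunk.choices:
            continue
        content = chunk.choices[0].delta.content
        # Chunks that carry no text have content set to None.
        if content:
            parts.append(content)
    return "".join(parts)
```

Because the helper only iterates, it works with any iterable of chunk-shaped objects, which also makes it easy to test without a network call.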

Response Headers

LiteLLM exposes raw response headers via response._response_headers:

cost = response._response_headers.get("x-nemo-request-cost")
guardrails = response._response_headers.get("x-nemo-guardrails-applied")
print(f"Cost: ${cost} | Guardrails: {guardrails}")
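
Headers may be absent, for example if the attribute is not populated in a given LiteLLM version. A defensive accessor avoids an AttributeError in that case. The sketch below assumes only that Nemo Router headers share the x-nemo- prefix seen above. nemo_headers is a hypothetical helper name.

```python
def nemo_headers(response):
    """Return only the x-nemo-* headers of a response, with lowercase keys."""
    # Fall back to an empty mapping if the attribute is missing or None.
    raw = getattr(response, "_response_headers", None) or {}
    return {
        name.lower(): value
        for name, value in dict(raw).items()
        if name.lower().startswith("x-nemo-")
    }
```

A lookup such as nemo_headers(response).get("x-nemo-request-cost") then returns None rather than raising when no headers were captured.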

Next Steps

  • Python SDK — Native Nemo Router SDK (no LiteLLM dependency)
  • LangChain — For chain-style orchestration