Python (OpenAI client)

Use Nemo Router as a drop-in replacement with the OpenAI Python SDK

If your codebase already uses the OpenAI Python client, you don't need to switch SDKs. Set base_url to https://api.nemorouter.ai/v1 and pass your sk-nemo-... key as api_key; no other changes are required.

Looking for the native Nemo Router Python SDK with first-class types? See Python SDK.

Installation

pip install openai

Setup

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["NEMOROUTER_API_KEY"],
    base_url="https://api.nemorouter.ai/v1",
)

Chat Completion

response = client.chat.completions.create(
    model="claude-sonnet-4-20250514",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)

Switching Providers

You don't need new credentials, new SDKs, or new endpoints to change models — just change the model string:

# OpenAI
client.chat.completions.create(model="gpt-4o", messages=[...])

# Anthropic
client.chat.completions.create(model="claude-sonnet-4-20250514", messages=[...])

# Google
client.chat.completions.create(model="gemini-2.5-pro", messages=[...])
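
Because only the model string changes, one prompt can be sent to several providers in a loop. The sketch below assumes a client configured as in Setup; ask_each_model is a hypothetical helper (not part of the OpenAI SDK or Nemo Router), and the model IDs are simply the ones used above.

```python
# Model IDs copied from the examples above; availability depends on your account.
MODELS = ["gpt-4o", "claude-sonnet-4-20250514", "gemini-2.5-pro"]


def ask_each_model(client, prompt, models=MODELS):
    """Send the same prompt to each model and return {model: reply text}.

    `client` is an OpenAI client whose base_url points at Nemo Router.
    This helper is illustrative only.
    """
    messages = [{"role": "user", "content": prompt}]
    replies = {}
    for model in models:
        # Same client, same credentials, same endpoint; only `model` differs.
        response = client.chat.completions.create(model=model, messages=messages)
        replies[model] = response.choices[0].message.content
    return replies
```

Calling ask_each_model(client, "Hello!") returns a dict keyed by model ID, which makes side-by-side comparison of providers straightforward.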

Per-Request Overrides

The OpenAI SDK's extra_body parameter merges additional fields into the request JSON, so you can use it to forward Nemo-specific fields on a single call:

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Summarize Q1 earnings..."}],
    extra_body={
        "nemo_guardrail_ids": ["guardrail-uuid-1"],
        "nemo_prompt_template_id": "your-summarizer-id",
        "nemo_prompt_variables": {"language": "Spanish"},
        "nemo_cache": False,
    },
)
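
If several call sites set these fields, a small builder keeps the key names in one place. This is a minimal sketch: nemo_overrides is a hypothetical helper, the field names are the ones shown above, and it assumes that omitting a field leaves the router's default behavior in place.

```python
def nemo_overrides(guardrail_ids=None, prompt_template_id=None,
                   prompt_variables=None, cache=None):
    """Build an extra_body dict containing only the fields that were set.

    Illustrative helper; the keys mirror the extra_body example above.
    """
    candidates = {
        "nemo_guardrail_ids": guardrail_ids,
        "nemo_prompt_template_id": prompt_template_id,
        "nemo_prompt_variables": prompt_variables,
        "nemo_cache": cache,
    }
    # Compare against None so that an explicit cache=False is preserved.
    return {key: value for key, value in candidates.items() if value is not None}
```

Pass the result straight through, for example extra_body=nemo_overrides(cache=False), alongside the usual model and messages arguments.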

Streaming

stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Write a short poem."}],
    stream=True,
)

for chunk in stream:
    content = chunk.choices[0].delta.content
    if content is not None:
        print(content, end="", flush=True)
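
To keep the full text instead of only printing it, the deltas can be accumulated. The sketch below uses a hypothetical helper, collect_stream, and also skips chunks whose choices list is empty; the OpenAI API can emit such a chunk (for example a trailing usage chunk), and whether Nemo Router forwards one is an assumption here.

```python
def collect_stream(chunks):
    """Join the text deltas from a chat completion stream into one string.

    `chunks` is any iterable of streaming chunks, such as the `stream`
    object returned by create(..., stream=True). Illustrative helper.
    """
    parts = []
    for chunk in chunks:
        # Some chunks may carry no choices at all; skip them.
        if not chunk.choices:
            continue
        content = chunk.choices[0].delta.content
        # The final delta usually has content=None.
        if content is not None:
            parts.append(content)
    return "".join(parts)
```

With the stream above, text = collect_stream(stream) consumes the iterator and returns the complete reply.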

Response Headers

Every successful response carries:

  • x-nemo-request-cost — exact USD spend for this call
  • x-nemo-guardrails-applied — comma-separated guardrail names that ran
  • x-nemo-prompt-version — version of the prompt template applied (if any)

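Read them with the with_raw_response accessor: client.chat.completions.with_raw_response.create(...) returns a raw response object that exposes .headers, and calling .parse() on it yields the usual completion. Avoid response._response.headers, which relies on a private attribute rather than the SDK's public interface.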
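
A minimal sketch of reading these headers through with_raw_response follows. parse_nemo_headers and chat_with_headers are hypothetical helpers, and the code assumes the cost header is a plain decimal string such as 0.00042; adjust the parsing if the router formats it differently.

```python
from decimal import Decimal


def parse_nemo_headers(headers):
    """Extract the Nemo Router headers from a mapping of response headers.

    Assumes x-nemo-request-cost is a plain decimal string (an assumption,
    not confirmed by the docs above). Missing headers map to None or [].
    """
    cost = headers.get("x-nemo-request-cost")
    guardrails = headers.get("x-nemo-guardrails-applied", "")
    return {
        "cost_usd": Decimal(cost) if cost else None,
        "guardrails": [name.strip() for name in guardrails.split(",") if name.strip()],
        "prompt_version": headers.get("x-nemo-prompt-version"),
    }


def chat_with_headers(client, **kwargs):
    """Run a chat completion and return (completion, parsed Nemo headers)."""
    raw = client.chat.completions.with_raw_response.create(**kwargs)
    # raw.parse() returns the same object a normal create() call would.
    return raw.parse(), parse_nemo_headers(raw.headers)
```

Decimal is used rather than float so that small per-request costs can be summed without rounding drift.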
