Chat Completions

Send messages and receive AI-generated responses

The Chat Completions endpoint is the primary way to interact with LLMs through NemoRouter. It is fully compatible with the OpenAI Chat Completions API, so existing code and SDKs written for OpenAI work with NemoRouter once the base URL and API key point at NemoRouter.

A public playground is available for trying chat completions interactively.

Endpoint

POST https://api.nemorouter.ai/v1/chat/completions

Request Headers

| Header | Required | Description |
| --- | --- | --- |
| Authorization | Yes | Bearer <NEMOROUTER_API_KEY> |
| Content-Type | Yes | application/json |

Request Body

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| model | string | Yes | - | The model to use (e.g., gpt-4o, claude-4-sonnet, gemini-2.5-pro) |
| messages | array | Yes | - | Array of message objects with role and content |
| temperature | number | No | 1.0 | Sampling temperature between 0 and 2. Lower values are more deterministic. |
| max_tokens | integer | No | Model default | Maximum number of tokens to generate |
| top_p | number | No | 1.0 | Nucleus sampling parameter. Alternative to temperature. |
| n | integer | No | 1 | Number of completions to generate |
| stream | boolean | No | false | Whether to stream the response using server-sent events |
| stop | string or array | No | null | Sequences where the model will stop generating |
| presence_penalty | number | No | 0 | Penalize tokens based on whether they appear in the text so far (-2.0 to 2.0) |
| frequency_penalty | number | No | 0 | Penalize tokens based on frequency in the text so far (-2.0 to 2.0) |
| response_format | object | No | null | Set to {"type": "json_object"} for JSON mode |
| seed | integer | No | null | Seed for deterministic generation (best effort) |
| tools | array | No | null | List of tool/function definitions for function calling |
| tool_choice | string or object | No | "auto" | Controls which tool the model calls |
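
As an illustration of how these parameters fit together, the sketch below assembles a request body and checks the numeric ranges listed in the table before anything is sent. The helper name `build_chat_request` is hypothetical and not part of NemoRouter or any SDK; the server remains the authority on validation.

```python
from typing import Any


def build_chat_request(
    model: str,
    messages: list[dict[str, Any]],
    **options: Any,
) -> dict[str, Any]:
    """Assemble a request body and check the documented numeric ranges."""
    if not messages:
        raise ValueError("messages must contain at least one message")

    # Ranges copied from the parameter table; everything else is passed through.
    ranges = {
        "temperature": (0.0, 2.0),
        "presence_penalty": (-2.0, 2.0),
        "frequency_penalty": (-2.0, 2.0),
    }
    for name, (low, high) in ranges.items():
        value = options.get(name)
        if value is not None and not low <= value <= high:
            raise ValueError(f"{name} must be between {low} and {high}, got {value}")

    body: dict[str, Any] = {"model": model, "messages": messages}
    # Leave unset options out so the server applies its own defaults.
    body.update({key: value for key, value in options.items() if value is not None})
    return body


request_body = build_chat_request(
    "gpt-4o",
    [{"role": "user", "content": "Reply with a JSON object listing three colors."}],
    temperature=0.2,
    max_tokens=256,
    response_format={"type": "json_object"},  # JSON mode, as described in the table
)
```

The resulting dictionary can be serialized with `json.dumps` and sent as the POST body, or its keys can be passed as keyword arguments to an OpenAI-compatible SDK.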

Message Object

Each message in the messages array has the following structure:

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| role | string | Yes | One of system, user, assistant, or tool |
| content | string | Yes | The message content |
| name | string | No | Optional name for the message author |
| tool_calls | array | No | Tool calls made by the assistant (for assistant role) |
| tool_call_id | string | No | ID of the tool call being responded to (for tool role) |
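
A multi-turn conversation is simply a growing list of these message objects. The following sketch, with a hypothetical `add_message` helper that is not part of any SDK, appends turns while checking the role and the required fields from the table.

```python
from typing import Any

ALLOWED_ROLES = {"system", "user", "assistant", "tool"}


def add_message(
    history: list[dict[str, Any]], role: str, content: str, **extra: Any
) -> list[dict[str, Any]]:
    """Return a new history with one validated message appended."""
    if role not in ALLOWED_ROLES:
        raise ValueError(f"unknown role: {role!r}")
    if role == "tool" and "tool_call_id" not in extra:
        raise ValueError("tool messages must include tool_call_id")
    message = {"role": role, "content": content, **extra}
    return [*history, message]


history: list[dict[str, Any]] = []
history = add_message(history, "system", "You are a helpful assistant.")
history = add_message(history, "user", "What is a qubit?", name="ada")
# Append the model's previous reply before sending the follow-up question,
# since the endpoint only sees the messages included in each request.
history = add_message(history, "assistant", "A qubit is the basic unit of quantum information.")
history = add_message(history, "user", "How does it differ from a classical bit?")
```

Passing `history` as the `messages` parameter gives the model the full context of the exchange.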

NemoRouter Extra Fields

NemoRouter supports additional fields via extra_body for platform-specific features:

| Field | Type | Description |
| --- | --- | --- |
| nemo_guardrail_ids | array | Run only these specific guardrail IDs (overrides org defaults) |
| nemo_cache | boolean | Set to false to skip cache for this request |
| nemo_prompt_template_id | string | Apply a saved prompt template |
| nemo_prompt_variables | object | Variables to inject into the prompt template |
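
With the OpenAI Python SDK, these fields would typically travel in the `extra_body` argument of `chat.completions.create`, which adds extra keys to the JSON request body. The sketch below is an assumption-laden example: the helper names, the guardrail ID, the template ID, and the template variable are all hypothetical placeholders.

```python
from typing import Any, Optional


def nemo_extra_body(
    guardrail_ids: Optional[list[str]] = None,
    use_cache: Optional[bool] = None,
    template_id: Optional[str] = None,
    template_variables: Optional[dict[str, Any]] = None,
) -> dict[str, Any]:
    """Collect the NemoRouter fields that were explicitly set."""
    fields = {
        "nemo_guardrail_ids": guardrail_ids,
        "nemo_cache": use_cache,
        "nemo_prompt_template_id": template_id,
        "nemo_prompt_variables": template_variables,
    }
    # Compare against None so that use_cache=False is kept.
    return {key: value for key, value in fields.items() if value is not None}


def create_with_extras(client: Any, extra_body: dict[str, Any]) -> Any:
    """`client` is an openai.OpenAI instance configured as in the Python example."""
    return client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Summarize our refund policy."}],
        extra_body=extra_body,  # extra keys are added to the JSON request body
    )


extras = nemo_extra_body(
    guardrail_ids=["guardrail-pii"],  # hypothetical guardrail ID
    use_cache=False,
    template_id="tmpl-support",  # hypothetical template ID
    template_variables={"customer_name": "Ada"},  # hypothetical variable
)
```

When calling the endpoint directly with cURL, the same keys would presumably sit at the top level of the JSON body alongside `model` and `messages`.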

Example Request

cURL

curl https://api.nemorouter.ai/v1/chat/completions \
  -H "Authorization: Bearer $NEMOROUTER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant that provides concise answers."
      },
      {
        "role": "user",
        "content": "Explain quantum computing in 3 sentences."
      }
    ],
    "temperature": 0.7,
    "max_tokens": 256
  }'

Python

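import os

from openai import OpenAI

# Read the key from the environment, as the cURL and Node.js examples do.
client = OpenAI(
    api_key=os.environ["NEMOROUTER_API_KEY"],  # e.g. "sk-nemo-your-key-here"
    base_url="https://api.nemorouter.ai/v1",
)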

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain quantum computing in 3 sentences."},
    ],
    temperature=0.7,
    max_tokens=256,
)

print(response.choices[0].message.content)

Node.js

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.NEMOROUTER_API_KEY,
  baseURL: "https://api.nemorouter.ai/v1",
});

const response = await client.chat.completions.create({
  model: "gpt-4o",
  messages: [
    { role: "system", content: "You are a helpful assistant." },
    { role: "user", content: "Explain quantum computing in 3 sentences." },
  ],
  temperature: 0.7,
  max_tokens: 256,
});

console.log(response.choices[0].message.content);

Response

{
  "id": "chatcmpl-abc123def456",
  "object": "chat.completion",
  "created": 1709251200,
  "model": "gpt-4o",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Quantum computing uses quantum bits (qubits) that can exist in multiple states simultaneously through superposition..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 30,
    "completion_tokens": 85,
    "total_tokens": 115
  }
}

Response Fields

| Field | Type | Description |
| --- | --- | --- |
| id | string | Unique identifier for the completion |
| object | string | Always chat.completion |
| created | integer | Unix timestamp of creation |
| model | string | The model used for the completion |
| choices | array | Array of completion choices |
| choices[].index | integer | Index of the choice |
| choices[].message | object | The generated message |
| choices[].finish_reason | string | stop, length, tool_calls, or content_filter |
| usage | object | Token usage statistics |
| usage.prompt_tokens | integer | Tokens in the prompt |
| usage.completion_tokens | integer | Tokens in the completion |
| usage.total_tokens | integer | Total tokens used |
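
To show how these fields are usually read together, here is a minimal sketch that summarizes a decoded response. It works on a plain dictionary shaped like the example response; `summarize_completion` is a hypothetical helper, and the message content is shortened for the example.

```python
from typing import Any

# A decoded response with the same shape and values as the example response.
sample_response: dict[str, Any] = {
    "id": "chatcmpl-abc123def456",
    "object": "chat.completion",
    "created": 1709251200,
    "model": "gpt-4o",
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "Quantum computing uses qubits..."},
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 30, "completion_tokens": 85, "total_tokens": 115},
}


def summarize_completion(response: dict[str, Any]) -> dict[str, Any]:
    """Extract the first choice's text plus the fields worth logging."""
    choice = response["choices"][0]
    return {
        "text": choice["message"]["content"],
        # "length" means generation stopped at max_tokens, so the text may be cut off.
        "truncated": choice["finish_reason"] == "length",
        "finish_reason": choice["finish_reason"],
        "total_tokens": response["usage"]["total_tokens"],
    }


summary = summarize_completion(sample_response)
```

Checking `finish_reason` before using the text helps catch truncated or filtered output early.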

NemoRouter Response Headers

NemoRouter adds observability headers to every response:

| Header | Description |
| --- | --- |
| x-nemo-org-id | Your organization ID |
| x-nemo-key-alias | The key alias used for this request |
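
These headers can be pulled out of any HTTP client's response for logging. The sketch below assumes only that the headers are available as a name-to-value mapping; the helper name and the sample values are hypothetical.

```python
from typing import Mapping

NEMO_HEADERS = ("x-nemo-org-id", "x-nemo-key-alias")


def nemo_observability(headers: Mapping[str, str]) -> dict[str, str]:
    """Pick out the NemoRouter headers, ignoring header-name case."""
    lowered = {name.lower(): value for name, value in headers.items()}
    return {name: lowered[name] for name in NEMO_HEADERS if name in lowered}


# Hypothetical header values, for illustration only.
example_headers = {
    "Content-Type": "application/json",
    "X-Nemo-Org-Id": "org_123",
    "X-Nemo-Key-Alias": "staging-backend",
}
observed = nemo_observability(example_headers)
```

In the OpenAI Python SDK, response headers are generally reachable by calling `client.chat.completions.with_raw_response.create(...)` and reading `.headers` on the result; `.parse()` then returns the usual completion object.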

Streaming

Set stream: true to receive the response as server-sent events (SSE):

curl https://api.nemorouter.ai/v1/chat/completions \
  -H "Authorization: Bearer $NEMOROUTER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Write a short poem."}],
    "stream": true
  }'

Each SSE event carries one chunk of the response in choices[].delta, and the stream ends with a data: [DONE] event:

data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","created":1709251200,"model":"gpt-4o","choices":[{"index":0,"delta":{"content":"In"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","created":1709251200,"model":"gpt-4o","choices":[{"index":0,"delta":{"content":" the"},"finish_reason":null}]}

data: [DONE]
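
On the client side, the full text is rebuilt by concatenating the `delta.content` pieces until `[DONE]` arrives. The following sketch parses raw SSE lines such as the ones above; `collect_stream_text` is a hypothetical helper, and it reads only the first choice.

```python
import json
from typing import Iterable


def collect_stream_text(lines: Iterable[str]) -> str:
    """Concatenate delta.content from SSE data lines until [DONE]."""
    parts: list[str] = []
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and any non-data fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            if choice.get("index", 0) != 0:
                continue  # this sketch follows only the first choice
            content = choice.get("delta", {}).get("content")
            if content:
                parts.append(content)
    return "".join(parts)


sample_events = [
    'data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","created":1709251200,'
    '"model":"gpt-4o","choices":[{"index":0,"delta":{"content":"In"},"finish_reason":null}]}',
    "",
    'data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","created":1709251200,'
    '"model":"gpt-4o","choices":[{"index":0,"delta":{"content":" the"},"finish_reason":null}]}',
    "",
    "data: [DONE]",
]
streamed_text = collect_stream_text(sample_events)  # "In the"
```

SDK users rarely need this, since passing `stream=True` to the OpenAI SDKs yields already-parsed chunk objects.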

Function Calling

NemoRouter supports OpenAI-compatible function calling for models that offer it. Describe the available functions in tools:

curl https://api.nemorouter.ai/v1/chat/completions \
  -H "Authorization: Bearer $NEMOROUTER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [
      {"role": "user", "content": "What is the weather in San Francisco?"}
    ],
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "get_weather",
          "description": "Get the current weather in a given location",
          "parameters": {
            "type": "object",
            "properties": {
              "location": {
                "type": "string",
                "description": "City and state, e.g. San Francisco, CA"
              }
            },
            "required": ["location"]
          }
        }
      }
    ],
    "tool_choice": "auto"
  }'
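
When the model decides to call a function, the response has finish_reason tool_calls, and the caller is expected to run the function and send the result back as a tool message. The sketch below assumes the OpenAI-compatible shape for `tool_calls` entries (an `id` plus a `function` object whose `arguments` is a JSON string); the `get_weather` stub, its return value, and the call ID are hypothetical.

```python
import json
from typing import Any, Callable


def get_weather(location: str) -> str:
    """Hypothetical stub; a real implementation would query a weather service."""
    return json.dumps({"location": location, "forecast": "sunny"})


TOOL_REGISTRY: dict[str, Callable[..., str]] = {"get_weather": get_weather}


def run_tool_calls(assistant_message: dict[str, Any]) -> list[dict[str, Any]]:
    """Execute each requested tool and build the matching tool messages."""
    results = []
    for call in assistant_message.get("tool_calls") or []:
        function = TOOL_REGISTRY[call["function"]["name"]]
        arguments = json.loads(call["function"]["arguments"])  # arguments arrive as a JSON string
        results.append(
            {
                "role": "tool",
                "tool_call_id": call["id"],  # links the result to the originating call
                "content": function(**arguments),
            }
        )
    return results


# An assistant message shaped like an OpenAI-compatible tool call (assumed shape).
assistant_message = {
    "role": "assistant",
    "content": None,
    "tool_calls": [
        {
            "id": "call_abc123",
            "type": "function",
            "function": {
                "name": "get_weather",
                "arguments": '{"location": "San Francisco, CA"}',
            },
        }
    ],
}
tool_messages = run_tool_calls(assistant_message)
```

Appending `assistant_message` and then `tool_messages` to the original messages array and calling the endpoint again lets the model produce its final answer from the tool output.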

Error Responses

| Status | Description |
| --- | --- |
| 400 Bad Request | Invalid request body or unsupported parameter |
| 401 Unauthorized | Invalid or missing API key |
| 402 Payment Required | Insufficient credit balance |
| 429 Too Many Requests | Rate limit exceeded (RPM or TPM) |
| 500 Internal Server Error | Server-side error; retry with backoff |
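
One way to apply the retry advice is exponential backoff with jitter. The sketch below is not NemoRouter's prescribed policy: it retries on 500 as the table suggests and, as an assumption, on 429 as well, while returning 400, 401, and 402 immediately because repeating the same request would not help. The `send` callable is a hypothetical stand-in for whatever performs the HTTP request.

```python
import random
import time
from typing import Any, Callable

# 500 per the table; treating 429 as retryable is an assumption of this sketch.
RETRYABLE_STATUSES = {429, 500}


def call_with_backoff(
    send: Callable[[], tuple[int, Any]],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> tuple[int, Any]:
    """Call send() until it returns a non-retryable status or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = send()
        is_last_attempt = attempt == max_attempts - 1
        if status not in RETRYABLE_STATUSES or is_last_attempt:
            return status, body
        # Delay doubles each attempt, plus jitter so clients do not retry in lockstep.
        delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
        sleep(delay)
    raise RuntimeError("unreachable")  # the loop always returns on the last attempt
```

The `sleep` parameter is injectable so the policy can be exercised in tests without real waiting. The OpenAI SDKs also expose a `max_retries` client option that may cover simple cases.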
