NemoRouter
API Reference

Embeddings

Generate vector embeddings for text


The Embeddings endpoint generates vector representations of text that can be used for semantic search, clustering, classification, and retrieval-augmented generation (RAG).

Endpoint

POST https://api.nemorouter.ai/v1/embeddings

Request Headers

| Header | Required | Description |
| --- | --- | --- |
| Authorization | Yes | Bearer <NEMOROUTER_API_KEY> |
| Content-Type | Yes | application/json |

Request Body

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| model | string | Yes | | The embedding model to use (e.g., text-embedding-3-small, text-embedding-3-large) |
| input | string or array | Yes | | The text to embed. Can be a single string or an array of strings. |
| encoding_format | string | No | "float" | Format of the returned embeddings: "float" or "base64" |
| dimensions | integer | No | Model default | The number of dimensions for the output embedding (supported by some models) |
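When you request encoding_format "base64", the embedding arrives as a base64 string instead of a JSON array of floats, which reduces response size. The helper below is a sketch for decoding it, assuming the payload is a packed array of little-endian float32 values, as in OpenAI's base64 encoding; verify against your actual responses.

```python
import base64
import struct

def decode_base64_embedding(b64: str) -> list[float]:
    """Decode a base64 embedding payload into a list of floats.

    Assumes packed little-endian float32 values (OpenAI-style encoding).
    """
    raw = base64.b64decode(b64)
    count = len(raw) // 4  # 4 bytes per float32
    return list(struct.unpack(f"<{count}f", raw))
```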

Example Request

cURL

curl https://api.nemorouter.ai/v1/embeddings \
  -H "Authorization: Bearer $NEMOROUTER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "text-embedding-3-small",
    "input": "NemoRouter is an enterprise LLM gateway."
  }'

Batch Embeddings

Generate embeddings for multiple texts in a single request:

curl https://api.nemorouter.ai/v1/embeddings \
  -H "Authorization: Bearer $NEMOROUTER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "text-embedding-3-small",
    "input": [
      "NemoRouter is an enterprise LLM gateway.",
      "One API key for every major LLM provider.",
      "Built-in guardrails and budget controls."
    ]
  }'

Python

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["NEMOROUTER_API_KEY"],
    base_url="https://api.nemorouter.ai/v1",
)

response = client.embeddings.create(
    model="text-embedding-3-small",
    input="NemoRouter is an enterprise LLM gateway.",
)

embedding = response.data[0].embedding
print(f"Embedding dimensions: {len(embedding)}")
print(f"First 5 values: {embedding[:5]}")

Node.js

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.NEMOROUTER_API_KEY,
  baseURL: "https://api.nemorouter.ai/v1",
});

const response = await client.embeddings.create({
  model: "text-embedding-3-small",
  input: "NemoRouter is an enterprise LLM gateway.",
});

const embedding = response.data[0].embedding;
console.log(`Embedding dimensions: ${embedding.length}`);

Custom Dimensions

Some models (like text-embedding-3-small and text-embedding-3-large) support reducing the output dimensions:

response = client.embeddings.create(
    model="text-embedding-3-large",
    input="NemoRouter is an enterprise LLM gateway.",
    dimensions=256,  # Reduce from default 3072 to 256
)

Reducing dimensions saves storage and speeds up similarity searches, with a small trade-off in accuracy.
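If you need a smaller vector after the fact (or the model ignores the dimensions parameter), a client-side approximation is to truncate the vector and re-normalize it to unit length. This is the documented shortening approach for the text-embedding-3 family; whether it preserves quality for other models is an assumption you should test.

```python
import numpy as np

def truncate_embedding(vec, dims: int) -> np.ndarray:
    """Truncate an embedding to `dims` values and re-normalize.

    Client-side equivalent in spirit to the `dimensions` parameter.
    """
    v = np.asarray(vec, dtype=float)[:dims]
    return v / np.linalg.norm(v)
```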

Response

{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "index": 0,
      "embedding": [
        -0.006929283,
        -0.005336422,
        0.024047505,
        -0.003570588,
        0.013664783
      ]
    }
  ],
  "model": "text-embedding-3-small",
  "usage": {
    "prompt_tokens": 8,
    "total_tokens": 8
  }
}

Response Fields

| Field | Type | Description |
| --- | --- | --- |
| object | string | Always "list" |
| data | array | Array of embedding objects |
| data[].object | string | Always "embedding" |
| data[].index | integer | Index of the input text this embedding corresponds to |
| data[].embedding | array | The embedding vector (array of floats) |
| model | string | The model used |
| usage | object | Token usage statistics |
| usage.prompt_tokens | integer | Number of tokens in the input |
| usage.total_tokens | integer | Total tokens processed |
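For batch requests, each embedding's index field ties it back to the position of the corresponding input string, so you can restore input order even if you process results out of order. A small sketch, operating on the response parsed as a plain dict:

```python
def embeddings_by_input(response: dict) -> list[list[float]]:
    """Return embeddings ordered so that position i matches input i.

    Sorts on each item's `index` field from the embeddings response.
    """
    items = sorted(response["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]
```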

Available Embedding Models

| Model | Provider | Dimensions | Description |
| --- | --- | --- | --- |
| text-embedding-3-small | OpenAI | 1536 | Fast, cost-effective embeddings |
| text-embedding-3-large | OpenAI | 3072 | Higher accuracy, supports custom dimensions |
| text-embedding-ada-002 | OpenAI | 1536 | Legacy model, still widely used |

NemoRouter supports additional embedding models from other providers. Use the Models API to list all available embedding models.

Use Cases

Semantic Search

Generate embeddings for your document corpus and for each query, then find semantically similar content:

# Embed your documents
docs = [
    "NemoRouter provides a unified LLM gateway.",
    "API keys are created from the dashboard.",
    "Guardrails protect against harmful content.",
]

doc_embeddings = client.embeddings.create(
    model="text-embedding-3-small",
    input=docs,
)

# Embed the search query
query_embedding = client.embeddings.create(
    model="text-embedding-3-small",
    input="How do I create an API key?",
)

# Compare with cosine similarity (using numpy)
import numpy as np

query_vec = np.array(query_embedding.data[0].embedding)
for i, doc_emb in enumerate(doc_embeddings.data):
    doc_vec = np.array(doc_emb.embedding)
    similarity = np.dot(query_vec, doc_vec) / (
        np.linalg.norm(query_vec) * np.linalg.norm(doc_vec)
    )
    print(f"Doc {i} ({similarity:.4f}): {docs[i]}")
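For larger corpora, the per-document loop can be replaced with a single matrix operation. A sketch, assuming the document embeddings are stacked as rows of a NumPy array:

```python
import numpy as np

def rank_by_cosine(query_vec: np.ndarray, doc_matrix: np.ndarray) -> np.ndarray:
    """Return document indices sorted by descending cosine similarity.

    query_vec has shape (d,); doc_matrix has shape (n, d).
    """
    q = query_vec / np.linalg.norm(query_vec)
    docs = doc_matrix / np.linalg.norm(doc_matrix, axis=1, keepdims=True)
    sims = docs @ q  # one dot product per document, computed at once
    return np.argsort(-sims)
```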

RAG (Retrieval-Augmented Generation)

Combine embeddings with chat completions for grounded responses:

# 1. Embed the query
query = "How do budget controls work?"
query_emb = client.embeddings.create(
    model="text-embedding-3-small",
    input=query,
)

# 2. Retrieve relevant documents (from your vector store)
relevant_docs = search_vector_store(query_emb.data[0].embedding)

# 3. Generate a grounded response
context = "\n".join(relevant_docs)
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": f"Answer based on this context:\n{context}"},
        {"role": "user", "content": query},
    ],
)
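The search_vector_store call above is a placeholder for your own retrieval layer. As an illustration only, a minimal in-memory version might look like this, assuming documents have already been embedded into a list of (text, vector) pairs:

```python
import numpy as np

def search_vector_store(query_vec, store, top_k=3):
    """Return the top_k document texts most similar to the query.

    `store` is a list of (text, vector) pairs; ranking is by cosine
    similarity. A real deployment would use a vector database instead.
    """
    q = np.asarray(query_vec, dtype=float)
    q = q / np.linalg.norm(q)
    scored = []
    for text, vec in store:
        v = np.asarray(vec, dtype=float)
        scored.append((float(v @ q / np.linalg.norm(v)), text))
    scored.sort(reverse=True)
    return [text for _, text in scored[:top_k]]
```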

Error Responses

| Status | Description |
| --- | --- |
| 400 Bad Request | Invalid input or unsupported model |
| 401 Unauthorized | Invalid or missing API key |
| 402 Payment Required | Insufficient credit balance |
| 429 Too Many Requests | Rate limit exceeded |
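Of these, 429 is usually transient and worth retrying with exponential backoff. A generic sketch: it retries any callable that raises an exception carrying a status_code attribute (as the OpenAI SDK's APIStatusError does); the retryable status set and delays are assumptions to tune for your workload.

```python
import time

def with_backoff(fn, retries=3, base_delay=1.0, retry_statuses=(429,)):
    """Call fn(), retrying with exponential backoff on retryable statuses.

    fn should raise exceptions exposing a `status_code` attribute;
    non-retryable errors and the final failure are re-raised.
    """
    for attempt in range(retries):
        try:
            return fn()
        except Exception as e:
            status = getattr(e, "status_code", None)
            if status not in retry_statuses or attempt == retries - 1:
                raise
            time.sleep(base_delay * 2 ** attempt)
```

Usage might look like `with_backoff(lambda: client.embeddings.create(model="text-embedding-3-small", input=texts))`.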

Next Steps