FrameworksLlamaIndex
LlamaIndex
Use Nemo Router with LlamaIndex for RAG, agents, and query engines
Last updated
LlamaIndex's OpenAI LLM and OpenAIEmbedding work directly with Nemo Router. Set api_base to https://api.nemorouter.ai/v1 — guardrails, caching, and rate-limits auto-apply.
Installation
pip install llama-index-llms-openai llama-index-embeddings-openai nemoroutersdkSetup
import os
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.core import Settings
NEMO_KEY = os.environ["NEMOROUTER_API_KEY"]
NEMO_BASE = "https://api.nemorouter.ai/v1"
Settings.llm = OpenAI(
model="claude-sonnet-4-20250514",
api_key=NEMO_KEY,
api_base=NEMO_BASE,
)
Settings.embed_model = OpenAIEmbedding(
model_name="text-embedding-3-small",
api_key=NEMO_KEY,
api_base=NEMO_BASE,
)Chat
from llama_index.core.llms import ChatMessage
response = Settings.llm.chat([
ChatMessage(role="user", content="What is quantum computing?"),
])
print(response.message.content)Per-Request Overrides
Pass nemo_* fields via additional_kwargs:
llm_with_prompt = OpenAI(
model="gpt-4o",
api_key=NEMO_KEY,
api_base=NEMO_BASE,
additional_kwargs={
"nemo_prompt_template_id": "your-summarizer-id",
"nemo_prompt_variables": {"language": "Spanish", "max_length": "100"},
"nemo_guardrail_ids": ["guardrail-uuid-1"],
"nemo_cache": False,
},
)RAG with Query Engine
from llama_index.core import VectorStoreIndex, Document
documents = [
Document(text="Nemo Router is an LLM gateway with 200+ models."),
Document(text="Guardrails protect every request: PII, injection, keywords."),
]
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()
response = query_engine.query("How does Nemo Router handle safety?")
print(response)The user query is checked by guardrails before it reaches the model. Prompt-injection attempts on RAG queries are blocked before retrieval starts.
Next Steps
- LangChain — Same pattern for chain-style RAG
- Python SDK — Without LlamaIndex
Was this page helpful?