Stop hardcoding prompts in client code. v1.2 ships server-side prompt management and deterministic A/B testing, both available through the same endpoint your SDK already calls.

Prompt templates

Create a template once, then reference it by ID from any request. Templates are versioned: editing a template automatically creates a new version. You can pin a request to a specific version with prompt_version, or leave it unset to always take the latest (the default).

Templates support Jinja2 variables. Pass values via extra_body:

{
  "model": "gemini-2.5-flash",
  "extra_body": {
    "nemo_prompt_template_id": "summarizer_v2",
    "nemo_prompt_variables": { "doc_type": "research paper", "max_words": 200 }
  }
}
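
For client code that assembles this body programmatically, a minimal sketch might look like the following. The helper name build_request is hypothetical, and so is the placement and type of prompt_version (shown here as an integer inside extra_body); only the two nemo_* fields come from the example above, so check the API reference before relying on the rest.

```python
import json
from typing import Any, Optional


def build_request(
    model: str,
    template_id: str,
    variables: dict[str, Any],
    prompt_version: Optional[int] = None,
) -> dict[str, Any]:
    """Assemble a request body that references a server-side template."""
    extra_body: dict[str, Any] = {
        "nemo_prompt_template_id": template_id,
        "nemo_prompt_variables": variables,
    }
    # ASSUMPTION: the version pin travels as "prompt_version" inside
    # extra_body and is an integer. When it is omitted, the request
    # takes the latest version, which is the documented default.
    if prompt_version is not None:
        extra_body["prompt_version"] = prompt_version
    return {"model": model, "extra_body": extra_body}


payload = build_request(
    "gemini-2.5-flash",
    "summarizer_v2",
    {"doc_type": "research paper", "max_words": 200},
)
print(json.dumps(payload, indent=2))
```

Keeping the variables in a plain dict means the same call site can serve any template; only the template ID and the values change.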

Per-template cost and token usage roll up in the dashboard so you can compare "summarizer_v2" vs "summarizer_v3" without instrumenting your own analytics.

A/B testing — deterministic

The A/B engine (nemo_backend/prompts/ab_test_engine.py) hashes a stable key (request ID, user ID, or org-scoped seed) and splits traffic by configured percentages. The same key always hashes to the same variant, so when the key is a user ID, that user sees the same variant across requests, and your analytics aren't poisoned by re-bucketing.
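
To illustrate hash-based bucketing in general, here is a minimal sketch. It is not the code in ab_test_engine.py: the function name assign_variant, the choice of SHA-256, and the idea of mixing an experiment ID into the hash are all assumptions made for the example.

```python
import hashlib


def assign_variant(key: str, experiment_id: str, weights: dict[str, int]) -> str:
    """Map a stable key to a variant; identical inputs always give the same variant.

    `weights` maps variant name to an integer share (for example percentages).
    """
    if any(w < 0 for w in weights.values()):
        raise ValueError("weights must be non-negative")
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one variant needs a positive weight")

    # ASSUMPTION: SHA-256 over "experiment:key". Including the experiment ID
    # keeps bucket assignments independent between experiments.
    digest = hashlib.sha256(f"{experiment_id}:{key}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % total

    # Walk the cumulative weights until the bucket falls inside a variant's range.
    cumulative = 0
    for name, weight in weights.items():
        cumulative += weight
        if bucket < cumulative:
            return name
    raise AssertionError("unreachable: bucket is always below the total weight")


split = {"control": 50, "candidate": 50}
first = assign_variant("user-123", "summarizer-test", split)
second = assign_variant("user-123", "summarizer-test", split)
print(first == second)  # the same key never gets re-bucketed
```

Because nothing is stored, any server can compute the assignment independently and still agree with every other server.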

Variants can swap the model (gpt-5 vs gemini-2.5-pro), the prompt template, or both. A test is a state machine: draft → running → paused → completed. While a test is running, per-variant cost, latency, and error rate are visible in real time on the prompts dashboard.
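
The lifecycle can be modeled as a small transition table. This is a sketch only: the names ExperimentState and transition are hypothetical, and two of the transitions (resuming a paused test, and completing a test directly from running) are assumptions that go beyond the linear order listed above.

```python
from enum import Enum


class ExperimentState(Enum):
    DRAFT = "draft"
    RUNNING = "running"
    PAUSED = "paused"
    COMPLETED = "completed"


# Allowed moves between states.
# ASSUMPTION: paused -> running (resume) and running -> completed (finish
# without pausing) are guesses; the release notes only list the linear order.
ALLOWED = {
    ExperimentState.DRAFT: {ExperimentState.RUNNING},
    ExperimentState.RUNNING: {ExperimentState.PAUSED, ExperimentState.COMPLETED},
    ExperimentState.PAUSED: {ExperimentState.RUNNING, ExperimentState.COMPLETED},
    ExperimentState.COMPLETED: set(),
}


def transition(current: ExperimentState, target: ExperimentState) -> ExperimentState:
    """Return the new state, or raise ValueError if the move is not allowed."""
    if target not in ALLOWED[current]:
        raise ValueError(f"cannot move from {current.value} to {target.value}")
    return target


state = ExperimentState.DRAFT
state = transition(state, ExperimentState.RUNNING)
state = transition(state, ExperimentState.PAUSED)
state = transition(state, ExperimentState.COMPLETED)
print(state.value)
```

Treating completed as terminal keeps a finished experiment's results from being altered by late traffic.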

A/B tests cannot be overridden per request, because the whole point is determinism. If you need a control-group bypass, create a virtual key outside the experiment scope.

Manage prompts + experiments at /{org}/prompts.