Observability

Every request, fully visible.

Logs, alerts, and callbacks for every LLM call. See what happened, get notified when things break, pipe data to your existing tools. Cost is the provider’s number — never ours, never reconstructed.

request_log · 12:04:31

Latest request

request_id: req_a1b2c3d4
model: gemini-2.5-flash
status: 200
latency (total): 0.6 s
TTFT: 120 ms
tokens (in / out): 142 / 218
cost (header): $0.0008
callbacks fired: langfuse, datadog
90d retention · metadata only · PII-redact off

Default log retention
90 days

Financial records retained indefinitely

Cost source
x-nemo-response-cost

Provider header — never reconstructed

Latency percentiles
p50 · p95 · p99

Per model, per key, per org

PII masking
Optional

Per-org policy: zero / meta / full / redacted

Capabilities

Logs, alerts, callbacks, control

Every request is visible, every integration is a toggle, and every cost number comes straight from the provider header — no reconciliation drift.

Request logs — searchable, filterable

Every LLM call captured with model, status, latency, cost, tokens, and full prompt/response (if your data policy allows). Search by request ID, model, status, or time range.

  • Per-row expand: full prompt + completion + provider raw body
  • Filter by model, status, time range, key, team
  • Cost is read from the x-nemo-response-cost header — never recomputed
  • 90-day retention default; longer on Enterprise

Logging callbacks

Pipe logs to Langfuse, Datadog, S3, Slack, or custom HTTPS endpoints. One toggle per destination. Each callback receives structured JSON, so you can build your own dashboards in your existing stack.

  • Langfuse: traces + spans + generations
  • Datadog: metrics + logs + correlation IDs
  • S3: raw JSON archive for compliance + batch analytics
  • Slack: real-time event feed for ops channels
  • Custom HTTPS webhook with signed payloads

Latency metrics (p50 / p95 / p99)

Percentile latency is calculated per model, per virtual key, per team, and per org. See where your tail latency lives, set SLO alerts, and identify the model and provider mix that fits your latency budget.

  • p50 / p95 / p99 + max, calculated server-side
  • Time-to-first-token and total latency tracked separately
  • Streaming-aware: TTFT measured at first SSE event
  • CSV export for long-tail diagnosis
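
As a rough illustration of what the percentile figures mean, here is a minimal sketch using the nearest-rank method. The method, the function names, and the sample latencies are assumptions for illustration; the page does not say which percentile definition the server uses.

```python
import math


def percentile(samples_ms, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not samples_ms:
        raise ValueError("no samples")
    ordered = sorted(samples_ms)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


def latency_summary(samples_ms):
    """Summarize a batch of total-latency samples (milliseconds)."""
    return {
        "p50": percentile(samples_ms, 50),
        "p95": percentile(samples_ms, 95),
        "p99": percentile(samples_ms, 99),
        "max": max(samples_ms),
    }


# Hypothetical samples: most requests are fast, one is a long-tail outlier.
latencies = [120, 180, 210, 250, 300, 340, 410, 600, 950, 2400]
summary = latency_summary(latencies)
print(summary)
```

With only ten samples, p95 and p99 both land on the slowest request, which is why tail percentiles are only meaningful over a reasonably large window.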

8 alert types — email, Slack, Teams, webhook

LLM errors, budget thresholds, provider outages, latency spikes, rate-limit hits, guardrail blocks, key expirations, spend anomalies. Each is toggleable per org and supports multi-channel delivery.

  • LLM errors · budget · outage · latency · rate-limit
  • Guardrail blocks · key expiration · spend anomaly
  • Multi-channel: email + Slack + Teams + webhook
  • Per-org thresholds with hysteresis to prevent flapping
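
Hysteresis here means an alert fires at one threshold and clears at a lower one, so a metric hovering near the limit does not toggle the alert repeatedly. A minimal sketch, assuming a p95 latency alert with hypothetical trigger and clear values (the class and its parameters are illustrative, not the product's configuration schema):

```python
class HysteresisAlert:
    """Fires at trigger_at and clears only once the value drops to clear_at or below."""

    def __init__(self, trigger_at, clear_at):
        if clear_at >= trigger_at:
            raise ValueError("clear_at must be below trigger_at")
        self.trigger_at = trigger_at
        self.clear_at = clear_at
        self.firing = False

    def observe(self, value):
        """Return 'fired' or 'cleared' on a state change, otherwise None."""
        if not self.firing and value >= self.trigger_at:
            self.firing = True
            return "fired"
        if self.firing and value <= self.clear_at:
            self.firing = False
            return "cleared"
        return None


# Hypothetical p95 latency readings in milliseconds.
alert = HysteresisAlert(trigger_at=2000, clear_at=1500)
events = [alert.observe(v) for v in [1800, 2100, 1900, 2050, 1400, 1600]]
print(events)
```

The readings 1900 and 2050 produce no new notification because the alert is already firing and has not yet dropped to the clear threshold.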

Data policy — 4 levels

Choose what is logged per org: nothing, metadata only (default), full prompts and responses, or full-with-PII-redaction. Switch at any time — applies immediately to future requests.

  • Zero: only billing-essential token + cost data retained
  • Metadata (default): model, tokens, cost, latency, status, request ID
  • Full: complete prompt + response stored
  • PII-redacted: full logging with Presidio masking pre-storage
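
The four levels can be pictured as a filter applied to each record before it is stored. The sketch below is an assumption-heavy illustration: the field names are hypothetical, and `toy_redact` is a regex stand-in that masks only email addresses. It does not call Presidio, which handles far more entity types.

```python
import re

# Hypothetical field names, chosen to mirror the lists above.
BILLING_FIELDS = ("tokens_in", "tokens_out", "cost")
METADATA_FIELDS = ("request_id", "model", "status", "latency_ms") + BILLING_FIELDS

EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def toy_redact(text):
    """Stand-in for a real PII engine: masks email addresses only."""
    return EMAIL_RE.sub("<EMAIL>", text)


def apply_policy(record, policy):
    """Return the subset of `record` that would be stored under `policy`."""
    if policy == "zero":
        return {k: record[k] for k in BILLING_FIELDS}
    stored = {k: record[k] for k in METADATA_FIELDS}
    if policy == "meta":
        return stored
    if policy == "full":
        return {**stored, "prompt": record["prompt"], "response": record["response"]}
    if policy == "redacted":
        return {
            **stored,
            "prompt": toy_redact(record["prompt"]),
            "response": toy_redact(record["response"]),
        }
    raise ValueError(f"unknown policy: {policy}")


record = {
    "request_id": "req_a1b2c3d4", "model": "gemini-2.5-flash", "status": 200,
    "latency_ms": 600, "tokens_in": 142, "tokens_out": 218, "cost": "0.0008",
    "prompt": "Email jane@example.com about the invoice",
    "response": "Draft sent to jane@example.com",
}
redacted = apply_policy(record, "redacted")
print(redacted["prompt"])
```

Redaction runs before the record is returned for storage, matching the "pre-storage" ordering described above.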

Async — zero added latency

Log ingestion happens after the response is returned to the caller. Callbacks to external services are also processed asynchronously — they never block the request path.

  • < 100 ms ingestion latency to in-product log viewer
  • Callback delivery is fire-and-forget with retry queue
  • Failed callbacks do not affect request success
  • Reading the cost header is the only step that stays synchronous on the request path
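
One way to picture "fire-and-forget with retry" is a background task per callback, scheduled before the response is returned and never awaited by the request handler. This is a sketch with asyncio under assumed names (`handle_request`, `deliver_with_retry`, the retry count, and the backoff values are all hypothetical), not the product's actual queue.

```python
import asyncio


async def deliver_with_retry(send, event, attempts=3, base_delay=0.01):
    """Try a callback a few times with exponential backoff; never raise."""
    for attempt in range(attempts):
        try:
            await send(event)
            return True
        except Exception:
            if attempt < attempts - 1:
                await asyncio.sleep(base_delay * 2 ** attempt)
    return False


async def handle_request(call_provider, callbacks, background):
    """Return the provider response without waiting for any callback."""
    response = await call_provider()
    for send in callbacks:
        task = asyncio.create_task(deliver_with_retry(send, {"status": response["status"]}))
        background.add(task)  # keep a reference so the task is not garbage-collected
        task.add_done_callback(background.discard)
    return response


async def demo():
    delivered, calls = [], {"flaky": 0}

    async def provider():
        return {"status": 200}

    async def flaky(event):  # fails twice, then succeeds
        calls["flaky"] += 1
        if calls["flaky"] < 3:
            raise ConnectionError("temporary failure")
        delivered.append(event)

    async def broken(event):  # always fails
        raise ConnectionError("endpoint down")

    background = set()
    response = await handle_request(provider, [flaky, broken], background)
    returned_before_delivery = not delivered
    await asyncio.gather(*background)  # only the demo waits; a server would not
    return response, returned_before_delivery, delivered


result = asyncio.run(demo())
print(result)
```

The permanently failing callback exhausts its retries and returns False, while the request itself still succeeds with status 200.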

Log Flow

Provider → Backend → Callbacks (async)

Cost is read from the x-nemo-response-cost header on the provider response; this is the safety invariant for analytics, billing, and the credit-ledger reconcile job. Logs are recorded, and callbacks fan out asynchronously.

Async log path

  1. Provider

    Vertex / Anthropic / OpenAI

    Returns x-nemo-response-cost header — the source of truth.

  2. Nemo Intelligent Proxy Router

    In-process ASGI

    Computes cost, attaches metrics, emits log event.

  3. Nemo Backend

    :8090 — FastAPI

    Records request_log, fans out to enabled callbacks async.

  4. Callbacks

    Langfuse · Datadog · S3 · Slack

    Fire-and-forget delivery with retry queue.

The Nemo Intelligent Proxy Router runs in-process inside Nemo Backend — there is no separate gateway service to harden, no extra hop where the cost header can be lost or rewritten.

Cost Truth

Provider cost header is the source of truth

We never compute cost ourselves. The Nemo Intelligent Proxy Router owns cost; we read x-nemo-response-cost off the response and write it to the request_log + the credit ledger in the same transaction. If the header is missing, the row is flagged for reconcile — never silently zeroed.
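
The "flag, never zero" rule can be sketched as follows. The header name comes from this page; the function name, row shape, and `needs_reconcile` field are assumptions for illustration.

```python
from decimal import Decimal, InvalidOperation

COST_HEADER = "x-nemo-response-cost"


def build_log_row(request_id, headers):
    """Build a request_log row; a missing or unparsable cost is flagged, not set to 0."""
    normalized = {name.lower(): value for name, value in headers.items()}
    raw = normalized.get(COST_HEADER)
    try:
        cost = Decimal(raw) if raw is not None else None
    except InvalidOperation:
        cost = None
    return {
        "request_id": request_id,
        "cost": cost,                     # None means "unknown", never "free"
        "needs_reconcile": cost is None,  # hypothetical flag for the reconcile job
    }


ok_row = build_log_row("req_a1b2c3d4", {"X-Nemo-Response-Cost": "0.0008"})
missing_row = build_log_row("req_e5f6a7b8", {"content-type": "application/json"})
print(ok_row)
print(missing_row)
```

Using Decimal rather than float keeps small per-request amounts exact when they are later summed in a ledger.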

Cost integrity

One number, four surfaces, zero drift

The same x-nemo-response-cost value lands in: (1) the request_log row visible in the dashboard, (2) the credit_transactions ledger entry, (3) any active callbacks (Langfuse, Datadog, S3), and (4) the spend-by-tag analytics aggregation. There is one upstream value and four downstream consumers — no recomputation, no reconciliation drift.

  • Source: provider response header (x-nemo-response-cost)
  • Sinks: request_log, credit ledger, callbacks, analytics
  • Missing-header rows flagged for the gap-hunter scanner
  • Daily ledger parity check: sum(transactions) == balance
  • Zero-cost settlements are the canonical revenue-leak signal
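
The parity check and the zero-cost signal in the list above can be sketched like this. The sign convention (credits positive, usage debits negative), the row shape, and the function names are assumptions; only the invariant sum(transactions) == balance comes from the page.

```python
from decimal import Decimal


def ledger_drift(transactions, balance):
    """Difference between the summed ledger entries and the stored balance."""
    total = sum((t["amount"] for t in transactions), Decimal("0"))
    return total - balance


def zero_cost_settlements(transactions):
    """Usage rows that settled at exactly zero: candidates for a revenue leak."""
    return [t["request_id"] for t in transactions
            if t["kind"] == "usage" and t["amount"] == 0]


# Hypothetical ledger: one top-up, two paid requests, one suspicious zero.
transactions = [
    {"kind": "topup", "request_id": None, "amount": Decimal("5.00")},
    {"kind": "usage", "request_id": "req_1", "amount": Decimal("-0.0008")},
    {"kind": "usage", "request_id": "req_2", "amount": Decimal("-0.0012")},
    {"kind": "usage", "request_id": "req_3", "amount": Decimal("0")},
]
drift = ledger_drift(transactions, balance=Decimal("4.9980"))
suspects = zero_cost_settlements(transactions)
print(drift, suspects)
```

A non-zero drift or a non-empty suspect list would be the cue to investigate, since either means a sink disagrees with the upstream value.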

cost · last 60s

Cost-header integrity

Requests with cost header: 8 421 / 8 421
Missing-header rows: 0
Settled to ledger: 8 421
Ledger drift (24h): $0.00
Callbacks fired: langfuse · datadog · s3
Cost source: x-nemo-response-cost
header truth · no recomputation · no drift

Data Policy

Four levels of logging — switch any time

Per-org control

From zero logging to full-with-redaction

Choose exactly what gets stored per organization. Every level is available on every plan. Change your policy at any time from the dashboard — updates apply to all future requests immediately. PII redaction is powered by Microsoft Presidio and runs pre-storage.

  • Zero logging — only billing-essential data retained
  • Metadata only (default) — model, tokens, cost, latency, status
  • Full logging — complete prompt + response stored 90 days
  • PII-redacted — full logging, Presidio masks PII pre-storage
  • GDPR + SOC 2 friendly; Enterprise can pin per-region

Zero logging

Nothing stored beyond billing-essential token + cost data.

Metadata only

Model, tokens, cost, latency, status, request ID. The default.

Full logging

Complete prompt + response stored. 90-day retention.

PII-redacted

Full logging with Presidio masking pre-storage.

Integrations

Pipe data to your existing stack

Enable callbacks with a single toggle and start receiving structured log data in seconds. Each callback receives JSON with full request metadata: model, tokens, cost, latency, status.
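
To make the callback payload concrete, here is a sketch of a consumer that parses one event and checks that the metadata fields named above are present. The exact key names and the choice to carry cost as a decimal string are assumptions; the page lists the fields but not the schema.

```python
import json
from decimal import Decimal

# Hypothetical key names for: model, tokens, cost, latency, status.
REQUIRED_FIELDS = {
    "request_id", "model", "status", "latency_ms", "tokens_in", "tokens_out", "cost",
}


def parse_callback_event(body):
    """Parse a callback body and fail loudly if a metadata field is missing."""
    event = json.loads(body)
    missing = REQUIRED_FIELDS - event.keys()
    if missing:
        raise ValueError(f"callback event missing fields: {sorted(missing)}")
    event["cost"] = Decimal(event["cost"])  # keep the amount exact
    return event


# Sample body using the values from the request_log card on this page.
sample_body = json.dumps({
    "request_id": "req_a1b2c3d4",
    "model": "gemini-2.5-flash",
    "status": 200,
    "latency_ms": 600,
    "tokens_in": 142,
    "tokens_out": 218,
    "cost": "0.0008",
})
event = parse_callback_event(sample_body)
print(event["model"], event["cost"])
```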

Langfuse

Open-source LLM observability. Send traces, spans, and generations for prompt engineering, cost tracking, and quality evaluation. Full trace context preserved.

Datadog

Enterprise APM and monitoring. Correlate LLM request metrics with your existing infrastructure dashboards, set up monitors, track SLOs alongside application metrics.

AWS S3

Archive raw request and response logs to S3 for long-term storage, compliance audits, and batch analytics. Configure retention policies and lifecycle rules.

Slack

Real-time alert notifications in your team channels. Errors, budget thresholds, provider outages, and guardrail blocks where your team already communicates.

Email + Teams

Per-user email digests and Microsoft Teams channel notifications. Same alert types, same multi-channel delivery semantics.

Custom Webhook

Send structured JSON to any HTTPS endpoint with signed payloads. Build custom integrations with internal tools, data warehouses, or workflow automation.
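
On the receiving side, a signed payload is typically verified by recomputing the signature over the raw body and comparing in constant time. The sketch below assumes HMAC-SHA256 with a shared secret and a hex-encoded signature; the page does not specify the signing scheme or the header that carries the signature, so treat both as hypothetical.

```python
import hashlib
import hmac


def sign_payload(secret, body):
    """Hex HMAC-SHA256 of the raw request body (assumed scheme)."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify_payload(secret, body, signature_hex):
    """Constant-time comparison to avoid leaking signature prefixes via timing."""
    expected = sign_payload(secret, body)
    return hmac.compare_digest(expected, signature_hex)


secret = b"example-shared-secret"  # hypothetical; load from a secret store in practice
body = b'{"request_id":"req_a1b2c3d4","status":200}'
signature = sign_payload(secret, body)

valid = verify_payload(secret, body, signature)
tampered = verify_payload(secret, body.replace(b"200", b"500"), signature)
print(valid, tampered)
```

Verification should run on the exact bytes received, before any JSON parsing or re-serialization, since re-encoding can change the byte sequence.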


90-day retention · 4 callbacks · 8 alerts

See every request — without DIY observability infrastructure

Sign up, enable a callback or two, watch the data flow into your existing stack. No agents to install, no schema to invent.