NemoRouter runs one Postgres database with two migration engines inside it: Alembic owns the nemo schema, and Prisma (bundled inside the LiteLLM dependency) owns public. Two owners, one database, and a strict rule that neither touches the other's tables. Getting migrations wrong here doesn't mean a slow deploy; it once meant wiped production tables. This post describes the discipline that keeps schema changes boring.

The setup that makes this tricky

ONE Postgres
 ├─ nemo schema    ← Alembic owns. Our features.
 └─ public schema  ← Prisma owns (inside LiteLLM). We NEVER create tables here.

The hazard: when Prisma applies its own schema changes, it drops tables in public that it doesn't know about. So if we ever created a table in public, Prisma would delete it on its next run. The rule that prevents this is absolute: never create tables in public. That is why all our tables live in nemo.
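
A rule this absolute is easier to keep when a machine checks it. The sketch below is one way such a check might look, not a description of NemoRouter's actual tooling: a regex-based lint that flags any CREATE TABLE whose target is not explicitly in the nemo schema. The function name and the example table names are hypothetical, and a regex is only a heuristic (it does not parse SQL).

```python
import re

# Matches CREATE TABLE [IF NOT EXISTS] <name> and captures the table name,
# which may be schema-qualified and/or double-quoted.
_CREATE_TABLE = re.compile(
    r'CREATE\s+(?:UNLOGGED\s+)?TABLE\s+(?:IF\s+NOT\s+EXISTS\s+)?'
    r'((?:"[^"]+"|\w+)(?:\.(?:"[^"]+"|\w+))?)',
    re.IGNORECASE,
)

OWNED_SCHEMA = "nemo"  # the schema Alembic owns


def tables_outside_owned_schema(sql: str) -> list[str]:
    """Return names from CREATE TABLE statements that are not in OWNED_SCHEMA."""
    offenders = []
    for match in _CREATE_TABLE.finditer(sql):
        name = match.group(1)
        schema, _, _table = name.rpartition(".")
        # Quoted identifiers are case-sensitive; unquoted ones fold to lowercase.
        normalized = schema[1:-1] if schema.startswith('"') else schema.lower()
        # An unqualified name has an empty schema here. It would resolve through
        # search_path (commonly public), so it is flagged as well.
        if normalized != OWNED_SCHEMA:
            offenders.append(name)
    return offenders
```

Run against migration SQL in CI, an empty list means no statement tries to place a table outside nemo; anything else names the offending table.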

The incident that set the rule

A Prisma db push once wiped LiteLLM tables in production. The lesson wasn't "be careful with push" — it was "disable destructive auto-migration entirely." use_prisma_db_push is hard-disabled. The safeguard isn't vigilance (which fails); it's removing the dangerous capability so it can't fire.

Additive-only is the zero-downtime trick

Zero-downtime migration has one core principle: a schema change must be compatible with both the old and new code at once, because during a deploy both are running. Additive changes satisfy this; destructive ones don't.

Change	Old code survives?	Safe mid-deploy?
Add a nullable column	yes	✅
Add a table	yes	✅
Add an index (concurrently)	yes	✅
Drop a column	no: old code still reads it	❌
Rename a column	no: old code needs the old name, new code the new one	❌

So we add; we never drop or rename in the same step. Removing a column becomes a multi-deploy dance: stop writing it, deploy; stop reading it, deploy; drop it, deploy. It is tedious, and it is the reason nothing breaks while traffic flows.
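
The compatibility claim can be demonstrated in a few lines. This is a minimal sketch that uses an in-memory SQLite database as a stand-in for Postgres; the api_keys table and rotated_at column are hypothetical names chosen for illustration. It shows old code writing and reading unchanged after a nullable column is added, while new code already uses that column.

```python
import sqlite3

# SQLite in memory stands in for Postgres; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE api_keys (id INTEGER PRIMARY KEY, label TEXT NOT NULL)")


def old_code_write(conn: sqlite3.Connection, label: str) -> None:
    # The old release names its columns explicitly and knows nothing about new ones.
    conn.execute("INSERT INTO api_keys (label) VALUES (?)", (label,))


def old_code_read(conn: sqlite3.Connection) -> list[tuple]:
    return conn.execute("SELECT id, label FROM api_keys ORDER BY id").fetchall()


old_code_write(conn, "before-migration")

# The additive migration: a nullable column with no default.
conn.execute("ALTER TABLE api_keys ADD COLUMN rotated_at TEXT")

# Mid-deploy: old code keeps writing and reading without errors.
old_code_write(conn, "after-migration")
old_rows = old_code_read(conn)

# New code can use the column immediately; rows written by old code hold NULL.
conn.execute(
    "UPDATE api_keys SET rotated_at = '2024-01-01' WHERE label = 'after-migration'"
)
new_rows = conn.execute(
    "SELECT label, rotated_at FROM api_keys ORDER BY id"
).fetchall()
```

The same experiment with a dropped or renamed column would fail on the old code's SELECT, which is the asymmetry the table summarizes.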

Prisma drift and idempotent hotfixes

LiteLLM's schema (public) drifts when we bump the pinned dependency — a new version adds a column our running code expects. We can't let Prisma's destructive push reconcile that, so we apply additive, idempotent hotfixes by hand:

ALTER TABLE "LiteLLM_VerificationToken"
  ADD COLUMN IF NOT EXISTS "new_field" TEXT;

ADD COLUMN IF NOT EXISTS is the whole trick: it's safe to run repeatedly, safe if the column already exists, and never destructive. The pre-startup readiness routine replays every hotfix file before the app boots, so the schema is always ahead of (or equal to) what the code needs. (This is the DB-readiness gate.)
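
A replay loop for such hotfix files could be as small as the sketch below. It assumes one .sql file per hotfix, applied in filename order, and takes the database call as an injected execute callable so that the example stays independent of any particular driver. The function name and file naming scheme are hypothetical.

```python
from pathlib import Path
from typing import Callable


def replay_hotfixes(directory: Path, execute: Callable[[str], None]) -> list[str]:
    """Run every .sql file in `directory` in filename order; return the names run.

    Every file is executed on every call. That is only safe because each
    hotfix is written to be idempotent (for example ADD COLUMN IF NOT EXISTS).
    """
    applied = []
    for path in sorted(directory.glob("*.sql")):
        execute(path.read_text(encoding="utf-8"))
        applied.append(path.name)
    return applied
```

There is deliberately no bookkeeping table of applied hotfixes in this sketch: idempotency of the SQL itself is what makes a blind replay safe, which keeps the loop free of state that could drift.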

The pre-startup readiness sequence

Before the backend serves a request, three idempotent steps run, in order, without asking:

1. Alembic migrate-up        → nemo schema at head
2. replay prisma-hotfixes/*  → public-schema drift patched (additive)
3. NOTIFY pgrst reload       → PostgREST picks up the new schema

All three are idempotent, so running them on every boot is safe — a no-op if already applied. "Apply, don't ask" is the policy: the readiness gate is standing authorization to bring the schema current, because a backend that boots against a stale schema is the failure we're preventing.
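
The ordering and fail-fast behavior of the gate can be sketched as follows. This is an illustration under assumptions, not the actual startup code: the steps are injected as plain callables, and ReadinessError and run_readiness_gate are hypothetical names. In a real service the three callables might wrap alembic.command.upgrade(config, "head"), a hotfix replay, and a NOTIFY pgrst, 'reload schema' statement.

```python
from typing import Callable, Sequence

Step = tuple[str, Callable[[], None]]


class ReadinessError(RuntimeError):
    """Raised when a pre-startup step fails, so the app never serves on a stale schema."""


def run_readiness_gate(steps: Sequence[Step]) -> list[str]:
    """Run the steps strictly in order and stop at the first failure."""
    completed = []
    for name, step in steps:
        try:
            step()
        except Exception as exc:
            # Later steps are skipped: reloading PostgREST's schema cache
            # after a failed migration would only hide the problem.
            raise ReadinessError(f"readiness step {name!r} failed") from exc
        completed.append(name)
    return completed
```

Raising instead of logging and continuing is the point of the design: a process that cannot bring the schema current should fail to boot, so the orchestrator keeps the previous revision serving.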

Why "200 OK" isn't proof a migration worked

A subtle operational point: a health check can stay green while a migration is silently failing on the request path. A readiness probe caches its result for a few seconds, so it can report healthy while the actual LLM path returns 5xx on a missing column, or while a background poller spams "column does not exist" errors. So a deploy isn't "done" at the first 200. It is done when a revision-scoped log scan shows no DataError or "column does not exist" for a minute past boot. Migrations are verified by logs, not by a single green check.
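
A log scan of this kind might look like the sketch below. It assumes the log entries have already been filtered to the new revision and arrive as (timestamp, message) pairs; how they are fetched depends on the logging backend and is left out. The function name and the exact pattern are assumptions for illustration.

```python
import re
from datetime import datetime, timedelta
from typing import Iterable

# Matches "DataError" or Postgres-style messages such as
# 'column "new_field" does not exist'.
SCHEMA_ERROR = re.compile(r"DataError|column\b.*\bdoes not exist", re.IGNORECASE)


def schema_errors_after_boot(
    entries: Iterable[tuple[datetime, str]],
    boot_time: datetime,
    window: timedelta = timedelta(minutes=1),
) -> list[str]:
    """Return schema-related error messages logged within `window` after boot."""
    deadline = boot_time + window
    return [
        message
        for timestamp, message in entries
        if boot_time <= timestamp <= deadline and SCHEMA_ERROR.search(message)
    ]
```

A deploy would count as verified only when this returns an empty list for the new revision; a green health check in the same window does not change that verdict.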

The takeaway

Two schema owners in one database is safe only with discipline: never create tables in the other engine's schema, disable destructive auto-migration outright, make every change additive so old and new code coexist during the deploy, and patch dependency drift with idempotent IF NOT EXISTS hotfixes replayed at startup. Then a migration is a non-event — applied automatically, compatible by construction, and verified in the logs rather than assumed from a 200.

Zero-Downtime Migrations With Two Schema Owners
