Zero-Downtime Migrations With Two Schema Owners
One Postgres, two migration engines — Alembic and Prisma — that must never touch each other. Here is how additive-only migrations and idempotent hotfixes keep deploys safe and downtime-free.
NemoRouter runs one Postgres database with two migration engines living inside it: Alembic owns the nemo schema, and Prisma (bundled inside the LiteLLM dependency) owns public. Two owners, one database, and a strict rule that they never touch each other's tables. Getting migrations wrong here doesn't just mean a slow deploy; it once meant wiped production tables. This post describes the discipline that keeps schema changes boring.
The setup that makes this tricky
ONE Postgres
├─ nemo schema ← Alembic owns. Our features.
└─ public schema ← Prisma owns (inside LiteLLM). We NEVER create tables here.
The hazard: when Prisma syncs the database to its own schema, it will drop tables in public that it doesn't know about. So if we ever created a table in public, Prisma would delete it on its next run. The rule that prevents this is absolute: never create tables in public. That is why all our tables live in nemo.
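A rule this absolute is easier to keep when something checks it mechanically. The sketch below is one hypothetical way to do that: a text-level scan of migration SQL that reports any CREATE TABLE landing outside nemo. The names `OWNED_SCHEMA` and `tables_outside_owned_schema` are invented for illustration, and the regex is a heuristic, not a SQL parser.

```python
import re

# Assumption: our tables must live in "nemo"; "public" belongs to Prisma.
OWNED_SCHEMA = "nemo"

# Matches CREATE TABLE [IF NOT EXISTS] [schema.]table, with optional double quotes.
_CREATE_TABLE = re.compile(
    r'create\s+table\s+(?:if\s+not\s+exists\s+)?'
    r'(?:"?(?P<schema>\w+)"?\s*\.\s*)?"?(?P<table>\w+)"?',
    re.IGNORECASE,
)


def tables_outside_owned_schema(sql: str) -> list[str]:
    """Return every table a CREATE TABLE statement would place outside OWNED_SCHEMA.

    An unqualified name counts as a violation, because it resolves through
    search_path and may end up in public.
    """
    offenders = []
    for match in _CREATE_TABLE.finditer(sql):
        schema = match.group("schema")
        if schema is None or schema.lower() != OWNED_SCHEMA:
            offenders.append(f"{schema or '<unqualified>'}.{match.group('table')}")
    return offenders
```

Run against each migration file in CI, an empty list means the file respects the ownership boundary. Treating unqualified names as violations is a deliberately strict choice for this sketch.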
The incident that set the rule
A Prisma db push once wiped LiteLLM tables in production. The lesson wasn't "be careful with push" — it was "disable destructive auto-migration entirely." use_prisma_db_push is hard-disabled. The safeguard isn't vigilance (which fails); it's removing the dangerous capability so it can't fire.
Additive-only is the zero-downtime trick
Zero-downtime migration has one core principle: a schema change must be compatible with both the old and new code at once, because during a deploy both are running. Additive changes satisfy this; destructive ones don't.
| Change | Old code survives? | Safe mid-deploy? |
|---|---|---|
| Add a nullable column | yes | ✅ |
| Add a table | yes | ✅ |
| Add an index (concurrently) | yes | ✅ |
| Drop a column | no — old code reads it | ❌ |
| Rename a column | no — both names break one side | ❌ |
So we add; we don't drop or rename in the same step as the code change. A column removal becomes a multi-deploy dance: stop writing it, deploy; stop reading it, deploy; drop it, deploy. It is tedious, and it is the reason nothing breaks while traffic flows.
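The additive-only rule can also be sketched as a lint over migration SQL. This is an illustrative heuristic, not our actual tooling: `destructive_changes` and its pattern table are hypothetical names, and the index rule is an inference from the table above, which lists index creation as safe only when done concurrently.

```python
import re

# Patterns for changes that break old or new code mid-deploy.
# This is a heuristic over raw SQL text, not a SQL parser.
_DESTRUCTIVE = {
    "drop column": re.compile(r"\bdrop\s+column\b", re.IGNORECASE),
    "drop table": re.compile(r"\bdrop\s+table\b", re.IGNORECASE),
    "rename": re.compile(r"\brename\b", re.IGNORECASE),
    # CREATE INDEX without CONCURRENTLY blocks writes while the index builds.
    "non-concurrent index": re.compile(
        r"\bcreate\s+(?:unique\s+)?index\b(?!\s+concurrently\b)", re.IGNORECASE
    ),
}


def destructive_changes(sql: str) -> list[str]:
    """Return the names of the destructive patterns found in a migration's SQL."""
    return [name for name, pattern in _DESTRUCTIVE.items() if pattern.search(sql)]
```

A migration that returns a non-empty list is not necessarily forbidden; it is the final step of the multi-deploy dance and should ship only after the earlier deploys have stopped writing and reading the column.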
Prisma drift and idempotent hotfixes
LiteLLM's schema (public) drifts when we bump the pinned dependency — a new version adds a column our running code expects. We can't let Prisma's destructive push reconcile that, so we apply additive, idempotent hotfixes by hand:
ALTER TABLE "LiteLLM_VerificationToken"
ADD COLUMN IF NOT EXISTS "new_field" TEXT;
ADD COLUMN IF NOT EXISTS is the whole trick: it's safe to run repeatedly, safe if the column already exists, and never destructive. The pre-startup readiness routine replays every hotfix file before the app boots, so the schema is always ahead of (or equal to) what the code needs. (This is the DB-readiness gate.)
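The replay step can be sketched in a few lines. This is a minimal illustration, not our actual routine: `replay_hotfixes` is a hypothetical name, the connection is assumed to be any DB-API 2.0 connection (for example one from psycopg), and each file is assumed to hold only idempotent, additive statements that the driver accepts in a single execute call.

```python
from pathlib import Path


def replay_hotfixes(conn, hotfix_dir: str) -> list[str]:
    """Execute every .sql file in hotfix_dir, in filename order.

    Because each statement is assumed to be idempotent (ADD COLUMN IF NOT
    EXISTS and similar), replaying all files on every boot is a no-op once
    they have been applied.
    """
    applied = []
    for path in sorted(Path(hotfix_dir).glob("*.sql")):
        cursor = conn.cursor()
        try:
            cursor.execute(path.read_text(encoding="utf-8"))
        finally:
            cursor.close()
        applied.append(path.name)
    conn.commit()  # one commit after all files ran without error
    return applied
```

Sorting by filename gives a stable order, so a numeric prefix on each hotfix file is enough to control sequencing. No bookkeeping table is needed, because idempotence replaces the "has this run already?" check.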
The pre-startup readiness sequence
Before the backend serves a request, three idempotent steps run, in order, without asking:
1. Alembic migrate-up → nemo schema at head
2. replay prisma-hotfixes/* → public-schema drift patched (additive)
3. NOTIFY pgrst reload → PostgREST picks up the new schema
All three are idempotent, so running them on every boot is safe: each is a no-op if already applied. "Apply, don't ask" is the policy: the readiness gate is standing authorization to bring the schema current, because a backend that boots against a stale schema is the failure we're preventing.
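The sequence above can be sketched as a small ordered runner. The name `run_readiness_gate` and the step names in the usage note are hypothetical; the point is only the shape: run in order, stop at the first failure, and let that failure block startup.

```python
from typing import Callable, Sequence


def run_readiness_gate(steps: Sequence[tuple[str, Callable[[], None]]]) -> list[str]:
    """Run each (name, step) pair in order and stop at the first failure.

    Every step is assumed to be idempotent, so the whole gate can run on
    every boot. A failing step raises, which should keep the backend from
    starting against a stale schema.
    """
    completed = []
    for name, step in steps:
        try:
            step()
        except Exception as exc:
            raise RuntimeError(f"readiness step failed: {name}") from exc
        completed.append(name)
    return completed
```

In a real deployment the three callables might wrap Alembic's `command.upgrade(config, "head")`, a hotfix replay, and a statement such as `NOTIFY pgrst, 'reload schema'` (the form PostgREST documents for reloading its schema cache). How they are wired up here is an assumption, not a description of our code.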
Why "200 OK" isn't proof a migration worked
A subtle operational point: a health check can stay green while a migration is silently failing on the request path. A readiness probe caches its result for a few seconds and can report healthy while the actual LLM path returns 5xx errors on a missing column, or a background poller spams "column does not exist" errors. So a deploy isn't "done" at the first 200; it's done after a revision-scoped log scan shows no DataError or "column does not exist" entries for a minute past boot. Migrations are verified by logs, not by a single green check.
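A revision-scoped log scan might look like the sketch below. The `LogEntry` shape, the function name, and the way log lines are parsed into entries are all assumptions for illustration; only the two error signatures and the one-minute window come from the text above.

```python
import re
from datetime import datetime, timedelta
from typing import Iterable, NamedTuple


class LogEntry(NamedTuple):
    timestamp: datetime
    revision: str  # the deployed revision that emitted the line
    message: str


# Error signatures that indicate code running against a stale schema.
_SCHEMA_ERRORS = re.compile(r"DataError|column\b.*\bdoes not exist", re.IGNORECASE)


def schema_errors_after_boot(
    entries: Iterable[LogEntry],
    revision: str,
    boot_time: datetime,
    window: timedelta = timedelta(minutes=1),
) -> list[str]:
    """Return schema-error messages logged by `revision` within `window` of boot."""
    deadline = boot_time + window
    return [
        entry.message
        for entry in entries
        if entry.revision == revision
        and boot_time <= entry.timestamp <= deadline
        and _SCHEMA_ERRORS.search(entry.message)
    ]
```

The deploy counts as done only when this returns an empty list for the new revision. Scoping by revision matters because the old revision may still be draining and logging its own unrelated errors.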
The takeaway
Two schema owners in one database is safe only with discipline: never create tables in the other engine's schema, disable destructive auto-migration outright, make every change additive so old and new code coexist during the deploy, and patch dependency drift with idempotent IF NOT EXISTS hotfixes replayed at startup. Then a migration is a non-event — applied automatically, compatible by construction, and verified in the logs rather than assumed from a 200.