Engineering

Zero-Downtime Migrations With Two Schema Owners

One Postgres, two migration engines — Alembic and Prisma — that must never touch each other. Here is how additive-only migrations and idempotent hotfixes keep deploys safe and downtime-free.

Nemo Team · 9 min read

NemoRouter runs one Postgres database with two migration engines living inside it: Alembic owns the nemo schema, and Prisma (bundled inside the LiteLLM dependency) owns public. Two owners, one database, and a strict rule that neither touches the other's tables. Getting migrations wrong here doesn't mean a slow deploy; it once meant wiped production tables. This post describes the discipline that keeps schema changes boring.

The setup that makes this tricky

ONE Postgres
 ├─ nemo schema    ← Alembic owns. Our features.
 └─ public schema  ← Prisma owns (inside LiteLLM). We NEVER create tables here.

The hazard: when Prisma runs its own migrations, it will drop tables in public that it doesn't know about. So if we ever created a table in public, Prisma would delete it on its next run. The rule that prevents this is absolute: never create tables in public. That is why all our tables live in nemo.
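The rule can also be checked mechanically before a migration ships. The following is a minimal sketch, not our actual tooling: it assumes the migration's target tables are available as schema-qualified strings, and the function name and example table names are hypothetical.

```python
ALLOWED_SCHEMA = "nemo"  # the schema Alembic owns, per the layout above


def tables_outside_owned_schema(qualified_names):
    """Return the table names that are not explicitly placed in ALLOWED_SCHEMA.

    An unqualified name (no "schema." prefix) counts as a violation too,
    because Postgres resolves it through search_path, which by default
    typically lands in public.
    """
    offenders = []
    for name in qualified_names:
        schema, sep, _table = name.partition(".")
        if not sep or schema != ALLOWED_SCHEMA:
            offenders.append(name)
    return offenders
```

A CI step could fail the build whenever this returns a non-empty list, which turns "never create tables in public" from a convention into a gate.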

The incident that set the rule

A Prisma db push once wiped LiteLLM tables in production. The lesson wasn't "be careful with push" — it was "disable destructive auto-migration entirely." use_prisma_db_push is hard-disabled. The safeguard isn't vigilance (which fails); it's removing the dangerous capability so it can't fire.

Additive-only is the zero-downtime trick

Zero-downtime migration has one core principle: a schema change must be compatible with both the old and new code at once, because during a deploy both are running. Additive changes satisfy this; destructive ones don't.

Change                        Old code survives?               Safe mid-deploy?
Add a nullable column         yes                              yes
Add a table                   yes                              yes
Add an index (concurrently)   yes                              yes
Drop a column                 no (old code reads it)           no
Rename a column               no (each name breaks one side)   no

So we add; we don't drop or rename in the same step. A column removal becomes a multi-deploy dance: stop writing it, deploy; stop reading it, deploy; drop it, deploy. It is tedious, and it is the reason nothing breaks while traffic flows.
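One way to keep the additive-only rule from depending on reviewer attention is a conservative lint over migration SQL. This is an illustrative sketch under stated assumptions: the function name is hypothetical, the script is assumed to be plain SQL split on semicolons, and any statement containing DROP, RENAME, or TRUNCATE is flagged, which will produce some false positives (for example DROP NOT NULL).

```python
import re

# Keywords treated as destructive in this sketch. Deliberately broad:
# a false positive costs a manual review, a false negative costs an outage.
DESTRUCTIVE = re.compile(r"\b(DROP|RENAME|TRUNCATE)\b", re.IGNORECASE)


def destructive_statements(sql):
    """Split a migration script on ';' and return the statements that
    contain a destructive keyword, in their original order."""
    statements = [s.strip() for s in sql.split(";") if s.strip()]
    return [s for s in statements if DESTRUCTIVE.search(s)]
```

A flagged statement is not necessarily wrong; it is a signal that the change belongs to the final step of the multi-deploy sequence rather than to an ordinary deploy.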

Prisma drift and idempotent hotfixes

LiteLLM's schema (public) drifts when we bump the pinned dependency — a new version adds a column our running code expects. We can't let Prisma's destructive push reconcile that, so we apply additive, idempotent hotfixes by hand:

ALTER TABLE "LiteLLM_VerificationToken"
  ADD COLUMN IF NOT EXISTS "new_field" TEXT;

ADD COLUMN IF NOT EXISTS is the whole trick: it's safe to run repeatedly, safe if the column already exists, and never destructive. The pre-startup readiness routine replays every hotfix file before the app boots, so the schema is always ahead of (or equal to) what the code needs. (This is the DB-readiness gate.)
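The replay step itself can be very small. Here is a rough sketch, assuming the hotfixes are stored as .sql files whose names sort in the intended order; the function name, the file extension, and the `execute` callable (any function that sends a SQL string to the database) are assumptions for illustration.

```python
from pathlib import Path


def replay_hotfixes(directory, execute):
    """Run every *.sql file in `directory`, in filename order, through `execute`.

    `execute` is any callable that takes a SQL string, for example a thin
    wrapper around a database cursor. Returns the file names that were run.
    """
    applied = []
    for path in sorted(Path(directory).glob("*.sql")):
        execute(path.read_text(encoding="utf-8"))
        applied.append(path.name)
    return applied
```

Because every file is expected to contain only IF NOT EXISTS style statements, the function keeps no record of what has already run; replaying everything on each boot is the point.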

The pre-startup readiness sequence

Before the backend serves a request, three idempotent steps run, in order, without asking:

1. alembic upgrade head            → nemo schema at head
2. replay prisma-hotfixes/*        → public-schema drift patched (additive)
3. NOTIFY pgrst, 'reload schema'   → PostgREST picks up the new schema

All three are idempotent, so running them on every boot is safe — a no-op if already applied. "Apply, don't ask" is the policy: the readiness gate is standing authorization to bring the schema current, because a backend that boots against a stale schema is the failure we're preventing.
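The ordering and fail-fast behaviour of the gate can be sketched as a small runner. This is an assumption-laden illustration rather than the real routine: the function name is hypothetical, and each step is represented by a plain callable standing in for the Alembic upgrade, the hotfix replay, and the PostgREST reload.

```python
def run_readiness_gate(steps):
    """Run (name, callable) steps in order and stop at the first failure.

    Each callable is expected to be idempotent, so the whole gate can run
    on every boot. Returns the names of the steps that completed.
    """
    completed = []
    for name, step in steps:
        try:
            step()
        except Exception as exc:
            # Refuse to boot against a stale schema: report which step
            # failed and keep the original error as the cause.
            raise RuntimeError(f"readiness step {name!r} failed") from exc
        completed.append(name)
    return completed
```

Raising instead of logging and continuing is the design choice that matters: a later step never runs against a schema that an earlier step failed to bring current, and the process exits before serving traffic.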

Why "200 OK" isn't proof a migration worked

A subtle operational point: a health check can stay green while a migration is silently failing on the request path. A readiness probe caches for a few seconds and can report healthy while the actual LLM path returns 5xx on a missing column, or a background poller spams "column does not exist". So a deploy isn't "done" at the first 200; it's done after a revision-scoped log scan shows no DataError or "column does not exist" for a minute past boot. Migrations are verified by logs, not by a single green check.
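The log scan can be as simple as a filter over the lines collected during that first minute. The sketch below assumes the caller has already fetched the log lines for the observation window and that each line carries the revision identifier as plain text; the function name and the log format are hypothetical.

```python
import re

# Error signatures named in the text above.
ERROR_PATTERN = re.compile(r"DataError|column\b.*\bdoes not exist", re.IGNORECASE)


def migration_errors(log_lines, revision):
    """Return the log lines from `revision` that match a schema-error signature.

    Assumes each line contains the revision identifier as plain text, which
    depends on how the platform labels logs.
    """
    return [
        line for line in log_lines
        if revision in line and ERROR_PATTERN.search(line)
    ]
```

An empty result for the new revision is the signal that the deploy is done; scoping by revision keeps errors from the outgoing version from failing the check.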

The takeaway

Two schema owners in one database is safe only with discipline: never create tables in the other engine's schema, disable destructive auto-migration outright, make every change additive so old and new code coexist during the deploy, and patch dependency drift with idempotent IF NOT EXISTS hotfixes replayed at startup. Then a migration is a non-event — applied automatically, compatible by construction, and verified in the logs rather than assumed from a 200.

