Database Multi-Tenancy Deep Dive

# Multi-Tenancy at the Database Level: A Technical Deep Dive When we designed our multi-tenant architecture, the first decision was: where does tenant isolation happen? Application layer? Database schemas? Separate databases? We chose database-level isolation using PostgreSQL's Row-Level Security (RLS), and after running it in production with hundreds of tenants, here's what we learned. ## Why Database-Level Isolation Application-level tenant filtering (adding WHERE tenant_id = ? to every query) is the most common approach. It's also the most fragile. In a codebase with thousands of queries, one missed filter creates a data leak. Code review catches many, but not all. Database-level isolation means the database enforces tenant boundaries regardless of what the application sends. Even if the application has a bug that omits the tenant filter, the database returns only authorized rows. We've seen this save us at least three times in production — queries introduced during refactoring that accidentally missed the tenant scope. Without RLS, those would have been cross-tenant data leaks. With RLS, they returned empty results and we caught the bug in testing. ## The Implementation Pattern ### Step 1: Tenant Column on Every Table Every tenant-scoped table has a tenant_id column. This is a UUID foreign key to the tenants table. Non-tenant tables (like the tenants table itself, or global configuration) are excluded from RLS. ### Step 2: Enable RLS For each tenant-scoped table, enable RLS. In PostgreSQL this is a simple ALTER TABLE statement. Once enabled, the default policy is "deny all" — no rows are visible until you create explicit policies. ### Step 3: Create Policies We define two policies per table: one for SELECT (read access) and one for INSERT/UPDATE/DELETE (write access). Both filter on tenant_id matching the current session's tenant context. The tenant context is set using a session variable at the start of each request. The middleware extracts the tenant ID from the authenticated user's token and sets it as a PostgreSQL configuration parameter. ### Step 4: Application Middleware Every request flows through middleware that: 1. Authenticates the user 2. Resolves their tenant ID 3. Sets the PostgreSQL session variable 4. Proceeds to the route handler By the time any route handler executes a query, the database session is already scoped to the correct tenant. The handler doesn't need to worry about tenant filtering — it's handled. ## Performance Considerations ### Does RLS Slow Queries? The honest answer: marginally. PostgreSQL applies RLS policies as additional WHERE clause predicates. With a proper index on tenant_id, the performance impact is 1-3% per query. On most queries, this is sub-millisecond and unmeasurable in practice. We benchmarked our 50 most frequent queries with and without RLS. Average overhead: 0.8ms. Maximum overhead: 2.3ms. For a web application where API response times are 50-200ms, this is noise. ### Indexing Strategy Every tenant-scoped table needs a composite index starting with tenant_id. For frequently queried tables, this means indexes like (tenant_id, created_at), (tenant_id, status), or (tenant_id, email). PostgreSQL's query planner uses the tenant_id predicate from RLS to prune partitions effectively. With proper indexes, a query against a 100-million-row table with 1,000 tenants performs as if each tenant has their own 100,000-row table. ### Connection Pooling RLS uses session-level variables, which means each database session is scoped to a specific tenant. With connection pooling (PgBouncer), you need transaction-level pooling — the tenant context is set at the start of each transaction and cleared at the end. This is a common gotcha. Session-level pooling would leak tenant context between requests. Transaction-level pooling resets the session cleanly between requests. ## Migrations in a Multi-Tenant World Schema migrations with RLS require care: **Adding new tables:** Create the table, add the tenant_id column, enable RLS, create policies. The migration must include all four steps — a table without RLS in a multi-tenant system is a security gap. **Adding columns:** Standard ALTER TABLE. RLS policies apply to the table, not individual columns, so new columns are automatically covered. **Data migrations:** These are trickier. A migration that transforms data must respect tenant boundaries. Run the migration per-tenant or temporarily elevate to a superuser role that bypasses RLS (with extreme caution). **Backfilling tenant_id:** If you're adding multi-tenancy to an existing single-tenant database, the backfill migration is the hardest part. Every row needs a tenant_id assigned before RLS can be enabled. ## Testing Multi-Tenancy Our test suite includes specific multi-tenant assertions: **Cross-tenant isolation tests:** Create data in Tenant A, authenticate as Tenant B, verify Tenant B can't see Tenant A's data. Run this for every table. **Tenant context tests:** Verify that requests without a tenant context return zero results (not all results). This catches missing RLS policies. **Performance tests:** Ensure that queries with RLS don't exhibit unexpected slowdowns as tenant count increases. **Migration tests:** Every new migration is tested against a multi-tenant dataset to verify it doesn't break RLS policies or create unscoped tables. ## Gotchas We Encountered **Background jobs.** Jobs running outside of a request context don't have a tenant session. We solved this by explicitly setting the tenant context at the start of each job execution. **Database admin tools.** Connecting directly to the database with a superuser account bypasses RLS. This is by design but requires operational awareness. Admin database access must be audited and restricted. **Aggregate queries across tenants.** Generating platform-wide analytics (total users, total records) requires a superuser role that bypasses RLS. We run these queries through a dedicated analytics pipeline with strict access controls. **Foreign keys across tables.** If Table A references Table B, and both have RLS, the foreign key check must also pass the RLS policy. PostgreSQL handles this correctly, but it's worth testing explicitly. ## The Alternative We Considered Schema-per-tenant (each tenant gets their own PostgreSQL schema) provides stronger isolation but introduces operational complexity: each migration runs against every schema, connection pooling becomes per-schema, and the number of database objects grows linearly with tenants. For platforms expecting thousands of tenants, schema-per-tenant doesn't scale operationally. RLS on shared tables scales to millions of tenants with constant operational overhead. For platforms with fewer than 100 tenants where regulatory requirements demand maximum isolation, schema-per-tenant may be the better choice. ## Our Recommendation For most multi-tenant SaaS platforms, PostgreSQL RLS on shared tables is the optimal balance of security, performance, and operational simplicity. The initial setup takes 2-3 days. The ongoing benefit — bulletproof tenant isolation that developers don't need to think about — is permanent. The database should be the last line of defense, not the only line. But knowing that the last line is solid lets your team move faster with confidence.