Full-suite pytest collision: RLS tests incompatible with function-scoped test_db fixture #145

Closed
opened 2026-04-24 14:30:22 +00:00 by chihlasm · 1 comment
Owner

UPDATE (2026-04-24 post-mortem)

The original framing of this issue focused on the test_rls_isolation.py fixture collision. That's real but secondary. The primary bug was much more dangerous and has been fixed in commit dab740d.

What actually happened

Running pytest tests/ inside the resolutionflow_backend container at Phase 9 Task 14 silently DROPPED THE DEV DATABASE SCHEMA. The backend crashed on startup with relation "users" does not exist and docker never restarted it — the dev environment was unusable until a manual container restart. Root cause was two layered bugs:

Bug 1 — conftest.TEST_DATABASE_URL preferred DATABASE_URL:

TEST_DATABASE_URL = os.environ.get(
    "DATABASE_URL",                    # ← production URL read FIRST
    os.environ.get("DATABASE_TEST_URL", "..."),
)

Inside the backend container, DATABASE_URL points at the dev DB. So pytest treated the dev DB as the test DB. The test_db fixture's DROP SCHEMA public CASCADE then wiped it.

Bug 2 — get_admin_db was not overridden in app.dependency_overrides.

conftest.py overrode get_db but not get_admin_db. Endpoints using get_admin_db (register, admin routes, service accounts) bypassed the test session and hit the real admin DB. Before Bug 1 was fixed, both engines happened to point at the same dev DB, so this worked by accident.

Fix (shipped in dab740d)

  1. conftest.py now honors ONLY DATABASE_TEST_URL (or the localhost fallback). DATABASE_URL is never consulted for the test DB.
  2. An assertion at module load refuses to run pytest if TEST_DATABASE_URL's database name doesn't contain "test":
    assert "test" in _test_db_name, f"Refusing to run tests against {_test_db_name!r} — ..."
    
  3. conftest.py now also overrides get_admin_db to yield the same test session (RLS isn't enabled in the create_all-managed test schema, so sharing is safe).
  4. docker-compose.dev.yml now sets DATABASE_TEST_URL=postgresql+asyncpg://postgres:postgres@db:5432/resolutionflow_test so pytest in the container works out of the box (requires CREATE DATABASE resolutionflow_test; once — the DB has been created).

What remains open in this issue

The original full-suite pytest tests/ collision between function-scoped test_db and module-scoped _ensure_rls_schema in test_rls_isolation.py is still unresolved. With the DB-isolation fix in place, it is no longer catastrophic — pytest tests/ can no longer nuke the dev DB — but it still produces noisy failure reports when run end-to-end (the RLS tests error on role/policy mismatch against a create_all-only schema, and cross-module state can leak).

Options from the original issue still apply:

  1. Cheap: exclude test_rls_isolation.py from the default collection; run it as a separate CI step.
  2. Medium: switch conftest.test_db to use alembic upgrade head instead of Base.metadata.create_all.
  3. Deep: tier the suite with session-scoped migration schema + per-test transactions.

Keeping this issue open for whoever takes that on. Title unchanged; severity is now lower.

## UPDATE (2026-04-24 post-mortem) The original framing of this issue focused on the `test_rls_isolation.py` fixture collision. That's real but **secondary**. The primary bug was much more dangerous and has been fixed in commit `dab740d`. ### What actually happened Running `pytest tests/` inside the `resolutionflow_backend` container at Phase 9 Task 14 silently DROPPED THE DEV DATABASE SCHEMA. The backend crashed on startup with `relation "users" does not exist` and docker never restarted it — the dev environment was unusable until a manual container restart. Root cause was two layered bugs: **Bug 1 — `conftest.TEST_DATABASE_URL` preferred `DATABASE_URL`:** ```python TEST_DATABASE_URL = os.environ.get( "DATABASE_URL", # ← production URL read FIRST os.environ.get("DATABASE_TEST_URL", "..."), ) ``` Inside the backend container, `DATABASE_URL` points at the dev DB. So pytest treated the dev DB as the test DB. The `test_db` fixture's `DROP SCHEMA public CASCADE` then wiped it. **Bug 2 — `get_admin_db` was not overridden in `app.dependency_overrides`.** `conftest.py` overrode `get_db` but not `get_admin_db`. Endpoints using `get_admin_db` (register, admin routes, service accounts) bypassed the test session and hit the real admin DB. Before Bug 1 was fixed, both engines happened to point at the same dev DB, so this worked by accident. ### Fix (shipped in `dab740d`) 1. `conftest.py` now honors ONLY `DATABASE_TEST_URL` (or the localhost fallback). `DATABASE_URL` is never consulted for the test DB. 2. An assertion at module load refuses to run pytest if `TEST_DATABASE_URL`'s database name doesn't contain "test": ```python assert "test" in _test_db_name, f"Refusing to run tests against {_test_db_name!r} — ..." ``` 3. `conftest.py` now also overrides `get_admin_db` to yield the same test session (RLS isn't enabled in the create_all-managed test schema, so sharing is safe). 4. `docker-compose.dev.yml` now sets `DATABASE_TEST_URL=postgresql+asyncpg://postgres:postgres@db:5432/resolutionflow_test` so pytest in the container works out of the box (requires `CREATE DATABASE resolutionflow_test;` once — the DB has been created). ### What remains open in this issue The original full-suite `pytest tests/` collision between function-scoped `test_db` and module-scoped `_ensure_rls_schema` in `test_rls_isolation.py` is **still unresolved**. With the DB-isolation fix in place, it is no longer catastrophic — `pytest tests/` can no longer nuke the dev DB — but it still produces noisy failure reports when run end-to-end (the RLS tests error on role/policy mismatch against a `create_all`-only schema, and cross-module state can leak). Options from the original issue still apply: 1. **Cheap**: exclude `test_rls_isolation.py` from the default collection; run it as a separate CI step. 2. **Medium**: switch `conftest.test_db` to use `alembic upgrade head` instead of `Base.metadata.create_all`. 3. **Deep**: tier the suite with session-scoped migration schema + per-test transactions. Keeping this issue open for whoever takes that on. Title unchanged; severity is now lower.
Author
Owner

Closing — both layered bugs from the post-mortem are fixed in main, and the remaining RLS-collision concern was resolved by Option 1 from this issue.

  • dab740d — DATABASE_TEST_URL only (DATABASE_URL no longer consulted), module-load assertion that the DB name must contain test, get_admin_db override added.
  • b14a16a — RLS tests gated behind RUN_RLS_TESTS=1 and auto-deselected from the default collection. This is Option 1 from the issue.
  • Per-worker DB isolation for pytest-xdist also landed since.

Full suite is now a valid regression gate.

Closing — both layered bugs from the post-mortem are fixed in main, and the remaining RLS-collision concern was resolved by Option 1 from this issue. - `dab740d` — DATABASE_TEST_URL only (DATABASE_URL no longer consulted), module-load assertion that the DB name must contain `test`, `get_admin_db` override added. - `b14a16a` — RLS tests gated behind `RUN_RLS_TESTS=1` and auto-deselected from the default collection. This is Option 1 from the issue. - Per-worker DB isolation for pytest-xdist also landed since. Full suite is now a valid regression gate.
Sign in to join this conversation.