Skip to content

fix(spp_api_v2): serialize concurrent fastapi_endpoint sync across workers#175

Open
haklyray wants to merge 1 commit into19.0from
fix/spp-api-v2-fastapi-endpoint-sync-race
Open

fix(spp_api_v2): serialize concurrent fastapi_endpoint sync across workers#175
haklyray wants to merge 1 commit into19.0from
fix/spp-api-v2-fastapi-endpoint-sync-race

Conversation

@haklyray
Copy link
Copy Markdown
Contributor

Why is this change needed?

After Upgrade All Modules (or any other event that triggers a registry reload), every Odoo worker independently invokes the patched ir.http.routing_map(). Each worker enters the FastAPI endpoint sync block and races to UPDATE fastapi_endpoint SET registry_sync = TRUE WHERE id IN (1, 2). Under PostgreSQL's REPEATABLE READ isolation, all but one of those concurrent updates abort with:

ERROR: could not serialize access due to concurrent update

odoo.sql_db logs each failed query at ERROR before the outer try/except Exception swallows it at DEBUG. With N workers, that produces N−1 noisy ERRORs in the SQL log on every registry reload — alarming to operators, and it makes real errors harder to spot. The system still works (one worker succeeds, hence the trailing Synced N FastAPI endpoints info line), but the noise is unnecessary and N−1 transactions are wasted on every cold start.

How was the change implemented?

Gate the entire FastAPI endpoint sync block in spp_api_v2/models/ir_http_patch.py behind a PostgreSQL transaction-scoped advisory lock:

  • Module-scope constant _FASTAPI_SYNC_ADVISORY_LOCK_KEY is a deterministic 64-bit signed int derived from SHA-256("spp_api_v2.fastapi_endpoint_sync") — stable across processes and unlikely to collide with any other module's advisory lock in the same database.
  • Inside the existing with registry.cursor() as cr: block, the cursor first calls SELECT pg_try_advisory_xact_lock(<key>). The non-blocking try_ variant means losers return immediately rather than serialize on the lock and then re-attempt the same losing UPDATE.
  • If the lock is acquired, the existing sync logic runs unchanged (just indented one level deeper inside an else: branch).
  • If the lock is not acquired, the worker logs a DEBUG line (FastAPI endpoint sync skipped — another worker is syncing) and skips the sync.
  • The lock is transaction-scoped, so it is released automatically at COMMIT or ROLLBACK when the cursor block exits — no manual unlock and no risk of leaking state.

This is safe because skipping workers don't end up with a broken routing map: when the lock-holder commits, _handle_registry_sync calls self.env.registry.clear_cache("routing"), and the cache-invalidation signal propagates to other workers through Odoo's standard registry-signaling mechanism. The next routing_map() call on any worker rebuilds with the synced routes.

New unit tests

None. The race is inherently a multi-process / multi-cursor scenario, and reproducing it deterministically inside Odoo's standard test runner (which runs single-process with a single cursor) would require either coordinating two real PostgreSQL connections in a test or mocking the cursor's execute/fetchone. Both add significant test scaffolding for limited value — happy to add one if reviewers feel strongly.

The function-level guarantee (lock acquired ⇒ sync runs; lock not acquired ⇒ sync skips) is a 4-line code path that is straightforward by inspection.

Unit tests executed by the author

  • python3 -m py_compile spp_api_v2/models/ir_http_patch.py → passes (also via ast.parse in development).
  • No existing tests in spp_api_v2/tests/ exercise ir_http_patch.routing_map() (it's a workaround for an Odoo core cache bug, not a normal Odoo model), so the existing test suite is unaffected.

How to test manually

  1. Reproduce the baseline (without this PR). Start the devcontainer and run:

    python -m odoo -c odoo-dev.conf -u all --stop-after-init 2>&1 \
      | grep -E "(serialize access|Synced .* FastAPI)"

    Expect multiple ERROR: could not serialize access due to concurrent update lines (one per losing worker), followed by a single Synced N FastAPI endpoints for database <db>.

  2. Apply this PR and rerun the same command. Expect zero serialize access errors and a single Synced N FastAPI endpoints line.

  3. Confirm the skip path runs (proves it's the lock doing the work, not a coincidence):

    python -m odoo -c odoo-dev.conf --log-level=debug -u all --stop-after-init 2>&1 \
      | grep "FastAPI endpoint sync skipped"

    Expect ≥1 FastAPI endpoint sync skipped … another worker is syncing DEBUG line per registry reload.

  4. Functional smoke test — endpoints must still actually be registered after the upgrade:

    curl -s -o /dev/null -w '%{http_code}\n' http://localhost:8069/api/v2/<any-registered-route>

    Expect a non-404 response (200/401/403 depending on the route's auth — anything except 404 confirms routing is live).

  5. Single-worker regression — start Odoo with --workers=0 and confirm sync still happens (the only worker always acquires the lock):

    python -m odoo -c odoo-dev.conf --workers=0 -u all --stop-after-init 2>&1 \
      | grep "Synced .* FastAPI"

Related links

…rkers

After a registry reload (e.g. -u all) every Odoo worker independently
invokes routing_map(), and each one races to UPDATE the same
fastapi_endpoint rows when calling action_sync_registry(). Under
PostgreSQL's REPEATABLE READ isolation this surfaces as N-1 noisy
"could not serialize access due to concurrent update" ERRORs in the SQL
log before the outer try/except swallows them.

Gate the sync block with a pg_try_advisory_xact_lock keyed off a stable
SHA-256-derived int8 so only one worker performs the sync per reload.
The lock is transaction-scoped and released automatically at COMMIT.
Losers log a DEBUG line and skip; the registry-cache invalidation
signal from the winning worker propagates to them via the standard
Odoo signaling mechanism.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a transaction-scoped advisory lock to serialize FastAPI endpoint synchronization across multiple workers. By using pg_try_advisory_xact_lock with a deterministic 64-bit key derived from the module name, the implementation prevents SerializationFailure errors during concurrent registry reloads. I have no feedback to provide.

@codecov
Copy link
Copy Markdown

codecov Bot commented Apr 28, 2026

Codecov Report

❌ Patch coverage is 50.00000% with 10 lines in your changes missing coverage. Please review.
✅ Project coverage is 71.64%. Comparing base (98a45a9) to head (b2cd2e3).

Files with missing lines Patch % Lines
spp_api_v2/models/ir_http_patch.py 50.00% 10 Missing ⚠️

❌ Your patch check has failed because the patch coverage (50.00%) is below the target coverage (70.00%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             19.0     #175      +/-   ##
==========================================
+ Coverage   71.48%   71.64%   +0.16%     
==========================================
  Files         932      933       +1     
  Lines       54840    55382     +542     
==========================================
+ Hits        39201    39679     +478     
- Misses      15639    15703      +64     
Flag Coverage Δ
spp_api_v2 80.10% <50.00%> (+<0.01%) ⬆️
spp_api_v2_change_request 66.85% <ø> (ø)
spp_api_v2_cycles 71.12% <ø> (ø)
spp_api_v2_data 64.41% <ø> (ø)
spp_api_v2_entitlements 70.19% <ø> (ø)
spp_api_v2_gis 71.52% <ø> (ø)
spp_api_v2_products 66.27% <ø> (ø)
spp_api_v2_service_points 70.94% <ø> (ø)
spp_api_v2_simulation 71.12% <ø> (ø)
spp_api_v2_vocabulary 57.26% <ø> (ø)
spp_base_common 90.26% <ø> (ø)
spp_dci_client_dr 55.87% <ø> (ø)
spp_dci_client_ibr 60.17% <ø> (ø)
spp_dci_demo 69.23% <ø> (ø)
spp_dci_server 35.68% <ø> (ø)
spp_farmer_registry_demo 54.01% <ø> (+0.62%) ⬆️
spp_mis_demo_v2 74.20% <ø> (+4.18%) ⬆️
spp_programs 64.51% <ø> (ø)
spp_security 66.66% <ø> (ø)
spp_starter_social_registry 0.00% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
spp_api_v2/models/ir_http_patch.py 76.82% <50.00%> (+0.51%) ⬆️

... and 6 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant