Skip to content

fix(network,sync): eliminate OMP data races on shared working variables#152

Merged
adequatelimited merged 1 commit intoaudit-fixesfrom
audit/F-30-scan-quorum-omp-race
Apr 13, 2026
Merged

fix(network,sync): eliminate OMP data races on shared working variables#152
adequatelimited merged 1 commit intoaudit-fixesfrom
audit/F-30-scan-quorum-omp-race

Conversation

@adequatelimited
Copy link
Copy Markdown
Collaborator

Summary

  • Split the shared int result in scan_quorum() into two thread-private variables (weight_cmp, contrib) — eliminates a real data race
  • While in the neighborhood, moved peer and ecode in catchup() to the same function-top + private() clause style, matching the codebase convention

Problem

scan_quorum() used a single shared result variable for two semantically different purposes — weight comparison inside OMP_CRITICAL_() and peer-contribution counting outside any critical section. Thread A writing result = cmp256(...) inside the critical section while Thread B wrote result = 0 / result++ outside produced a genuine C11 data race.

Empirical evidence

Ran a TSan-instrumented build against a live mainnet scan:

  • Before fix: 71+ race reports in the scan phase, including multiple on result
  • After fix: 61 reports, zero on result / weight_cmp / contrib
  • The 10 missing reports correspond exactly to the F-30 access pattern

Remaining reports are separate issues (scanidx/netplistidx/Rplistidx shared-state, peer.c shuffle32/addpeer, resync() at sync.c:212) — out of scope for this PR, could be new audit findings.

Changes

src/network.cscan_quorum()

  • Removed shared int result; at function top
  • Added two thread-private variables: int weight_cmp; int contrib;
  • Added them to the OMP private() clause
  • Replaced all uses of result with the appropriate per-purpose variable

src/sync.ccatchup()

  • Moved word32 peer; and int ecode; from inside-the-OMP-region declarations to function top
  • Added both to the OMP private() clause
  • Assignments (peer = plist[OMP_THREADNUM], ecode = VEOK) now happen at the top of the parallel region body since OMP private() doesn't initialize
  • No behavior change; purely style consistency

Test plan

  • Verify make (mochimo binary) passes
  • Run TSan-instrumented sync: no races flagged on weight_cmp, contrib, peer (catchup), or ecode (catchup)
  • Verify scan_quorum produces same quorum membership as before under non-adversarial conditions

Closes #110

scan_quorum() declared a single shared `int result` used for two
semantically different purposes in the same OMP parallel loop:
  1. Weight comparison inside OMP_CRITICAL_() (writes/reads under lock)
  2. Peer-contribution counter OUTSIDE any critical section (writes
     and reads across threads with no synchronization)

The conflation produced a real data race between threads: Thread A in
the critical section writing result = cmp256(...) while Thread B
outside any critical section executing result = 0 / result++. TSan
flagged the access pattern empirically on a live mainnet scan.

Split the variable into two thread-private variables declared at
function top and added to the OMP private() clause:
  - weight_cmp: used only in the critical section for cmp256 result
  - contrib: used only in the peer-counting loop outside the critical
    section

Style: matches the existing codebase convention for OpenMP thread-local
state (see `node`, `peer`, `len`, `ipstr` in the same function's
private() clause). Block-scope declarations inside the OMP region work
equally well semantically but the function-top + private() form is
what the rest of this file uses.

Also applied the same style to `peer` and `ecode` in catchup() (sync.c)
while in the neighborhood — both were declared inside the OMP parallel
block body in C99 style. Moved to function top + private() clause for
consistency. No behavior change in catchup(); purely convention cleanup.

Closes #110
@adequatelimited adequatelimited merged commit f244aef into audit-fixes Apr 13, 2026
4 of 5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant