Skip to content

fix(tfile): declare ecode volatile in validate_tfile_pow_fp()#151

Merged
adequatelimited merged 1 commit intoaudit-fixesfrom
audit/F-10-omp-ecode-volatile
Apr 13, 2026
Merged

fix(tfile): declare ecode volatile in validate_tfile_pow_fp()#151
adequatelimited merged 1 commit intoaudit-fixesfrom
audit/F-10-omp-ecode-volatile

Conversation

@adequatelimited
Copy link
Copy Markdown
Collaborator

Summary

  • Declare ecode as volatile int in validate_tfile_pow_fp() so that the unsynchronized read of the loop-exit flag is well-defined on all hardware.

Problem

In src/tfile.c:821 (validate_tfile_pow_fp), ecode is a plain int shared across OMP threads. Writes happen inside OMP_CRITICAL_() blocks (correctly serialized) but reads are outside any critical section:

  • Line 853: while (ecode == VEOK) — loop-exit condition
  • Line 866: if (ecode != VEOK) continue; — early continue

Under the C11 memory model, an unsynchronized read of a variable that another thread may concurrently write (even inside a critical section) is a data race and undefined behavior. ThreadSanitizer flags this as a race.

On x86 it works by accident due to Total Store Order — stores become visible to other cores quickly and the compiler is unlikely to cache the value in a register across an OMP_CRITICAL_ boundary. On weakly-ordered architectures (ARM64, RISC-V, MIPS) the write may not be visible to a reader for an unbounded time.

Fix

Declaring ecode as volatile int:

  1. Prevents the compiler from caching it in a register across loop iterations
  2. Forces each read to go to memory
  3. Combined with the memory barriers implicit in OMP critical-section entry/exit, this gives correct shared-flag semantics on all real hardware

errnum is not affected — it is written inside critical sections and only read AFTER the parallel region ends, where the implicit barrier at parallel-region exit provides the necessary synchronization.

Alternatives considered

_Atomic int via <stdatomic.h> is the C11-purist answer. Not adopted because (a) it requires a new header dependency, (b) the volatile approach is sufficient for correctness on all real hardware given the existing OMP critical sections, and (c) the existing codebase style is plain-C + OpenMP, not stdatomic.

Test plan

  • make test passes
  • TSan run: make test CCARGS="-fsanitize=thread" should no longer flag the while (ecode == VEOK) read as a race (or the warning should be substantially reduced)

Closes #101

In validate_tfile_pow_fp(), ecode is shared across OMP threads:
writes happen inside OMP_CRITICAL_ blocks (correctly serialized)
but reads are outside any critical section (the while-loop condition
at line 853 and the early-continue at line 866). Under the C11
memory model this is a data race and undefined behavior.

On x86 it works by accident due to Total Store Order — stores
become visible quickly and the compiler is unlikely to cache the
value in a register across an OMP_CRITICAL_ boundary. On weakly-
ordered architectures (ARM64, RISC-V, MIPS) the write may not be
visible to a reader for an unbounded time. ThreadSanitizer also
flags this as a data race.

Declaring ecode volatile prevents the compiler from caching it
in a register and forces each read to go to memory. Combined with
the memory barriers implicit in OMP critical-section entry/exit,
this gives correct shared-flag semantics on all real hardware.

errnum is written inside critical sections and read only AFTER
the OMP_PARALLEL_ block ends. The implicit barrier at parallel-
region exit provides the necessary synchronization, so errnum
does not need volatile.

Closes #101
@adequatelimited
Copy link
Copy Markdown
Collaborator Author

Not an issue now for x86 with current compilers, but patching in case a future compiler optimization makes it an issue.

@adequatelimited adequatelimited merged commit 15bba23 into audit-fixes Apr 13, 2026
4 of 5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant