Skip to content

Audit fixes: F-10, F-13, F-14, F-30 (OMP races, tx_init defense, locale-independent reference validation)#155

Merged
adequatelimited merged 8 commits intomasterfrom
audit-fixes
Apr 13, 2026
Merged

Audit fixes: F-10, F-13, F-14, F-30 (OMP races, tx_init defense, locale-independent reference validation)#155
adequatelimited merged 8 commits intomasterfrom
audit-fixes

Conversation

@adequatelimited
Copy link
Copy Markdown
Collaborator

Summary

Promotes four audit fixes from audit-fixes to master:

Fix Issue File(s) Intent
F-10 #101 src/tfile.c Declare ecode volatile in validate_tfile_pow_fp() to fix OMP cross-thread data race on the loop-exit flag
F-30 #110 src/network.c, src/sync.c Split shared result in scan_quorum() into thread-private weight_cmp + contrib; hygiene cleanup of peer/ecode in catchup()
F-13 #90 src/tx.c, src/test/tx-read-unknown-type.c Reject unsupported TXDAT_TYPE / TXDSA_TYPE in tx__init() before any uninitialized read; unit test added
F-14 #104 src/tx.c Replace locale-dependent <ctype.h> isdigit/isupper with locale-free ASCII range helpers

Scope audit

 src/network.c                   |  23 ++-
 src/sync.c                      |  12 +-
 src/test/tx-read-unknown-type.c | 105 ++++++++++
 src/tfile.c                     |  10 +-
 src/tx.c                        |  87 +++++++--
 5 files changed, 205 insertions(+), 32 deletions(-)

All changes localized to the files each audit finding targets. No scope creep.

Artifact sweep

Verified the diff contains NO:

  • PoW bypass (return VEOK at the top of validate_pow())
  • F-.. TRACE / LOCAL TEST ONLY / DO NOT COMMIT instrumentation
  • -ftrivial-auto-var-init flags or other test-only tooling
  • Co-Authored-By: Claude attribution trailers (already prevented by .claude/settings.local.json)

src/tfile.c:validate_pow() begins with the original const word32 peach_trigger[2] declaration, unchanged — no early-return bypass.

Build

make NO_CUDA=1 mochimo passes cleanly on default -Werror -Wextra -Wpedantic with gcc-13 (Ubuntu 24.04 WSL). No -Wno-error workaround needed.

Validation (from the individual PRs)

  • F-10: TSan run confirmed zero races in validate_tfile_pow_fp after the volatile fix. Live sync with -T970000 exercised the OMP loop under real PoW load on ~5000 trailers; returned VEOK, sync reached catchup cleanly.
  • F-30: TSan run before/after the fix showed 71 → 61 race reports, with zero races remaining on result/weight_cmp/contrib.
  • F-13: Unit test src/test/tx-read-unknown-type.c covers 5 crafted cases. Pre-fix run with -ftrivial-auto-var-init=pattern captured 0xfefefefefefefefe poison in uninitialized offsets; post-fix shows clean VERROR on the default branch. Live sync confirmed thousands of legitimate mainnet TXs still parse correctly (no regression).
  • F-14: Live sync under both LC_ALL=C and LC_ALL=en_US.UTF-8 — zero reference rejections in either. Real mainnet captures include ASCII digit strings and an uppercase string (LPPAYMENT) — both accepted identically in both locales, proving locale independence.

Test plan

  • CI builds pass on Ubuntu x64, Ubuntu arm64 (MacOS arm64 has a pre-existing submodule issue unrelated to this PR)
  • CodeQL analysis passes

In validate_tfile_pow_fp(), ecode is shared across OMP threads:
writes happen inside OMP_CRITICAL_ blocks (correctly serialized)
but reads are outside any critical section (the while-loop condition
at line 853 and the early-continue at line 866). Under the C11
memory model this is a data race and undefined behavior.

On x86 it works by accident due to Total Store Order — stores
become visible quickly and the compiler is unlikely to cache the
value in a register across an OMP_CRITICAL_ boundary. On weakly-
ordered architectures (ARM64, RISC-V, MIPS) the write may not be
visible to a reader for an unbounded time. ThreadSanitizer also
flags this as a data race.

Declaring ecode volatile prevents the compiler from caching it
in a register and forces each read to go to memory. Combined with
the memory barriers implicit in OMP critical-section entry/exit,
this gives correct shared-flag semantics on all real hardware.

errnum is written inside critical sections and read only AFTER
the OMP_PARALLEL_ block ends. The implicit barrier at parallel-
region exit provides the necessary synchronization, so errnum
does not need volatile.

Closes #101
fix(tfile): declare ecode volatile in validate_tfile_pow_fp()
scan_quorum() declared a single shared `int result` used for two
semantically different purposes in the same OMP parallel loop:
  1. Weight comparison inside OMP_CRITICAL_() (writes/reads under lock)
  2. Peer-contribution counter OUTSIDE any critical section (writes
     and reads across threads with no synchronization)

The conflation produced a real data race between threads: Thread A in
the critical section writing result = cmp256(...) while Thread B
outside any critical section executing result = 0 / result++. TSan
flagged the access pattern empirically on a live mainnet scan.

Split the variable into two thread-private variables declared at
function top and added to the OMP private() clause:
  - weight_cmp: used only in the critical section for cmp256 result
  - contrib: used only in the peer-counting loop outside the critical
    section

Style: matches the existing codebase convention for OpenMP thread-local
state (see `node`, `peer`, `len`, `ipstr` in the same function's
private() clause). Block-scope declarations inside the OMP region work
equally well semantically but the function-top + private() form is
what the rest of this file uses.

Also applied the same style to `peer` and `ecode` in catchup() (sync.c)
while in the neighborhood — both were declared inside the OMP parallel
block body in C99 style. Moved to function top + private() clause for
consistency. No behavior change in catchup(); purely convention cleanup.

Closes #110
fix(network,sync): eliminate OMP data races on shared working variables
tx__init() previously left dsaoff and tlroff uninitialized when the
TXDAT_TYPE or TXDSA_TYPE switch case did not match a known value, then
proceeded to use those uninitialized values in pointer arithmetic and
size-check computations. A peer-controlled OP_TX packet with a type
byte other than 0x00 produced undefined behavior at every step from
the offset computation through the TXENTRY pointer initialization.

Build with -ftrivial-auto-var-init=pattern and a temporary trace
showed the offsets reading as 0xfefefefefefefefe (the GCC pattern
fill for uninitialized stack locals) on every bad-type case, and the
resulting tx->dsa / tx->tlr / tx_sz values were garbage. tx_read's
post-tx__init size check happened to reject the test cases with a
small buffer, but a larger crafted buffer could pass that check and
propagate garbage pointers into tx_val() or process_tx().

Fix:
- tx__init() return type changed from void to int (VEOK/VERROR)
- Added default: return VERROR to both type switches so uninitialized
  offsets are never read
- Added bounds checks: dsaoff < sizeof(tx->buffer) and
  tlroff + sizeof(TXTLR) <= sizeof(tx->buffer) so declared counts that
  would place pointers outside the buffer are rejected
- tx_read() now checks the return value and propagates VERROR with
  set_errno(EMCM_TXINVAL)

Also added src/test/tx-read-unknown-type.c that exercises tx_read()
with five crafted buffers (valid, unknown TXDAT_TYPE, unknown
TXDSA_TYPE, both unknown, oversized MDST_COUNT) and asserts all five
yield VERROR. Follows the existing src/test/_assert.h pattern.

Mirror protection (verified by static walk of process_tx and confirmed
at runtime): bad TXs cannot reach mirror_tx(); the vulnerability is
strictly a local-crash concern, not a propagation concern.

Closes #90
fix(tx): reject unsupported TXDAT_TYPE / TXDSA_TYPE in tx__init()
mdst_val__reference() used <ctype.h>'s isdigit() / isupper() on a
const char* directly. Two bugs with that:

1. Locale sensitivity. The ctype classifiers consult the process
   locale's character table. A node in en_US.UTF-8 may classify a
   byte differently than one in the C locale -- a consensus split on
   transaction validity.

2. Undefined behavior. The C standard requires the ctype argument be
   representable as unsigned char or equal EOF. Plain char is signed
   on most platforms, so bytes with the high bit set get sign-extended
   to negative int. Any byte >= 0x80 in the reference field is UB.

Fix: two file-local static inline helpers at the top of tx.c that take
an unsigned char and perform a pure ASCII range check:

   static inline int ascii_isdigit(unsigned char c) { return c >= '0' && c <= '9'; }
   static inline int ascii_isupper(unsigned char c) { return c >= 'A' && c <= 'Z'; }

mdst_val__reference() now reads reference[j] into an unsigned char c
once per iteration and uses the helpers. <ctype.h> include removed --
no remaining users.

Behavior for valid ASCII input is unchanged; non-ASCII bytes that were
previously UB are now deterministically rejected in all locales.

The helpers carry a TODO marker to promote them to extlib.h if other
units later need locale-free ASCII classification.

Closes #104
…pper

fix(tx): replace locale-dependent ctype calls with ASCII range checks
@adequatelimited adequatelimited merged commit c6bf9d2 into master Apr 13, 2026
4 of 5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant