Skip to content

Harden mutable-to-immutable copy against overflow, malformed structures, and deep recursion#261

Open
rootvector2 wants to merge 2 commits intoibireme:masterfrom
rootvector2:harden-imut-copy-overflow-and-depth
Open

Harden mutable-to-immutable copy against overflow, malformed structures, and deep recursion#261
rootvector2 wants to merge 2 commits intoibireme:masterfrom
rootvector2:harden-imut-copy-overflow-and-depth

Conversation

@rootvector2
Copy link
Copy Markdown

@rootvector2 rootvector2 commented Apr 12, 2026

This patch now focuses only on integer-arithmetic hardening in the mutable-to-immutable copy path for valid API-constructed values.

The previous version included structural validation of potentially corrupted internals and a fixed recursion-depth limit. Those parts have been removed to align with maintainer guidance.

Current changes:

  • Overflow-safe size accounting for value/string totals in yyjson_mut_stat
  • Overflow-safe object-length shift checks before doubling
  • Overflow-safe allocation size math in yyjson_mut_val_imut_copy
  • Overflow-safe offset-size calculations in yyjson_imut_copy
  • Fail-closed return (NULL) when arithmetic overflow is detected

Not included in this revision:

  • No validation for fake/corrupted yyjson_val or yyjson_mut_val internals
  • No fixed recursion-depth enforcement

Behavior for valid inputs remains unchanged. Existing tests pass with this revision.

@ibireme
Copy link
Copy Markdown
Owner

ibireme commented Apr 13, 2026

There are three main points here:

  1. Validating yyjson_val
    yyjson_val is immutable and normally only comes from yyjson_read_*(). Users can't create a valid yyjson_val from scratch, so imut APIs assume the input is valid to avoid extra checks. Passing a fake/random yyjson_val * is undefined behavior.

  2. Validating yyjson_mut_val
    yyjson_mut_val is built via yyjson_mut_*(). If those APIs are used correctly, the structure should already be valid, so we don't do heavy structural validation to keep cost and complexity low. We mainly validate things like string encoding. If user code bypasses the API and corrupts internal fields (e.g. fake lengths, broken links, invalid pointers), behavior is undefined and we can't guarantee safety across all yyjson_mut APIs.

  3. Recursion and stack overflow
    Stack usage from deep recursion has been documented since v0.7.0 (see: https://ibireme.github.io/yyjson/doc/doxygen/html/api.html#stack-memory-usage), and functions like equals/copy/merge_patch already carry the same warning. We prefer documenting this rather than enforcing a fixed depth limit; applications that need a depth limit should enforce it themselves.

@rootvector2
Copy link
Copy Markdown
Author

rootvector2 commented Apr 13, 2026

Thanks for the detailed guidance. I have revised this PR accordingly Removed structural-validation checks for potentially corrupted internals Removed fixed recursion-depth enforcement and related tests Kept the patch focused on integer-arithmetic overflow hardening in the copy path for valid API-constructed values Please take another look when you have time.

@ibireme
Copy link
Copy Markdown
Owner

ibireme commented Apr 15, 2026

There are a couple of implicit assumptions here:

About yyjson_mut_stat(): If we limit the input to valid yyjson_mut_val values built by the yyjson API, then val_num and str_sum are bounded by what the process can actually allocate. On 32-bit in particular, you will usually hit OOM long before getting close to usize overflow. So the per-step overflow checks in yyjson_mut_stat likely will not trigger in normal usage.

About yyjson_imut_copy(): If the result of yyjson_mut_stat() is trusted, and the val and str pools are allocated based on it, then yyjson_imut_copy() should not fail because the buffers are too small. In that valid input model, the internal checks feel a bit redundant.

What I worry about is untrusted input. For example, wrong string lengths, or a broken list structure like a cycle arr.child = arr. In that case this copy path is not the only problem. Other traversals in the library could also hit UB or infinite loops, and overflow checks alone cannot fix that.

What do you think?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants