feat(normalizr): Add per-field detectCycles and entityDepth for denormalization #3840

Open

jayseo5953 wants to merge 5 commits into reactive:master from jayseo5953:feat/detect-cycles

Conversation

@jayseo5953 jayseo5953 commented Mar 30, 2026

Fixes #3822

Problem

Denormalizing large bidirectional entity graphs can exceed the maximum call stack size (#3822). The existing maxEntityDepth on Entity (PR #3823, #3834) mitigates this with a global depth counter, but has issues:

maxEntityDepth doesn't control nesting depth — it controls at what global depth the entity stops resolving. The docs describe it as "limits entity nesting depth," but it checks the global depth counter, not entity-specific nesting. For example:

class Department extends Entity {
  static maxEntityDepth = 10; // "10 levels of Department nesting"?
}

What a user expects: Department can nest 10 levels deep (parent → parent → ... 10 times).

What actually happens: Department won't resolve if the global depth counter is ≥ 10 — regardless of how many of those hops were Department entities. If Department is first encountered at global depth 8 (nested inside Building → Room → Document → ...), it only gets 2 more hops before the limit fires. The same Department encountered at global depth 2 gets 8 hops. The effective nesting depth is unpredictable.
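
To make the difference concrete, here is an illustrative sketch of the two checks (names invented for illustration; this is not the library's internals):

// What maxEntityDepth actually checks: an absolute global threshold.
if (globalDepth >= Department.maxEntityDepth) {
  return depthLimitEntity(department); // fires at global depth 10, wherever in the tree that falls
}

// What the name suggests: relative nesting from first encounter.
if (departmentHopsSinceFirstEncounter >= 10) {
  stop(); // 10 Department → Department hops, regardless of starting depth
}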

Additional issues:

  • The same depth limit applies to all relationship paths regardless of whether they're cyclic or not
  • With dense bidirectional graphs (20k+ entities), even moderate limits (12-64) create millions of objects from fan-out at each level
  • Depth-limited entities created by depthLimitEntity bypass the cache, creating duplicate shallow copies — with real production data, this produces 3.8M uncached copies for 500 entities

This PR removes maxEntityDepth from Entity and replaces it with per-field controls that accurately describe their behavior. The hardcoded global MAX_ENTITY_DEPTH (64) remains as a safety net.

Solution

Two per-field schema options that target the actual problems, plus a cache-safety fix:

1. detectCycles: true — schema-type cycle detection

Stops when the same entity type appears twice in the current ancestor path. This targets the root cause — bidirectional back-references creating infinite traversal loops.

class Department extends Entity {
  static schema = {
    buildings: { schema: [Building], detectCycles: true },
  };
}

class Building extends Entity {
  static schema = {
    departments: { schema: [Department], detectCycles: true },
  };
}
Department #1
  └─ Building #A         ← push "Building" to ancestor set
    └─ Department #2     ← push "Department"
      └─ Building #B     ← "Building" already in set → STOP

Why this is better than a global depth counter for bidirectional cycles:

  • Targets the actual problem: back-references are useful — Department → Building and Building → Department both resolve, but the cyclic loop Department → Building → Department → Building → ... is stopped
  • Adapts to any schema shape: the depth at which the cycle occurs doesn't matter — it works regardless of how many entity types sit between the two sides of the cycle
  • No magic numbers: no depth limit to tune as data grows denser or schema changes
  • Performance: with real production data (20k+ entities, 19,504 in one connected component), detectCycles denormalizes 3,000 entities in 2ms vs 17,285ms with maxEntityDepth: 12

2. entityDepth: N — relative depth limit for self-referential fields

For self-referential relationships (parent/children hierarchies) where the entity type repeats legitimately. detectCycles would stop these at 1 hop (too aggressive). entityDepth provides a relative counter from where the field config is set.

class Department extends Entity {
  static schema = {
    buildings: { schema: [Building], detectCycles: true },
    parent: { schema: Department, entityDepth: 2 },
    children: { schema: [Department], entityDepth: 3 },
  };
}
Department #1
  └─ children (entityDepth: 3 starts here)
    └─ Department #2      ← hop 1
      └─ Department #3    ← hop 2
        └─ Department #4  ← hop 3 → STOP

Unlike maxEntityDepth, entityDepth is relative — "allow N hops from this field" means the same thing regardless of where the entity sits in the overall tree.

Limitation: entityDepth is designed for self-referential fields on the same entity. When multiple entities in a traversal chain use entityDepth on cross-type relationships, the outermost one controls the subtree. Use detectCycles for cross-type relationships.
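
For illustration, the caveat looks like this (schema shapes invented to match the description above):

// Both sides set entityDepth on a cross-type relationship;
// the outermost field's counter governs the whole subtree:
class Building extends Entity {
  static schema = {
    departments: { schema: [Department], entityDepth: 2 }, // outermost: controls the subtree
  };
}

class Department extends Entity {
  static schema = {
    buildings: { schema: [Building], entityDepth: 3 }, // does not reset the count mid-traversal
  };
}

// For cross-type pairs like this, use detectCycles: true instead.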

3. Cache-safe depth limiting

Depth-limited entities read from cache (returning fully resolved versions if available) but don't write to cache. This prevents cache poisoning where an entity first encountered deep in one subtree gets cached as a shallow copy and served to top-level consumers that need the fully resolved version.
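
A minimal sketch of that rule (function and cache names are assumptions, not the actual denormalize internals):

// Sketch only; illustrative names, not the real implementation.
function resolveEntity(key, id, cache, isDepthLimited) {
  const cached = cache.get(key, id);
  if (cached !== undefined) return cached; // depth-limited lookups still read the cache

  const resolved = buildDenormalizedEntity(key, id);
  if (!isDepthLimited) {
    cache.set(key, id, resolved); // shallow depth-limited copies are never written back
  }
  return resolved;
}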

Usage

Both options can be combined on the same field. Whichever triggers first wins.

// Cross-type bidirectional: use detectCycles
class Building extends Entity {
  static schema = {
    departments: { schema: [Department], detectCycles: true },
    rooms: { schema: [Room], detectCycles: true },
    documents: { schema: [Document], detectCycles: true },
  };
}

// Self-referential + cross-type: use both
class Department extends Entity {
  static schema = {
    buildings: { schema: [Building], detectCycles: true },
    rooms: { schema: [Room], detectCycles: true },
    documents: { schema: [Document], detectCycles: true },
    parent: { schema: Department, entityDepth: 2 },
    children: { schema: [Department], entityDepth: 3 },
  };
}
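
And a field combining both (illustrative; per the rule above, whichever limit triggers first wins):

// Cross-type field guarded two ways: cycles stop immediately,
// and even acyclic fan-out is capped at 4 hops.
class Room extends Entity {
  static schema = {
    documents: { schema: [Document], detectCycles: true, entityDepth: 4 },
  };
}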

Why per-field, not per-entity

The decision of when to resolve and when to stop is relative to the relationship, not absolute to the entity. The best information for that decision lives at the field level:

  • A buildings field on Department knows it's a bidirectional back-reference → detectCycles: true
  • A children field on Department knows it's a self-referential hierarchy → entityDepth: 3
  • A parent field on Department knows it's a single hop → entityDepth: 2

A global counter on the entity (like maxEntityDepth) can't distinguish these — it applies the same limit to all paths. The entity doesn't know whether it's being reached through a cyclic back-reference or a legitimate deep hierarchy. The field does.

Tests

The test suite (Depth.test.ts) proves:

detectCycles is predictable — the test "Building at root: resolves Department, stops back-reference" and "Building nested 4 levels deep: same result" show identical behavior whether the Building is accessed directly or through 4 levels of Wrapper entities. A global depth counter would give different results depending on traversal context.

entityDepth does what it says — the test "resolves exactly 3 levels of children" shows Org A → B (hop 1) → C (hop 2) → D (hop 3) → E (depth-limited). Always 3 levels, regardless of where Org A sits in the tree.
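
For reference, a sketch of the entityDepth assertion shape (hypothetical; not the actual Depth.test.ts source, and the denormalize call signature is assumed):

it('resolves exactly 3 levels of children', () => {
  const org = denormalize(Org, 'A', entities);
  expect(org.children[0].id).toBe('B'); // hop 1
  expect(org.children[0].children[0].id).toBe('C'); // hop 2
  expect(org.children[0].children[0].children[0].id).toBe('D'); // hop 3
  // hop 4 (Org E) is depth-limited: its nested fields stay unresolved
});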

Compatibility

  • The global depth safety net (MAX_ENTITY_DEPTH = 64) remains as a fallback
  • detectCycles and entityDepth fire before the global safety net check
  • Fields without config behave exactly as before
  • No changes to normalization — field configs are only read during denormalization
  • TypeScript types: { schema: S, detectCycles?: boolean, entityDepth?: number } resolves to Denormalize<S>
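
A sketch of how that type can be expressed (declaration names assumed; the PR's actual types may differ):

// Hypothetical declarations, for illustration only.
type FieldConfig<S extends Schema> = {
  schema: S;
  detectCycles?: boolean;
  entityDepth?: number;
};

// Denormalization unwraps the config to the inner schema's output:
type DenormalizeField<F> =
  F extends { schema: infer S extends Schema } ? Denormalize<S> : never;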

Future consideration

The global safety net (MAX_ENTITY_DEPTH = 64) could be made configurable at the application level (e.g., via DataProvider) to allow different apps to set their own fallback limit without changing the source.
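
For example (hypothetical API; no such prop exists today):

// Sketch of an app-level override; not a real DataProvider prop yet.
<DataProvider maxEntityDepth={128}>
  <App />
</DataProvider>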


* docs: Add v0.16 migration codemod

jscodeshift transform served from dataclient.io/codemods/v0.16.js,
handling path-to-regexp v8 syntax, useFetch() .resolved rewrites, and
schema namespace to direct-import conversions. Linked from the blog
post migration guide.

Made-with: Cursor

* fix: bugbot

* Fix schema codemod: aliased import skipped when name already exists in same declaration

When a name like Entity is both directly imported and used via schema.Entity
in the same import declaration, the codemod had two bugs:

1. scopeBindings included specifiers from the same import declaration (other
   than the schema specifier itself), causing resolveLocal to incorrectly
   detect a conflict and rename schema.Entity to SchemaEntity. Fixed by
   skipping the entire current import declaration when building scopeBindings,
   since those names come from the same package and won't conflict.

2. The existing-import dedup check compared imported names (e.g. 'Entity')
   rather than local names. When an alias was needed (Entity as SchemaEntity),
   finding 'Entity' already present caused the aliased specifier to be
   skipped, leaving SchemaEntity undefined. Fixed by checking existingLocals
   (s.local.name) against localName instead.

Co-authored-by: Nathaniel Tucker <me@ntucker.me>

* fix: scope useFetch truthiness rewrites to enclosing function

transformUseFetch previously collected variable names into a file-global
set, then rewrote every matching identifier in any if/ternary/&& across
the entire file. If two components in the same file both declared
'const data = ...' — one from useFetch() and one from an unrelated
source — the non-useFetch truthiness check would be incorrectly
rewritten to '!data.resolved'.

Fix by walking up to the enclosing function for each useFetch()
declarator and only rewriting truthiness checks within that scope.

Co-authored-by: Nathaniel Tucker <me@ntucker.me>

* fix(codemod): use optional chaining in useFetch truthiness rewrite

The rewrite function transformed `if (promise)` into `if (!promise.resolved)`,
but useFetch returns undefined when called with null args (conditional fetching).
Accessing .resolved on undefined throws a TypeError at runtime.

Use optional chaining (promise?.resolved) to guard against undefined.

Co-authored-by: Nathaniel Tucker <me@ntucker.me>

* fix(codemod): rewrite useFetch vars on both sides of LogicalExpression

The LogicalExpression handler only called rewrite(p.node.left), so a
useFetch variable on the right side of && or || (e.g. 'condition &&
promise') was never transformed. In v0.16, promise is always a truthy
UsablePromise object when args are non-null, so untransformed right-side
checks silently became no-ops.

Now both left and right sides are rewritten, and || expressions are
handled alongside &&.

Co-authored-by: Nathaniel Tucker <me@ntucker.me>

* fix(codemod): preserve schema import when bare identifier references exist

The transformSchemaImports function was unconditionally removing the
schema import specifier after rewriting schema.X member expressions,
but standalone references to 'schema' as a bare identifier (destructuring,
function arguments, typeof) were left unrewritten, causing ReferenceError
at runtime.

Now checks for remaining bare references before removing the import.
If any exist, the schema specifier is kept while still adding the
individual named imports for the rewritten member expressions.

Co-authored-by: Nathaniel Tucker <me@ntucker.me>

* fix: bugbot

* fix(codemod): skip schema.Object and schema.Array in v0.16 transform

Object and Array are type-only exports from @data-client/endpoint
(export type { Array, Object }), not value exports. The codemod was
transforming schema.Object/schema.Array into direct imports like
'import { Object as SchemaObject }', which would fail in value
positions (e.g. new SchemaObject(...)). These members must remain
accessed via the schema namespace.

Co-authored-by: Nathaniel Tucker <me@ntucker.me>

* Fix transformUseFetch to only apply when useFetch is imported from @data-client

transformUseFetch previously matched any call named useFetch without verifying
it was imported from a @data-client package. Since useFetch is a common hook
name exported by other libraries (e.g. use-http), running this codemod on
codebases using a different useFetch would silently corrupt truthiness checks.

Added an import-source guard that checks for a useFetch import from any
@data-client/* package before applying the transformation, consistent with
how transformSchemaImports gates on DATA_CLIENT_PACKAGES.

Co-authored-by: Nathaniel Tucker <me@ntucker.me>

* fix(codemod): skip schema.X rewrites where schema is shadowed by local binding

transformSchemaImports was replacing ALL schema.X member expressions
file-wide without checking whether 'schema' at each usage site actually
referred to the import or to a local variable/parameter that shadows it.

For example, a function parameter named 'schema' (e.g.,
function validate(schema) { return schema.isValid; }) in the same file
as import { schema } from '@data-client/endpoint' would have
schema.isValid incorrectly replaced with a bare isValid identifier.

Add isShadowed() helper that walks up the AST from each schema.X node
to check for shadowing bindings in enclosing scopes (function params,
catch clauses, variable declarations, for-loop bindings). Skip the
replacement when the identifier is shadowed.

Co-authored-by: Nathaniel Tucker <me@ntucker.me>
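
For illustration, a scope-based variant of that check is possible with the ast-types scope resolution that jscodeshift paths expose (a sketch under assumed behavior; the PR's isShadowed walks the AST manually instead):

import type { ASTPath } from 'jscodeshift';

function isShadowedSketch(path: ASTPath, name = 'schema'): boolean {
  // lookup() returns the Scope that declares `name`, walking outward.
  const declaring = path.scope && path.scope.lookup(name);
  // If the declaring scope is not the module-level (global) scope,
  // a local binding (param, catch clause, var, loop) shadows the import.
  return Boolean(declaring && !declaring.isGlobal);
}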

* fix(codemod): preserve boolean semantics for useFetch conditional fetching

The rewrite function transformed:
- `promise` → `!promise?.resolved`
- `!promise` → `promise?.resolved`

This inverts boolean semantics when useFetch returns undefined (null args
for conditional fetching). `undefined?.resolved` evaluates to `undefined`
via optional chaining, so `!undefined` is `true` — but the original
`if (promise)` with `undefined` was `false`.

Fix by using strict equality against `false`:
- `promise` → `promise?.resolved === false`
- `!promise` → `promise?.resolved !== false`

This preserves the original semantics for all three states: in-flight
promise (resolved=false), resolved promise (resolved=true), and disabled
fetch (undefined).

Co-authored-by: Nathaniel Tucker <me@ntucker.me>

* fix(codemod): detect top-level var/function/class bindings in scopeBindings

scopeBindings only collected local names from import declarations, missing
top-level variable, function, and class declarations. This caused
resolveLocal to miss naming conflicts (e.g. const Union = someValue),
producing invalid code with duplicate bindings.

Now collects bindings from:
- Top-level VariableDeclarations (including destructured patterns)
- Top-level FunctionDeclarations
- Top-level ClassDeclarations
- Exported variants of all the above

Co-authored-by: Nathaniel Tucker <me@ntucker.me>

* fix(codemod): detect FunctionDeclaration/ClassDeclaration shadowing in isShadowed

The isShadowed function only checked VariableDeclaration statements when
scanning BlockStatement/Program nodes. This missed FunctionDeclaration and
ClassDeclaration names that also create bindings in block scope.

When a local function or class with the same name as the import alias
(e.g. 'schema') existed in an enclosing block, isShadowed incorrectly
returned false, causing the codemod to transform references to those
local bindings.

Co-authored-by: Nathaniel Tucker <me@ntucker.me>

* Fix transformUseFetch rewriting shadowed variables in nested callbacks

transformUseFetch searched the entire enclosing function for truthiness
checks on the useFetch variable name, but never checked whether that
name was shadowed by a parameter or local variable in a nested function
or callback. This caused incorrect rewrites like adding .resolved to a
callback parameter that happened to share the same name.

Fix: reuse the existing isShadowed() helper (already used by
transformSchemaImports) with an optional stopNode parameter to limit the
shadow walk to scopes between the found expression and the function
where useFetch was declared.

Co-authored-by: Nathaniel Tucker <me@ntucker.me>

* Guard transformPaths with @data-client import check

transformPaths was transforming every StringLiteral with a 'path' property
key in the entire file, regardless of whether the file uses @data-client.
This could silently corrupt path strings from unrelated libraries (React
Router, Express, etc.) like /:param? → {/:param}.

Add the same DATA_CLIENT_PACKAGES import guard that transformUseFetch
already uses, so transformPaths only runs on files that actually import
from @data-client/endpoint or @data-client/rest.

Co-authored-by: Nathaniel Tucker <me@ntucker.me>

* Fix resolveLocal collision: verify 'Schema'+name is also free of conflicts

resolveLocal prefixed a name with 'Schema' when it collided with a JS
global or scope binding, but never verified the resulting identifier was
itself free of collisions. If the user had an existing binding named
e.g. SchemaUnion, the codemod would silently produce a conflicting
identifier.

Now the function loops, prepending '_' to the candidate until it finds
an identifier that doesn't collide with JS_GLOBALS or scopeBindings.

Co-authored-by: Nathaniel Tucker <me@ntucker.me>

---------

Co-authored-by: Cursor Agent <cursoragent@cursor.com>

changeset-bot bot commented Mar 30, 2026

⚠️ No Changeset found

Latest commit: a093a70

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types


@jayseo5953 jayseo5953 marked this pull request as draft March 30, 2026 19:33
@jayseo5953 jayseo5953 marked this pull request as ready for review March 30, 2026 19:34

vercel bot commented Mar 30, 2026

@jayseo5953 is attempting to deploy a commit to the data-client Team on Vercel.

A member of the Team first needs to authorize it.

jayseo5953 (Author) commented:

@ntucker if you can review this, would be awesome!

feat(normalizr): Add per-field detectCycles and entityDepth for denormalization

Adds two per-field schema options for controlling denormalization depth:

- `detectCycles: true` — stops when the same entity type appears twice
  in the ancestor path. Targets cross-type bidirectional cycles precisely.
- `entityDepth: N` — relative depth limit from the field. For self-referential
  relationships (parent/children) where the type repeats legitimately.

Usage:
  static schema = {
    siteOrganizations: { schema: [SiteOrganization], detectCycles: true },
    children: { schema: [Organization], entityDepth: 3 },
  };

Depth-limited entities read from cache (returning fully resolved versions
if available) but never write to cache, preventing cache poisoning.

Works alongside existing maxEntityDepth as a global safety net.

Fixes reactive#3822

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
jayseo5953 and others added 3 commits March 31, 2026 11:09
- detectCycles: same resolution whether entity is at root or nested 4 levels deep
- entityDepth: resolves exactly N levels of self-referential nesting

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

ntucker (Collaborator) commented Apr 1, 2026

@jayseo5953 It would be helpful if you could provide a realistic example of your actual use case. Even better if it was wired into the examples/benchmark-react so we could easily measure it against any solution.

It still escapes me why the Lazy solution does not work for you, and why this specific behavior is desirable over a more controllable Lazy.

There are also challenges with this approach, which is why it would be helpful to understand more about the goal, so we can get to something that could work without sacrificing the goals of this project.

Our goals:

  • Extreme performance
    • That includes not regressing other uses, not just measuring your setup
  • Low download size
  • Low mental complexity

One thing about specifying schemas: plain POJOs and plain arrays are a specific type, so any configuration must be done as a schema class. This also keeps the normalization code extremely clean so it runs very, very fast. It is a problem for AI as well, since it will be very hard for the AI to know it can produce those things, whereas with a new Lazy() or such it will have the context to know what it can generate.

I think the best next step that would help us get to a solution is to get that realistic full example - not just the schema definitions but the way they're used in react as well.

jayseo5953 (Author) commented:

@ntucker Thanks for the detailed response. Working on a realistic React example — will take some time to set up properly. In the meantime, here's the sanitized dataset and schema definitions that reproduce the issue: https://github.com/jayseo5953/dataclient-repro-3822

On Lazy: I agree that Lazy is the right long-term solution. However, adopting it requires significant consumer-side refactoring — every component that accesses a lazy relationship needs to switch from entity.relationship.property to useQuery(). Our entities also have computed getters that traverse relationships (e.g., confidenceFor() which walks siteOrganizations to find a matching org), which all break with Lazy. It's the right direction but it's a major refactor our team isn't ready to take on yet. That's why we've been exploring depth limiting as a practical interim solution.

On what we'd like to see, in priority order:

  1. Global MAX_ENTITY_DEPTH configurable at the app level (e.g., via DataProvider) — so apps can tune the safety net without forking.

  2. maxEntityDepth on Entity should control actual nesting depth, not global depth threshold. Currently maxEntityDepth: 10 means "don't resolve this entity when the global counter is ≥ 10" — not "allow 10 levels of nesting." If a Department is first encountered at global depth 8 (nested inside Building → Room → ...), it only gets 2 more hops. The same Department at depth 2 gets 8 hops. The effective nesting is unpredictable and context-dependent. We think this should mean "this entity can nest N levels from wherever it's encountered" — a relative limit, not an absolute threshold.

  3. Per-field depth configuration — different relationships on the same entity need different limits. A children field (self-referential hierarchy, needs 3-4 levels) and a buildings field (bidirectional back-reference, needs 1-2 levels) shouldn't share the same limit. The entity doesn't know how it's being reached — the field does.

  4. Schema-type cycle detection — for bidirectional relationships specifically, detecting when an entity type repeats in the ancestor path is more precise than any depth number. It adapts to any schema shape, needs no tuning, and targets exactly the cyclic traversal that causes the overflow.

On the schema config format — understood that plain POJOs have special meaning. Happy to use a schema class approach if that fits better with the architecture. The object format in the PR was just for illustration.

Will follow up with the benchmark-react example showing the end-to-end use case.


jayseo5953 commented Apr 1, 2026

If any one of the above can be delivered in the meantime, it would unblock our team and I would greatly appreciate it!



Development

Successfully merging this pull request may close these issues.

Maximum call stack size exceeded during denormalization with large bidirectional entity graphs

2 participants