Skip to content

Extend IdRef serialization from game events to protocol method args (delta branch)#17

Draft
MostCromulent wants to merge 7 commits intoNetworkPlay/mainfrom
fix-protocol-serialization-delta
Draft

Extend IdRef serialization from game events to protocol method args (delta branch)#17
MostCromulent wants to merge 7 commits intoNetworkPlay/mainfrom
fix-protocol-serialization-delta

Conversation

@MostCromulent
Copy link
Copy Markdown
Owner

@MostCromulent MostCromulent commented Apr 5, 2026

Cherry-pick of fix-protocol-serialization from master applied against NetworkPlay/main

What changed from fix-protocol-serialization

The original PR (fix-protocol-serialization, based on master) added encoder-level IdRef replacement for protocol method args. When rebased onto NetworkPlay/main, several design changes were made:

GameEventProxy removed

Events travel as raw GameEvent objects inside DeltaPacket — no byte-array wrapping. Testing confirmed per-event isolation is unnecessary (0 dropped events across 10+ game batch tests). Event wrapping was also tested and works, but is not needed.

TrackableSerializer replaces TrackableRef + GameEventProxy

Single class with: IdRef, StaleCardRef, type tags, replace()/resolve(), and bandwidth measurement utilities.

Encoder-level replacement enabled with findByView fallback

Infrastructure (shouldReplaceTrackables, ReplacingObjectOutputStream, tracker on encoder/decoder) is enabled. shouldReplaceTrackables excludes setGameView, openView, and applyDelta from replacement. See "Client→server path" below for the stale tracker issue and how it's handled.

Bandwidth measurement

measureReplacedSize() (with IdRef replacement) for delta path, measurePlainSize() (no replacement) for full-state comparison. Important: these must use different serialization modes because the encoder doesn't replace in setGameView messages.

Client→server protocol arg path

The original fix-protocol-serialization PR (master-based) only added encoder replacement for server→client messages. Client→server replacement is new in this branch. Initial testing revealed that enabling it broke client card selection after zone changes — the client could play a land from hand, but could not tap that land for mana on the battlefield.

How master works (no encoder replacement)

On master, the client sends full serialized CardView/PlayerView objects in protocol args (e.g. selectCard(CardView, ...)). The server deserializes these into fresh object instances whose properties reflect the client's current state (zone, controller, etc. — updated by prior setGameView calls). Game.findByView(CardView) uses the view's zone and controller properties to narrow the search to a specific player's zone, then matches by ID. Because the deserialized view carries the client's up-to-date zone info, the search finds the correct Card.

How this branch works (encoder IdRef replacement)

With encoder replacement enabled, the client encoder replaces CardView/PlayerView with lightweight IdRef(typeTag, id) markers. The server decoder resolves these from the server's tracker (gameView.getTracker()), returning the CardView registered in the tracker for that ID.

The stale tracker problem: TrackableTypes.TrackableObjectType.updateObjLookup only registers a new object if the tracker doesn't already have an entry for that ID (!tracker.hasObj(this, newObj.getId())). When a card changes zones, the game engine creates a new Card+CardView with the same ID, but the tracker skips it because the original CardView is already registered. The server decoder therefore returns a stale CardView with the old zone (e.g. zone=Hand after the card moved to Battlefield). findByView searches the wrong zone and misses.

Observed behavior: First client click works (card is still in the zone the tracker knows). After a zone change (e.g. playing a land), subsequent clicks silently fail — findByView returns null, selectCard returns false, and the game ignores the input.

Design decision: fallback search vs tracker update

Two solutions were considered:

  1. Fallback in findByView (adopted): When zone-specific search misses, fall back to global ID search across all zones. Safe, minimal blast radius — only triggers on a miss, and the global search was already the existing fallback for views with null zone/controller.

  2. Update tracker entries on zone change: Remove the !hasObj guard in updateObjLookup so new CardViews always overwrite old entries, or explicitly refresh tracker entries during zone changes. This would be the "correct" fix but risks unintended side effects — the !hasObj guard exists for a reason (prevents overwriting during copyChangedProps and initial updateObjLookup), and the tracker is used by both the delta sync system and the view update system. Changing its behavior could cause subtle breakage elsewhere.

The fallback approach was chosen because the tracker is a shared component with wide usage, and the stale-entry issue only manifests during encoder IdRef resolution. The cost of the fallback (an extra linear scan on miss) is negligible — it only occurs for protocol args after a zone change, which is infrequent.

Testing

Manual testing

  • Network game (server + client): verified client can play lands from hand, tap lands for mana, cast creatures, and interact normally across multiple turns with encoder replacement enabled.

Automated batch testing

Note: Batch tests use headless AI-vs-AI games — they exercise game logic, delta sync, and serialization but do not run a GUI. GUI-driven protocol methods (e.g. getChoices, confirm, dialog-based interactions) are only covered by manual testing.

10-game quick

  • 10/10 games passed (100%), 0 errors, 0 checksum mismatches, 93.2% bandwidth savings

100-game comprehensive

  • 99/100 games passed (99%), 0 errors, 0 send errors, 0 checksum mismatches
  • 93.2% bandwidth savings overall, 96.0% state-only
  • 2p: 50/50 (100%), 3p: 30/30 (100%), 4p: 19/20 (95%)
  • 1 incomplete game was a 4p timeout (no crash or error)
  • 0 dropped events, 0 protocol errors across all 100 games

🤖 Generated with Claude Code

MostCromulent and others added 3 commits April 6, 2026 06:37
Protocol methods were serializing the entire CardView/PlayerView object
graph for each argument. Add IdRef replacement at the Netty encoder/decoder
level: the server encoder swaps CardView/PlayerView with lightweight
IdRef(typeTag, id) markers, and the client decoder resolves them from the
Tracker. setGameView and openView are excluded (they carry the full state).

Extract shared ID-mapping primitives (IdRef, type tags, typeTagFor,
trackableTypeFor) from GameEventProxy into TrackableRef so both the
encoder/decoder path and GameEventProxy can reuse them.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Extends encoder replacement (previously server→client only) to the
client→server path. The client encoder now replaces CardView/PlayerView
with lightweight IdRef markers in protocol method args like selectCard,
and the server decoder resolves them from the tracker.

Testing revealed that after a zone change (e.g. playing a land from
hand), the server tracker holds a stale CardView with the old zone,
causing Game.findByView to search the wrong zone and return null. Added
a fallback global search in findByView when the zone-specific search
misses, rather than modifying tracker update behavior which has wide
blast radius.

Also removes GameEventProxy — raw GameEvent objects travel inside
DeltaPacket, with replacement handled by the encoder/decoder. Confirmed
by batch testing (0 dropped events).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…coder

Events embedded in DeltaPacket are excluded from encoder IdRef replacement
(to protect state data), so their TrackableObject references arrive as
raw deserialized instances with null trackers. When event handlers call
TrackableTypes.lookup(), the null tracker causes NPE.

Added a fallback in TrackableSerializer.resolve() that detects raw
TrackableObjects with no tracker and resolves them to the canonical
tracker instance, matching what GameEventProxy.unwrapAll previously did.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
ProtocolMethod method = event.getMethod();
return method != ProtocolMethod.setGameView
&& method != ProtocolMethod.openView
&& method != ProtocolMethod.applyDelta;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When removing the GameEventProxy I didn't expect having to exclude deltas:
if their carried events don't get ID replacements it seems too much of a regression (and the other DeltaPacket fields should be unaffected anyway?)

Wasn't this already a smaller risk with the old architecture anyway which we accepted?

Copy link
Copy Markdown
Owner Author

@MostCromulent MostCromulent Apr 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See latest commit.

Events bundled in applyDelta were excluded from encoder-level IdRef replacement because of a fundamental decoder timing issue: the Netty deserializer necessarily runs first - ObjectInputStream.resolveObject() executes during deserialization of the entire DeltaPacket, before the application code ever sees it. By the time applyDelta creates new objects in the client tracker, IdRef resolution has already happened. Events that reference cards created in the same delta (e.g. a GameEventTokenCreated for a token whose CardView is in newObjects) resolve their IdRefs to null, causing NPEs.

Removing the exclusion and running the 10-game batch immediately hits Cannot invoke "CardView.getCurrentState()" because "card" is null - events with unresolvable IdRefs for objects not yet in the tracker.

The old GameEventProxy avoided this by wrapping events to byte arrays (with IdRef replacement) before they entered the DeltaPacket, then unwrapping them after state application when the tracker was populated -sidestepping the deserializer entirely.

The fix is to restore that pattern inside TrackableSerializer - wrapEvents() serializes each event with IdRef/StaleCardRef replacement at DeltaPacket construction time, and unwrapEvents() deserializes with resolution after phases 1-2 of applyDelta. All IdRef replacement logic stays in one class, the encoder exclusion is removed, the raw-TrackableObject fallback in resolve() is removed, and events get proper stale reference detection. 10-game batch passes clean.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, I understand the timing problem with unwrapping but:
we just recently changed it to happen before Phase 1 for delta in 43070f7

Either that change was (partly) wrong or I'm missing something - but shouldn't this problem then also happened there already? 🤔

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Either that change was (partly) wrong or I'm missing something - but shouldn't this problem then also happened there already?

Confirming I'm looking into this.

Very aggravating situation where AI has come to different conclusions in different context windows, so I'm trying to resolve the disagreement by running diagnostic logging on the unwrap timing here vs in delta branch to work out which is correct / if that change in delta was redundant.

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Confirmed: the unwrap timing change in 43070f7 was redundant. The problem it was trying to fix was already solved by the StaleCardRef fix in that same commit.

Post-delta unwrapping is architecturally correct. Pre-unwrapping introduces risk of silent event drops which were seen with new diagnostic logging when testing delta branch today, although these seem to be benign and didn't result in any visually observable desync or UI issues because silent event drop is almost immediately corrected by the gameview state update.

event-unwrap-timing-investigation.md

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also separately an interesting observation when I re-ran the old test scenario involving cloning Leafkin Avenger, setting off the ability, and then flickering it:

Both the stack and log entries are identical between host and client except for the client's log card image when casting clone: host shows image for Leafkin Avenger (the clone target) while client shows Clone itself.

Key point: Client's log entry is more accurate here / host seems to be wrong?

Screenshot 2026-04-07 063252

This is minor and can be dealt with separately from this PR, but raises question of whether the host game log image construction should use similar StaleCardRef architecture to remote client.

  • The host has the actual game state. GameLogFormatter.visit(GameEventSpellAbilityCast) fires synchronously when Clone is cast and constructs a GameLogEntry whose sourceCard is a live reference to the host's CardView for Clone.
  • Clone then resolves and copies Leafkin Avenger. The card engine mutates the same CardView Java instance in place — it sets name to "Leafkin Avenger" and imgKey to Leafkin Avenger's image key.
  • When the desktop log panel paints the entry, CachedCardImage calls card.getCurrentState().getImageKey(...) on the live reference, which now returns the post-mutation values. The log row shows "you cast Leafkin Avenger" with Leafkin Avenger's image — even though that's not what was actually cast.
  • Both host and client actually agree about the event ("Player cast spell with id=X"). They disagree about the card image attached to the log entry because they resolve sourceCard to two different things: a mutable live reference (host) vs. a frozen detached snapshot (client).

(This behavior is the same in both networkplay/main and this branch because its a product of StaleCardRef rather than the unwrap timing.)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

analysis seems reasonable - ideally this has reached its final size
a cleanup of the GameLog stuff can be attempted later
(looks like unfortunately it might not be possible to transfer this PR to upstream as is, so to preserve more of the workflow I'll investigate further after delta lands)

Copy link
Copy Markdown
Owner Author

@MostCromulent MostCromulent Apr 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So just to confirm - do you want me to merge this into delta branch now (before delta is merged to master)?

Or hold this until after and re-target a PR at master once delta is merged in?

(Edit: if we don't want to merge this in entirety to delta branch, should we at least fix the event unwrap timing on delta per analysis above?)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there's no need to touch delta this close to the finish line or delay its schedule:

  • my previous assessment ("too complex to reasonably merge together") is still valid
  • but the merged version will still be disarmed and therefore only send events on their own
  • so the timing fix would only need to be cherry picked after that - though maybe just merging this fully before arming is now even better :)

Events bundled in DeltaPacket were excluded from encoder-level IdRef
replacement to avoid a decoder timing issue: the Netty decoder resolves
IdRefs during deserialization, before applyDelta creates new objects in
the client tracker. This caused events to travel as raw TrackableObjects,
losing StaleCardRef detection and inflating wire size.

Restore the wrap/unwrap pattern from the removed GameEventProxy, now in
TrackableSerializer: events are serialized to byte arrays with IdRef
replacement at DeltaPacket construction time (server), and deserialized
with resolution after state application (client), when the tracker is
fully populated. This gives events proper IdRef/StaleCardRef handling
without the decoder timing problem.

- Remove applyDelta exclusion from shouldReplaceTrackables
- Remove raw-TrackableObject fallback from TrackableSerializer.resolve
- Add wrapEvents/unwrapEvents to TrackableSerializer
- Wrap events in both delta+events and events-only paths

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
return var5;
}

private static class ResolvingObjectInputStream extends ObjectInputStream {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this seems redundant to TrackableSerializer.ReplacingOutputStream now

same for Encoder variant

* Simple replacement — no tracker, no stale detection.
* Used by the client encoder (which has no game-state awareness).
*/
static Object replace(Object obj) {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this method seems redundant, can just always call the overloaded variant with null tracker

@Override
protected Object resolveObject(Object obj) {
return resolve(obj, tracker);
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

resolve conflict by adding fix of Card-Forge#10304 here now?

Resolve conflict on GameEventProxy.java (deleted on this branch, modified
on NetworkPlay/main by Card-Forge#10304) by porting Card-Forge#10304's serialVersionUID fix
into TrackableSerializer.ResolvingInputStream — the event deserialization
path that replaced GameEventProxy.
…verload

Address PR #17 review feedback from tool4ever:

- Delete single-arg TrackableSerializer.replace(Object) — verified
  byte-equivalent to replace(obj, null) (the tracker null-check is the
  only difference, and skipping it falls through to the same IdRef
  return). Update CObjectOutputStream and TrackableSerializer's own
  ReplacingOutputStream to use the unified two-arg form.

- Delete CompatibleObjectEncoder.ReplacingObjectOutputStream and
  CompatibleObjectDecoder.ResolvingObjectInputStream — exact duplicates
  of TrackableSerializer.ReplacingOutputStream/ResolvingInputStream
  modulo class qualifier. Make those inner classes package-private and
  reuse them directly from the encoder/decoder.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants