Extend IdRef serialization from game events to protocol method args (delta branch) by MostCromulent · Pull Request #17 · MostCromulent/forge

MostCromulent · 2026-04-05T23:45:14Z

Cherry-pick of fix-protocol-serialization from master applied against NetworkPlay/main

What changed from fix-protocol-serialization

The original PR (fix-protocol-serialization, based on master) added encoder-level IdRef replacement for protocol method args. When rebased onto NetworkPlay/main, several design changes were made:

GameEventProxy removed

Events travel as raw GameEvent objects inside DeltaPacket — no byte-array wrapping. Testing confirmed per-event isolation is unnecessary (0 dropped events across 10+ game batch tests). Event wrapping was also tested and works, but is not needed.

TrackableSerializer replaces TrackableRef + GameEventProxy

Single class with: IdRef, StaleCardRef, type tags, replace()/resolve(), and bandwidth measurement utilities.

Encoder-level replacement enabled with findByView fallback

Infrastructure (shouldReplaceTrackables, ReplacingObjectOutputStream, tracker on encoder/decoder) is enabled. shouldReplaceTrackables excludes setGameView, openView, and applyDelta from replacement. See "Client→server path" below for the stale tracker issue and how it's handled.

Bandwidth measurement

measureReplacedSize() (with IdRef replacement) for delta path, measurePlainSize() (no replacement) for full-state comparison. Important: these must use different serialization modes because the encoder doesn't replace in setGameView messages.

Client→server protocol arg path

The original fix-protocol-serialization PR (master-based) only added encoder replacement for server→client messages. Client→server replacement is new in this branch. Initial testing revealed that enabling it broke client card selection after zone changes — the client could play a land from hand, but could not tap that land for mana on the battlefield.

How master works (no encoder replacement)

On master, the client sends full serialized CardView/PlayerView objects in protocol args (e.g. selectCard(CardView, ...)). The server deserializes these into fresh object instances whose properties reflect the client's current state (zone, controller, etc. — updated by prior setGameView calls). Game.findByView(CardView) uses the view's zone and controller properties to narrow the search to a specific player's zone, then matches by ID. Because the deserialized view carries the client's up-to-date zone info, the search finds the correct Card.

How this branch works (encoder IdRef replacement)

With encoder replacement enabled, the client encoder replaces CardView/PlayerView with lightweight IdRef(typeTag, id) markers. The server decoder resolves these from the server's tracker (gameView.getTracker()), returning the CardView registered in the tracker for that ID.

The stale tracker problem: TrackableTypes.TrackableObjectType.updateObjLookup only registers a new object if the tracker doesn't already have an entry for that ID (!tracker.hasObj(this, newObj.getId())). When a card changes zones, the game engine creates a new Card+CardView with the same ID, but the tracker skips it because the original CardView is already registered. The server decoder therefore returns a stale CardView with the old zone (e.g. zone=Hand after the card moved to Battlefield). findByView searches the wrong zone and misses.

Observed behavior: First client click works (card is still in the zone the tracker knows). After a zone change (e.g. playing a land), subsequent clicks silently fail — findByView returns null, selectCard returns false, and the game ignores the input.

Design decision: fallback search vs tracker update

Two solutions were considered:

Fallback in findByView (adopted): When zone-specific search misses, fall back to global ID search across all zones. Safe, minimal blast radius — only triggers on a miss, and the global search was already the existing fallback for views with null zone/controller.
Update tracker entries on zone change: Remove the !hasObj guard in updateObjLookup so new CardViews always overwrite old entries, or explicitly refresh tracker entries during zone changes. This would be the "correct" fix but risks unintended side effects — the !hasObj guard exists for a reason (prevents overwriting during copyChangedProps and initial updateObjLookup), and the tracker is used by both the delta sync system and the view update system. Changing its behavior could cause subtle breakage elsewhere.

The fallback approach was chosen because the tracker is a shared component with wide usage, and the stale-entry issue only manifests during encoder IdRef resolution. The cost of the fallback (an extra linear scan on miss) is negligible — it only occurs for protocol args after a zone change, which is infrequent.

Testing

Manual testing

Network game (server + client): verified client can play lands from hand, tap lands for mana, cast creatures, and interact normally across multiple turns with encoder replacement enabled.

Automated batch testing

Note: Batch tests use headless AI-vs-AI games — they exercise game logic, delta sync, and serialization but do not run a GUI. GUI-driven protocol methods (e.g. getChoices, confirm, dialog-based interactions) are only covered by manual testing.

10-game quick

10/10 games passed (100%), 0 errors, 0 checksum mismatches, 93.2% bandwidth savings

100-game comprehensive

99/100 games passed (99%), 0 errors, 0 send errors, 0 checksum mismatches
93.2% bandwidth savings overall, 96.0% state-only
2p: 50/50 (100%), 3p: 30/30 (100%), 4p: 19/20 (95%)
1 incomplete game was a 4p timeout (no crash or error)
0 dropped events, 0 protocol errors across all 100 games

🤖 Generated with Claude Code

Protocol methods were serializing the entire CardView/PlayerView object graph for each argument. Add IdRef replacement at the Netty encoder/decoder level: the server encoder swaps CardView/PlayerView with lightweight IdRef(typeTag, id) markers, and the client decoder resolves them from the Tracker. setGameView and openView are excluded (they carry the full state). Extract shared ID-mapping primitives (IdRef, type tags, typeTagFor, trackableTypeFor) from GameEventProxy into TrackableRef so both the encoder/decoder path and GameEventProxy can reuse them. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Extends encoder replacement (previously server→client only) to the client→server path. The client encoder now replaces CardView/PlayerView with lightweight IdRef markers in protocol method args like selectCard, and the server decoder resolves them from the tracker. Testing revealed that after a zone change (e.g. playing a land from hand), the server tracker holds a stale CardView with the old zone, causing Game.findByView to search the wrong zone and return null. Added a fallback global search in findByView when the zone-specific search misses, rather than modifying tracker update behavior which has wide blast radius. Also removes GameEventProxy — raw GameEvent objects travel inside DeltaPacket, with replacement handled by the encoder/decoder. Confirmed by batch testing (0 dropped events). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…coder Events embedded in DeltaPacket are excluded from encoder IdRef replacement (to protect state data), so their TrackableObject references arrive as raw deserialized instances with null trackers. When event handlers call TrackableTypes.lookup(), the null tracker causes NPE. Added a fallback in TrackableSerializer.resolve() that detects raw TrackableObjects with no tracker and resolves them to the canonical tracker instance, matching what GameEventProxy.unwrapAll previously did. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

tool4ever · 2026-04-06T07:14:53Z

forge-gui/src/main/java/forge/gamemodes/net/CompatibleObjectEncoder.java

+            ProtocolMethod method = event.getMethod();
+            return method != ProtocolMethod.setGameView
+                    && method != ProtocolMethod.openView
+                    && method != ProtocolMethod.applyDelta;


When removing the GameEventProxy I didn't expect having to exclude deltas:
if their carried events don't get ID replacements it seems too much of a regression (and the other DeltaPacket fields should be unaffected anyway?)

Wasn't this already a smaller risk with the old architecture anyway which we accepted?

See latest commit.

Events bundled in applyDelta were excluded from encoder-level IdRef replacement because of a fundamental decoder timing issue: the Netty deserializer necessarily runs first - ObjectInputStream.resolveObject() executes during deserialization of the entire DeltaPacket, before the application code ever sees it. By the time applyDelta creates new objects in the client tracker, IdRef resolution has already happened. Events that reference cards created in the same delta (e.g. a GameEventTokenCreated for a token whose CardView is in newObjects) resolve their IdRefs to null, causing NPEs.

Removing the exclusion and running the 10-game batch immediately hits Cannot invoke "CardView.getCurrentState()" because "card" is null - events with unresolvable IdRefs for objects not yet in the tracker.

The old GameEventProxy avoided this by wrapping events to byte arrays (with IdRef replacement) before they entered the DeltaPacket, then unwrapping them after state application when the tracker was populated -sidestepping the deserializer entirely.

The fix is to restore that pattern inside TrackableSerializer - wrapEvents() serializes each event with IdRef/StaleCardRef replacement at DeltaPacket construction time, and unwrapEvents() deserializes with resolution after phases 1-2 of applyDelta. All IdRef replacement logic stays in one class, the encoder exclusion is removed, the raw-TrackableObject fallback in resolve() is removed, and events get proper stale reference detection. 10-game batch passes clean.

ok, I understand the timing problem with unwrapping but:
we just recently changed it to happen before Phase 1 for delta in 43070f7

Either that change was (partly) wrong or I'm missing something - but shouldn't this problem then also happened there already? 🤔

Either that change was (partly) wrong or I'm missing something - but shouldn't this problem then also happened there already?

Confirming I'm looking into this.

Very aggravating situation where AI has come to different conclusions in different context windows, so I'm trying to resolve the disagreement by running diagnostic logging on the unwrap timing here vs in delta branch to work out which is correct / if that change in delta was redundant.

Confirmed: the unwrap timing change in 43070f7 was redundant. The problem it was trying to fix was already solved by the StaleCardRef fix in that same commit.

Post-delta unwrapping is architecturally correct. Pre-unwrapping introduces risk of silent event drops which were seen with new diagnostic logging when testing delta branch today, although these seem to be benign and didn't result in any visually observable desync or UI issues because silent event drop is almost immediately corrected by the gameview state update.

event-unwrap-timing-investigation.md

Also separately an interesting observation when I re-ran the old test scenario involving cloning Leafkin Avenger, setting off the ability, and then flickering it:

Both the stack and log entries are identical between host and client except for the client's log card image when casting clone: host shows image for Leafkin Avenger (the clone target) while client shows Clone itself.

Key point: Client's log entry is more accurate here / host seems to be wrong?

This is minor and can be dealt with separately from this PR, but raises question of whether the host game log image construction should use similar StaleCardRef architecture to remote client.

The host has the actual game state. GameLogFormatter.visit(GameEventSpellAbilityCast) fires synchronously when Clone is cast and constructs a GameLogEntry whose sourceCard is a live reference to the host's CardView for Clone.

Clone then resolves and copies Leafkin Avenger. The card engine mutates the same CardView Java instance in place — it sets name to "Leafkin Avenger" and imgKey to Leafkin Avenger's image key.

When the desktop log panel paints the entry, CachedCardImage calls card.getCurrentState().getImageKey(...) on the live reference, which now returns the post-mutation values. The log row shows "you cast Leafkin Avenger" with Leafkin Avenger's image — even though that's not what was actually cast.

Both host and client actually agree about the event ("Player cast spell with id=X"). They disagree about the card image attached to the log entry because they resolve sourceCard to two different things: a mutable live reference (host) vs. a frozen detached snapshot (client).

(This behavior is the same in both networkplay/main and this branch because its a product of StaleCardRef rather than the unwrap timing.)

analysis seems reasonable - ideally this has reached its final size
a cleanup of the GameLog stuff can be attempted later
(looks like unfortunately it might not be possible to transfer this PR to upstream as is, so to preserve more of the workflow I'll investigate further after delta lands)

So just to confirm - do you want me to merge this into delta branch now (before delta is merged to master)?

Or hold this until after and re-target a PR at master once delta is merged in?

(Edit: if we don't want to merge this in entirety to delta branch, should we at least fix the event unwrap timing on delta per analysis above?)

there's no need to touch delta this close to the finish line or delay its schedule:

my previous assessment ("too complex to reasonably merge together") is still valid

but the merged version will still be disarmed and therefore only send events on their own

so the timing fix would only need to be cherry picked after that - though maybe just merging this fully before arming is now even better :)

Events bundled in DeltaPacket were excluded from encoder-level IdRef replacement to avoid a decoder timing issue: the Netty decoder resolves IdRefs during deserialization, before applyDelta creates new objects in the client tracker. This caused events to travel as raw TrackableObjects, losing StaleCardRef detection and inflating wire size. Restore the wrap/unwrap pattern from the removed GameEventProxy, now in TrackableSerializer: events are serialized to byte arrays with IdRef replacement at DeltaPacket construction time (server), and deserialized with resolution after state application (client), when the tracker is fully populated. This gives events proper IdRef/StaleCardRef handling without the decoder timing problem. - Remove applyDelta exclusion from shouldReplaceTrackables - Remove raw-TrackableObject fallback from TrackableSerializer.resolve - Add wrapEvents/unwrapEvents to TrackableSerializer - Wrap events in both delta+events and events-only paths Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

tool4ever · 2026-04-06T11:10:26Z

forge-gui/src/main/java/forge/gamemodes/net/CompatibleObjectDecoder.java

        return var5;
    }
+
+    private static class ResolvingObjectInputStream extends ObjectInputStream {


this seems redundant to TrackableSerializer.ReplacingOutputStream now

same for Encoder variant

tool4ever · 2026-04-06T11:11:37Z

forge-gui/src/main/java/forge/gamemodes/net/TrackableSerializer.java

+     * Simple replacement — no tracker, no stale detection.
+     * Used by the client encoder (which has no game-state awareness).
+     */
+    static Object replace(Object obj) {


this method seems redundant, can just always call the overloaded variant with null tracker

tool4ever · 2026-04-06T12:33:33Z

forge-gui/src/main/java/forge/gamemodes/net/TrackableSerializer.java

+        @Override
+        protected Object resolveObject(Object obj) {
+            return resolve(obj, tracker);
+        }


resolve conflict by adding fix of Card-Forge#10304 here now?

Resolve conflict on GameEventProxy.java (deleted on this branch, modified on NetworkPlay/main by Card-Forge#10304) by porting Card-Forge#10304's serialVersionUID fix into TrackableSerializer.ResolvingInputStream — the event deserialization path that replaced GameEventProxy.

…verload Address PR #17 review feedback from tool4ever: - Delete single-arg TrackableSerializer.replace(Object) — verified byte-equivalent to replace(obj, null) (the tracker null-check is the only difference, and skipping it falls through to the same IdRef return). Update CObjectOutputStream and TrackableSerializer's own ReplacingOutputStream to use the unified two-arg form. - Delete CompatibleObjectEncoder.ReplacingObjectOutputStream and CompatibleObjectDecoder.ResolvingObjectInputStream — exact duplicates of TrackableSerializer.ReplacingOutputStream/ResolvingInputStream modulo class qualifier. Make those inner classes package-private and reuse them directly from the encoder/decoder.

MostCromulent and others added 3 commits April 6, 2026 06:37

Cleanup: remove stale DIAG comment, fix FQN ArrayList usage

47f5341

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

MostCromulent marked this pull request as draft April 5, 2026 23:47

MostCromulent mentioned this pull request Apr 5, 2026

Extend IdRef serialization from game events to protocol method args Card-Forge/forge#10287

Closed

tool4ever reviewed Apr 6, 2026

View reviewed changes

MostCromulent added 2 commits April 7, 2026 06:05

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Extend IdRef serialization from game events to protocol method args (delta branch)#17

Extend IdRef serialization from game events to protocol method args (delta branch)#17
MostCromulent wants to merge 7 commits intoNetworkPlay/mainfrom
fix-protocol-serialization-delta

MostCromulent commented Apr 5, 2026 •

edited

Loading

Uh oh!

tool4ever Apr 6, 2026

Uh oh!

MostCromulent Apr 6, 2026 •

edited

Loading

Uh oh!

tool4ever Apr 6, 2026

Uh oh!

MostCromulent Apr 6, 2026

Uh oh!

MostCromulent Apr 6, 2026

Uh oh!

MostCromulent Apr 6, 2026

Uh oh!

tool4ever Apr 7, 2026

Uh oh!

MostCromulent Apr 7, 2026 •

edited

Loading

Uh oh!

tool4ever Apr 7, 2026

Uh oh!

tool4ever Apr 6, 2026

Uh oh!

tool4ever Apr 6, 2026

Uh oh!

tool4ever Apr 6, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

MostCromulent commented Apr 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What changed from fix-protocol-serialization

GameEventProxy removed

TrackableSerializer replaces TrackableRef + GameEventProxy

Encoder-level replacement enabled with findByView fallback

Bandwidth measurement

Client→server protocol arg path

How master works (no encoder replacement)

How this branch works (encoder IdRef replacement)

Design decision: fallback search vs tracker update

Testing

Manual testing

Automated batch testing

10-game quick

100-game comprehensive

Uh oh!

Choose a reason for hiding this comment

Uh oh!

MostCromulent Apr 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

MostCromulent Apr 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

MostCromulent commented Apr 5, 2026 •

edited

Loading

MostCromulent Apr 6, 2026 •

edited

Loading

MostCromulent Apr 7, 2026 •

edited

Loading