Extend IdRef serialization from game events to protocol method args (delta branch)#17
Extend IdRef serialization from game events to protocol method args (delta branch)#17MostCromulent wants to merge 7 commits intoNetworkPlay/mainfrom
Conversation
Protocol methods were serializing the entire CardView/PlayerView object graph for each argument. Add IdRef replacement at the Netty encoder/decoder level: the server encoder swaps CardView/PlayerView with lightweight IdRef(typeTag, id) markers, and the client decoder resolves them from the Tracker. setGameView and openView are excluded (they carry the full state). Extract shared ID-mapping primitives (IdRef, type tags, typeTagFor, trackableTypeFor) from GameEventProxy into TrackableRef so both the encoder/decoder path and GameEventProxy can reuse them. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Extends encoder replacement (previously server→client only) to the client→server path. The client encoder now replaces CardView/PlayerView with lightweight IdRef markers in protocol method args like selectCard, and the server decoder resolves them from the tracker. Testing revealed that after a zone change (e.g. playing a land from hand), the server tracker holds a stale CardView with the old zone, causing Game.findByView to search the wrong zone and return null. Added a fallback global search in findByView when the zone-specific search misses, rather than modifying tracker update behavior which has wide blast radius. Also removes GameEventProxy — raw GameEvent objects travel inside DeltaPacket, with replacement handled by the encoder/decoder. Confirmed by batch testing (0 dropped events). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…coder Events embedded in DeltaPacket are excluded from encoder IdRef replacement (to protect state data), so their TrackableObject references arrive as raw deserialized instances with null trackers. When event handlers call TrackableTypes.lookup(), the null tracker causes NPE. Added a fallback in TrackableSerializer.resolve() that detects raw TrackableObjects with no tracker and resolves them to the canonical tracker instance, matching what GameEventProxy.unwrapAll previously did. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
| ProtocolMethod method = event.getMethod(); | ||
| return method != ProtocolMethod.setGameView | ||
| && method != ProtocolMethod.openView | ||
| && method != ProtocolMethod.applyDelta; |
There was a problem hiding this comment.
When removing the GameEventProxy I didn't expect having to exclude deltas:
if their carried events don't get ID replacements it seems too much of a regression (and the other DeltaPacket fields should be unaffected anyway?)
Wasn't this already a smaller risk with the old architecture anyway which we accepted?
There was a problem hiding this comment.
See latest commit.
Events bundled in
applyDeltawere excluded from encoder-level IdRef replacement because of a fundamental decoder timing issue: the Netty deserializer necessarily runs first -ObjectInputStream.resolveObject()executes during deserialization of the entire DeltaPacket, before the application code ever sees it. By the timeapplyDeltacreates new objects in the client tracker, IdRef resolution has already happened. Events that reference cards created in the same delta (e.g. aGameEventTokenCreatedfor a token whose CardView is innewObjects) resolve their IdRefs to null, causing NPEs.Removing the exclusion and running the 10-game batch immediately hits
Cannot invoke "CardView.getCurrentState()" because "card" is null- events with unresolvable IdRefs for objects not yet in the tracker.The old
GameEventProxyavoided this by wrapping events to byte arrays (with IdRef replacement) before they entered the DeltaPacket, then unwrapping them after state application when the tracker was populated -sidestepping the deserializer entirely.The fix is to restore that pattern inside
TrackableSerializer-wrapEvents()serializes each event with IdRef/StaleCardRef replacement at DeltaPacket construction time, andunwrapEvents()deserializes with resolution after phases 1-2 ofapplyDelta. All IdRef replacement logic stays in one class, the encoder exclusion is removed, the raw-TrackableObject fallback inresolve()is removed, and events get proper stale reference detection. 10-game batch passes clean.
There was a problem hiding this comment.
ok, I understand the timing problem with unwrapping but:
we just recently changed it to happen before Phase 1 for delta in 43070f7
Either that change was (partly) wrong or I'm missing something - but shouldn't this problem then also happened there already? 🤔
There was a problem hiding this comment.
Either that change was (partly) wrong or I'm missing something - but shouldn't this problem then also happened there already?
Confirming I'm looking into this.
Very aggravating situation where AI has come to different conclusions in different context windows, so I'm trying to resolve the disagreement by running diagnostic logging on the unwrap timing here vs in delta branch to work out which is correct / if that change in delta was redundant.
There was a problem hiding this comment.
Confirmed: the unwrap timing change in 43070f7 was redundant. The problem it was trying to fix was already solved by the StaleCardRef fix in that same commit.
Post-delta unwrapping is architecturally correct. Pre-unwrapping introduces risk of silent event drops which were seen with new diagnostic logging when testing delta branch today, although these seem to be benign and didn't result in any visually observable desync or UI issues because silent event drop is almost immediately corrected by the gameview state update.
There was a problem hiding this comment.
Also separately an interesting observation when I re-ran the old test scenario involving cloning Leafkin Avenger, setting off the ability, and then flickering it:
Both the stack and log entries are identical between host and client except for the client's log card image when casting clone: host shows image for Leafkin Avenger (the clone target) while client shows Clone itself.
Key point: Client's log entry is more accurate here / host seems to be wrong?
This is minor and can be dealt with separately from this PR, but raises question of whether the host game log image construction should use similar StaleCardRef architecture to remote client.
- The host has the actual game state. GameLogFormatter.visit(GameEventSpellAbilityCast) fires synchronously when Clone is cast and constructs a GameLogEntry whose sourceCard is a live reference to the host's CardView for Clone.
- Clone then resolves and copies Leafkin Avenger. The card engine mutates the same CardView Java instance in place — it sets name to "Leafkin Avenger" and imgKey to Leafkin Avenger's image key.
- When the desktop log panel paints the entry, CachedCardImage calls card.getCurrentState().getImageKey(...) on the live reference, which now returns the post-mutation values. The log row shows "you cast Leafkin Avenger" with Leafkin Avenger's image — even though that's not what was actually cast.
- Both host and client actually agree about the event ("Player cast spell with id=X"). They disagree about the card image attached to the log entry because they resolve sourceCard to two different things: a mutable live reference (host) vs. a frozen detached snapshot (client).
(This behavior is the same in both networkplay/main and this branch because its a product of StaleCardRef rather than the unwrap timing.)
There was a problem hiding this comment.
analysis seems reasonable - ideally this has reached its final size
a cleanup of the GameLog stuff can be attempted later
(looks like unfortunately it might not be possible to transfer this PR to upstream as is, so to preserve more of the workflow I'll investigate further after delta lands)
There was a problem hiding this comment.
So just to confirm - do you want me to merge this into delta branch now (before delta is merged to master)?
Or hold this until after and re-target a PR at master once delta is merged in?
(Edit: if we don't want to merge this in entirety to delta branch, should we at least fix the event unwrap timing on delta per analysis above?)
There was a problem hiding this comment.
there's no need to touch delta this close to the finish line or delay its schedule:
- my previous assessment ("too complex to reasonably merge together") is still valid
- but the merged version will still be disarmed and therefore only send events on their own
- so the timing fix would only need to be cherry picked after that - though maybe just merging this fully before arming is now even better :)
Events bundled in DeltaPacket were excluded from encoder-level IdRef replacement to avoid a decoder timing issue: the Netty decoder resolves IdRefs during deserialization, before applyDelta creates new objects in the client tracker. This caused events to travel as raw TrackableObjects, losing StaleCardRef detection and inflating wire size. Restore the wrap/unwrap pattern from the removed GameEventProxy, now in TrackableSerializer: events are serialized to byte arrays with IdRef replacement at DeltaPacket construction time (server), and deserialized with resolution after state application (client), when the tracker is fully populated. This gives events proper IdRef/StaleCardRef handling without the decoder timing problem. - Remove applyDelta exclusion from shouldReplaceTrackables - Remove raw-TrackableObject fallback from TrackableSerializer.resolve - Add wrapEvents/unwrapEvents to TrackableSerializer - Wrap events in both delta+events and events-only paths Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
| return var5; | ||
| } | ||
|
|
||
| private static class ResolvingObjectInputStream extends ObjectInputStream { |
There was a problem hiding this comment.
this seems redundant to TrackableSerializer.ReplacingOutputStream now
same for Encoder variant
| * Simple replacement — no tracker, no stale detection. | ||
| * Used by the client encoder (which has no game-state awareness). | ||
| */ | ||
| static Object replace(Object obj) { |
There was a problem hiding this comment.
this method seems redundant, can just always call the overloaded variant with null tracker
| @Override | ||
| protected Object resolveObject(Object obj) { | ||
| return resolve(obj, tracker); | ||
| } |
There was a problem hiding this comment.
resolve conflict by adding fix of Card-Forge#10304 here now?
Resolve conflict on GameEventProxy.java (deleted on this branch, modified on NetworkPlay/main by Card-Forge#10304) by porting Card-Forge#10304's serialVersionUID fix into TrackableSerializer.ResolvingInputStream — the event deserialization path that replaced GameEventProxy.
…verload Address PR #17 review feedback from tool4ever: - Delete single-arg TrackableSerializer.replace(Object) — verified byte-equivalent to replace(obj, null) (the tracker null-check is the only difference, and skipping it falls through to the same IdRef return). Update CObjectOutputStream and TrackableSerializer's own ReplacingOutputStream to use the unified two-arg form. - Delete CompatibleObjectEncoder.ReplacingObjectOutputStream and CompatibleObjectDecoder.ResolvingObjectInputStream — exact duplicates of TrackableSerializer.ReplacingOutputStream/ResolvingInputStream modulo class qualifier. Make those inner classes package-private and reuse them directly from the encoder/decoder.
Cherry-pick of fix-protocol-serialization from master applied against NetworkPlay/main
What changed from fix-protocol-serialization
The original PR (fix-protocol-serialization, based on master) added encoder-level IdRef replacement for protocol method args. When rebased onto NetworkPlay/main, several design changes were made:
GameEventProxy removed
Events travel as raw
GameEventobjects insideDeltaPacket— no byte-array wrapping. Testing confirmed per-event isolation is unnecessary (0 dropped events across 10+ game batch tests). Event wrapping was also tested and works, but is not needed.TrackableSerializer replaces TrackableRef + GameEventProxy
Single class with:
IdRef,StaleCardRef, type tags,replace()/resolve(), and bandwidth measurement utilities.Encoder-level replacement enabled with findByView fallback
Infrastructure (
shouldReplaceTrackables,ReplacingObjectOutputStream, tracker on encoder/decoder) is enabled.shouldReplaceTrackablesexcludessetGameView,openView, andapplyDeltafrom replacement. See "Client→server path" below for the stale tracker issue and how it's handled.Bandwidth measurement
measureReplacedSize()(with IdRef replacement) for delta path,measurePlainSize()(no replacement) for full-state comparison. Important: these must use different serialization modes because the encoder doesn't replace insetGameViewmessages.Client→server protocol arg path
The original
fix-protocol-serializationPR (master-based) only added encoder replacement for server→client messages. Client→server replacement is new in this branch. Initial testing revealed that enabling it broke client card selection after zone changes — the client could play a land from hand, but could not tap that land for mana on the battlefield.How master works (no encoder replacement)
On master, the client sends full serialized CardView/PlayerView objects in protocol args (e.g.
selectCard(CardView, ...)). The server deserializes these into fresh object instances whose properties reflect the client's current state (zone, controller, etc. — updated by priorsetGameViewcalls).Game.findByView(CardView)uses the view'szoneandcontrollerproperties to narrow the search to a specific player's zone, then matches by ID. Because the deserialized view carries the client's up-to-date zone info, the search finds the correct Card.How this branch works (encoder IdRef replacement)
With encoder replacement enabled, the client encoder replaces CardView/PlayerView with lightweight
IdRef(typeTag, id)markers. The server decoder resolves these from the server's tracker (gameView.getTracker()), returning the CardView registered in the tracker for that ID.The stale tracker problem:
TrackableTypes.TrackableObjectType.updateObjLookuponly registers a new object if the tracker doesn't already have an entry for that ID (!tracker.hasObj(this, newObj.getId())). When a card changes zones, the game engine creates a new Card+CardView with the same ID, but the tracker skips it because the original CardView is already registered. The server decoder therefore returns a stale CardView with the old zone (e.g.zone=Handafter the card moved to Battlefield).findByViewsearches the wrong zone and misses.Observed behavior: First client click works (card is still in the zone the tracker knows). After a zone change (e.g. playing a land), subsequent clicks silently fail —
findByViewreturns null,selectCardreturns false, and the game ignores the input.Design decision: fallback search vs tracker update
Two solutions were considered:
Fallback in
findByView(adopted): When zone-specific search misses, fall back to global ID search across all zones. Safe, minimal blast radius — only triggers on a miss, and the global search was already the existing fallback for views with null zone/controller.Update tracker entries on zone change: Remove the
!hasObjguard inupdateObjLookupso new CardViews always overwrite old entries, or explicitly refresh tracker entries during zone changes. This would be the "correct" fix but risks unintended side effects — the!hasObjguard exists for a reason (prevents overwriting duringcopyChangedPropsand initialupdateObjLookup), and the tracker is used by both the delta sync system and the view update system. Changing its behavior could cause subtle breakage elsewhere.The fallback approach was chosen because the tracker is a shared component with wide usage, and the stale-entry issue only manifests during encoder IdRef resolution. The cost of the fallback (an extra linear scan on miss) is negligible — it only occurs for protocol args after a zone change, which is infrequent.
Testing
Manual testing
Automated batch testing
Note: Batch tests use headless AI-vs-AI games — they exercise game logic, delta sync, and serialization but do not run a GUI. GUI-driven protocol methods (e.g.
getChoices,confirm, dialog-based interactions) are only covered by manual testing.10-game quick
100-game comprehensive
🤖 Generated with Claude Code