fix(meetings): improve handling of websocket reconnections for 5k webinars#4864
fix(meetings): improve handling of websocket reconnections for 5k webinars#4864marcin-bazyl wants to merge 12 commits intowebex:nextfrom
Conversation
| * @param {string} debugText - text to include in logs | ||
| * @returns {Promise} | ||
| */ | ||
| private sendInitializationSyncRequestToLocus( |
There was a problem hiding this comment.
this method is not needed anymore as we're using enqueueSyncForDataset() istead and have the special logic for initialization syncs in performSync() - based on the isInitialization flag in the queued entry
|
This pull request is automatically being deployed by Amplify Hosting (learn more). |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 30ca445e9d
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| replacedAt: replaces?.replacedAt, | ||
| }); | ||
|
|
||
| parser.initializeFromGetLociResponse(locus); |
There was a problem hiding this comment.
Await parser initialization from API response
handleHashTreeParserSwitchForAPIResponse() calls parser.initializeFromGetLociResponse(locus) without awaiting or handling rejection. That initializer propagates failures from getAllVisibleDataSetsFromLocus(), so transient Locus errors during reconnect can raise an unhandled promise rejection and leave a newly created parser active but not actually initialized, which breaks the intended post-reconnect dataset recovery path.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
this is fine, there is nothing more we want to do if initializeFromGetLociResponse() fails and if things fail, there will be warnings already logged by code closer to the failures
| locusUrl, | ||
| entry, | ||
| replaces, | ||
| () => entry.parser.resumeFromApiResponse(locus) |
There was a problem hiding this comment.
Handle async resume failures for stopped parsers
The API-response resume path passes () => entry.parser.resumeFromApiResponse(locus) into resumeStoppedParser(), but that callback’s returned promise is not awaited or caught. If resumeFromApiResponse() fails (for example, fetching visible datasets fails), the rejection is unhandled and the parser is treated as reactivated without completing re-initialization, which can leave hash-tree updates stuck after breakout switches.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
that's ok, there is not much more we can do if it fails. Even if re-initialization fails at some point, at least the parser will no longer be in "stopped" state, so further messages can be processed
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 3b4397602a
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| const parser = this.createHashTreeParser({ | ||
| locusUrl, | ||
| initialLocus: {locus: null, dataSets: []}, | ||
| metadata: null, | ||
| replacedAt: replaces?.replacedAt, |
There was a problem hiding this comment.
Skip parser switch when API locus lacks visibleDataSets URL
When handleHashTreeParserSwitchForAPIResponse() creates a parser unconditionally for an unknown locusUrl, it can replace a working parser with one that cannot initialize from the same response. createHashTreeParser() immediately stops the current active parser, and initializeFromGetLociResponse() later bails out if locus.links.resources.visibleDataSets.url is missing. In reconnect/sync flows where GET /loci returns a partial locus without that link, this leaves the meeting with no initialized hash-tree parser and hash-tree updates stop until another successful switch occurs.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
this comment is invalid - if it's creating a new parser for unknown locusUrl, then it cannot be replacing anything
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 9aecb58b38
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| this.state = 'active'; | ||
| this.dataSets = {}; | ||
|
|
There was a problem hiding this comment.
Keep parser stopped until API resume initialization succeeds
resumeFromApiResponse() flips state to active and clears dataSets before initializeFromGetLociResponse() completes, so any initialization failure (including missing visibleDataSets URL or request errors) leaves an active parser with no hash trees. In that state, later hash-tree messages cannot recover the parser because metadata updates are only applied through existing dataset hash trees, so updates get ignored and the meeting can stay desynced after reconnect. Fresh evidence from this change: this method now clears state up front, and parseMessage()/metadata handling requires existing hashTree entries to bootstrap updates.
Useful? React with 👍 / 👎.
COMPLETES #SPARK-790957
This pull request addresses
When we regain Mercury connection, Meetings.syncMeetings() is called which does a GET /loci. For meetings that use hash trees we need to additionally do a sync on each dataset.
by making the following changes
One consequence of using a sync queue now for everything is that previously initializeFromMessage() or initializeFromGetLociResponse() would result in a single callback to LocusInfo.updateFromHashTree() with all the updates in one go after all datasets were synced, now as we're using the queue, as each sync from the queue is processed, the callback LocusInfo.updateFromHashTree() is called with the updates. I think it's fine, I haven't seen any bad effects from this while testing and I think LocusInfo always has to manage to handle partial updates and we make sure that main data set is synced first, followed by self (see sortByInitPriority()), so it should be fine.
Change Type
The following scenarios were tested
unit tests
The GAI Coding Policy And Copyright Annotation Best Practices
I certified that
I have read and followed contributing guidelines
I discussed changes with code owners prior to submitting this pull request
I have not skipped any automated checks
All existing and new tests passed
I have updated the documentation accordingly
Make sure to have followed the contributing guidelines before submitting.