Skip to content

fix(meetings): improve handling of websocket reconnections for 5k webinars#4864

Open
marcin-bazyl wants to merge 12 commits intowebex:nextfrom
marcin-bazyl:spark-790957
Open

fix(meetings): improve handling of websocket reconnections for 5k webinars#4864
marcin-bazyl wants to merge 12 commits intowebex:nextfrom
marcin-bazyl:spark-790957

Conversation

@marcin-bazyl
Copy link
Copy Markdown
Collaborator

@marcin-bazyl marcin-bazyl commented Apr 15, 2026

COMPLETES #SPARK-790957

This pull request addresses

When we regain Mercury connection, Meetings.syncMeetings() is called which does a GET /loci. For meetings that use hash trees we need to additionally do a sync on each dataset.

by making the following changes

  1. added code to Meetings.syncMeetings() to sync hash tree data sets
  2. fixed Meetings.syncMeetings() so that it doesn't remove a meeting if we were moved between breakouts while being offline (it's a fix for a bug I found while working on this feature)
  3. since we already get a locus (but only a partial one) as output of GET /loci that's done by Meetings.syncMeetings(), so I also made sure that we use that partial locus to update our hash tree based LocusInfo (this will be quicker than the syncs that get kicked off), but since the format of that API response is not like a hash tree message but like a normal API, then I had to add code for handling cases of creating new HashTreeParser or resuming a stopped one like we had for messages but now also for API responses
  4. the new method syncAllDatasets() is still needed, because GET /loci doesn't return a full locus, so we still need to try sync all datasets and syncAllDatasets() does a sync of all data sets sequentially, so I've introduced a queue for syncs and made sure that all existing code for syncs also uses that queue, so now any sync is always queued in that queue.

One consequence of using a sync queue now for everything is that previously initializeFromMessage() or initializeFromGetLociResponse() would result in a single callback to LocusInfo.updateFromHashTree() with all the updates in one go after all datasets were synced, now as we're using the queue, as each sync from the queue is processed, the callback LocusInfo.updateFromHashTree() is called with the updates. I think it's fine, I haven't seen any bad effects from this while testing and I think LocusInfo always has to manage to handle partial updates and we make sure that main data set is synced first, followed by self (see sortByInitPriority()), so it should be fine.

Change Type

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation update
  • Tooling change
  • Internal code refactor

The following scenarios were tested

unit tests

The GAI Coding Policy And Copyright Annotation Best Practices

  • GAI was not used (or, no additional notation is required)
  • Code was generated entirely by GAI
  • GAI was used to create a draft that was subsequently customized or modified
  • Coder created a draft manually that was non-substantively modified by GAI (e.g., refactoring was performed by GAI on manually written code)
  • Tool used for AI assistance (GitHub Copilot / Other - specify)
    • Github Copilot
    • Other - Please Specify
  • This PR is related to
    • Feature
    • Defect fix
    • Tech Debt
    • Automation

I certified that

  • I have read and followed contributing guidelines

  • I discussed changes with code owners prior to submitting this pull request

  • I have not skipped any automated checks

  • All existing and new tests passed

  • I have updated the documentation accordingly


Make sure to have followed the contributing guidelines before submitting.

* @param {string} debugText - text to include in logs
* @returns {Promise}
*/
private sendInitializationSyncRequestToLocus(
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this method is not needed anymore as we're using enqueueSyncForDataset() istead and have the special logic for initialization syncs in performSync() - based on the isInitialization flag in the queued entry

@aws-amplify-us-east-2
Copy link
Copy Markdown

This pull request is automatically being deployed by Amplify Hosting (learn more).

Access this pull request here: https://pr-4864.d3m3l2kee0btzx.amplifyapp.com

@rsarika rsarika changed the title fix(meetings): improve handling of websocket reconnections for 5k webinars fix(calling-sdk): improve handling of websocket reconnections for 5k webinars Apr 15, 2026
@rsarika rsarika changed the title fix(calling-sdk): improve handling of websocket reconnections for 5k webinars fix(meetings): improve handling of websocket reconnections for 5k webinars Apr 15, 2026
@marcin-bazyl marcin-bazyl marked this pull request as ready for review April 15, 2026 15:18
@marcin-bazyl marcin-bazyl requested review from a team as code owners April 15, 2026 15:18
@marcin-bazyl marcin-bazyl added the validated If the pull request is validated for automation. label Apr 15, 2026
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 30ca445e9d

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

replacedAt: replaces?.replacedAt,
});

parser.initializeFromGetLociResponse(locus);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Await parser initialization from API response

handleHashTreeParserSwitchForAPIResponse() calls parser.initializeFromGetLociResponse(locus) without awaiting or handling rejection. That initializer propagates failures from getAllVisibleDataSetsFromLocus(), so transient Locus errors during reconnect can raise an unhandled promise rejection and leave a newly created parser active but not actually initialized, which breaks the intended post-reconnect dataset recovery path.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is fine, there is nothing more we want to do if initializeFromGetLociResponse() fails and if things fail, there will be warnings already logged by code closer to the failures

locusUrl,
entry,
replaces,
() => entry.parser.resumeFromApiResponse(locus)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Handle async resume failures for stopped parsers

The API-response resume path passes () => entry.parser.resumeFromApiResponse(locus) into resumeStoppedParser(), but that callback’s returned promise is not awaited or caught. If resumeFromApiResponse() fails (for example, fetching visible datasets fails), the rejection is unhandled and the parser is treated as reactivated without completing re-initialization, which can leave hash-tree updates stuck after breakout switches.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's ok, there is not much more we can do if it fails. Even if re-initialization fails at some point, at least the parser will no longer be in "stopped" state, so further messages can be processed

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 3b4397602a

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment on lines +1050 to +1054
const parser = this.createHashTreeParser({
locusUrl,
initialLocus: {locus: null, dataSets: []},
metadata: null,
replacedAt: replaces?.replacedAt,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Skip parser switch when API locus lacks visibleDataSets URL

When handleHashTreeParserSwitchForAPIResponse() creates a parser unconditionally for an unknown locusUrl, it can replace a working parser with one that cannot initialize from the same response. createHashTreeParser() immediately stops the current active parser, and initializeFromGetLociResponse() later bails out if locus.links.resources.visibleDataSets.url is missing. In reconnect/sync flows where GET /loci returns a partial locus without that link, this leaves the meeting with no initialized hash-tree parser and hash-tree updates stop until another successful switch occurs.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this comment is invalid - if it's creating a new parser for unknown locusUrl, then it cannot be replacing anything

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 9aecb58b38

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment on lines +1581 to +1583
this.state = 'active';
this.dataSets = {};

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Keep parser stopped until API resume initialization succeeds

resumeFromApiResponse() flips state to active and clears dataSets before initializeFromGetLociResponse() completes, so any initialization failure (including missing visibleDataSets URL or request errors) leaves an active parser with no hash trees. In that state, later hash-tree messages cannot recover the parser because metadata updates are only applied through existing dataset hash trees, so updates get ignored and the meeting can stay desynced after reconnect. Fresh evidence from this change: this method now clears state up front, and parseMessage()/metadata handling requires existing hashTree entries to bootstrap updates.

Useful? React with 👍 / 👎.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

validated If the pull request is validated for automation.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants