CLDSRV-805: Fix flaky GCP tests due to rate limit #6111

Merged
bert-e merged 8 commits into development/9.3 from improvement/CLDSRV-805-gcp-rate-limit
Mar 18, 2026

BourgoisMickael commented Mar 16, 2026

SlowDown: The project exceeded the rate limit for bucket operations
(bucket creating, updating and deleting). Reduce your request rate.

Fix GCP bucket rate-limit issues by:

  • retrying more
  • wrapping some functions in a retry helper
  • deduplicating some tests (HeadBucket & Bucket versioning)
  • regrouping some tests in the same bucket to reduce GCP bucket quota pressure
    • in bucket: head.js, get.js -> bucket.js
    • in bucket: getVersioning.js, putVersioning.js -> versioning.js
    • in object: head.js, get.js, put.js, delete.js, copy.js -> object.js
    • in object: putTagging.js, getTagging.js, deleteTagging.js -> tagging.js

This reduces the number of buckets created across a full GCP test run from 21 to 12 (-9 buckets, -43%).
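
For readers unfamiliar with the pattern, here is a minimal sketch of retrying on rate-limit errors with exponential backoff. It is not the helper used in this PR: `retryOnRateLimit`, `isRateLimitError` and `backoffDelays` are hypothetical names, and six retries starting at 1 s is an assumption, chosen because 1 + 2 + 4 + 8 + 16 + 32 = 63 s matches the total wait quoted later in this conversation.

```javascript
// Hypothetical sketch, not the PR's gcpRetry helper.
const defaultSleep = ms => new Promise(resolve => setTimeout(resolve, ms));

// Assumption: rate limiting surfaces as an error named "SlowDown" or as HTTP 429.
function isRateLimitError(err) {
    return err?.name === 'SlowDown' || err?.$metadata?.httpStatusCode === 429;
}

// Delay before each retry: base, 2x base, 4x base, ...
function backoffDelays(retries, baseDelayMs) {
    return Array.from({ length: retries }, (_, i) => baseDelayMs * 2 ** i);
}

async function retryOnRateLimit(fn, options = {}) {
    const { retries = 6, baseDelayMs = 1000, sleep = defaultSleep } = options;
    const delays = backoffDelays(retries, baseDelayMs);
    for (let attempt = 0; ; attempt++) {
        try {
            return await fn();
        } catch (err) {
            // Give up on non rate-limit errors or once every retry is used.
            if (!isRateLimitError(err) || attempt >= delays.length) {
                throw err;
            }
            await sleep(delays[attempt]);
        }
    }
}
```

With these assumed defaults the worst case adds 63 s of waiting to a single call, which is why a change like this usually needs larger test timeouts as well.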

bert-e commented Mar 16, 2026

Hello bourgoismickael,

My role is to assist you with the merge of this
pull request. Please type @bert-e help to get information
on this process, or consult the user documentation.

Available options
name description privileged authored
/after_pull_request Wait for the given pull request id to be merged before continuing with the current one.
/bypass_author_approval Bypass the pull request author's approval
/bypass_build_status Bypass the build and test status
/bypass_commit_size Bypass the check on the size of the changeset TBA
/bypass_incompatible_branch Bypass the check on the source branch prefix
/bypass_jira_check Bypass the Jira issue check
/bypass_peer_approval Bypass the pull request peers' approval
/bypass_leader_approval Bypass the pull request leaders' approval
/approve Instruct Bert-E that the author has approved the pull request. ✍️
/create_pull_requests Allow the creation of integration pull requests.
/create_integration_branches Allow the creation of integration branches.
/no_octopus Prevent Wall-E from doing any octopus merge and use multiple consecutive merge instead
/unanimity Change review acceptance criteria from one reviewer at least to all reviewers
/wait Instruct Bert-E not to run until further notice.
Available commands
name description privileged
/help Print Bert-E's manual in the pull request.
/status Print Bert-E's current status in the pull request TBA
/clear Remove all comments from Bert-E from the history TBA
/retry Re-start a fresh build TBA
/build Re-start a fresh build TBA
/force_reset Delete integration branches & pull requests, and restart merge process from the beginning.
/reset Try to remove integration branches unless there are commits on them which do not appear on the source branch.

Status report is not available.

async.mapLimit(
createdObjects,
10,
async object => gcpClient.send(new PutObjectCommand({

The populateBucket and removeObjects helpers call gcpClient.send() directly without gcpRetry. If a PutObjectCommand or DeleteObjectCommand hits a SlowDown/429 during setup/teardown, the test will still fail. Consider wrapping these calls with gcpRetry to match the PR goal of fixing rate-limit flakiness.

--- Claude Code

Contributor Author

This is a copy of existing code to another file

claude bot commented Mar 16, 2026

bucket.js populateBucket (line 74) and removeObjects (line 93) call gcpClient.send() directly without gcpRetry. These setup and teardown helpers are still vulnerable to SlowDown and 429 errors. Wrap with gcpRetry inside these helpers.

Review by Claude Code
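
To illustrate the suggestion, the sketch below wraps each per-object call in a retry while keeping a concurrency limit of 10. Everything here is a stand-in: `send` represents `gcpClient.send`, `withRetry` represents `gcpRetry` (its real signature is not shown in this conversation), and `mapLimit` is a small promise-based substitute for `async.mapLimit`. The backoff delay is left out to keep the focus on where the wrapper goes.

```javascript
// Hypothetical stand-in for gcpRetry: retry only on SlowDown, no delay.
async function withRetry(fn, retries = 3) {
    for (let attempt = 0; ; attempt++) {
        try {
            return await fn();
        } catch (err) {
            if (err?.name !== 'SlowDown' || attempt >= retries) {
                throw err;
            }
        }
    }
}

// Run `worker` over `items` with at most `limit` calls in flight.
async function mapLimit(items, limit, worker) {
    const results = new Array(items.length);
    let next = 0;
    async function run() {
        while (next < items.length) {
            const index = next++;
            results[index] = await worker(items[index]);
        }
    }
    const runners = Array.from({ length: Math.min(limit, items.length) }, () => run());
    await Promise.all(runners);
    return results;
}

// The retry wraps each individual request, so one throttled object
// does not fail the whole setup step.
function populateBucket(send, keys) {
    return mapLimit(keys, 10, key => withRetry(() => send({ Key: key })));
}
```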

codecov bot commented Mar 16, 2026

❌ 2 Tests Failed:

Tests completed | Failed | Passed | Skipped
8225            | 2      | 8223   | 0
View the full list of 2 ❄️ flaky test(s)
"after each" hook for "should put an object and set the acl via query param"::PUT object With v4 signature "after each" hook for "should put an object and set the acl via query param"

Flake rate in main: 100.00% (Passed 0 times, Failed 27 times)

Stack Traces | 0.025s run time
Expected values to be strictly equal:
+ actual - expected

+ '404 NOT FOUND'
- '200 OK'
"after each" hook for "should put an object and set the acl via query param"::PUT object With v4 signature "after each" hook for "should put an object and set the acl via query param"

Flake rate in main: 100.00% (Passed 0 times, Failed 27 times)

Stack Traces | 0.025s run time
done() called multiple times in hook <PUT object With v4 signature "after each" hook for "should put an object and set the acl via query param"> of file .../test/object/put.js

claude bot commented Mar 17, 2026

LGTM

Clean consolidation of GCP test files to reduce bucket create/delete operations and avoid rate limiting. Test logic is faithfully preserved, retry defaults are reasonable (6 attempts with exponential backoff up to ~63s), and timeouts are appropriately increased.

Review by Claude Code

BourgoisMickael force-pushed the improvement/CLDSRV-805-gcp-rate-limit branch 2 times, most recently from af90065 to d4596ba, March 17, 2026 14:34

bert-e commented Mar 17, 2026

Request integration branches

Waiting for integration branch creation to be requested by the user.

To request integration branches, please comment on this pull request with the following command:

/create_integration_branches

Alternatively, the /approve and /create_pull_requests commands will automatically
create the integration branches.

Key: badObjectKey,
}, err => {
assert(err);
assert.strictEqual(err.$metadata.httpStatusCode, 404);

Missing optional chaining on err.$metadata. If $metadata is undefined this will throw a TypeError. Use err.$metadata?.httpStatusCode like line 123 does.

Same issue on lines 225 and 244.

— Claude Code

BourgoisMickael (Contributor Author) commented Mar 17, 2026

Copied from existing code as is

Bucket: bucketName,
Key: this.test.key,
}, (err, res) => {
assert.notStrictEqual(res.VersionId, this.test.uploadId);

Missing error check. If putObject fails, res will be undefined and res.VersionId will throw a TypeError. Add an assert.equal(err, null) check before accessing res.

— Claude Code

Contributor Author

Copied from existing code as is

assert.strictEqual(err.$metadata.httpStatusCode, 404);
resolve();
});
});

If the assertion fails, the Promise is never settled (no reject call), so the test hangs until timeout instead of reporting the assertion error. Either wrap assertions in try/catch with reject, or use the done callback pattern like the GET Object 404 test at line 116.

— Claude Code

Contributor Author

Copied from existing code as is

claude bot commented Mar 17, 2026

  • object.js:87, 225, 244: Missing optional chaining on err.$metadata.httpStatusCode. If $metadata is undefined, this throws a TypeError. Line 123 correctly uses ?. but these three lines do not.
    • Use err.$metadata?.httpStatusCode consistently
  • object.js:90: Promise wrapping a callback without a reject path. If assertions fail inside the callback, the promise is never settled and the test hangs until timeout.
    • Wrap assertions in try/catch with reject, or use the done callback pattern like line 116
  • object.js:146: Missing error check in putObject callback before accessing res.VersionId. If the call fails, res is undefined and this throws a TypeError.
    • Add assert.equal(err, null) before accessing res

Review by Claude Code

scality deleted a comment from bert-e Mar 17, 2026

Because default retries can go up to 63s total wait time
Head tests were duplicated across the head and get files.

Regroup both kinds of tests into 1 file using 1 bucket to reduce GCP quota pressure
The exact same test existed in 2 different files; regrouped.
Using 1 bucket for the test suite to reduce GCP quota pressure
Regroup copy, delete, get, head, and put into an object.js file

Using 1 bucket for the test suite to reduce GCP quota pressure
Using 1 bucket for the test suite to reduce GCP quota pressure

BourgoisMickael force-pushed the improvement/CLDSRV-805-gcp-rate-limit branch from d4596ba to 5f1edfe, March 17, 2026 15:56

claude bot commented Mar 17, 2026

LGTM

Clean test consolidation that reduces GCP bucket creation from 21 to 12 buckets. The retry increase from 3 to 6 attempts with exponential backoff is reasonable for rate-limited GCP operations. Test logic is faithfully preserved across the consolidation.

Review by Claude Code

BourgoisMickael requested review from a team, anurag4DSB, fredmnl and jonathan-gramain and removed request for a team March 17, 2026 15:59

BourgoisMickael (Contributor Author) commented

/approve

bert-e commented Mar 18, 2026

Integration data created

I have created the integration data for the additional destination branches.

The following branches will NOT be impacted:

  • development/7.10
  • development/7.4
  • development/7.70
  • development/8.8
  • development/9.0
  • development/9.1
  • development/9.2

You can set option create_pull_requests if you need me to create
integration pull requests in addition to integration branches, with:

@bert-e create_pull_requests

The following options are set: approve

bert-e commented Mar 18, 2026

I have successfully merged the changeset of this pull request
into the targeted development branches:

  • ✔️ development/9.3

  • ✔️ development/9.4

The following branches have NOT changed:

  • development/7.10
  • development/7.4
  • development/7.70
  • development/8.8
  • development/9.0
  • development/9.1
  • development/9.2

Please check the status of the associated issue CLDSRV-805.

Goodbye bourgoismickael.

The following options are set: approve

bert-e merged commit 5f1edfe into development/9.3 Mar 18, 2026
51 of 54 checks passed
bert-e deleted the improvement/CLDSRV-805-gcp-rate-limit branch March 18, 2026 15:02

4 participants