Skip to content

rework website ctst test to preconfigure the overlay before running t…#2356

Open
SylvainSenechal wants to merge 1 commit intodevelopment/2.14from
improvement/ZENKO-5221
Open

rework website ctst test to preconfigure the overlay before running t…#2356
SylvainSenechal wants to merge 1 commit intodevelopment/2.14from
improvement/ZENKO-5221

Conversation

@SylvainSenechal
Copy link
Contributor

Issue: ZENKO-5221

Rule 3 from "how to write tests" :

3. Do not reconfigure the environment if it affects other tests' resources.

If a reconfiguration triggers a reconciliation of the resources, running tests
might be affected. Also, it restarts the services, which leads to missing logs,
harder debugging, and longer test execution.

For examples, location creation or bucket Website endpoints, that triggers a
restart of multiple services, must be created before the test starts, or in a
dedicated set of tests, that is, a set of test executed before all tests having
a specific tag used for identifying the feature(s).

Note: Testing the reconfiguration of the environment is recommended, but it
should be done carefully to not affect other tests.

@bert-e
Copy link
Contributor

bert-e commented Mar 17, 2026

Hello sylvainsenechal,

My role is to assist you with the merge of this
pull request. Please type @bert-e help to get information
on this process, or consult the user documentation.

Available options
name description privileged authored
/after_pull_request Wait for the given pull request id to be merged before continuing with the current one.
/bypass_author_approval Bypass the pull request author's approval
/bypass_build_status Bypass the build and test status
/bypass_commit_size Bypass the check on the size of the changeset TBA
/bypass_incompatible_branch Bypass the check on the source branch prefix
/bypass_jira_check Bypass the Jira issue check
/bypass_peer_approval Bypass the pull request peers' approval
/bypass_leader_approval Bypass the pull request leaders' approval
/approve Instruct Bert-E that the author has approved the pull request. ✍️
/create_pull_requests Allow the creation of integration pull requests.
/create_integration_branches Allow the creation of integration branches.
/no_octopus Prevent Wall-E from doing any octopus merge and use multiple consecutive merge instead
/unanimity Change review acceptance criteria from one reviewer at least to all reviewers
/wait Instruct Bert-E not to run until further notice.
Available commands
name description privileged
/help Print Bert-E's manual in the pull request.
/status Print Bert-E's current status in the pull request TBA
/clear Remove all comments from Bert-E from the history TBA
/retry Re-start a fresh build TBA
/build Re-start a fresh build TBA
/force_reset Delete integration branches & pull requests, and restart merge process from the beginning.
/reset Try to remove integration branches unless there are commits on them which do not appear on the source branch.

Status report is not available.

@bert-e
Copy link
Contributor

bert-e commented Mar 17, 2026

Waiting for approval

The following approvals are needed before I can proceed with the merge:

  • the author

  • 2 peers

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@francoisferrand
Let me know if you find this PR irrelevant.
I understand there is a conflict between what we ultimately want to achieve in the tests :

  • On one side, I guess this PR should probably not exists, and instead we should aim to have a software thats stable and functional even we update the overlay
  • On the other side, if we really just wanna check that the feature bucket website works, and avoid affecting other tests, then I guess this pr is ok, doing the overlay configuration before the tests still allows us to test that the management api is working

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On one side, I guess this PR should probably not exists, and instead we should aim to have a software thats stable and functional even we update the overlay

I agree it should not indeed:

  • because it should work in any case. Even if they involve modifying the overlay, this is a normal operation which should not disrupt the cluster....and thus not disrupt the other tests either
  • this "fix" does not solve the actual problem causing the flakiness, it just avoids it....and we will still have the underlying problem

On the other side, if we really just wanna check that the feature bucket website works, and avoid affecting other tests, then I guess this pr is ok, doing the overlay configuration before the tests still allows us to test that the management api is working

  • "avoid affecting other tests" is not a goal in itself ; at most a mean to avoid flakiness
  • the approach works with the current test, but only because the "website CRUD test" actually only test the creation.... when we add update/delete, it will not work anymore
  • moving the creation out of the test makes more coupling between setup & tests:
    • which is the opposite direction than where we want to go ("portable" tests that can be played against any cluster, idempotentcy, minimize World and configuration variables...)
    • which makes it harder to understand the test (issues may come from more places, or all the machinery in between) and troubleshoot it (less temporary locality, the test actually spans from setup to the moment it is actually executed)

Overall,
→ we are missing a few tests here, should create a followup ticket
→ this confirms some flakiness related to overlay update : not a solution, but knowing we should search there is progress, and should us to think both of ways this could make test fail & ways to better observe such failure path [*]
→ the overlay is a big blob, and actually quite a lot of things may be updated when it changes. We may want to optimize this eventually, and start by identifying exactly what depends on it / reviewing if it is relevant / maybe finding ways to be more granular

[*] the other day, you had a graph/timeline of tests. If we also display k8s "events" (pod restarts...) and reconciliations on top of it, this may help identify patterns of failures or correlations...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok I think we will rediscuss this

# Add website endpoint to the overlay so it doesn't trigger reconciliation during tests
INSTANCE_ID=$(kubectl get zenko ${ZENKO_NAME} -o jsonpath='{.status.instanceID}')
MGMT_TOKEN=$(get_token)
curl -sf -X POST \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is not idem-potent, so it would prevent "re-running" the tests (no problem in CI, but locally or in codespace) : is this not an issue?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a 409 error check that I tested in codespace, should be idempotent now

@SylvainSenechal SylvainSenechal force-pushed the improvement/ZENKO-5221 branch from b155d74 to 79b4f91 Compare March 18, 2026 14:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants