rework website ctst test to preconfigure the overlay before running t…#2356
rework website ctst test to preconfigure the overlay before running t…#2356SylvainSenechal wants to merge 1 commit intodevelopment/2.14from
Conversation
Hello sylvainsenechal,My role is to assist you with the merge of this Available options
Available commands
Status report is not available. |
Waiting for approvalThe following approvals are needed before I can proceed with the merge:
|
There was a problem hiding this comment.
@francoisferrand
Let me know if you find this PR irrelevant.
I understand there is a conflict between what we ultimately want to achieve in the tests :
- On one side, I guess this PR should probably not exists, and instead we should aim to have a software thats stable and functional even we update the overlay
- On the other side, if we really just wanna check that the feature bucket website works, and avoid affecting other tests, then I guess this pr is ok, doing the overlay configuration before the tests still allows us to test that the management api is working
There was a problem hiding this comment.
On one side, I guess this PR should probably not exists, and instead we should aim to have a software thats stable and functional even we update the overlay
I agree it should not indeed:
- because it should work in any case. Even if they involve modifying the overlay, this is a normal operation which should not disrupt the cluster....and thus not disrupt the other tests either
- this "fix" does not solve the actual problem causing the flakiness, it just avoids it....and we will still have the underlying problem
On the other side, if we really just wanna check that the feature bucket website works, and avoid affecting other tests, then I guess this pr is ok, doing the overlay configuration before the tests still allows us to test that the management api is working
- "avoid affecting other tests" is not a goal in itself ; at most a mean to avoid flakiness
- the approach works with the current test, but only because the "website CRUD test" actually only test the creation.... when we add update/delete, it will not work anymore
- moving the creation out of the test makes more coupling between setup & tests:
- which is the opposite direction than where we want to go ("portable" tests that can be played against any cluster, idempotentcy, minimize
Worldand configuration variables...) - which makes it harder to understand the test (issues may come from more places, or all the machinery in between) and troubleshoot it (less temporary locality, the test actually spans from setup to the moment it is actually executed)
- which is the opposite direction than where we want to go ("portable" tests that can be played against any cluster, idempotentcy, minimize
Overall,
→ we are missing a few tests here, should create a followup ticket
→ this confirms some flakiness related to overlay update : not a solution, but knowing we should search there is progress, and should us to think both of ways this could make test fail & ways to better observe such failure path [*]
→ the overlay is a big blob, and actually quite a lot of things may be updated when it changes. We may want to optimize this eventually, and start by identifying exactly what depends on it / reviewing if it is relevant / maybe finding ways to be more granular
[*] the other day, you had a graph/timeline of tests. If we also display k8s "events" (pod restarts...) and reconciliations on top of it, this may help identify patterns of failures or correlations...
There was a problem hiding this comment.
Ok I think we will rediscuss this
| # Add website endpoint to the overlay so it doesn't trigger reconciliation during tests | ||
| INSTANCE_ID=$(kubectl get zenko ${ZENKO_NAME} -o jsonpath='{.status.instanceID}') | ||
| MGMT_TOKEN=$(get_token) | ||
| curl -sf -X POST \ |
There was a problem hiding this comment.
this is not idem-potent, so it would prevent "re-running" the tests (no problem in CI, but locally or in codespace) : is this not an issue?
There was a problem hiding this comment.
I added a 409 error check that I tested in codespace, should be idempotent now
b155d74 to
79b4f91
Compare
Issue: ZENKO-5221
Rule 3 from "how to write tests" :
3. Do not reconfigure the environment if it affects other tests' resources.
If a reconfiguration triggers a reconciliation of the resources, running tests
might be affected. Also, it restarts the services, which leads to missing logs,
harder debugging, and longer test execution.
For examples, location creation or bucket Website endpoints, that triggers a
restart of multiple services, must be created before the test starts, or in a
dedicated set of tests, that is, a set of test executed before all tests having
a specific tag used for identifying the feature(s).
Note: Testing the reconfiguration of the environment is recommended, but it
should be done carefully to not affect other tests.