NO-ISSUE: Synchronize From Upstream Repositories#710
openshift-bot wants to merge 93 commits into openshift:main
- Rename local variable `tag` to `bundleTag` in catalog.Build to avoid shadowing the method's `tag` parameter
- Fail the step with an error on invalid LargeCRD field counts in the bundle option parser; use strconv.ParseInt with bitSize 0 to also reject non-positive values
- Look up channels by name in TestCatalog_FBCGeneration instead of assuming insertion order
- Use errgroup.Group with SetLimit(8) in ScenarioCleanup deletion loop to bound concurrent goroutines and kubectl calls
- Clarify README that ClusterObjectSet deletion is conditional on the BoxcutterRuntime feature gate
- Add missing `text` language specifier to fenced code block in e2e-isolation design doc
- Remove extra blank line in setup.sh

Signed-off-by: Todd Short <tshort@redhat.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
@openshift-bot: This pull request explicitly references no jira issue.

If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

[APPROVALNOTIFIER] This PR is APPROVED

Approval requirements bypassed by manually added approval.

This pull-request has been approved by: openshift-bot
Walkthrough

Adds HA-focused e2e tests and related step implementations, adjusts catalog server to serve on every pod (readiness probe, non-leader runnable), tightens parsing/validation and cleanup concurrency in e2e tests, increments replica counts for experimental deployments, and updates minor docs, tests, and build settings.
Sequence Diagram(s)

sequenceDiagram
participant Test as Test Runner
participant K8s as Kubernetes API
participant Pod as catalogd Pod (leader)
participant Manager as controller-runtime Manager (pod instances)
participant Catalog as catalogd Server
Test->>K8s: create catalog, wait for readiness
K8s-->>Pod: schedule/start leader Pod
Pod->>Catalog: bind catalog port, report ready
Test->>K8s: force-delete leader Pod
K8s-->>Pod: TERMINATE
Manager->>Manager: leader election (other replicas)
Manager->>Catalog: start runnable on other Pod(s)
Catalog->>Test: new leader elected, serving readiness true
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~35 minutes

🚥 Pre-merge checks | ✅ 9 | ❌ 3

❌ Failed checks (2 warnings, 1 inconclusive)
✅ Passed checks (9 passed)
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
test/e2e/steps/steps.go (1)
1661-1708: ⚠️ Potential issue | 🟠 Major

Reject unknown/malformed `contents` tokens instead of silently ignoring them.

`parseContents` now returns an error, but unknown tokens still fall through and are ignored. That can hide typos and let scenarios pass with unintended bundle contents.

🔧 Suggested hardening

     func parseContents(contents string) ([]catalog.BundleOption, error) {
         contents = strings.TrimSpace(contents)
         if contents == "" {
             return nil, nil
         }
         if strings.EqualFold(contents, "BadImage") {
             return []catalog.BundleOption{catalog.BadImage()}, nil
         }
         var opts []catalog.BundleOption
         for _, part := range strings.Split(contents, ",") {
             part = strings.TrimSpace(part)
    +        if part == "" {
    +            continue
    +        }
             switch {
             case part == "CRD":
                 opts = append(opts, catalog.WithCRD())
             case part == "Deployment":
                 opts = append(opts, catalog.WithDeployment())
             case part == "ConfigMap":
                 opts = append(opts, catalog.WithConfigMap())
             case strings.HasPrefix(part, "Property(") && strings.HasSuffix(part, ")"):
                 inner := part[len("Property(") : len(part)-1]
    -            if k, v, ok := strings.Cut(inner, "="); ok {
    -                opts = append(opts, catalog.WithBundleProperty(k, v))
    -            }
    +            k, v, ok := strings.Cut(inner, "=")
    +            if !ok || strings.TrimSpace(k) == "" || strings.TrimSpace(v) == "" {
    +                return nil, fmt.Errorf("invalid Property token %q: expected Property(type=value)", part)
    +            }
    +            opts = append(opts, catalog.WithBundleProperty(k, v))
             case strings.HasPrefix(part, "InstallMode(") && strings.HasSuffix(part, ")"):
                 mode := part[len("InstallMode(") : len(part)-1]
                 opts = append(opts, catalog.WithInstallMode(v1alpha1.InstallModeType(mode)))
             case strings.HasPrefix(part, "LargeCRD(") && strings.HasSuffix(part, ")"):
                 countStr := part[len("LargeCRD(") : len(part)-1]
                 count, err := strconv.ParseInt(countStr, 10, 0)
                 if err != nil || count <= 0 {
                     return nil, fmt.Errorf("invalid LargeCRD field count %q: must be a positive integer", countStr)
                 }
                 opts = append(opts, catalog.WithLargeCRD(int(count)))
             case strings.HasPrefix(part, "ClusterRegistry(") && strings.HasSuffix(part, ")"):
                 host := part[len("ClusterRegistry(") : len(part)-1]
                 opts = append(opts, catalog.WithClusterRegistry(host))
             case strings.HasPrefix(part, "StaticBundleDir(") && strings.HasSuffix(part, ")"):
                 dir := part[len("StaticBundleDir(") : len(part)-1]
                 absDir := filepath.Join(projectRootDir(), dir)
                 opts = append(opts, catalog.StaticBundleDir(absDir))
    +        default:
    +            return nil, fmt.Errorf("unknown bundle content token %q", part)
             }
         }
         return opts, nil
     }

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/e2e/steps/steps.go` around lines 1661 - 1708, parseContents silently ignores unknown or malformed tokens which hides typos; update parseContents to return an error when a part does not match any recognized pattern (CRD, Deployment, ConfigMap, Property(...), InstallMode(...), LargeCRD(...), ClusterRegistry(...), StaticBundleDir(...), BadImage or empty) or when a recognized pattern is malformed (e.g., Property without "="). In the loop over parts in parseContents, after trimming each part, detect empty parts (from extra commas) and treat them as errors or skip consistently, and if none of the switch cases handled the part, return fmt.Errorf with the offending token; also return errors for malformed Property(...) and other pattern-parsing failures so callers get immediate feedback.
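To make the effect of the suggested hardening concrete, here is a reduced, self-contained sketch of a strict token parser. It is an illustration only: `option` and `parseTokens` are hypothetical stand-ins for `catalog.BundleOption` and `parseContents`, and only a subset of the token kinds is handled. Note that `strconv.ParseInt` with bitSize 0 only bounds the value to the platform `int` range; it is the explicit `count <= 0` comparison that rejects zero and negative counts.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// option is a hypothetical stand-in for catalog.BundleOption.
type option struct {
	Kind  string
	Value string
}

// parseTokens parses a comma-separated token list and fails on the first
// unknown or malformed token instead of silently skipping it.
func parseTokens(contents string) ([]option, error) {
	var opts []option
	for _, part := range strings.Split(contents, ",") {
		part = strings.TrimSpace(part)
		if part == "" {
			continue // tolerate stray commas, as in the suggested diff
		}
		switch {
		case part == "CRD" || part == "Deployment" || part == "ConfigMap":
			opts = append(opts, option{Kind: part})
		case strings.HasPrefix(part, "Property(") && strings.HasSuffix(part, ")"):
			inner := part[len("Property(") : len(part)-1]
			k, v, ok := strings.Cut(inner, "=")
			if !ok || strings.TrimSpace(k) == "" || strings.TrimSpace(v) == "" {
				return nil, fmt.Errorf("invalid Property token %q: expected Property(type=value)", part)
			}
			opts = append(opts, option{Kind: "Property", Value: k + "=" + v})
		case strings.HasPrefix(part, "LargeCRD(") && strings.HasSuffix(part, ")"):
			countStr := part[len("LargeCRD(") : len(part)-1]
			// bitSize 0 limits the result to the range of int; the sign
			// check below is what rejects zero and negative values.
			count, err := strconv.ParseInt(countStr, 10, 0)
			if err != nil || count <= 0 {
				return nil, fmt.Errorf("invalid LargeCRD field count %q: must be a positive integer", countStr)
			}
			opts = append(opts, option{Kind: "LargeCRD", Value: strconv.FormatInt(count, 10)})
		default:
			return nil, fmt.Errorf("unknown bundle content token %q", part)
		}
	}
	return opts, nil
}

func main() {
	for _, in := range []string{
		"CRD, Deployment, Property(olm.test=yes)",
		"CRD, Deploymnet", // typo: rejected rather than ignored
		"LargeCRD(-3)",
		"Property(novalue)",
	} {
		opts, err := parseTokens(in)
		fmt.Printf("%q -> opts=%v err=%v\n", in, opts, err)
	}
}
```

With the `default` branch in place, a misspelled token fails the scenario step immediately instead of producing a bundle with fewer contents than intended.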
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: bf6bbb55-1573-4b42-9326-cc2ab1b49755
📒 Files selected for processing (7)
- docs/designs/testing/2026-04-13-e2e-isolation/design.md
- test/e2e/README.md
- test/e2e/steps/hooks.go
- test/e2e/steps/steps.go
- test/extension-developer-e2e/setup.sh
- test/internal/catalog/catalog.go
- test/internal/catalog/catalog_test.go
💤 Files with no reviewable changes (1)
- test/extension-developer-e2e/setup.sh
* fix(catalogd): bind catalog HTTP port lazily; add readiness check

  The catalog HTTP server has OnlyServeWhenLeader: true, so only the leader pod should serve catalog content. Previously, net.Listen was called eagerly at startup for all pods: the listen socket was bound on non-leaders even though http.Serve was never called, causing TCP connections to queue without being served. With replicas > 1 this made ~50% of catalog content requests fail silently.

  Replace manager.Server with a custom Runnable (catalogServerRunnable) in serverutil that:
  - Binds the catalog port lazily inside Start(), which is only called on the leader by controller-runtime's leader election machinery.
  - Closes a ready channel once the listener is established, and registers a channel-select readiness check via AddReadyzCheck so non-leader pods fail the /readyz probe and are excluded from Service endpoints.

  This keeps cmd/catalogd/main.go health/readiness setup identical to cmd/operator-controller/main.go (healthz.Ping for both liveness and readiness); the catalog-server readiness check is an implementation detail of serverutil.AddCatalogServerToManager.

  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(experimental): run catalogd and operator-controller with 2 replicas

  The experimental e2e suite uses a 2-node kind cluster, making it a natural fit to validate HA behaviour. Set replicas=2 for both components in helm/experimental.yaml so the experimental and experimental-e2e manifests exercise the multi-replica path end-to-end.

  This is safe for operator-controller (no leader-only HTTP servers) and for catalogd now that the catalog server starts on all pods via NeedLeaderElection=false, preventing the rolling-update deadlock that would arise if the server were leader-only.

  Also adds a @CatalogdHA experimental e2e scenario that force-deletes the catalogd leader pod and verifies that a new leader is elected and the catalog resumes serving. The scenario is gated on a 2-node cluster (detected in BeforeSuite and reflected in the featureGates map), so it is automatically skipped in the standard 1-node e2e suite.

  The experimental e2e timeout is bumped from 20m to 25m to accommodate leader re-election time (~163s worst case).

  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
  Signed-off-by: Todd Short <tshort@redhat.com>

---------

Signed-off-by: Todd Short <tshort@redhat.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Bumps [github.com/google/go-containerregistry](https://github.com/google/go-containerregistry) from 0.21.4 to 0.21.5.
- [Release notes](https://github.com/google/go-containerregistry/releases)
- [Commits](google/go-containerregistry@v0.21.4...v0.21.5)

---
updated-dependencies:
- dependency-name: github.com/google/go-containerregistry
  dependency-version: 0.21.5
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Signed-off-by: dtfranz <dfranz@redhat.com> UPSTREAM: <carry>: Update generate-manifests to handle new directory The `default` directory was renamed `base`. Signed-off-by: Todd Short <todd.short@me.com> The `base` directory was moved to `base\operator-controller`. Signed-off-by: Todd Short <todd.short@me.com> UPSTREAM: <carry>: Drop commitchecker Signed-off-by: Alexander Greene <greene.al1991@gmail.com> UPSTREAM: <carry>: Updating ose-olm-operator-controller-container image to be consistent with ART Reconciling with https://github.com/openshift/ocp-build-data/tree/4022cd290f00a44d667dda03f2d78d84a488c7ed/images/ose-olm-operator-controller.yml UPSTREAM: <carry>: update owners * Remove alumni from owners * Add m1kola to approvers Signed-off-by: Mikalai Radchuk <mradchuk@redhat.com> UPSTREAM: <carry>: Add pointer to tooling README UPSTREAM: <carry>: Disable Validating Admission Policy APIs downstream Signed-off-by: Mikalai Radchuk <mradchuk@redhat.com> UPSTREAM: <carry>: Updating ose-olm-operator-controller-container image to be consistent with ART for 4.16 Reconciling with https://github.com/openshift/ocp-build-data/tree/6250d54c4686a708ca5985afb73080e8ca9a1f7f/images/ose-olm-operator-controller.yml UPSTREAM: <carry>: Enable Validating Admission Policy APIs downstream * This reverts commit 3f079c4. * Includes Validating Admission Policy manifests Signed-off-by: Mikalai Radchuk <mradchuk@redhat.com> UPSTREAM: <carry>: manifests: set required-scc for openshift workloads UPSTREAM: <carry>: Updating ose-olm-operator-controller-container image to be consistent with ART for 4.17 Reconciling with https://github.com/openshift/ocp-build-data/tree/4c1326094222f9209876f06833179a1b9178faf7/images/ose-olm-operator-controller.yml UPSTREAM: <carry>: add everettraven to approvers+reviewers Signed-off-by: everettraven <everettraven@gmail.com> UPSTREAM: <carry>: add openshift kustomize overlay to enable TLS communication with catalogd. 
Configure the CA certs using the configmap injection method via service-ca-operator Signed-off-by: everettraven <everettraven@gmail.com> UPSTREAM: <carry>: Add tmshort to approvers Also `s/runtime/framework/g` in the DOWNSTREAM_OWNERS Signed-off-by: Todd Short <todd.short@me.com> UPSTREAM: <carry>: Updating ose-olm-operator-controller-container image to be consistent with ART for 4.18 Reconciling with https://github.com/openshift/ocp-build-data/tree/dd68246f3237db5db458127566fc7b05b55e1660/images/ose-olm-operator-controller.yml UPSTREAM: <carry>: Properly copy and call kustomize Signed-off-by: Todd Short <todd.short@me.com> UPSTREAM: <carry>: manifests: add hostPath mount for /etc/containers Signed-off-by: Joe Lanford <joe.lanford@gmail.com> UPSTREAM: <carry>: Add test-e2e target for downstream Makefile to be run by openshift/release. Signed-off-by: dtfranz <dfranz@redhat.com> UPSTREAM: <carry>: Add downstream verify makefile target Signed-off-by: dtfranz <dfranz@redhat.com> UPSTREAM: <carry>: openshift: template log verbosity to be managed by cluster-olm-operator Signed-off-by: Joe Lanford <joe.lanford@gmail.com> UPSTREAM: <carry>: Add global-pull-secret flag Pass global-pull-secret to the manager container. Signed-off-by: Mikalai Radchuk <mradchuk@redhat.com> UPSTREAM: <carry>: Update openshift CAs to operator-controller The /run/secrets/kubernetes.io/serviceaccount/ directory is projected into the pod and contains the following CA certificates: * configmap/kube-root-ca.crt as ca.crt * configmap/openshift-service-ca.crt as service-ca.crt Update the --ca-certs-dir argument to reference the directory. 
Signed-off-by: Todd Short <todd.short@me.com> UPSTREAM: <carry>: Add HowTo for origin tests Signed-off-by: Todd Short <todd.short@me.com> UPSTREAM: <carry>: Add e2e registry Dockerfile Signed-off-by: dtfranz <dfranz@redhat.com> UPSTREAM: <carry>: add nodeSelector and tolerations to operator-controller deployment via kustomize patch Signed-off-by: everettraven <everettraven@gmail.com> UPSTREAM: <carry>: namespace: use privileged PSA for audit and warn levels Signed-off-by: Joe Lanford <joe.lanford@gmail.com> UPSTREAM: <carry>: Enable downstream e2e Signed-off-by: dtfranz <dfranz@redhat.com> UPSTREAM: <carry>: Remove m1kola from owners Signed-off-by: Mikalai Radchuk <mradchuk@redhat.com> UPSTREAM: <carry>: Updating ose-olm-operator-controller-container image to be consistent with ART for 4.19 Reconciling with https://github.com/openshift/ocp-build-data/tree/a39508c86497b4e5e463d7b2c78e51e577be9e7d/images/ose-olm-operator-controller.yml UPSTREAM: <carry>: generate and mount service-ca server cert Signed-off-by: Joe Lanford <joe.lanford@gmail.com> UPSTREAM: <carry>: Add support for proxy trustedCAs Just map the list of trusted ca certs into the deployment Signed-off-by: Todd Short <todd.short@me.com> UPSTREAM: <carry>: Fix error to build the image Copy correct (new) executable name for operator-controller Signed-off-by: Todd Short <todd.short@me.com> UPSTREAM: <carry>: Fix make verify for mac os envs Joe Lanford <joe.lanford@gmail.com> UPSTREAM: <carry>: Move operator-controller openshift files to its own dir UPSTREAM: <carry>: Upgrade OCP images from 4.18 to 4.19 UPSTREAM: <carry>: Add Openshift's catalogd manifests - Move to openshift/catalogd the specific manifest under: https://github.com/openshift/operator-framework-catalogd/tree/main/openshift - Add call to generate catalogd manifest to 'make manifest'. 
Make verify test is now done for catalogd and operator-controller Openshift's manifests UPSTREAM: <carry>: resolve issue with pre-mature mounting of trusted CA configmap Signed-off-by: Joe Lanford <joe.lanford@gmail.com> UPSTREAM: <carry>: Add /etc/docker to the operator-controller and catalogd deployments This allows for use of the any image.config.openshift.io trusted CAs Signed-off-by: Todd Short <todd.short@me.com> UPSTREAM: <carry>: fixup catalogd.Dockerfile paths Signed-off-by: Joe Lanford <joe.lanford@gmail.com> UPSTREAM: <carry>: Resolve issue with pre-mature mounting of service CA configmap Signed-off-by: Todd Short <todd.short@me.com> UPSTREAM: <carry>: use projected volume for CAs to avoid subPath limitations Signed-off-by: Joe Lanford <joe.lanford@gmail.com> UPSTREAM: <carry>: Revert "UPSTREAM: <carry>: use projected volume for CAs to avoid subPath limitations" This reverts commit 548caa4. UPSTREAM: <carry>: use projected volume for CAs to avoid subPath limitations Signed-off-by: Joe Lanford <joe.lanford@gmail.com> UPSTREAM: <carry>: Remove vet from openshift verify The `vet` target was removed upstream. Signed-off-by: Todd Short <todd.short@me.com> UPSTREAM: <carry>: Skip another upstream test Signed-off-by: Todd Short <todd.short@me.com> UPSTREAM: <carry>: Cleanup openshift/Makefile by removing no longer required comments regards catalogd e2e tests UPSTREAM: <carry>: Enable OCP metrics collection by default Enables OCP to collect Prometheus metrics for both catalogd and operator-controller by default. This is accomplished via ServiceMonitor CRs which are now created for both projects. 
UPSTREAM: <carry>: Fix catalogd.Dockerfile to use new paths The root catalogd directory has been removed Signed-off-by: Todd Short <todd.short@me.com> UPSTREAM: <carry>: Update DOWNSTREAM_OWNERS_ALIASES Signed-off-by: Todd Short <todd.short@me.com> UPSTREAM: <carry>: Add openshift node selector annotation Signed-off-by: Catherine Chan-Tse <cchantse@redhat.com> (cherry picked from commit 9b4a113) UPSTREAM: <carry>: Add caalogd-cas-dir option to op-con Signed-off-by: Todd Short <todd.short@me.com> UPSTREAM: <carry>: set the SElinux type Signed-off-by: Jian Zhang <jiazha@redhat.com> UPSTREAM: <carry>: Add initial stack to run tests to validate the catalogs UPSTREAM: <carry>: Add vendor files for the catalog-sync tests UPSTREAM: <carry>: Bump catalog versions to 4.19 Signed-off-by: Todd Short <todd.short@me.com> UPSTREAM: <carry>: revert "Bump catalog versions to 4.19" This reverts commit a98980b. UPSTREAM: <carry>: Update HOWTO-origin-tests techpreview is no longer a required option. Signed-off-by: Todd Short <todd.short@me.com> UPSTREAM: <carry>: [DefaultCatalogTests]: Allow to pass auth path for docker credentials" UPSTREAM: <carry>: fix: set NoLchown=true to allow image unpack on OCPci UPSTREAM: <carry>: [DefaultCatalogTests]: Moving parse of ENVVAR to the caller (follow-up 345) UPSTREAM: <carry>: [Default Catalog]: Create tmp dir to extract layers with right permissions to avoid issues scenarios UPSTREAM: <carry>: [Default Catalog](cleanp) Remove hack directory which is not used UPSTREAM: <carry>: Change code implementation to extract layers in OCP env UPSTREAM: <carry>: Add vendor files for change in the extract code implementation UPSTREAM: <carry>: [Default Catalog Tests]: Final cleanups and enhancements of initial implementation UPSTREAM: <carry>: SELinux type for operator-controller Signed-off-by: Jian Zhang <jiazha@redhat.com> UPSTREAM: <carry>: Bump catalog versions to 4.19 Signed-off-by: Todd Short <todd.short@me.com> UPSTREAM: <carry>: [Default Catalog 
Consistency Test] (feat) add check for executable files in filesystem Checks if given paths exist and point to executable files or valid symlinks. UPSTREAM: <carry>: [Default Catalog Consistency Test]: fix junit output format to allow generate xml UPSTREAM: <carry>: [Default Catalog Consistency Test] (feat) add check to validate multi-arch support UPSTREAM: <carry>: [Default Catalog Consistency Test]: Enable CatalogChecks UPSTREAM: <carry>: [Default Catalog Consistency Test]: Rename Tests suite and small cleanups UPSTREAM: <carry>: Updating ose-olm-operator-controller-container image to be consistent with ART for 4.20 Reconciling with https://github.com/openshift/ocp-build-data/tree/dfb5c7d531490cfdc61a3b88bc533702b9624997/images/ose-olm-operator-controller.yml UPSTREAM: <carry>: Updating ose-olm-catalogd-container image to be consistent with ART for 4.20 Reconciling with https://github.com/openshift/ocp-build-data/tree/dfb5c7d531490cfdc61a3b88bc533702b9624997/images/ose-olm-catalogd.yml UPSTREAM: <carry>: Update e2e registry to use 1.24/4.20 Update the e2e registry Dockerfile to use golang 1.24/OCP 4.20 Signed-off-by: Todd Short <todd.short@me.com> UPSTREAM: <carry>: [Catalog Default Tests]: Upgrade go version to 1.24.3, dependencies and fix new lint issue UPSTREAM: <carry>: Add structure to allow move the orgin tests using OTE This commit introduces a binary and supporting structure to enable the execution of OpenShift origin (olmv1) tests using the Open Test Environment (OTE). It lays the groundwork for moving origin test in openshift/origin to be executed from this repository using OTE. UPSTREAM: <carry>: Add support for experimental manifests Update the openshift kustomize configuration for both operator-controller and catalogd. Update the manifest generation scripts to put the core generation code into a function (ignore-whitespace will help with the review), so that it can be called twice; once for standard, and once for experimental. 
Move around some of the kustomization directives to * Create a patch kustomization (Component) file and move the patch directives from olmv1-ns there. This allows it to be referenced from a different directory. * Add a kustomization file for tusted-ca. This allows it to be referenced from a different directory. * Move the setting of the namePrefix for operator-controller; this makes the generation compatible with upstream feature components. * Define experimental kustomization files that reference existing components. * Reference the correct CRDs (standard or experimental). * Add references to upstream feature components into the experimental manifests. This *will* add `--feature-gates` options from the upstream feature components to the experimental manifests. The cluster-olm-operator will strip those arguments from the deployments before adding the enabled feature gates. Update the Dockerfiles to include the experimental manifests and a copy script (`cp-manifests`) into the image containers. The complexity of having multiple sets of manifests mean that the simple initContainer copy mechanism found in cluster-olm-operator is no longer sufficient. This attempts to keep backwards compatibility with older versions of cluster-olm-operator, specifically by keeping the original (standard) manifests in the original location, and adding the experimental manifests in a new directory. The new `cp-manifests` script is used by newer versions of cluster-olm-operator. Signed-off-by: Todd Short <todd.short@me.com> UPSTREAM: <carry>: [OTE] - chore: follow up openshift#383 – remove unreachable target call UPSTREAM: <carry>: Remove build of test image registry Upstream now uses a different image Signed-off-by: Todd Short <todd.short@me.com> UPSTREAM: <carry>: Add test-experimental-e2e target to openshift Makefile This adds a test-experimental-e2e target to allow the CI to run the experimental e2e test. 
Signed-off-by: Todd Short <todd.short@me.com> UPSTREAM: <carry>: [OTE]: Add binary in the operator controller image to allow proper integration with OCP tests UPSTREAM: <carry>: Fix experimental manifest copying The standard manifest was being copied rather than the experimental manifest. This meant that the expected feature-flags are not present. This is failing now that we are doing a check for those feature-flags. Signed-off-by: Todd Short <todd.short@me.com> UPSTREAM: <carry>: Update manifest generation for upstream rbac/webhooks Signed-off-by: Todd Short <todd.short@me.com> UPSTREAM: <carry>: [OTE] - Add tracking mechanism UPSTREAM: <carry>: Update OTE dep to get fix UPSTREAM: <carry>: [OTE] Add Readme UPSTREAM: <carry>: set GIT_COMMIT env from SOURCE_GIT_COMMIT in Dockerfiles for operator-controller and catalogd Signed-off-by: Rashmi Gottipati <chowdary.grashmi@gmail.com> UPSTREAM: <carry>: add openshift specific build target to pass commit info downstream Signed-off-by: Ankita Thomas <ankithom@redhat.com> UPSTREAM: <carry>: add source commit into binaries when linking - Removes extra GIT_COMMIT set - fixup Dockerfiles after rebase - consider "" unset so build-info can fill commit/date - double quote go flags & honor GIT_COMMIT if set - improve robustness of build-info parsing - Trim whitespace on all version fields - isUnset and valueOrUnknown now call strings.TrimSpace - Avoid clobbering values injected via ldflags - set repoState from build-info only when repoState is still unset - set version from build-info only when unset and build-info value is non-empty UPSTREAM: <carry>: OTE add first test from openshift/origin olmv1.go UPSTREAM: <carry>: Migrate tasks from openshift/origin olm v1.go file which are remaining This commit moves the final OLMv1 tests from openshift/origin/test/extended/olm/olmv1.go to their proper location in this repository. 
This migration is part of a larger effort to streamline development by co-locating tests with the component they validate. This will reduce CI overhead and allow for faster, more atomic changes. Assisted-by: Gemini UPSTREAM: <carry>: OTE - How to test locally with OCP instances UPSTREAM: <carry>: [OTE] Refac: refac helper and olmv1 test to create namespace instead to use pre-existent UPSTREAM: <carry>: [OTE] add webhook tests Migrates OLMv1 webhook operator tests from using external YAML files to defining resources in Go structs. This change removes file dependencies, improving test reliability and simplifying test setup. The migration is a refactoring of code from openshift/origin#30059. The new code uses better naming conventions and adapts the tests to work with a controller-runtime client, enhancing test consistency and maintainability. The migration covers all core test scenarios: - Validating, mutating, and conversion webhooks. - Certificate and secret rotation tolerance. Assisted-by: Gemini UPSTREAM: <carry>: OTE: rewrite the upgrade incompatible operator test This test replaces the existing upgrade incompatible test. The main change is that operator and catalog bundles are created on-the-fly to support OCP 4.20. This means we are no longer dependent on public operators for this test. This creates new bundles in the OCP ImageRegistry, this requires using a number of OCP APIs, including using a raw API URL to invoke the build. This is done by invoking an external k8s client (either `oc` or `kubectl`), and passing it a tarball of the bundle to be created. So, it can't be done by the golang k8sClient normally available (i.e. the create input is a tarball not a YAML file). This introduces the use of go-bindata to store the bundle contents. It also pulls in openshift mage, buld and operator APIs. 
Signed-off-by: Todd Short <todd.short@me.com> UPSTREAM: <carry>: Handle service-ca cert availability/rotation There is problem when the service-ca certificate is not available at pod start. This is an issue because the SystemCertPool is created from SSL_CERT_DIR, which may include the empty service-ca. The SystemCertPool is never regenerated during the lifetime of the program execution, so it will never get updated when the service-ca is filled. Thus, we need to use --pull-cas-dir to reference the CAs that we want to use. This will also allow OLMv1 to reload the service-ca when it is reloaded (after 2 years, mind you). Removing the SSL_CERT_DIR setting, and adding the --pull-cas-dir flag ought to be equivalent to what we have now (i.e. SSL_CERT_DIR and no --pull-cas-dir), except that rotation will be handled better. Signed-off-by: Todd Short <todd.short@me.com> UPSTREAM: <carry>: [OTE] add webhook tests Revert "UPSTREAM: <carry>: [OTE] add webhook tests" This reverts commit 9963614. UPSTREAM: <carry>: Upgrade OCP Catalog images from 4.19 to 4.20 UPSTREAM: <carry>: Remove bindata generation from build Using go-bindata is causing problems with ART builds. This removes the use of go-bindata from the builds. This will subsequently require that users MANUALLY run the `bindata` target to refresh the bindata, or use the `build-update` target. This is a quickfix to put out the fire. Signed-off-by: Todd Short <todd.short@me.com> UPSTREAM: <carry>: [OTE] Add webhook tests - Add dumping of container logs and `kubectl describe pods` output for better diagnostics. - Include targeted certificate details dump (`tls.crt` parse) when failures occur. - Add additional check to verify webhook responsiveness after certificate rotation. This change is a refactor of code from openshift/origin#30059. 
Assisted-by: Gemini UPSTREAM: <carry>: OTE add logs and dumps for olmv1 test and fix helper for clusterextensions UPSTREAM: <carry>: [OTE] Migrate preflight checks from openshift/origin Migrated OLMv1 operator preflight checks from using external YAML files to defining ClusterRole permissions directly in Go structs. This improves test reliability and simplifies test setup by removing file dependencies. The changes ensure precise replication of original test scenarios, including specific permission omissions for services, create verbs, ClusterRoleBindings, ConfigMap resourceNames, and escalate/bind verbs. Assisted-by: Gemini UPSTREAM: <carry>: [OTE] Add webhook to validate openshift-service-ca certificate rotation This change is a refactor of code from openshift/origin#30059. Assisted-by: Gemini UPSTREAM: <carry>: Adds ResourceVersion checks to the tls secret deletion test, mirroring the logic used in the certificate rotation test. This makes the test more robust by ensuring a new secret is created, not just that an existing one is still present. UPSTREAM: <carry>: [OTE] - Readme:Add info to help use payload-aggregate with new tests UPSTREAM: <carry>: remove obsolete owners Signed-off-by: grokspawn <jordan@nimblewidget.com> UPSTREAM: <carry>: [OTE] add catalog tests from openshift/origin This commit migrates the olmv1_catalog set of tests from openshift/origin to OTE as part the broad effort to migrate all tests. Assisted-by: Gemini UPSTREAM: <carry>: Migrate single/own namespace tests This commit migrates the OLMv1 single and own namespace watch mode tests from openshift/origin/test/extended/olm/olmv1-singleownnamespace.go to this repository. This is part of the effort to move component-specific tests into their respective downstream locations. Assisted-by: Gemini UPSTREAM: <carry>: Adds ResourceVersion checks to the tls secret deletion test, mirroring the logic used in the certificate rotation test. 
This makes the test more robust by ensuring a new secret is created, not just that an existing one is still present. This reverts commit 0bb1953. UPSTREAM: <carry>: [OTE] Add webhook to validate openshift-service-ca certificate rotation This reverts commit e9e3220. UPSTREAM: <carry>: Ensure unique name for bad-catalog tests UPSTREAM: <carry>: Revert "Handle service-ca cert availability/rotation" This reverts commit 9cc13d8. UPSTREAM: <carry>: grant QE approver permission for OTE UPSTREAM: <carry>: Update webhook ote tests to use latest webhook-operator Signed-off-by: Per Goncalves da Silva <pegoncal@redhat.com> UPSTREAM: <carry>: update operator-controller to v1.5.1 UPSTREAM: <carry>: configure watchnamespace using spec.config for OTE tests UPSTREAM: <carry>: add jiazha to approvers UPSTREAM: <carry>: Create combined manifests for comparison Signed-off-by: Todd Short <todd.short@me.com> UPSTREAM: <carry>: Use Helm charts for openshift manifests Signed-off-by: Todd Short <todd.short@me.com> UPSTREAM: <carry>: add support for tests-private cases and add the case UPSTREAM: <carry>: Fix cp-manifests copying of helm charts The method used to copy the helm charts is including an extra `helm` directory in the destination path, that is making the cluster-olm-operator code just a bit more complicated than it needs to be. This fixes the copy location. Signed-off-by: Todd Short <todd.short@me.com> UPSTREAM: <carry>: Remove kustomize manifests from images and repo Now that helm manifests are being used to dynamically generate the manifests, the pre-generated manifests are no longer needed. So, we can remove them from the repo and the images. However, because we still want to verify the manifests are "good", we are still creating a "single-file" version of the manifests for verification purposes, and to allow us to see what changes are happening to the manifests (from upstream and/or downstream sources). 
Signed-off-by: Todd Short <todd.short@me.com> UPSTREAM: <carry>: Add pedjak and trgeiger as reviewers UPSTREAM: <carry>: migrate more cases from tests-private and enhance suites with filters UPSTREAM: <carry>: Updating ose-olm-operator-controller-container image to be consistent with ART for 4.21 Reconciling with https://github.com/openshift/ocp-build-data/tree/4fbe3fab45239dc4be6f5d9d98a0bf36e0274ec9/images/ose-olm-operator-controller.yml UPSTREAM: <carry>: Updating ose-olm-catalogd-container image to be consistent with ART for 4.21 Reconciling with https://github.com/openshift/ocp-build-data/tree/4fbe3fab45239dc4be6f5d9d98a0bf36e0274ec9/images/ose-olm-catalogd.yml UPSTREAM: <carry>: OTE: Enable disconnected environment and build test operator controller image Signed-off-by: Per Goncalves da Silva <pegoncal@redhat.com> UPSTREAM: <carry>: for incompatible test add func to wait builder and deployer SA creation by OCP controller UPSTREAM: <carry>: Fix VERSION replacement in catalog bindata Signed-off-by: Todd Short <todd.short@me.com> UPSTREAM: <carry>: check kubeconfig only run-test and run-suite UPSTREAM: <carry>: Clean up cp-manifests There is no longer a need to copy conditionally Signed-off-by: Todd Short <todd.short@me.com> UPSTREAM: <carry>: Update does-not-exist and simple install to work in a disconnected environment Signed-off-by: Todd Short <todd.short@me.com> UPSTREAM: <carry>: support webhook case in disconnected UPSTREAM: <carry>: Consolidate build API This consolidates the in-cluster building of a bundle and catalog. The catalog and bundle bindata are inputs, along with a set of replacements so that catalog and bundle templates can be used to create the images. This can be done in the BeforeEach() for a set of tests that use the same data. Signed-off-by: Todd Short <todd.short@me.com>
…images from openshift/catalogd/manifests.yaml
Signed-off-by: Todd Short <todd.short@me.com>
…oss to avoid flakes
Signed-off-by: Todd Short <todd.short@me.com>
…uess and waiting for k8s cleanups Co-Author: kuiwang@redhat.com
…nts ( Follow-Up of: 714977c )
… uninstall Assisted-by: Cursor
…g for CE install tests With BoxcutterRuntime, Installed=True is only set after all availability probes pass, which can take longer on TechPreview clusters (IPv6, multi-arch). Increases install-specific timeout from 5m to 10m and logs condition state on each poll to aid debugging flaky failures.
Add 7 Ginkgo tests under [sig-olmv1][OCPFeatureGate:NewOLMDeploymentConfig] covering the spec.config.inline.deploymentConfig feature: Positive tests (verify applied customisations): - environment variables - resource requirements - tolerations - node selector - annotations on deployment and pod template Negative tests (verify terminal validation errors): - invalid deploymentConfig.env type (string instead of array) - unknown field inside deploymentConfig (additionalProperties:false) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Signed-off-by: Todd Short <tshort@redhat.com>
Signed-off-by: Todd Short <tshort@redhat.com>
…64 support Signed-off-by: Daniel Franz <dfranz@redhat.com>
Signed-off-by: Todd Short <tshort@redhat.com>
…t in OTE tests Update all remaining references to ClusterExtensionRevision in openshift/tests-extension to use ClusterObjectSet, matching the upstream rename in operator-framework/operator-controller#2589. Files updated: - test/qe/specs/olmv1_ce.go: RBAC resource names and comments - test/olmv1-preflight.go: scenario constants, test names, RBAC rules - .openshift-tests-extension/openshift_payload_olmv1.json: test name - pkg/bindata/qe/bindata.go: embedded RBAC templates - test/qe/testdata/olm/sa-nginx-limited-boxcutter.yaml: RBAC resources - test/qe/testdata/olm/sa-nginx-insufficient-operand-rbac-boxcutter.yaml: RBAC resources Signed-off-by: Camila Macedo <cmacedo@redhat.com> Made-with: Cursor
…s ClusterObjectSet The upstream rename of ClusterExtensionRevision to ClusterObjectSet (operator-framework/operator-controller#2589) breaks the incompatible operator detection in cluster-olm-operator. The cluster-olm-operator binary still reads ClusterExtensionRevision resources to find operators with olm.maxOpenShiftVersion, so after the rename it never detects incompatible operators and InstalledOLMOperatorsUpgradeable stays True. Skip this test when NewOLMBoxCutterRuntime feature gate is enabled until cluster-olm-operator is updated to read ClusterObjectSet. Signed-off-by: Camila Macedo <cmacedo@redhat.com> Made-with: Cursor
Signed-off-by: Francesco Giudici <fgiudici@redhat.com>
Signed-off-by: Todd Short <todd.short@me.com>
…to run outside of OCP
7e81373 to 7e83d75
New changes are detected. LGTM label has been removed.
Actionable comments posted: 2
🧹 Nitpick comments (1)
internal/operator-controller/catalogmetadata/client/client.go (1)
109-112: Align cache contract/docs with new non-200 behavior.
This change is reasonable for HA retries, but it now conflicts with the Cache contract comments (Lines 30-35) that say population errors are cached. Please update those docs (and any related tests) to reflect that HTTP non-200 errors are returned directly and not cached.
Suggested doc update
 type Cache interface {
 	// Get returns cache for a specified catalog name and version (resolvedRef).
 	//
 	// Method behaviour is as follows:
 	//   - If cache exists, it returns a non-nil fs.FS and nil error
 	//   - If cache doesn't exist, it returns nil fs.FS and nil error
-	//   - If there was an error during cache population,
-	//     it returns nil fs.FS and the error from the cache population.
-	//     In other words - cache population errors are also cached.
+	//   - If there was a cached error during cache population,
+	//     it returns nil fs.FS and that cached error.
+	//     Note: not all population failures are cached (for example,
+	//     HTTP non-200 responses may be returned directly by PopulateCache).
 	Get(catalogName, resolvedRef string) (fs.FS, error)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/operator-controller/catalogmetadata/client/client.go` around lines 109 - 112, Update the Cache contract comment to reflect the new behavior where HTTP non-200 responses are not cached: modify the comment on the Cache interface (the block that previously said "population errors are cached") to state that transient HTTP non-200 responses (e.g., non-leader 404s) are returned directly and not stored, and add a note that only successful 200 responses are populated into the cache; adjust any related unit/integration tests that assert caching of non-200 responses to expect immediate error returns from the function that currently does `return nil, fmt.Errorf("error: received unexpected response status code %d", resp.StatusCode)`.
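The behavior this comment describes (a non-200 response is returned to the caller and never stored) can be sketched as follows. This is a minimal illustration and not the repository's implementation: `memCache`, `populate`, `seqTransport`, `demo`, and the URL are hypothetical names introduced here, and the fake transport only exists so the sketch runs without a network.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"strings"
)

// seqTransport is a hypothetical fake transport that answers with a fixed
// sequence of status codes, so the sketch needs no real server.
type seqTransport struct {
	codes []int
	next  int
}

func (t *seqTransport) RoundTrip(req *http.Request) (*http.Response, error) {
	code := t.codes[t.next]
	t.next++
	return &http.Response{
		StatusCode: code,
		Header:     make(http.Header),
		Body:       io.NopCloser(strings.NewReader("catalog-data")),
		Request:    req,
	}, nil
}

// memCache is a hypothetical stand-in for the catalog cache; it only ever
// holds bodies of successful (200) responses.
type memCache struct {
	client *http.Client
	bodies map[string][]byte
}

// populate stores the response body only on HTTP 200. Any other status is
// returned as an error and leaves the cache untouched, so a later retry
// (for example against a different pod) can still succeed.
func (c *memCache) populate(key, url string) ([]byte, error) {
	resp, err := c.client.Get(url)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("error: received unexpected response status code %d", resp.StatusCode)
	}
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return nil, err
	}
	c.bodies[key] = body
	return body, nil
}

// demo issues two requests: the first is answered with 404, the second with 200.
func demo() (firstErr error, cachedAfterFirst bool, second string) {
	c := &memCache{
		client: &http.Client{Transport: &seqTransport{codes: []int{http.StatusNotFound, http.StatusOK}}},
		bodies: map[string][]byte{},
	}
	const url = "http://catalogd.invalid/all" // hypothetical URL, never dialed
	_, firstErr = c.populate("example", url)
	_, cachedAfterFirst = c.bodies["example"]
	body, _ := c.populate("example", url)
	return firstErr, cachedAfterFirst, string(body)
}

func main() {
	firstErr, cached, second := demo()
	fmt.Println(firstErr != nil, cached, second)
}
```

Under this shape, a contract comment that says "population errors are cached" would be inaccurate for the non-200 path, which is the mismatch the reviewer points out.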
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@internal/catalogd/serverutil/serverutil.go`:
- Around line 96-99: The TLS setup assumes r.cw is non-nil when r.cfg.CertFile
and r.cfg.KeyFile are set; add a nil guard before using r.cw to avoid a panic:
check that r.cw != nil before constructing tls.Config with GetCertificate, and
if it is nil return a descriptive error (or load the certificate directly) so
tls.Config.GetCertificate is never dereferenced; update the block that creates
tls.Config (references: r.cfg.CertFile, r.cfg.KeyFile, r.cw, tls.Config,
GetCertificate) to validate r.cw and handle the misconfiguration path safely.
In `@test/e2e/steps/ha_steps.go`:
- Around line 45-47: The code reads sc.leaderPods["catalogd"] into oldLeader
without validating existence; change the step to first check presence and
non-empty value (e.g. look up with the comma-ok form on sc.leaderPods for key
"catalogd" or test oldLeader != "") and fail fast if missing (use the test
context's failure helper such as t.Fatalf or the step's error return) so the
step doesn't proceed when the prior leader-discovery did not record a leader.
---
Nitpick comments:
In `@internal/operator-controller/catalogmetadata/client/client.go`:
- Around line 109-112: Update the Cache contract comment to reflect the new
behavior where HTTP non-200 responses are not cached: modify the comment on the
Cache interface (the block that previously said "population errors are cached")
to state that transient HTTP non-200 responses (e.g., non-leader 404s) are
returned directly and not stored, and add a note that only successful 200
responses are populated into the cache; adjust any related unit/integration
tests that assert caching of non-200 responses to expect immediate error returns
from the function that currently does `return nil, fmt.Errorf("error: received
unexpected response status code %d", resp.StatusCode)`.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: c7e7f28d-de77-429b-8cf2-46b69cedc5ba
⛔ Files ignored due to path filters (6)
- `go.sum` is excluded by `!**/*.sum`
- `vendor/github.com/docker/cli/AUTHORS` is excluded by `!**/vendor/**`, `!vendor/**`
- `vendor/github.com/docker/cli/cli/config/configfile/file.go` is excluded by `!**/vendor/**`, `!vendor/**`
- `vendor/github.com/docker/cli/cli/config/credentials/file_store.go` is excluded by `!**/vendor/**`, `!vendor/**`
- `vendor/github.com/docker/cli/cli/config/memorystore/store.go` is excluded by `!**/vendor/**`, `!vendor/**`
- `vendor/modules.txt` is excluded by `!**/vendor/**`, `!vendor/**`
📒 Files selected for processing (17)
- Makefile
- docs/designs/testing/2026-04-13-e2e-isolation/design.md
- go.mod
- helm/experimental.yaml
- internal/catalogd/serverutil/serverutil.go
- internal/operator-controller/catalogmetadata/client/client.go
- internal/operator-controller/catalogmetadata/client/client_test.go
- manifests/experimental-e2e.yaml
- manifests/experimental.yaml
- test/e2e/README.md
- test/e2e/features/ha.feature
- test/e2e/steps/ha_steps.go
- test/e2e/steps/hooks.go
- test/e2e/steps/steps.go
- test/extension-developer-e2e/setup.sh
- test/internal/catalog/catalog.go
- test/internal/catalog/catalog_test.go
💤 Files with no reviewable changes (2)
- test/extension-developer-e2e/setup.sh
- internal/operator-controller/catalogmetadata/client/client_test.go
✅ Files skipped from review due to trivial changes (5)
- docs/designs/testing/2026-04-13-e2e-isolation/design.md
- helm/experimental.yaml
- test/internal/catalog/catalog.go
- go.mod
- test/e2e/README.md
🚧 Files skipped from review as they are similar to previous changes (3)
- test/internal/catalog/catalog_test.go
- test/e2e/steps/hooks.go
- test/e2e/steps/steps.go
 	if r.cfg.CertFile != "" && r.cfg.KeyFile != "" {
 		config := &tls.Config{
-			GetCertificate: tlsFileWatcher.GetCertificate,
+			GetCertificate: r.cw.GetCertificate,
 			MinVersion:     tls.VersionTLS12,
Add a nil guard for certificate watcher when TLS is enabled.
When cert/key are configured, r.cw is assumed non-nil. A misconfiguration here can panic during TLS handshake.
🔧 Proposed defensive fix
 	if r.cfg.CertFile != "" && r.cfg.KeyFile != "" {
+		if r.cw == nil {
+			return fmt.Errorf("catalog server TLS is enabled but certificate watcher is nil")
+		}
 		config := &tls.Config{
 			GetCertificate: r.cw.GetCertificate,
 			MinVersion:     tls.VersionTLS12,
 		}
 		listener = tls.NewListener(listener, config)
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@internal/catalogd/serverutil/serverutil.go` around lines 96 - 99, The TLS
setup assumes r.cw is non-nil when r.cfg.CertFile and r.cfg.KeyFile are set; add
a nil guard before using r.cw to avoid a panic: check that r.cw != nil before
constructing tls.Config with GetCertificate, and if it is nil return a
descriptive error (or load the certificate directly) so
tls.Config.GetCertificate is never dereferenced; update the block that creates
tls.Config (references: r.cfg.CertFile, r.cfg.KeyFile, r.cw, tls.Config,
GetCertificate) to validate r.cw and handle the misconfiguration path safely.
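As a standalone illustration of the guard requested above, the sketch below builds a `tls.Config` only after confirming the watcher is set. It assumes the watcher is held as a pointer, as `r.cw` appears to be. `fakeWatcher` and `buildTLSConfig` are hypothetical names introduced here and are not part of `serverutil.go`.

```go
package main

import (
	"crypto/tls"
	"errors"
	"fmt"
)

// fakeWatcher is a hypothetical stand-in for a certificate watcher that
// serves a certificate on each handshake.
type fakeWatcher struct {
	cert tls.Certificate
}

// GetCertificate dereferences the receiver, so calling it through a nil
// *fakeWatcher would panic at handshake time rather than at setup time.
func (w *fakeWatcher) GetCertificate(*tls.ClientHelloInfo) (*tls.Certificate, error) {
	return &w.cert, nil
}

// buildTLSConfig returns (nil, nil) when TLS is not configured, an error when
// TLS is configured but no watcher was provided, and a config otherwise.
func buildTLSConfig(certFile, keyFile string, cw *fakeWatcher) (*tls.Config, error) {
	if certFile == "" || keyFile == "" {
		return nil, nil // TLS disabled; caller keeps the plain listener
	}
	if cw == nil {
		return nil, errors.New("catalog server TLS is enabled but certificate watcher is nil")
	}
	return &tls.Config{
		GetCertificate: cw.GetCertificate,
		MinVersion:     tls.VersionTLS12,
	}, nil
}

func main() {
	_, err := buildTLSConfig("tls.crt", "tls.key", nil)
	fmt.Println("nil watcher:", err)

	cfg, err := buildTLSConfig("tls.crt", "tls.key", &fakeWatcher{})
	fmt.Println("with watcher:", cfg != nil, err)
}
```

The point of returning an error here is that the misconfiguration surfaces when the server starts instead of on the first client handshake.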
 	sc := scenarioCtx(ctx)
 	oldLeader := sc.leaderPods["catalogd"]
Fail fast if previous leader was never recorded.
Line 46 reads sc.leaderPods["catalogd"] without validating it. If it is empty, this step can pass by accepting any holder identity, even when the prerequisite leader-discovery step did not run successfully.
Suggested fix
 func NewCatalogdLeaderIsElected(ctx context.Context) error {
 	sc := scenarioCtx(ctx)
 	oldLeader := sc.leaderPods["catalogd"]
+	if oldLeader == "" {
+		return fmt.Errorf("catalogd leader pod not found in scenario context; run 'catalogd is ready to reconcile resources' first")
+	}
 	waitFor(ctx, func() bool {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@test/e2e/steps/ha_steps.go` around lines 45 - 47, The code reads
sc.leaderPods["catalogd"] into oldLeader without validating existence; change
the step to first check presence and non-empty value (e.g. look up with the
comma-ok form on sc.leaderPods for key "catalogd" or test oldLeader != "") and
fail fast if missing (use the test context's failure helper such as t.Fatalf or
the step's error return) so the step doesn't proceed when the prior
leader-discovery did not record a leader.
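The comma-ok lookup mentioned in the prompt can be sketched in isolation as below. The `scenario` type and `previousLeader` helper are hypothetical and only mirror the shape of `sc.leaderPods`. The real step would return the error from the step function itself.

```go
package main

import "fmt"

// scenario is a hypothetical, reduced version of the e2e scenario context.
type scenario struct {
	leaderPods map[string]string // component name -> recorded leader pod
}

// previousLeader returns the recorded leader for a component, or an error if
// the leader-discovery step never stored one (missing key or empty value).
func previousLeader(sc *scenario, component string) (string, error) {
	leader, ok := sc.leaderPods[component]
	if !ok || leader == "" {
		return "", fmt.Errorf("%s leader pod not found in scenario context; run the leader-discovery step first", component)
	}
	return leader, nil
}

func main() {
	// Reading from a nil map is safe in Go and yields the zero value.
	_, err := previousLeader(&scenario{}, "catalogd")
	fmt.Println(err)

	sc := &scenario{leaderPods: map[string]string{"catalogd": "catalogd-pod-a"}}
	leader, err := previousLeader(sc, "catalogd")
	fmt.Println(leader, err)
}
```

Without such a check, an empty `oldLeader` compares unequal to any holder identity, so a "new leader elected" wait can succeed immediately even though no failover was observed.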
@openshift-bot: The following tests failed, say
Full PR test history. Your PR dashboard.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
The downstream repository has been updated with the following upstream commits:
The vendor/ directory has been updated and the following commits were carried:
@catalogd-update
This pull request is expected to merge without any human intervention. If tests are failing here, changes must land upstream to fix any issues so that future downstreaming efforts succeed.
/assign @openshift/openshift-team-operator-runtime
Summary by CodeRabbit
New Features
Bug Fixes
Documentation
Tests
Chores