Skip to content

feat: migrate to controller-runtime v0.23 operator#6

Open
MaxRink wants to merge 62 commits intomainfrom
feat/controller-runtime-migration
Open

feat: migrate to controller-runtime v0.23 operator#6
MaxRink wants to merge 62 commits intomainfrom
feat/controller-runtime-migration

Conversation

@MaxRink
Copy link
Collaborator

@MaxRink MaxRink commented Mar 2, 2026

Migrate to controller-runtime v0.23 Operator

Summary

Replaces the hand-rolled informers/workqueues reconciliation logic (ip-control-loop, node-slice-controller, pkg/reconciler) with a proper controller-runtime v0.23 operator. Additionally rewrites the CNI install script as a Go binary, introduces validating webhooks, Prometheus metrics, comprehensive test coverage, Helm chart improvements, security hardening, and extensive documentation.

Scope: 181 files changed (excl. vendor/generated), +15,667 / −6,879 lines across 36 commits.


Changes

1. Dependencies & Tooling

  • Bump k8s.io/* to v0.35.0 (client-go, apimachinery, api, etc.)
  • Add controller-runtime v0.23.2 (pseudo-version) as the operator framework
  • Add cert-controller v0.15.0 (open-policy-agent/cert-controller) for automated webhook TLS certificate rotation
  • Add Cobra v1.10.2 for operator CLI structure
  • Migrate Ginkgo v1 → v2 across all test files (dot-imports: . "github.com/onsi/ginkgo/v2")
  • Run codegen with --with-applyconfig for Server-Side Apply (SSA) typed configurations
  • golangci-lint: Strict CAPI-style configuration (.golangci.yml, 234+ lines) with 50+ enabled linters including gocritic, nolintlint, revive, gosec, errorlint
  • Go 1.26 with updated go.mod / go.sum

2. New Operator Binary (cmd/operator/)

Cobra-based entry point (whereabouts-operator) with a single controller subcommand:

Aspect Detail
Reconcilers Leader-elected (configurable lease duration/renew/retry)
Webhook server All replicas serve webhooks on port 9443 with cert-controller TLS rotation
Metrics Prometheus endpoint on :8080 via controller-runtime
Health probes Readiness + liveness on :8081
Flags --metrics-bind-address, --health-probe-bind-address, --leader-elect, --leader-elect-lease-duration, --leader-elect-renew-deadline, --leader-elect-retry-period, --webhook-port, --webhook-cert-dir

Files: cmd/operator/main.go, cmd/operator/controller.go

3. Reconcilers (internal/controller/)

Reconciler Watches Action
IPPoolReconciler IPPool CRDs Removes orphaned allocations via MergePatch. Handles DisruptionTarget eviction condition, pending pods (RequeueAfter), IPv6 normalization (net.IP.Equal), malformed pod annotations. Sets Stale condition on IPPool when orphans are removed.
NodeSliceReconciler NADs (primary) + Nodes (secondary) Manages NodeSlicePool CRDs. Creates/updates with owner references. SSA-based status updates for slice tracking. Parses IPAM config from NAD spec.config JSON.
OverlappingRangeReconciler OverlappingRangeIPReservation CRDs Deletes orphaned reservations after checking podRef against live pods in the cluster.

Supporting code:

  • patchhelper.go: Cluster-API-inspired PatchHelper — computes minimal diffs (spec via MergePatch, status via SSA) to reduce API server load. SHA-256 content hashing for change detection.
  • conditions.go: Typed conditions (Stale, Ready) with severity levels, following Cluster-API condition conventions.
  • metrics.go: Prometheus counters/gauges for reconciliation operations (allocations cleaned, errors, pool sizes, reconcile duration).
  • setup.go: Wires all reconcilers + webhooks into the controller-runtime manager with readiness/liveness checks.

4. Validating Webhooks (internal/webhook/)

Uses controller-runtime v0.23 typed admission.Validator[T] API:

Webhook Validates
IPPoolValidator Range must be valid CIDR; each allocation's podRef must be "namespace/name" format
NodeSlicePoolValidator Range must be valid CIDR; SliceSize must be string-encoded integer 1–128
OverlappingRangeValidator podRef in spec must be "namespace/name" format
  • matchConditions CEL bypass: ValidatingWebhookConfiguration includes CEL expressions to skip validation for the CNI ServiceAccount (performance optimization for hot-path ADD/DEL operations)
  • Cert rotation: internal/webhook/certrotator/ wraps cert-controller for automatic TLS certificate management with configurable namespace, secret name, and webhook name
  • Metrics: Prometheus counters for webhook admission decisions (allowed/denied per resource type)

5. Shared Validation Library (internal/validation/)

Extracted common validation logic used by both webhooks and future admission policies:

  • ValidateCIDR() — validates CIDR notation using net.ParseCIDR
  • ValidatePodRef() — validates "namespace/name" format with non-empty components
  • ValidateSliceSize() — validates string-encoded integer in range [1, 128]
  • 94 test cases covering edge cases (empty strings, IPv6, CIDR boundary conditions)

6. Install-CNI Go Binary (cmd/install-cni/)

Replaces the shell scripts (script/install-cni.sh, script/lib.sh, script/token-watcher.sh) with a pure Go implementation:

Feature Detail
Binary copy Copies whereabouts to host CNI bin directory with temp-file + atomic rename
Kubeconfig generation Generates kubeconfig with mounted ServiceAccount token, supports custom CA/token/API paths
Flat-file config Writes whereabouts.conf to configured path
Token watcher Watches for ServiceAccount token rotation via fsnotify, regenerates kubeconfig on change
Signal handling Graceful shutdown on SIGTERM/SIGINT with context cancellation
Idempotent SHA-256 content comparison skips unnecessary rewrites

Files: cmd/install-cni/main.go (410 lines), cmd/install-cni/main_test.go (948 lines, 39 test cases)

7. Bug Fixes (CNI Storage Layer)

Critical fixes in pkg/storage/kubernetes/ipam.go:

ID Fix Impact
P2-1 Per-retry context.WithTimeout inside RETRYLOOP Was sharing a single timeout across all 100 retries — could exhaust deadline early
P2-2 Nil guard on leader elector before goroutine launch Prevented potential nil-pointer panic when leader election is disabled
P2-3 Scoped err per ipRange loop iteration Was using shared variable — wrong error could propagate across iterations
P2-4 Reset skipOverlappingRangeUpdate per ipRange Flag was leaking across iterations, skipping necessary overlapping-range updates

8. IP Math & Helpers (pkg/iphelpers/)

  • Migrated IP arithmetic to math/big.Int — eliminates overflow for large IPv6 subnets
  • Uses k8s.io/utils/net for range parsing where applicable
  • Robust CIDR normalization and IP-to-offset calculations
  • 55 new test cases covering IPv4/IPv6 edge cases, large subnets, boundary conditions

9. CRD Types & Conventions

New API types in api/whereabouts.cni.cncf.io/v1alpha1/ (kubebuilder v4 layout):

  • IPPool, NodeSlicePool, OverlappingRangeIPReservation with full kubebuilder markers
  • Added +kubebuilder:shortName markers: ipp, nsp, orip
  • Added +kubebuilder:printcolumn markers: Range, Age
  • Typed conditions (conditions.go) following Cluster-API patterns
  • Regenerated deepcopy with updated markers
  • PROJECT file for kubebuilder scaffolding metadata

Deleted old types in pkg/api/ — superseded by the new api/ package.

10. Kubebuilder Configuration (config/)

Full kubebuilder v4 config directory:

config/
├── crd/bases/           # Generated CRD YAML (3 CRDs)
├── crd/kustomization.yaml
├── daemonset/           # CNI DaemonSet + RBAC
├── default/             # Top-level kustomization
├── manager/             # Operator Deployment manifest
├── prometheus/          # ServiceMonitor for metrics scraping
├── rbac/                # ClusterRole, RoleBinding, ServiceAccount, leader-election
├── samples/             # Example CR instances
└── webhook/             # VWC, Service, cert patches

11. Helm Chart Updates (deployment/whereabouts-chart/)

  • New templates: operator-clusterrole.yaml, operator-clusterrolebinding.yaml, operator-serviceaccount.yaml, operator-pdb.yaml, validatingwebhookconfiguration.yaml, webhook-service.yaml
  • Updated: cluster_role.yaml (split CNI vs operator RBAC), daemonset.yaml (slimmed to CNI-only), node-slice-controller.yaml (deprecated, superseded by operator)
  • Removed: configmap.yaml (ip-control-loop cron config — no longer needed)
  • Updated CRDs: All three CRDs regenerated with shortNames, printColumns, and additional fields
  • New values: operator.image, operator.replicas, operator.resources, webhook.* configuration

12. Prometheus Metrics (internal/controller/metrics.go, internal/webhook/metrics.go)

Controller metrics:

Metric Type Description
whereabouts_ippool_allocations_cleaned_total Counter Orphaned allocations removed
whereabouts_ippool_reconcile_errors_total Counter Reconciliation errors
whereabouts_ippool_size Gauge IP pool size by name
whereabouts_ippool_allocations Gauge Current allocation count by pool
whereabouts_ippool_reconcile_duration_seconds Histogram Reconciliation latency

Webhook metrics:

Metric Type Description
whereabouts_webhook_requests_total Counter Total webhook requests by resource and result
whereabouts_webhook_denied_total Counter Denied admission requests by resource and reason

13. Security Hardening

  • Non-root containers: runAsNonRoot: true, runAsUser: 65532 (nonroot), readOnlyRootFilesystem: true
  • Security contexts: allowPrivilegeEscalation: false, dropped all capabilities
  • Cert-controller: Automatic TLS certificate rotation with Kubernetes Secret bootstrap
  • SHA-256 content hashing: PatchHelper uses SHA-256 (not MD5) for content change detection
  • Atomic file writes: install-cni uses temp-file + rename pattern to prevent partial writes
  • Pod Disruption Budget: Operator deployment includes PDB for availability during upgrades

14. Build System & CI

  • Makefile: Complete rewrite (196+ line diff) — make build, make docker-build, make test, make test-skip-static, make generate, make manifests, make lint, make lint-fix, make kind
  • Dockerfile: Multi-stage build producing both whereabouts (CNI) and whereabouts-operator binaries, plus install-cni
  • CI workflows: Updated test.yml (staticcheck + golangci-lint), build.yml, release workflows for new binary targets
  • Removed: hack/build-go.sh, hack/test-go.sh, hack/generate-code.sh, hack/install-kubebuilder-tools.sh, hack/update-deps.sh (functionality moved to Makefile targets)

15. Documentation

Document Content
doc/architecture.md System architecture overview with component diagram
doc/metrics.md Complete metrics reference for operators and monitoring
doc/proposals/operator-controller-runtime-migration.md Design proposal (276 lines) covering motivation, architecture, migration plan
doc/extended-configuration.md Updated with operator deployment, webhook configuration, new flags
doc/developer_notes.md Updated build/test instructions for new tooling
CHANGELOG.md Structured changelog for the release
CONTRIBUTING.md New contributor guide (99 lines)
README.md Updated with operator architecture, new binary descriptions
CLAUDE.md AI assistant context for the new architecture
.github/copilot-instructions.md Updated Copilot context

16. Existing Package Improvements

Package Changes
pkg/config/ Extended configuration parsing, better error messages, 44 new test cases
pkg/logging/ Enhanced logging utilities, 17 new test cases
pkg/types/ Updated type definitions, JSON marshalling improvements, 30 new test cases
pkg/version/ Version reporting utilities, 11 new test cases
pkg/storage/kubernetes/ Bug fixes (P2-1 through P2-4), improved retry logic, 25 new test cases
pkg/allocate/ Switched %w%s in non-wrapping format contexts (lint compliance)

17. E2E Test Updates

  • Migrated all e2e tests from Ginkgo v1 to v2
  • Updated e2e/util/util.go with proper Go template → json.Marshal for NAD generation
  • Updated test clients (e2e/client/) for new API types
  • Updated pool consistency checkers
  • hack/e2e-setup-kind-cluster.sh: Updated for operator deployment

18. Removed Old Code

Removed Replacement
cmd/controlloop/ cmd/operator/ with controller subcommand
cmd/nodeslicecontroller/ cmd/operator/ (NodeSliceReconciler)
pkg/controlloop/ (430+ lines) internal/controller/ippool_controller.go
pkg/node-controller/ (623+ lines) internal/controller/nodeslice_controller.go
pkg/reconciler/ (1,460+ lines) internal/controller/ reconcilers
pkg/api/ (old CRD types) api/whereabouts.cni.cncf.io/v1alpha1/
script/install-cni.sh, script/lib.sh, script/token-watcher.sh cmd/install-cni/main.go
doc/crds/*.yaml (old static manifests) config/ kubebuilder manifests + Helm chart
hack/build-go.sh, hack/test-go.sh, etc. Makefile targets

19. Upstream Feature Implementations

Implements, tests, and documents several upstream feature requests and bug fixes:

Issue Feature Implementation
k8snetworkplumbingwg#573 /32, /31, /128, /127 support pkg/iphelpers/ updated: FirstUsableIP, LastUsableIP, HasUsableIPs, CountUsableIPs now support /32 (1 IP), /31 (2 IPs per RFC 3021), /128, /127. 10 new unit tests + 4 e2e tests (single IP, point-to-point, IPv6).
k8snetworkplumbingwg#601 Optional gateway IP exclusion New exclude_gateway (bool) config field. When enabled with gateway set, auto-adds gateway/32 or /128 to each range's OmitRanges. 5 new config tests + 1 e2e test.
k8snetworkplumbingwg#550 Graceful node shutdown cleanup IPPoolReconciler checks pod.DeletionTimestamp != nil — treats terminating pods during node shutdown as orphaned. 2 new controller tests.
k8snetworkplumbingwg#510 Optimistic IPAM (no leader election) New optimistic_ipam (bool) config field. When enabled, CNI ADD/DEL bypass leader election and call IPManagementKubernetesUpdate directly, relying on optimistic concurrency. 2 config tests + 3 e2e tests.
k8snetworkplumbingwg#621 Preferred/sticky IP assignment Pod annotation whereabouts.cni.cncf.io/preferred-ip. During allocation, attempts the preferred IP first; falls back to lowest-available if taken or excluded. Supports IPv4/IPv6. 6 allocate tests + 3 e2e tests.
L3 mode Enable L3/routed pools New enable_l3 (bool) config field. When enabled, uses network address and broadcast address for allocation (no .0 skip, no broadcast skip). 9 new tests (4 allocate + 4 config + 1 allocate) + 1 e2e test.

Total new tests for upstream features: 42 unit tests + 12 e2e tests.

All new features are documented in doc/extended-configuration.md with configuration examples.

20. Upstream Bug Coverage Tests

16 new unit tests covering confirmed upstream bugs to prevent regression:

Upstream Issue Bug Description Test Coverage
k8snetworkplumbingwg#666 NodeSlice exclude ranges crash (GetExcludeSubnetsForNodeSlice) Fixed: ranges excluded from allocatable subnets
k8snetworkplumbingwg#622 IPv6 normalization mismatch Tests net.IP.Equal comparison in allocations
k8snetworkplumbingwg#610 Race in concurrent operations Tests parallel allocations under multiple goroutines
k8snetworkplumbingwg#609 Reconciler missing cleanup of orphaned allocations Tests orphan detection with non-existent pods
k8snetworkplumbingwg#597 DualStack allocation inconsistencies Tests IPv4+IPv6 dual-stack allocation + deallocation
k8snetworkplumbingwg#596 Network name collision Tests pool isolation via network_name
k8snetworkplumbingwg#556 Reserved IP range overlap Tests overlapping exclude + range_start/range_end
k8snetworkplumbingwg#525 Multi-interface allocation Tests same pod, multiple interfaces

Test Coverage

~470 test cases across new and updated test files:

Area File(s) Test Cases
IPPool controller ippool_controller_test.go, controller_extended_test.go 99
NodeSlice controller nodeslice_controller_test.go 15
OverlappingRange controller overlappingrange_controller_test.go 5
Controller metrics metrics_test.go 5
IPPool webhook ippool_webhook_test.go 9
NodeSlicePool webhook nodeslicepool_webhook_test.go 12
OverlappingRange webhook overlappingrange_webhook_test.go 8
Webhook metrics metrics_test.go 10
Cert rotator certrotator_test.go 5
Validation validation_test.go, validation_extended_test.go 6
Install-CNI main_test.go 39
CNI main whereabouts/main_test.go 22
IP helpers iphelpers_extended_test.go 55
Config config_extended_test.go, config_test.go 57
Types types_test.go 30
Logging logging_extended_test.go 17
Allocate allocate_test.go 75
Storage/k8s extended_test.go 25
Version version_test.go 11
E2E e2e_test.go 12 (new upstream feature tests)

Tests use controller-runtime envtest (with real etcd + API server) for reconciler and webhook tests, and fake.NewClientset for unit tests.


Binaries

Before After
whereabouts (CNI plugin) whereabouts (CNI plugin) — unchanged
ip-control-loop (reconciler) whereabouts-operator controller
node-slice-controller (Fast IPAM) whereabouts-operator controller
(none) whereabouts-operator controller (includes webhook server)
install-cni.sh (shell script) install-cni (Go binary)

Migration Path

  1. Deploy updated DaemonSet (CNI-only, no more ip-control-loop sidecar)
  2. Deploy operator Deployment (whereabouts-operator controller) — handles reconciliation + webhooks
  3. Remove old ip-control-loop and node-slice-controller Deployments/CronJobs
  4. Apply ValidatingWebhookConfiguration for CRD validation

MaxRink added 15 commits March 2, 2026 00:33
Bumps all k8s.io dependencies from v0.34.1 to v0.35.0 to prepare for
controller-runtime v0.23.2 (unreleased, release-0.23 branch).
Adds ginkgo/v2 as indirect dep; controller-runtime and cert-controller
will be added when imported.

All existing tests pass.
Generate ApplyConfiguration types for all CRDs (IPPool, NodeSlicePool,
OverlappingRangeIPReservation) to enable Server-Side Apply where safe.
- Add kubebuilder validation markers (+kubebuilder:validation:MinLength)
- Add shortNames: ipp, nsp, orip for kubectl convenience
- Add printcolumn markers for Range, PodRef, SliceSize, etc.
- Add +kubebuilder:subresource:status on NodeSlicePool
- Fix NodeSlicePoolStatus godoc (was 'desired state', now 'observed state')
- Add omitempty to NodeSlicePoolStatus.Allocations
- Use runtime.NewSchemeBuilder() (controller-runtime convention)
- Remove unused localSchemeBuilder and empty init()
- Proper field-level godoc on IPAllocation/OverlappingRangeIPReservationSpec
- Fix generate-code.sh to output CRDs to both doc/crds and helm chart
- Regenerate CRD YAMLs and deepcopy with controller-gen v0.20
- cmd/operator/main.go: Cobra root command with scheme registration
  (clientgoscheme + whereaboutsv1alpha1 + nadv1)
- cmd/operator/controller.go: leader-elected controller subcommand
  with health/readiness probes, Prometheus metrics, reconcile interval
- cmd/operator/webhook.go: webhook subcommand with cert-controller
  rotation, readiness gating on cert bootstrap
- internal/controller/setup.go: stub for reconciler registration
- internal/webhook/setup.go: Runnable that waits for certs then
  registers webhooks, with ReadyCheck helper
- internal/webhook/certrotator/: thin wrapper around cert-controller
  rotator.AddRotator with telekom CA config
- controller-runtime v0.23.2 (unreleased, branch tip 4dbfa5c6)
- cert-controller v0.15.0, cobra v1.10.2
- Vendor synced and verified
- operator-install.yaml: Deployment (2 replicas, leader election) + RBAC
- webhook-install.yaml: Deployment + Service (443→9443) + RBAC
- validatingwebhookconfiguration.yaml: 3 webhooks with matchConditions CEL bypass
- daemonset-install.yaml: Remove ip-control-loop, cron ConfigMap; CNI-only RBAC
- node-slice-controller.yaml: Add deprecation notice (superseded by operator)
Delete directories replaced by the new controller-runtime operator:
- cmd/controlloop/ (old ip-control-loop entry point)
- cmd/nodeslicecontroller/ (old node-slice-controller entry point)
- pkg/controlloop/ (old pod watcher using informers+workqueues)
- pkg/node-controller/ (old node-slice controller logic)
- pkg/reconciler/ (old CronJob-based IP reconciler)

Update hack/build-go.sh and Dockerfile to remove old binary targets
(ip-control-loop, node-slice-controller).

Run go mod tidy + vendor to prune unused dependencies (gocron, etc.).
…tecture

- Replace old pkg/reconciler, pkg/controlloop, cmd/controlloop references
- Document operator, reconcilers, webhooks, cert-controller
- Update Ginkgo v1→v2 references
- Update binary list: two binaries instead of three
@MaxRink MaxRink marked this pull request as ready for review March 2, 2026 07:10
@MaxRink MaxRink requested a review from Copilot March 2, 2026 07:10
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR migrates the Whereabouts control-plane components from hand-rolled informers/workqueues to a proper controller-runtime v0.23 operator. The CNI binary is unchanged.

Changes:

  • Replaces ip-control-loop and node-slice-controller with a single whereabouts-operator binary (Cobra subcommands: controller and webhook)
  • Adds three new controller-runtime reconcilers (IPPoolReconciler, NodeSliceReconciler, OverlappingRangeReconciler) and validating webhooks for all three CRDs
  • Fixes several CNI storage layer bugs (per-retry context timeout, nil guard on leader elector, scoped err per ipRange, reset skipOverlappingRangeUpdate)

Reviewed changes

Copilot reviewed 71 out of 2554 changed files in this pull request and generated 7 comments.

Show a summary per file
File Description
cmd/operator/main.go, controller.go, webhook.go New Cobra-based operator entry point with controller and webhook subcommands
internal/controller/ippool_controller.go New IPPoolReconciler replacing CronJob-based reconciler
internal/controller/nodeslice_controller.go New NodeSliceReconciler replacing node-slice-controller
internal/controller/overlappingrange_controller.go New OverlappingRangeReconciler for orphaned reservation cleanup
internal/controller/*_test.go, internal/webhook/*_test.go Ginkgo v2 unit tests for new reconcilers and webhooks
internal/webhook/ippool_webhook.go, nodeslicepool_webhook.go, overlappingrange_webhook.go Validating webhooks for all three CRDs
internal/webhook/certrotator/certrotator.go Thin wrapper around cert-controller for TLS rotation
internal/webhook/setup.go Runnable that gates webhook registration until certs are ready
pkg/storage/kubernetes/ipam.go Bug fixes: per-retry context, nil guard on LE, scoped err, reset skipOverlappingRangeUpdate
pkg/api/whereabouts.cni.cncf.io/v1alpha1/*_types.go Added kubebuilder shortNames, printColumns, field-level doc comments
pkg/api/whereabouts.cni.cncf.io/v1alpha1/register.go Simplified SchemeBuilder initialization
doc/crds/*.yaml, deployment/whereabouts-chart/crds/*.yaml Updated CRD manifests (shortNames, printColumns, status subresource) and new operator/webhook install manifests
cmd/controlloop/, cmd/nodeslicecontroller/, pkg/controlloop/, pkg/reconciler/ Deleted old entry points and reconciliation code
go.mod Bumps k8s.io/* to v0.35.0, adds controller-runtime v0.23, ginkgo v2, cobra, cert-controller
hack/build-go.sh, Dockerfile Build system updated to produce whereabouts-operator instead of old binaries
Various *_test.go Migrated Ginkgo v1 imports to v2
CLAUDE.md, .github/copilot-instructions.md, proposal.md Documentation updated to reflect new architecture
Comments suppressed due to low confidence (1)

internal/controller/ippool_controller.go:1

  • The CIDR validation in validateIPPool is skipped when pool.Spec.Range is empty (""). An empty range is semantically invalid — an IPPool without a range cannot be used for allocation. The condition should be changed to reject an empty range, consistent with the +kubebuilder:validation:MinLength=1 marker added to the type and how NodeSlicePoolValidator handles an empty range.
// Copyright 2025 Deutsche Telekom

Fixes 3 CRITICAL (VWC name mismatch, cmdDel swallows errors, multi-range dealloc abort), 11 HIGH (byteSliceSub borrow bug, signal-aware context, cmdDelFunc philosophy, cmdCheck no-op, defer-in-loop, Helm chart binary refs, deposed deadlock, ORIP rollback, webhook failurePolicy, version ldflags), 3 CNI spec (error wrapping, zero IPs check, flat config non-fatal), 8 Copilot PR findings (unused var, requestCtx threading, double blank line, error swallowing, immutability enforcement, error message accuracy). Adds NodeSliceReconciler tests (11 specs), webhook ValidateUpdate tests, fixes existing test for error convention.
@MaxRink MaxRink requested a review from Copilot March 2, 2026 09:52
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 89 out of 2572 changed files in this pull request and generated 7 comments.

Comments suppressed due to low confidence (1)

pkg/storage/kubernetes/ipam.go:1

  • The result channel (buffered with capacity 2) is never read after the deposed case sends to stopM in the updated code. Looking at the full IPManagement function in the context, the second goroutine reads from stopM and then writes to result, but result is never consumed — wg.Wait() is used instead. The result channel write in the leader-election goroutine will block if neither goroutine reads from it, potentially causing a goroutine leak. This was latent before the change, but the PR's refactoring of the deposed case (now sending to stopM instead of result) means the flow should be carefully verified to ensure result is still drained. Specifically, after wg.Wait(), result may have an unread value if the leader-election goroutine completes — confirm that the result <- (<-res) write in the second goroutine (which unblocks after stopM) does not block due to capacity issues when the channel already has an entry.
package kubernetes

…it Helm RBAC, add finalizers

- Replace internal addrToInt/intToAddr with netutils.BigForIP from k8s.io/utils/net
  in IPGetOffset, IPAddOffset, and DivideRangeBySize (pkg/iphelpers)
- Migrate fakek8sclient.NewSimpleClientset to NewClientset (cmd tests)
- Suppress SA1019 for generated whereabouts fake (lacks apply configs)
- Remove deprecated result.Requeue assertions from controller tests
- Split Helm chart RBAC: separate SA/ClusterRole/CRB for webhook and operator
- Add IPPool finalizer for clean overlapping-reservation teardown
- Add shared validation package (internal/validation)
- Harden webhooks with typed admission.Validator[T] API
- Add extnetip dependency for netip range operations
- Rewrite iphelpers to use netip.Addr/Prefix throughout
@MaxRink MaxRink requested a review from Copilot March 2, 2026 12:02
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 103 out of 2595 changed files in this pull request and generated 8 comments.

@MaxRink MaxRink requested a review from Copilot March 2, 2026 12:28
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 103 out of 2595 changed files in this pull request and generated 7 comments.

MaxRink added 2 commits March 2, 2026 13:49
CI fix:
- e2e-setup-kind-cluster.sh: use operator-install.yaml instead of
  node-slice-controller.yaml (binary /node-slice-controller doesn't exist,
  only /whereabouts-operator). Update wait label to app=whereabouts-operator.

Code hardening (copilot review):
- Remove IPPool finalizer: IPPools are written by CNI binary out-of-band;
  if operator is down, finalizer blocks IPPool deletion indefinitely.
  OverlappingRangeReconciler handles ORIP cleanup asynchronously.
- cmdDel: return nil on failure (CNI spec: DEL must be idempotent,
  returning errors blocks pod deletion)
- cmdDel: fix off-by-one in retry loop (was doing 4 attempts not 3)
- allocationKeyToIP: reject negative offsets before uint64 cast
- denormalizeIPName: convert from recursive to iterative (bounded)
- ipam.go rollback: use context.Background() instead of cancelled ctx
- ipam.go: DRY inline rollback via rollbackCommitted() helper
- ipam.go: set skipOverlappingRangeUpdate when dealloc finds no allocation
  (prevents nil IP passed to UpdateOverlappingRangeAllocation)
- ipam.go: include attempt count in retry error message
- ensureOwnerRef: return error instead of silently swallowing
- config.go: restore defer jsonFile.Close() before ReadAll
- daemonset-install.yaml: background token-watcher.sh + sleep infinity
  (foreground process blocks container readiness)
- NodeSlicePool webhook: warn on spec.range/sliceSize changes
- validation_test.go: document intentional multiple-slash leniency
@MaxRink MaxRink requested a review from Copilot March 2, 2026 13:06
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 135 out of 2934 changed files in this pull request and generated 1 comment.

Comments suppressed due to low confidence (1)

hack/test-go.sh:1

  • This line is in the deleted file and was already removed, so this issue is fixed.

@MaxRink MaxRink requested a review from Copilot March 6, 2026 19:03
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 135 out of 2934 changed files in this pull request and generated 2 comments.

- Fix retry() defaults: use :- (default) instead of := (assign) in both
  e2e setup scripts to avoid overwriting global RETRY_MAX/INTERVAL/TIMEOUT
- Remove dead TIMEOUT_K8="5000s" initial assignment (immediately shadowed)
- Fix incomplete block comment in validation_extended_test.go
- Add explicit zapcore.InfoLevel for info case, zapcore.ErrorLevel with
  Development=false for error case, and new warn log level support
- Extract duplicated CEL matchConditions into whereabouts.webhookBypassCondition
  named Helm template
@MaxRink MaxRink requested a review from Copilot March 6, 2026 19:49
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 135 out of 2935 changed files in this pull request and generated 5 comments.

@MaxRink MaxRink requested a review from Copilot March 6, 2026 20:12
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 135 out of 2935 changed files in this pull request and generated 2 comments.

@MaxRink MaxRink requested a review from Copilot March 6, 2026 20:37
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 135 out of 2935 changed files in this pull request and generated 2 comments.

@MaxRink MaxRink requested a review from Copilot March 6, 2026 21:13
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 135 out of 2935 changed files in this pull request and generated 3 comments.

@MaxRink MaxRink requested a review from Copilot March 6, 2026 21:25
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 135 out of 2935 changed files in this pull request and generated 4 comments.

@MaxRink MaxRink requested a review from Copilot March 6, 2026 23:32
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 135 out of 2935 changed files in this pull request and generated 1 comment.

@MaxRink MaxRink requested a review from Copilot March 7, 2026 15:21
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 135 out of 2935 changed files in this pull request and generated 1 comment.

- make PatchHelper DeepCopy type assertions safe (no panic)
- replace panic in controller flag setup with utilruntime.Must
- fix verify-codegen diff result accumulation and robust cp semantics
- add operator nodeSelector/tolerations override support in Helm values/template
- clarify cert volume startup/readiness comment and cni-args hostPath purpose
- improve validation test naming/comment clarity for whitespace cases
- drain FakeRecorder events in overlapping range controller tests
- normalize warning text punctuation in IPPool webhook
@MaxRink MaxRink requested a review from Copilot March 7, 2026 22:48
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 135 out of 2935 changed files in this pull request and generated 4 comments.

@MaxRink MaxRink requested a review from Copilot March 7, 2026 23:14
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 135 out of 2935 changed files in this pull request and generated no new comments.

@MaxRink MaxRink requested a review from Copilot March 8, 2026 00:09
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 135 out of 2935 changed files in this pull request and generated 1 comment.

- clarify jsonEqual fallback semantics for malformed second blob
- rename slice-size test case for no-leading-slash clarity
- update daemonset cni-args comment to current install-cni binary flow
- include warn in --log-level flag description
- scope operator secret permissions to webhook secret resourceName
@MaxRink MaxRink requested a review from Copilot March 9, 2026 13:59
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 135 out of 2935 changed files in this pull request and generated 3 comments.

Comment on lines +27 to +30
before := getCounterValue(webhookValidationTotal.WithLabelValues("ippool", "create", "allowed"))
recordValidation("ippool", "create", nil)
after := getCounterValue(webhookValidationTotal.WithLabelValues("ippool", "create", "allowed"))
Expect(after - before).To(Equal(float64(1)))
Copy link

Copilot AI Mar 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Prometheus CounterVec used here is a package-level variable registered in init(). Because these tests share the same counter across test cases (metrics are not reset between It blocks), a test that reads the counter before and after an operation is valid only if no other concurrent test increments the same label combination. However, the getCounterValue helper calls WithLabelValues, which also increments the label-value pair's internal label set if it didn't exist yet — this is fine — but test isolation is not guaranteed if tests run in parallel and hit the same ("ippool", "create", "allowed") label vector. More concretely, getCounterValue calls Write on the result of WithLabelValues, not on a snapshot — each call to WithLabelValues returns (or creates) the counter for those labels, so the before and after reads from separate WithLabelValues() calls are consistent as long as no parallel test races on the same labels. The real risk is that any test in the same suite (or even a real webhook call during test) could increment the counter between the before and after reads. Consider using a fresh prometheus.NewRegistry() per test and registering a new CounterVec to isolate test state.

Copilot uses AI. Check for mistakes.
{name: "negative with slash", input: "/-1", want: 0, wantErr: true},
{name: "very large number", input: "9999", want: 0, wantErr: true},
{name: "float", input: "24.5", want: 0, wantErr: true},
{name: "hex", input: "0x18", want: 0, wantErr: true},
Copy link

Copilot AI Mar 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

strconv.Atoi in Go does not interpret leading-zero strings as octal — it always parses in base 10. So "024" is parsed as 24, which is within the valid range [1, 128], and the test correctly expects no error and a value of 24. However, the test case name "leading zero" with wantErr: false could be misleading to future contributors who might expect octal parsing. Consider adding a comment to the test case to clarify that Go's strconv.Atoi treats "024" as decimal 24, not octal 20, and that this is the intended behavior. This is a documentation concern rather than a code defect, but it's worth noting since the existing comment on line 97 for "whitespace padded" already returns an error, and having adjacent cases with different error expectations and no explanation could be confusing.

Suggested change
{name: "hex", input: "0x18", want: 0, wantErr: true},
{name: "hex", input: "0x18", want: 0, wantErr: true},
// Go's strconv.Atoi parses "024" as decimal 24 (base 10), not octal 20; this is the intended behavior.

Copilot uses AI. Check for mistakes.
Comment on lines +16 to +26
- apiGroups:
- ""
resources:
- secrets
verbs:
- create
- get
- list
- patch
- update
- watch
Copy link

Copilot AI Mar 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same issue as in the Helm operator-clusterrole.yaml: secrets access here has no resourceNames restriction at all, granting the operator broad read/write access to all Secrets in the cluster. The create verb legitimately cannot use resourceNames, but get, list, watch, update, and patch should be restricted to the specific webhook cert Secret (e.g., whereabouts-webhook-cert). Add a second rule with resourceNames: [whereabouts-webhook-cert] for the non-create verbs to follow least-privilege.

Copilot uses AI. Check for mistakes.
@MaxRink MaxRink requested a review from Copilot March 9, 2026 14:47
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 135 out of 2935 changed files in this pull request and generated 1 comment.

Comment on lines +30 to +36
cp -a "${TMP_DIFFROOT_PKG}/." "${DIFFROOT_PKG}/"

echo "diffing api/ against freshly generated codegen"
if ! diff -Naupr "${DIFFROOT_API}" "${TMP_DIFFROOT_API}"; then
ret=1
fi
cp -a "${TMP_DIFFROOT_API}/." "${DIFFROOT_API}/"
Copy link

Copilot AI Mar 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The cp -a restore steps on lines 30 and 36 run unconditionally, even when diff detected changes (i.e., the code is out of date). This overwrites the freshly-generated code with the original snapshot that was saved before running update-codegen.sh, undoing the regenerated output. The copies back should be conditional on the diff having succeeded (no changes), or removed entirely — the original intent of a verify script is typically to leave the working tree clean and just report the exit code, not to restore the pre-existing content.

Suggested change
cp -a "${TMP_DIFFROOT_PKG}/." "${DIFFROOT_PKG}/"
echo "diffing api/ against freshly generated codegen"
if ! diff -Naupr "${DIFFROOT_API}" "${TMP_DIFFROOT_API}"; then
ret=1
fi
cp -a "${TMP_DIFFROOT_API}/." "${DIFFROOT_API}/"
echo "diffing api/ against freshly generated codegen"
if ! diff -Naupr "${DIFFROOT_API}" "${TMP_DIFFROOT_API}"; then
ret=1
fi

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants