Skip to content

feat: plan-centric reconcile wiring (Steps 5-6)#88

Merged
bdchatham merged 7 commits intomainfrom
feat/plan-centric-reconcile-wiring
Apr 15, 2026
Merged

feat: plan-centric reconcile wiring (Steps 5-6)#88
bdchatham merged 7 commits intomainfrom
feat/plan-centric-reconcile-wiring

Conversation

@bdchatham
Copy link
Copy Markdown
Collaborator

@bdchatham bdchatham commented Apr 15, 2026

Summary

Wires the plan foundations from #87 into the actual reconcile loop. This is the behavioral change that makes Plan the central abstraction across all SeiNode phases.

  • Phase-aware BuildPlan: Init plans now include ensure-data-pvc, apply-statefulset, apply-service before sidecar tasks. All plans set TargetPhase/FailedPhase. Running phase builds a convergence plan (apply-statefulset + apply-service).
  • Mutating ForNode: ForNode(node) error resumes active plans or builds new ones, stamping directly onto node.Status.Plan. No more (NodePlanner, error) return.
  • Simplified reconcile loop: ensureNodeFinalizer → reconcilePeers → ForNode → persist plan → ExecutePlan → emit metrics → reconcileRunningTasks. The phase switch, reconcilePending, reconcileInitializing, reconcileRunning, transitionPhase, and all direct resource creation methods are deleted.
  • Bootstrap plans: ensure-data-pvc prepended at start, apply-statefulset + apply-service inserted after teardown-bootstrap (preserving RWO PVC constraint).
  • Genesis plans: Infrastructure tasks prepended before ceremony tasks.

What changed in the reconcile loop

Before (6 code paths):

ForNode → Validate → reconcilePeers → ensureNodeDataPVC → switch phase {
  Pending: reconcilePending (build plan, transition)
  Initializing: reconcileInitializing (special-case STS/SVC, ExecutePlan, manual transition)
  Running: reconcileRunning (direct STS/SVC SSA, runtime tasks)
}

After (1 linear path):

ensureNodeFinalizer → reconcilePeers → ForNode → persist plan → ExecutePlan → emit metrics → reconcileRunningTasks

Test plan

  • make build — compiles cleanly
  • make test — all tests pass (80.5% controller coverage)
  • make lint — zero issues
  • Review: ForNode plan persistence flow (patch base captured before mutation)
  • Review: Bootstrap plan task ordering (ensure-data-pvc → bootstrap → teardown → apply-statefulset)
  • Review: Convergence plan lifecycle (built for Running, nilled on completion)

🤖 Generated with Claude Code

Wire the plan foundations into the actual reconcile loop. This is the
behavioral change that makes Plan the central abstraction.

Step 5: Phase-aware BuildPlan + infrastructure tasks in plans
- buildBasePlan prepends ensure-data-pvc, apply-statefulset, apply-service
  before sidecar tasks
- buildBootstrapPlan inserts ensure-data-pvc at start, apply-statefulset +
  apply-service after teardown-bootstrap
- buildGenesisPlan includes infrastructure tasks before ceremony tasks
- All plans set TargetPhase: Running, FailedPhase: Failed
- Each mode planner's BuildPlan checks node phase: Pending builds init
  plan, Running builds convergence plan (apply-statefulset + apply-service)

Step 6: Mutating ForNode + simplified reconcile loop
- ForNode(node) error — resumes active plan or builds new one, stamps
  directly onto node.Status.Plan
- Reconcile loop: ensure finalizer → check Failed → reconcilePeers →
  ForNode → persist new plan → ExecutePlan → emit metrics → runtime tasks
- Deleted: reconcilePending, reconcileInitializing, reconcileRunning,
  transitionPhase, ensureNodeDataPVC, reconcileNodeStatefulSet,
  reconcileNodeService, fieldOwner constant
- New plan is persisted via status patch before ExecutePlan runs (captures
  patch base before ForNode mutates the in-memory object)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Comment thread internal/controller/node/controller.go Outdated
Comment thread internal/planner/bootstrap.go Outdated
bdchatham and others added 6 commits April 15, 2026 16:21
Address all findings from the Kubernetes and Platform specialist reviews.

Must fix:
- Document deliberate empty FailedPhase on convergence plans (retry, not fail)
- Delete dead TestBootstrapMode test (passed but tested nothing)
- Replace non-asserting phase transition tests with proper executor test

Renames for clarity:
- ForNode → ResolvePlan (conveys mutation, not accessor)
- hadPlan → planAlreadyActive (self-documenting branch condition)
- nilPlanIfConvergence → clearCompletedConvergencePlan (says what it does)

Deduplication:
- Remove duplicate NeedsLongStartup from planner (noderesource has it)
- Consolidate fieldOwner into task.go (was duplicated across task files)
- Remove unused *SeiNode param from configureGenesisParams
- Add DefaultStatus() to taskBase, use in all fire-and-forget tasks

Test quality:
- Replace hardcoded Tasks[3] indices with findPlannedTask name-based lookup
- Add TestExecutePlan_ConvergencePlan_NilsOnCompletion
- Add TestExecutePlan_TaskFailure_SetsPlanFailedCondition
- Add comments to magic for-range-N loops explaining plan progression

Documentation:
- Add planner/doc.go explaining full plan lifecycle
- Document ResolvePlan contract (caller must capture patch base before calling)
- Fix truncated ResultRequeueImmediate comment
- Update CLAUDE.md Key Patterns with plan-driven reconciliation section

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Rename prevPhase to observedPhase in controller.go
- Genesis plan uses genesisParamsForTaskType for genesis-specific tasks,
  falls through to paramsForTaskType for shared tasks (infrastructure +
  common sidecar). Eliminates the inline switch that duplicated the
  params factory.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
buildGenesisPlan now follows the same pattern as buildBasePlan:
configApplyParams (with validator mode + overrides) is passed through
to the shared paramsForTaskType factory. genesisCeremonyParams only
handles the 4 ceremony-specific tasks (generate-identity, generate-gentx,
upload-genesis-artifacts, set-genesis-peers) and returns nil for
everything else, falling through to the shared factory.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…actory

Move genesis ceremony task params (generate-identity, generate-gentx,
upload-genesis-artifacts, set-genesis-peers) into the shared
paramsForTaskType factory. Remove the separate genesisCeremonyParams
function — one factory handles all task types.

Genesis ceremony params are guarded by a nil check on
Validator.GenesisCeremony, extracted into genesisCeremonyTaskParams
helper for readability. buildGenesisPlan now calls paramsForTaskType
directly, same as every other plan builder.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…s error

Two structural improvements to plan composition:

1. Extract buildSidecarProgression — the insertBefore calls that build
   the sidecar task sequence were duplicated between buildBasePlan and
   buildBootstrapProgression. Now both delegate to a single shared
   function, eliminating the drift risk.

2. insertBefore returns an error when the target task is not found,
   instead of silently dropping the insertion. This catches plan
   construction bugs at build time rather than producing silently
   incomplete plans that run without the missing task.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…tent

Remove the one-line buildBootstrapProgression wrapper — buildBootstrapPlan
calls buildSidecarProgression directly. Add comment to
buildPostBootstrapProgression explaining it is intentionally separate
from the shared sidecar progression.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@bdchatham bdchatham merged commit 23bf638 into main Apr 15, 2026
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant