feat: status-aware task Submit with Run counter and livez endpoint (Alpha P3) by bdchatham · Pull Request #61 · sei-protocol/seictl

bdchatham · 2026-04-03T18:29:00Z

Summary

Cloud-API model for the sidecar task engine: the controller submits stable keys, and the sidecar owns the execution lifecycle.

- Status-aware Submit: failed tasks are transparently re-executed on re-submit; running/completed tasks are idempotent no-ops
- Run counter on TaskResult: tracks execution count under the same stable ID. Increments on failed→re-execute, NOT on crash-recovery rehydration
- Concurrency safety: sync.Mutex + inFlight map prevents double-execution of the same failed task ID
- /v0/livez endpoint: SQLite liveness check via Ping(), distinct from /v0/healthz (readiness). Use as a Kubernetes liveness probe.
- SQLite migration v3: adds the run column
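To make the status-aware branch concrete, here is a minimal in-memory sketch. It is not the code in sidecar/engine/engine.go: the `Engine`, `Handler`, `FlakyHandler`, `Wait`, and `Get` names are hypothetical, a map stands in for the SQLite store, and only a mutex is used (the follow-up commit on this PR dropped the inFlight map).

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// Status values are hypothetical stand-ins for the sidecar's task states.
type Status string

const (
	StatusRunning   Status = "running"
	StatusCompleted Status = "completed"
	StatusFailed    Status = "failed"
)

// TaskResult carries the stable ID plus the Run counter.
type TaskResult struct {
	ID     string
	Status Status
	Run    int
}

// Handler is the work executed for a task.
type Handler func() error

// Engine is an in-memory stand-in for the SQLite-backed engine.
type Engine struct {
	mu    sync.Mutex
	wg    sync.WaitGroup
	tasks map[string]*TaskResult
}

func NewEngine() *Engine { return &Engine{tasks: make(map[string]*TaskResult)} }

// Submit branches on the stored status: new tasks start at Run=1, failed
// tasks bump Run and re-execute, running/completed tasks are no-ops.
func (e *Engine) Submit(id string, h Handler) string {
	e.mu.Lock()
	defer e.mu.Unlock()
	t, ok := e.tasks[id]
	switch {
	case !ok:
		t = &TaskResult{ID: id, Status: StatusRunning, Run: 1}
		e.tasks[id] = t
	case t.Status == StatusFailed:
		t.Run++
		t.Status = StatusRunning
	default:
		return id // running or completed: idempotent no-op
	}
	e.wg.Add(1)
	go e.execute(t, h)
	return id
}

// execute runs the handler and records the terminal status.
func (e *Engine) execute(t *TaskResult, h Handler) {
	defer e.wg.Done()
	err := h()
	e.mu.Lock()
	defer e.mu.Unlock()
	if err != nil {
		t.Status = StatusFailed
		return
	}
	t.Status = StatusCompleted
}

// Wait blocks until in-flight executions finish (demo helper only).
func (e *Engine) Wait() { e.wg.Wait() }

// Get returns a copy of the stored result.
func (e *Engine) Get(id string) (TaskResult, bool) {
	e.mu.Lock()
	defer e.mu.Unlock()
	t, ok := e.tasks[id]
	if !ok {
		return TaskResult{}, false
	}
	return *t, true
}

// FlakyHandler returns a handler that fails its first n calls.
func FlakyHandler(n int) Handler {
	var mu sync.Mutex
	calls := 0
	return func() error {
		mu.Lock()
		defer mu.Unlock()
		calls++
		if calls <= n {
			return errors.New("simulated failure")
		}
		return nil
	}
}

func main() {
	e := NewEngine()
	h := FlakyHandler(1)
	for i := 0; i < 3; i++ {
		e.Submit("task-1", h)
		e.Wait()
		r, _ := e.Get("task-1")
		fmt.Println(r.Status, r.Run) // failed 1, completed 2, completed 2
	}
}
```

The third submit leaves Run at 2 because a completed task is returned as-is. Rehydration after a crash is not modeled here; per the summary above, it would leave Run unchanged.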

Why

Deterministic task IDs combined with PVC-persisted SQLite left failed tasks permanently stuck after a pod restart: the engine's dedup check returned the cached failure without re-executing. This is the sidecar half of the Alpha P3 task reliability initiative. The controller half (plan IDs, simplified retry, failure diagnostics) follows in a separate PR on sei-node-controller-networking.

Test plan

- TestSubmitReExecutesFailedTask: failed task re-executes, Run increments to 2
- TestSubmitReExecutesFailedTaskThatFailsAgain: persistent failure increments Run
- TestSubmitDoesNotIncrementRunOnRehydration: crash recovery preserves Run=1
- TestSubmitConcurrentSameFailedID: mutex prevents double-execution
- TestSubmitRunFieldOnFirstSubmit: new tasks start at Run=1
- TestLivezReturns200WhenStoreHealthy / TestLivezReturns200BeforeReady
- All 40+ existing engine, server, and store tests pass
- go vet clean

🤖 Generated with Claude Code

Cloud-API model for task lifecycle: the controller submits stable keys and the sidecar owns execution lifecycle. Failed tasks are transparently re-executed on re-submit; running and completed tasks are idempotent no-ops.

Engine changes:
- Submit branches on existing task status: failed → increment Run, reset to running, re-execute. Running/completed → return existing ID.
- sync.Mutex + inFlight map prevents concurrent double-execution of the same failed task ID.
- Run counter on TaskResult tracks how many times a task has been executed under the same ID. Starts at 1, increments only on failed→re-execute (NOT on stale-task rehydration).
- SQLite migration v3 adds the run column.

Observability:
- /v0/livez endpoint checks SQLite responsiveness via Ping(). Use as a Kubernetes liveness probe (distinct from /v0/healthz readiness).
- Run counter included in submit/complete/fail log lines.

Tests: 6 new tests covering re-execution, rehydration stability, concurrent dedup, and Run field correctness.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
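The /v0/livez check described in this commit can be sketched as a small HTTP handler. This is an illustration under assumptions, not the PR's code: the `Pinger` interface, `LivezHandler`, `fakeStore`, `Probe`, and the 2-second timeout are hypothetical. The standard library's `*sql.DB` does provide `PingContext`, so it would satisfy this interface, and the 503 on failure matches the failure-path test mentioned in the follow-up commit.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// Pinger is a hypothetical interface for the store's liveness check.
// *sql.DB satisfies it through its PingContext method.
type Pinger interface {
	PingContext(ctx context.Context) error
}

// LivezHandler reports 200 when the store answers a ping and 503 otherwise.
// The timeout value is an assumption chosen for this sketch.
func LivezHandler(store Pinger) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		ctx, cancel := context.WithTimeout(r.Context(), 2*time.Second)
		defer cancel()
		if err := store.PingContext(ctx); err != nil {
			http.Error(w, "store unavailable", http.StatusServiceUnavailable)
			return
		}
		w.WriteHeader(http.StatusOK)
		fmt.Fprintln(w, "ok")
	}
}

// fakeStore lets the sketch run without a real SQLite database.
type fakeStore struct{ err error }

func (f fakeStore) PingContext(context.Context) error { return f.err }

// errClosed simulates a closed store.
var errClosed = errors.New("store is closed")

// Probe issues one in-process GET against the handler and returns the status code.
func Probe(store Pinger) int {
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodGet, "/v0/livez", nil)
	LivezHandler(store).ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	fmt.Println(Probe(fakeStore{}))               // 200
	fmt.Println(Probe(fakeStore{err: errClosed})) // 503
}
```

Keeping the liveness check to a store ping, separate from readiness, means a pod that is still starting up is not restarted merely because it is not ready to serve yet.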

sidecar/engine/engine.go

Addresses Tide review and PR feedback:
- Remove inFlight map: the mutex alone serializes Submit; the microsecond race between execute-return and store.Save is handled by the next controller poll seeing "running" then "failed".
- Migration default 0 → 1 (pre-existing tasks completed their first run)
- Fix concurrent test: use blocking handler to actually test mutex
- Add livez failure path test (503 when store is closed)
- Add store Ping and Run round-trip tests
- Log stale task Save errors instead of silently discarding

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
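The idea behind testing the mutex with a blocking handler can be illustrated roughly as follows. The `engine`, `task`, `submit`, and `CountExecutions` names are hypothetical, and the real test in the repository may be structured differently. The handler blocks on a channel so the task stays in the running state while the other goroutines submit the same failed ID.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// task is a hypothetical, stripped-down task record.
type task struct {
	status string // "running", "completed", or "failed"
	run    int
}

type engine struct {
	mu    sync.Mutex
	tasks map[string]*task
}

// submit models only the failed -> running path: the status check and the
// transition happen under one lock, so a single caller wins.
func (e *engine) submit(id string, h func()) {
	e.mu.Lock()
	defer e.mu.Unlock()
	t := e.tasks[id]
	if t == nil || t.status != "failed" {
		return // unknown, running, or completed: nothing to re-execute here
	}
	t.run++
	t.status = "running"
	go func() {
		h()
		e.mu.Lock()
		t.status = "completed"
		e.mu.Unlock()
	}()
}

// CountExecutions submits one failed ID from n goroutines while the handler
// blocks, then returns how often the handler started and the final Run value.
func CountExecutions(n int) (int64, int) {
	e := &engine{tasks: map[string]*task{"t1": {status: "failed", run: 1}}}
	var started int64
	release := make(chan struct{})
	done := make(chan struct{})
	h := func() {
		atomic.AddInt64(&started, 1)
		<-release // hold the task in "running" until every submit has returned
		close(done)
	}

	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			e.submit("t1", h)
		}()
	}
	wg.Wait()
	close(release)
	<-done

	e.mu.Lock()
	defer e.mu.Unlock()
	return atomic.LoadInt64(&started), e.tasks["t1"].run
}

func main() {
	started, run := CountExecutions(8)
	fmt.Println(started, run) // 1 2
}
```

Without the blocking handler, the first execution could finish before the other submits arrive, and the test would pass even if the status check and the transition were not serialized.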

bdchatham mentioned this pull request Apr 3, 2026

feat: plan-scoped task IDs, simplified retry, failure diagnostics (Alpha P3) sei-protocol/sei-k8s-controller#50

Merged

6 tasks

bdchatham commented on sidecar/engine/engine.go, Apr 3, 2026 (outdated, resolved)

bdchatham merged commit a595641 into main Apr 3, 2026
2 checks passed

bdchatham deleted the feat/status-aware-submit-and-livez branch April 3, 2026 19:24

bdchatham merged 2 commits into main from feat/status-aware-submit-and-livez
