Skip to content

Eagerly register invalid counter metrics at startup#245

Merged
pablodeymo merged 2 commits intomainfrom
eagerly-register-invalid-counters
Mar 20, 2026
Merged

Eagerly register invalid counter metrics at startup#245
pablodeymo merged 2 commits intomainfrom
eagerly-register-invalid-counters

Conversation

@pablodeymo
Copy link
Collaborator

@pablodeymo pablodeymo commented Mar 20, 2026

Motivation

Metrics using function-scoped LazyLock statics were only registered with Prometheus on first use. Counters that may never be incremented (e.g. invalid attestation counters) were entirely absent from /metrics output, causing:

  1. Dashboard gaps: Panels show "No data" instead of 0
  2. Alerting blind spots: PromQL expressions like rate(lean_attestations_invalid_total[5m]) > 0 silently fail when the metric doesn't exist

Flagged by the leanMetrics team during devnet-3 metrics review.

Description

Moves all blockchain metrics from function-scoped LazyLock statics to module-scoped statics, and adds a metrics::init() function that eagerly registers every metric at startup via LazyLock::force().

Changes

  • crates/blockchain/src/metrics.rs: All 28 metric statics (14 gauges, 10 counters, 8 histograms) are now at module scope, organized by type (Gauges / Counters / Histograms). The init() function forces all of them. Public API functions are now thin wrappers that reference the module-scope statics directly.
  • bin/ethlambda/src/main.rs: Calls metrics::init() at startup before other metric initialization.

Why all metrics, not just the invalid counters?

Per review feedback from @MegaRedHand — initializing all metrics makes the pattern consistent and ensures the full metric set is visible from the first Prometheus scrape. This also aligns with the module-scope style already used in state_transition/src/metrics.rs and net/p2p/src/metrics.rs.

How to Test

  1. Start a node: cargo run
  2. Query the metrics endpoint: curl localhost:5054/metrics
  3. All metrics should appear immediately, including zero-value counters:
    lean_attestations_invalid_total 0
    lean_pq_sig_aggregated_signatures_invalid_total 0
    lean_pq_sig_attestation_signatures_invalid_total 0
    lean_fork_choice_reorgs_total 0
    ...
    

… with value 0

Counters like lean_attestations_invalid_total, lean_pq_sig_aggregated_signatures_invalid_total,
and lean_pq_sig_attestation_signatures_invalid_total may never be incremented during normal
operation. Because they used function-scoped LazyLock statics, they were only registered with
Prometheus on first increment — meaning they were absent from /metrics output entirely if no
invalid event occurred. This moves them to module scope and forces initialization at startup
via metrics::init().
@github-actions
Copy link

🤖 Kimi Code Review

The PR is a clean refactoring that improves observability by ensuring "never-incremented" counters appear in Prometheus. No critical issues found.

Minor suggestions:

  1. Use expect() instead of unwrap() (lines 7, 16, 25 in metrics.rs):

    .expect("failed to register lean_attestations_invalid_total metric")

    This aids debugging if registration fails (e.g., duplicate metric names).

  2. Documentation for init() (line 31 in metrics.rs):
    Add a note that this should be called exactly once during startup:

    /// Initialize metrics that may never be incremented so they appear in Prometheus with value 0.
    /// Must be called once at startup before any metrics are updated.
    

Correctness verification:

  • Moving LazyLock statics to module scope is sound and more efficient (eliminates per-call initialization checks)
  • LazyLock::force() is the correct idiom for eager initialization and is thread-safe
  • The three selected counters (*_INVALID_TOTAL) are appropriate candidates for eager initialization since valid systems may never increment them, unlike head slot/gauges which update every block

Security/Consensus impact: None. This is purely instrumentation code with no effect on attestation validation, fork choice, or state transition logic.


Automated review by Kimi (Moonshot AI) · kimi-k2.5 · custom prompt

@github-actions
Copy link

🤖 Claude Code Review

The change is small and focused. Here's the review:


PR Review: Eagerly register invalid counter metrics at startup

Overall: The change is correct and addresses a real observability gap. The approach is consistent with the existing codebase patterns.

Correctness

The mechanics are sound. LazyLock::force initializes the static if not already initialized, which triggers the register_int_counter! macro and registers the metric with the global Prometheus registry at value 0. Subsequent calls to LEAN_ATTESTATIONS_INVALID_TOTAL.inc() will use the already-initialized value — no double-registration risk.

One concern: init() is not idempotent-safe under test parallelism

LazyLock::force is itself safe to call multiple times (it's a no-op after first init), but register_int_counter! panics on duplicate registration with the default registry. If test code ever calls metrics::init() concurrently from multiple threads (e.g., via #[tokio::test] parallel test workers), two threads could race inside the LazyLock initializer, and the second registration attempt would panic.

This is the same risk that all the function-scoped statics had, and the existing codebase lives with it — so this PR doesn't regress anything. Worth noting in passing.

Minor: comment should say "force" not "LazyLock::force" for readability

metrics.rs:35:

/// When adding a new module-scope counter above, add a corresponding `LazyLock::force` call here.

This is fine. No change needed — the backtick-qualified name is unambiguous.

Consistency with CLAUDE.md patterns

The CLAUDE.md documents two scoping styles for metrics:

  • Module-scope LazyLock statics (preferred for state_transition metrics)
  • Function-scope statics (used in blockchain metrics)

This PR migrates three blockchain metrics from function-scope to module-scope. The motivation (eager init) justifies the deviation. The CLAUDE.md description of function-scope as the blockchain pattern may now be slightly stale, but the three moved metrics are a natural exception category (zero-fire counters), so the existing guidance still holds for the majority.

main.rs placement

main.rs:84: metrics::init() is correctly placed before set_node_info / set_node_start_time. No ordering issue — init() only touches the three invalid counters.

Summary

Clean, minimal, correct change. No functional regressions. The init() doc comment with the "add a corresponding force call here" reminder is a good guard against future omissions.


Automated review by Claude (Anthropic) · sonnet · custom prompt

@github-actions
Copy link

🤖 Codex Code Review

No blocking findings.

The patch is narrowly scoped to Prometheus registration and does not change fork choice, attestation validation, justification/finalization, state transition, XMSS handling, or SSZ logic. Eagerly forcing these three LazyLock<IntCounter> statics in crates/blockchain/src/metrics.rs:32 and calling that from bin/ethlambda/src/main.rs:85 looks correct: it preserves the existing metric names/types and only ensures zero-value counters are registered before the first scrape.

One non-blocking maintainability note: the new init() list in crates/blockchain/src/metrics.rs:34 is manual, so future “may stay at zero” metrics can silently regress if someone adds a module-scope counter and forgets the matching LazyLock::force. A small scrape-level test for startup metrics exposure would make that behavior durable.

I couldn’t run cargo test in this sandbox because rustup tried to write to a read-only path under /home/runner/.rustup.


Automated review by OpenAI Codex · gpt-5.4 · custom prompt

@greptile-apps
Copy link
Contributor

greptile-apps bot commented Mar 20, 2026

Greptile Summary

This PR fixes a Prometheus observability gap by eagerly registering three "invalid" counter metrics (lean_attestations_invalid_total, lean_pq_sig_aggregated_signatures_invalid_total, lean_pq_sig_attestation_signatures_invalid_total) at node startup so they appear with value 0 even when no invalid event has ever occurred.

  • Three LazyLock statics are promoted from function-scope (inside each inc_* function) to module-scope in crates/blockchain/src/metrics.rs.
  • A new pub fn init() function calls LazyLock::force on each static, which triggers the register_int_counter! macro and inserts the counters into the Prometheus global registry before the first scrape.
  • metrics::init() is called in main.rs before the metrics server is spawned, guaranteeing zero-value presence from the very first /metrics response.
  • The existing inc_* functions are simplified to reference the module-scope statics directly — no behavioral change.

Confidence Score: 5/5

  • This PR is safe to merge — it is a targeted, additive observability fix with no behavioral changes to existing logic.
  • LazyLock::force is idempotent and thread-safe; calling it from init() before the metrics server spawns eliminates any scrape-vs-registration race. The inc_* functions are untouched in semantics. The only risk is a panic inside the LazyLock closures if register_int_counter! fails (e.g., duplicate name registration), but this is consistent with the existing code-wide use of .unwrap() on metric registration and would only surface as a hard programming error, not a runtime regression.
  • No files require special attention.

Important Files Changed

Filename Overview
crates/blockchain/src/metrics.rs Promotes three "invalid" counter statics from function-scope to module-scope and adds init() to eagerly force their initialization, ensuring Prometheus sees them at value 0 from startup. No logic changes to the inc_* functions.
bin/ethlambda/src/main.rs Adds a call to ethlambda_blockchain::metrics::init() before the metrics server is spawned, ensuring the three counters are registered at node startup well before any Prometheus scrape can occur.

Sequence Diagram

sequenceDiagram
    participant Main as main()
    participant Metrics as ethlambda_blockchain::metrics
    participant Prom as Prometheus Registry
    participant Server as Metrics HTTP Server

    Main->>Metrics: init()
    Metrics->>Prom: LazyLock::force(&LEAN_ATTESTATIONS_INVALID_TOTAL)<br/>→ register_int_counter! (value = 0)
    Metrics->>Prom: LazyLock::force(&LEAN_PQ_SIG_AGGREGATED_SIGNATURES_INVALID_TOTAL)<br/>→ register_int_counter! (value = 0)
    Metrics->>Prom: LazyLock::force(&LEAN_PQ_SIG_ATTESTATION_SIGNATURES_INVALID_TOTAL)<br/>→ register_int_counter! (value = 0)
    Main->>Metrics: set_node_info() / set_node_start_time() / set_attestation_committee_count()
    Main->>Server: tokio::spawn(start_metrics_server)
    Note over Server,Prom: All three counters already visible at /metrics with value 0
    Server->>Prom: GET /metrics scrape
    Prom-->>Server: lean_attestations_invalid_total 0<br/>lean_pq_sig_aggregated_signatures_invalid_total 0<br/>lean_pq_sig_attestation_signatures_invalid_total 0
Loading

Last reviewed commit: "Eagerly register inv..."

…nit()

  All LazyLock statics are now at module scope instead of function scope,
  and init() forces every metric to register with Prometheus at startup.
  This ensures all metrics appear in /metrics output from the first scrape,
  even counters that may never be incremented (e.g. invalid attestation counters).
  Addresses review feedback to initialize all metrics, not just the invalid ones.
@pablodeymo pablodeymo merged commit dbfee2d into main Mar 20, 2026
2 checks passed
@pablodeymo pablodeymo deleted the eagerly-register-invalid-counters branch March 20, 2026 21:27
pablodeymo added a commit that referenced this pull request Mar 20, 2026
  Resolve conflict in metrics.rs: keep module-scoped LEAN_TABLE_BYTES
  from #245 (eagerly register metrics), remove redundant function-scoped static.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants