[Bug] Bookie (v4.1.2) Restarting Suddenly: ZKRegistrationClient Invalidate Cache / NetworkTopology Node Removal #25433

@amjadali-klarity

Description

Search before reporting

  • I searched in the issues and found nothing similar.

Read release policy

  • I understand that unsupported versions don't get bug fixes. I will attempt to reproduce the issue on a supported version of Pulsar client and Pulsar broker.

User environment

Pulsar Version: 4.1.2 (Image: docker.io/apachepulsar/pulsar-all:4.1.2)
Platform: Kubernetes (Azure AKS, inferred from log headers)
Component: BookKeeper / ZooKeeper

Issue Description

Our pulsar-bookie pods are restarting suddenly. The logs show ZKRegistrationClient invalidating its cache for the affected bookie's address, followed by NetworkTopologyImpl removing that node from /default-rack.

Error messages

INFO  org.apache.bookkeeper.discover.ZKRegistrationClient - Invalidate cache for pulsar-bookie-1.pulsar-bookie.pulsar.svc.cluster.local:3181
INFO  org.apache.bookkeeper.net.NetworkTopologyImpl - Removing a node: /default-rack/pulsar-bookie-1.pulsar-bookie.pulsar.svc.cluster.local:3181
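The two messages above arrive as a pair for the same bookie address. To line such pairs up with pod restart times in a long log, a small filter could help. This is a minimal Python sketch: the regular expressions assume the message text exactly as quoted in this issue and ignore any timestamp or thread prefix before it, and bookie_drop_events is a hypothetical helper, not part of Pulsar or BookKeeper.

```python
import re
from collections import defaultdict

# Hypothetical helper for this issue's log lines. Only the message text is
# matched; any timestamp/thread prefix before it is ignored.
INVALIDATE_RE = re.compile(
    r"org\.apache\.bookkeeper\.discover\.ZKRegistrationClient"
    r" - Invalidate cache for (\S+)"
)
# Greedy (\S*) captures the full rack path up to the last '/',
# e.g. '/default-rack', leaving 'host:port' in the second group.
REMOVE_RE = re.compile(
    r"org\.apache\.bookkeeper\.net\.NetworkTopologyImpl"
    r" - Removing a node: (\S*)/([^/\s]+)"
)


def bookie_drop_events(lines):
    """Group cache-invalidation and topology-removal events by bookie address.

    Returns {address: [(event, rack_or_None), ...]} in input order.
    """
    events = defaultdict(list)
    for line in lines:
        m = INVALIDATE_RE.search(line)
        if m:
            events[m.group(1)].append(("invalidate", None))
            continue
        m = REMOVE_RE.search(line)
        if m:
            rack, addr = m.group(1), m.group(2)
            events[addr].append(("remove", rack))
    return dict(events)
```

For example, the lines of kubectl logs output for the pod that emitted these messages could be passed in via splitlines(); an address with an "invalidate" event followed by a "remove" event corresponds to the pattern reported here.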

Reproducing the issue

ZooKeeper logs show only routine ruok health-check commands and no explicit session expiration immediately before the bookie drops out.
The bookie appears to be under normal load (compaction usage buckets are mostly at 100%).
Config: diskUsageWarnThreshold = 0.9, isForceGCAllowWhenNoSpace = true.
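Two of the observations above can be checked directly from inside the cluster. The following is a minimal Python sketch; zk_ruok, disk_usage_fraction, and over_warn_threshold are hypothetical helpers, not part of Pulsar or BookKeeper. ruok is a real ZooKeeper four-letter command, but an "imok" reply only means the server process is running without error. It says nothing about individual client sessions, so routine ruok traffic does not by itself rule out a bookie's session being lost. The disk check is only a rough analogue of diskUsageWarnThreshold = 0.9; BookKeeper's own disk checker may compute usage differently.

```python
import shutil
import socket


def zk_ruok(host: str, port: int = 2181, timeout: float = 3.0) -> bool:
    """Send ZooKeeper's 'ruok' four-letter command; True if the reply is 'imok'.

    'imok' only means the ZooKeeper server is running without error; it does
    not reflect whether any particular client session (e.g. a bookie's) is
    alive. Raises OSError on connection failure or timeout.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b"ruok")
        reply = b""
        # The server writes its short reply and then closes the connection.
        while True:
            data = sock.recv(64)
            if not data:
                break
            reply += data
    return reply == b"imok"


def disk_usage_fraction(path: str) -> float:
    """Fraction (0.0-1.0) of the filesystem holding `path` that is unavailable.

    Computed as (total - free) / total, where `free` is the space available
    to unprivileged users, so reserved blocks count as used.
    """
    usage = shutil.disk_usage(path)
    return (usage.total - usage.free) / usage.total


def over_warn_threshold(path: str, warn_threshold: float = 0.9) -> bool:
    """True if usage of `path` exceeds `warn_threshold` (0.9 as in this issue's config)."""
    return disk_usage_fraction(path) > warn_threshold
```

For example, zk_ruok("zookeeper", 2181) and over_warn_threshold("/pulsar/data/bookkeeper/ledgers") could be run from a bookie pod; both the ZooKeeper host name and the ledger directory path are assumptions and should be replaced with the values from the actual deployment.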

Additional information

No response

Are you willing to submit a PR?

  • I'm willing to submit a PR!

Metadata

Assignees

No one assigned

Labels

type/bug: The PR fixed a bug or issue reported a bug

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
