[fix] Fix memory leak in TopicName cache and reduce heap memory usage through deduplication#24457
[fix] Fix memory leak in TopicName cache and reduce heap memory usage through deduplication#24457lhotari wants to merge 41 commits intoapache:masterfrom
Conversation
There was a problem hiding this comment.
Pull Request Overview
This PR fixes a memory leak in the Pulsar Proxy and client by reverting the previous caching strategy and introducing a new NameCache implementation that uses soft references and string interning to reduce heap memory usage.
- Introduces NameCache, TopicNameCache, and NamespaceNameCache classes to provide efficient, self-clearing caches.
- Updates TopicName and NamespaceName to use interning when parsing topic names.
- Removes obsolete configuration properties and scheduled tasks related to cache clearing.
Reviewed Changes
Copilot reviewed 15 out of 15 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| pulsar-common/src/main/java/org/apache/pulsar/common/naming/TopicNameCache.java | Added a new cache for TopicName using the NameCache abstraction. |
| pulsar-common/src/main/java/org/apache/pulsar/common/naming/TopicName.java | Updated to use TopicNameCache and string interning. |
| pulsar-common/src/main/java/org/apache/pulsar/common/naming/NamespaceNameCache.java | Added a new cache for NamespaceName using the NameCache abstraction. |
| pulsar-common/src/main/java/org/apache/pulsar/common/naming/NamespaceName.java | Updated to use NamespaceNameCache and string interning. |
| pulsar-common/src/main/java/org/apache/pulsar/common/naming/NameCache.java | Introduced the new NameCache base class with soft references and periodic maintenance. |
| pulsar-broker/src/test/resources/configurations/pulsar_broker_test_standalone.conf | Removed obsolete cache configuration properties. |
| pulsar-broker/src/test/resources/configurations/pulsar_broker_test.conf | Removed obsolete cache configuration properties. |
| pulsar-broker/src/test/java/org/apache/pulsar/common/naming/ServiceConfigurationTest.java | Removed tests for now-obsolete cache configuration. |
| pulsar-broker/src/test/java/org/apache/pulsar/broker/service/StandaloneTest.java | Removed assertions related to the obsolete cache configuration. |
| pulsar-broker/src/test/java/org/apache/pulsar/broker/PulsarServiceTest.java | Removed cache configuration tests and related lookup tests. |
| pulsar-broker/src/main/java/org/apache/pulsar/broker/service/BrokerService.java | Removed the scheduled cache clearance task. |
| pulsar-broker-common/src/main/java/org/apache/pulsar/broker/ServiceConfiguration.java | Removed obsolete configuration fields concerning the cache. |
| microbench/src/main/java/org/apache/pulsar/common/naming/package-info.java | New file addition for package-level documentation. |
| microbench/src/main/java/org/apache/pulsar/common/naming/TopicNameBenchmark.java | Added a benchmark to validate the performance of TopicName lookup. |
| conf/broker.conf | Removed obsolete cache configuration properties. |
pulsar-common/src/main/java/org/apache/pulsar/common/naming/NameCache.java
Outdated
Show resolved
Hide resolved
|
You can check benchmark.diff.txt for how I set up the tests. BTW, @codelipenghui mentioned "over engineering" here. However, IMO, the cache itself is "over engineering" to me. I checked many places that Both the existing solution and this solution have issues when a lot of different topic names are used. The existing solution relies on manual call of Actually, I think we should remove the cache, which brings many issues that should never happen. From my perspective, introducing the cache is just to get around the poor implementation of |
@BewareMyPower There was no review and there's rationale that this PR makes sense and improves Pulsar without causing performance regressions. I agree that in this case, it might take longer to reach consensus. |
@BewareMyPower How did you setup JMH to run your benchmarks that you shared? JMH results could be skewed for multiple reasons in this case as explained in my previous comment, #24457 (comment) |
@BewareMyPower This is an invalid JMH test case. In JMH, you need to use the "black hole" to consume computed values. @Warmup(iterations = 10, time = 500, timeUnit = TimeUnit.MILLISECONDS)
@Measurement(iterations = 5, time = 500, timeUnit = TimeUnit.MILLISECONDS)
@Benchmark
public void testNoCache(Blackhole blackhole) {
for (int i = 0; i < 1000000; i++) {
blackhole.consume(new TopicName("test-" + i));
}
} |
Another detail is the short benchmark duration. A better choice could be to run a single longer iteration: |
…rences being collected
|
I improved the JMH benchmark and fixed some flaws in it. MacOS results (will soon run on Linux): |
|
Linux results |
|
Well, based on my improvement in #24463, I migrated the cache implementation of this PR. The benchmark ran on a ubuntu 22.04 runner in GitHub Actions, here is the result from See https://github.com/BewareMyPower/pulsar/actions/runs/15872123416/job/44750991782?pr=45: The result is similar to my benchmark on macOS: Both shows getting the topic name from cache is not significantly better than my latest optimization. Please leave comments directly in BewareMyPower#45 if you think there is something wrong with my benchmark code since you said
|
Great work @BewareMyPower . The main aspect why I wouldn't remove the cache is due to deduplication functionality of the cache. If TopicName instances are strongly referenced, the more there are, the worse the problem is.
The benchmark code looks fine now. On Mac OS, the results just could be different than on Linux x86_64 in many cases. Especially if there's any code that accesses |
…rentTimeMillis instead of nanoTime - postpone shrinking the cache until the maintenance task runs
|
@poorbarcode we can proceed with your PR and come back to the usefulness of a soft reference cache and instance deduplication in optimizing memory usage of topic pattern listing / watching use cases. |
|
A few observations if helpful,
|
|
Hi @ben-manes Thanks for your help. |
|
Closing since this is covered in other PRs. |

Fixes #24445
Motivation
There is currently a memory leak in Pulsar Proxy and Pulsar client caused by the TopicName cache. The TopicName cache grows without bounds and eventually consumes all available heap memory when there is high cardinality of topic names resolved during the runtime of a single JVM.
Before PR #23052, Guava's LoadingCache was used to limit the cache size to 100,000 entries. That was replaced with ConcurrentHashMap due to CPU overhead concerns with Guava compared to plain ConcurrentHashMap.
However, #23052 only addressed cache expiration for the broker component. Since the TopicName cache is included in the Pulsar client, it should be self-contained without requiring additional maintenance tasks to keep the cache size bounded.
Modifications
NameCacheclass that uses ConcurrentHashMap with SoftReference values to allow memory pressure-based garbage collection of unreferenced cache entriesKey Benefits
Performance Characteristics
Documentation
docdoc-requireddoc-not-neededdoc-complete