fix(batcher): replace static global with dynamic allocation and refcounting#1556
Merged
Spawns 8 producer threads continuously logging while the main thread does 10 cycles of sentry_init/sentry_close. Reproduces a condvar corruption hang in the batcher under TSan. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…unting The static g_batcher persisted across init/close cycles, causing condvar corruption when sentry__batcher_startup re-initialized the condvar while old threads still used it. Dynamic allocation with refcounting solves both the condvar corruption and the lifetime problem. Introduces sentry_batcher_ref_t with acquire/release/swap API that encapsulates a spinlock to make concurrent access from producer threads and shutdown safe. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
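The acquire/release/swap scheme described in this commit can be sketched roughly as follows. This is a minimal illustration using C11 atomics, not the SDK's actual code: the types `batcher_t` and `batcher_ref_t`, the `id` field, and every function name here are hypothetical stand-ins for `sentry_batcher_t`, `sentry_batcher_ref_t`, and their API.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for sentry_batcher_t. */
typedef struct {
    atomic_long refcount;
    int id;
} batcher_t;

/* Hypothetical stand-in for sentry_batcher_ref_t: a pointer plus a spinlock. */
typedef struct {
    atomic_flag lock;
    batcher_t *ptr;
} batcher_ref_t;

static batcher_t *batcher_new(int id)
{
    batcher_t *b = malloc(sizeof *b);
    if (!b) {
        return NULL;
    }
    atomic_init(&b->refcount, 1); /* the creator owns the first reference */
    b->id = id;
    return b;
}

static void ref_lock(batcher_ref_t *r)
{
    while (atomic_flag_test_and_set_explicit(&r->lock, memory_order_acquire)) {
        /* spin until the flag is cleared */
    }
}

static void ref_unlock(batcher_ref_t *r)
{
    atomic_flag_clear_explicit(&r->lock, memory_order_release);
}

/* The lock keeps a concurrent swap from handing the pointer to its final
 * release between our load and our refcount increment. */
static batcher_t *batcher_acquire(batcher_ref_t *r)
{
    ref_lock(r);
    batcher_t *b = r->ptr;
    if (b) {
        atomic_fetch_add(&b->refcount, 1);
    }
    ref_unlock(r);
    return b;
}

/* Drops one reference; returns 1 if this call freed the batcher. */
static int batcher_release(batcher_t *b)
{
    if (b && atomic_fetch_sub(&b->refcount, 1) == 1) {
        free(b);
        return 1;
    }
    return 0;
}

/* Installs `next` and transfers the slot's reference on the old value
 * to the caller, who must release it. */
static batcher_t *batcher_swap(batcher_ref_t *r, batcher_t *next)
{
    ref_lock(r);
    batcher_t *old = r->ptr;
    r->ptr = next;
    ref_unlock(r);
    return old;
}

/* A producer holds a reference across a re-init: the old batcher must
 * stay alive until the producer releases it. */
static int demo_refcount(void)
{
    batcher_ref_t g = { ATOMIC_FLAG_INIT, NULL };
    batcher_release(batcher_swap(&g, batcher_new(1)));
    batcher_t *held = batcher_acquire(&g);
    int freed_early = batcher_release(batcher_swap(&g, batcher_new(2)));
    int id = held->id; /* still valid: the producer's reference pins it */
    int freed_late = batcher_release(held);
    batcher_release(batcher_swap(&g, NULL));
    return freed_early == 0 && freed_late == 1 && id == 1;
}
```

In this sketch `demo_refcount()` returns 1: the swap during the simulated re-init does not free the first batcher, and only the producer's final release does.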
When the previous startup failed to spawn its thread, shutdown_begin returns false and shutdown_wait is never called. The old batcher stays in g_batcher and leaks when startup swaps it out without releasing. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
A producer that acquired a ref before shutdown could enqueue items after the final flush. These items were leaked when the batcher was freed. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add sentry__cond_destroy for all platforms (pthread_cond_destroy on POSIX, CloseHandle on pre-Vista Windows, no-op on Vista+) and call it in sentry__batcher_release. Without this, each SDK re-initialization leaks kernel handles on pre-Vista Windows and violates POSIX, which requires pthread_cond_destroy before freeing memory containing a pthread_cond_t. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
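The POSIX half of this rule can be illustrated with a small sketch. It assumes a heap-allocated object that embeds a `pthread_cond_t`; the `worker_t` type and the `worker_new`, `worker_free`, and `demo_reinit_cycles` names are hypothetical and are not taken from the SDK.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical heap object that embeds its own synchronization primitives. */
typedef struct {
    pthread_mutex_t mutex;
    pthread_cond_t cond;
} worker_t;

static worker_t *worker_new(void)
{
    worker_t *w = malloc(sizeof *w);
    if (!w) {
        return NULL;
    }
    if (pthread_mutex_init(&w->mutex, NULL) != 0) {
        free(w);
        return NULL;
    }
    if (pthread_cond_init(&w->cond, NULL) != 0) {
        pthread_mutex_destroy(&w->mutex);
        free(w);
        return NULL;
    }
    return w;
}

/* Destroy the condvar and mutex before the memory that holds them is freed.
 * Returns 0 on success, or the first error code encountered. */
static int worker_free(worker_t *w)
{
    if (!w) {
        return 0;
    }
    int cond_rc = pthread_cond_destroy(&w->cond);
    int mutex_rc = pthread_mutex_destroy(&w->mutex);
    free(w);
    return cond_rc != 0 ? cond_rc : mutex_rc;
}

/* Simulates repeated init/close cycles; returns the number of failed cycles. */
static int demo_reinit_cycles(int cycles)
{
    int failures = 0;
    for (int i = 0; i < cycles; i++) {
        worker_t *w = worker_new();
        if (!w || worker_free(w) != 0) {
            failures++;
        }
    }
    return failures;
}
```

Each cycle gets a freshly initialized condvar and tears it down before freeing, so no thread from a previous cycle can be waiting on a primitive that is being re-initialized.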
This reverts commit 467bfb5ecb2c4528e1048dab1c467203fdc29cdf.
The two-phase shutdown (begin/wait) was designed to signal both the logs and metrics threads in parallel before joining either. With #1558 replacing the condvar with a level-triggered waitable flag, thread wake latency drops to sub-millisecond (futex/WaitOnAddress), making the parallel signaling unnecessary. Merging into a single function also fixes a race where a concurrent sentry_init() could replace g_batcher between the signal and join steps, causing the new batcher to be shut down instead of the old one. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Force-pushed from a3738d5 to ba0b566.
The crash-safe flush functions called sentry__batcher_acquire, which spins on a spinlock. If the crash occurs on the thread holding that lock, the signal handler deadlocks. Replace with a new lock-free sentry__batcher_peek that reads ref->ptr via atomic load without taking the spinlock or bumping the refcount (safe because the process is dying). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
On Windows x64, `long` is 32-bit but pointers are 64-bit. Use InterlockedCompareExchangePointer on Windows and __atomic_load elsewhere to avoid truncating the pointer. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
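A pointer-width atomic read of the kind these two commits describe might look like the sketch below. It assumes GCC or Clang on non-Windows targets and uses the `__atomic_load_n` builtin rather than the generic `__atomic_load` named in the commit; `batcher_t`, `ref_peek`, and `demo_peek` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#ifdef _WIN32
#    include <windows.h>
#endif

/* Hypothetical stand-in for sentry_batcher_t. */
typedef struct {
    int id;
} batcher_t;

/* Reads the slot without taking a lock or touching a refcount, so it cannot
 * block inside a signal handler. The caller gets no lifetime guarantee. */
static batcher_t *ref_peek(batcher_t **slot)
{
#ifdef _WIN32
    /* Compare-exchange of NULL against NULL never changes the stored value,
     * but returns the current value at full pointer width. */
    return (batcher_t *)InterlockedCompareExchangePointer(
        (PVOID volatile *)slot, NULL, NULL);
#else
    return __atomic_load_n(slot, __ATOMIC_SEQ_CST);
#endif
}

static int demo_peek(void)
{
    batcher_t b = { 7 };
    batcher_t *slot = NULL;
    if (ref_peek(&slot) != NULL) {
        return 0;
    }
    slot = &b;
    batcher_t *seen = ref_peek(&slot);
    return seen == &b && seen->id == 7;
}
```

Because the peeked pointer is not reference-counted, this pattern is only reasonable where the commit says it is used: on a crash path, when the process is about to terminate anyway.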
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
force_flush_begin and force_flush_wait each acquired the batcher independently. A concurrent sentry_init could swap the batcher between the two calls, causing wait to flush the new empty batcher while the original data is lost. Fix by returning an opaque token (the acquired batcher ref) from begin and passing it to wait, ensuring both operate on the same instance. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
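The token approach can be sketched in a simplified, single-threaded form. Locking and the real flush mechanics are deliberately omitted, and the `pending` counter, the plain `int` refcount, and all function bodies are assumptions made for illustration only.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified batcher: `pending` counts unflushed items. */
typedef struct {
    int refcount;
    int pending;
} batcher_t;

static batcher_t *g_batcher; /* stand-in for the global batcher slot */

static batcher_t *batcher_new(int pending)
{
    batcher_t *b = malloc(sizeof *b);
    if (b) {
        b->refcount = 1;
        b->pending = pending;
    }
    return b;
}

static void batcher_release(batcher_t *b)
{
    if (b && --b->refcount == 0) {
        free(b);
    }
}

/* begin: pin the current instance and hand it back as an opaque token. */
static batcher_t *force_flush_begin(void)
{
    batcher_t *b = g_batcher;
    if (b) {
        b->refcount++;
    }
    return b;
}

/* wait: works only on the token and never re-reads g_batcher.
 * Returns the number of items flushed. */
static int force_flush_wait(batcher_t *token)
{
    if (!token) {
        return 0;
    }
    int flushed = token->pending;
    token->pending = 0;
    batcher_release(token);
    return flushed;
}

static int demo_flush_token(void)
{
    g_batcher = batcher_new(3);
    batcher_t *token = force_flush_begin();

    /* Simulated concurrent re-init: a new, empty batcher replaces the old
     * one. The token's reference keeps the old instance alive. */
    batcher_t *old = g_batcher;
    g_batcher = batcher_new(0);
    batcher_release(old);

    int flushed = force_flush_wait(token);
    batcher_release(g_batcher);
    g_batcher = NULL;
    return flushed;
}
```

Here `demo_flush_token()` returns 3: the three items queued on the original batcher are flushed even though the global slot was replaced between the two calls. If `force_flush_wait` re-read the global instead, it would see the new batcher with nothing pending.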
mujacica approved these changes on Mar 7, 2026
supervacuus approved these changes on Mar 9, 2026
supervacuus (Collaborator) left a comment:
This looks very clean! Only minor comments.
Co-authored-by: Mischan Toosarani-Hausberger <mischan@abovevacant.com>
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Bugbot Autofix prepared a fix for the issue found in the latest run.
- ✅ Fixed: Metrics startup swaps NULL batcher on allocation failure
- Updated metrics startup to return early on batcher allocation failure and to shut down any swapped-out batcher before releasing it.
Preview (e977efb2bc)
diff --git a/src/sentry_metrics.c b/src/sentry_metrics.c
--- a/src/sentry_metrics.c
+++ b/src/sentry_metrics.c
@@ -133,10 +133,17 @@
 {
     sentry_batcher_t *batcher
         = sentry__batcher_new(sentry__envelope_add_metrics);
-    if (batcher) {
-        sentry__batcher_startup(batcher, options);
+    if (!batcher) {
+        SENTRY_WARN("failed to allocate metrics batcher");
+        return;
     }
-    sentry__batcher_release(sentry__batcher_swap(&g_batcher, batcher));
+
+    sentry__batcher_startup(batcher, options);
+    sentry_batcher_t *old = sentry__batcher_swap(&g_batcher, batcher);
+    if (old) {
+        sentry__batcher_shutdown(old, 0);
+    }
+    sentry__batcher_release(old);
 }

 void
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

This PR adds a new stress test that spawns multiple producer threads continuously logging while the main thread repeatedly re-inits the SDK, and fixes the resulting issue flagged by TSan: