
Add cache usage properties to ChatSampler and warning for cache overflow in prefill #578

Open
Ayushhgit wants to merge 3 commits into google-deepmind:main from Ayushhgit:feature/chat-cache-observability


@Ayushhgit

This PR improves cache observability and developer feedback in the sampling pipeline.

Changes:

  • Added cache_used property to ChatSampler to expose the current number of tokens stored in the KV cache.
  • Added cache_used_frac property (0.0–1.0) to indicate relative cache utilization.
  • Replaced a TODO in _prefill.py with a logging.warning that triggers when the prompt length exceeds the configured cache_length.
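
For readers without the diff at hand, the three changes above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in this PR: `ToySampler`, its `_tokens_in_cache` field, and the `warn_if_prompt_overflows` helper are hypothetical names, and clamping the fraction to 1.0 is an assumption made here to match the stated 0.0–1.0 range.

```python
import dataclasses
import logging


@dataclasses.dataclass
class ToySampler:
    """Hypothetical stand-in for ChatSampler; not the real gemma class."""

    cache_length: int  # Configured KV-cache capacity, in tokens.
    _tokens_in_cache: int = 0  # Assumed bookkeeping field (hypothetical).

    @property
    def cache_used(self) -> int:
        """Number of tokens currently stored in the KV cache."""
        return self._tokens_in_cache

    @property
    def cache_used_frac(self) -> float:
        """Relative cache utilization, kept within [0.0, 1.0]."""
        if self.cache_length <= 0:
            return 0.0
        # Clamping is an assumption of this sketch.
        return min(self._tokens_in_cache / self.cache_length, 1.0)


def warn_if_prompt_overflows(prompt_length: int, cache_length: int) -> bool:
    """Logs a warning when the prompt is longer than the cache (hypothetical helper)."""
    overflows = prompt_length > cache_length
    if overflows:
        logging.warning(
            'Prompt length (%d) exceeds cache_length (%d).',
            prompt_length,
            cache_length,
        )
    return overflows
```

A caller could poll `cache_used_frac` between turns and start a new conversation once it approaches 1.0, instead of waiting for generation to fail.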

Motivation:
These additions make it easier to monitor cache usage during long-running conversations and detect potential cache overflow before generation errors occur.

The core sampling and prefill logic is unchanged; the only new runtime behavior is the warning log.

