
Conversation

@hansent (Collaborator) commented Jan 20, 2026

Summary

Adds a structured logging system for GCP serverless deployments that captures model lifecycle events for observability and analysis. This enables understanding cache hit rates, model thrashing patterns, and memory usage in production.

Key features:

  • JSON structured logs compatible with GCP Cloud Logging
  • 7 event types covering the full model lifecycle
  • Tiered sampling to control log volume/costs (high-volume events sampled, lifecycle events always logged; see the sketch after this list)
  • Tiered memory measurement (basic vs detailed mode)
  • Request context propagation (correlates events across workflow steps)
  • Support for both legacy model loading and inference-models paths
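
A minimal sketch of the tiered sampling decision (the constant and function names are illustrative, not the PR's actual API):

```python
import random

# Lifecycle events bypass the sampler entirely; high-volume events are
# dropped probabilistically. Names here are illustrative only.
LIFECYCLE_EVENTS = {"model_loaded_to_disk", "model_loaded_to_memory", "model_evicted"}

def should_log(event_type: str, sample_rate: float) -> bool:
    if event_type in LIFECYCLE_EVENTS:
        return True  # always log lifecycle events
    return random.random() < sample_rate  # sample high-volume events
```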

Event types (an example log line follows the list):

  • request_received - Direct inference request (sampled)
  • workflow_request_received - Workflow execution started (sampled)
  • model_cache_status - Cache hit/miss per request (sampled)
  • model_loaded_to_disk - Model artifacts downloaded (always logged)
  • model_loaded_to_memory - Model loaded to GPU/CPU (always logged)
  • model_evicted - Model removed from cache (always logged)
  • inference_completed - Inference finished with timing (sampled)
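
For illustration, a model_cache_status entry might be emitted like this; the field names are assumptions, and GCP Cloud Logging parses each JSON line on stdout into a structured jsonPayload:

```python
import json
import sys
import time

# Illustrative only: one structured log line per event. The "severity" key
# maps onto the Cloud Logging entry's severity.
event = {
    "severity": "INFO",
    "event_type": "model_cache_status",
    "request_id": "a1b2c3",      # hypothetical correlation id
    "model_id": "yolov8n-640",   # hypothetical model id
    "cache_hit": True,
    "timestamp": time.time(),
}
print(json.dumps(event), file=sys.stdout, flush=True)
```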

Configuration (env vars; a parsing sketch follows the list):

  • GCP_LOGGING_ENABLED - Master switch (default: True when GCP_SERVERLESS)
  • GCP_LOGGING_SAMPLE_RATE - Sample rate for high-volume events (default: 1.0)
  • GCP_LOGGING_DETAILED_MEMORY - Enable full memory snapshots (default: False)
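
A sketch of how env.py might parse these flags, assuming a simple str2bool helper (the defaults follow the list above):

```python
import os

def str2bool(value: str) -> bool:
    # Assumed helper: treat common truthy strings as True.
    return value.strip().lower() in {"1", "true", "yes"}

GCP_SERVERLESS = str2bool(os.getenv("GCP_SERVERLESS", "False"))
# Enabled by default only when running in GCP serverless mode.
GCP_LOGGING_ENABLED = str2bool(
    os.getenv("GCP_LOGGING_ENABLED", "True" if GCP_SERVERLESS else "False")
)
GCP_LOGGING_SAMPLE_RATE = float(os.getenv("GCP_LOGGING_SAMPLE_RATE", "1.0"))
GCP_LOGGING_DETAILED_MEMORY = str2bool(os.getenv("GCP_LOGGING_DETAILED_MEMORY", "False"))
```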

Test plan

  • Run server locally with GCP_SERVERLESS=true GCP_LOGGING_ENABLED=true and verify JSON logs appear in stdout (a log-counting snippet follows this list)
  • Make direct inference request → verify request_received, model_cache_status, model_loaded_to_memory, inference_completed events
  • Make second request to same model → verify cache_hit: true in events
  • Execute workflow → verify workflow_request_received and events include workflow_instance_id, step_name
  • Test with USE_INFERENCE_EXP_MODELS=true → verify model_loaded_to_disk event logged via callback
  • Test with GCP_LOGGING_DETAILED_MEMORY=true → verify memory snapshot fields populated
  • Test with GCP_LOGGING_SAMPLE_RATE=0.1 → verify ~10% of high-volume events logged
  • Verify logging disabled when GCP_SERVERLESS=false
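
A hypothetical helper for spot-checking the plan above: capture the server's stdout to a file, then count event types to confirm that lifecycle events are always present and high-volume events appear at roughly the configured rate:

```python
import json
from collections import Counter

counts = Counter()
with open("server_stdout.log") as f:   # assumed capture of server stdout
    for line in f:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip non-JSON log lines
        counts[entry.get("event_type", "unknown")] += 1
print(counts.most_common())
```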

Note

Adds a self-contained GCP structured logging system and wires it through key inference paths for observability in serverless deployments.

  • New inference/core/gcp_logging/ module: JSON logger, event dataclasses, request context, and memory utilities; configurable via GCP_LOGGING_ENABLED, GCP_LOGGING_SAMPLE_RATE, GCP_LOGGING_DETAILED_MEMORY
  • env.py: adds GCP logging env flags (enabled by default when GCP_SERVERLESS)
  • HTTP: GCPServerlessMiddleware sets/clears per-request GCPRequestContext; logs request_received and workflow_request_received (sampled); see the context sketch after this list
  • Model manager: logs model_cache_status (sampled), model_loaded_to_memory (always), and inference_completed (sampled); measures GPU/system memory when detailed mode enabled
  • Fixed-size cache: logs model_evicted with reason, lifetime, inference count; tracks model load times
  • Model downloads: logs model_loaded_to_disk from both legacy roboflow.py path and inference_models via GCPLoggingModelAccessManager
  • Workflows: updates context per step (step_name, workflow_instance_id) for correlated events
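
A minimal sketch of the per-request context described above, assuming a ContextVar-backed GCPRequestContext (field names taken from the PR description; the set/get helpers are illustrative):

```python
import uuid
from contextvars import ContextVar, Token
from dataclasses import dataclass, field
from typing import Optional

# The middleware sets the context when a request arrives and resets it
# afterwards, so every event logged during the request can attach the same
# correlation fields.
@dataclass
class GCPRequestContext:
    request_id: str = field(default_factory=lambda: uuid.uuid4().hex[:12])
    workflow_instance_id: Optional[str] = None
    step_name: Optional[str] = None

_current_context: ContextVar[Optional[GCPRequestContext]] = ContextVar(
    "gcp_request_context", default=None
)

def set_context(ctx: GCPRequestContext) -> Token:
    return _current_context.set(ctx)  # middleware keeps the Token for reset

def get_context() -> Optional[GCPRequestContext]:
    return _current_context.get()
```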

Written by Cursor Bugbot for commit 974b343.

hansent and others added 3 commits January 20, 2026 13:39
Introduces a separate structured logging system for GCP serverless deployments
that captures model lifecycle events to understand cache hit rates, model
thrashing, and memory usage patterns.

New module: inference/core/gcp_logging/
- events.py: 7 event dataclasses (request_received, workflow_request_received,
  model_cache_status, model_loaded_to_disk, model_loaded_to_memory,
  model_evicted, inference_completed)
- context.py: Request context tracking via ContextVar
- memory.py: Tiered memory measurement (basic vs detailed mode; sketch after this list)
- logger.py: GCPServerlessLogger singleton with sampling support
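
A sketch of the basic vs detailed split, assuming psutil for process memory and optional torch for GPU stats; detailed mode synchronizes CUDA first, which is why it is off by default:

```python
import psutil  # assumed dependency for process memory

def measure_memory(detailed: bool = False) -> dict:
    # Basic mode: one cheap process-level counter (resident set size).
    snapshot = {"rss_bytes": psutil.Process().memory_info().rss}
    if detailed:
        try:
            import torch
            if torch.cuda.is_available():
                torch.cuda.synchronize()  # flush pending kernels before reading stats
                snapshot["gpu_allocated_bytes"] = torch.cuda.memory_allocated()
                snapshot["gpu_reserved_bytes"] = torch.cuda.memory_reserved()
        except ImportError:
            pass  # no torch available -> no GPU metrics
    return snapshot
```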

Environment variables:
- GCP_LOGGING_ENABLED: Master switch (default: True when GCP_SERVERLESS)
- GCP_LOGGING_SAMPLE_RATE: Sample rate for high-volume events (default: 1.0)
- GCP_LOGGING_DETAILED_MEMORY: Enable detailed memory introspection (default: False)

Instrumentation points:
- HTTP middleware: Sets up request context with correlation ID
- Workflow executor: Adds step context for workflow invocations
- ModelManager: Logs cache status, model loads, and inference completion
- Fixed size cache: Logs eviction events with lifetime and inference count (sketch after this list)
- Roboflow model: Logs disk download events
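
An illustrative shape for the eviction logging (class layout, attribute names, and the print stand-in for the logger call are all assumptions):

```python
import time

class FixedSizeCacheSketch:
    def __init__(self) -> None:
        self._model_load_times = {}   # queue_id -> load timestamp
        self._inference_counts = {}   # queue_id -> completed inferences

    def evict(self, queue_id: str, reason: str) -> None:
        loaded_at = self._model_load_times.pop(queue_id, None)
        lifetime = time.time() - loaded_at if loaded_at is not None else None
        print({  # stand-in for the real model_evicted logger call
            "event_type": "model_evicted",
            "model_id": queue_id,
            "reason": reason,                # e.g. "cache_full"
            "lifetime_seconds": lifetime,
            "inference_count": self._inference_counts.pop(queue_id, 0),
        })
```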

Features:
- Tiered sampling: High-volume events sampled, lifecycle events always logged
- Tiered memory: Basic mode (cheap) vs detailed mode (GPU sync + syscalls)
- Context propagation: request_id, workflow_instance_id, step_name flow through
- Cache hit tracking: inference_completed includes whether model was pre-loaded

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
When USE_INFERENCE_EXP_MODELS is enabled, the model is loaded through
the InferenceExpObjectDetectionModelAdapter which wraps the inference_models
AutoModel. This change detects that adapter pattern and:

- Sets backend to "inference-models" or "inference-models/{backend}"
- Extracts device info from the underlying exp_model

This ensures ModelLoadedToMemoryEvent correctly identifies models loaded
through the inference-models path vs the legacy ONNX path.
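
A sketch of that detection, with exp_model, backend, and device as assumed attribute names:

```python
from typing import Optional, Tuple

# When the model wraps an inference-models AutoModel, report that backend
# (and its device) instead of the legacy ONNX defaults.
def resolve_backend(model: object) -> Tuple[str, Optional[str]]:
    exp_model = getattr(model, "exp_model", None)  # assumed adapter attribute
    if exp_model is None:
        return "onnx", None  # legacy model loading path
    backend = getattr(exp_model, "backend", None)
    label = f"inference-models/{backend}" if backend else "inference-models"
    device = getattr(exp_model, "device", None)
    return label, str(device) if device is not None else None
```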

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
When USE_INFERENCE_EXP_MODELS is enabled, models are loaded via the
inference-models library's AutoModel.from_pretrained(). This path
bypasses the legacy download code, so ModelLoadedToDiskEvent wasn't
being logged.

This adds GCPLoggingModelAccessManager - a custom ModelAccessManager
that hooks into inference-models download callbacks:
- on_model_package_access_granted: Start tracking download
- on_file_created: Accumulate file sizes and count
- on_model_loaded: Log ModelLoadedToDiskEvent with totals

The adapter only creates the access manager when GCP logging is enabled.
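
A sketch of the manager with the three hooks named above; the real class would subclass the inference-models ModelAccessManager, which is omitted here along with the actual logger call:

```python
import os
import time
from typing import Optional

class GCPLoggingModelAccessManagerSketch:
    def __init__(self) -> None:
        self._started_at: Optional[float] = None
        self._total_bytes = 0
        self._file_count = 0

    def on_model_package_access_granted(self, model_id: str) -> None:
        self._started_at = time.time()  # start tracking the download

    def on_file_created(self, path: str) -> None:
        self._total_bytes += os.path.getsize(path)  # accumulate artifact sizes
        self._file_count += 1

    def on_model_loaded(self, model_id: str) -> None:
        started = self._started_at or time.time()
        print({  # stand-in for the real model_loaded_to_disk logger call
            "event_type": "model_loaded_to_disk",
            "model_id": model_id,
            "total_bytes": self._total_bytes,
            "file_count": self._file_count,
            "duration_seconds": time.time() - started,
        })
```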

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
"""
if not api_key:
return None
return hashlib.sha256(api_key.encode()).hexdigest()[:16]

Check failure

Code scanning / CodeQL

Use of a broken or weak cryptographic hashing algorithm on sensitive data (High)

Sensitive data (password) is used in a hashing algorithm (SHA256) that is insecure for password hashing, since it is not a computationally expensive hash function.

Copilot Autofix: could not generate an autofix suggestion for this alert.

@grzegorz-roboflow (Collaborator)

bugbot review

@cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

```python
logger.debug(f"Marking new model {queue_id} as most recently used.")
self._key_queue.append(queue_id)
# Track load time for the new model
self._model_load_times[queue_id] = time.time()
```

Memory leak in model load time tracking when logging disabled

Medium Severity

The _model_load_times dictionary is populated unconditionally at line 173 (self._model_load_times[queue_id] = time.time()) for every new model added, but the cleanup logic at line 163 (self._model_load_times.pop(to_remove_model_id, None)) only executes when gcp_logger.enabled is True. When GCP logging is disabled (the default when GCP_SERVERLESS=False), entries are added but never removed during eviction, causing the dictionary to grow unboundedly over the lifetime of the process.

Additional Locations (1)
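
One possible fix, sketched below: pop the load-time entry unconditionally during eviction so the dictionary shrinks with the cache even when logging is disabled, keeping only the event emission behind the enabled check:

```python
import time

def on_evict(model_load_times: dict, model_id: str, logging_enabled: bool) -> None:
    loaded_at = model_load_times.pop(model_id, None)  # unconditional cleanup
    if logging_enabled and loaded_at is not None:
        print({  # stand-in for the real model_evicted logger call
            "event_type": "model_evicted",
            "model_id": model_id,
            "lifetime_seconds": time.time() - loaded_at,
        })
```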

Fix in Cursor Fix in Web

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants