gcp serverless structured logging #1930
base: main
Conversation
Introduces a separate structured logging system for GCP serverless deployments that captures model lifecycle events to understand cache hit rates, model thrashing, and memory usage patterns.

New module: inference/core/gcp_logging/
- events.py: 7 event dataclasses (request_received, workflow_request_received, model_cache_status, model_loaded_to_disk, model_loaded_to_memory, model_evicted, inference_completed)
- context.py: Request context tracking via ContextVar
- memory.py: Tiered memory measurement (basic vs detailed mode)
- logger.py: GCPServerlessLogger singleton with sampling support

Environment variables:
- GCP_LOGGING_ENABLED: Master switch (default: True when GCP_SERVERLESS)
- GCP_LOGGING_SAMPLE_RATE: Sample rate for high-volume events (default: 1.0)
- GCP_LOGGING_DETAILED_MEMORY: Enable detailed memory introspection (default: False)

Instrumentation points:
- HTTP middleware: Sets up request context with correlation ID
- Workflow executor: Adds step context for workflow invocations
- ModelManager: Logs cache status, model loads, and inference completion
- Fixed size cache: Logs eviction events with lifetime and inference count
- Roboflow model: Logs disk download events

Features:
- Tiered sampling: High-volume events sampled, lifecycle events always logged
- Tiered memory: Basic mode (cheap) vs detailed mode (GPU sync + syscalls)
- Context propagation: request_id, workflow_instance_id, step_name flow through
- Cache hit tracking: inference_completed includes whether model was pre-loaded

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
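For orientation, here is a minimal sketch of the event/logger pattern the description outlines. The event field names beyond those listed above, and the exact env-var parsing, are assumptions for illustration, not the PR's actual code:

```python
# Hypothetical sketch of the event dataclass + sampled logger pattern.
import json
import os
import random
import sys
import time
from dataclasses import asdict, dataclass, field


@dataclass
class ModelCacheStatusEvent:
    """One of the seven lifecycle events; serialized as one JSON line."""
    model_id: str
    cache_hit: bool
    event: str = "model_cache_status"
    timestamp: float = field(default_factory=time.time)


class GCPServerlessLogger:
    """Lifecycle events always emit; high-volume events are sampled
    via GCP_LOGGING_SAMPLE_RATE."""

    def __init__(self) -> None:
        self.enabled = os.getenv("GCP_LOGGING_ENABLED", "True") == "True"
        self.sample_rate = float(os.getenv("GCP_LOGGING_SAMPLE_RATE", "1.0"))

    def log(self, event, sampled: bool = False) -> None:
        if not self.enabled:
            return
        if sampled and random.random() >= self.sample_rate:
            return
        # One JSON object per line so Cloud Logging ingests it as structured.
        sys.stdout.write(json.dumps(asdict(event)) + "\n")


gcp_logger = GCPServerlessLogger()
gcp_logger.log(ModelCacheStatusEvent(model_id="some-model/1", cache_hit=True), sampled=True)
```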
When USE_INFERENCE_EXP_MODELS is enabled, the model is loaded through
the InferenceExpObjectDetectionModelAdapter, which wraps the inference_models
AutoModel. This change detects that adapter pattern and:
- Sets backend to "inference-models" or "inference-models/{backend}"
- Extracts device info from the underlying exp_model
This ensures ModelLoadedToMemoryEvent correctly identifies models loaded
through the inference-models path vs the legacy ONNX path.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
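A hedged sketch of what the adapter detection described in this commit might look like; the attribute names exp_model, backend, and device are assumptions drawn from the commit message rather than confirmed API:

```python
# Sketch: classify the load path for ModelLoadedToMemoryEvent.
def resolve_backend_and_device(model):
    """Return (backend_label, device) for the loaded model."""
    if type(model).__name__ == "InferenceExpObjectDetectionModelAdapter":
        exp_model = getattr(model, "exp_model", None)  # assumed attribute
        backend = getattr(exp_model, "backend", None)
        device = getattr(exp_model, "device", None)
        label = f"inference-models/{backend}" if backend else "inference-models"
        return label, str(device) if device is not None else None
    # Legacy ONNX path
    return "onnx", getattr(model, "device", None)
```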
When USE_INFERENCE_EXP_MODELS is enabled, models are loaded via the inference-models library's AutoModel.from_pretrained(). This path bypasses the legacy download code, so ModelLoadedToDiskEvent wasn't being logged.

This adds GCPLoggingModelAccessManager - a custom ModelAccessManager that hooks into inference-models download callbacks:
- on_model_package_access_granted: Start tracking the download
- on_file_created: Accumulate file sizes and count
- on_model_loaded: Log ModelLoadedToDiskEvent with the totals

The adapter only creates the access manager when GCP logging is enabled.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
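The callback flow could look roughly like this; the callback signatures are assumptions modeled on the commit message, not the inference-models API:

```python
# Illustrative sketch of the access-manager hooks described above.
import time
from typing import Optional


class GCPLoggingModelAccessManager:
    """Accumulates download stats across callbacks, then logs one
    ModelLoadedToDiskEvent when the model finishes loading."""

    def __init__(self, model_id: str):
        self.model_id = model_id
        self.started_at: Optional[float] = None
        self.total_bytes = 0
        self.file_count = 0

    def on_model_package_access_granted(self, *args, **kwargs) -> None:
        self.started_at = time.time()  # start tracking the download

    def on_file_created(self, path: str, size_bytes: int, **kwargs) -> None:
        self.total_bytes += size_bytes
        self.file_count += 1

    def on_model_loaded(self, *args, **kwargs) -> None:
        duration = time.time() - (self.started_at or time.time())
        # In the real module this would build and emit ModelLoadedToDiskEvent.
        print({"event": "model_loaded_to_disk", "model_id": self.model_id,
               "files": self.file_count, "bytes": self.total_bytes,
               "duration_s": round(duration, 3)})
```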
| """ | ||
| if not api_key: | ||
| return None | ||
| return hashlib.sha256(api_key.encode()).hexdigest()[:16] |
Check failure
Code scanning / CodeQL
Use of a broken or weak cryptographic hashing algorithm on sensitive data (High)
Sensitive data (password), flagged at seven call sites.
Copilot Autofix could not generate a suggestion for this alert.
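If the alert is to be addressed rather than dismissed, one common mitigation is to key the hash with a server-side secret so a leaked digest cannot be brute-forced offline from low-entropy keys. A hedged sketch, where GCP_LOG_HASH_KEY is a hypothetical environment variable:

```python
# Alternative to the flagged snippet: HMAC-SHA256 keyed with a server-side
# secret, so the truncated digest is not recoverable by brute force.
import hashlib
import hmac
import os
from typing import Optional


def hash_api_key(api_key: Optional[str]) -> Optional[str]:
    if not api_key:
        return None
    secret = os.getenv("GCP_LOG_HASH_KEY", "").encode()  # hypothetical env var
    return hmac.new(secret, api_key.encode(), hashlib.sha256).hexdigest()[:16]
```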
bugbot review
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Bugbot Autofix is OFF. To automatically fix reported issues with Cloud Agents, enable Autofix in the Cursor dashboard.
```python
logger.debug(f"Marking new model {queue_id} as most recently used.")
self._key_queue.append(queue_id)
# Track load time for the new model
self._model_load_times[queue_id] = time.time()
```
Memory leak in model load time tracking when logging disabled
Medium Severity
The _model_load_times dictionary is populated unconditionally at line 173 (self._model_load_times[queue_id] = time.time()) for every new model added, but the cleanup logic at line 163 (self._model_load_times.pop(to_remove_model_id, None)) only executes when gcp_logger.enabled is True. When GCP logging is disabled (the default when GCP_SERVERLESS=False), entries are added but never removed during eviction, causing the dictionary to grow unboundedly over the lifetime of the process.
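A minimal sketch of one possible fix, moving the pop outside the logging guard so cleanup always runs; the surrounding method shape, gcp_logger.log, and ModelEvictedEvent are assumptions for illustration:

```python
def remove(self, to_remove_model_id: str) -> None:
    self._models.pop(to_remove_model_id, None)  # assumed cache structure
    # Always remove the timestamp, regardless of whether GCP logging is
    # enabled, so _model_load_times cannot grow without bound.
    loaded_at = self._model_load_times.pop(to_remove_model_id, None)
    if gcp_logger.enabled and loaded_at is not None:
        lifetime_s = time.time() - loaded_at
        gcp_logger.log(ModelEvictedEvent(to_remove_model_id, lifetime_s))
```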
Summary

Adds a structured logging system for GCP serverless deployments that captures model lifecycle events for observability and analysis. This enables understanding cache hit rates, model thrashing patterns, and memory usage in production.

Key features:
- Tiered sampling: high-volume events are sampled; lifecycle events are always logged
- Tiered memory measurement: cheap basic mode vs detailed mode (GPU sync + syscalls)
- Context propagation: request_id, workflow_instance_id, and step_name flow through
- Cache hit tracking: inference_completed records whether the model was pre-loaded

Event types:
request_received, workflow_request_received, model_cache_status, model_loaded_to_disk, model_loaded_to_memory, model_evicted, inference_completed

Configuration (env vars):
- GCP_LOGGING_ENABLED: master switch (default: True when GCP_SERVERLESS)
- GCP_LOGGING_SAMPLE_RATE: sample rate for high-volume events (default: 1.0)
- GCP_LOGGING_DETAILED_MEMORY: enable detailed memory introspection (default: False)

Test plan
Note

Adds a self-contained GCP structured logging system and wires it through key inference paths for observability in serverless deployments.

- inference/core/gcp_logging/ module: JSON logger, event dataclasses, request context, and memory utilities; configurable via GCP_LOGGING_ENABLED, GCP_LOGGING_SAMPLE_RATE, GCP_LOGGING_DETAILED_MEMORY
- env.py: adds GCP logging env flags (enabled by default when GCP_SERVERLESS)
- GCPServerlessMiddleware sets/clears per-request GCPRequestContext; logs request_received and workflow_request_received (sampled)
- ModelManager: logs model_cache_status (sampled), model_loaded_to_memory (always), and inference_completed (sampled); measures GPU/system memory when detailed mode is enabled
- Fixed-size cache: logs model_evicted with reason, lifetime, and inference count; tracks model load times
- model_loaded_to_disk is logged from both the legacy roboflow.py path and inference_models via GCPLoggingModelAccessManager
- Workflow executor: adds step context (step_name, workflow_instance_id) for correlated events; see the ContextVar sketch below

Written by Cursor Bugbot for commit 974b343. This will update automatically on new commits.
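As referenced in the context-propagation bullet above, here is a minimal sketch of the ContextVar mechanism, assuming a GCPRequestContext dataclass whose fields beyond request_id are guesses from the event descriptions:

```python
# Sketch of per-request context propagation via contextvars.
import uuid
from contextvars import ContextVar
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class GCPRequestContext:
    request_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    workflow_instance_id: Optional[str] = None
    step_name: Optional[str] = None


_request_context: ContextVar[Optional[GCPRequestContext]] = ContextVar(
    "gcp_request_context", default=None
)


def set_request_context() -> GCPRequestContext:
    """Called by the middleware at the start of each request."""
    ctx = GCPRequestContext()
    _request_context.set(ctx)
    return ctx


def current_context() -> Optional[GCPRequestContext]:
    # Each asyncio task / request sees its own value, so the correlation ID
    # flows through nested calls without explicit plumbing.
    return _request_context.get()
```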