Split run identity from configuration by frazane · Pull Request #122 · MeteoSwiss/evalml

frazane · 2026-03-26T09:20:53Z

Summary

Separates environment identity from run configuration to allow inference environments to be reused across configuration changes, eliminating unnecessary rebuilds of venv and squashfs images. Closes #111

Changes

Add ENV_FIELDS and HASH_EXCLUDE ClassVars to RunConfig documenting the identity contract
Split hashing logic: env_entry_hash() for environment-level changes, run_specific_hash() for configuration changes
Refactor register_run() to compute both env_id and run_id with nested directory structure: data/runs/{env_id}/{config_hash}/
Update inference rules to use {env_id} wildcard for environment artifacts (in data/runs/{env_id}/) and {run_id} for run outputs
Add ENV_CONFIGS global dict and collect_all_envs() function
Add comprehensive unit tests for identity separation
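
The split described in the list above can be sketched as follows. This is a minimal illustration, not the code in this PR: the dataclass is a simplified stand-in for RunConfig, the `config` and `steps` field names and their values are assumptions, and the `_short_hash` helper (truncated SHA-256 over sorted JSON) is hypothetical. Only the ENV_FIELDS and HASH_EXCLUDE members and the two function names come from the PR.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field
from typing import Any, ClassVar, Optional


@dataclass
class RunConfig:
    """Simplified stand-in for the real RunConfig (shape is assumed)."""

    checkpoint: str
    config: str = "default.yaml"  # assumed field: inference config YAML
    steps: str = "0/120/6"  # assumed field and format
    extra_requirements: list[str] = field(default_factory=list)
    disable_local_eccodes_definitions: bool = False
    label: Optional[str] = None
    inference_resources: Optional[dict[str, Any]] = None

    # Fields that determine the inference environment (venv, squashfs image).
    ENV_FIELDS: ClassVar[frozenset[str]] = frozenset(
        {"checkpoint", "extra_requirements", "disable_local_eccodes_definitions"}
    )
    # Fields that never contribute to any hash.
    HASH_EXCLUDE: ClassVar[frozenset[str]] = frozenset(
        {"label", "inference_resources"}
    )


def _short_hash(payload: dict[str, Any]) -> str:
    """Hypothetical helper: stable 8-character digest of a dict."""
    blob = json.dumps(payload, sort_keys=True, default=str)
    return hashlib.sha256(blob.encode()).hexdigest()[:8]


def env_entry_hash(cfg: RunConfig) -> str:
    """Hash only the environment-determining fields."""
    data = asdict(cfg)
    return _short_hash({k: data[k] for k in sorted(RunConfig.ENV_FIELDS)})


def run_specific_hash(cfg: RunConfig) -> str:
    """Hash everything except environment fields and excluded fields."""
    data = asdict(cfg)
    skip = RunConfig.ENV_FIELDS | RunConfig.HASH_EXCLUDE
    return _short_hash({k: v for k, v in data.items() if k not in skip})
```

Under these assumptions, two runs that differ only in `steps` or `label` share an environment hash, while a change of `checkpoint` produces a new one.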

Benefits

Reuses environments across config changes (no squashfs rebuild)
Reduces disk I/O burden on shared filesystems
Clear separation of concerns: environment identity vs. run configuration
Nested directory structure aligns with the design proposed in the linked issue
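
To illustrate the nested layout, here is a small sketch of how the paths might be derived. The helper names (`env_dir`, `run_dir`, `make_run_id`) and the example hash values are hypothetical; only the `data/runs/{env_id}/{config_hash}/` pattern and the `run_id = {env_id}/{config_hash}` format are taken from this PR.

```python
from pathlib import PurePosixPath

RUNS_ROOT = PurePosixPath("data/runs")


def env_dir(env_id: str) -> PurePosixPath:
    """Directory holding environment artifacts shared by all runs of this env."""
    return RUNS_ROOT / env_id


def run_dir(env_id: str, config_hash: str) -> PurePosixPath:
    """Directory holding the outputs of one run configuration."""
    return env_dir(env_id) / config_hash


def make_run_id(env_id: str, config_hash: str) -> str:
    """Compose the run identifier in the {env_id}/{config_hash} format."""
    return f"{env_id}/{config_hash}"


# Two configurations built on the same environment (example hashes are made up).
first = run_dir("abc12345", "11110000")
second = run_dir("abc12345", "22220000")

# Both runs sit under the same environment directory, so its artifacts
# do not need to be rebuilt when only the run configuration changes.
shared = first.parent == second.parent == env_dir("abc12345")
```

Because the environment directory is the parent of every run directory, a rule keyed on `{env_id}` is satisfied once for all runs beneath it.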

Testing

All existing tests pass
5 new tests verify identity separation behavior
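
A test of this kind might look like the sketch below. It is not taken from the PR's test file: the `collect_all_envs` body is a hypothetical stand-in that keeps one entry per distinct `env_id`, and the dictionary shape and example identifiers are assumptions.

```python
from typing import Any


def collect_all_envs(
    run_configs: dict[str, dict[str, Any]],
) -> dict[str, dict[str, Any]]:
    """Hypothetical stand-in: keep one environment entry per distinct env_id."""
    envs: dict[str, dict[str, Any]] = {}
    for entry in run_configs.values():
        # The first run seen for an env_id defines the environment entry.
        envs.setdefault(entry["env_id"], {"checkpoint": entry["checkpoint"]})
    return envs


def test_runs_sharing_env_yield_single_environment() -> None:
    run_configs = {
        "abc12345/11110000": {"env_id": "abc12345", "checkpoint": "ckpt-a"},
        "abc12345/22220000": {"env_id": "abc12345", "checkpoint": "ckpt-a"},
        "def67890/11110000": {"env_id": "def67890", "checkpoint": "ckpt-b"},
    }
    envs = collect_all_envs(run_configs)
    # Three runs, but only two environments need to be built.
    assert set(envs) == {"abc12345", "def67890"}
    assert envs["abc12345"] == {"checkpoint": "ckpt-a"}
```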

Separates environment identity (env_id) from run configuration (run_id) to allow inference environments to be reused across configuration changes. This prevents unnecessary rebuilding of venv and squashfs images when only the inference config YAML or steps are modified.

Changes:

src/evalml/config.py:
- Add RunConfig.ENV_FIELDS ClassVar documenting fields that determine the inference environment (checkpoint, extra_requirements, disable_local_eccodes_definitions)
- Add RunConfig.HASH_EXCLUDE ClassVar for fields never included in hashing (label, inference_resources)
- Export module-level constants RUN_ENV_FIELDS and RUN_HASH_EXCLUDE

workflow/rules/common.smk:
- Add ENV_HASH_FIELDS and RUN_HASH_EXCLUDE constants
- Split hashing logic into two functions:
  - env_entry_hash(): hashes only environment-determining fields
  - run_specific_hash(): hashes run-specific fields (config YAML, steps)
- Refactor register_run() to compute and store both env_id and run_id in each run config entry. Format: run_id = {env_id}/{config_hash}
- Add collect_all_envs() function and ENV_CONFIGS global dict
- Update master_hash() to hash both env and run components separately

workflow/rules/inference.smk:
- Rules using {env_id} wildcard (outputs in data/envs/{env_id}/):
  - prepare_checkpoint
  - extract_checkpoint_requirements
  - create_inference_venv
  - make_squashfs_image
- Rules using {run_id} wildcard with nested config directories:
  - prepare_inference_forecaster
  - prepare_inference_interpolator
  - execute_inference (references env via lookup)
  - create_inference_sandbox

Directory structure change:
- Environment artifacts: data/envs/{env_id}/
- Run-specific outputs: data/runs/{env_id}/{config_hash}/{init_time}/

Benefits:
- Reuses environments across config changes (no squashfs rebuild)
- Reduces disk I/O on shared filesystems
- Documents identity contract via ClassVars
- Nested directory structure clearly separates concerns

Tests:
- Add test_run_identity.py with 5 tests validating identity separation
- All existing tests pass

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

frazane requested a review from dnerini March 26, 2026 09:20

frazane changed the title ~~Split run identity from configuration (issue #111)~~ Split run identity from configuration Mar 26, 2026

do not use separate env directory

c689c99

frazane wants to merge 2 commits into main from feat/split-run-identity-from-config
