Sync and validate Notebook 5 benchmarking updates #6
Conversation
Check out this pull request on ReviewNB to see visual diffs and provide feedback on Jupyter Notebooks. Powered by ReviewNB.
Pull request overview
Syncs and hardens the benchmarking notebook workflow by aligning Python/Julia notebook execution and updating the Julia Notebook 5 environment to match new benchmarking/reporting needs.
Changes:
- Update `verify_notebooks.py` to execute Python notebooks via a generated local kernelspec and make the execution timeout configurable via `QUIP_NOTEBOOK_TIMEOUT`.
- Update the Julia Notebook 5 environment to include additional direct dependencies and refresh the Manifest accordingly.
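The env-var override described above can be sketched roughly as follows. `QUIP_NOTEBOOK_TIMEOUT` is the variable named in the PR, but the helper name `notebook_timeout` and the 600-second fallback are illustrative assumptions, not the script's actual code.

```python
import os

# Assumed default when QUIP_NOTEBOOK_TIMEOUT is unset or invalid.
DEFAULT_TIMEOUT = 600

def notebook_timeout(env=None):
    """Return the notebook execution timeout in seconds (hypothetical helper)."""
    env = os.environ if env is None else env
    raw = env.get("QUIP_NOTEBOOK_TIMEOUT")
    try:
        return int(raw)
    except (TypeError, ValueError):
        # Unset (None) or non-integer values fall back to the default.
        return DEFAULT_TIMEOUT

print(notebook_timeout({"QUIP_NOTEBOOK_TIMEOUT": "3600"}))  # 3600
print(notebook_timeout({}))                                  # 600
```

The resulting integer would then be passed to nbconvert's execution timeout, so a long benchmarking run can be allowed with `QUIP_NOTEBOOK_TIMEOUT=3600` without changing the script.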
Reviewed changes
Copilot reviewed 3 out of 5 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| scripts/verify_notebooks.py | Adds local Python kernelspec creation and configurable nbconvert timeout for more reproducible notebook execution. |
| notebooks_jl/envs/5-Benchmarking/Project.toml | Adds new Julia deps needed by the benchmarking notebook updates. |
| notebooks_jl/envs/5-Benchmarking/Manifest.toml | Refreshes the manifest to reflect the updated Project dependencies/artifacts. |
bernalde left a comment:
I reviewed the PR locally and ran the relevant checks on this branch:
- `.venv/bin/python -m unittest discover -s tests`
- `~/.local/bin/uv run --group docs python -m unittest discover -s tests`
- `QUIP_NOTEBOOK_TIMEOUT=3600 .venv/bin/python ./scripts/verify_notebooks.py notebooks_py/5-Benchmarking_python.ipynb notebooks_jl/5-Benchmarking.ipynb`
- a direct Python-versus-Julia raw-cache comparison for sweeps 10, 85, 112, 500, and 1000 on instance 42
- `make verify-julia-colab-smokes`
The Python suites and both Notebook 5 executions passed, and the checked raw benchmark artifacts still matched exactly. I found two changes that should be made before merge and recorded them inline below.
One additional notebook-specific note that GitHub would not accept as an inline comment because the notebooks_jl/5-Benchmarking.ipynb diff is too large: the Julia notebook introduction still says the benchmarked simulated annealing path is DWave.jl, but the parity-sensitive raw sampling code now goes through direct dwave.samplers calls via DWave.Neal.PythonCall. Please update that notebook prose before merge so it matches the implementation.
`scripts/verify_julia_env_smokes.jl` (Outdated)

    end

    QuIPNotebookBootstrap.instantiate_scripts_project(precompile = false)
    import IJulia
This currently breaks `make verify-julia-colab-smokes` before any smoke test runs: Julia does not allow `import` inside a function body, so the file fails to parse with `syntax: "import" expression not at top level`. I reproduced that locally from this branch. If the goal here is to assert that IJulia is available after `instantiate_scripts_project(...)`, please move the `import` to a top-level location or load it with `Core.eval(Main, :(import IJulia))` after instantiation.
Fixed in bf4a31e. scripts/verify_julia_env_smokes.jl now loads IJulia with Core.eval(Main, :(import IJulia)) after instantiate_scripts_project(...), so make verify-julia-colab-smokes parses and runs again.
I also added regression coverage for this path in tests/test_verify_notebooks.py and in CI through .github/workflows/jupyter-book.yml.
    self.assertNotIn("Random.seed!", julia_source)
    self.assertIn("permutedims(reshape(example2_uniform_values", julia_source)

    def test_julia_profile_cache_path_uses_direct_python_sampler(self) -> None:
This only proves that the notebook source still contains the direct Python sampler hook. It does not exercise the cache-miss path at all. Because both notebooks default to overwrite_pickles = false and local notebooks_*/results directories are commonly present, the parity-sensitive sampling code can regress while local execution continues to pass on stale artifacts. Please add a small regression that forces a cache miss in a disposable results directory, or otherwise exercises the direct-sampler branch end to end.
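A minimal sketch of the kind of cache-miss regression suggested here, under stated assumptions: the `cached_sample` helper and the JSON key layout are hypothetical stand-ins for the notebook's real cache, not its actual API. The point is only that a disposable directory forces the sampling branch to run.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical cache wrapper: look for a JSON artifact first; on a miss,
# call the real sampler and persist the result for the next call.
def cached_sample(cache_dir, key, sampler):
    cache_file = Path(cache_dir) / f"{key}.json"
    if cache_file.exists():
        return json.loads(cache_file.read_text()), "hit"
    result = sampler()
    cache_file.write_text(json.dumps(result))
    return result, "miss"

# A fresh temporary directory guarantees the first call is a genuine cache
# miss, so the sampling path itself is exercised rather than a stale artifact.
with tempfile.TemporaryDirectory() as tmp:
    first, status1 = cached_sample(tmp, "inst42_sweeps85", lambda: {"energy": -1.0})
    second, status2 = cached_sample(tmp, "inst42_sweeps85", lambda: {"energy": -1.0})
    assert (status1, status2) == ("miss", "hit")
    assert first == second
```

Because the directory is destroyed afterwards, the test cannot be satisfied by pre-existing `notebooks_*/results` artifacts.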
Fixed in bf4a31e. I kept the source-level check in tests/test_notebook5_notebook_sources.py, and added an end-to-end Julia cache-smoke that forces a cache miss in a disposable directory, writes the Notebook 5 JSON cache, and then verifies the cache-hit path on the second call in scripts/verify_notebook5_julia_cache_smoke.jl.
That smoke is now wired into Makefile and CI through .github/workflows/jupyter-book.yml, so this path is exercised without relying on the local results directories.
Addressed the notebook narrative note from review #4022535479 in bf4a31e. The Julia notebook introduction now states explicitly that the raw sampling path goes through direct `dwave.samplers` calls via DWave.Neal.PythonCall rather than through DWave.jl.

For the remaining non-actionable comments: the ReviewNB message and the Copilot overview are informational only, so there was no code change to make for them. The earlier Copilot timeout comment was already addressed in e9692ee.
Summary
Verification

- `QUIP_NOTEBOOK_TIMEOUT=3600 .venv/bin/python ./scripts/verify_notebooks.py notebooks_py/5-Benchmarking_python.ipynb notebooks_jl/5-Benchmarking.ipynb`
- Compared the `t`, `p`, `tts`/`ttt`, `ttsci`/`tttci`, `best`, `bestci`, `min_energy`, and `random_energy` columns: 3.3120138340301897 s at 85 sweeps and 2.130041451027739 s at 112 sweeps.