
Sync and validate Notebook 5 benchmarking updates #6

Open
bernalde wants to merge 9 commits into main from qubonotebooks-pr9-benchmarking-sync

Conversation

@bernalde
Member

Summary

  • sync the compatible Notebook 5 benchmarking changes from JuliaQUBO/QUBONotebooks#9 (Completing Missing Implementations of Notebook 5 in Julia)
  • clean up the Python and Julia benchmarking notebook narrative so the markdown matches the implemented methodology
  • fix Notebook 5 reporting cells so the ensemble TTS summaries are computed correctly in both languages
  • refresh the saved notebook outputs after executing both notebooks end to end
  • verify that the Python and Julia benchmarking result objects match exactly for instance 42 and the 20-instance ensemble

Verification

  • QUIP_NOTEBOOK_TIMEOUT=3600 .venv/bin/python ./scripts/verify_notebooks.py notebooks_py/5-Benchmarking_python.ipynb notebooks_jl/5-Benchmarking.ipynb
  • compared the Python and Julia cached benchmarking metrics directly across t, p, tts/ttt, ttsci/tttci, best, bestci, min_energy, and random_energy
  • shared headline values after rerun:
    • instance 42 minimum TTS: 3.3120138340301897 s at 85 sweeps
    • ensemble median minimum TTS: 2.130041451027739 s at 112 sweeps
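
For context on the headline numbers: time to solution (TTS) is conventionally the expected wall time to reach a target success probability (usually 99%) by repeating independent anneals. A minimal sketch of that formula; the tts helper below is illustrative, not the notebooks' actual implementation:

```python
import math

def tts(t_run: float, p_success: float, target: float = 0.99) -> float:
    """Expected total time to reach `target` success probability by
    repeating independent runs of duration `t_run`, each of which
    succeeds with probability `p_success`."""
    if p_success >= target:
        return t_run  # a single run already meets the target
    return t_run * math.log(1.0 - target) / math.log(1.0 - p_success)

# A run that succeeds 10% of the time needs ~44 repeats for 99% confidence.
print(tts(1e-3, 0.10))
```

Sweeping this over annealing lengths and taking the minimum is what produces per-instance headline values like the ones above.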

@review-notebook-app

Check out this pull request on ReviewNB.

See visual diffs & provide feedback on Jupyter Notebooks.

Powered by ReviewNB


Copilot AI left a comment


Pull request overview

Syncs and hardens the benchmarking notebook workflow by aligning Python/Julia notebook execution and updating the Julia Notebook 5 environment to match new benchmarking/reporting needs.

Changes:

  • Update verify_notebooks.py to execute Python notebooks via a generated local kernelspec and make the execution timeout configurable via QUIP_NOTEBOOK_TIMEOUT.
  • Update the Julia Notebook 5 environment to include additional direct dependencies and refresh the Manifest accordingly.
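
The configurable timeout can be wired in with very little code. The following is a sketch, not the actual verify_notebooks.py: the quip-local kernel name and 600 s default are assumptions. It reads QUIP_NOTEBOOK_TIMEOUT and passes it to nbconvert's standard ExecutePreprocessor traitlets on the CLI:

```python
import os
import subprocess

def notebook_timeout(default: int = 600) -> int:
    """Execution timeout in seconds, overridable via QUIP_NOTEBOOK_TIMEOUT."""
    return int(os.environ.get("QUIP_NOTEBOOK_TIMEOUT", str(default)))

def execute_notebook(path: str, kernel: str = "quip-local") -> None:
    """Execute a notebook in place, pinning the kernelspec and timeout."""
    subprocess.run(
        [
            "jupyter", "nbconvert", "--to", "notebook", "--execute",
            "--inplace", path,
            f"--ExecutePreprocessor.timeout={notebook_timeout()}",
            f"--ExecutePreprocessor.kernel_name={kernel}",
        ],
        check=True,
    )
```

Pinning a locally generated kernelspec keeps execution reproducible across machines, since the notebook no longer depends on whichever python3 kernel happens to be registered globally.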

Reviewed changes

Copilot reviewed 3 out of 5 changed files in this pull request and generated 1 comment.

Files reviewed:

  • scripts/verify_notebooks.py: adds local Python kernelspec creation and a configurable nbconvert timeout for more reproducible notebook execution.
  • notebooks_jl/envs/5-Benchmarking/Project.toml: adds new Julia dependencies needed by the benchmarking notebook updates.
  • notebooks_jl/envs/5-Benchmarking/Manifest.toml: refreshes the manifest to reflect the updated Project dependencies and artifacts.


Member Author

@bernalde left a comment


I reviewed the PR locally and ran the relevant checks on this branch:

  • .venv/bin/python -m unittest discover -s tests
  • ~/.local/bin/uv run --group docs python -m unittest discover -s tests
  • QUIP_NOTEBOOK_TIMEOUT=3600 .venv/bin/python ./scripts/verify_notebooks.py notebooks_py/5-Benchmarking_python.ipynb notebooks_jl/5-Benchmarking.ipynb
  • a direct Python-versus-Julia raw-cache comparison for sweeps 10, 85, 112, 500, and 1000 on instance 42
  • make verify-julia-colab-smokes

The Python suites and both Notebook 5 executions passed, and the checked raw benchmark artifacts still matched exactly. I found two changes that should be made before merge and recorded them inline below.

One additional notebook-specific note that GitHub would not accept as an inline comment because the notebooks_jl/5-Benchmarking.ipynb diff is too large: the Julia notebook introduction still says the benchmarked simulated annealing path is DWave.jl, but the parity-sensitive raw sampling code now goes through direct dwave.samplers calls via DWave.Neal.PythonCall. Please update that notebook prose before merge so it matches the implementation.

```julia
end

QuIPNotebookBootstrap.instantiate_scripts_project(precompile = false)
import IJulia
```
Member Author


This currently breaks make verify-julia-colab-smokes before any smoke test runs: Julia does not allow import inside a function body, so the file fails to parse with syntax: "import" expression not at top level. I reproduced that locally from this branch. If the goal here is to assert that IJulia is available after instantiate_scripts_project(...), please move the import to a top-level location or load it with Core.eval(Main, :(import IJulia)) after instantiation.

Member Author


Fixed in bf4a31e. scripts/verify_julia_env_smokes.jl now loads IJulia with Core.eval(Main, :(import IJulia)) after instantiate_scripts_project(...), so make verify-julia-colab-smokes parses and runs again.

I also added regression coverage for this path in tests/test_verify_notebooks.py and in CI through .github/workflows/jupyter-book.yml.

```python
self.assertNotIn("Random.seed!", julia_source)
self.assertIn("permutedims(reshape(example2_uniform_values", julia_source)

def test_julia_profile_cache_path_uses_direct_python_sampler(self) -> None:
```
Member Author


This only proves that the notebook source still contains the direct Python sampler hook. It does not exercise the cache-miss path at all. Because both notebooks default to overwrite_pickles = false and local notebooks_*/results directories are commonly present, the parity-sensitive sampling code can regress while local execution continues to pass on stale artifacts. Please add a small regression that forces a cache miss in a disposable results directory, or otherwise exercises the direct-sampler branch end to end.
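
The shape of the requested regression is small. A sketch with a hypothetical load_or_sample cache helper (not the notebooks' real code), showing how a disposable results directory forces the sampler to run exactly once on the miss and the second call to hit the cache:

```python
import json
import tempfile
from pathlib import Path

def load_or_sample(results_dir: Path, key: str, sampler) -> dict:
    """Hypothetical cache helper: return the cached JSON result if it
    exists, otherwise invoke the sampler and persist what it returns."""
    cache_file = results_dir / f"{key}.json"
    if cache_file.exists():
        return json.loads(cache_file.read_text())
    result = sampler()
    cache_file.write_text(json.dumps(result))
    return result

calls = []

def fake_sampler() -> dict:
    calls.append(1)  # record that the cache-miss branch actually ran
    return {"min_energy": -1.0}

with tempfile.TemporaryDirectory() as tmp:
    first = load_or_sample(Path(tmp), "instance42_sweeps85", fake_sampler)
    second = load_or_sample(Path(tmp), "instance42_sweeps85", fake_sampler)

assert first == second == {"min_energy": -1.0}
assert len(calls) == 1  # sampler exercised once: miss, then cache hit
```

Because the temporary directory is always empty, this test cannot silently pass on stale artifacts the way runs against a pre-populated results directory can.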

Member Author


Fixed in bf4a31e. I kept the source-level check in tests/test_notebook5_notebook_sources.py, and added an end-to-end Julia cache-smoke that forces a cache miss in a disposable directory, writes the Notebook 5 JSON cache, and then verifies the cache-hit path on the second call in scripts/verify_notebook5_julia_cache_smoke.jl.

That smoke is now wired into Makefile and CI through .github/workflows/jupyter-book.yml, so this path is exercised without relying on the local results directories.

Member Author

Addressed the notebook narrative note from review #4022535479 in bf4a31e. The Julia notebook introduction now states explicitly that the Julia workflow still uses DWave.jl, but the parity-sensitive raw benchmarking cache is temporarily routed through direct dwave.samplers calls via DWave.Neal.PythonCall while JuliaQUBO/DWave.jl issue #15 is being fixed upstream. The updated note is in notebooks_jl/5-Benchmarking.ipynb, and there is a regression check for that documentation in tests/test_notebook5_notebook_sources.py.

For the remaining non-actionable comments: the ReviewNB message and the Copilot overview are informational only, so there was no code change to make for them. The earlier Copilot timeout comment was already addressed in e9692ee.
