
Sync and validate Notebook 5 benchmarking updates #6

Open
bernalde wants to merge 9 commits into main from qubonotebooks-pr9-benchmarking-sync

Conversation

@bernalde
Member

Summary

  • sync the compatible Notebook 5 benchmarking changes from JuliaQUBO/QUBONotebooks#9 (Completing Missing Implementations of Notebook 5 in Julia)
  • clean up the Python and Julia benchmarking notebook narrative so the markdown matches the implemented methodology
  • fix Notebook 5 reporting cells so the ensemble TTS summaries are computed correctly in both languages
  • refresh the saved notebook outputs after executing both notebooks end to end
  • verify that the Python and Julia benchmarking result objects match exactly for instance 42 and the 20-instance ensemble

Verification

  • QUIP_NOTEBOOK_TIMEOUT=3600 .venv/bin/python ./scripts/verify_notebooks.py notebooks_py/5-Benchmarking_python.ipynb notebooks_jl/5-Benchmarking.ipynb
  • compared the Python and Julia cached benchmarking metrics directly across t, p, tts/ttt, ttsci/tttci, best, bestci, min_energy, and random_energy
  • shared headline values after rerun:
    • instance 42 minimum TTS: 3.3120138340301897 s at 85 sweeps
    • ensemble median minimum TTS: 2.130041451027739 s at 112 sweeps
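
For context on the headline numbers: time to solution (TTS) is conventionally the expected wall time to reach a target success probability (usually 99%) by repeating independent anneals. A minimal sketch of that formula; the tts helper below is illustrative, not the notebooks' actual implementation:

```python
import math

def tts(t_run: float, p_success: float, target: float = 0.99) -> float:
    """Expected total time to reach `target` success probability by
    repeating independent runs of duration `t_run`, each of which
    succeeds with probability `p_success`."""
    if p_success >= target:
        return t_run  # a single run already meets the target
    return t_run * math.log(1.0 - target) / math.log(1.0 - p_success)

# A run that succeeds 10% of the time needs ~44 repeats for 99% confidence.
print(tts(1e-3, 0.10))
```

Sweeping this over annealing lengths and taking the minimum is what produces per-instance headline values like the ones above.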

@review-notebook-app

Check out this pull request on ReviewNB.

See visual diffs & provide feedback on Jupyter Notebooks.

Powered by ReviewNB


Copilot AI left a comment


Pull request overview

Syncs and hardens the benchmarking notebook workflow by aligning Python/Julia notebook execution and updating the Julia Notebook 5 environment to match new benchmarking/reporting needs.

Changes:

  • Update verify_notebooks.py to execute Python notebooks via a generated local kernelspec and make the execution timeout configurable via QUIP_NOTEBOOK_TIMEOUT.
  • Update the Julia Notebook 5 environment to include additional direct dependencies and refresh the Manifest accordingly.
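
The configurable timeout can be wired in with very little code. The following is a sketch, not the actual verify_notebooks.py: the quip-local kernel name and 600 s default are assumptions. It reads QUIP_NOTEBOOK_TIMEOUT and passes it to nbconvert's standard ExecutePreprocessor traitlets on the CLI:

```python
import os
import subprocess

def notebook_timeout(default: int = 600) -> int:
    """Execution timeout in seconds, overridable via QUIP_NOTEBOOK_TIMEOUT."""
    return int(os.environ.get("QUIP_NOTEBOOK_TIMEOUT", str(default)))

def execute_notebook(path: str, kernel: str = "quip-local") -> None:
    """Execute a notebook in place, pinning the kernelspec and timeout."""
    subprocess.run(
        [
            "jupyter", "nbconvert", "--to", "notebook", "--execute",
            "--inplace", path,
            f"--ExecutePreprocessor.timeout={notebook_timeout()}",
            f"--ExecutePreprocessor.kernel_name={kernel}",
        ],
        check=True,
    )
```

Pinning a locally generated kernelspec keeps execution reproducible across machines, since the notebook no longer depends on whichever python3 kernel happens to be registered globally.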

Reviewed changes

Copilot reviewed 3 out of 5 changed files in this pull request and generated 1 comment.

Files reviewed:

  • scripts/verify_notebooks.py: adds local Python kernelspec creation and a configurable nbconvert timeout for more reproducible notebook execution.
  • notebooks_jl/envs/5-Benchmarking/Project.toml: adds new Julia dependencies needed by the benchmarking notebook updates.
  • notebooks_jl/envs/5-Benchmarking/Manifest.toml: refreshes the manifest to reflect the updated Project dependencies and artifacts.


Member Author

@bernalde left a comment


I reviewed the PR locally and ran the relevant checks on this branch:

  • .venv/bin/python -m unittest discover -s tests
  • ~/.local/bin/uv run --group docs python -m unittest discover -s tests
  • QUIP_NOTEBOOK_TIMEOUT=3600 .venv/bin/python ./scripts/verify_notebooks.py notebooks_py/5-Benchmarking_python.ipynb notebooks_jl/5-Benchmarking.ipynb
  • a direct Python-versus-Julia raw-cache comparison for sweeps 10, 85, 112, 500, and 1000 on instance 42
  • make verify-julia-colab-smokes

The Python suites and both Notebook 5 executions passed, and the checked raw benchmark artifacts still matched exactly. I found two changes that should be made before merge and recorded them inline below.

One additional notebook-specific note that GitHub would not accept as an inline comment because the notebooks_jl/5-Benchmarking.ipynb diff is too large: the Julia notebook introduction still says the benchmarked simulated annealing path is DWave.jl, but the parity-sensitive raw sampling code now goes through direct dwave.samplers calls via DWave.Neal.PythonCall. Please update that notebook prose before merge so it matches the implementation.

```julia
end

QuIPNotebookBootstrap.instantiate_scripts_project(precompile = false)
import IJulia
```
Member Author


This currently breaks make verify-julia-colab-smokes before any smoke test runs: Julia does not allow import inside a function body, so the file fails to parse with syntax: "import" expression not at top level. I reproduced that locally from this branch. If the goal here is to assert that IJulia is available after instantiate_scripts_project(...), please move the import to a top-level location or load it with Core.eval(Main, :(import IJulia)) after instantiation.

Member Author


Fixed in bf4a31e. scripts/verify_julia_env_smokes.jl now loads IJulia with Core.eval(Main, :(import IJulia)) after instantiate_scripts_project(...), so make verify-julia-colab-smokes parses and runs again.

I also added regression coverage for this path in tests/test_verify_notebooks.py and in CI through .github/workflows/jupyter-book.yml.

```python
self.assertNotIn("Random.seed!", julia_source)
self.assertIn("permutedims(reshape(example2_uniform_values", julia_source)

def test_julia_profile_cache_path_uses_direct_python_sampler(self) -> None:
```
Member Author


This only proves that the notebook source still contains the direct Python sampler hook. It does not exercise the cache-miss path at all. Because both notebooks default to overwrite_pickles = false and local notebooks_*/results directories are commonly present, the parity-sensitive sampling code can regress while local execution continues to pass on stale artifacts. Please add a small regression that forces a cache miss in a disposable results directory, or otherwise exercises the direct-sampler branch end to end.
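
The shape of the requested regression is small. A sketch with a hypothetical load_or_sample cache helper (not the notebooks' real code), showing how a disposable results directory forces the sampler to run exactly once on the miss and the second call to hit the cache:

```python
import json
import tempfile
from pathlib import Path

def load_or_sample(results_dir: Path, key: str, sampler) -> dict:
    """Hypothetical cache helper: return the cached JSON result if it
    exists, otherwise invoke the sampler and persist what it returns."""
    cache_file = results_dir / f"{key}.json"
    if cache_file.exists():
        return json.loads(cache_file.read_text())
    result = sampler()
    cache_file.write_text(json.dumps(result))
    return result

calls = []

def fake_sampler() -> dict:
    calls.append(1)  # record that the cache-miss branch actually ran
    return {"min_energy": -1.0}

with tempfile.TemporaryDirectory() as tmp:
    first = load_or_sample(Path(tmp), "instance42_sweeps85", fake_sampler)
    second = load_or_sample(Path(tmp), "instance42_sweeps85", fake_sampler)

assert first == second == {"min_energy": -1.0}
assert len(calls) == 1  # sampler exercised once: miss, then cache hit
```

Because the temporary directory is always empty, this test cannot silently pass on stale artifacts the way runs against a pre-populated results directory can.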

Member Author


Fixed in bf4a31e. I kept the source-level check in tests/test_notebook5_notebook_sources.py, and added an end-to-end Julia cache-smoke that forces a cache miss in a disposable directory, writes the Notebook 5 JSON cache, and then verifies the cache-hit path on the second call in scripts/verify_notebook5_julia_cache_smoke.jl.

That smoke is now wired into Makefile and CI through .github/workflows/jupyter-book.yml, so this path is exercised without relying on the local results directories.

Member Author

Addressed the notebook narrative note from review #4022535479 in bf4a31e. The Julia notebook introduction now states explicitly that the Julia workflow still uses DWave.jl, but the parity-sensitive raw benchmarking cache is temporarily routed through direct dwave.samplers calls via DWave.Neal.PythonCall while JuliaQUBO/DWave.jl issue #15 is being fixed upstream. The updated note is in notebooks_jl/5-Benchmarking.ipynb, and there is a regression check for that documentation in tests/test_notebook5_notebook_sources.py.

For the remaining non-actionable comments: the ReviewNB message and the Copilot overview are informational only, so there was no code change to make for them. The earlier Copilot timeout comment was already addressed in e9692ee.
