feat: v0.6.5 bidirectional CUDA IPC with Python exporter and receiver mode #3
Merged
…ation

- Added cudaMemory() timing instrumentation to prove it's NOT the 8ms bottleneck (215-275us)
- Implemented TD 2025+ modoutsidecook receiver path for Execute DAT direct import
- Fixed receiver resolution delay by swapping callback order (import_frame before update_receiver_resolution)
- Added backward compatibility for TD 2023 (force-cook fallback)
- Updated TOX_BUILD_GUIDE.md with Step 6b for modoutsidecook setup
- Added comprehensive stat files for sender and receiver analysis
- Updated .toe file with latest component state
- Updated package version in pyproject.toml

Performance findings:
- cudaMemory(): 215-275us regardless of resolution (1024² to 4K)
- GPU D2D memcpy: 65us (1024²), 1045-1067us (4K)
- TD Execute DAT overhead: ~8ms (resolution-independent, not optimizable)
- Receiver modoutsidecook: 0.167ms vs 0.149ms force-cook
- Resolution change delay: reduced from 3-4 frames to 1-2 frames

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
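For readers who want to reproduce microsecond ranges like the ones reported above, here is a minimal sketch of host-side timing instrumentation. It is not the instrumentation from this commit: `MicroTimer` and `fake_cuda_memory_call` are hypothetical names, and the stand-in function only sleeps instead of calling TouchDesigner's `cudaMemory()`.

```python
import time
from contextlib import contextmanager


class MicroTimer:
    """Collects wall-clock durations in microseconds and reports their range."""

    def __init__(self, label):
        self.label = label
        self.samples_us = []

    @contextmanager
    def measure(self):
        # perf_counter_ns is monotonic and avoids float rounding on short spans.
        start_ns = time.perf_counter_ns()
        try:
            yield
        finally:
            elapsed_ns = time.perf_counter_ns() - start_ns
            self.samples_us.append(elapsed_ns / 1000.0)

    def summary(self):
        if not self.samples_us:
            return f"{self.label}: no samples"
        low = min(self.samples_us)
        high = max(self.samples_us)
        return f"{self.label}: {low:.0f}-{high:.0f}us over {len(self.samples_us)} calls"


def fake_cuda_memory_call():
    # Hypothetical stand-in for the call under test; NOT the TouchDesigner API.
    time.sleep(0.0002)


timer = MicroTimer("cudaMemory()")
for _ in range(5):
    with timer.measure():
        fake_cuda_memory_call()
print(timer.summary())
```

A host-side timer like this only captures the time the calling thread spends inside the call. Asynchronous GPU work, such as a device-to-device copy queued on a stream, would need a synchronization point or CUDA events to be measured accurately.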
…de, CI/CD and dev tooling
The pull request is too large to review automatically due to GitHub's line limit. Please consider breaking it into smaller PRs for a more effective review.
Summary
- `CUDAIPCExporter` (Python → TD direction) for sending AI-generated frames back to TouchDesigner via zero-copy CUDA IPC
- `CUDAIPCExtension` TD component, enabling both directions
- `build_wheel.cmd` for PEP 517 wheel distribution (`cuda_link-0.6.5-py3-none-any.whl`)

Changes
- `src/cuda_link/cuda_ipc_exporter.py`: New Python-side exporter class
- `td_exporter/CUDAIPCExtension.py`: Dual mode (Sender/Receiver) with ring buffer
- `td_exporter/parexecute_callbacks.py`: Mode parameter callback
- `tests/test_roundtrip.py`: End-to-end Python↔TD roundtrip tests
- `tests/test_cuda_ipc_exporter_python.py`: Python exporter unit tests
- `build_wheel.cmd`: Wheel builder script
- `pyproject.toml`: v0.6.5 with dev extras
- `TOXES/`: All component versions v0.6.0–v0.6.5

Test Plan
- `pytest tests/ -v -m "not requires_cuda"`: protocol and unit tests (no GPU needed)
- `pytest tests/ -v -m "requires_cuda"`: CUDA integration tests (needs GPU)

🤖 Generated with Claude Code
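The two `-m` invocations in the Test Plan rely on a `requires_cuda` marker. As a rough sketch of how such a marker could be wired up in a `conftest.py`, the snippet below registers it and skips marked tests when no CUDA driver library can be loaded. This is an assumption about the setup, not the repository's actual configuration: `cuda_driver_present` is a hypothetical helper, and finding the driver library does not guarantee a usable GPU.

```python
# conftest.py (hypothetical sketch, not the repository's actual file)
import ctypes
import sys


def cuda_driver_present():
    """Return True if the CUDA driver library can be loaded on this machine."""
    name = "nvcuda.dll" if sys.platform == "win32" else "libcuda.so.1"
    try:
        ctypes.CDLL(name)
    except OSError:
        return False
    return True


def pytest_configure(config):
    # Register the marker so `-m "requires_cuda"` does not trigger
    # unknown-marker warnings.
    config.addinivalue_line(
        "markers", "requires_cuda: test needs a CUDA-capable GPU and driver"
    )


def pytest_collection_modifyitems(config, items):
    # Leave the collected tests untouched when a driver is available.
    if cuda_driver_present():
        return
    import pytest  # imported here; only needed when tests must be skipped

    skip_marker = pytest.mark.skip(reason="CUDA driver library not found")
    for item in items:
        if "requires_cuda" in item.keywords:
            item.add_marker(skip_marker)
```

With a setup like this, a test decorated with `@pytest.mark.requires_cuda` is excluded by `-m "not requires_cuda"` and reported as skipped rather than failed when run on a machine without the driver.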