feat: v0.7.0 — dtype_converter format auto-conversion, docs alignment#6

Closed
forkni wants to merge 6 commits into master from development

Conversation

@forkni forkni commented Mar 21, 2026

Summary

  • fix: Replace dynamic Null TOP format converter with permanent dtype_converter Transform TOP — eliminates per-frame oscillation and same-cook-frame failure for rgba16float sources in TD 2025 (CUDA 12.8)
  • fix: Auto-detect unsupported float16 pixel formats from source TOP (upstream of converter), skip one transition frame, then operate stably without oscillation
  • feat: Bump version to v0.7.0 across all canonical sources (pyproject.toml, __init__.py, wheel filenames, tox filename)
  • docs: Add dtype_converter Transform TOP to TOX Build Guide component structure and assembly steps
  • docs: Add missing files (parexecute_callbacks.py, script_top_callbacks.py, benchmark_timestamp.py) to file reference table
  • docs: Fix tox save path (TOXES/ subfolder), CUDA DLL error message (cudart64_110.dll)
  • docs: Add uint16 dtype_code (3) to ARCHITECTURE.md protocol table; add float16 format compatibility note

Changes

| Commit | Description |
|---|---|
| ddd8f0f | fix: improve dtype detection, add metadata-only re-init, optimize sync event |
| 14a6d26 | fix: replace dynamic Null TOP format converter with permanent dtype_converter Transform TOP |
| 1d8823a | feat: bump version to v0.7.0, align docs with dtype_converter and format auto-conversion |

Test plan

  • Verify `from cuda_link import __version__` returns `"0.7.0"`
  • Verify no 0.6.9 references remain in committed files
  • Verify `TOXES/CUDAIPCLink_v0.7.0.tox` is present
  • In TouchDesigner: feed an rgba16float source → confirm one-frame skip + stable operation, no oscillation
  • In TouchDesigner: feed an rgba32float source → confirm dtype_converter stays at `useinput`
  • Run the test suite: `pytest tests/ -v -m "not requires_cuda"`

🤖 Generated with Claude Code

@charliecreates charliecreates Bot requested a review from CharlieHelps March 21, 2026 06:24
@github-actions

🤖 Hi @forkni, I've received your request, and I'm working on it now! You can track my progress in the logs for more details.

@github-actions

🤖 I'm sorry @forkni, but I was unable to process your request. Please see the logs for more details.


@charliecreates charliecreates Bot left a comment


The new uint16 Torch fallback to torch.int16 will silently corrupt values for half the range and should not ship as-is. The sender-side dtype_converter toggling looks like it can be bypassed depending on what TOP is passed into export_frame(), which can make float16 sources fail permanently. Exception handling is overly broad (BaseException) and shutdown paths for pinned memory cleanup should be more defensive. Docs should explicitly connect the new ExportBuffer/dtype_converter graph to what node is actually exported.

Additional notes (1)
  • Compatibility | src/cuda_link/cuda_ipc_importer.py:171-176
    The uint16 fallback to torch.int16 is silently wrong for consumers: values >= 32768 will become negative, so downstream code will see different numerical values even though the bit-width matches. This is a correctness issue and will be very hard to diagnose because it “works” but produces corrupted semantics.
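The reinterpretation hazard described above is easy to demonstrate outside the library with NumPy's `view()`, which preserves the bit pattern the same way a bit-width-matching dtype fallback would:

```python
import numpy as np

# uint16 values reinterpreted as int16: the bit pattern is preserved,
# but every value >= 32768 wraps into the negative range.
raw = np.array([1000, 32767, 32768, 65535], dtype=np.uint16)
reinterpreted = raw.view(np.int16)
print(reinterpreted.tolist())  # [1000, 32767, -32768, -1]
```

Nothing raises, so the corruption only surfaces when downstream math produces wrong results.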
Summary of changes

  • Bumped release artifacts and docs from v0.6.8 → v0.7.0 (README, build script, pyproject.toml, __init__, .gitignore) and added TOXES/CUDAIPCLink_v0.7.0.tox.
  • Added uint16 support to the shared-memory protocol/docs and importer backends (CuPy + __cuda_array_interface__ + Torch dtype mapping).
  • Introduced a non-timing CUDA event helper (create_sync_event()) and switched exporter cross-stream sync to use it.
  • Updated TouchDesigner sender to use a permanent dtype_converter Transform TOP for auto pixel-format conversion (notably float16 → rgba32float) and added receiver-side pinned host buffer optimization for float16 D2H conversion.

Docs alignment

  • Updated TOX build guide to include dtype_converter, ExportBuffer, and additional callback files; fixed TOX save path and CUDA DLL troubleshooting details.

Comment thread td_exporter/CUDAIPCExtension.py Outdated
Comment on lines +562 to +609
```python
# Ensure CUDA runtime and stream exist BEFORE first cudaMemory() call.
# Always use a non-blocking stream (never None/default stream) for TD 2025 compat.
if self.cuda is None:
    self.cuda = get_cuda_runtime()
if not hasattr(self, "ipc_stream") or self.ipc_stream is None:
    self.ipc_stream = self.cuda.create_stream(flags=0x01)  # cudaStreamNonBlocking
    self._log(
        f"Created IPC stream (pre-init): 0x{int(self.ipc_stream.value):016x}",
        force=True,
    )

# TD 2025 rejects float16 pixel formats from cudaMemory().
# dtype_converter Transform TOP sits before ExportBuffer — toggle its format param.
# Check source TOP (upstream of converter), not ExportBuffer (downstream).
effective_top = top_op
fmt_transform = self.ownerComp.op(_FMT_TRANSFORM_NAME)
if fmt_transform is not None:
    source_top = fmt_transform.inputs[0] if fmt_transform.inputs else top_op
    if self._needs_format_conversion(source_top):
        if not self._fmt_conv_active:
            fmt_transform.par.format = "rgba32float"
            self._fmt_conv_active = True
            self._log(
                f"Pixel format '{getattr(source_top, 'pixelFormat', '?')}' unsupported "
                f"by cudaMemory() — dtype_converter set to rgba32float, skipping frame",
                force=True,
            )
            return False  # dtype_converter cooks next frame
    else:
        if self._fmt_conv_active:
            fmt_transform.par.format = "useinput"
            self._fmt_conv_active = False
            self._log(
                "Source format CUDA-compatible — dtype_converter set to useinput",
                force=True,
            )
            return False  # format reverts next cook

# Time cudaMemory() call (OpenGL→CUDA interop)
if self.verbose_performance:
    cuda_mem_start = time.perf_counter()

# Get TOP's CUDA memory — always pass a valid stream (never None)
try:
    cuda_mem = effective_top.cudaMemory(
        stream=int(self.ipc_stream.value),
    )
except BaseException as cuda_err:
```

dtype_converter is toggled based on the upstream source format, but cudaMemory() is still taken from effective_top which remains top_op. If callers pass the input TOP (as many TD callback templates do), the conversion will never be used and float16 sources will continue to fail indefinitely. The exporter should consistently pull CUDA memory from the post-conversion node inside the component (e.g., ExportBuffer or dtype_converter) whenever it exists, rather than trusting the caller-provided top_op.

Suggestion

Resolve the export TOP internally when dtype_converter (and/or ExportBuffer) exists, and use that for cudaMemory().

```python
fmt_transform = self.ownerComp.op(_FMT_TRANSFORM_NAME)
export_top = self.ownerComp.op("ExportBuffer") or fmt_transform
effective_top = export_top or top_op

source_top = fmt_transform.inputs[0] if (fmt_transform and fmt_transform.inputs) else top_op
# ...format detection/toggling based on source_top...

cuda_mem = effective_top.cudaMemory(stream=int(self.ipc_stream.value))
```

Reply with "@CharlieHelps yes please" if you’d like me to add a commit with this suggestion.

Comment on lines +605 to +615
```python
try:
    cuda_mem = effective_top.cudaMemory(
        stream=int(self.ipc_stream.value),
    )
except BaseException as cuda_err:
    pixel_fmt = getattr(effective_top, "pixelFormat", "unknown")
    err_msg = f"cudaMemory() failed (pixelFormat={pixel_fmt}): {cuda_err}"
    if err_msg != getattr(self, "_last_cuda_mem_err", ""):
        self._log(err_msg, force=True)
        self._last_cuda_mem_err = err_msg
    return False
```

Catching BaseException around cudaMemory() will also swallow KeyboardInterrupt/SystemExit, making TD sessions harder to stop/debug and potentially hiding fatal shutdown conditions. This should catch Exception (or specific TD/CUDA exceptions) instead.

Suggestion

Narrow the exception handling to Exception.

```python
try:
    cuda_mem = effective_top.cudaMemory(stream=int(self.ipc_stream.value))
except Exception as cuda_err:
    ...
```

Reply with "@CharlieHelps yes please" if you’d like me to add a commit with this suggestion.

Comment on lines +227 to +235
```python
def _needs_format_conversion(self, top_op: TOP) -> bool:
    """Return True if the TOP's pixel format is unsupported by cudaMemory() in TD 2025.

    TD 2025 (CUDA 12.8) rejects float16 formats from cudaMemory().
    uint8, uint16 (fixed), and float32 are supported.
    """
    pixel_fmt = getattr(top_op, "pixelFormat", "")
    return any(unsupported in pixel_fmt for unsupported in _CUDA_UNSUPPORTED_PIXEL_FORMATS)
```


_needs_format_conversion() does a case-sensitive substring check against top_op.pixelFormat. If TouchDesigner returns formats like RGBA16Float / rgba16float / RGBA 16-bit Float (or any casing variation), this can miss the conversion and you’re back to cudaMemory() failures/oscillation.

This is a correctness/stability issue because it gates the entire TD 2025 workaround.

Suggestion

Normalize the pixel format string before matching.

```python
pixel_fmt = str(getattr(top_op, "pixelFormat", "")).lower()
unsupported = ("16-bit float", "16float", "rgba16float")
return any(s in pixel_fmt for s in unsupported)
```

Reply with "@CharlieHelps yes please" if you’d like me to add a commit with this suggestion.
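Outside TouchDesigner, the normalized check can be exercised as a plain function. This is an illustrative stand-alone sketch: the name `needs_conversion` and the exact tuple of unsupported substrings are assumptions, not the library's actual constants.

```python
def needs_conversion(pixel_fmt) -> bool:
    """Case-insensitive substring check for float16 pixel formats.

    Hypothetical stand-alone version of the suggested fix; the tuple of
    substrings is illustrative, not the extension's real list.
    """
    fmt = str(pixel_fmt or "").lower()
    return any(s in fmt for s in ("16-bit float", "16float", "rgba16float"))

print(needs_conversion("RGBA16Float"))  # True — casing no longer matters
print(needs_conversion("rgba32float"))  # False — float32 is CUDA-compatible
```

Lowercasing once up front covers every casing variant TD might report without enumerating them.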

Comment thread docs/TOX_BUILD_GUIDE.md
Comment on lines +117 to +129
### Step 6b: Add dtype_converter Transform TOP (Sender mode)

Inside the `CUDAIPCExporter` COMP, add a **Transform TOP** named `dtype_converter`:

1. Create a **Transform TOP**, rename to `dtype_converter`
2. Set the **Pixel Format** parameter to `"Use Input"` (default — pass-through, zero overhead)
3. Wire input: `input` In TOP → `dtype_converter`
4. Wire output: `dtype_converter` → `ExportBuffer` (Null TOP or the node that feeds `cudaMemory()`)

**Purpose**: TouchDesigner 2025 (CUDA 12.8) rejects `rgba16float` formats from `cudaMemory()`. The extension automatically detects unsupported source formats (float16) and sets `dtype_converter.par.format = "rgba32float"` on the first affected frame — skipping that one frame while the conversion takes effect. For all other formats (uint8, uint16 fixed, float32) the node stays at `"Use Input"` with zero overhead.

**This node is managed automatically** — no manual format changes are needed.


The build guide introduces dtype_converter + ExportBuffer as the node feeding cudaMemory(), but it doesn’t explicitly instruct readers to ensure the sender callback calls export_frame() on that post-conversion node (or that the extension resolves it internally). Without this, users can wire the graph correctly but still export from input and bypass conversion.

Suggestion

Add an explicit line in Step 6b stating that the sender callback should pass op('ExportBuffer') (or op('dtype_converter')) into export_frame(), or document that the extension resolves the correct node internally.

Reply with "@CharlieHelps yes please" if you’d like me to add a commit with this doc tweak.

@charliecreates charliecreates Bot removed the request for review from CharlieHelps March 21, 2026 06:30

forkni commented Mar 21, 2026

@CharlieHelps Analyze new commits

@charliecreates

charliecreates Bot commented Mar 21, 2026

  • Read through prior diffs in cuda_ipc_importer.py, CUDAIPCExtension, and the TOX build guide docs to map out how exception handling, dtype mapping, and format auto-conversion have evolved in PR #6.
  • Inspected commit 3e03a93 to confirm it only introduces .gemini prompt files and updated .github/workflows/gemini-*.yml CI workflows, with no runtime changes to cuda_link modules.
  • Reviewed commit f4c78d9 to see the PEP 604 type-hint modernization across cuda_ipc_exporter.py, cuda_ipc_importer.py, cuda_ipc_wrapper.py, debug_utils.py, CUDAIPCExtension.py, and CUDAIPCWrapper.py, along with updated CUDA_Link_Example.*.toe and TOXES/CUDAIPCLink_v0.7.0.tox.
  • Searched within td_exporter/CUDAIPCExtension.py for _CUDA_UNSUPPORTED_PIXEL_FORMATS, _FMT_TRANSFORM_NAME, and export_frame( and skimmed the top ~120 lines to verify how the dtype_converter Transform TOP is wired into export_frame() and how unsupported 16-bit float formats are filtered before calling cudaMemory().
  • Reviewed td_exporter/callbacks_template.py to confirm onFrameEnd() now calls ext.export_frame(op("ExportBuffer")) and that onFrameStart() handles receiver-side import_frame() and update_receiver_resolution() with the modoutsidecook optimization path.
  • Ran a ripgrep search in src/cuda_link for uint16/int16 mentions to confirm the current mapping uses torch.uint16 and cp.uint16 without any remaining uint16→int16 fallback paths in cuda_ipc_exporter.py or cuda_ipc_importer.py.
  • Searched docs and td_exporter for export_frame(op('input')) references and verified docs/TOX_BUILD_GUIDE.md now documents the correct flow input → dtype_converter → ExportBuffer → export_frame() and explicitly calls out export_frame(op('ExportBuffer')).
  • Checked README.md and docs/ARCHITECTURE.md for uint16 and PyTorch references, confirming the architecture notes that TD 2025 rejects rgba16float from cudaMemory() and that the sender auto-converts to rgba32float while treating uint16 as a supported format in the updated pipeline.

@charliecreates

Looked through the new commits on development (latest: 4e68343).

  • 4e68343 directly addresses the main correctness/stability points from the earlier review:

    • No more silent uint16 corruption in Torch: CUDAIPCImporter._torch_dtype() no longer falls back to torch.int16 when torch.uint16 isn’t available; it now raises a clear error telling users to use get_frame_numpy() or upgrade PyTorch. (src/cuda_link/cuda_ipc_importer.py)
    • Format detection is now case-insensitive: _needs_format_conversion() lowercases pixelFormat before substring matching, so it should trigger reliably across TD’s formatting/casing variants. (td_exporter/CUDAIPCExtension.py)
    • Exception handling narrowed: the cudaMemory() wrapper now catches Exception instead of BaseException, so it won’t swallow KeyboardInterrupt / SystemExit. (td_exporter/CUDAIPCExtension.py)
    • Docs now force the correct “post-conversion” export node: the TOX build guide explicitly says onFrameEnd must call export_frame(op('ExportBuffer')) and shows the intended graph input → dtype_converter → ExportBuffer → export_frame(). (docs/TOX_BUILD_GUIDE.md)
  • 3e03a93 is CI/docs-only: adds .gemini/* prompt files and updates Gemini workflows to include checkout + some small plumbing changes.

One remaining footgun (now mostly doc-covered)

CUDAIPCExtension.export_frame() still uses the caller-provided top_op for top_op.cudaMemory(...).

With the updated build guide (and the current td_exporter/callbacks_template.py), this is fine because the callback passes ExportBuffer (downstream of dtype_converter). But any custom callback that passes op('input') (or some upstream TOP) can still bypass conversion and re-trigger the TD 2025 float16 failure mode.

If you want to make this “hard to misuse,” the next incremental hardening would be: when the extension sees dtype_converter / ExportBuffer exist, resolve the effective export TOP internally (and optionally warn when the passed-in top_op doesn’t match).
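That hardening could be sketched in pure Python as below. Here `comp_ops` (a dict) stands in for `ownerComp.op()` lookups returning `None` for absent nodes, and the function name, signature, and warning behavior are all illustrative assumptions rather than the extension's real API:

```python
def resolve_export_top(comp_ops: dict, caller_top: str) -> str:
    """Sketch of the proposed hardening: prefer the post-conversion node
    when it exists, regardless of what the callback passed in.

    comp_ops stands in for ownerComp.op() lookups (None when a node is
    absent); names and the string return type are illustrative only.
    """
    for name in ("ExportBuffer", "dtype_converter"):
        if comp_ops.get(name) is not None:
            return name  # export from the node downstream of the converter
    return caller_top  # no internal graph — trust the caller

# A component with the full graph always exports from ExportBuffer,
# even when a custom callback passed op('input'):
ops = {"ExportBuffer": object(), "dtype_converter": object()}
print(resolve_export_top(ops, "input"))  # ExportBuffer
```

With this in place, a mismatched `top_op` becomes harmless instead of silently re-enabling the float16 failure mode.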

@forkni forkni closed this Mar 22, 2026