feat: v0.7.0 — dtype_converter format auto-conversion, docs alignment #6
Conversation
🤖 Hi @forkni, I've received your request, and I'm working on it now! You can track my progress in the logs for more details.

🤖 I'm sorry @forkni, but I was unable to process your request. Please see the logs for more details.
The new uint16 Torch fallback to torch.int16 will silently corrupt values for half the range and should not ship as-is. The sender-side dtype_converter toggling looks like it can be bypassed depending on what TOP is passed into export_frame(), which can make float16 sources fail permanently. Exception handling is overly broad (BaseException) and shutdown paths for pinned memory cleanup should be more defensive. Docs should explicitly connect the new ExportBuffer/dtype_converter graph to what node is actually exported.
Additional notes (1)
- Compatibility
src/cuda_link/cuda_ipc_importer.py:171-176
The `uint16` fallback to `torch.int16` is silently wrong for consumers: values >= 32768 will become negative, so downstream code will see different numerical values even though the bit-width matches. This is a correctness issue and will be very hard to diagnose because it "works" but produces corrupted semantics.
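The hazard is easy to demonstrate with the standard library alone (independent of Torch): packing a value as unsigned 16-bit and reading the same bits back as signed int16 flips the upper half of the range negative. The helper name below is illustrative, not from the codebase.

```python
import struct

def reinterpret_u16_as_i16(value: int) -> int:
    """Reinterpret the bits of an unsigned 16-bit value as signed int16."""
    return struct.unpack("<h", struct.pack("<H", value))[0]

# Below 32768 the bit patterns agree; at and above it, values wrap negative.
print(reinterpret_u16_as_i16(32767))  # 32767
print(reinterpret_u16_as_i16(32768))  # -32768
print(reinterpret_u16_as_i16(65535))  # -1
```

This is exactly what a `torch.int16` view of uint16 data does per element, which is why the fallback corrupts half the range while keeping the buffer size correct.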
Summary of changes
- Bumped release artifacts and docs from `v0.6.8` → `v0.7.0` (README, build script, `pyproject.toml`, `__init__`, `.gitignore`) and added `TOXES/CUDAIPCLink_v0.7.0.tox`.
- Added `uint16` support to the shared-memory protocol/docs and importer backends (CuPy + `__cuda_array_interface__` + Torch dtype mapping).
- Introduced a non-timing CUDA event helper (`create_sync_event()`) and switched exporter cross-stream sync to use it.
- Updated the TouchDesigner sender to use a permanent `dtype_converter` Transform TOP for auto pixel-format conversion (notably float16 → `rgba32float`) and added a receiver-side pinned host buffer optimization for float16 D2H conversion.
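As a rough illustration of what `uint16` support implies for the shared-memory protocol, the per-frame buffer size follows directly from the dtype's byte width. This is a hedged sketch: only the `uint16` code (3) appears in this PR's notes; the other codes and the `frame_nbytes` helper are invented for the example.

```python
# Illustrative protocol dtype table: (name, itemsize) per dtype_code.
# Only code 3 (uint16) is stated in this PR; the other codes are placeholders.
DTYPES = {
    0: ("uint8", 1),
    1: ("float16", 2),
    2: ("float32", 4),
    3: ("uint16", 2),
}

def frame_nbytes(dtype_code: int, width: int, height: int, channels: int = 4) -> int:
    """Bytes needed for one RGBA frame at the given protocol dtype_code."""
    _name, itemsize = DTYPES[dtype_code]
    return width * height * channels * itemsize

# A 1920x1080 RGBA uint16 frame needs ~15.8 MiB of shared memory.
print(frame_nbytes(3, 1920, 1080))  # 16588800
```

Note that uint16 and float16 frames are the same size, which is why a wrong dtype fallback (as flagged below) can go unnoticed: buffers line up, semantics don't.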
Docs alignment

- Updated the TOX build guide to include `dtype_converter`, `ExportBuffer`, and additional callback files; fixed the TOX save path and CUDA DLL troubleshooting details.
```python
# Ensure CUDA runtime and stream exist BEFORE first cudaMemory() call.
# Always use a non-blocking stream (never None/default stream) for TD 2025 compat.
if self.cuda is None:
    self.cuda = get_cuda_runtime()
if not hasattr(self, "ipc_stream") or self.ipc_stream is None:
    self.ipc_stream = self.cuda.create_stream(flags=0x01)  # cudaStreamNonBlocking
    self._log(
        f"Created IPC stream (pre-init): 0x{int(self.ipc_stream.value):016x}",
        force=True,
    )
```
```python
# TD 2025 rejects float16 pixel formats from cudaMemory().
# dtype_converter Transform TOP sits before ExportBuffer — toggle its format param.
# Check source TOP (upstream of converter), not ExportBuffer (downstream).
effective_top = top_op
fmt_transform = self.ownerComp.op(_FMT_TRANSFORM_NAME)
if fmt_transform is not None:
    source_top = fmt_transform.inputs[0] if fmt_transform.inputs else top_op
    if self._needs_format_conversion(source_top):
        if not self._fmt_conv_active:
            fmt_transform.par.format = "rgba32float"
            self._fmt_conv_active = True
            self._log(
                f"Pixel format '{getattr(source_top, 'pixelFormat', '?')}' unsupported "
                f"by cudaMemory() — dtype_converter set to rgba32float, skipping frame",
                force=True,
            )
            return False  # dtype_converter cooks next frame
    else:
        if self._fmt_conv_active:
            fmt_transform.par.format = "useinput"
            self._fmt_conv_active = False
            self._log(
                "Source format CUDA-compatible — dtype_converter set to useinput",
                force=True,
            )
            return False  # format reverts next cook
```
```diff
 # Time cudaMemory() call (OpenGL→CUDA interop)
 if self.verbose_performance:
     cuda_mem_start = time.perf_counter()

-# Get TOP's CUDA memory (pass stream for proper synchronization per TD docs)
-cuda_mem = top_op.cudaMemory(
-    stream=int(self.ipc_stream.value) if self._initialized else None,
-)
+# Get TOP's CUDA memory — always pass a valid stream (never None)
+try:
+    cuda_mem = effective_top.cudaMemory(
+        stream=int(self.ipc_stream.value),
+    )
+except BaseException as cuda_err:
```
dtype_converter is toggled based on the upstream source format, but cudaMemory() is still taken from effective_top which remains top_op. If callers pass the input TOP (as many TD callback templates do), the conversion will never be used and float16 sources will continue to fail indefinitely. The exporter should consistently pull CUDA memory from the post-conversion node inside the component (e.g., ExportBuffer or dtype_converter) whenever it exists, rather than trusting the caller-provided top_op.
Suggestion

Resolve the export TOP internally when `dtype_converter` (and/or `ExportBuffer`) exists, and use that for `cudaMemory()`.

```python
fmt_transform = self.ownerComp.op(_FMT_TRANSFORM_NAME)
export_top = self.ownerComp.op("ExportBuffer") or fmt_transform
effective_top = export_top or top_op
source_top = fmt_transform.inputs[0] if (fmt_transform and fmt_transform.inputs) else top_op
# ...format detection/toggling based on source_top...
cuda_mem = effective_top.cudaMemory(stream=int(self.ipc_stream.value))
```

Reply with "@CharlieHelps yes please" if you'd like me to add a commit with this suggestion.
```python
try:
    cuda_mem = effective_top.cudaMemory(
        stream=int(self.ipc_stream.value),
    )
except BaseException as cuda_err:
    pixel_fmt = getattr(effective_top, "pixelFormat", "unknown")
    err_msg = f"cudaMemory() failed (pixelFormat={pixel_fmt}): {cuda_err}"
    if err_msg != getattr(self, "_last_cuda_mem_err", ""):
        self._log(err_msg, force=True)
        self._last_cuda_mem_err = err_msg
    return False
```
Catching BaseException around cudaMemory() will also swallow KeyboardInterrupt/SystemExit, making TD sessions harder to stop/debug and potentially hiding fatal shutdown conditions. This should catch Exception (or specific TD/CUDA exceptions) instead.
Suggestion

Narrow the exception handling to `Exception`.

```python
try:
    cuda_mem = effective_top.cudaMemory(stream=int(self.ipc_stream.value))
except Exception as cuda_err:
    ...
```

Reply with "@CharlieHelps yes please" if you'd like me to add a commit with this suggestion.
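The hierarchy argument is easy to verify: `KeyboardInterrupt` and `SystemExit` inherit from `BaseException` but deliberately not from `Exception`, so `except Exception` still catches ordinary runtime errors while letting Ctrl+C and interpreter shutdown propagate. A minimal check:

```python
# KeyboardInterrupt and SystemExit sit outside the Exception branch on purpose:
# `except Exception` catches ordinary errors but lets Ctrl+C / exit() through.
for interrupt in (KeyboardInterrupt, SystemExit):
    print(issubclass(interrupt, BaseException))  # True
    print(issubclass(interrupt, Exception))      # False

print(issubclass(RuntimeError, Exception))  # True: normal errors still caught
```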
```python
def _needs_format_conversion(self, top_op: TOP) -> bool:
    """Return True if the TOP's pixel format is unsupported by cudaMemory() in TD 2025.

    TD 2025 (CUDA 12.8) rejects float16 formats from cudaMemory().
    uint8, uint16 (fixed), and float32 are supported.
    """
    pixel_fmt = getattr(top_op, "pixelFormat", "")
    return any(unsupported in pixel_fmt for unsupported in _CUDA_UNSUPPORTED_PIXEL_FORMATS)
```
_needs_format_conversion() does a case-sensitive substring check against top_op.pixelFormat. If TouchDesigner returns formats like RGBA16Float / rgba16float / RGBA 16-bit Float (or any casing variation), this can miss the conversion and you’re back to cudaMemory() failures/oscillation.
This is a correctness/stability issue because it gates the entire TD 2025 workaround.
Suggestion

Normalize the pixel format string before matching.

```python
pixel_fmt = str(getattr(top_op, "pixelFormat", "")).lower()
unsupported = ("16-bit float", "16float", "rgba16float")
return any(s in pixel_fmt for s in unsupported)
```

Reply with "@CharlieHelps yes please" if you'd like me to add a commit with this suggestion.
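The normalized check can be exercised standalone against the casing variants mentioned above. A small sketch (the pixelFormat strings here are hypothetical examples of possible casings, not confirmed TouchDesigner values):

```python
def needs_format_conversion(pixel_fmt: str) -> bool:
    """Case-insensitive check for float16 pixel formats unsupported by cudaMemory()."""
    fmt = str(pixel_fmt).lower()  # normalize casing before substring matching
    unsupported = ("16-bit float", "16float", "rgba16float")
    return any(s in fmt for s in unsupported)

print(needs_format_conversion("RGBA16Float"))        # True
print(needs_format_conversion("RGBA 16-bit Float"))  # True
print(needs_format_conversion("rgba32float"))        # False
```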
### Step 6b: Add dtype_converter Transform TOP (Sender mode)

Inside the `CUDAIPCExporter` COMP, add a **Transform TOP** named `dtype_converter`:

1. Create a **Transform TOP**, rename to `dtype_converter`
2. Set the **Pixel Format** parameter to `"Use Input"` (default — pass-through, zero overhead)
3. Wire input: `input` In TOP → `dtype_converter`
4. Wire output: `dtype_converter` → `ExportBuffer` (Null TOP or the node that feeds `cudaMemory()`)

**Purpose**: TouchDesigner 2025 (CUDA 12.8) rejects `rgba16float` formats from `cudaMemory()`. The extension automatically detects unsupported source formats (float16) and sets `dtype_converter.par.format = "rgba32float"` on the first affected frame — skipping that one frame while the conversion takes effect. For all other formats (uint8, uint16 fixed, float32) the node stays at `"Use Input"` with zero overhead.

**This node is managed automatically** — no manual format changes are needed.
The build guide introduces dtype_converter + ExportBuffer as the node feeding cudaMemory(), but it doesn’t explicitly instruct readers to ensure the sender callback calls export_frame() on that post-conversion node (or that the extension resolves it internally). Without this, users can wire the graph correctly but still export from input and bypass conversion.
Suggestion

Add an explicit line in Step 6b stating that the sender callback should pass `op('ExportBuffer')` (or `op('dtype_converter')`) into `export_frame()`, or document that the extension resolves the correct node internally.

Reply with "@CharlieHelps yes please" if you'd like me to add a commit with this doc tweak.
@CharlieHelps Analyze new commits
Looked through the new commits on
One remaining footgun (now mostly doc-covered)
With the updated build guide (and the current …) If you want to make this "hard to misuse," the next incremental hardening would be: when the extension sees …
Summary
- Permanent `dtype_converter` Transform TOP — eliminates per-frame oscillation and same-cook-frame failure for `rgba16float` sources in TD 2025 (CUDA 12.8)
- Version bump across `pyproject.toml`, `__init__.py`, wheel filenames, tox filename
- Added `dtype_converter` Transform TOP to TOX Build Guide component structure and assembly steps
- Added callback files (`parexecute_callbacks.py`, `script_top_callbacks.py`, `benchmark_timestamp.py`) to the file reference table
- Fixed TOX save path (`TOXES/` subfolder) and CUDA DLL error message (`cudart64_110.dll`)
- Added `uint16` dtype_code (3) to ARCHITECTURE.md protocol table; added float16 format compatibility note
ddd8f0f14a6d261d8823a

Test plan

- `from cuda_link import __version__` returns `"0.7.0"`
- No `0.6.9` references remain in committed files
- `TOXES/CUDAIPCLink_v0.7.0.tox` is present
- `rgba16float` source → confirm one-frame skip + stable operation, no oscillation
- `rgba32float` source → confirm `dtype_converter` stays at `useinput`
- `pytest tests/ -v -m "not requires_cuda"`

🤖 Generated with Claude Code