feat: v0.7.0 — dtype_converter format auto-conversion, docs alignment#6

Closed
forkni wants to merge 6 commits into master from development

Conversation

@forkni forkni commented Mar 21, 2026

Summary

  • fix: Replace dynamic Null TOP format converter with permanent dtype_converter Transform TOP — eliminates per-frame oscillation and same-cook-frame failure for rgba16float sources in TD 2025 (CUDA 12.8)
  • fix: Auto-detect unsupported float16 pixel formats from source TOP (upstream of converter), skip one transition frame, then operate stably without oscillation
  • feat: Bump version to v0.7.0 across all canonical sources (pyproject.toml, __init__.py, wheel filenames, tox filename)
  • docs: Add dtype_converter Transform TOP to TOX Build Guide component structure and assembly steps
  • docs: Add missing files (parexecute_callbacks.py, script_top_callbacks.py, benchmark_timestamp.py) to file reference table
  • docs: Fix tox save path (TOXES/ subfolder), CUDA DLL error message (cudart64_110.dll)
  • docs: Add uint16 dtype_code (3) to ARCHITECTURE.md protocol table; add float16 format compatibility note

Changes

| Commit | Description |
|---|---|
| ddd8f0f | fix: improve dtype detection, add metadata-only re-init, optimize sync event |
| 14a6d26 | fix: replace dynamic Null TOP format converter with permanent dtype_converter Transform TOP |
| 1d8823a | feat: bump version to v0.7.0, align docs with dtype_converter and format auto-conversion |

Test plan

  • Verify `from cuda_link import __version__` returns `"0.7.0"`
  • Verify no 0.6.9 references remain in committed files
  • Verify `TOXES/CUDAIPCLink_v0.7.0.tox` is present
  • In TouchDesigner: feed an rgba16float source → confirm one-frame skip + stable operation, no oscillation
  • In TouchDesigner: feed an rgba32float source → confirm dtype_converter stays at `useinput`
  • Run the test suite: `pytest tests/ -v -m "not requires_cuda"`

🤖 Generated with Claude Code

@charliecreates charliecreates Bot requested a review from CharlieHelps March 21, 2026 06:24
@github-actions

🤖 Hi @forkni, I've received your request, and I'm working on it now! You can track my progress in the logs for more details.

@github-actions

🤖 I'm sorry @forkni, but I was unable to process your request. Please see the logs for more details.


@charliecreates charliecreates Bot left a comment


The new uint16 Torch fallback to torch.int16 will silently corrupt values for half the range and should not ship as-is. The sender-side dtype_converter toggling looks like it can be bypassed depending on what TOP is passed into export_frame(), which can make float16 sources fail permanently. Exception handling is overly broad (BaseException) and shutdown paths for pinned memory cleanup should be more defensive. Docs should explicitly connect the new ExportBuffer/dtype_converter graph to what node is actually exported.

Additional notes (1)
  • Compatibility | src/cuda_link/cuda_ipc_importer.py:171-176
    The uint16 fallback to torch.int16 is silently wrong for consumers: values >= 32768 will become negative, so downstream code will see different numerical values even though the bit-width matches. This is a correctness issue and will be very hard to diagnose because it “works” but produces corrupted semantics.
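The reinterpretation hazard described above is easy to demonstrate outside the library with NumPy's `view()`, which preserves the bit pattern the same way a bit-width-matching dtype fallback would:

```python
import numpy as np

# uint16 values reinterpreted as int16: the bit pattern is preserved,
# but every value >= 32768 wraps into the negative range.
raw = np.array([1000, 32767, 32768, 65535], dtype=np.uint16)
reinterpreted = raw.view(np.int16)
print(reinterpreted.tolist())  # [1000, 32767, -32768, -1]
```

Nothing raises, so the corruption only surfaces when downstream math produces wrong results.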
Summary of changes

  • Bumped release artifacts and docs from v0.6.8 → v0.7.0 (README, build script, pyproject.toml, __init__, .gitignore) and added TOXES/CUDAIPCLink_v0.7.0.tox.
  • Added uint16 support to the shared-memory protocol/docs and importer backends (CuPy + __cuda_array_interface__ + Torch dtype mapping).
  • Introduced a non-timing CUDA event helper (create_sync_event()) and switched exporter cross-stream sync to use it.
  • Updated TouchDesigner sender to use a permanent dtype_converter Transform TOP for auto pixel-format conversion (notably float16 → rgba32float) and added receiver-side pinned host buffer optimization for float16 D2H conversion.

Docs alignment

  • Updated TOX build guide to include dtype_converter, ExportBuffer, and additional callback files; fixed TOX save path and CUDA DLL troubleshooting details.

Comment thread td_exporter/CUDAIPCExtension.py Outdated
Comment on lines +562 to +609
```python
# Ensure CUDA runtime and stream exist BEFORE first cudaMemory() call.
# Always use a non-blocking stream (never None/default stream) for TD 2025 compat.
if self.cuda is None:
    self.cuda = get_cuda_runtime()
if not hasattr(self, "ipc_stream") or self.ipc_stream is None:
    self.ipc_stream = self.cuda.create_stream(flags=0x01)  # cudaStreamNonBlocking
    self._log(
        f"Created IPC stream (pre-init): 0x{int(self.ipc_stream.value):016x}",
        force=True,
    )

# TD 2025 rejects float16 pixel formats from cudaMemory().
# dtype_converter Transform TOP sits before ExportBuffer — toggle its format param.
# Check source TOP (upstream of converter), not ExportBuffer (downstream).
effective_top = top_op
fmt_transform = self.ownerComp.op(_FMT_TRANSFORM_NAME)
if fmt_transform is not None:
    source_top = fmt_transform.inputs[0] if fmt_transform.inputs else top_op
    if self._needs_format_conversion(source_top):
        if not self._fmt_conv_active:
            fmt_transform.par.format = "rgba32float"
            self._fmt_conv_active = True
            self._log(
                f"Pixel format '{getattr(source_top, 'pixelFormat', '?')}' unsupported "
                f"by cudaMemory() — dtype_converter set to rgba32float, skipping frame",
                force=True,
            )
            return False  # dtype_converter cooks next frame
    else:
        if self._fmt_conv_active:
            fmt_transform.par.format = "useinput"
            self._fmt_conv_active = False
            self._log(
                "Source format CUDA-compatible — dtype_converter set to useinput",
                force=True,
            )
            return False  # format reverts next cook

# Time cudaMemory() call (OpenGL→CUDA interop)
if self.verbose_performance:
    cuda_mem_start = time.perf_counter()

# Get TOP's CUDA memory — always pass a valid stream (never None)
try:
    cuda_mem = effective_top.cudaMemory(
        stream=int(self.ipc_stream.value),
    )
except BaseException as cuda_err:
```

dtype_converter is toggled based on the upstream source format, but cudaMemory() is still taken from effective_top which remains top_op. If callers pass the input TOP (as many TD callback templates do), the conversion will never be used and float16 sources will continue to fail indefinitely. The exporter should consistently pull CUDA memory from the post-conversion node inside the component (e.g., ExportBuffer or dtype_converter) whenever it exists, rather than trusting the caller-provided top_op.

Suggestion

Resolve the export TOP internally when dtype_converter (and/or ExportBuffer) exists, and use that for cudaMemory().

```python
fmt_transform = self.ownerComp.op(_FMT_TRANSFORM_NAME)
export_top = self.ownerComp.op("ExportBuffer") or fmt_transform
effective_top = export_top or top_op

source_top = fmt_transform.inputs[0] if (fmt_transform and fmt_transform.inputs) else top_op
# ...format detection/toggling based on source_top...

cuda_mem = effective_top.cudaMemory(stream=int(self.ipc_stream.value))
```

Reply with "@CharlieHelps yes please" if you’d like me to add a commit with this suggestion.

Comment on lines +605 to +615
```python
try:
    cuda_mem = effective_top.cudaMemory(
        stream=int(self.ipc_stream.value),
    )
except BaseException as cuda_err:
    pixel_fmt = getattr(effective_top, "pixelFormat", "unknown")
    err_msg = f"cudaMemory() failed (pixelFormat={pixel_fmt}): {cuda_err}"
    if err_msg != getattr(self, "_last_cuda_mem_err", ""):
        self._log(err_msg, force=True)
        self._last_cuda_mem_err = err_msg
    return False
```

Catching BaseException around cudaMemory() will also swallow KeyboardInterrupt/SystemExit, making TD sessions harder to stop/debug and potentially hiding fatal shutdown conditions. This should catch Exception (or specific TD/CUDA exceptions) instead.

Suggestion

Narrow the exception handling to Exception.

```python
try:
    cuda_mem = effective_top.cudaMemory(stream=int(self.ipc_stream.value))
except Exception as cuda_err:
    ...
```

Reply with "@CharlieHelps yes please" if you’d like me to add a commit with this suggestion.

Comment on lines +227 to +235
```python
def _needs_format_conversion(self, top_op: TOP) -> bool:
    """Return True if the TOP's pixel format is unsupported by cudaMemory() in TD 2025.

    TD 2025 (CUDA 12.8) rejects float16 formats from cudaMemory().
    uint8, uint16 (fixed), and float32 are supported.
    """
    pixel_fmt = getattr(top_op, "pixelFormat", "")
    return any(unsupported in pixel_fmt for unsupported in _CUDA_UNSUPPORTED_PIXEL_FORMATS)
```


_needs_format_conversion() does a case-sensitive substring check against top_op.pixelFormat. If TouchDesigner returns formats like RGBA16Float / rgba16float / RGBA 16-bit Float (or any casing variation), this can miss the conversion and you’re back to cudaMemory() failures/oscillation.

This is a correctness/stability issue because it gates the entire TD 2025 workaround.

Suggestion

Normalize the pixel format string before matching.

```python
pixel_fmt = str(getattr(top_op, "pixelFormat", "")).lower()
unsupported = ("16-bit float", "16float", "rgba16float")
return any(s in pixel_fmt for s in unsupported)
```

Reply with "@CharlieHelps yes please" if you’d like me to add a commit with this suggestion.
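Outside TouchDesigner, the normalized check can be exercised as a plain function. This is an illustrative stand-alone sketch: the name `needs_conversion` and the exact tuple of unsupported substrings are assumptions, not the library's actual constants.

```python
def needs_conversion(pixel_fmt) -> bool:
    """Case-insensitive substring check for float16 pixel formats.

    Hypothetical stand-alone version of the suggested fix; the tuple of
    substrings is illustrative, not the extension's real list.
    """
    fmt = str(pixel_fmt or "").lower()
    return any(s in fmt for s in ("16-bit float", "16float", "rgba16float"))

print(needs_conversion("RGBA16Float"))  # True — casing no longer matters
print(needs_conversion("rgba32float"))  # False — float32 is CUDA-compatible
```

Lowercasing once up front covers every casing variant TD might report without enumerating them.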

Comment thread docs/TOX_BUILD_GUIDE.md
Comment on lines +117 to +129
### Step 6b: Add dtype_converter Transform TOP (Sender mode)

Inside the `CUDAIPCExporter` COMP, add a **Transform TOP** named `dtype_converter`:

1. Create a **Transform TOP**, rename to `dtype_converter`
2. Set the **Pixel Format** parameter to `"Use Input"` (default — pass-through, zero overhead)
3. Wire input: `input` In TOP → `dtype_converter`
4. Wire output: `dtype_converter` → `ExportBuffer` (Null TOP or the node that feeds `cudaMemory()`)

**Purpose**: TouchDesigner 2025 (CUDA 12.8) rejects `rgba16float` formats from `cudaMemory()`. The extension automatically detects unsupported source formats (float16) and sets `dtype_converter.par.format = "rgba32float"` on the first affected frame — skipping that one frame while the conversion takes effect. For all other formats (uint8, uint16 fixed, float32) the node stays at `"Use Input"` with zero overhead.

**This node is managed automatically** — no manual format changes are needed.


The build guide introduces dtype_converter + ExportBuffer as the node feeding cudaMemory(), but it doesn’t explicitly instruct readers to ensure the sender callback calls export_frame() on that post-conversion node (or that the extension resolves it internally). Without this, users can wire the graph correctly but still export from input and bypass conversion.

Suggestion

Add an explicit line in Step 6b stating that the sender callback should pass op('ExportBuffer') (or op('dtype_converter')) into export_frame(), or document that the extension resolves the correct node internally.

Reply with "@CharlieHelps yes please" if you’d like me to add a commit with this doc tweak.

@charliecreates charliecreates Bot removed the request for review from CharlieHelps March 21, 2026 06:30

forkni commented Mar 21, 2026

@CharlieHelps Analyze new commits

@charliecreates

charliecreates Bot commented Mar 21, 2026

  • Read through prior diffs in cuda_ipc_importer.py, CUDAIPCExtension, and the TOX build guide docs to map out how exception handling, dtype mapping, and format auto-conversion have evolved in PR #6.
  • Inspected commit 3e03a93 to confirm it only introduces .gemini prompt files and updated .github/workflows/gemini-*.yml CI workflows, with no runtime changes to cuda_link modules.
  • Reviewed commit f4c78d9 to see the PEP 604 type-hint modernization across cuda_ipc_exporter.py, cuda_ipc_importer.py, cuda_ipc_wrapper.py, debug_utils.py, CUDAIPCExtension.py, and CUDAIPCWrapper.py, along with updated CUDA_Link_Example.*.toe and TOXES/CUDAIPCLink_v0.7.0.tox.
  • Searched within td_exporter/CUDAIPCExtension.py for _CUDA_UNSUPPORTED_PIXEL_FORMATS, _FMT_TRANSFORM_NAME, and export_frame( and skimmed the top ~120 lines to verify how the dtype_converter Transform TOP is wired into export_frame() and how unsupported 16-bit float formats are filtered before calling cudaMemory().
  • Reviewed td_exporter/callbacks_template.py to confirm onFrameEnd() now calls ext.export_frame(op("ExportBuffer")) and that onFrameStart() handles receiver-side import_frame() and update_receiver_resolution() with the modoutsidecook optimization path.
  • Ran a ripgrep search in src/cuda_link for uint16/int16 mentions to confirm the current mapping uses torch.uint16 and cp.uint16 without any remaining uint16→int16 fallback paths in cuda_ipc_exporter.py or cuda_ipc_importer.py.
  • Searched docs and td_exporter for export_frame(op('input')) references and verified docs/TOX_BUILD_GUIDE.md now documents the correct flow input → dtype_converter → ExportBuffer → export_frame() and explicitly calls out export_frame(op('ExportBuffer')).
  • Checked README.md and docs/ARCHITECTURE.md for uint16 and PyTorch references, confirming the architecture notes that TD 2025 rejects rgba16float from cudaMemory() and that the sender auto-converts to rgba32float while treating uint16 as a supported format in the updated pipeline.

@charliecreates

Looked through the new commits on development (latest: 4e68343).

  • 4e68343 directly addresses the main correctness/stability points from the earlier review:

    • No more silent uint16 corruption in Torch: CUDAIPCImporter._torch_dtype() no longer falls back to torch.int16 when torch.uint16 isn’t available; it now raises a clear error telling users to use get_frame_numpy() or upgrade PyTorch. (src/cuda_link/cuda_ipc_importer.py)
    • Format detection is now case-insensitive: _needs_format_conversion() lowercases pixelFormat before substring matching, so it should trigger reliably across TD’s formatting/casing variants. (td_exporter/CUDAIPCExtension.py)
    • Exception handling narrowed: the cudaMemory() wrapper now catches Exception instead of BaseException, so it won’t swallow KeyboardInterrupt / SystemExit. (td_exporter/CUDAIPCExtension.py)
    • Docs now force the correct “post-conversion” export node: the TOX build guide explicitly says onFrameEnd must call export_frame(op('ExportBuffer')) and shows the intended graph input → dtype_converter → ExportBuffer → export_frame(). (docs/TOX_BUILD_GUIDE.md)
  • 3e03a93 is CI/docs-only: adds .gemini/* prompt files and updates Gemini workflows to include checkout + some small plumbing changes.

One remaining footgun (now mostly doc-covered)

CUDAIPCExtension.export_frame() still uses the caller-provided top_op for top_op.cudaMemory(...).

With the updated build guide (and the current td_exporter/callbacks_template.py), this is fine because the callback passes ExportBuffer (downstream of dtype_converter). But any custom callback that passes op('input') (or some upstream TOP) can still bypass conversion and re-trigger the TD 2025 float16 failure mode.

If you want to make this “hard to misuse,” the next incremental hardening would be: when the extension sees dtype_converter / ExportBuffer exist, resolve the effective export TOP internally (and optionally warn when the passed-in top_op doesn’t match).
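That hardening could be sketched in pure Python as below. Here `comp_ops` (a dict) stands in for `ownerComp.op()` lookups returning `None` for absent nodes, and the function name, signature, and warning behavior are all illustrative assumptions rather than the extension's real API:

```python
def resolve_export_top(comp_ops: dict, caller_top: str) -> str:
    """Sketch of the proposed hardening: prefer the post-conversion node
    when it exists, regardless of what the callback passed in.

    comp_ops stands in for ownerComp.op() lookups (None when a node is
    absent); names and the string return type are illustrative only.
    """
    for name in ("ExportBuffer", "dtype_converter"):
        if comp_ops.get(name) is not None:
            return name  # export from the node downstream of the converter
    return caller_top  # no internal graph — trust the caller

# A component with the full graph always exports from ExportBuffer,
# even when a custom callback passed op('input'):
ops = {"ExportBuffer": object(), "dtype_converter": object()}
print(resolve_export_top(ops, "input"))  # ExportBuffer
```

With this in place, a mismatched `top_op` becomes harmless instead of silently re-enabling the float16 failure mode.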

@forkni forkni closed this Mar 22, 2026