Hi z-lab team — thanks for the DFlash + DDTree work.
I'm running Qwen/Qwen3.6-35B-A3B on an NVIDIA GB10 (DGX Spark, 122.5 GB unified memory) and would love to use DFlash for it. You already ship a draft for Qwen3.5-35B-A3B; since 3.6 keeps the same hybrid Gated DeltaNet + MoE architecture class and the same 3B active-parameter budget, publishing a 3.6 draft should be an incremental extension.
Why it matters for 3.6 specifically: The hybrid SSM/MoE layers break plain autoregressive speculative decoding (llama.cpp and other engines cannot do partial state rollback across recurrent layers). DFlash's one-pass diffusion drafter sidesteps that blocker entirely, so it's one of the few speculative methods that can actually accelerate this architecture class.
Ask: Could you add a DFlash draft checkpoint for Qwen/Qwen3.6-35B-A3B to your HF collection? Happy to benchmark on GB10 and report tok/s numbers once it's published.
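For concreteness, here is the kind of measurement I'd report back — a minimal throughput harness sketch. The `generate` callable is a hypothetical stand-in for whatever DFlash entry point you expose (I'm not assuming any particular API), and it's assumed to return the list of generated token ids:

```python
import time

def tokens_per_second(generate, prompt, n_runs=3):
    """Average decode throughput (tok/s) over a few runs.

    `generate` is a placeholder for the real model call; it is
    assumed to return the generated token ids for `prompt`.
    """
    rates = []
    for _ in range(n_runs):
        start = time.perf_counter()
        tokens = generate(prompt)
        elapsed = time.perf_counter() - start
        rates.append(len(tokens) / elapsed)
    return sum(rates) / len(rates)

# Dummy generator standing in for the real DFlash-accelerated call:
dummy = lambda prompt: list(range(256))
print(f"{tokens_per_second(dummy, 'hello'):.1f} tok/s")
```

I'd run the same harness with and without the DFlash drafter enabled and report both numbers, plus acceptance length if the runtime exposes it.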
Thanks!