Hi z-lab team — thanks for the DFlash + DDTree work.
I'm running Qwen/Qwen3.6-35B-A3B on an NVIDIA GB10 (DGX Spark, 122.5 GB unified memory) and would love to use DFlash for it. You already ship a draft for Qwen3.5-35B-A3B; since 3.6 keeps the same hybrid Gated DeltaNet + MoE architecture class and the same 3B active-parameter budget, publishing a 3.6 draft should be an incremental extension.
Why it matters for 3.6 specifically: The hybrid SSM/MoE layers break plain autoregressive speculative decoding (llama.cpp and other engines cannot do partial state rollback across recurrent layers). DFlash's one-pass diffusion drafter sidesteps that blocker entirely, so it's one of the few speculative methods that can actually accelerate this architecture class.
Ask: Could you add a DFlash draft checkpoint for Qwen/Qwen3.6-35B-A3B to your HF collection? Happy to benchmark on GB10 and report tok/s numbers once it's published.
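For concreteness, here is the kind of measurement I'd report back — a minimal throughput harness sketch. The `generate` callable is a hypothetical stand-in for whatever DFlash entry point you expose (I'm not assuming any particular API), and it's assumed to return the list of generated token ids:

```python
import time

def tokens_per_second(generate, prompt, n_runs=3):
    """Average decode throughput (tok/s) over a few runs.

    `generate` is a placeholder for the real model call; it is
    assumed to return the generated token ids for `prompt`.
    """
    rates = []
    for _ in range(n_runs):
        start = time.perf_counter()
        tokens = generate(prompt)
        elapsed = time.perf_counter() - start
        rates.append(len(tokens) / elapsed)
    return sum(rates) / len(rates)

# Dummy generator standing in for the real DFlash-accelerated call:
dummy = lambda prompt: list(range(256))
print(f"{tokens_per_second(dummy, 'hello'):.1f} tok/s")
```

I'd run the same harness with and without the DFlash drafter enabled and report both numbers, plus acceptance length if the runtime exposes it.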
Thanks!