Request: DFlash draft checkpoint for Qwen3.6-35B-A3B #68

@afifsagent

Hi z-lab team — thanks for the DFlash + DDTree work.

I'm running Qwen/Qwen3.6-35B-A3B on an NVIDIA GB10 (DGX Spark, 122.5GB unified memory) and would love to use DFlash for it. You already ship a draft for Qwen3.5-35B-A3B; since 3.6 keeps the same hybrid Gated DeltaNet + MoE architecture class and the same 3B active-parameter budget, publishing a 3.6 draft should be a relatively incremental extension.

Why it matters for 3.6 specifically: the recurrent Gated DeltaNet layers in this hybrid architecture break plain autoregressive speculative decoding. When the target model rejects draft tokens, the recurrent state has to be rolled back to the last accepted token, and llama.cpp and other engines cannot do that partial state rollback across recurrent layers. DFlash's one-pass diffusion drafter sidesteps that blocker, so it's one of the few speculative methods that can actually accelerate this architecture class.
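
To make the rollback problem concrete, here is a toy sketch in plain Python. It is not DFlash or llama.cpp code: `KVCache`, `RecurrentState`, the decay value, and the token values are all made up for illustration. It only shows why discarding rejected draft tokens is trivial for a per-token cache and not for a state that folds the whole history into one value.

```python
from dataclasses import dataclass, field


@dataclass
class KVCache:
    """Attention-style cache: one entry per token, so rollback is truncation."""
    entries: list = field(default_factory=list)

    def append(self, token: int) -> None:
        self.entries.append(token)

    def rollback(self, keep: int) -> None:
        # Drop everything after the first `keep` tokens.
        del self.entries[keep:]


@dataclass
class RecurrentState:
    """Toy recurrent layer: the whole history is folded into a single number."""
    h: float = 0.0
    decay: float = 0.5  # hypothetical value, chosen so the floats stay exact

    def step(self, token: int) -> None:
        self.h = self.decay * self.h + token


def state_after(tokens) -> float:
    """Recompute the recurrent state from scratch for a token sequence."""
    state = RecurrentState()
    for t in tokens:
        state.step(t)
    return state.h


prefix = [3, 1, 4]   # tokens already committed
draft = [1, 5, 9]    # hypothetical draft tokens
accepted = 1         # suppose verification accepts only the first draft token

kv = KVCache()
rec = RecurrentState()
for t in prefix + draft:
    kv.append(t)
    rec.step(t)

# The per-token cache recovers by truncation.
kv.rollback(len(prefix) + accepted)
kv_ok = kv.entries == prefix + draft[:accepted]

# The recurrent state has already absorbed the rejected tokens.
wanted_h = state_after(prefix + draft[:accepted])
rec_ok = rec.h == wanted_h

print("KV cache correct after rollback:", kv_ok)
print("Recurrent state correct without a snapshot:", rec_ok)
```

In this toy, the only ways to get the recurrent state back are to snapshot it at every draft position or to recompute it from an earlier checkpoint, which is the extra machinery an engine needs before it can reject tokens on this kind of layer.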

Ask: Could you add a DFlash draft checkpoint for Qwen/Qwen3.6-35B-A3B to your HF collection? Happy to benchmark on GB10 and report tok/s numbers once it's published.

Thanks!
