
feat: add Qwen Image 2512 txt2img support#132

Open
lstein wants to merge 17 commits into feat/qwen-image-edit-2511 from feat/qwen-image-2512

Conversation

Owner

@lstein lstein commented Mar 27, 2026

Summary

Adds Qwen Image 2512 text-to-image support by reusing the existing Qwen Image Edit infrastructure. Both models share the same base type (qwen-image-edit) since they use identical architecture (transformer, VAE, text encoder, scheduler).

Depends on: #131 (Qwen Image Edit 2511)

Changes

  • Text encoder: Auto-selects prompt template based on whether reference images are provided. Edit mode uses the image-editing system prompt (drop_idx=64); generate mode uses the "describe the image" prompt (drop_idx=34).
  • Denoise: Detects zero_cond_t on the transformer to decide whether to concatenate reference latents. Txt2img models (zero_cond_t=False) pass only noisy patches with a single-entry img_shapes.
  • Model config: Accepts QwenImagePipeline in addition to QwenImageEditPlusPipeline for Diffusers model detection.
  • LoRA: Handles the "transformer." key prefix emitted by some training frameworks; updates config detection accordingly.
  • Starter models: Qwen-Image-2512 full Diffusers + 4 GGUF variants (Q2_K, Q4_K_M, Q6_K, Q8_0) + Lightning V2.0 LoRAs (4-step, 8-step bf16), all added to the Qwen Image Edit bundle.
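
The template-selection logic in the first bullet can be sketched as follows (the constant and function names here are illustrative, not the project's actual code; only the drop_idx values come from the description above):

```python
# Illustrative placeholders; the real system prompts are longer strings
# whose tokenized prefixes are 64 and 34 tokens long, respectively.
EDIT_SYSTEM_PROMPT = "<image-editing system prompt>"
GENERATE_SYSTEM_PROMPT = "<describe-the-image system prompt>"

def select_prompt_template(reference_images):
    """Return (system_prompt, drop_idx) for the text encoder.

    drop_idx is the number of leading template tokens to drop from the
    encoder output so only the user prompt's embeddings remain.
    """
    if reference_images:
        return EDIT_SYSTEM_PROMPT, 64   # edit mode
    return GENERATE_SYSTEM_PROMPT, 34   # txt2img / generate mode
```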

Testing

  1. Install "Qwen Image 2512" from Starter Models (or a GGUF variant + the Diffusers model as Component Source)
  2. Enter a text prompt and generate — no reference image needed
  3. Test with Lightning LoRA: Steps=4, CFG=1, Shift Override=3
  4. Verify the Qwen Image Edit model still works correctly with reference images

🤖 Generated with Claude Code

Shares the QwenImageEdit base type and infrastructure with the edit model.
Key changes:

- Text encoder: auto-selects prompt template based on reference images —
  edit template (drop_idx=64) when images present, generate template
  (drop_idx=34) when absent
- Denoise: detects zero_cond_t to determine whether to concatenate
  reference latents; txt2img models pass only noisy patches with a
  single-entry img_shapes
- Model config: accept QwenImagePipeline in addition to
  QwenImageEditPlusPipeline
- LoRA: handle "transformer." key prefix from some training frameworks,
  add to config detection
- Starter models: Qwen-Image-2512 full + 4 GGUF variants + Lightning
  V2.0 LoRAs (4-step, 8-step), all added to the Qwen Image Edit bundle

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@lstein lstein force-pushed the feat/qwen-image-2512 branch from dfe597f to 2f10d83 on March 28, 2026 at 02:53
lstein and others added 16 commits on March 27, 2026 at 22:57
…geEditMainModelConfig)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The variant field with a default value was appended to the discriminator
tag (e.g. main.gguf_quantized.qwen-image.generate), breaking model
detection for GGUF and Diffusers models. Making variant optional with
default=None restores the correct tags (main.gguf_quantized.qwen-image).

The variant is still set during Diffusers model probing via
_get_qwen_image_variant() and can be manually set for GGUF models.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The rename from qwen_image_edit -> qwen_image caused variable name
collisions with the txt2img starter models. Give edit models the
qwen_image_edit_* prefix to distinguish from qwen_image_* (txt2img).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…URLs

The global rename sed changed 'qwen-image-edit-2511' to 'qwen-image-2511'
inside the HuggingFace URLs, but the actual files on HF still have 'edit'
in their names.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When switching from an edit model to a generate model, reference images
remain in state but the panel is hidden. Prevent them from being passed
to the text encoder and VAE encoder by checking the model variant.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The txt2img model doesn't use zero_cond_t — setting it causes the
transformer to double the timestep batch and create modulation indices
for non-existent reference patches, producing noise output. Now checks
the config variant before enabling it.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…n, shift)

- Save qwen_image_component_source, qwen_image_quantization, and
  qwen_image_shift in generation metadata
- Add metadata recall handlers so remix/recall restores these settings

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Flux PEFT LoRAs use transformer.single_transformer_blocks.* keys which
contain "transformer_blocks." as a substring, falsely matching the
Qwen Image LoRA detection. Add single_transformer_blocks to the Flux
exclusion set.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
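
The false match is a plain substring pitfall. A sketch of the fixed detection (the function name and exclusion-set contents are illustrative):

```python
# Flux key prefixes that contain "transformer_blocks." as a substring and
# must therefore be excluded before the Qwen Image check runs.
FLUX_EXCLUSIONS = ("single_transformer_blocks.",)

def looks_like_qwen_image_lora(state_dict_keys):
    for key in state_dict_keys:
        if any(excl in key for excl in FLUX_EXCLUSIONS):
            return False  # Flux PEFT LoRA, not Qwen Image
    return any("transformer_blocks." in key for key in state_dict_keys)
```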
Previously the graph builder passed the output canvas dimensions to the
I2L node, which resized the reference image to match — distorting its
aspect ratio when they differed. Now the reference is encoded at its
native size. The denoise node already handles dimension mismatches via
bilinear interpolation in latent space.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
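
The latent-space interpolation the denoise node relies on can be illustrated with a minimal pure-Python bilinear resize of a 2D grid (a real implementation would call torch.nn.functional.interpolate on (B, C, H, W) latent tensors):

```python
def bilinear_resize(grid, out_h, out_w):
    """Bilinearly resize a 2D grid (list of lists) to (out_h, out_w),
    using half-pixel coordinate mapping with edge clamping."""
    in_h, in_w = len(grid), len(grid[0])
    out = []
    for y in range(out_h):
        sy = (y + 0.5) * in_h / out_h - 0.5       # map back to input space
        y0 = min(max(int(sy), 0), in_h - 1)
        y1 = min(y0 + 1, in_h - 1)
        fy = min(max(sy - y0, 0.0), 1.0)
        row = []
        for x in range(out_w):
            sx = (x + 0.5) * in_w / out_w - 0.5
            x0 = min(max(int(sx), 0), in_w - 1)
            x1 = min(x0 + 1, in_w - 1)
            fx = min(max(sx - x0, 0.0), 1.0)
            top = grid[y0][x0] * (1 - fx) + grid[y0][x1] * fx
            bot = grid[y1][x0] * (1 - fx) + grid[y1][x1] * fx
            row.append(top * (1 - fy) + bot * fy)
        out.append(row)
    return out
```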
… without component source

Addresses two reviewer findings:

1. denoising_start/denoising_end were ignored — the full sigma schedule
   was always used regardless of img2img strength. Now clip the scheduler's
   sigmas to the fractional range before stepping, and use manual Euler
   steps with the clipped schedule (scheduler.step() can't handle clipped
   schedules due to internal index tracking).

2. GGUF Qwen Image models could be enqueued without a Component Source,
   deferring the error to runtime. Added readiness checks on both the
   Generate and Canvas tabs that block enqueue when a GGUF model is
   selected but no Diffusers component source is configured.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
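
The approach in point 1 can be sketched as below. This is a simplified illustration, not the project's code: the rounding convention and the velocity-style model call are assumptions, and a real schedule would come from the scheduler's set_timesteps.

```python
def clip_sigmas(sigmas, denoising_start, denoising_end):
    """Slice a full descending sigma schedule (with trailing 0.0) down to
    the fractional range [denoising_start, denoising_end]."""
    n = len(sigmas) - 1  # number of steps in the full schedule
    lo = int(round(denoising_start * n))
    hi = int(round(denoising_end * n))
    return sigmas[lo : hi + 1]

def euler_denoise(latents, model, sigmas):
    """Manual Euler stepping over a (possibly clipped) schedule, avoiding
    scheduler.step()'s internal index tracking."""
    for sigma, sigma_next in zip(sigmas[:-1], sigmas[1:]):
        velocity = model(latents, sigma)
        latents = latents + (sigma_next - sigma) * velocity
    return latents
```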
All invocation nodes work with both Qwen Image (txt2img) and Qwen Image
Edit models. Rename titles and docstrings from "Qwen Image Edit" to
"Qwen Image" to avoid confusion. Also remove duplicate GGUF readiness
check in the Generate tab.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The negative conditioning node was always added to the graph, causing
the text encoder to be loaded twice even when CFG=1 (where the negative
prediction is unused). The negative node is now added only when
cfg_scale > 1.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
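
A sketch of the graph-builder change (the toy Graph class and node names here are illustrative, not the project's graph API):

```python
class Graph:
    """Minimal stand-in for an invocation graph."""
    def __init__(self):
        self.nodes = []
        self.edges = []
    def add_node(self, node_type, **fields):
        node_id = f"{node_type}_{len(self.nodes)}"
        self.nodes.append((node_id, node_type, fields))
        return node_id
    def connect(self, src, src_field, dst, dst_field):
        self.edges.append((src, src_field, dst, dst_field))

def build_conditioning(graph, positive, negative, cfg_scale):
    pos = graph.add_node("text_encoder", prompt=positive)
    graph.connect(pos, "conditioning", "denoise", "positive_conditioning")
    if cfg_scale > 1:
        # With cfg_scale == 1 the negative prediction is unused, so
        # skipping this node avoids loading the text encoder twice.
        neg = graph.add_node("text_encoder", prompt=negative)
        graph.connect(neg, "conditioning", "denoise", "negative_conditioning")
```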
Kohya LoRAs use underscore-separated keys like
lora_unet_transformer_blocks_0_attn_to_k.lokr_w1 instead of the
diffusers dot-separated format. Add:

- Kohya key detection (lora_unet_transformer_blocks_*)
- Key conversion mapping from Kohya underscores to model dot-paths
- Updated LoRA config detection to recognize Kohya format + LoKR suffixes
- Flux Kohya exclusion (lora_unet_double_blocks, lora_unet_single_blocks)
- Test model for Kohya LoKR identification

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
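
The key conversion can be sketched as below. This simplified illustration covers only numeric block indices and the attention projections; the project's real mapping handles more module names.

```python
import re

def kohya_to_dot_path(key):
    """Convert a Kohya underscore key to a dot-separated model path, e.g.
    'lora_unet_transformer_blocks_0_attn_to_k.lokr_w1'
      -> 'transformer_blocks.0.attn.to_k.lokr_w1'
    """
    base, dot, suffix = key.partition(".")
    if base.startswith("lora_unet_"):
        base = base[len("lora_unet_"):]
    # Re-dot numeric block indices: transformer_blocks_0_... -> transformer_blocks.0....
    base = re.sub(r"_(\d+)_", r".\1.", base)
    # Re-dot attention projections while keeping their own underscores (to_k, to_q, ...).
    base = base.replace("_to_", ".to_")
    return base + dot + suffix
```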