
feat: add Qwen Image 2512 txt2img support#132

Open
lstein wants to merge 17 commits into feat/qwen-image-edit-2511 from feat/qwen-image-2512

Conversation

Owner

@lstein lstein commented Mar 27, 2026

Summary

Adds Qwen Image 2512 text-to-image support by reusing the existing Qwen Image Edit infrastructure. Both models share the same base type (qwen-image-edit) since they use identical architecture (transformer, VAE, text encoder, scheduler).

Depends on: #131 (Qwen Image Edit 2511)

Changes

  • Text encoder: Auto-selects prompt template based on whether reference images are provided. Edit mode uses the image-editing system prompt (drop_idx=64); generate mode uses the "describe the image" prompt (drop_idx=34).
  • Denoise: Detects zero_cond_t on the transformer to decide whether to concatenate reference latents. Txt2img models (zero_cond_t=False) pass only noisy patches with a single-entry img_shapes.
  • Model config: Accepts QwenImagePipeline in addition to QwenImageEditPlusPipeline for Diffusers model detection.
  • LoRA: Handles the "transformer." key prefix emitted by some training frameworks; updates config detection accordingly.
  • Starter models: Qwen-Image-2512 full Diffusers + 4 GGUF variants (Q2_K, Q4_K_M, Q6_K, Q8_0) + Lightning V2.0 LoRAs (4-step, 8-step bf16), all added to the Qwen Image Edit bundle.
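
The template-selection logic in the first bullet can be sketched as follows (the constant and function names here are illustrative, not the project's actual code; only the drop_idx values come from the description above):

```python
# Illustrative placeholders; the real system prompts are longer strings
# whose tokenized prefixes are 64 and 34 tokens long, respectively.
EDIT_SYSTEM_PROMPT = "<image-editing system prompt>"
GENERATE_SYSTEM_PROMPT = "<describe-the-image system prompt>"

def select_prompt_template(reference_images):
    """Return (system_prompt, drop_idx) for the text encoder.

    drop_idx is the number of leading template tokens to drop from the
    encoder output so only the user prompt's embeddings remain.
    """
    if reference_images:
        return EDIT_SYSTEM_PROMPT, 64   # edit mode
    return GENERATE_SYSTEM_PROMPT, 34   # txt2img / generate mode
```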

Testing

  1. Install "Qwen Image 2512" from Starter Models (or a GGUF variant + the Diffusers model as Component Source)
  2. Enter a text prompt and generate — no reference image needed
  3. Test with Lightning LoRA: Steps=4, CFG=1, Shift Override=3
  4. Verify the Qwen Image Edit model still works correctly with reference images

🤖 Generated with Claude Code

Shares the QwenImageEdit base type and infrastructure with the edit model.
Key changes:

- Text encoder: auto-selects prompt template based on reference images —
  edit template (drop_idx=64) when images present, generate template
  (drop_idx=34) when absent
- Denoise: detects zero_cond_t to determine whether to concatenate
  reference latents; txt2img models pass only noisy patches with a
  single-entry img_shapes
- Model config: accept QwenImagePipeline in addition to
  QwenImageEditPlusPipeline
- LoRA: handle "transformer." key prefix from some training frameworks,
  add to config detection
- Starter models: Qwen-Image-2512 full + 4 GGUF variants + Lightning
  V2.0 LoRAs (4-step, 8-step), all added to the Qwen Image Edit bundle

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@lstein lstein force-pushed the feat/qwen-image-2512 branch from dfe597f to 2f10d83 on March 28, 2026 at 02:53
lstein and others added 16 commits on March 27, 2026 at 22:57
…geEditMainModelConfig)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The variant field with a default value was appended to the discriminator
tag (e.g. main.gguf_quantized.qwen-image.generate), breaking model
detection for GGUF and Diffusers models. Making variant optional with
default=None restores the correct tags (main.gguf_quantized.qwen-image).

The variant is still set during Diffusers model probing via
_get_qwen_image_variant() and can be manually set for GGUF models.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The rename from qwen_image_edit -> qwen_image caused variable name
collisions with the txt2img starter models. Give edit models the
qwen_image_edit_* prefix to distinguish from qwen_image_* (txt2img).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…URLs

The global rename sed changed 'qwen-image-edit-2511' to 'qwen-image-2511'
inside the HuggingFace URLs, but the actual files on HF still have 'edit'
in their names.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When switching from an edit model to a generate model, reference images
remain in state but the panel is hidden. Prevent them from being passed
to the text encoder and VAE encoder by checking the model variant.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The txt2img model doesn't use zero_cond_t — setting it causes the
transformer to double the timestep batch and create modulation indices
for non-existent reference patches, producing noise output. Now checks
the config variant before enabling it.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…n, shift)

- Save qwen_image_component_source, qwen_image_quantization, and
  qwen_image_shift in generation metadata
- Add metadata recall handlers so remix/recall restores these settings

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Flux PEFT LoRAs use transformer.single_transformer_blocks.* keys which
contain "transformer_blocks." as a substring, falsely matching the
Qwen Image LoRA detection. Add single_transformer_blocks to the Flux
exclusion set.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
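
The false match is a plain substring pitfall. A sketch of the fixed detection (the function name and exclusion-set contents are illustrative):

```python
# Flux key prefixes that contain "transformer_blocks." as a substring and
# must therefore be excluded before the Qwen Image check runs.
FLUX_EXCLUSIONS = ("single_transformer_blocks.",)

def looks_like_qwen_image_lora(state_dict_keys):
    for key in state_dict_keys:
        if any(excl in key for excl in FLUX_EXCLUSIONS):
            return False  # Flux PEFT LoRA, not Qwen Image
    return any("transformer_blocks." in key for key in state_dict_keys)
```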
Previously the graph builder passed the output canvas dimensions to the
I2L node, which resized the reference image to match — distorting its
aspect ratio when they differed. Now the reference is encoded at its
native size. The denoise node already handles dimension mismatches via
bilinear interpolation in latent space.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
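
The latent-space interpolation the denoise node relies on can be illustrated with a minimal pure-Python bilinear resize of a 2D grid (a real implementation would call torch.nn.functional.interpolate on (B, C, H, W) latent tensors):

```python
def bilinear_resize(grid, out_h, out_w):
    """Bilinearly resize a 2D grid (list of lists) to (out_h, out_w),
    using half-pixel coordinate mapping with edge clamping."""
    in_h, in_w = len(grid), len(grid[0])
    out = []
    for y in range(out_h):
        sy = (y + 0.5) * in_h / out_h - 0.5       # map back to input space
        y0 = min(max(int(sy), 0), in_h - 1)
        y1 = min(y0 + 1, in_h - 1)
        fy = min(max(sy - y0, 0.0), 1.0)
        row = []
        for x in range(out_w):
            sx = (x + 0.5) * in_w / out_w - 0.5
            x0 = min(max(int(sx), 0), in_w - 1)
            x1 = min(x0 + 1, in_w - 1)
            fx = min(max(sx - x0, 0.0), 1.0)
            top = grid[y0][x0] * (1 - fx) + grid[y0][x1] * fx
            bot = grid[y1][x0] * (1 - fx) + grid[y1][x1] * fx
            row.append(top * (1 - fy) + bot * fy)
        out.append(row)
    return out
```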
… without component source

Addresses two reviewer findings:

1. denoising_start/denoising_end were ignored — the full sigma schedule
   was always used regardless of img2img strength. Now clip the scheduler's
   sigmas to the fractional range before stepping, and use manual Euler
   steps with the clipped schedule (scheduler.step() can't handle clipped
   schedules due to internal index tracking).

2. GGUF Qwen Image models could be enqueued without a Component Source,
   deferring the error to runtime. Added readiness checks on both the
   Generate and Canvas tabs that block enqueue when a GGUF model is
   selected but no Diffusers component source is configured.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
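
The approach in point 1 can be sketched as below. This is a simplified illustration, not the project's code: the rounding convention and the velocity-style model call are assumptions, and a real schedule would come from the scheduler's set_timesteps.

```python
def clip_sigmas(sigmas, denoising_start, denoising_end):
    """Slice a full descending sigma schedule (with trailing 0.0) down to
    the fractional range [denoising_start, denoising_end]."""
    n = len(sigmas) - 1  # number of steps in the full schedule
    lo = int(round(denoising_start * n))
    hi = int(round(denoising_end * n))
    return sigmas[lo : hi + 1]

def euler_denoise(latents, model, sigmas):
    """Manual Euler stepping over a (possibly clipped) schedule, avoiding
    scheduler.step()'s internal index tracking."""
    for sigma, sigma_next in zip(sigmas[:-1], sigmas[1:]):
        velocity = model(latents, sigma)
        latents = latents + (sigma_next - sigma) * velocity
    return latents
```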
All invocation nodes work with both Qwen Image (txt2img) and Qwen Image
Edit models. Rename titles and docstrings from "Qwen Image Edit" to
"Qwen Image" to avoid confusion. Also remove duplicate GGUF readiness
check in the Generate tab.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The negative conditioning node was always added to the graph, causing
the text encoder to be loaded twice even when CFG=1 (where the negative
prediction is unused). The negative node is now added only when
cfg_scale > 1.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
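
A sketch of the graph-builder change (the toy Graph class and node names here are illustrative, not the project's graph API):

```python
class Graph:
    """Minimal stand-in for an invocation graph."""
    def __init__(self):
        self.nodes = []
        self.edges = []
    def add_node(self, node_type, **fields):
        node_id = f"{node_type}_{len(self.nodes)}"
        self.nodes.append((node_id, node_type, fields))
        return node_id
    def connect(self, src, src_field, dst, dst_field):
        self.edges.append((src, src_field, dst, dst_field))

def build_conditioning(graph, positive, negative, cfg_scale):
    pos = graph.add_node("text_encoder", prompt=positive)
    graph.connect(pos, "conditioning", "denoise", "positive_conditioning")
    if cfg_scale > 1:
        # With cfg_scale == 1 the negative prediction is unused, so
        # skipping this node avoids loading the text encoder twice.
        neg = graph.add_node("text_encoder", prompt=negative)
        graph.connect(neg, "conditioning", "denoise", "negative_conditioning")
```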
Kohya LoRAs use underscore-separated keys like
lora_unet_transformer_blocks_0_attn_to_k.lokr_w1 instead of the
diffusers dot-separated format. Add:

- Kohya key detection (lora_unet_transformer_blocks_*)
- Key conversion mapping from Kohya underscores to model dot-paths
- Updated LoRA config detection to recognize Kohya format + LoKR suffixes
- Flux Kohya exclusion (lora_unet_double_blocks, lora_unet_single_blocks)
- Test model for Kohya LoKR identification

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
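
The key conversion can be sketched as below. This simplified illustration covers only numeric block indices and the attention projections; the project's real mapping handles more module names.

```python
import re

def kohya_to_dot_path(key):
    """Convert a Kohya underscore key to a dot-separated model path, e.g.
    'lora_unet_transformer_blocks_0_attn_to_k.lokr_w1'
      -> 'transformer_blocks.0.attn.to_k.lokr_w1'
    """
    base, dot, suffix = key.partition(".")
    if base.startswith("lora_unet_"):
        base = base[len("lora_unet_"):]
    # Re-dot numeric block indices: transformer_blocks_0_... -> transformer_blocks.0....
    base = re.sub(r"_(\d+)_", r".\1.", base)
    # Re-dot attention projections while keeping their own underscores (to_k, to_q, ...).
    base = base.replace("_to_", ".to_")
    return base + dot + suffix
```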