feat: add Qwen Image 2512 txt2img support #132
Open
lstein wants to merge 17 commits into feat/qwen-image-edit-2511 from
Conversation
Shares the QwenImageEdit base type and infrastructure with the edit model. Key changes:

- Text encoder: auto-selects prompt template based on reference images — edit template (drop_idx=64) when images present, generate template (drop_idx=34) when absent
- Denoise: detects zero_cond_t to determine whether to concatenate reference latents; txt2img models pass only noisy patches with a single-entry img_shapes
- Model config: accept QwenImagePipeline in addition to QwenImageEditPlusPipeline
- LoRA: handle "transformer." key prefix from some training frameworks, add to config detection
- Starter models: Qwen-Image-2512 full + 4 GGUF variants + Lightning V2.0 LoRAs (4-step, 8-step), all added to the Qwen Image Edit bundle

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
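The template auto-selection described above can be sketched as follows. This is a minimal illustration, not the PR's actual code: the function and constant names are hypothetical, though the drop_idx values (64 for the edit template, 34 for the generate template) come from the commit message.

```python
# Hypothetical sketch of prompt-template selection; names are illustrative.
EDIT_TEMPLATE_DROP_IDX = 64      # edit (image-conditioned) template
GENERATE_TEMPLATE_DROP_IDX = 34  # pure txt2img template


def select_drop_idx(reference_images: list) -> int:
    """Pick the prompt-template drop index: edit template when reference
    images are present, generate template when they are absent."""
    return EDIT_TEMPLATE_DROP_IDX if reference_images else GENERATE_TEMPLATE_DROP_IDX


assert select_drop_idx([]) == GENERATE_TEMPLATE_DROP_IDX
assert select_drop_idx(["ref.png"]) == EDIT_TEMPLATE_DROP_IDX
```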
…geEditMainModelConfig) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The variant field with a default value was appended to the discriminator tag (e.g. main.gguf_quantized.qwen-image.generate), breaking model detection for GGUF and Diffusers models. Making variant optional with default=None restores the correct tags (main.gguf_quantized.qwen-image). The variant is still set during Diffusers model probing via _get_qwen_image_variant() and can be manually set for GGUF models. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
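The tag-building pitfall above can be shown with a small sketch. This is not the actual config code; the function name is hypothetical, but the tag strings match the ones in the commit message:

```python
# Illustrative sketch: a variant with a non-None default leaks into the
# discriminator tag, while default=None restores the shorter tag.
from typing import Optional


def build_tag(type_: str, format_: str, base: str, variant: Optional[str]) -> str:
    parts = [type_, format_, base]
    if variant is not None:
        parts.append(variant)
    return ".".join(parts)


# With a defaulted variant, the tag gains an unwanted trailing segment:
assert build_tag("main", "gguf_quantized", "qwen-image", "generate") == \
    "main.gguf_quantized.qwen-image.generate"
# With default=None, the correct tag is produced:
assert build_tag("main", "gguf_quantized", "qwen-image", None) == \
    "main.gguf_quantized.qwen-image"
```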
The rename from qwen_image_edit -> qwen_image caused variable name collisions with the txt2img starter models. Give edit models the qwen_image_edit_* prefix to distinguish from qwen_image_* (txt2img). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…URLs The global rename sed changed 'qwen-image-edit-2511' to 'qwen-image-2511' inside the HuggingFace URLs, but the actual files on HF still have 'edit' in their names. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When switching from an edit model to a generate model, reference images remain in state but the panel is hidden. Prevent them from being passed to the text encoder and VAE encoder by checking the model variant. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The txt2img model doesn't use zero_cond_t — setting it causes the transformer to double the timestep batch and create modulation indices for non-existent reference patches, producing noise output. Now checks the config variant before enabling it. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
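The variant check can be reduced to a one-line guard. A minimal sketch, assuming the config exposes a string variant such as "edit" or "generate" (the exact field and values are assumptions, not confirmed by the PR):

```python
# Hypothetical sketch: gate zero_cond_t on the model variant so txt2img
# models never double the timestep batch for non-existent reference patches.
def should_enable_zero_cond_t(variant: str) -> bool:
    """Only edit-variant models carry reference latents that need
    zero_cond_t; generate (txt2img) models must leave it disabled."""
    return variant == "edit"


assert should_enable_zero_cond_t("edit") is True
assert should_enable_zero_cond_t("generate") is False
```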
…n, shift)

- Save qwen_image_component_source, qwen_image_quantization, and qwen_image_shift in generation metadata
- Add metadata recall handlers so remix/recall restores these settings

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Flux PEFT LoRAs use transformer.single_transformer_blocks.* keys which contain "transformer_blocks." as a substring, falsely matching the Qwen Image LoRA detection. Add single_transformer_blocks to the Flux exclusion set. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
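The substring false-match is easy to reproduce. A hedged sketch (detection function and set name are illustrative, not the project's actual identifiers):

```python
# Sketch of the pitfall: "single_transformer_blocks" contains
# "transformer_blocks" as a substring, so a naive `in` check misfires.
FLUX_EXCLUSIONS = {"single_transformer_blocks"}


def looks_like_qwen_image_lora_key(key: str) -> bool:
    """Reject keys matching the Flux exclusion set before testing the
    Qwen Image 'transformer_blocks.' substring."""
    if any(excl in key for excl in FLUX_EXCLUSIONS):
        return False
    return "transformer_blocks." in key


assert looks_like_qwen_image_lora_key("transformer.transformer_blocks.0.attn.to_k")
assert not looks_like_qwen_image_lora_key("transformer.single_transformer_blocks.0.attn.to_k")
```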
Previously the graph builder passed the output canvas dimensions to the I2L node, which resized the reference image to match — distorting its aspect ratio when they differed. Now the reference is encoded at its native size. The denoise node already handles dimension mismatches via bilinear interpolation in latent space. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… without component source

Addresses two reviewer findings:

1. denoising_start/denoising_end were ignored — the full sigma schedule was always used regardless of img2img strength. Now clip the scheduler's sigmas to the fractional range before stepping, and use manual Euler steps with the clipped schedule (scheduler.step() can't handle clipped schedules due to internal index tracking).
2. GGUF Qwen Image models could be enqueued without a Component Source, deferring the error to runtime. Added readiness checks on both the Generate and Canvas tabs that block enqueue when a GGUF model is selected but no Diffusers component source is configured.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
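The sigma-clipping and manual Euler stepping for the first finding can be sketched roughly as below. This is a simplified illustration under assumed conventions (sigmas descending to 0, velocity-style predictions); the real node works on the scheduler's actual sigma tensor:

```python
# Hypothetical sketch: clip a sigma schedule to the fractional
# [denoising_start, denoising_end] range, then step it manually.
def clip_sigmas(sigmas, denoising_start: float, denoising_end: float):
    n = len(sigmas) - 1                     # number of steps in the full schedule
    i0 = int(round(denoising_start * n))
    i1 = int(round(denoising_end * n))
    return sigmas[i0:i1 + 1]


def euler_steps(x, sigmas, predict_velocity):
    # Manual Euler integration avoids scheduler.step(), whose internal
    # step-index tracking breaks on a clipped schedule.
    for sigma, sigma_next in zip(sigmas[:-1], sigmas[1:]):
        v = predict_velocity(x, sigma)
        x = x + (sigma_next - sigma) * v
    return x


full = [1.0, 0.8, 0.6, 0.4, 0.2, 0.0]
clipped = clip_sigmas(full, 0.4, 1.0)       # e.g. img2img strength 0.6
assert clipped == [0.6, 0.4, 0.2, 0.0]
```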
All invocation nodes work with both Qwen Image (txt2img) and Qwen Image Edit models. Rename titles and docstrings from "Qwen Image Edit" to "Qwen Image" to avoid confusion. Also remove duplicate GGUF readiness check in the Generate tab. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The negative conditioning node was always added to the graph, causing the text encoder to be loaded twice even when CFG=1 (where the negative prediction is unused). Now only adds the negative node when cfg_scale > 1. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
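A minimal sketch of the graph-builder change, with hypothetical node names (the actual builder constructs invocation nodes, not strings):

```python
# Illustrative sketch: the negative-conditioning node is only added when
# CFG is active, avoiding a second text-encoder load at cfg_scale == 1.
def build_conditioning_nodes(cfg_scale: float) -> list:
    nodes = ["positive_conditioning"]
    if cfg_scale > 1:
        # The negative prediction is unused at cfg_scale == 1, so the
        # node (and its text-encoder load) can be skipped entirely.
        nodes.append("negative_conditioning")
    return nodes


assert build_conditioning_nodes(1.0) == ["positive_conditioning"]
assert build_conditioning_nodes(4.0) == ["positive_conditioning", "negative_conditioning"]
```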
Kohya LoRAs use underscore-separated keys like lora_unet_transformer_blocks_0_attn_to_k.lokr_w1 instead of the diffusers dot-separated format. Add:

- Kohya key detection (lora_unet_transformer_blocks_*)
- Key conversion mapping from Kohya underscores to model dot-paths
- Updated LoRA config detection to recognize Kohya format + LoKR suffixes
- Flux Kohya exclusion (lora_unet_double_blocks, lora_unet_single_blocks)
- Test model for Kohya LoKR identification

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
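The key-conversion idea can be sketched for the example key above. This is a deliberately simplified, hedged sketch — the real mapping must handle many more module-name suffixes than the single attn_to_ replacement shown here:

```python
import re

# Hypothetical sketch of Kohya underscore-key -> dot-path conversion.
# Kohya flattens dots to underscores, so numeric block indices and known
# module-name suffixes must be re-split.
def kohya_to_dot(key: str) -> str:
    base, _, suffix = key.partition(".")          # keep LoKR suffix (e.g. lokr_w1)
    base = base.removeprefix("lora_unet_")
    base = re.sub(r"_(\d+)_", r".\1.", base)      # transformer_blocks_0_ -> transformer_blocks.0.
    base = base.replace("attn_to_", "attn.to_")   # one illustrative module mapping
    return f"{base}.{suffix}" if suffix else base


assert kohya_to_dot("lora_unet_transformer_blocks_0_attn_to_k.lokr_w1") == \
    "transformer_blocks.0.attn.to_k.lokr_w1"
```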
Summary
Adds Qwen Image 2512 text-to-image support by reusing the existing Qwen Image Edit infrastructure. Both models share the same base type (qwen-image-edit) since they use identical architecture (transformer, VAE, text encoder, scheduler).

Depends on: #131 (Qwen Image Edit 2511)
Changes
- Denoise: checks zero_cond_t on the transformer to decide whether to concatenate reference latents. Txt2img models (zero_cond_t=False) pass only noisy patches with a single-entry img_shapes.
- Model config: accept QwenImagePipeline in addition to QwenImageEditPlusPipeline for Diffusers model detection.
- LoRA: handle "transformer." key prefix from some training frameworks; updated config detection.

Testing
🤖 Generated with Claude Code