feat: Qwen Image Edit 2511 — full pipeline with LoRA, GGUF, quantization, and UI #131
Conversation
@copilot When I attempt to run a generation with Qwen Image (Diffusers version), I get this stack trace:
The crash was caused by Fixed in |
@copilot Can you figure out where the WARNING in the fourth line of log output is coming from and remove it? |
The warning came from |
@copilot I'm getting the following error upon generating with the model: |
Fixed in |
@copilot I'm still getting tokenizer errors: |
... Fixed in |
@copilot |
Fixed in
- Processor loading: Replaced
- Reference images UI: Added
- txt2img without reference images: No warning or block when no reference images are set — the text encoder works with an empty list, so txt2img works fine.
@copilot New error: |
Force-pushed e897fa0 to bc82599
- GGUF loader: handle zero_cond_t absence in diffusers 0.36, try dtype before torch_dtype for forward compat
- Denoise: load scheduler config from disk with GGUF fallback, inline calculate_shift to avoid pipeline import, remove deprecated txt_seq_lens
- Text encoder: resize reference images to ~512x512 before VL encoding to prevent vision tokens from overwhelming the text prompt
- Picker badges: wrap to next line instead of truncating labels

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
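The ~512x512 reference-image resize described above can be sketched as an area-preserving rescale. This is a minimal illustration, not the actual implementation: the function name, the target area, and the patch-size multiple of 28 are assumptions.

```python
import math


def resize_for_vl_encoder(width: int, height: int,
                          target_area: int = 512 * 512,
                          multiple: int = 28) -> tuple[int, int]:
    """Scale (width, height) so the pixel area is roughly target_area,
    preserving aspect ratio and rounding each side to a multiple of the
    assumed vision patch stride. Keeps large reference images from
    producing so many vision tokens that they overwhelm the text prompt."""
    scale = math.sqrt(target_area / (width * height))
    new_w = max(multiple, round(width * scale / multiple) * multiple)
    new_h = max(multiple, round(height * scale / multiple) * multiple)
    return new_w, new_h
```

For a 2048x1024 input this yields 728x364: the 2:1 aspect ratio is preserved exactly and the area lands within a few percent of 512x512.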
- Remove module-level cache for quantized encoders — load fresh each invocation and free VRAM via cleanup callback (gc + empty_cache)
- Suppress harmless BnB MatMul8bitLt bfloat16→float16 cast warning

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
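The load-fresh-and-free pattern can be sketched as a cleanup callback (names hypothetical; torch is treated as optional so the sketch also runs on CPU-only machines):

```python
import gc


def make_cleanup_callback():
    """Return a callback that frees a quantized encoder after each
    invocation instead of caching it at module level: drop the reference,
    run the garbage collector, then release cached CUDA blocks."""
    def cleanup(encoder_holder: dict) -> None:
        encoder_holder.clear()  # drop the only reference to the encoder
        gc.collect()            # reclaim Python-side memory immediately
        try:
            import torch
            if torch.cuda.is_available():
                torch.cuda.empty_cache()  # return cached VRAM to the driver
        except ImportError:
            pass  # no torch installed; nothing GPU-side to free
    return cleanup
```

The holder-dict indirection stands in for whatever object owns the encoder reference in the real code; the point is that nothing retains it between invocations.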
Rename the base model type from "qwen-image-edit" to "qwen-image" to
reflect that the Qwen Image family includes both txt2img and image
editing models. The edit models are a specific use case within the
broader Qwen Image architecture.
- BaseModelType.QwenImageEdit -> BaseModelType.QwenImage ("qwen-image")
- All Python files, classes, variables, and invocation names renamed
- All TypeScript/React components, selectors, and state fields renamed
- Frontend display: "Qwen Image" in model picker, "QwenImg" badge
- Starter model bundle: "Qwen Image"
- File renames: qwen_image_edit_* -> qwen_image_*
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- QwenImageVariantType enum: Generate (txt2img) and Edit (image editing)
- Diffusers models: auto-detect variant from model_index.json pipeline class (QwenImagePipeline → Generate, QwenImageEditPlusPipeline → Edit)
- GGUF models: default to Generate (can't detect from state dict)
- Frontend: hide reference image panel when a Generate variant is selected
- Variant display names: "Qwen Image" / "Qwen Image Edit"
- ModelRecordChanges: include QwenImageVariantType in variant union

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
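The variant auto-detection amounts to a lookup on the `_class_name` field of model_index.json. A minimal sketch, assuming string variant values and a hypothetical helper name (the real code uses the QwenImageVariantType enum):

```python
def detect_variant(model_index: dict) -> str:
    """Map the pipeline class recorded in model_index.json to a variant.
    Unknown or missing classes fall back to Generate, matching the GGUF
    behaviour where no model_index.json exists at all."""
    class_name = model_index.get("_class_name", "")
    if class_name == "QwenImageEditPlusPipeline":
        return "edit"
    return "generate"  # QwenImagePipeline, GGUF, or anything unrecognized
```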
The variant field with a default value was appended to the discriminator tag (e.g. main.gguf_quantized.qwen-image.generate), breaking model detection for GGUF and Diffusers models. Making variant optional with default=None restores the correct tags (main.gguf_quantized.qwen-image). The variant is still set during Diffusers model probing via _get_qwen_image_variant() and can be manually set for GGUF models.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Prevents variable name collisions when the txt2img branch adds qwen_image_* variables for the Qwen Image 2512 models.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…nModelConfig)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…URLs

The global rename sed changed 'qwen-image-edit-2511' to 'qwen-image-2511' inside the HuggingFace URLs, but the actual files on HF still have 'edit' in their names.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add optional variant field to StarterModelWithoutDependencies
- Tag all Qwen Image Edit starter models (Diffusers + GGUF) with variant=QwenImageVariantType.Edit
- Frontend passes variant through to the install endpoint config so GGUF edit models get the correct variant set on install

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The txt2img model doesn't use zero_cond_t — setting it causes the transformer to double the timestep batch and create modulation indices for non-existent reference patches, producing noise output. Now checks the config variant before enabling it.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Flux PEFT LoRAs use transformer.single_transformer_blocks.* keys which contain "transformer_blocks." as a substring, falsely matching the Qwen Image LoRA detection. Add single_transformer_blocks to the Flux exclusion set.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
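The substring pitfall is easy to reproduce: "single_transformer_blocks." contains "transformer_blocks.", so a naive substring match misclassifies Flux LoRAs. A minimal sketch of detection with an explicit exclusion set (helper name and set contents are illustrative):

```python
# Key prefixes that indicate a Flux LoRA, not a Qwen Image one. Note that
# "single_transformer_blocks." would match a naive "transformer_blocks."
# substring test, which is exactly the bug being fixed.
FLUX_EXCLUSIONS = ("single_transformer_blocks.",)


def looks_like_qwen_image_lora(keys: list[str]) -> bool:
    """Detect a Qwen Image LoRA by its transformer_blocks.* keys while
    excluding Flux PEFT LoRAs via the exclusion set above."""
    has_blocks = any("transformer_blocks." in k for k in keys)
    is_flux = any(excl in k for k in keys for excl in FLUX_EXCLUSIONS)
    return has_blocks and not is_flux
```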
Add stripped test models for:
- Qwen Image Lightning LoRA (transformer_blocks.* keys)
- Qwen Image community LoRA (transformer.transformer_blocks.* keys)

Both should be detected as base=qwen-image, type=lora, format=lycoris.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Previously the graph builder passed the output canvas dimensions to the I2L node, which resized the reference image to match — distorting its aspect ratio when they differed. Now the reference is encoded at its native size. The denoise node already handles dimension mismatches via bilinear interpolation in latent space.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… without component source

Addresses two reviewer findings:
1. denoising_start/denoising_end were ignored — the full sigma schedule was always used regardless of img2img strength. Now clip the scheduler's sigmas to the fractional range before stepping, and use manual Euler steps with the clipped schedule (scheduler.step() can't handle clipped schedules due to internal index tracking).
2. GGUF Qwen Image models could be enqueued without a Component Source, deferring the error to runtime. Added readiness checks on both the Generate and Canvas tabs that block enqueue when a GGUF model is selected but no Diffusers component source is configured.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
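The sigma clipping and manual Euler stepping can be sketched as follows. This is a list-based toy version under assumed conventions (sigmas has num_steps + 1 entries ending in 0.0, and 0.0/1.0 denote the ends of the schedule); the real code operates on tensors and the scheduler's own sigma array:

```python
def clip_sigmas(sigmas: list[float], denoising_start: float,
                denoising_end: float) -> list[float]:
    """Keep only the fraction of the schedule between denoising_start and
    denoising_end, so img2img strength actually shortens the schedule
    instead of being ignored."""
    n = len(sigmas) - 1  # number of steps in the full schedule
    start = int(round(denoising_start * n))
    end = int(round(denoising_end * n))
    return sigmas[start:end + 1]


def euler_steps(x: float, sigmas: list[float], velocity_fn) -> float:
    """Manual Euler integration over a (possibly clipped) schedule.
    Avoids scheduler.step(), whose internal index tracking assumes the
    full, unclipped sigma array."""
    for i in range(len(sigmas) - 1):
        v = velocity_fn(x, sigmas[i])
        x = x + v * (sigmas[i + 1] - sigmas[i])
    return x
```

With denoising_end=0.5 on a four-step schedule, only the first half of the sigmas survives, which is the behaviour the reviewer finding asked for.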
Denoise tests (13):
- _prepare_cfg_scale: scalar, list, mismatch, invalid type
- _compute_sigmas: default schedule, shift override, terminal stretch, monotonicity, step counts, image_seq_len affecting mu
- _pack/_unpack_latents: roundtrip, shape verification

Model loader tests (5):
- Diffusers model extracts all components from itself
- Diffusers model ignores component_source when provided
- GGUF with Diffusers component source succeeds
- GGUF without component source raises ValueError
- GGUF with non-Diffusers source raises ValueError

Text encoder tests (13):
- _build_prompt: 0/1/many images, template structure, special chars
- _resize_for_vl_encoder: large/small images, aspect ratio preservation, dimension rounding, square/portrait/landscape orientations

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
All invocation nodes work with both Qwen Image (txt2img) and Qwen Image Edit models. Rename titles and docstrings from "Qwen Image Edit" to "Qwen Image" to avoid confusion. Also remove duplicate GGUF readiness check in the Generate tab.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The negative conditioning node was always added to the graph, causing the text encoder to be loaded twice even when CFG=1 (where the negative prediction is unused). Now only adds the negative node when cfg_scale > 1.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
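The CFG-gated graph construction can be sketched like this (node names illustrative; the real builder wires typed graph nodes rather than strings):

```python
def build_graph_nodes(cfg_scale: float) -> list[str]:
    """Add the negative-conditioning node only when CFG actually uses it.
    At cfg_scale == 1 the negative prediction is discarded, so adding the
    node would just load the text encoder a second time for nothing."""
    nodes = ["positive_conditioning", "denoise", "l2i"]
    if cfg_scale > 1:
        nodes.insert(1, "negative_conditioning")
    return nodes
```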
Kohya LoRAs use underscore-separated keys like lora_unet_transformer_blocks_0_attn_to_k.lokr_w1 instead of the diffusers dot-separated format. Add:
- Kohya key detection (lora_unet_transformer_blocks_*)
- Key conversion mapping from Kohya underscores to model dot-paths
- Updated LoRA config detection to recognize Kohya format + LoKR suffixes
- Flux Kohya exclusion (lora_unet_double_blocks, lora_unet_single_blocks)
- Test model for Kohya LoKR identification

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
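The underscore-to-dot conversion can be sketched for the attention keys shown above. This is illustrative only: the commit describes a table-driven mapping, whereas this toy version protects a few known multi-word atoms before splitting on underscores.

```python
# Multi-word name fragments that contain underscores of their own and must
# survive the underscore-to-dot split (hypothetical, not the real table).
_ATOMS = ("transformer_blocks", "to_k", "to_q", "to_v", "to_out")


def kohya_to_dot_path(key: str) -> str:
    """Convert a Kohya key like 'lora_unet_transformer_blocks_0_attn_to_k.lokr_w1'
    into the model dot-path 'transformer_blocks.0.attn.to_k.lokr_w1'."""
    base, _, suffix = key.partition(".")  # suffix: LoKR/LoRA weight name
    prefix = "lora_unet_"
    if base.startswith(prefix):
        base = base[len(prefix):]
    for atom in _ATOMS:                    # shield atoms from the split
        base = base.replace(atom, atom.replace("_", "-"))
    dotted = base.replace("_", ".").replace("-", "_")
    return f"{dotted}.{suffix}" if suffix else dotted
```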
Summary
Complete implementation of the Qwen Image Edit 2511 pipeline for InvokeAI, including text-to-image generation, image editing with reference images, LoRA support (including Lightning distillation), GGUF quantized transformers, and BitsAndBytes encoder quantization.
Key Features
Backend Changes
- zero_cond_t modulation, LoRA application via LayerPatcher with sidecar patching for GGUF, shift override for Lightning
- Model loader: zero_cond_t=True, correct in_channels
- transformers >= 4.56.0 (the video processor fallback imports already handle this)

Frontend Changes

- qwenImageEditComponentSource, qwenImageEditQuantization, qwenImageEditShift in params slice with persistence and model-switch cleanup

Functional Testing Guide
1. Text-to-Image Generation (Basic)
2. GGUF Quantized Transformer
3. BitsAndBytes Encoder Quantization
4. LoRA Support
5. Image Editing with Reference Image
6. Multiple Reference Images
7. Model Switching Cleanup
🤖 Generated with Claude Code