feat: Qwen Image Edit 2511 — full pipeline with LoRA, GGUF, quantization, and UI #131
Conversation
@copilot When I attempt to run a generation with Qwen Image (Diffusers version), I get this stack trace:
The crash was caused by Fixed in |
@copilot Can you figure out where the WARNING in the fourth line of log output is coming from and remove it? |
The warning came from |
@copilot I'm getting the following error upon generating with the model: |
Fixed in |
@copilot I'm still getting tokenizer errors: |
... Fixed in |
@copilot |
Fixed in
- Processor loading: Replaced
- Reference images UI: Added
- txt2img without reference images: No warning or block when no reference images are set — the text encoder works with an empty list, so txt2img works fine.
@copilot New error: |
Force-pushed e897fa0 to bc82599
- GGUF loader: handle zero_cond_t absence in diffusers 0.36, try dtype before torch_dtype for forward compat
- Denoise: load scheduler config from disk with GGUF fallback, inline calculate_shift to avoid pipeline import, remove deprecated txt_seq_lens
- Text encoder: resize reference images to ~512x512 before VL encoding to prevent vision tokens from overwhelming the text prompt
- Picker badges: wrap to next line instead of truncating labels

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
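The ~512x512 reference-image resize described above can be sketched as an area-preserving rescale. This is a minimal illustration, not the actual implementation: the function name, the target area, and the patch-size multiple of 28 are assumptions.

```python
import math


def resize_for_vl_encoder(width: int, height: int,
                          target_area: int = 512 * 512,
                          multiple: int = 28) -> tuple[int, int]:
    """Scale (width, height) so the pixel area is roughly target_area,
    preserving aspect ratio and rounding each side to a multiple of the
    assumed vision patch stride. Keeps large reference images from
    producing so many vision tokens that they overwhelm the text prompt."""
    scale = math.sqrt(target_area / (width * height))
    new_w = max(multiple, round(width * scale / multiple) * multiple)
    new_h = max(multiple, round(height * scale / multiple) * multiple)
    return new_w, new_h
```

For a 2048x1024 input this yields 728x364: the 2:1 aspect ratio is preserved exactly and the area lands within a few percent of 512x512.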
- Remove module-level cache for quantized encoders — load fresh each invocation and free VRAM via cleanup callback (gc + empty_cache)
- Suppress harmless BnB MatMul8bitLt bfloat16→float16 cast warning

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
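The load-fresh-and-free pattern can be sketched as a cleanup callback (names hypothetical; torch is treated as optional so the sketch also runs on CPU-only machines):

```python
import gc


def make_cleanup_callback():
    """Return a callback that frees a quantized encoder after each
    invocation instead of caching it at module level: drop the reference,
    run the garbage collector, then release cached CUDA blocks."""
    def cleanup(encoder_holder: dict) -> None:
        encoder_holder.clear()  # drop the only reference to the encoder
        gc.collect()            # reclaim Python-side memory immediately
        try:
            import torch
            if torch.cuda.is_available():
                torch.cuda.empty_cache()  # return cached VRAM to the driver
        except ImportError:
            pass  # no torch installed; nothing GPU-side to free
    return cleanup
```

The holder-dict indirection stands in for whatever object owns the encoder reference in the real code; the point is that nothing retains it between invocations.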
Rename the base model type from "qwen-image-edit" to "qwen-image" to
reflect that the Qwen Image family includes both txt2img and image
editing models. The edit models are a specific use case within the
broader Qwen Image architecture.
- BaseModelType.QwenImageEdit -> BaseModelType.QwenImage ("qwen-image")
- All Python files, classes, variables, and invocation names renamed
- All TypeScript/React components, selectors, and state fields renamed
- Frontend display: "Qwen Image" in model picker, "QwenImg" badge
- Starter model bundle: "Qwen Image"
- File renames: qwen_image_edit_* -> qwen_image_*
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- QwenImageVariantType enum: Generate (txt2img) and Edit (image editing)
- Diffusers models: auto-detect variant from model_index.json pipeline class (QwenImagePipeline → Generate, QwenImageEditPlusPipeline → Edit)
- GGUF models: default to Generate (can't detect from state dict)
- Frontend: hide reference image panel when a Generate variant is selected
- Variant display names: "Qwen Image" / "Qwen Image Edit"
- ModelRecordChanges: include QwenImageVariantType in variant union

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
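The variant auto-detection amounts to a lookup on the `_class_name` field of model_index.json. A minimal sketch, assuming string variant values and a hypothetical helper name (the real code uses the QwenImageVariantType enum):

```python
def detect_variant(model_index: dict) -> str:
    """Map the pipeline class recorded in model_index.json to a variant.
    Unknown or missing classes fall back to Generate, matching the GGUF
    behaviour where no model_index.json exists at all."""
    class_name = model_index.get("_class_name", "")
    if class_name == "QwenImageEditPlusPipeline":
        return "edit"
    return "generate"  # QwenImagePipeline, GGUF, or anything unrecognized
```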
The variant field with a default value was appended to the discriminator tag (e.g. main.gguf_quantized.qwen-image.generate), breaking model detection for GGUF and Diffusers models. Making variant optional with default=None restores the correct tags (main.gguf_quantized.qwen-image). The variant is still set during Diffusers model probing via _get_qwen_image_variant() and can be manually set for GGUF models.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Prevents variable name collisions when the txt2img branch adds qwen_image_* variables for the Qwen Image 2512 models.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…nModelConfig)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…URLs

The global rename sed changed 'qwen-image-edit-2511' to 'qwen-image-2511' inside the HuggingFace URLs, but the actual files on HF still have 'edit' in their names.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add optional variant field to StarterModelWithoutDependencies
- Tag all Qwen Image Edit starter models (Diffusers + GGUF) with variant=QwenImageVariantType.Edit
- Frontend passes variant through to the install endpoint config so GGUF edit models get the correct variant set on install

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The txt2img model doesn't use zero_cond_t — setting it causes the transformer to double the timestep batch and create modulation indices for non-existent reference patches, producing noise output. Now checks the config variant before enabling it.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Flux PEFT LoRAs use transformer.single_transformer_blocks.* keys which contain "transformer_blocks." as a substring, falsely matching the Qwen Image LoRA detection. Add single_transformer_blocks to the Flux exclusion set.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
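The substring pitfall is easy to reproduce: "single_transformer_blocks." contains "transformer_blocks.", so a naive substring match misclassifies Flux LoRAs. A minimal sketch of detection with an explicit exclusion set (helper name and set contents are illustrative):

```python
# Key prefixes that indicate a Flux LoRA, not a Qwen Image one. Note that
# "single_transformer_blocks." would match a naive "transformer_blocks."
# substring test, which is exactly the bug being fixed.
FLUX_EXCLUSIONS = ("single_transformer_blocks.",)


def looks_like_qwen_image_lora(keys: list[str]) -> bool:
    """Detect a Qwen Image LoRA by its transformer_blocks.* keys while
    excluding Flux PEFT LoRAs via the exclusion set above."""
    has_blocks = any("transformer_blocks." in k for k in keys)
    is_flux = any(excl in k for k in keys for excl in FLUX_EXCLUSIONS)
    return has_blocks and not is_flux
```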
Add stripped test models for:
- Qwen Image Lightning LoRA (transformer_blocks.* keys)
- Qwen Image community LoRA (transformer.transformer_blocks.* keys)

Both should be detected as base=qwen-image, type=lora, format=lycoris.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Previously the graph builder passed the output canvas dimensions to the I2L node, which resized the reference image to match — distorting its aspect ratio when they differed. Now the reference is encoded at its native size. The denoise node already handles dimension mismatches via bilinear interpolation in latent space.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… without component source

Addresses two reviewer findings:
1. denoising_start/denoising_end were ignored — the full sigma schedule was always used regardless of img2img strength. Now clip the scheduler's sigmas to the fractional range before stepping, and use manual Euler steps with the clipped schedule (scheduler.step() can't handle clipped schedules due to internal index tracking).
2. GGUF Qwen Image models could be enqueued without a Component Source, deferring the error to runtime. Added readiness checks on both the Generate and Canvas tabs that block enqueue when a GGUF model is selected but no Diffusers component source is configured.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
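The sigma clipping and manual Euler stepping can be sketched as follows. This is a list-based toy version under assumed conventions (sigmas has num_steps + 1 entries ending in 0.0, and 0.0/1.0 denote the ends of the schedule); the real code operates on tensors and the scheduler's own sigma array:

```python
def clip_sigmas(sigmas: list[float], denoising_start: float,
                denoising_end: float) -> list[float]:
    """Keep only the fraction of the schedule between denoising_start and
    denoising_end, so img2img strength actually shortens the schedule
    instead of being ignored."""
    n = len(sigmas) - 1  # number of steps in the full schedule
    start = int(round(denoising_start * n))
    end = int(round(denoising_end * n))
    return sigmas[start:end + 1]


def euler_steps(x: float, sigmas: list[float], velocity_fn) -> float:
    """Manual Euler integration over a (possibly clipped) schedule.
    Avoids scheduler.step(), whose internal index tracking assumes the
    full, unclipped sigma array."""
    for i in range(len(sigmas) - 1):
        v = velocity_fn(x, sigmas[i])
        x = x + v * (sigmas[i + 1] - sigmas[i])
    return x
```

With denoising_end=0.5 on a four-step schedule, only the first half of the sigmas survives, which is the behaviour the reviewer finding asked for.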
Denoise tests (13):
- _prepare_cfg_scale: scalar, list, mismatch, invalid type
- _compute_sigmas: default schedule, shift override, terminal stretch, monotonicity, step counts, image_seq_len affecting mu
- _pack/_unpack_latents: roundtrip, shape verification

Model loader tests (5):
- Diffusers model extracts all components from itself
- Diffusers model ignores component_source when provided
- GGUF with Diffusers component source succeeds
- GGUF without component source raises ValueError
- GGUF with non-Diffusers source raises ValueError

Text encoder tests (13):
- _build_prompt: 0/1/many images, template structure, special chars
- _resize_for_vl_encoder: large/small images, aspect ratio preservation, dimension rounding, square/portrait/landscape orientations

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
All invocation nodes work with both Qwen Image (txt2img) and Qwen Image Edit models. Rename titles and docstrings from "Qwen Image Edit" to "Qwen Image" to avoid confusion. Also remove duplicate GGUF readiness check in the Generate tab.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The negative conditioning node was always added to the graph, causing the text encoder to be loaded twice even when CFG=1 (where the negative prediction is unused). Now only adds the negative node when cfg_scale > 1.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
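The CFG-gated graph construction can be sketched like this (node names illustrative; the real builder wires typed graph nodes rather than strings):

```python
def build_graph_nodes(cfg_scale: float) -> list[str]:
    """Add the negative-conditioning node only when CFG actually uses it.
    At cfg_scale == 1 the negative prediction is discarded, so adding the
    node would just load the text encoder a second time for nothing."""
    nodes = ["positive_conditioning", "denoise", "l2i"]
    if cfg_scale > 1:
        nodes.insert(1, "negative_conditioning")
    return nodes
```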
Kohya LoRAs use underscore-separated keys like lora_unet_transformer_blocks_0_attn_to_k.lokr_w1 instead of the diffusers dot-separated format. Add:
- Kohya key detection (lora_unet_transformer_blocks_*)
- Key conversion mapping from Kohya underscores to model dot-paths
- Updated LoRA config detection to recognize Kohya format + LoKR suffixes
- Flux Kohya exclusion (lora_unet_double_blocks, lora_unet_single_blocks)
- Test model for Kohya LoKR identification

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
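The underscore-to-dot conversion can be sketched for the attention keys shown above. This is illustrative only: the commit describes a table-driven mapping, whereas this toy version protects a few known multi-word atoms before splitting on underscores.

```python
# Multi-word name fragments that contain underscores of their own and must
# survive the underscore-to-dot split (hypothetical, not the real table).
_ATOMS = ("transformer_blocks", "to_k", "to_q", "to_v", "to_out")


def kohya_to_dot_path(key: str) -> str:
    """Convert a Kohya key like 'lora_unet_transformer_blocks_0_attn_to_k.lokr_w1'
    into the model dot-path 'transformer_blocks.0.attn.to_k.lokr_w1'."""
    base, _, suffix = key.partition(".")  # suffix: LoKR/LoRA weight name
    prefix = "lora_unet_"
    if base.startswith(prefix):
        base = base[len(prefix):]
    for atom in _ATOMS:                    # shield atoms from the split
        base = base.replace(atom, atom.replace("_", "-"))
    dotted = base.replace("_", ".").replace("-", "_")
    return f"{dotted}.{suffix}" if suffix else dotted
```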
Summary
Complete implementation of the Qwen Image Edit 2511 pipeline for InvokeAI, including text-to-image generation, image editing with reference images, LoRA support (including Lightning distillation), GGUF quantized transformers, and BitsAndBytes encoder quantization.
Key Features
Backend Changes
- zero_cond_t modulation, LoRA application via LayerPatcher with sidecar patching for GGUF, shift override for Lightning
- Model loader: zero_cond_t=True, correct in_channels
- transformers >= 4.56.0 (the video processor fallback imports already handle this)

Frontend Changes

- qwenImageEditComponentSource, qwenImageEditQuantization, qwenImageEditShift in params slice with persistence and model-switch cleanup

Functional Testing Guide
1. Text-to-Image Generation (Basic)
2. GGUF Quantized Transformer
3. BitsAndBytes Encoder Quantization
4. LoRA Support
5. Image Editing with Reference Image
6. Multiple Reference Images
7. Model Switching Cleanup
🤖 Generated with Claude Code