Multi Model Conversation v2
The tool is functionally good, but there are a few avoidable performance issues in the hot path that become noticeable with long conversations, multiple participants, manager mode, and tool use.
This issue is only about low-risk performance / quality fixes that should not change the tool's behavior.
Main problems
- Full transcript is rebuilt and re-sent on every streamed chunk
In `pipe()` / `_stream_and_accumulate()`, the tool keeps appending to `total_emitted` and calls `emit_replace(total_emitted)` repeatedly for:
- participant titles
- streamed content chunks
- reasoning updates
- tool execution details
- fallback content
This means each new chunk re-sends the entire accumulated conversation, so per-chunk cost grows with transcript size (total bytes sent over a run grow roughly quadratically). On long runs this becomes unnecessarily expensive in UI rendering, CPU, and memory.
Why this matters:
- latency grows over time
- browser/UI updates get heavier every turn
- manager mode and multi-round conversations amplify the problem
Low-risk fix:
- avoid duplicate `emit_replace()` calls when content has not changed
- throttle / batch replace updates during streaming
- keep the same UX, just reduce full-message replaces
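One low-risk way to get both the dedupe and the throttle is a small wrapper around the emitter. This is a sketch only: `emit_replace` stands in for the tool's real replace-emitter, and the interval is an illustrative default, not a measured value.

```python
import time

class ThrottledReplacer:
    """Deduplicate and rate-limit full-transcript replaces during streaming.

    Hypothetical helper: `emit_replace` is whatever callable the tool
    uses to push the accumulated transcript to the UI.
    """

    def __init__(self, emit_replace, min_interval=0.25):
        self.emit_replace = emit_replace
        self.min_interval = min_interval  # seconds between mid-stream replaces
        self._last_sent = ""
        self._last_time = float("-inf")

    def push(self, total_emitted: str, force: bool = False):
        # Skip exact duplicates entirely.
        if total_emitted == self._last_sent:
            return
        # Rate-limit mid-stream updates; forced pushes always go through.
        now = time.monotonic()
        if not force and (now - self._last_time) < self.min_interval:
            return
        self.emit_replace(total_emitted)
        self._last_sent = total_emitted
        self._last_time = now

    def flush(self, total_emitted: str):
        # Always send the final state at the end of a turn.
        self.push(total_emitted, force=True)
```

The UX stays the same: the user still sees the transcript grow, but intermediate chunks that arrive within the interval are coalesced into the next replace instead of each triggering a full re-send.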
- Repeated per-turn model lookups/checks in the hot path
Inside the participant loop, the tool repeatedly does work that could be resolved once per model / participant at setup time, for example:
- calling `Models.get_model_by_id(model)`
- checking native function calling support
- re-reading model system prompt / params
- repeated feature/tool-related lookups
Why this matters:
- adds avoidable DB/runtime overhead every turn
- gets worse with more rounds and participants
Low-risk fix:
- cache per-participant/per-model metadata once at the start of the pipe run
- reuse cached values in the conversation loop
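The caching can be as simple as resolving each unique model once at the start of the run and handing the loop a dict. `get_model_by_id` mirrors the lookup named above; the cached field names here are illustrative assumptions, not the tool's actual schema.

```python
def build_model_cache(model_ids, get_model_by_id):
    """Resolve each unique model once per pipe run and return a lookup dict.

    Sketch only: the `supports_tools` attribute and the cache shape are
    placeholders for whatever per-turn checks the tool actually repeats.
    """
    cache = {}
    for model_id in model_ids:
        if model_id in cache:
            continue  # duplicate participant on the same model
        info = get_model_by_id(model_id)
        cache[model_id] = {
            "info": info,
            # Precompute checks that would otherwise run every turn,
            # e.g. native function-calling support.
            "supports_native_tools": bool(getattr(info, "supports_tools", False)),
        }
    return cache
```

Inside the conversation loop, each turn then reads `cache[participant_model]` instead of hitting the DB/runtime again.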
- Tool/model setup work is heavier than necessary per run
The participant setup phase merges tool IDs and features from multiple sources (`MODELS` runtime state + DB model info + metadata/body fallbacks), then loads built-in/imported tools.
This is correct functionally, but it does more work than necessary and could be simplified/cached within a single run.
Why this matters:
- adds startup latency before the conversation even begins
- especially painful in environments where models may also load/unload between turns
Low-risk fix:
- normalize and cache participant config once
- avoid reprocessing the same participant model more than needed in one execution
- keep current behavior, just reduce repeated setup work
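A run-scoped memo achieves this without changing behavior: the heavy merge/load runs at most once per model within a single execution. `load_tools_for` below is a hypothetical stand-in for the tool's real setup work.

```python
def make_run_scoped_setup(load_tools_for):
    """Return a setup function whose results are cached for one pipe run.

    Sketch under assumptions: `load_tools_for` represents the expensive
    tool-ID merging and built-in/imported tool loading done per participant.
    """
    resolved = {}  # model_id -> setup result, lives for this run only

    def setup_participant(model_id):
        if model_id not in resolved:
            # Heavy work happens once per unique model per run;
            # later participants on the same model reuse the result.
            resolved[model_id] = load_tools_for(model_id)
        return resolved[model_id]

    return setup_participant
```

Because the cache dict is created fresh inside `make_run_scoped_setup`, nothing leaks across runs, so stale-config risk stays the same as today.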
- Logging is too verbose in hot paths
There are many `logger.info(...)` calls in per-run / per-model / tool-loading paths. For a production interactive tool this creates unnecessary journal noise and some overhead.
Low-risk fix:
- downgrade repetitive operational logs from `info` to `debug`
- keep only important lifecycle events at `info`
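As a concrete illustration of the split (function names and messages are made up for the example, not taken from the tool's code):

```python
import logging

logger = logging.getLogger("multi_model_conversation")

def log_tool_load(model_id: str, tool_count: int) -> None:
    # Per-model operational detail: debug only, silent at the
    # INFO level a production deployment would typically run at.
    logger.debug("loaded %d tools for model %s", tool_count, model_id)

def log_run_start(participant_count: int) -> None:
    # Lifecycle event worth keeping visible at info.
    logger.info("conversation run started with %d participants", participant_count)
```

Using `%`-style lazy formatting (arguments passed to the logger, not pre-formatted f-strings) also means suppressed `debug` calls skip the string-building cost entirely.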
Not part of this issue
- Redesigning manager mode behavior
- Changing conversation logic / rounds / participant order
- Removing tool support
- Solving backend-specific model load/unload behavior completely
Note on environment-dependent slowness
There is also an environment/runtime factor: if different models are constantly loaded/unloaded between turns, that will always add latency. This issue is not asking to solve that completely, only to remove avoidable overhead from the tool itself so that those environments suffer less.
Suggested acceptance criteria
- Long conversations no longer trigger a full `emit_replace()` flood for every tiny chunk
- Per-participant model metadata is resolved once and reused
- No behavior change in conversation flow, tools, or manager mode
- Hot-path logs are reduced to reasonable levels