Performance hot-path issues in Multi Model Conversation v2 #65

@dhaern

Description

Multi Model Conversation v2

The tool is functionally sound, but a few avoidable performance issues in the hot path become noticeable with long conversations, multiple participants, manager mode, and tool use.

This issue is only about low-risk performance / quality fixes that should not change the tool's behavior.

Main problems

  1. Full transcript is rebuilt and re-sent on every streamed chunk

In pipe() / _stream_and_accumulate(), the tool keeps appending to total_emitted and calls emit_replace(total_emitted) repeatedly for:

  • participant titles
  • streamed content chunks
  • reasoning updates
  • tool execution details
  • fallback content

This means each new chunk re-sends the entire accumulated conversation, so per-chunk cost grows with transcript size and total work over a run grows roughly quadratically. On long runs this becomes unnecessarily expensive in UI rendering, CPU, and memory.

Why this matters:

  • latency grows over time
  • browser/UI updates get heavier every turn
  • manager mode and multi-round conversations amplify the problem

Low-risk fix:

  • avoid duplicate emit_replace() calls when content has not changed
  • throttle / batch replace updates during streaming
  • keep the same UX, just reduce full-message replaces
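The dedupe/throttle idea above can be sketched as a small wrapper around the existing emitter. This is a minimal illustration, assuming an `emit_replace` callable like the one the tool already uses; `ThrottledReplacer` and its fields are hypothetical names, not the tool's actual API:

```python
import time

class ThrottledReplacer:
    """Hypothetical wrapper that skips duplicate payloads and rate-limits
    full-message replace events during streaming (sketch, not real code)."""

    def __init__(self, emit_replace, min_interval=0.1):
        self.emit_replace = emit_replace      # the tool's existing emitter
        self.min_interval = min_interval      # seconds between full replaces
        self._last_sent = ""
        self._last_time = float("-inf")       # guarantee the first send
        self._pending = None

    def update(self, content):
        # Skip duplicate payloads entirely.
        if content == self._last_sent:
            return
        now = time.monotonic()
        if now - self._last_time >= self.min_interval:
            self.emit_replace(content)
            self._last_sent = content
            self._last_time = now
            self._pending = None
        else:
            # Too soon: remember only the newest content for flush().
            self._pending = content

    def flush(self):
        # Always send the final state so the visible UX is unchanged.
        if self._pending is not None and self._pending != self._last_sent:
            self.emit_replace(self._pending)
            self._last_sent = self._pending
            self._pending = None
```

Calling `flush()` at the end of each turn keeps the final transcript identical to today's behavior while eliminating intermediate duplicate replaces.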
  2. Repeated per-turn model lookups/checks in the hot path

Inside the participant loop, the tool repeatedly does work that could be resolved once per model / participant at setup time, for example:

  • Models.get_model_by_id(model)
  • checking native function calling support
  • re-reading model system prompt / params
  • repeated feature/tool-related lookups

Why this matters:

  • adds avoidable DB/runtime overhead every turn
  • gets worse with more rounds and participants

Low-risk fix:

  • cache per-participant/per-model metadata once at the start of the pipe run
  • reuse cached values in the conversation loop
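A per-run cache along these lines would do it. Only `Models.get_model_by_id` is taken from the issue; `build_participant_cache`, the field names, and the dict shape are illustrative assumptions:

```python
def build_participant_cache(participants, get_model_by_id):
    """Resolve each participant's model metadata once, before the loop.
    get_model_by_id stands in for Models.get_model_by_id; the cached
    fields below are hypothetical and should mirror the tool's real checks."""
    cache = {}
    for p in participants:
        model_id = p["model"]
        if model_id in cache:
            continue  # several participants may share one model
        info = get_model_by_id(model_id)
        cache[model_id] = {
            "info": info,
            "supports_function_calling": bool(info.get("function_calling")),
            "system_prompt": info.get("system_prompt", ""),
            "params": info.get("params", {}),
        }
    return cache

# Inside the conversation loop, read cache[model_id] instead of
# re-querying the DB/runtime on every turn.
```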
  3. Tool/model setup work is heavier than necessary per run

The participant setup phase merges tool IDs and features from multiple sources (MODELS runtime state + DB model info + metadata/body fallbacks), then loads built-in/imported tools.

This is correct functionally, but it does more work than necessary and could be simplified/cached within a single run.

Why this matters:

  • adds startup latency before the conversation even begins
  • especially painful in environments where models may also load/unload between turns

Low-risk fix:

  • normalize and cache participant config once
  • avoid reprocessing the same participant model more than needed in one execution
  • keep current behavior, just reduce repeated setup work
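One low-risk way to get run-scoped deduplication is memoizing the heavy merge step per model id, for example with `functools.lru_cache` on a closure created once per `pipe()` execution. `load_tools_for_model` is a hypothetical stand-in for the tool's real merge/load routine:

```python
from functools import lru_cache

def make_run_scoped_loader(load_tools_for_model):
    """Return a loader that runs the heavy merge (runtime state + DB info
    + metadata fallbacks) at most once per model id within a single run."""
    @lru_cache(maxsize=None)
    def load_once(model_id):
        return load_tools_for_model(model_id)
    return load_once
```

Because the cache lives on the closure, it is discarded when the run ends, so stale config cannot leak between executions.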
  4. Logging is too verbose in hot paths

There are many logger.info(...) calls in per-run / per-model / tool-loading paths. For a production interactive tool this creates unnecessary journal noise and some overhead.

Low-risk fix:

  • downgrade repetitive operational logs from info to debug
  • keep only important lifecycle events at info
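The split looks like this with the standard `logging` module; the logger name and messages are illustrative, not taken from the tool:

```python
import logging

log = logging.getLogger("multi_model_conversation")
log.setLevel(logging.INFO)  # typical production level

def load_tools(model_id):
    # Per-model detail: DEBUG, invisible at the default INFO level.
    log.debug("loading tools for %s", model_id)
    return []

def run_conversation(participants):
    # Lifecycle event: INFO, still visible in production logs.
    log.info("conversation started with %d participants", len(participants))
    for p in participants:
        load_tools(p)
```

At INFO level the per-model lines cost only a cheap level check, so hot-path overhead and journal noise both drop without losing lifecycle visibility.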

Not part of this issue

  • Redesigning manager mode behavior
  • Changing conversation logic / rounds / participant order
  • Removing tool support
  • Solving backend-specific model load/unload behavior completely

Note on environment-dependent slowness

There is also an environment/runtime factor: if different models are constantly loaded/unloaded between turns, that will always add latency. This issue is not asking to solve that completely, only to remove avoidable overhead from the tool itself so that those environments suffer less.

Suggested acceptance criteria

  • Long conversations no longer trigger a full emit_replace() flood for every tiny chunk
  • Per-participant model metadata is resolved once and reused
  • No behavior change in conversation flow, tools, or manager mode
  • Hot-path logs are reduced to reasonable levels
