Governed Memory (RAG) for Talon - Minimal Design #58

@sergeyenin

Problem / use case

Talon already provides strong governance for agent execution (policy checks, evidence, tenant scoping), but memory retrieval is still limited to lightweight index-based context injection. This creates a gap:

  • Retrieval quality is too weak for semantic recall: index-based lookup misses paraphrases and concept-level matches.
  • Memory writes are governed, but the influence of memory reads on the final output is not fully traceable.
  • Current behavior can drift toward either too little memory utility or over-scoped memory complexity.

The goal is to add governed RAG capabilities that improve context quality while preserving Talon's core role as a governance/control plane, not a generic memory framework.

Success criteria:

  • Better retrieval relevance than title/keyword-only methods.
  • Full policy and audit controls on memory write and read paths.
  • Clear tenant isolation and deletion/retention lifecycle.
  • Minimal architecture changes that keep Talon single-binary and SQLite-first.

Compliance requirement (if applicable)

This feature is compliance-sensitive because memory persists model-influencing data over time.

GDPR and privacy

  • PII at write time: Memory candidates can contain PII from model output, attachments, or user text. Memory writes must remain blocked when PII is detected unless policy allows a reviewed path.
  • Data minimization: Only store what is needed for retrieval and audit; avoid persisting unnecessary raw content.
  • Right to deletion: Memory must support deterministic removal workflows (retention, rollback/invalidation, targeted deletion strategy for DSAR use cases).

Auditability and AI Act transparency

  • Traceability: For every response, the audit trail should show which memory entries were retrieved and injected.
  • Lineage: Memory entry provenance should include tenant, agent, source evidence, and lifecycle state.
  • Integrity: Memory and evidence records should remain tamper-evident and queryable for investigations.

Tenant and governance boundaries

  • Strict tenant isolation: Read/write queries must always be scoped by tenant_id.
  • Governed reads, not just writes: Retrieval and prompt injection must be policy-constrained (category filters, trust thresholds, review state).
  • Fail-closed defaults: Unknown config states should not allow unsafe memory injection.
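
The boundaries above can be illustrated with a minimal sketch of a fail-closed read gate. Everything here is an assumption made for illustration: the `ReadPolicy` shape, the `category` field, the config keys `allowed_categories` and `min_trust`, and a trust score in the range 0 to 1 are hypothetical and not taken from Talon's code.

```python
from dataclasses import dataclass
from typing import Any, FrozenSet, Mapping, Optional


@dataclass(frozen=True)
class ReadPolicy:
    # Hypothetical policy shape; field names are assumptions.
    allowed_categories: FrozenSet[str]
    min_trust: float


def parse_read_policy(raw: Mapping[str, Any]) -> Optional[ReadPolicy]:
    """Return a policy only if every field is present and valid, else None."""
    cats = raw.get("allowed_categories")
    trust = raw.get("min_trust")
    if not isinstance(cats, (list, tuple)):
        return None
    if not all(isinstance(c, str) for c in cats):
        return None
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(trust, bool) or not isinstance(trust, (int, float)):
        return None
    if not 0.0 <= trust <= 1.0:
        return None
    return ReadPolicy(frozenset(cats), float(trust))


def may_inject(entry: Mapping[str, Any], tenant_id: str,
               policy: Optional[ReadPolicy]) -> bool:
    """Allow injection only for in-tenant, active, policy-approved entries."""
    if policy is None:
        # Unknown or malformed config: fail closed.
        return False
    return (
        entry.get("tenant_id") == tenant_id
        and entry.get("state") == "active"
        and entry.get("category") in policy.allowed_categories
        and entry.get("trust_score", 0.0) >= policy.min_trust
    )
```

The design choice shown is that a missing or malformed policy yields `None`, and `None` always denies injection, so no partially parsed config can widen access.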

Proposed solution

Implement governed memory RAG as an extension of existing Talon components, not a new framework.

A. Minimal data model extension

Extend memory entries with the smallest useful additions:

  • embedding_data (BLOB) for semantic retrieval.
  • state (active | pending_review | shadow) as a normalized governance state.

Keep and reuse existing fields (tenant_id, agent_id, memory_type, trust_score, created_at, evidence_id, review and consolidation metadata).
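
As a rough sketch of what the extended entry could look like in SQLite, the snippet below creates a table with the two new columns next to the existing fields. The table name `memory_entries`, the `content` column, the float32 encoding of `embedding_data`, and the `pending_review` default are assumptions for illustration; review and consolidation metadata are omitted for brevity.

```python
import sqlite3
from array import array

# Hypothetical schema: only embedding_data and state are new per the proposal.
SCHEMA = """
CREATE TABLE memory_entries (
    id             INTEGER PRIMARY KEY,
    tenant_id      TEXT NOT NULL,
    agent_id       TEXT NOT NULL,
    memory_type    TEXT NOT NULL,
    trust_score    REAL NOT NULL,
    created_at     TEXT NOT NULL,
    evidence_id    TEXT,
    content        TEXT NOT NULL,
    embedding_data BLOB,
    state          TEXT NOT NULL DEFAULT 'pending_review'
                   CHECK (state IN ('active', 'pending_review', 'shadow'))
);
CREATE INDEX idx_memory_scope ON memory_entries (tenant_id, agent_id, state);
"""


def encode_embedding(vec):
    """Pack floats as native-endian float32 bytes (an assumed encoding)."""
    return array("f", vec).tobytes()


def decode_embedding(blob):
    """Unpack float32 bytes produced by encode_embedding."""
    out = array("f")
    out.frombytes(blob)
    return list(out)


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute(
    "INSERT INTO memory_entries "
    "(tenant_id, agent_id, memory_type, trust_score, created_at, content, embedding_data) "
    "VALUES (?, ?, ?, ?, ?, ?, ?)",
    ("t1", "a1", "fact", 0.8, "2025-01-01T00:00:00Z", "example",
     encode_embedding([0.5, -1.25, 2.0])),
)
```

Defaulting `state` to `pending_review` is one way to keep new rows out of injection until something explicitly activates them; the CHECK constraint rejects any state outside the three named values.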

B. Governed write pipeline

candidate -> governance -> embed -> persist -> evidence

  1. Build memory candidate from run output.
  2. Apply governance checks:
    • category allow/forbid
    • PII scan
    • policy engine memory-write decision
    • conflict handling and trust scoring
  3. Route to pending_review when policy requires approval.
  4. Generate embedding only after governance pass.
  5. Persist memory entry with provenance.
  6. Emit an evidence record that links to the memory write.
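
The ordering of these steps can be sketched as a single function. This is an illustration under stated assumptions, not Talon's implementation: `Candidate`, `WriteResult`, `govern_write`, the `denied` decision label, and the email regex standing in for a PII scanner are all hypothetical. Conflict handling, trust scoring, and persistence are left out, and the sketch assumes entries routed to review are not embedded until approved.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Set

# Stand-in for a real PII scanner: flags anything that looks like an email.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


@dataclass
class Candidate:
    tenant_id: str
    agent_id: str
    category: str
    content: str


@dataclass
class WriteResult:
    decision: str  # "denied", "pending_review", or "active"
    reason: str
    embedding: Optional[List[float]] = None


def govern_write(
    cand: Candidate,
    allowed_categories: Set[str],
    requires_review: Callable[[Candidate], bool],
    embed: Callable[[str], List[float]],
    evidence: List[Dict[str, str]],
) -> WriteResult:
    """Run governance checks first; embed only if every check passes."""

    def finish(decision, reason, embedding=None):
        # Every outcome, including denials, leaves an evidence record.
        evidence.append({
            "tenant_id": cand.tenant_id,
            "agent_id": cand.agent_id,
            "decision": decision,
            "reason": reason,
        })
        return WriteResult(decision, reason, embedding)

    if cand.category not in allowed_categories:
        return finish("denied", "category_not_allowed")
    if EMAIL_RE.search(cand.content):
        return finish("denied", "pii_detected")
    if requires_review(cand):
        return finish("pending_review", "policy_requires_review")
    return finish("active", "ok", embed(cand.content))
```

Because `embed` is called only on the final branch, content that is denied or held for review never reaches the embedding step.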

C. Governed read pipeline

query -> retrieve -> filter -> rank -> inject -> evidence

  1. Build query embedding from request prompt.
  2. Retrieve top candidates in tenant/agent scope.
  3. Filter by:
    • state (active only for injection)
    • review status (exclude pending)
    • allowed categories
    • trust threshold
  4. Rank by semantic score + trust + recency + memory type weight.
  5. Inject a compact memory context with token budgeting.
  6. Record retrieved memory references in evidence for output traceability.
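
The filter and rank steps might look roughly like the sketch below. The `Entry` shape, the `category` field, the blend weights, the per-type weights, and the 30-day recency half-life are placeholder assumptions chosen only to make the example concrete; the proposal does not specify them.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Sequence, Set


@dataclass
class Entry:
    id: int
    state: str
    category: str
    memory_type: str
    trust_score: float
    age_days: float
    embedding: Sequence[float]


# Hypothetical per-type weights; unknown types fall back to 0.5.
TYPE_WEIGHT: Dict[str, float] = {"semantic": 1.0, "episodic": 0.7}


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def score(entry: Entry, query_vec: Sequence[float],
          half_life_days: float = 30.0) -> float:
    """Blend semantic score, trust, recency, and memory type weight."""
    recency = 0.5 ** (entry.age_days / half_life_days)
    type_w = TYPE_WEIGHT.get(entry.memory_type, 0.5)
    return (0.55 * cosine(query_vec, entry.embedding)
            + 0.20 * entry.trust_score
            + 0.15 * recency
            + 0.10 * type_w)


def select_for_injection(entries: Sequence[Entry], query_vec: Sequence[float],
                         allowed_categories: Set[str], min_trust: float,
                         k: int = 5) -> List[Entry]:
    """Apply governance filters first, then rank the survivors."""
    eligible = [
        e for e in entries
        if e.state == "active"  # also excludes pending_review and shadow
        and e.category in allowed_categories
        and e.trust_score >= min_trust
    ]
    ranked = sorted(eligible, key=lambda e: score(e, query_vec), reverse=True)
    return ranked[:k]
```

Filtering before ranking means an ineligible entry can never displace an eligible one, however high its similarity score.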

D. Injection point and budget

Reuse the current pre-LLM enrichment point in the runner flow. Keep a strict token cap (max_prompt_tokens) and inject only governed entries. This avoids architectural churn and preserves the existing policy-before-execution guarantees.
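
A minimal sketch of token budgeting at the injection point follows. It assumes `max_prompt_tokens` caps the injected memory block itself, and it uses a crude estimate of about four characters per token in place of a real tokenizer; `estimate_tokens` and `build_memory_context` are hypothetical names.

```python
from typing import List, Sequence, Tuple


def estimate_tokens(text: str) -> int:
    """Rough heuristic (about 4 characters per token), not a real tokenizer."""
    return max(1, (len(text) + 3) // 4)


def build_memory_context(ranked: Sequence[Tuple[int, str]],
                         max_prompt_tokens: int) -> Tuple[str, List[int]]:
    """Greedily pack ranked (entry_id, text) pairs under the token cap.

    Returns the context string and the ids actually injected, so the
    caller can record exactly those ids in evidence.
    """
    budget = max_prompt_tokens
    lines: List[str] = []
    used_ids: List[int] = []
    for entry_id, text in ranked:
        line = f"- {text}"
        cost = estimate_tokens(line)
        if cost > budget:
            # Skip entries that do not fit; a later, shorter one still may.
            continue
        lines.append(line)
        used_ids.append(entry_id)
        budget -= cost
    return "\n".join(lines), used_ids
```

Returning the injected ids alongside the text keeps evidence aligned with what the model actually saw, rather than with everything that was retrieved.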

E. Storage choice

For the MVP, stay SQLite-first to preserve Talon's single-binary, low-friction setup. Do not require Postgres/pgvector in the initial rollout. Keep future adapter points open, but avoid introducing operational dependencies now.
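
One way to do semantic retrieval without a vector extension is a brute-force cosine scan over tenant-scoped rows, sketched below. This is an assumption about how an SQLite-only MVP could work, not a description of Talon's code: the table layout, the `top_k` helper, and the float32 BLOB encoding are hypothetical, and a linear scan is only reasonable while the number of entries per tenant and agent stays small.

```python
import math
import sqlite3
from array import array


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def top_k(conn, tenant_id, agent_id, query_vec, k=5):
    """Score every active entry in scope and return the k best (id, score)."""
    rows = conn.execute(
        "SELECT id, embedding_data FROM memory_entries "
        "WHERE tenant_id = ? AND agent_id = ? "
        "AND state = 'active' AND embedding_data IS NOT NULL",
        (tenant_id, agent_id),
    )
    scored = []
    for entry_id, blob in rows:
        vec = array("f")
        vec.frombytes(blob)  # assumed float32 encoding
        scored.append((cosine(query_vec, vec), entry_id))
    scored.sort(reverse=True)
    return [(entry_id, s) for s, entry_id in scored[:k]]


# Tiny in-memory demo with two tenants.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE memory_entries (id INTEGER PRIMARY KEY, tenant_id TEXT, "
    "agent_id TEXT, state TEXT, embedding_data BLOB)"
)
demo_rows = [
    ("t1", "a1", "active", [1.0, 0.0]),
    ("t1", "a1", "active", [0.0, 1.0]),
    ("t1", "a1", "pending_review", [1.0, 0.0]),
    ("t2", "a1", "active", [1.0, 0.0]),
]
for tenant, agent, state, vec in demo_rows:
    conn.execute(
        "INSERT INTO memory_entries (tenant_id, agent_id, state, embedding_data) "
        "VALUES (?, ?, ?, ?)",
        (tenant, agent, state, array("f", vec).tobytes()),
    )
hits = top_k(conn, "t1", "a1", [1.0, 0.0], k=2)
```

The tenant and state predicates live in the SQL itself, so rows from other tenants and non-active rows are never loaded into the scoring loop.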

F. Out of scope (explicitly not building)

  • No generic memory platform abstraction.
  • No autonomous "remember everything" extraction.
  • No orchestration/pipeline engine.
  • No multi-store complexity in MVP.

This keeps Talon focused: policy-governed AI execution with auditable, constrained memory retrieval.

Labels: enhancement
