Problem / use case
Talon already provides strong governance for agent execution (policy checks, evidence, tenant scoping), but memory retrieval is still limited to lightweight index-based context injection. This creates a gap:
- Retrieved context quality is not strong enough for semantic recall (paraphrase and concept-level matching).
- Memory writes are governed, but memory read influence on final output is not fully traceable.
- Without a defined scope, memory can drift toward either too little utility to matter or more complexity than Talon's role warrants.
The goal is to add governed RAG capabilities that improve context quality while preserving Talon's core role as a governance/control plane, not a generic memory framework.
Success criteria:
- Better retrieval relevance than title/keyword-only methods.
- Full policy and audit controls on memory write and read paths.
- Clear tenant isolation and deletion/retention lifecycle.
- Minimal architecture changes that keep Talon single-binary and SQLite-first.
Compliance requirement (if applicable)
This feature is compliance-sensitive because memory persists model-influencing data over time.
GDPR and privacy
- PII at write time: Memory candidates can contain PII from model output, attachments, or user text. Memory write must remain blocked on detected PII unless policy allows a reviewed path.
- Data minimization: Only store what is needed for retrieval and audit; avoid persisting unnecessary raw content.
- Right to deletion: Memory must support deterministic removal workflows (retention, rollback/invalidation, targeted deletion strategy for DSAR use cases).
Auditability and AI Act transparency
- Traceability: For every response, the audit record should show which memory entries were retrieved and injected.
- Lineage: Memory entry provenance should include tenant, agent, source evidence, and lifecycle state.
- Integrity: Memory and evidence records should remain tamper-evident and queryable for investigations.
Tenant and governance boundaries
- Strict tenant isolation: Read/write queries must always be scoped by tenant_id.
- Governed reads, not just writes: Retrieval and prompt injection must be policy-constrained (category filters, trust thresholds, review state).
- Fail-closed defaults: Unknown config states should not allow unsafe memory injection.
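As an illustration of the fail-closed default, the sketch below resolves a read policy from raw config and falls back to "no injection" whenever a value is missing or malformed. It is written in Python for brevity only. The config keys (`mode`, `min_trust`, `allowed_categories`), the `MemoryReadPolicy` type, and the 0 to 1 trust range are assumptions made for this example, not existing Talon names.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional


@dataclass(frozen=True)
class MemoryReadPolicy:
    inject: bool
    min_trust: float
    allowed_categories: frozenset


# The safe fallback: nothing is injected.
DENY = MemoryReadPolicy(inject=False, min_trust=1.0, allowed_categories=frozenset())


def resolve_read_policy(raw: Optional[Mapping[str, Any]]) -> MemoryReadPolicy:
    """Return DENY unless every field is present and well-formed."""
    if not isinstance(raw, Mapping):
        return DENY
    if raw.get("mode") != "active":  # unknown or misspelled modes deny
        return DENY
    min_trust = raw.get("min_trust")
    if isinstance(min_trust, bool) or not isinstance(min_trust, (int, float)):
        return DENY
    if not 0.0 <= min_trust <= 1.0:  # assumed trust range
        return DENY
    cats = raw.get("allowed_categories")
    if not isinstance(cats, (list, tuple, set, frozenset)) or not cats:
        return DENY
    if not all(isinstance(c, str) for c in cats):
        return DENY
    return MemoryReadPolicy(True, float(min_trust), frozenset(cats))
```

The point of the design is that injection is only enabled by a fully valid config; a typo such as `"mode": "actve"` disables it rather than silently enabling a default.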
Proposed solution
Implement governed memory RAG as an extension of existing Talon components, not a new framework.
A. Minimal data model extension
Extend memory entries with the smallest useful additions:
- embedding_data (BLOB) for semantic retrieval.
- state (active | pending_review | shadow) as a normalized governance state.
Keep and reuse existing fields (tenant_id, agent_id, memory_type, trust_score, created_at, evidence_id, review and consolidation metadata).
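A minimal sketch of what this extension could look like as an idempotent SQLite migration, using Python's standard `sqlite3` module. The table name `memory_entries`, the `content` column, the index name, and the choice of `pending_review` as the default state for pre-existing rows are all assumptions for illustration; the real schema may differ. Embeddings are packed as native-endian float32 values, which is also an assumption.

```python
import sqlite3
from array import array


def migrate(conn: sqlite3.Connection) -> None:
    # Hypothetical existing table, reduced to the fields named above.
    conn.execute(
        """CREATE TABLE IF NOT EXISTS memory_entries (
               id TEXT PRIMARY KEY,
               tenant_id TEXT NOT NULL,
               agent_id TEXT NOT NULL,
               memory_type TEXT NOT NULL,
               trust_score REAL NOT NULL,
               created_at TEXT NOT NULL,
               evidence_id TEXT,
               content TEXT NOT NULL
           )"""
    )
    cols = {row[1] for row in conn.execute("PRAGMA table_info(memory_entries)")}
    if "embedding_data" not in cols:
        conn.execute("ALTER TABLE memory_entries ADD COLUMN embedding_data BLOB")
    if "state" not in cols:
        # Default to the most restrictive state so old rows are not injected
        # until they are explicitly activated (an assumption of this sketch).
        conn.execute(
            "ALTER TABLE memory_entries ADD COLUMN state TEXT NOT NULL "
            "DEFAULT 'pending_review' "
            "CHECK (state IN ('active', 'pending_review', 'shadow'))"
        )
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_memory_scope "
        "ON memory_entries (tenant_id, agent_id, state)"
    )


def pack_embedding(vec) -> bytes:
    """Serialize floats as float32 bytes for the BLOB column."""
    return array("f", vec).tobytes()


def unpack_embedding(blob: bytes) -> list:
    out = array("f")
    out.frombytes(blob)
    return list(out)
```

The CHECK constraint keeps `state` normalized at the storage layer, so a bug elsewhere cannot persist an unknown governance state.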
B. Governed write pipeline
candidate -> governance -> embed -> persist -> evidence
- Build memory candidate from run output.
- Apply governance checks:
  - category allow/forbid
  - PII scan
  - policy engine memory-write decision
  - conflict handling and trust scoring
- Route to pending_review when policy requires approval.
- Generate embedding only after governance pass.
- Persist memory entry with provenance.
- Emit evidence for memory write linkage.
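The ordering above can be sketched as a single function in which the embedding call is only reachable after every governance check has passed. This is a simplified illustration: the `Candidate` and `WriteResult` types, the `decide`/`embed`/`persist`/`emit_evidence` callables, and the email regex standing in for a real PII scanner are all hypothetical. Conflict handling and trust scoring are omitted, and PII is always blocked here, without the policy-approved review path.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Stand-in for a real PII scanner: matches email-like strings only.
PII_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


@dataclass
class Candidate:
    tenant_id: str
    agent_id: str
    category: str
    content: str
    evidence_id: str


@dataclass
class WriteResult:
    stored: bool
    state: Optional[str]
    reason: str


def governed_write(c, allowed_categories, decide, embed, persist, emit_evidence):
    """candidate -> governance -> embed -> persist -> evidence."""
    if c.category not in allowed_categories:
        result = WriteResult(False, None, "category_forbidden")
    elif PII_RE.search(c.content):
        result = WriteResult(False, None, "pii_detected")
    else:
        decision = decide(c)  # expected: "allow", "review", or "deny"
        if decision not in ("allow", "review"):
            # Unknown decisions are treated as deny (fail-closed).
            result = WriteResult(False, None, "policy_denied")
        else:
            state = "active" if decision == "allow" else "pending_review"
            persist(c, state, embed(c.content))  # embed only after the checks
            result = WriteResult(True, state, "ok")
    emit_evidence(c, result)  # evidence is emitted for rejections too
    return result
```

Emitting evidence on rejected candidates as well as stored ones is a choice of this sketch; it keeps blocked writes visible to auditors.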
C. Governed read pipeline
query -> retrieve -> filter -> rank -> inject -> evidence
- Build query embedding from request prompt.
- Retrieve top candidates in tenant/agent scope.
- Filter by:
  - state (active only for injection)
  - review status (exclude pending)
  - allowed categories
  - trust threshold
- Rank by semantic score + trust + recency + memory type weight.
- Inject a compact memory context with token budgeting.
- Record retrieved memory references in evidence for output traceability.
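The filter, rank, and evidence steps can be sketched as follows. The `Entry` type, the ranking weights (0.6 / 0.2 / 0.1 / 0.1), the 30-day recency decay, and the per-type weights are placeholder assumptions chosen for illustration, not tuned or proposed values.

```python
import math
from dataclasses import dataclass
from typing import List

# Hypothetical per-type weights; unknown types get 0.5.
TYPE_WEIGHT = {"fact": 1.0, "preference": 0.8}


@dataclass
class Entry:
    id: str
    tenant_id: str
    agent_id: str
    state: str
    review_status: str
    category: str
    memory_type: str
    trust_score: float
    age_days: float
    embedding: List[float]


def cosine(a, b) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def score(e: Entry, query_vec) -> float:
    recency = math.exp(-e.age_days / 30.0)  # assumed decay constant
    return (0.6 * cosine(query_vec, e.embedding)
            + 0.2 * e.trust_score
            + 0.1 * recency
            + 0.1 * TYPE_WEIGHT.get(e.memory_type, 0.5))


def retrieve(entries, tenant_id, agent_id, query_vec,
             allowed_categories, min_trust, top_k=3):
    """Filter to governed entries, rank them, and build an evidence record."""
    eligible = [
        e for e in entries
        if e.tenant_id == tenant_id and e.agent_id == agent_id
        and e.state == "active"
        and e.review_status != "pending"
        and e.category in allowed_categories
        and e.trust_score >= min_trust
    ]
    ranked = sorted(eligible, key=lambda e: score(e, query_vec), reverse=True)[:top_k]
    evidence = {
        "tenant_id": tenant_id,
        "agent_id": agent_id,
        "retrieved_memory_ids": [e.id for e in ranked],
    }
    return ranked, evidence
```

Filtering happens before ranking, so an ineligible entry can never reach the prompt on the strength of a high similarity score.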
D. Injection point and budget
Reuse the current pre-LLM enrichment point in the runner flow. Keep a strict token cap (max_prompt_tokens) and inject only governed entries. This avoids architectural churn and preserves the existing policy-before-execution guarantees.
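A rough sketch of the budgeting step, under two assumptions: that max_prompt_tokens caps the whole prompt, so memory only receives whatever the base prompt leaves unused, and that the snippets arrive already ranked. The function name and the whitespace-based token counter are stand-ins; a real implementation would use the target model's tokenizer.

```python
def build_memory_context(snippets, max_prompt_tokens, prompt_tokens,
                         count_tokens=lambda s: len(s.split())):
    """Greedily take ranked snippets until the remaining budget is spent."""
    budget = max_prompt_tokens - prompt_tokens
    chosen, used = [], 0
    for snippet in snippets:
        cost = count_tokens(snippet)
        if used + cost > budget:
            # Stop at the first snippet that does not fit, so a lower-ranked
            # entry never displaces a higher-ranked one.
            break
        chosen.append(snippet)
        used += cost
    return chosen, used
```

If the base prompt already meets or exceeds the cap, the budget is zero or negative and nothing is injected.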
E. Storage choice
For the MVP, stay SQLite-first to preserve Talon's single-binary and low-friction setup. Do not require Postgres/pgvector in the initial rollout. Keep future adapter points open, but avoid introducing operational dependencies now.
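To show that semantic retrieval does not strictly need a vector extension at MVP scale, here is a sketch of a brute-force cosine scan over embedding BLOBs in plain SQLite. It assumes a `memory_entries` table with `tenant_id`, `agent_id`, `state`, and `embedding_data` columns and float32-packed embeddings; those names and the encoding are illustrative. A linear scan like this is only reasonable while the per-tenant entry count stays small.

```python
import math
import sqlite3
from array import array


def cosine(a, b) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def top_k(conn: sqlite3.Connection, tenant_id: str, agent_id: str, query, k: int = 5):
    """Score every active entry in scope; return (score, id) pairs, best first."""
    rows = conn.execute(
        "SELECT id, embedding_data FROM memory_entries "
        "WHERE tenant_id = ? AND agent_id = ? "
        "AND state = 'active' AND embedding_data IS NOT NULL",
        (tenant_id, agent_id),  # scope is bound as parameters, never interpolated
    )
    scored = []
    for entry_id, blob in rows:
        vec = array("f")
        vec.frombytes(blob)  # decode float32 BLOB
        scored.append((cosine(query, vec), entry_id))
    scored.sort(reverse=True)
    return scored[:k]
```

Because the tenant and agent scope is part of the SQL WHERE clause, out-of-scope embeddings are never loaded into the process, let alone scored.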
F. Out of scope (explicitly not building)
- No generic memory platform abstraction.
- No autonomous "remember everything" extraction.
- No orchestration/pipeline engine.
- No multi-store complexity in MVP.
This keeps Talon focused: policy-governed AI execution with auditable, constrained memory retrieval.