Governed Memory (RAG) for Talon - Minimal Design

### Problem / use case


Talon already provides strong governance for agent execution (policy checks, evidence, tenant scoping), but memory retrieval is still limited to lightweight, index-based context injection. This leaves three gaps:

- Retrieval quality is too weak for semantic recall: it cannot match paraphrases or concept-level similarity.
- Memory writes are governed, but the influence of memory reads on the final output is not fully traceable.
- Without a defined scope, memory can drift toward either too little utility or an over-scoped, overly complex design.

The goal is to add governed RAG capabilities that improve context quality while preserving Talon's core role as a governance/control plane, not a generic memory framework.

Success criteria:

- Better retrieval relevance than title/keyword-only methods.
- Full policy and audit controls on memory write and read paths.
- Clear tenant isolation and deletion/retention lifecycle.
- Minimal architecture changes that keep Talon single-binary and SQLite-first.


### Compliance requirement (if applicable)


This feature is compliance-sensitive because memory persists model-influencing data over time.

### GDPR and privacy

- **PII at write time:** Memory candidates can contain PII from model output, attachments, or user text. Memory writes must remain blocked on detected PII unless policy allows a reviewed path.
- **Data minimization:** Only store what is needed for retrieval and audit; avoid persisting unnecessary raw content.
- **Right to deletion:** Memory must support deterministic removal workflows (retention, rollback/invalidation, targeted deletion strategy for DSAR use cases).

### Auditability and AI Act transparency

- **Traceability:** For every response, it should be possible to audit which memory entries were retrieved and injected.
- **Lineage:** Memory entry provenance should include tenant, agent, source evidence, and lifecycle state.
- **Integrity:** Memory and evidence records should remain tamper-evident and queryable for investigations.

### Tenant and governance boundaries

- **Strict tenant isolation:** Read/write queries must always be scoped by `tenant_id`.
- **Governed reads, not just writes:** Retrieval and prompt injection must be policy-constrained (category filters, trust thresholds, review state).
- **Fail-closed defaults:** Unknown config states should not allow unsafe memory injection.
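
The isolation and fail-closed requirements above can be sketched in a few lines of Python, used here only for illustration. The table name `memory_entries`, the `content` column, and the `memory_injection` config key are assumptions, not Talon's actual schema or configuration.

```python
import sqlite3

def injection_enabled(config):
    """Fail closed: only the exact known value enables memory injection."""
    # "memory_injection" is a hypothetical config key.
    return config.get("memory_injection") == "enabled"

def fetch_active(conn, tenant_id):
    """Every read is scoped by tenant_id; a missing tenant is an error."""
    if not tenant_id:
        raise ValueError("tenant_id is required for memory reads")
    return conn.execute(
        "SELECT id, content FROM memory_entries "
        "WHERE tenant_id = ? AND state = 'active' ORDER BY id",
        (tenant_id,),
    ).fetchall()
```

Routing every read through a helper that takes `tenant_id` as a required argument makes an unscoped query an error rather than a silent leak, and a missing or misspelled config value leaves injection off.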

### Proposed solution


Implement governed memory RAG as an extension of existing Talon components, not a new framework.

### A. Minimal data model extension

Extend memory entries with the smallest useful additions:

- `embedding_data` (BLOB) for semantic retrieval.
- `state` (`active | pending_review | shadow`) as a normalized governance state.

Keep and reuse existing fields (`tenant_id`, `agent_id`, `memory_type`, `trust_score`, `created_at`, `evidence_id`, review and consolidation metadata).
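
As a rough illustration, the two additions could be applied with an idempotent SQLite migration like the one below. The table name `memory_entries`, the float32 encoding of the embedding, and the choice to default existing rows to `pending_review` are assumptions made for this sketch; the default in particular is a fail-closed choice that a real migration may want to derive from existing review metadata instead.

```python
import sqlite3
import struct

def migrate(conn):
    """Add the two proposed columns to a hypothetical memory_entries table."""
    existing = {row[1] for row in conn.execute("PRAGMA table_info(memory_entries)")}
    if "embedding_data" not in existing:
        conn.execute("ALTER TABLE memory_entries ADD COLUMN embedding_data BLOB")
    if "state" not in existing:
        conn.execute(
            "ALTER TABLE memory_entries ADD COLUMN state TEXT NOT NULL "
            "DEFAULT 'pending_review' "
            "CHECK (state IN ('active', 'pending_review', 'shadow'))"
        )

def pack_embedding(vec):
    """Encode floats as little-endian float32 bytes for the BLOB column."""
    return struct.pack(f"<{len(vec)}f", *vec)

def unpack_embedding(blob):
    """Decode a BLOB written by pack_embedding back into a list of floats."""
    return list(struct.unpack(f"<{len(blob) // 4}f", blob))
```

The `CHECK` constraint keeps `state` normalized at the storage layer, so an unknown state cannot be written even if application code has a bug.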

### B. Governed write pipeline

`candidate -> governance -> embed -> persist -> evidence`

1. Build memory candidate from run output.
2. Apply governance checks:
   - category allow/forbid
   - PII scan
   - policy engine memory-write decision
   - conflict handling and trust scoring
3. Route to `pending_review` when policy requires approval.
4. Generate embedding only after governance pass.
5. Persist memory entry with provenance.
6. Emit evidence for memory write linkage.
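
The steps above can be sketched as a single function. This is a simplified Python illustration, not Talon's implementation: the email regex stands in for the real PII scanner, `WritePolicy` and the `embed`, `store`, and `evidence` callables are hypothetical, and the policy-engine decision, conflict handling, and trust scoring are omitted. It also assumes one reading of step 4, namely that entries routed to `pending_review` are stored without an embedding until they are approved.

```python
import re
from dataclasses import dataclass, field

# Assumption: a toy email regex stands in for the real PII scanner.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

@dataclass
class WritePolicy:
    allowed_categories: set
    review_required: set = field(default_factory=set)

def govern_write(candidate, policy, embed, store, evidence):
    """candidate -> governance -> embed -> persist -> evidence."""
    if candidate["category"] not in policy.allowed_categories:
        return "denied:category"
    if EMAIL_RE.search(candidate["content"]):
        return "denied:pii"
    needs_review = candidate["category"] in policy.review_required
    state = "pending_review" if needs_review else "active"
    # Embed only after governance has fully passed (assumed reading of step 4).
    embedding = embed(candidate["content"]) if state == "active" else None
    entry_id = store(dict(candidate, state=state, embedding=embedding))
    evidence({"event": "memory_write", "memory_id": entry_id,
              "tenant_id": candidate["tenant_id"], "state": state})
    return state
```

Denied candidates never reach the embedding call or the store, so rejected content is neither sent to an embedding provider nor persisted.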

### C. Governed read pipeline

`query -> retrieve -> filter -> rank -> inject -> evidence`

1. Build query embedding from request prompt.
2. Retrieve top candidates in tenant/agent scope.
3. Filter by:
   - state (`active` only for injection)
   - review status (exclude pending)
   - allowed categories
   - trust threshold
4. Rank by a combined score of semantic similarity, trust, recency, and memory-type weight.
5. Inject a compact memory context with token budgeting.
6. Record retrieved memory references in evidence for output traceability.
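
A minimal Python sketch of the filter, rank, and evidence steps follows. The weights, the 30-day recency half-life, the example memory types, and a trust score on a 0 to 1 scale are all assumptions; the source names the four ranking signals but not how they are combined. Each entry is assumed to already carry a `similarity` value from the retrieval step.

```python
import time

# Assumed weights and half-life; only the four signals come from the design.
WEIGHTS = {"semantic": 0.6, "trust": 0.2, "recency": 0.1, "type": 0.1}
TYPE_WEIGHT = {"semantic": 1.0, "episodic": 0.5}  # hypothetical memory types
HALF_LIFE_SECONDS = 30 * 24 * 3600

def score(entry, now):
    """Weighted sum of similarity, trust, recency decay, and type weight."""
    age = max(0.0, now - entry["created_at"])
    recency = 0.5 ** (age / HALF_LIFE_SECONDS)
    return (WEIGHTS["semantic"] * entry["similarity"]
            + WEIGHTS["trust"] * entry["trust_score"]
            + WEIGHTS["recency"] * recency
            + WEIGHTS["type"] * TYPE_WEIGHT.get(entry["memory_type"], 0.0))

def select_for_injection(entries, tenant_id, min_trust, top_k, evidence, now=None):
    """Filter to governed entries, rank them, and record what was selected."""
    now = time.time() if now is None else now
    eligible = [e for e in entries
                if e["tenant_id"] == tenant_id
                and e["state"] == "active"
                and e["trust_score"] >= min_trust]
    ranked = sorted(eligible, key=lambda e: score(e, now), reverse=True)[:top_k]
    evidence({"event": "memory_read", "tenant_id": tenant_id,
              "memory_ids": [e["id"] for e in ranked]})
    return ranked
```

The evidence record is emitted even when nothing is selected, so an empty `memory_ids` list is itself an auditable fact about the response.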

### D. Injection point and budget

Reuse the current pre-LLM enrichment point in the runner flow. Keep a strict token cap (`max_prompt_tokens`) and inject only governed entries. This avoids architectural churn and preserves the existing policy-before-execution guarantees.
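
One way to enforce the cap is a greedy fill in rank order, sketched below in Python. The function name and the whitespace-based token counter are assumptions (a real tokenizer would be passed in as `count_tokens`), and the sketch treats `max_prompt_tokens` as the budget available to the memory context alone.

```python
def build_memory_context(ranked_entries, max_prompt_tokens, count_tokens=None):
    """Greedily add entries in rank order without exceeding the token cap."""
    # Crude assumption: whitespace-separated words approximate tokens.
    count = count_tokens or (lambda text: len(text.split()))
    lines, injected_ids, used = [], [], 0
    for entry in ranked_entries:
        line = f"- {entry['content']}"
        cost = count(line)
        if used + cost > max_prompt_tokens:
            continue  # does not fit; a smaller, lower-ranked entry still may
        lines.append(line)
        injected_ids.append(entry["id"])
        used += cost
    return "\n".join(lines), injected_ids
```

Returning the injected IDs alongside the text lets the caller record exactly what reached the prompt, which can differ from what was retrieved once the budget is applied.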

### E. Storage choice

For the MVP, stay SQLite-first to preserve Talon's single-binary, low-friction setup. Do not require Postgres/pgvector in the initial rollout. Keep future adapter points open, but avoid introducing operational dependencies now.
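
To show that semantic retrieval is feasible without a vector extension, the Python sketch below registers a cosine-similarity function with SQLite and runs a brute-force, tenant-scoped scan over the embedding BLOBs. The table name `memory_entries`, the float32 encoding, and the function names are assumptions; a linear scan is only reasonable while the number of entries per tenant stays small.

```python
import math
import sqlite3
import struct

def _unpack(blob):
    return struct.unpack(f"<{len(blob) // 4}f", blob)

def cosine_sim(blob_a, blob_b):
    """Cosine similarity of two float32 BLOBs; None if they are not comparable."""
    if blob_a is None or blob_b is None:
        return None
    a, b = _unpack(blob_a), _unpack(blob_b)
    if len(a) != len(b):
        return None
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(conn, tenant_id, query_vec, k):
    """Brute-force nearest neighbours among one tenant's active entries."""
    conn.create_function("cosine_sim", 2, cosine_sim)
    query_blob = struct.pack(f"<{len(query_vec)}f", *query_vec)
    return conn.execute(
        "SELECT id, cosine_sim(embedding_data, ?) AS score "
        "FROM memory_entries "
        "WHERE tenant_id = ? AND state = 'active' AND embedding_data IS NOT NULL "
        "ORDER BY score DESC LIMIT ?",
        (query_blob, tenant_id, k),
    ).fetchall()
```

Because the tenant and state filters run in the same SQL statement as the similarity scan, entries outside the caller's scope are never scored at all.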

### F. Out of scope (explicitly not building)

- No generic memory platform abstraction.
- No autonomous "remember everything" extraction.
- No orchestration/pipeline engine.
- No multi-store complexity in MVP.

This keeps Talon focused: policy-governed AI execution with auditable, constrained memory retrieval.
