Add VIBE√ safety check to /cq:reflect (#240) by dni138 · Pull Request #270 · mozilla-ai/cq

dni138 · 2026-04-13T17:55:16Z

Summary

Implements a first-cut spec for a VIBE√ (Vulnerabilities, Impact, Biases, Edge cases) safety check inside plugins/cq/commands/cq-reflect.md, addressing #240.

This is a markdown-only change intended to seed discussion before any code lands. The shape of the proposal:

The VIBE√ subsection of Step 3 defines the four dimensions the agent uses as its mental model for evaluating candidates.
Step 3.5 classifies each finding as either a hard finding (a sanitized rewrite is generated) or a soft concern (a one-line flag). No candidate is ever auto-dropped: /cq:reflect writes to the user's local cq tier, and the user owns the storage decision.
Step 4 presents each candidate using one of three templates: clean, soft-flagged, or hard finding with both versions shown. For a hard finding, the user replies N original or N sanitized, where N is the candidate's number.
Step 7 records VIBE√ provenance per stored ID (clean | soft | sanitized | original) for an honest audit trail.
Edge cases include a fallback for when no coherent sanitized rewrite is possible.
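The spec expresses this flow as prose instructions for the agent rather than as code. Purely as an illustration, and assuming a Python data model that the PR does not define (`Severity`, `Finding`, `Candidate`, `template`, `provenance`, `parse_selection`, and the comma-separated reply format are all hypothetical), the path from Step 3.5 findings to Step 4 templates and Step 7 provenance tags might be sketched as:

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Severity(Enum):
    HARD = "hard"  # Step 3.5: a sanitized rewrite is generated next to the original
    SOFT = "soft"  # Step 3.5: a one-line flag, no rewrite


@dataclass
class Finding:
    dimension: str  # "V", "I", "B" or "E" (hypothetical encoding)
    severity: Severity
    concern: str    # the one-line text shown to the user


@dataclass
class Candidate:
    number: int
    text: str
    findings: list = field(default_factory=list)
    sanitized: Optional[str] = None  # set only when a hard finding produced a rewrite


def template(c: Candidate) -> str:
    """Pick the Step 4 template: 'hard' (both versions), 'soft' (flagged) or 'clean'."""
    if any(f.severity is Severity.HARD for f in c.findings):
        return "hard"
    return "soft" if c.findings else "clean"


def provenance(c: Candidate, choice: Optional[str]) -> str:
    """Return the Step 7 tag: clean | soft | sanitized | original."""
    kind = template(c)
    if kind != "hard":
        return kind  # no version choice needed; the candidate is stored as-is
    if choice not in ("original", "sanitized"):
        raise ValueError(f"candidate {c.number}: reply 'original' or 'sanitized'")
    return choice  # the user may keep the original; nothing is auto-dropped


def parse_selection(reply: str) -> dict:
    """Parse a hypothetical reply such as '1, 2, 3 original' into {number: choice}."""
    picks = {}
    for part in reply.split(","):
        tokens = part.split()
        if tokens:
            picks[int(tokens[0])] = tokens[1] if len(tokens) > 1 else None
    return picks


candidates = {
    1: Candidate(1, "Run pytest -x to stop at the first failure"),
    2: Candidate(2, "Always use Vendor X for storage",
                 [Finding("B", Severity.SOFT, "vendor preference stated as a rule")]),
    3: Candidate(3, "Authenticate with sk-test12345678",
                 [Finding("V", Severity.HARD, "literal API key")],
                 sanitized="Authenticate with <API_KEY>"),
}

audit = {n: provenance(candidates[n], choice)
         for n, choice in parse_selection("1, 2, 3 original").items()}
print(audit)  # {1: 'clean', 2: 'soft', 3: 'original'}
```

Nothing in the sketch removes a candidate: a hard finding only obliges the user to name a version, which matches the decision to leave commons protection to a later graduation step.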

Design context

The conversation behind this PR explored three paths from #240:

Pre-summary filter — rejected: filters upstream of where leak risk crystallizes; risks over-filtering.
Post-candidate gate — adopted: acts on the smallest, most structured artifact that would actually leave the machine.
Standalone pre-flight tool — deferred: opt-in safety is weak safety; better suited to a future tier-promotion / graduation flow.

A key architectural choice: /cq:reflect only proposes to the local tier per docs/CQ-Proposal.md §3.5, so commons protection actually belongs at the graduation step (not here). That's why this spec never auto-drops — the user may legitimately want certain candidates in their local store, even if those same candidates should never reach commons.

Discussion points

These are the open questions worth working through in review before any of this should be considered settled:

Should V/I/B/E letters appear in the user-facing flag? Currently no — the user sees ⚠️ {concern} without categorization. Trade-off: cleaner output vs. easier triage at scale when candidate lists get long.
Would a UI element / triage interface help reviewers? The current spec assumes terminal output. Larger candidate sets with mixed clean/soft/hard findings could benefit from richer affordances (filter by tier, side-by-side diff for the original/sanitized comparison, etc.).
The "never drop" posture. This deliberately leaves commons protection to a future graduation gate. Want to confirm the team agrees this is the right place to draw the line — alternatives include a hard-drop fallback for the most egregious categories (raw credentials), or a separate pre-flight skill.
PII list framing. The hard-findings list is illustrative ("real names, email addresses, phone numbers, government IDs, physical addresses") rather than tied to specific regulated categories (GDPR special categories, HIPAA PHI, etc.). We aren't lawyers; making narrower legal claims felt risky. Worth a sanity check.
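To make the first trade-off concrete, here is a hypothetical rendering helper contrasting the current uncategorized flag with a letter-tagged variant, plus the per-dimension count that tagging would enable for long candidate lists. The names `format_flag`, `triage_summary`, and the bracketed `[B]` style are illustrative assumptions, not part of the spec:

```python
from collections import Counter
from typing import Optional

# Letters for the four VIBE√ dimensions (hypothetical encoding).
DIMENSIONS = {"V": "Vulnerabilities", "I": "Impact", "B": "Biases", "E": "Edge cases"}


def format_flag(concern: str, dimension: Optional[str] = None) -> str:
    """Render a soft-concern flag, optionally tagged with its VIBE√ letter."""
    if dimension is None:
        return f"⚠️ {concern}"  # current spec: no categorization
    if dimension not in DIMENSIONS:
        raise ValueError(f"unknown VIBE√ dimension: {dimension}")
    return f"⚠️ [{dimension}] {concern}"  # alternative under discussion


def triage_summary(flags: list) -> str:
    """Count (dimension, concern) flags per dimension, in V/I/B/E order."""
    counts = Counter(dim for dim, _ in flags)
    return " ".join(f"{d}:{counts[d]}" for d in DIMENSIONS if counts[d])


print(format_flag("vendor preference stated as a rule"))
print(format_flag("vendor preference stated as a rule", "B"))
print(triage_summary([("B", "vendor rule"), ("E", "untested path"), ("B", "tool bias")]))
```

The tagged form costs a few characters per line but lets a reviewer see at a glance which dimension dominates a long list.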

Test plan

Re-read plugins/cq/commands/cq-reflect.md end-to-end and confirm the 7+1 step flow still reads coherently with Step 3.5 inserted.
Mental dry-run with three synthetic sessions: (a) routine session with no candidates (Step 3.5 is a no-op), (b) a candidate containing a literal API key (presented with both original and sanitized; user can still pick original for local storage), (c) a candidate framed as "always use Vendor X" (presented as soft-flagged, no rewrite).
Run /cq:reflect against a real session post-merge to confirm an agent reads and applies Step 3.5 in practice.
Resolve the four discussion points above before treating this as final.
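The three synthetic sessions from the dry-run could also be written down as fixtures. This is a toy harness under stated assumptions: the regex heuristics (`API_KEY`, `ABSOLUTE_VENDOR_RULE`) and the `review` and `dry_run` functions are hypothetical stand-ins for the agent's judgment, which the spec leaves to the model reading cq-reflect.md, and the `sk-`/`pk-` key shape is only an example.

```python
import re

# Hypothetical heuristics standing in for the agent's judgment; the real
# Step 3.5 relies on the model following the spec, not on regexes.
API_KEY = re.compile(r"\b(?:sk|pk)-[A-Za-z0-9]{8,}\b")
ABSOLUTE_VENDOR_RULE = re.compile(r"\balways use\b", re.IGNORECASE)


def review(candidate: str) -> dict:
    """Classify one candidate as clean, soft, or hard (with a sanitized rewrite)."""
    if API_KEY.search(candidate):
        # Hard finding: keep the original and offer a sanitized version beside it.
        return {
            "kind": "hard",
            "original": candidate,
            "sanitized": API_KEY.sub("<REDACTED_API_KEY>", candidate),
        }
    if ABSOLUTE_VENDOR_RULE.search(candidate):
        # Soft concern: flag only, no rewrite.
        return {"kind": "soft", "original": candidate,
                "flag": "⚠️ stated as an absolute vendor preference"}
    return {"kind": "clean", "original": candidate}


def dry_run(session: list) -> list:
    # Step 3.5 is a no-op when the session produced no candidates.
    return [review(c) for c in session]


sessions = {
    "a_routine": [],
    "b_api_key": ["Call the billing API with sk-live12345678abcd in the header"],
    "c_vendor": ["Always use Vendor X for object storage"],
}

results = {name: dry_run(s) for name, s in sessions.items()}
for name, outcome in results.items():
    print(name, outcome)
```

Session (b) still carries the original text, so the user can pick it for local storage as the test plan expects.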

🤖 Generated with Claude Code

Inserts a Step 3.5 gate that evaluates each candidate against four safety dimensions (Vulnerabilities, Impact, Biases, Edge cases). Hard findings get a sanitized rewrite generated alongside the original; the user picks which (if either) to store. Soft concerns are flagged for awareness without modification. Step 7 records VIBE√ provenance per stored ID so the audit trail reflects each version chosen. The skill never auto-drops, since /cq:reflect writes to the local cq tier and commons exposure happens at graduation. Markdown-only change intended as a discussion starter.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

dni138 requested a review from peteski22 April 13, 2026 17:55

dni138 self-assigned this Apr 13, 2026

peteski22 force-pushed the vibe-check-reflect-240 branch from 573e26b to 86f4bad April 17, 2026 10:26


dni138 wants to merge 1 commit into main from vibe-check-reflect-240

dni138 commented Apr 13, 2026


1 participant
