Add VIBE√ safety check to /cq:reflect (#240) #270

Open
dni138 wants to merge 1 commit into main from vibe-check-reflect-240
dni138 (Contributor) commented Apr 13, 2026

Summary

Implements a first-cut spec for a VIBE√ (Vulnerabilities, Impact, Biases, Edge cases) safety check inside plugins/cq/commands/cq-reflect.md, addressing #240.

This is a markdown-only change intended to seed discussion before any code lands. The shape of the proposal:

  • Step 3 / VIBE√ subsection defines the four dimensions as the agent's mental model for evaluating candidates.
  • Step 3.5 classifies each finding as a hard finding (sanitized rewrite generated) or soft concern (one-line flag). No candidate is ever auto-dropped: /cq:reflect writes to the user's local cq tier, and the user owns the storage decision.
  • Step 4 presents three candidate templates (clean / soft-flagged / hard-finding-with-both-versions). For hard findings the user picks N original or N sanitized.
  • Step 7 records VIBE√ provenance per stored ID (clean | soft | sanitized | original) for an honest audit trail.
  • Edge cases include a fallback for when no coherent sanitized rewrite is possible.
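
To make the flow above easier to discuss, here is a minimal Python sketch of how the three candidate templates (Step 4) and the four provenance values (Step 7) could fit together. Since this PR is markdown-only, nothing here is code from the change: `Severity`, `Candidate`, `render`, and `provenance` are hypothetical names, and the output wording, including the fallback line for a missing rewrite, is an assumption rather than the text in cq-reflect.md.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Severity(Enum):
    """Outcome of the Step 3.5 check for one candidate (hypothetical names)."""
    CLEAN = "clean"
    SOFT = "soft"   # one-line flag, no rewrite
    HARD = "hard"   # sanitized rewrite offered next to the original


@dataclass
class Candidate:
    number: int
    text: str
    severity: Severity = Severity.CLEAN
    concern: Optional[str] = None    # flag text for soft and hard findings
    sanitized: Optional[str] = None  # rewrite, hard findings only


def render(c: Candidate) -> str:
    """Step 4: pick one of three templates. The wording is illustrative."""
    if c.severity is Severity.CLEAN:
        return f"{c.number}. {c.text}"
    if c.severity is Severity.SOFT:
        return f"{c.number}. {c.text}\n   ⚠️ {c.concern}"
    lines = [f"{c.number} original: {c.text}", f"   ⚠️ {c.concern}"]
    if c.sanitized is not None:
        lines.append(f"{c.number} sanitized: {c.sanitized}")
    else:
        # Assumed fallback for when no coherent rewrite is possible.
        lines.append("   (no coherent sanitized rewrite available)")
    return "\n".join(lines)


def provenance(c: Candidate, choice: str = "original") -> str:
    """Step 7: map a stored candidate to clean | soft | sanitized | original."""
    if c.severity is Severity.CLEAN:
        return "clean"
    if c.severity is Severity.SOFT:
        return "soft"
    if choice == "sanitized":
        if c.sanitized is None:
            raise ValueError("no sanitized rewrite exists for this candidate")
        return "sanitized"
    if choice == "original":
        return "original"
    raise ValueError(f"unknown choice: {choice!r}")


if __name__ == "__main__":
    hard = Candidate(2, "token=abc123", Severity.HARD,
                     "literal credential", "token=<REDACTED>")
    print(render(hard))
    print(provenance(hard, "sanitized"))
```

Note that no branch in the sketch drops a candidate: a hard finding can still be stored as `original`, which matches the never-auto-drop posture described above.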

Design context

The conversation behind this PR explored three paths from #240:

  1. Pre-summary filter — rejected: filters upstream of where leak risk crystallizes; risks over-filtering.
  2. Post-candidate gate — adopted: acts on the smallest, most structured artifact that would actually leave the machine.
  3. Standalone pre-flight tool — deferred: opt-in safety is weak safety; better suited to a future tier-promotion / graduation flow.

A key architectural choice: /cq:reflect only proposes to the local tier per docs/CQ-Proposal.md §3.5, so commons protection actually belongs at the graduation step (not here). That's why this spec never auto-drops — the user may legitimately want certain candidates in their local store, even if those same candidates should never reach commons.

Discussion points

These are the open questions worth working through in review before any of this should be considered settled:

  1. Should V/I/B/E letters appear in the user-facing flag? Currently no — the user sees ⚠️ {concern} without categorization. Trade-off: cleaner output vs. easier triage at scale when candidate lists get long.
  2. Would a UI element / triage interface help reviewers? The current spec assumes terminal output. Larger candidate sets with mixed clean/soft/hard findings could benefit from richer affordances (filter by tier, side-by-side diff for the original/sanitized comparison, etc.).
  3. The "never drop" posture. This deliberately leaves commons protection to a future graduation gate. Want to confirm the team agrees this is the right place to draw the line — alternatives include a hard-drop fallback for the most egregious categories (raw credentials), or a separate pre-flight skill.
  4. PII list framing. The hard-findings list is illustrative ("real names, email addresses, phone numbers, government IDs, physical addresses") rather than tied to specific regulated categories (GDPR special categories, HIPAA PHI, etc.). We aren't lawyers; making narrower legal claims felt risky. Worth a sanity check.

Test plan

  • Re-read plugins/cq/commands/cq-reflect.md end-to-end and confirm the 7+1 step flow still reads coherently with Step 3.5 inserted.
  • Mental dry-run with three synthetic sessions: (a) routine session with no candidates (Step 3.5 is a no-op), (b) a candidate containing a literal API key (presented with both original and sanitized; user can still pick original for local storage), (c) a candidate framed as "always use Vendor X" (presented as soft-flagged, no rewrite).
  • Run /cq:reflect against a real session post-merge to confirm an agent reads and applies Step 3.5 in practice.
  • Resolve the four discussion points above before treating this as final.

🤖 Generated with Claude Code

@dni138 dni138 requested a review from peteski22 April 13, 2026 17:55
@dni138 dni138 self-assigned this Apr 13, 2026
Inserts a Step 3.5 gate that evaluates each candidate against four
safety dimensions (Vulnerabilities, Impact, Biases, Edge cases). Hard
findings get a sanitized rewrite generated alongside the original; the
user picks which (if either) to store. Soft concerns are flagged for
awareness without modification. Step 7 records VIBE√ provenance per
stored ID so the audit trail reflects each version chosen.

The skill never auto-drops, since /cq:reflect writes to the local cq
tier and commons exposure happens at graduation. Markdown-only change
intended as a discussion starter.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@peteski22 peteski22 force-pushed the vibe-check-reflect-240 branch from 573e26b to 86f4bad Compare April 17, 2026 10:26