Security: mataeil/OODA-loop

Security Policy — OODA-loop

Reporting Vulnerabilities

Do not open public GitHub issues for security bugs.

Security vulnerabilities in OODA-loop should be reported privately via GitHub Security Advisories.

Include a description of the issue, steps to reproduce, and the potential impact. We practice responsible disclosure: reporters will receive acknowledgment within 48 hours and a resolution timeline within 7 days. We ask that you hold off on public disclosure until a fix is available.


Threat Model

OODA-loop is an autonomous AI agent that can read your codebase, open pull requests, run deployment workflows, and at Level 3 merge PRs without human approval. This capability creates a threat surface that differs from ordinary software. The threats and mitigations below reflect deliberate design decisions.

Threat 1: Self-Modification

Risk. The agent modifies its own safety rules, contracts, or core engine (agent/safety/*, skills/evolve/*, agent/contracts/*), removing guardrails that govern its own behavior.

Mitigation. Those paths are declared as protected_paths in config. Any PR that touches protected paths requires explicit human review and cannot be auto-merged, regardless of the current complexity level. The agent cannot modify its own policy files without human approval.


Threat 2: Secret Exposure

Risk. The agent stages API keys, tokens, or other secrets into a git commit and exposes them in the repository or PR diff.

Mitigation.

  • git add -A is forbidden in all skills. Only explicitly named files may be staged.
  • config.json (which holds runtime config) is in .gitignore and must never be committed.
  • config.example.json uses $ENV_VAR placeholders for all credentials. The runtime config resolves those references from environment variables at load time — secrets never appear as literals in config files.
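As a rough illustration of load-time resolution, the Python sketch below walks a parsed config and swaps any string of the form $NAME for the matching environment variable. The whole-value placeholder syntax, the function name resolve_env_placeholders, and the choice to fail on an unset variable are assumptions for illustration, not OODA-loop's actual loader.

```python
import os
import re
from typing import Any, Mapping

# Sketch only: assumes a placeholder is an entire string value such as "$GITHUB_TOKEN".
# The function name and fail-on-missing behavior are hypothetical, not OODA-loop's loader.
_PLACEHOLDER = re.compile(r"\$([A-Za-z_][A-Za-z0-9_]*)")


def resolve_env_placeholders(value: Any, env: Mapping[str, str] = os.environ) -> Any:
    """Recursively replace "$NAME" strings with values taken from env."""
    if isinstance(value, dict):
        return {key: resolve_env_placeholders(item, env) for key, item in value.items()}
    if isinstance(value, list):
        return [resolve_env_placeholders(item, env) for item in value]
    if isinstance(value, str):
        match = _PLACEHOLDER.fullmatch(value)
        if match:
            name = match.group(1)
            if name not in env:
                # Fail loudly at load time rather than pass a literal "$NAME" along.
                raise KeyError(f"environment variable {name} is not set")
            return env[name]
    return value
```

Failing on a missing variable means a misconfigured environment stops the agent before any cycle runs, instead of sending a literal placeholder to an API. GITHUB_TOKEN above is a hypothetical key name.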

Threat 3: Unbounded Autonomous Action

Risk. The agent creates excessive pull requests, deploys broken code repeatedly, or accumulates large API costs without operator awareness.

Mitigation.

  • Maximum 1 PR per cycle (max_prs_per_cycle: 1).
  • Maximum 20 files and 500 lines per PR (max_files_per_pr, max_lines_per_pr). Exceeding either limit escalates the PR to Level 3 review, which requires human approval.
  • Minimum 30-minute interval between cycles (min_cycle_interval_minutes: 30).
  • Daily API cost cap of $10 USD by default (cost.daily_limit_usd). A warning fires at 80% of the limit; a hard stop fires when the limit is reached.

Threat 4: Confidence Manipulation

Risk. The agent inflates its own confidence scores to exceed the confidence_threshold, causing it to act on domains where its track record is poor.

Mitigation. Confidence updates are formula-driven and non-negotiable:

  • Merged PR: +0.1 (confidence.merge_boost)
  • Rejected PR: -0.2 (confidence.reject_penalty)
  • All values are clamped to the range [0.1, 1.0].

The agent cannot set confidence scores directly. The formulas are applied by the evolve engine after each PR outcome, not by skill logic.
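The update rule is small enough to write out. The Python sketch below applies the merge and reject deltas, clamps the result, and adds the action gate from safety.confidence_threshold (described under Confidence Threshold). The function names are hypothetical; only the constants come from this policy.

```python
# Sketch only: constants mirror this policy's defaults; function names are hypothetical.
MERGE_BOOST = 0.1      # confidence.merge_boost
REJECT_PENALTY = 0.2   # confidence.reject_penalty
MIN_CONFIDENCE = 0.1
MAX_CONFIDENCE = 1.0


def update_confidence(current: float, outcome: str) -> float:
    """Apply the fixed update for a PR outcome, then clamp to [0.1, 1.0]."""
    if outcome == "merged":
        updated = current + MERGE_BOOST
    elif outcome == "rejected":
        updated = current - REJECT_PENALTY
    else:
        raise ValueError(f"unknown PR outcome: {outcome!r}")
    return max(MIN_CONFIDENCE, min(MAX_CONFIDENCE, updated))


def may_act(confidence: float, threshold: float = 0.6) -> bool:
    """Actions are skipped when confidence is below safety.confidence_threshold."""
    return confidence >= threshold
```

Because a rejection costs twice what a merge earns, a domain at 0.6 that has one PR rejected drops to 0.4 and needs two merges to get back to the threshold.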


Threat 5: HALT Bypass

Risk. The agent ignores the HALT file, overwrites it, or deletes it as part of cleanup logic, removing the operator's ability to stop the agent immediately.

Mitigation.

  • The HALT check is Step 0 of every cycle — it runs before any observation, strategy, or action logic. There is no code path that bypasses it.
  • The check is a simple file-existence test (test -f agent/safety/HALT), not a complex condition that could be circumvented by edge cases.
  • agent/safety/* is a protected path. The agent cannot open a PR that deletes or modifies the HALT file without triggering Level 3 review.
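A minimal Python sketch of the Step 0 ordering: the existence test runs before any other work, and os.path.isfile mirrors test -f by returning true only for a regular file. The run_cycle function and its observe/decide/act callables are hypothetical placeholders, not the evolve engine's real structure.

```python
import os

# Sketch only: run_cycle and its callables are hypothetical stand-ins for the evolve steps.
HALT_PATH = "agent/safety/HALT"


def halt_requested(path: str = HALT_PATH) -> bool:
    """Equivalent of `test -f`: true if a regular file exists, whatever its content."""
    return os.path.isfile(path)


def run_cycle(observe, decide, act, halt_path: str = HALT_PATH) -> str:
    # Step 0: the HALT check precedes observation, strategy, and action.
    if halt_requested(halt_path):
        return "halted"
    observations = observe()
    plan = decide(observations)
    act(plan)
    return "completed"
```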

Threat 6: Skill Injection

Risk. An unregistered or malicious skill is loaded and executed by the evolve engine, running arbitrary code under the agent's permissions.

Mitigation. The skill_allowlist in config limits which registered skills evolve may call. The default config.example.json ships with an explicit allowlist containing all built-in skills. When the allowlist is empty, all currently registered skills are eligible — but "registered" means explicitly present in the skills directory with a valid SKILL.md contract. Arbitrary shell commands or unregistered paths cannot be injected.

In production, keep skill_allowlist set to an explicit list of skill names.
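The eligibility rule can be sketched as a set intersection. This Python sketch assumes each skill lives in its own directory under skills/ with a SKILL.md file; it checks only that the file exists and does not validate the contract. The function names are hypothetical.

```python
from pathlib import Path

# Sketch only: assumes a skills/<name>/SKILL.md layout; function names are hypothetical.


def registered_skills(skills_dir: Path) -> set[str]:
    """A skill counts as registered only if its directory holds a SKILL.md file."""
    return {path.parent.name for path in skills_dir.glob("*/SKILL.md") if path.is_file()}


def eligible_skills(skills_dir: Path, allowlist: list[str]) -> set[str]:
    registered = registered_skills(skills_dir)
    if not allowlist:
        # Empty allowlist: every registered skill is eligible (not recommended in production).
        return registered
    # Explicit allowlist: names that are not registered are ignored, never executed.
    return registered & set(allowlist)
```

Intersecting with the registered set means that even a typo or injected name in skill_allowlist cannot reach code outside the skills directory.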


Threat 7: Adaptive Lens Poisoning

Risk. The Adaptive Lens learns incorrect thresholds or focus items from anomalous data, causing the agent to miss real problems or chase false positives persistently.

Mitigation.

  • Bad learning decays 2x faster than good learning grows (asymmetric confidence: a disconfirming observation applies -0.2, a confirming one applies +0.1).
  • Items that drop below confidence 0.1 are moved to deprecated_items.
  • Maximum 50 items per lens; lowest-confidence items are pruned first.
  • Lens files (agent/state/*/lens.json) are gitignored, so corruption does not propagate to the repository. If a lens is deleted, the skill falls back to its base (lens-free) behavior.
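A minimal Python sketch of the lens rules above, assuming a lens is a mapping from item name to confidence. The Lens class, its methods, the 1.0 upper cap, and rounding to six decimals (which keeps 0.3 - 0.2 from landing just under 0.1 through floating-point error) are assumptions, not the format of lens.json.

```python
from dataclasses import dataclass, field

# Sketch only: the Lens class, its methods, and the 1.0 cap are hypothetical.
CONFIRM_DELTA = 0.1      # confirming observation
DISCONFIRM_DELTA = 0.2   # disconfirming observation decays 2x faster
DEPRECATE_BELOW = 0.1
MAX_ITEMS = 50


@dataclass
class Lens:
    items: dict[str, float] = field(default_factory=dict)    # item -> confidence
    deprecated_items: list[str] = field(default_factory=list)

    def observe(self, item: str, confirmed: bool) -> None:
        """Apply the asymmetric update to an existing item."""
        if item not in self.items:
            return
        delta = CONFIRM_DELTA if confirmed else -DISCONFIRM_DELTA
        # Round so repeated float additions do not drift just under a threshold.
        confidence = round(self.items[item] + delta, 6)
        if confidence < DEPRECATE_BELOW:
            del self.items[item]
            self.deprecated_items.append(item)
        else:
            self.items[item] = min(1.0, confidence)

    def add(self, item: str, confidence: float) -> None:
        """Add or update an item, pruning lowest-confidence items beyond the cap."""
        self.items[item] = confidence
        while len(self.items) > MAX_ITEMS:
            weakest = min(self.items, key=self.items.__getitem__)
            del self.items[weakest]
```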

Threat 8: Cross-Domain Cascade Amplification (v1.1.0)

Risk. A cascade event (e.g., entity rename) triggers a +3.0 score bonus on multiple dependent domains simultaneously. If cascades are chained, bonuses can stack, causing runaway domain selection and bypassing normal rotation.

Mitigation.

  • Cascade bonus is one-shot: consumed after the affected domain runs once.
  • Cascades auto-resolve when all affected domains have executed since the event.
  • domain_dependencies is explicitly configured by the operator (not auto-detected).
  • Maximum cascade chain depth matches max_chain_depth (default 3).
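The one-shot and auto-resolve rules can be modeled as a set of pending domains per cascade. This Python sketch is hypothetical: the class names are invented, and capping a domain at a single +3.0 even when several cascades list it is an assumption about how stacking is prevented, not documented behavior.

```python
from dataclasses import dataclass, field

# Sketch only: class names and the no-stacking rule in bonus() are hypothetical.
CASCADE_BONUS = 3.0


@dataclass
class Cascade:
    source: str
    depth: int
    pending: set[str] = field(default_factory=set)  # affected domains not yet run


class CascadeTracker:
    def __init__(self, dependencies: dict[str, list[str]], max_chain_depth: int = 3):
        self.dependencies = dependencies      # operator-configured domain_dependencies
        self.max_chain_depth = max_chain_depth
        self.active: list[Cascade] = []

    def trigger(self, source: str, depth: int = 1) -> bool:
        if depth > self.max_chain_depth:
            return False  # chained cascade is too deep: ignored
        affected = set(self.dependencies.get(source, []))
        if affected:
            self.active.append(Cascade(source, depth, affected))
        return bool(affected)

    def bonus(self, domain: str) -> float:
        # At most one +3.0 per domain, however many active cascades list it.
        return CASCADE_BONUS if any(domain in c.pending for c in self.active) else 0.0

    def record_run(self, domain: str) -> None:
        # One-shot: running the domain consumes its bonus in every cascade.
        for cascade in self.active:
            cascade.pending.discard(domain)
        # Auto-resolve cascades whose affected domains have all executed.
        self.active = [c for c in self.active if c.pending]
```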

Threat 9: Rollback Abuse (v1.1.0)

Risk. The rollback protocol (/ooda-config rollback {cycle}) could revert intentional changes or be triggered repeatedly to stall progress.

Mitigation.

  • Rollback is opt-in (safety.enable_rollback: false by default).
  • Manual rollback requires explicit confirmation ("yes/no" prompt).
  • Auto-rollback fires only when an auto-merged PR is followed by a failed health check; it never fires for draft PRs or manually merged PRs.
  • Only 5 checkpoints are retained. Rollback beyond 5 cycles is not possible.
  • Rollback creates a HALT file, requiring human review before resuming.
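A hypothetical Python sketch of the retention and eligibility rules: a deque with maxlen=5 drops the oldest checkpoint automatically, auto-rollback requires an auto-merged PR plus a failed health check (so draft and manually merged PRs never qualify), and every restore writes the HALT file. The class and method names are invented, and the confirmation prompt and the git revert are left out.

```python
from collections import deque
from pathlib import Path

# Sketch only: class and method names are hypothetical; the confirmation prompt
# and the git revert of the merge commit are omitted.
MAX_CHECKPOINTS = 5


class RollbackManager:
    def __init__(self, enabled: bool = False, halt_path: Path = Path("agent/safety/HALT")):
        self.enabled = enabled                            # safety.enable_rollback
        self.checkpoints = deque(maxlen=MAX_CHECKPOINTS)  # oldest entry drops off
        self.halt_path = halt_path

    def checkpoint(self, cycle: int, state: dict) -> None:
        """Save a pre-action copy of state for this cycle."""
        self.checkpoints.append((cycle, dict(state)))

    def should_auto_rollback(self, auto_merged: bool, health_ok: bool) -> bool:
        # Draft and manually merged PRs are never auto-merged, so they never qualify.
        return self.enabled and auto_merged and not health_ok

    def restore(self, cycle: int) -> dict:
        if not self.enabled:
            raise RuntimeError("rollback is disabled (safety.enable_rollback is false)")
        for saved_cycle, state in self.checkpoints:
            if saved_cycle == cycle:
                # Every rollback leaves a HALT file so a human reviews before resuming.
                self.halt_path.parent.mkdir(parents=True, exist_ok=True)
                self.halt_path.touch()
                return dict(state)
        raise LookupError(f"no checkpoint for cycle {cycle}; only the last "
                          f"{MAX_CHECKPOINTS} are kept")
```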

Threat 10: Observation Saturation Bypass (v1.1.0)

Risk. The saturation circuit breaker auto-generates a HALT file at 15 consecutive observe-only cycles. A malicious or misconfigured skill could produce minimal "output" to reset the counter while not producing genuinely useful work, effectively bypassing the breaker.

Mitigation.

  • The saturation counter resets only on meaningful events: PR created, actions extracted, new alerts generated, or confidence changed.
  • "Success" with unchanged state does NOT reset the counter.
  • The warn/boost/halt thresholds are configurable via config.saturation so operators can tighten them for their use case.

Safety Mechanisms

HALT File

Property   Value
Location   agent/safety/HALT
Trigger    File exists (any content)
Effect     evolve stops at Step 0; no cycle proceeds
Scope      Protected path: cannot be modified by the agent

To stop the agent immediately:

touch agent/safety/HALT

To resume:

rm agent/safety/HALT

The HALT file is the "big red button". It requires no config change, no restart, and no understanding of internal state. Any operator can create the file and the agent stops on its next heartbeat.


Progressive Complexity Levels

The agent operates at one of four levels. Start at Level 0 and advance only when you are confident in the agent's behavior.

Level  Name                Domains Active  Implementation  Auto-Merge
0      Just watching       1               No              No
1      Watching + testing  2               No              No
2      Full observation    All             No              No
3      Autonomous          All             Yes             Yes

At Levels 0–2 the agent only observes and proposes. No PRs are merged without human approval. Level 3 enables auto-merge and is the only level where the agent can make changes to your codebase without a human click.

Protected paths always require human review, even at Level 3.


Protected Paths

Changes to the following paths always require human review, regardless of the current complexity level:

Path               Contains
agent/safety/*     Safety policy, HALT file
skills/evolve/*    Core evolve engine
agent/contracts/*  Skill interface contracts

Any PR touching these paths requires explicit human review and cannot be auto-merged, regardless of the current complexity level.
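A sketch of the auto-merge gate, assuming the protected_paths entries are glob patterns matched against repository-relative paths of changed files. fnmatchcase is used so matching is case-sensitive on every platform, and its * also matches /, so nested files are covered. The function names are hypothetical.

```python
from fnmatch import fnmatchcase

# Sketch only: assumes protected_paths holds glob patterns matched against
# repository-relative paths; function names are hypothetical.
PROTECTED_PATHS = ("agent/safety/*", "skills/evolve/*", "agent/contracts/*")


def touches_protected(changed_files: list[str], patterns: tuple[str, ...] = PROTECTED_PATHS) -> bool:
    # fnmatch's "*" also matches "/", so files nested below these directories count too.
    return any(fnmatchcase(path, pattern) for path in changed_files for pattern in patterns)


def can_auto_merge(level: int, changed_files: list[str]) -> bool:
    """Only Level 3 auto-merges, and never when a protected path is touched."""
    return level == 3 and not touches_protected(changed_files)
```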


PR Limits

Limit          Default  Config Key
PRs per cycle  1        safety.max_prs_per_cycle
Files per PR   20       safety.max_files_per_pr
Lines per PR   500      safety.max_lines_per_pr

Exceeding any limit automatically escalates the PR to Level 3 for human review. Limits can be tightened but should not be loosened without careful consideration.
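As a sketch, the check could return the list of limits a proposed PR exceeds, with any non-empty result routing the PR to human review. The names are hypothetical, and counting lines as additions plus deletions is an assumption.

```python
from dataclasses import dataclass

# Sketch only: names are hypothetical; defaults mirror the PR Limits table.


@dataclass(frozen=True)
class PrLimits:
    max_prs_per_cycle: int = 1    # safety.max_prs_per_cycle
    max_files_per_pr: int = 20    # safety.max_files_per_pr
    max_lines_per_pr: int = 500   # safety.max_lines_per_pr


def limit_violations(prs_already_opened: int, files_changed: int, lines_changed: int,
                     limits: PrLimits = PrLimits()) -> list[str]:
    """Return the limits a proposed PR would exceed; any entry means human review."""
    violations = []
    if prs_already_opened + 1 > limits.max_prs_per_cycle:
        violations.append("max_prs_per_cycle")
    if files_changed > limits.max_files_per_pr:
        violations.append("max_files_per_pr")
    # lines_changed is assumed to be additions plus deletions.
    if lines_changed > limits.max_lines_per_pr:
        violations.append("max_lines_per_pr")
    return violations
```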


Confidence Threshold

Actions for a domain are skipped when that domain's confidence score is below safety.confidence_threshold (default 0.6). Confidence falls when PRs are rejected, preventing the agent from repeatedly acting on domains where its judgment is unreliable. Scores recover gradually as PRs are merged.


Cycle Lock Timeout

A cycle lock prevents concurrent evolve runs. If a cycle crashes or hangs, the lock expires after safety.lock_timeout_minutes (default 30). This prevents a stale lock from permanently blocking all future cycles.
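One common way to make a lock expire is to compare a lock file's modification time against the timeout and to create the file atomically with O_EXCL. This Python sketch assumes that design; the lock file location and function names are hypothetical, and OODA-loop may store its lock differently.

```python
import os
import time
from pathlib import Path
from typing import Optional

# Sketch only: lock location and function names are hypothetical.
LOCK_TIMEOUT_MINUTES = 30  # safety.lock_timeout_minutes


def try_acquire_lock(lock_path: Path, timeout_minutes: float = LOCK_TIMEOUT_MINUTES,
                     now: Optional[float] = None) -> bool:
    """Take the cycle lock, clearing it first if it is older than the timeout."""
    now = time.time() if now is None else now
    try:
        age_seconds: Optional[float] = now - lock_path.stat().st_mtime
    except FileNotFoundError:
        age_seconds = None  # no lock present
    if age_seconds is not None:
        if age_seconds < timeout_minutes * 60:
            return False  # another cycle still holds a live lock
        lock_path.unlink(missing_ok=True)  # stale lock from a crashed or hung cycle
    try:
        # O_EXCL makes creation atomic, so two cycles cannot both acquire the lock.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    os.close(fd)
    os.utime(lock_path, (now, now))  # stamp the acquisition time as the mtime
    return True


def release_lock(lock_path: Path) -> None:
    lock_path.unlink(missing_ok=True)
```

Using the file's mtime rather than its contents means a cycle that crashes immediately after creating the lock still leaves something the timeout can expire.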


Contrarian Check

Every memory.contrarian_check_interval cycles (default 10), the engine generates a counter-argument to its dominant strategy and stores it as a memo with type "contrarian". This is a deliberate cognitive safeguard: it forces the agent to question its own priorities periodically, reducing the risk of tunnel vision or runaway focus on a single domain.


Cost Controls

Setting            Default  Config Key
Daily limit        $10 USD  cost.daily_limit_usd
Warning threshold  80%      cost.warning_threshold_pct
Per-skill cost cap Varies   safety.cost_limit_usd (per skill contract)

At 80% of the daily limit (cost.warning_threshold_pct) a warning is logged. When the limit is reached, cycles are halted for the remainder of the day. Adjust the limit in config to match your budget; setting it to 0 disables the hard stop (not recommended).

Individual skills can also declare a safety.cost_limit_usd in their contract (e.g., scan-health defaults to $0.02, dev-cycle to $0.10). This provides a per-invocation cap in addition to the global daily limit.
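A minimal Python sketch of the two budget checks. It assumes cost.warning_threshold_pct is expressed as a percentage (80, not 0.8) and that a daily limit of 0 suppresses the warning as well as the hard stop; both are assumptions. The function names are hypothetical.

```python
# Sketch only: function names are hypothetical; defaults mirror the Cost Controls table.


def cost_status(spent_today_usd: float, daily_limit_usd: float = 10.0,
                warning_threshold_pct: float = 80.0) -> str:
    """Classify today's spend as "ok", "warn", or "stop"."""
    if daily_limit_usd <= 0:
        return "ok"  # a limit of 0 disables the hard stop (not recommended)
    if spent_today_usd >= daily_limit_usd:
        return "stop"  # halt cycles for the rest of the day
    if spent_today_usd >= daily_limit_usd * warning_threshold_pct / 100:
        return "warn"  # log a warning and keep running
    return "ok"


def within_skill_cap(estimated_cost_usd: float, skill_cost_limit_usd: float) -> bool:
    """Per-invocation check against a skill contract's safety.cost_limit_usd."""
    return estimated_cost_usd <= skill_cost_limit_usd
```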


Saturation Circuit Breaker (v1.1.0)

Prevents the engine from running indefinitely in observe-only mode without producing actionable output:

Threshold  Observe-only cycles  Action
Warn       5                    Memo: "Observation saturation detected"
Boost      10                   Action-queue items +5.0; implementation boosted
Halt       15                   HALT file auto-created (requires human review)

Thresholds are configurable via config.saturation. Set auto_halt: false to disable the automatic HALT at 15 cycles (warning and boost still fire).
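A Python sketch of the breaker, combining these thresholds with the reset rule from Threat 10. The event names are hypothetical labels for the four meaningful events, and the sketch re-fires each action on every cycle at or above its threshold; whether the real engine repeats the warning memo is not specified.

```python
from dataclasses import dataclass

# Sketch only: event names and the class are hypothetical; defaults follow the table above.
MEANINGFUL_EVENTS = {"pr_created", "actions_extracted", "new_alerts", "confidence_changed"}


@dataclass
class SaturationBreaker:
    warn_at: int = 5
    boost_at: int = 10
    halt_at: int = 15
    auto_halt: bool = True
    observe_only_streak: int = 0

    def record_cycle(self, events: set[str]) -> list[str]:
        """Update the streak after a cycle and return the actions it triggers."""
        if events & MEANINGFUL_EVENTS:
            # Only meaningful output resets the counter; a bare "success" does not.
            self.observe_only_streak = 0
            return []
        self.observe_only_streak += 1
        actions = []
        if self.observe_only_streak >= self.warn_at:
            actions.append("warn")
        if self.observe_only_streak >= self.boost_at:
            actions.append("boost")
        if self.auto_halt and self.observe_only_streak >= self.halt_at:
            actions.append("halt")
        return actions
```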


Alert Recency Dampener (v1.1.0)

Prevents alert-driven domain monopoly:

  • Alert bonus is dampened for recently-executed domains (linear decay over signals.alert_cooldown_hours, default 4 hours).
  • After signals.max_consecutive_alert_cycles (default 3) consecutive alert-driven selections, the alert auto-acknowledges.
  • Critical severity alerts bypass the dampener entirely.
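One reading of the dampener, sketched in Python: the bonus ramps linearly from 0 right after the domain ran to its full value once signals.alert_cooldown_hours have passed, auto-acknowledged alerts contribute nothing, and critical alerts skip both checks. The ramp direction, the parameter names, and letting critical alerts also skip auto-acknowledgment are assumptions.

```python
# Sketch only: parameter names and the ramp direction are assumptions.


def alert_bonus(base_bonus: float, severity: str, hours_since_domain_ran: float,
                consecutive_alert_selections: int, cooldown_hours: float = 4.0,
                max_consecutive: int = 3) -> float:
    """Dampened alert bonus for one domain.

    cooldown_hours mirrors signals.alert_cooldown_hours and max_consecutive
    mirrors signals.max_consecutive_alert_cycles.
    """
    if severity == "critical":
        return base_bonus  # critical alerts bypass the dampener entirely
    if consecutive_alert_selections >= max_consecutive:
        return 0.0  # the alert has been auto-acknowledged
    # Linear ramp: no bonus right after a run, full bonus once the cooldown has passed.
    factor = min(1.0, max(0.0, hours_since_domain_ran / cooldown_hours))
    return base_bonus * factor
```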

Entropy Balance Penalty (v1.1.0)

Prevents any single domain from monopolizing cycle execution:

  • Formula: -B × (domain_share - 1/N), where B = scoring.balance_weight (default 5.0) and N is the number of domains.
  • A domain running 50% of cycles in a 5-domain setup (expected: 20%) gets a penalty of -5.0 × (0.50 - 0.20) = -1.5 per cycle until its share drops closer to the fair share.
  • Domains running less than expected get a proportional bonus.
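The formula translates directly into Python. The function name and the zero result before any cycle has run are assumptions; B is scoring.balance_weight and N is the number of domains.

```python
# Sketch only: the function name is hypothetical; the formula is the one stated above.


def balance_adjustment(domain_runs: int, total_runs: int, num_domains: int,
                       balance_weight: float = 5.0) -> float:
    """-B * (domain_share - 1/N): negative above fair share, positive below it."""
    if total_runs == 0:
        return 0.0  # no history yet, so no adjustment
    domain_share = domain_runs / total_runs
    return -balance_weight * (domain_share - 1 / num_domains)
```

balance_adjustment(10, 20, 5) returns -1.5 up to floating-point rounding, matching the 5-domain example, and balance_adjustment(2, 20, 5) returns a +0.5 bonus for a domain running at half its fair share.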

Rollback Protocol (v1.1.0)

Opt-in via safety.enable_rollback: true. Creates pre-action checkpoints (last 5 retained) for post-merge recovery:

  • Auto-revert: if health check fails after auto-merge, reverts the merge commit, restores state, and creates HALT file.
  • Manual: /ooda-config rollback {cycle} restores from checkpoint.

Best Practices for Operators

  1. Start at Level 0. Run at least three cycles and review the decision logs before advancing to a higher level.

  2. Keep first_cycle_observe_only: true. The first cycle should never produce a PR. Verify that the agent's observations make sense for your project before allowing any action.

  3. Use --dry-run before enabling autonomous mode. Run /evolve --dry-run to see what actions the agent would take without actually creating PRs or making changes.

  4. Keep an explicit skill_allowlist in production. The default config ships with all built-in skills listed. Review and trim it to only the skills your project actually needs.

  5. Monitor agent/state/evolve/state.json regularly. Anomalies in domain scores, confidence values, or action counts are early indicators of unexpected behavior.

  6. Back up state files before major config changes. State is stored in agent/state/. Copy it before editing config.json or changing complexity levels so you can roll back if needed.

  7. Review the first few cycles' decision logs manually. Even after advancing past Level 0, periodic manual review of evolve output keeps you informed of what the agent is prioritizing.

  8. Never store secrets in config.json. Use $ENV_VAR references and pass values through environment variables. Treat config.json as you would a .env file: never commit it.

  9. Spot-check Adaptive Lens files periodically. Review agent/state/*/lens.json after the first 10-20 cycles to verify that learned thresholds and focus items match your expectations. Delete a lens file to reset a domain to base behavior.


Supported Versions

Only the latest release of OODA-loop receives security patches. If you are running an older version, upgrade before reporting a vulnerability or applying a fix.
