Do not open public GitHub issues for security bugs.
Security vulnerabilities in OODA-loop should be reported privately via GitHub Security Advisories.
Include a description of the issue, steps to reproduce, and the potential impact. We practice responsible disclosure: reporters will receive acknowledgment within 48 hours and a resolution timeline within 7 days. We ask that you hold off on public disclosure until a fix is available.
OODA-loop is an autonomous AI agent that can read your codebase, open pull requests, run deployment workflows, and at Level 3 merge PRs without human approval. This capability creates a threat surface that differs from ordinary software. The threats and mitigations below reflect deliberate design decisions.
Risk. The agent modifies its own safety rules, contracts, or core engine
(agent/safety/*, skills/evolve/*, agent/contracts/*), removing
guardrails that govern its own behavior.
Mitigation. Those paths are declared as protected_paths in config. Any PR
that touches protected paths requires explicit human review and cannot be
auto-merged, regardless of the current complexity level. The agent cannot
modify its own policy files without human approval.
Risk. The agent stages API keys, tokens, or other secrets into a git commit and exposes them in the repository or PR diff.
Mitigation.
- `git add -A` is forbidden in all skills. Only explicitly named files may be staged.
- `config.json` (which holds runtime config) is in `.gitignore` and must never be committed.
- `config.example.json` uses `$ENV_VAR` placeholders for all credentials. The runtime config resolves those references from environment variables at load time, so secrets never appear as literals in config files.
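
The load-time resolution of `$ENV_VAR` placeholders could look roughly like the sketch below. The function name `resolve_env_refs`, the `github.token` key, and the rule that a placeholder is a string consisting solely of `$NAME` are assumptions made for illustration; the actual loader may differ.

```python
import os
import re
from typing import Any, Mapping

# Assumption: a placeholder is a string that is exactly "$NAME".
_ENV_REF = re.compile(r"\$([A-Za-z_][A-Za-z0-9_]*)")


def resolve_env_refs(value: Any, env: Mapping[str, str] = os.environ) -> Any:
    """Recursively replace "$NAME" strings with the value of env[NAME]."""
    if isinstance(value, dict):
        return {k: resolve_env_refs(v, env) for k, v in value.items()}
    if isinstance(value, list):
        return [resolve_env_refs(v, env) for v in value]
    if isinstance(value, str):
        match = _ENV_REF.fullmatch(value)
        if match:
            name = match.group(1)
            if name not in env:
                # Fail loudly rather than run with a missing credential.
                raise KeyError(f"environment variable {name} is not set")
            return env[name]
    return value


# Hypothetical config shape; only the "$NAME" convention comes from the text.
example = {"github": {"token": "$GITHUB_TOKEN"}, "level": 0}
resolved = resolve_env_refs(example, {"GITHUB_TOKEN": "dummy-value"})
```

Raising on a missing variable is a design choice of this sketch: a silent empty string would be harder to diagnose than a failed startup.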
Risk. The agent creates excessive pull requests, deploys broken code repeatedly, or accumulates large API costs without operator awareness.
Mitigation.
- Maximum 1 PR per cycle (`max_prs_per_cycle: 1`).
- Maximum 20 files and 500 lines per PR (`max_files_per_pr`, `max_lines_per_pr`). Exceeding either limit forces Level 3 review.
- Minimum 30-minute interval between cycles (`min_cycle_interval_minutes: 30`).
- Daily API cost cap of $10 USD by default (`cost.daily_limit_usd`). A warning fires at 80% of the limit; a hard stop fires when the limit is reached.
Risk. The agent inflates its own confidence scores to exceed the
confidence_threshold, causing it to act on domains where its track record is
poor.
Mitigation. Confidence updates are formula-driven and non-negotiable:
- Merged PR: `+0.1` (`confidence.merge_boost`)
- Rejected PR: `-0.2` (`confidence.reject_penalty`)
- All values are clamped to the range `[0.1, 1.0]`.
The agent cannot set confidence scores directly. The formulas are applied by the evolve engine after each PR outcome, not by skill logic.
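
The update rule above can be written as a short function. This is a minimal sketch, not the evolve engine's code: the function name and constants are illustrative, and it assumes `confidence.reject_penalty` is stored as a positive magnitude that is subtracted.

```python
MERGE_BOOST = 0.1      # confidence.merge_boost
REJECT_PENALTY = 0.2   # confidence.reject_penalty (assumed to be a magnitude)
FLOOR, CEILING = 0.1, 1.0


def update_confidence(score: float, merged: bool) -> float:
    """Apply the merge boost or reject penalty, then clamp to [0.1, 1.0]."""
    delta = MERGE_BOOST if merged else -REJECT_PENALTY
    return min(CEILING, max(FLOOR, score + delta))


# One rejection costs as much as two merges earn.
after_reject = update_confidence(0.6, merged=False)   # about 0.4
after_two_merges = update_confidence(update_confidence(after_reject, True), True)
```

Because the penalty is twice the boost, a domain needs two merged PRs to recover from one rejection.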
Risk. The agent ignores the HALT file, overwrites it, or deletes it as part of cleanup logic, removing the operator's ability to stop the agent immediately.
Mitigation.
- The HALT check is Step 0 of every cycle — it runs before any observation, strategy, or action logic. There is no code path that bypasses it.
- The check is a simple file-existence test (`test -f agent/safety/HALT`), not a complex condition that could be circumvented by edge cases.
- `agent/safety/*` is a protected path. The agent cannot open a PR that deletes or modifies the HALT file without triggering Level 3 review.
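
As an illustration of how small the Step 0 check is, here is a Python equivalent of `test -f agent/safety/HALT`. The function names and the `run_cycle` skeleton are hypothetical; only the file path and the "check first, before anything else" ordering come from the text above.

```python
from pathlib import Path

HALT_PATH = Path("agent/safety/HALT")


def is_halted(repo_root: Path = Path(".")) -> bool:
    """True if the HALT file exists; its content is irrelevant."""
    return (repo_root / HALT_PATH).is_file()


def run_cycle(repo_root: Path = Path(".")) -> str:
    # Step 0: the HALT check runs before any other logic.
    if is_halted(repo_root):
        return "halted"
    # Observation, strategy, and action logic would follow here.
    return "ran"
```

Keeping the check to a single file-existence test means there is no configuration or state that could make it evaluate differently.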
Risk. An unregistered or malicious skill is loaded and executed by the evolve engine, running arbitrary code under the agent's permissions.
Mitigation. The skill_allowlist in config limits which registered skills
evolve may call. The default config.example.json ships with an explicit
allowlist containing all built-in skills. When the allowlist is empty, all
currently registered skills are eligible — but "registered" means explicitly
present in the skills directory with a valid SKILL.md contract. Arbitrary
shell commands or unregistered paths cannot be injected.
In production, keep skill_allowlist set to an explicit list of skill names.
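
The allowlist semantics described above (an empty list means every registered skill is eligible, otherwise only the intersection) can be sketched as follows. The function name and the skill name `unknown-skill` are hypothetical; `scan-health` and `dev-cycle` appear elsewhere in this document.

```python
def eligible_skills(registered: set[str], allowlist: list[str]) -> set[str]:
    """Return the skills evolve may call under the given allowlist."""
    if not allowlist:
        # Empty allowlist: fall back to everything that is registered.
        return set(registered)
    # Names that are allowlisted but not registered are never eligible.
    return registered & set(allowlist)


registered = {"scan-health", "dev-cycle"}
only_scan = eligible_skills(registered, ["scan-health", "unknown-skill"])
```

Note that the empty-list fallback is the permissive case, which is why an explicit list is recommended in production.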
Risk. The Adaptive Lens learns incorrect thresholds or focus items from anomalous data, causing the agent to miss real problems or chase false positives persistently.
Mitigation.
- Bad learning decays 2x faster than good learning grows (asymmetric confidence: a disconfirming observation applies `-0.2`, a confirming one applies `+0.1`).
- Items that drop below confidence `0.1` are moved to `deprecated_items`.
- Maximum 50 items per lens; lowest-confidence items are pruned first.
- Lens files (`agent/state/*/lens.json`) are gitignored, so corruption does not propagate to the repository. If a lens is deleted, the skill falls back to its base (lens-free) behavior.
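
The lens rules above could be modeled like this. It is a sketch under stated assumptions: the lens is represented as a dict with a hypothetical `items` mapping (item name to confidence) next to the `deprecated_items` list named in the text, and the upper cap of 1.0 is assumed rather than documented.

```python
CONFIRM_STEP = 0.1      # confirming observation
DISCONFIRM_STEP = 0.2   # disconfirming observation (2x the confirm step)
DEPRECATE_BELOW = 0.1
MAX_ITEMS = 50


def apply_observation(lens: dict, item: str, confirmed: bool) -> None:
    """Adjust one item's confidence and deprecate it if it falls below 0.1."""
    items = lens["items"]
    delta = CONFIRM_STEP if confirmed else -DISCONFIRM_STEP
    items[item] = min(1.0, items[item] + delta)  # cap of 1.0 is an assumption
    if items[item] < DEPRECATE_BELOW:
        lens["deprecated_items"].append(item)
        del items[item]


def prune(lens: dict) -> None:
    """Drop lowest-confidence items until at most MAX_ITEMS remain."""
    items = lens["items"]
    while len(items) > MAX_ITEMS:
        weakest = min(items, key=items.get)
        del items[weakest]
```

With these steps, an item learned from anomalous data at confidence 0.25 is deprecated after a single disconfirming observation.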
Risk. A cascade event (e.g., an entity rename) triggers a +3.0 score bonus on multiple dependent domains simultaneously. If cascades are chained, bonuses can stack, causing runaway domain selection and bypassing normal rotation.
Mitigation.
- Cascade bonus is one-shot: consumed after the affected domain runs once.
- Cascades auto-resolve when all affected domains have executed since the event.
- `domain_dependencies` is explicitly configured by the operator (not auto-detected).
- Maximum cascade chain depth matches `max_chain_depth` (default 3).
Risk. The rollback protocol (/ooda-config rollback {cycle}) could revert
intentional changes or be triggered repeatedly to stall progress.
Mitigation.
- Rollback is opt-in (`safety.enable_rollback: false` by default).
- Manual rollback requires explicit confirmation ("yes/no" prompt).
- Auto-rollback only fires after auto-merged PRs with health check failure — not for draft PRs or manually-merged PRs.
- Only 5 checkpoints are retained. Rollback beyond 5 cycles is not possible.
- Rollback creates a HALT file, requiring human review before resuming.
Risk. The saturation circuit breaker auto-generates a HALT file at 15 consecutive observe-only cycles. A malicious or misconfigured skill could produce minimal "output" to reset the counter while not producing genuinely useful work, effectively bypassing the breaker.
Mitigation.
- The saturation counter resets only on meaningful events: PR created, actions extracted, new alerts generated, or confidence changed.
- "Success" with unchanged state does NOT reset the counter.
- The warn/boost/halt thresholds are configurable via `config.saturation` so operators can tighten them for their use case.
| Property | Value |
|---|---|
| Location | agent/safety/HALT |
| Trigger | File exists (any content) |
| Effect | evolve stops at Step 0, no cycle proceeds |
| Scope | Protected path — cannot be modified by the agent |
To stop the agent immediately: `touch agent/safety/HALT`

To resume: `rm agent/safety/HALT`
The HALT file is the "big red button". It requires no config change, no restart, and no understanding of internal state. Any operator can create the file and the agent stops on its next heartbeat.
The agent operates at one of four levels. Start at Level 0 and advance only when you are confident in the agent's behavior.
| Level | Name | Domains Active | Implementation | Auto-Merge |
|---|---|---|---|---|
| 0 | Just watching | 1 | No | No |
| 1 | Watching + testing | 2 | No | No |
| 2 | Full observation | All | No | No |
| 3 | Autonomous | All | Yes | Yes |
At Levels 0–2 the agent only observes and proposes. No PRs are merged without human approval. Level 3 enables auto-merge and is the only level where the agent can make changes to your codebase without a human click.
Protected paths always require human review, even at Level 3.
Changes to the following paths always require human review, regardless of the current complexity level:
| Path | Contains |
|---|---|
| `agent/safety/*` | Safety policy, HALT file |
| `skills/evolve/*` | Core evolve engine |
| `agent/contracts/*` | Skill interface contracts |
Any PR touching these paths requires explicit human review and cannot be auto-merged, regardless of the current complexity level.
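
A check of this kind can be approximated with glob matching over the list of files a PR changes. The sketch below is illustrative only: the function names are hypothetical, and using `fnmatchcase` for the pattern semantics is an assumption about how `protected_paths` entries are interpreted.

```python
from fnmatch import fnmatchcase

PROTECTED_PATHS = ["agent/safety/*", "skills/evolve/*", "agent/contracts/*"]


def protected_files(changed_files: list[str],
                    patterns: list[str] = PROTECTED_PATHS) -> list[str]:
    """Return the changed files that match any protected pattern."""
    return [f for f in changed_files
            if any(fnmatchcase(f, p) for p in patterns)]


def can_auto_merge(level: int, changed_files: list[str]) -> bool:
    """Auto-merge needs Level 3 and a diff with no protected files."""
    return level == 3 and not protected_files(changed_files)
```

In `fnmatchcase`, `*` also matches `/`, so files in nested subdirectories of a protected directory are covered as well.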
| Limit | Default | Config Key |
|---|---|---|
| PRs per cycle | 1 | safety.max_prs_per_cycle |
| Files per PR | 20 | safety.max_files_per_pr |
| Lines per PR | 500 | safety.max_lines_per_pr |
Exceeding any limit automatically escalates the PR to Level 3 for human review. Limits can be tightened but should not be loosened without careful consideration.
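
The size limits can be expressed as a simple predicate. This is a sketch, assuming "exceeding" means strictly greater than the configured value; the class and function names are hypothetical, while the field names and defaults mirror the table above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PrLimits:
    max_prs_per_cycle: int = 1
    max_files_per_pr: int = 20
    max_lines_per_pr: int = 500


def exceeds_size_limits(files_changed: int, lines_changed: int,
                        limits: PrLimits = PrLimits()) -> bool:
    """True if the PR is over the file or line limit and must be escalated."""
    return (files_changed > limits.max_files_per_pr
            or lines_changed > limits.max_lines_per_pr)
```

A PR with exactly 20 files and 500 lines stays within the defaults under this reading; one more file or line triggers escalation.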
Actions for a domain are skipped when that domain's confidence score is below
safety.confidence_threshold (default 0.6). Confidence falls when PRs are
rejected, preventing the agent from repeatedly acting on domains where its
judgment is unreliable. Scores recover gradually as PRs are merged.
A cycle lock prevents concurrent evolve runs. If a cycle crashes or hangs, the
lock expires after safety.lock_timeout_minutes (default 30). This prevents
a stale lock from permanently blocking all future cycles.
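
The expiry logic amounts to comparing the lock's age against the timeout. The sketch below assumes the lock records the time it was acquired as a Unix timestamp; that representation and the function names are hypothetical.

```python
from typing import Optional

LOCK_TIMEOUT_MINUTES = 30  # safety.lock_timeout_minutes


def lock_is_stale(acquired_at: float, now: float,
                  timeout_minutes: float = LOCK_TIMEOUT_MINUTES) -> bool:
    """True once the lock has been held for the full timeout."""
    return now - acquired_at >= timeout_minutes * 60


def can_start_cycle(lock_acquired_at: Optional[float], now: float) -> bool:
    """A cycle may start if there is no lock or the existing lock expired."""
    return lock_acquired_at is None or lock_is_stale(lock_acquired_at, now)
```

In practice `now` would come from `time.time()`; it is passed in here so the behavior is easy to test.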
Every memory.contrarian_check_interval cycles (default 10), the engine
generates a counter-argument to its dominant strategy and stores it as a memo
with type "contrarian". This is a deliberate cognitive safeguard: it forces
the agent to question its own priorities periodically, reducing the risk of
tunnel vision or runaway focus on a single domain.
| Setting | Default | Config Key |
|---|---|---|
| Daily limit | $10 USD | cost.daily_limit_usd |
| Warning threshold | 80% | cost.warning_threshold_pct |
| Per-skill cost cap | varies | safety.cost_limit_usd per contract |
At 80% of the daily limit a warning is logged. When the limit is reached the
cycle is halted for the remainder of the day. Adjust the limit in config to match
your budget; setting it to 0 disables the hard stop (not recommended).
Individual skills can also declare a safety.cost_limit_usd in their contract
(e.g., scan-health defaults to $0.02, dev-cycle to $0.10). This provides
a per-invocation cap in addition to the global daily limit.
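
The daily cap behavior can be summarized in one function. This is a sketch with a hypothetical name and return values; it also assumes that a limit of 0 disables the warning along with the hard stop, which the text above does not specify.

```python
def cost_status(spent_usd: float,
                daily_limit_usd: float = 10.0,
                warning_threshold_pct: float = 80.0) -> str:
    """Classify today's spend as "ok", "warn", or "halt"."""
    if daily_limit_usd <= 0:
        # Limit of 0 disables the hard stop (assumed: the warning too).
        return "ok"
    if spent_usd >= daily_limit_usd:
        return "halt"
    if spent_usd >= daily_limit_usd * warning_threshold_pct / 100:
        return "warn"
    return "ok"
```

With the defaults, the warning starts at $8.00 of spend and the hard stop at $10.00.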
Prevents the engine from running indefinitely in observe-only mode without producing actionable output:
| Threshold | Cycles | Action |
|---|---|---|
| Warn | 5 | Memo: "Observation saturation detected" |
| Boost | 10 | Action queue items +5.0, implementation boosted |
| Halt | 15 | HALT file auto-created (requires human review) |
Thresholds are configurable via config.saturation. Set auto_halt: false
to disable the automatic HALT at 15 cycles (warning and boost still fire).
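
The threshold table maps a count of consecutive observe-only cycles to an action, roughly as sketched below. The function and parameter names are hypothetical (only `auto_halt` is named in the text), and the counter reset rule follows the "meaningful events" definition given earlier in this document.

```python
def saturation_action(observe_only_cycles: int, warn: int = 5, boost: int = 10,
                      halt: int = 15, auto_halt: bool = True) -> str:
    """Return the strongest action triggered at the current count."""
    if auto_halt and observe_only_cycles >= halt:
        return "halt"
    if observe_only_cycles >= boost:
        return "boost"
    if observe_only_cycles >= warn:
        return "warn"
    return "none"


def next_counter(counter: int, meaningful_event: bool) -> int:
    """Reset only on a meaningful event (PR, actions, alerts, confidence change)."""
    return 0 if meaningful_event else counter + 1
```

With `auto_halt` disabled, the count keeps growing past 15 and the boost remains the strongest action.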
Prevents alert-driven domain monopoly:
- Alert bonus is dampened for recently-executed domains (linear decay over `signals.alert_cooldown_hours`, default 4 hours).
- After `signals.max_consecutive_alert_cycles` (default 3) consecutive alert-driven selections, the alert auto-acknowledges.
- Critical severity alerts bypass the dampener entirely.
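
One plausible reading of the dampener is a multiplier that rises linearly from 0 to 1 over the cooldown window. The sketch below encodes that reading; the exact curve, the function name, and the severity label strings are assumptions, not the engine's documented behavior.

```python
def dampened_alert_bonus(bonus: float, hours_since_last_run: float,
                         severity: str, cooldown_hours: float = 4.0) -> float:
    """Scale the alert bonus by how long ago the domain last ran."""
    if severity == "critical" or cooldown_hours <= 0:
        # Critical alerts bypass the dampener entirely.
        return bonus
    # Assumed shape: 0 right after a run, full bonus once the cooldown passes.
    factor = min(1.0, max(0.0, hours_since_last_run / cooldown_hours))
    return bonus * factor
```

Under this assumption, a domain that ran two hours ago receives half of its alert bonus with the default 4-hour cooldown.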
Prevents any single domain from monopolizing cycle execution:
- Formula: `-B × (domain_share - 1/N)`, where B = `scoring.balance_weight` (default 5.0) and N is the number of domains.
- A domain running 50% of cycles in a 5-domain setup (expected: 20%) gets a -1.5 penalty per cycle until it drops closer to fair share.
- Domains running less than expected get a proportional bonus.
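
The formula and the worked example can be checked directly. The function name is illustrative; the formula and the default weight are taken from the list above.

```python
def balance_adjustment(domain_share: float, n_domains: int,
                       balance_weight: float = 5.0) -> float:
    """-B * (domain_share - 1/N): negative above fair share, positive below."""
    return -balance_weight * (domain_share - 1.0 / n_domains)


# 50% of cycles in a 5-domain setup: -5.0 * (0.5 - 0.2)
over = round(balance_adjustment(0.5, 5), 2)    # -> -1.5
# 10% of cycles in the same setup: -5.0 * (0.1 - 0.2)
under = round(balance_adjustment(0.1, 5), 2)   # -> 0.5
```

A domain at exactly its fair share (20% here) receives no adjustment.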
Opt-in via safety.enable_rollback: true. Creates pre-action checkpoints
(last 5 retained) for post-merge recovery:
- Auto-revert: if health check fails after auto-merge, reverts the merge commit, restores state, and creates HALT file.
- Manual: `/ooda-config rollback {cycle}` restores from checkpoint.
- Start at Level 0. Run at least three cycles and review the decision logs before advancing to a higher level.
- Keep `first_cycle_observe_only: true`. The first cycle should never produce a PR. Verify that the agent's observations make sense for your project before allowing any action.
- Use `--dry-run` before enabling autonomous mode. Run `/evolve --dry-run` to see what actions the agent would take without actually creating PRs or making changes.
- Keep an explicit `skill_allowlist` in production. The default config ships with all built-in skills listed. Review and trim it to only the skills your project actually needs.
- Monitor `agent/state/evolve/state.json` regularly. Anomalies in domain scores, confidence values, or action counts are early indicators of unexpected behavior.
- Back up state files before major config changes. State is stored in `agent/state/`. Copy it before editing `config.json` or changing complexity levels so you can roll back if needed.
- Review the first few cycles' decision logs manually. Even after advancing past Level 0, periodic manual review of evolve output keeps you informed of what the agent is prioritizing.
- Never store secrets in `config.json`. Use `$ENV_VAR` references and pass values through environment variables. Treat `config.json` as you would a `.env` file: never commit it.
- Spot-check Adaptive Lens files periodically. Review `agent/state/*/lens.json` after the first 10-20 cycles to verify that learned thresholds and focus items match your expectations. Delete a lens file to reset a domain to base behavior.
Only the latest release of OODA-loop receives security patches. If you are running an older version, upgrade before reporting a vulnerability or applying a fix.