Skip to content

Conversation

@fitzpr
Copy link
Contributor

@fitzpr fitzpr commented Feb 1, 2026

Description

This PR adds security testing capabilities for AI agent systems, specifically targeting Moltbot/ClawdBot vulnerabilities discovered in January 2026.

What's included:

  • MoltbotScenario - A scenario for testing known Moltbot CVEs (cron injection, credential theft, file exfiltration, hidden instruction injection)
  • MoltbotStrategy - Strategy enum for selecting vulnerability types to test
  • Test objectives focused on real-world AI agent vulnerabilities
  • Integration with PyRIT's existing scoring system

Design decisions:

  • Initially implemented with a custom AgentCommandInjectionConverter, but removed it in commit 2 after recognizing it violated PyRIT's converter philosophy
  • Converters should transform text, not generate attack payloads
  • Attack payloads now belong in test objectives where they should be
  • Simplified architecture: Objective → PromptSendingAttack → Target (no converter layer)

Files changed:

  • pyrit/scenario/scenarios/airt/moltbot_scenario.py - Main scenario implementation
  • pyrit/scenario/scenarios/airt/__init__.py - Exports

Tests and Documentation

Tests:

  • ✅ Manual testing against Moltbot endpoint
  • ✅ All tests passed (endpoints appear to be patched/web UIs)
  • ✅ Verified scenario imports correctly: from pyrit.scenario.scenarios.airt import MoltbotScenario, MoltbotStrategy
  • ✅ Confirmed all 4 strategies work: ALL, CRON_INJECTION, CREDENTIAL_THEFT, FILE_EXFILTRATION, HIDDEN_INSTRUCTION

Robert Fitzpatrick added 2 commits February 1, 2026 18:02
This PR adds support for testing AI agent systems (Moltbot/Clawdbot) for known security vulnerabilities.

## Components Added

1. **MoltbotScenario** (pyrit/scenario/scenarios/airt/moltbot_scenario.py)
   - Tests known Moltbot/Clawdbot CVEs using Scenario pattern
   - Strategies: cron injection, credential theft, file exfiltration, hidden instructions
   - Follows same pattern as existing Cyber and Leakage scenarios
   - Uses existing PyRIT attack strategies (PromptSendingAttack)

2. **AgentCommandInjectionConverter** (pyrit/prompt_converter/agent_command_injection_converter.py)
   - Reusable converter for AI agent attack payloads
   - Supports 5 injection types: cron, credential_theft, file_read, command_exec, hidden_instruction
   - Configurable complexity levels (low/medium/high)
   - Works with any AI agent platform, not just Moltbot

3. **Unit Tests** (tests/unit/converter/test_agent_command_injection_converter.py)
   - Comprehensive tests for all injection types
   - Tests complexity levels and async conversion
   - 274 lines of test coverage

## Known Vulnerabilities Tested

- Cron job injection (30-second execution windows)
- Credential theft from ~/.clawdbot/ directory
- Backup file exfiltration (.bak.0-.bak.4 files)
- Hidden instruction injection via task descriptions

## Architecture Decision

Uses Scenario pattern (like Cyber/Leakage) rather than creating new orchestrator pattern.
Scenarios are designed to test KNOWN vulnerabilities, which fits perfectly for documented Moltbot CVEs.
- Remove AgentCommandInjectionConverter - too specific for PyRIT's converter philosophy
- Converters should transform text, not generate attack payloads
- Attack payloads now belong directly in test objectives
- Simplifies architecture: objectives → PromptSendingAttack (no converter layer)
- Aligns with PyRIT's pattern: converters transform, objectives define what to test
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant