Idle agents trigger supervisor shutdown after ~30 minutes #904

## Problem

With 24 agents (16 manifest + 8 hands) and no active user messages, the daemon shuts itself down every ~30 minutes via SIGTERM from the internal supervisor:

```
INFO openfang_api::server: Received SIGTERM, shutting down...
INFO openfang_kernel::kernel: Shutting down OpenFang kernel...
INFO openfang_kernel::supervisor: Supervisor: initiating graceful shutdown
```

Before the shutdown, the heartbeat monitor logs warnings for every idle agent:

```
WARN openfang_kernel::heartbeat: Agent is unresponsive agent=cfo inactive_secs=210 timeout_secs=180
WARN openfang_kernel::heartbeat: Agent is unresponsive agent=researcher inactive_secs=210 timeout_secs=180
```
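
To compare runs (for example v0.5.4 against v0.5.5), the warnings and shutdowns can be counted straight from the daemon log. The following is a minimal sketch that assumes the log lines look exactly like the excerpts above, possibly with a timestamp prefix and without ANSI colour codes; the function names are illustrative and not part of OpenFang.

```python
import re

# Pattern derived from the heartbeat warning lines quoted in this issue.
HEARTBEAT_WARN = re.compile(
    r"WARN openfang_kernel::heartbeat: Agent is unresponsive "
    r"agent=(?P<agent>\S+) inactive_secs=(?P<inactive>\d+) "
    r"timeout_secs=(?P<timeout>\d+)"
)

# Marker taken from the SIGTERM log line quoted in this issue.
SHUTDOWN_MARKER = "Received SIGTERM, shutting down"


def parse_heartbeat_warnings(log_text):
    """Return one dict per 'Agent is unresponsive' warning in log_text."""
    warnings = []
    for line in log_text.splitlines():
        match = HEARTBEAT_WARN.search(line)  # search() tolerates a timestamp prefix
        if match:
            warnings.append({
                "agent": match["agent"],
                "inactive_secs": int(match["inactive"]),
                "timeout_secs": int(match["timeout"]),
            })
    return warnings


def count_shutdowns(log_text):
    """Count how many times the daemon logged a SIGTERM shutdown."""
    return sum(SHUTDOWN_MARKER in line for line in log_text.splitlines())
```

Feeding it the output of `docker logs` for a fixed time window gives a shutdown count and the set of agents flagged before each one.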

The 180s timeout and 30s heartbeat interval appear to be hardcoded: setting `[timeouts] heartbeat = 86400` in config.toml is accepted by `openfang config set` but has no effect on the monitor.

## Expected behavior

Idle agents should remain available indefinitely. A deployment with no active user traffic should not shut itself down.

## Environment

- v0.5.4 and v0.5.5 on Linux x86_64. Both are affected; v0.5.5 is worse (7 shutdowns in 35 min vs ~1 per 30 min on v0.5.4)
- 24 agents, 8 hands (hand_interval=3600), 5 MCP servers
- Model routing via OpenAI-compatible proxy (not direct provider API)
- Docker with `restart: unless-stopped`

## Related

Issue #766 was closed as "resolved by heartbeat fixes" in v0.5.4, but the problem persists. The heartbeat still flags idle agents as unresponsive after 180s and eventually the supervisor decides to shut down.

## Workaround

An external keepalive cron job sends a lightweight `/ping` message to each agent every 2 minutes. This prevents the "unresponsive" warnings but doesn't fully prevent the supervisor SIGTERM (shutdowns dropped from ~6/hour to ~2/hour).
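
For anyone who wants to reproduce the workaround, here is a rough sketch of such a keepalive script. The base URL, the `/api/agents/{name}/message` path, and the JSON body shape are placeholders I made up for illustration, not the documented OpenFang API; substitute whatever endpoint your deployment uses to send a message to an agent.

```python
import json
import urllib.request

# ASSUMPTIONS: BASE_URL, the URL path, and the body shape below are
# hypothetical placeholders, not taken from the OpenFang documentation.
BASE_URL = "http://127.0.0.1:8080"
AGENTS = ["cfo", "researcher"]  # names from the log excerpt; list all agents here


def build_ping_request(base_url, agent):
    """Build (but do not send) a POST carrying the lightweight /ping message."""
    body = json.dumps({"message": "/ping"}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/api/agents/{agent}/message",  # hypothetical path
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ping_all(agents, base_url=BASE_URL, send=None, timeout=10):
    """Ping every agent once and return {agent: True/False}."""
    if send is None:
        # Default sender: perform the HTTP request and discard the response.
        def send(req):
            urllib.request.urlopen(req, timeout=timeout).close()
    results = {}
    for agent in agents:
        try:
            send(build_ping_request(base_url, agent))
            results[agent] = True
        except OSError:  # URLError and HTTPError are subclasses of OSError
            results[agent] = False
    return results
```

Calling `ping_all(AGENTS)` from a small wrapper scheduled with a `*/2 * * * *` crontab entry matches the 2-minute interval described above. The `send` parameter exists only so the request logic can be exercised without a running daemon.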

## Questions

1. Is there a config key to increase or disable the heartbeat timeout? `[timeouts] heartbeat` doesn't seem to work.
2. Is the supervisor shutdown triggered by a threshold of unresponsive agents, or something else?
3. Should agents that haven't received user messages be considered "unresponsive"? They're available and ready — just idle.
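
To make question 3 concrete, this is the distinction I would expect the monitor to draw. It is only a sketch of the proposed behaviour, not a description of how the OpenFang heartbeat works today; the `pending_tasks` field and the function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class AgentState:
    name: str
    inactive_secs: int  # seconds since the agent last did anything
    pending_tasks: int  # hypothetical: messages queued or in flight


def classify(agent, timeout_secs=180):
    """Label an agent 'active', 'idle', or 'unresponsive'."""
    if agent.inactive_secs <= timeout_secs:
        return "active"
    if agent.pending_tasks == 0:
        # Nothing was asked of the agent, so silence is expected.
        return "idle"
    # Work is waiting and the agent has made no progress within the timeout.
    return "unresponsive"


def unresponsive_agents(agents, timeout_secs=180):
    """Return the names of agents that are stuck, excluding idle ones."""
    return [a.name for a in agents if classify(a, timeout_secs) == "unresponsive"]
```

Under this rule, the two agents in the log excerpt (`inactive_secs=210`, nothing queued) would be classified as idle and would not count toward any shutdown decision.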
