Neko Agent

An AI-powered automation agent that interfaces with Neko servers to perform GUI automation tasks through WebRTC connections.

What is Neko Agent?

Neko Agent is a Python-based automation system that:

Connects to Neko servers via WebRTC for real-time GUI interaction
Uses AI vision models (ShowUI-2B/Qwen2VL) for visual reasoning and action planning
Captures training data automatically during automation sessions
Provides voice synthesis capabilities via F5-TTS and WebRTC audio
Supports various actions like clicking, typing, scrolling, and navigation

Quick Start

Prerequisites

Nix with flakes enabled for the reproducible development shells
NVIDIA GPU (optional) for fast inference – the agent will fall back to CPU/MPS automatically

Prepare your environment

# Copy the sample environment and fill in your credentials
cp .env.example .env
$EDITOR .env

# Enter a development shell (choose the one that fits your hardware)
nix develop            # Default CPU shell
# nix develop .#gpu    # CUDA-enabled shell

The flake also exposes additional shells when needed:

Shell	Purpose
`default`	Standard CPU-oriented development
`gpu`	CUDA 12.8 stack with binary PyTorch
`ai`	Adds common CLI tools (codex/claude/gemini) on top of the default shell
`neko`	Bundles Docker Compose helpers (`neko-services`) for the reference server
`cpu-opt`	Builds Python packages with `znver2` optimisations
`gpu-opt`	CUDA + `znver2` tuned builds
`docs`	mdBook + linkcheck stack for documentation work
`tee`	Tooling for the trusted execution environment pipeline

Start a Neko server (optional)

nix develop .#neko
neko-services up
# The reference stack listens on http://localhost:8080

Run the agent

# Basic usage
uv run src/agent.py --task "Navigate to google.com and search for 'AI automation'"

# REST login flow (the agent will mint a WebSocket token)
uv run src/agent.py \
  --task "Your automation task" \
  --neko-url "http://localhost:8080" \
  --username "user" \
  --password "password"

# Stay online and receive new tasks from chat
uv run src/agent.py --online --neko-url "http://localhost:8080" --username user --password password

# Health-check the configuration without starting the session
uv run src/agent.py --healthcheck

If you prefer python over uv, the commands work the same – the scripts have no side effects at import time.

Helpful `just` targets

The repository includes a justfile with shortcuts. The commands assume you have customised paths (for example NEKO_LOGFILE) to match your machine.

just uv-agent               # Load .env and run the agent via uv
just agent-task "look up cats"  # Fire-and-forget example task (edit paths first!)
just manual                 # Launch the manual REPL controller
just kill-all               # Stop leftover automation processes
just docker-build-generic   # Build the portable Docker image via Nix

Core Components

src/agent.py - Core automation agent with WebRTC integration
src/capture.py - Training data capture service using MosaicML Streaming
src/yap.py - Text-to-speech service with F5-TTS and voice management
src/train.py - Model training on captured data
src/manual.py - Manual control interface

Development Environments

Shell	Purpose	Features
`default`	Basic development	PyTorch CPU, Python deps
`ai`	Extended tooling	Adds Codex/Claude/Gemini CLI utilities
`gpu`	GPU development	CUDA 12.8, GPU-accelerated PyTorch
`cpu-opt`	CPU optimized	Znver2 compiler flags
`gpu-opt`	GPU optimized	Znver2 + CUDA sm_86
`docs`	Documentation	mdbook, linkcheck, mermaid
`neko`	Docker services	Neko server, Docker Compose
`tee`	TEE deployment	Phala Cloud CLI, VMM tools

Container Images

Build reproducible Docker images with attestation metadata:

# Build all images
just build-images

# Build specific variants
nix build .#neko-agent-docker-generic    # Portable CUDA
nix build .#neko-agent-docker-opt        # Optimized (znver2 + sm_86)

# Run with Docker
just docker-run-generic
just docker-run-optimized

TEE Deployment

Deploy to Trusted Execution Environments with attestation:

# Deploy to TEE with reproducible images
nix run .#deploy-to-tee

# Deploy to ttl.sh ephemeral registry
nix run .#deploy-to-ttl 1h

# Build and push to ttl.sh
nix run .#push-to-ttl 24h

Registry Options

# GitHub Container Registry
NEKO_REGISTRY=ghcr.io/your-org nix run .#deploy-to-tee

# Docker Hub
NEKO_REGISTRY=docker.io/your-org nix run .#deploy-to-tee

# Local registry
NEKO_REGISTRY=localhost:5000 nix run .#deploy-to-tee

# ttl.sh (anonymous, ephemeral)
NEKO_REGISTRY=ttl.sh NEKO_TTL=1h nix run .#deploy-to-tee

Local Registry

Start a local OCI registry for development:

# HTTP registry
nix run .#start-registry

# HTTPS registry with Tailscale certificates
nix run .#start-registry-https

# Expose via Tailscale Funnel
nix run .#start-tailscale-funnel

# Expose via Cloudflare Tunnel
nix run .#start-cloudflare-tunnel

Training Data Collection

Enable automatic training data capture:

# Terminal 1: Start capture service
uv run src/capture.py

# Terminal 2: Run the agent (capture will automatically observe chat + frames)
uv run src/agent.py --task "Your task description"

# Train on collected data
uv run src/train.py

Voice Output

Enable voice feedback during automation:

# Terminal 1: Start TTS service (uses F5-TTS)
uv run src/yap.py

# Terminal 2: Run the agent – audio is enabled by default (set NEKO_AUDIO=0 to disable)
uv run src/agent.py --task "Narrate what you see"

The agent streams audio from the Neko session; YAP injects synthesised speech over the same WebRTC connection when it sees /yap chat commands.

Configuration

Create a .env file in the project root (see .env.example for the exhaustive list):

# Neko server connection
NEKO_URL=http://localhost:8080
NEKO_USER=user
NEKO_PASS=password

# Agent behaviour
NEKO_TASK="Search the weather"
NEKO_MODE=web               # web | phone
NEKO_MAX_STEPS=8
NEKO_AUDIO=1                # 0 disables audio negotiation
REFINEMENT_STEPS=5
NEKO_RTCP_KEEPALIVE=0
NEKO_FORCE_EXIT_GUARD_MS=0
NEKO_SKIP_INITIAL_FRAMES=5
NEKO_LOGLEVEL=INFO
NEKO_LOG_FORMAT=text        # text | json
NEKO_METRICS_PORT=9000
# FRAME_SAVE_PATH="./tmp/frame.png"   # optional last-frame snapshot (relative paths land in /tmp/neko-agent)
# CLICK_SAVE_PATH="./tmp/actions"     # optional directory for annotated frames

# GPU configuration
CUDA_VISIBLE_DEVICES=0
PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True
TORCH_CUDA_ARCH_LIST=8.6

# Capture settings (optional)
CAPTURE_OUT=./data/mds
CAPTURE_FPS=2.0
CAPTURE_REMOTE=s3://your-bucket/training-data

# TTS settings (optional)
YAP_VOICES_DIR=./voices
YAP_SR=48000
YAP_PARALLEL=2
YAP_MAX_CHARS=350

Documentation

View the full documentation:

# Start documentation server
nix develop .#docs
cd docs && mdbook serve --open

# Or use the app
nix run .#docs-serve

GPU Management

List and manage GPUs for TEE deployment:

# List available GPUs
just gpus

# Deploy with specific GPUs
just deploy-gpu docker-compose.yml "0a:00.0 1a:00.0"

# Deploy without GPU
just deploy docker-compose.yml

Verification

Test your setup:

# Check dependencies
python -c "import torch; print(f'PyTorch: {torch.__version__}, CUDA: {torch.cuda.is_available()}')"

# Test agent connection
uv run src/agent.py --healthcheck

# Show running processes
just ps

# Inspect the bundled Neko stack (from the `neko` shell)
neko-services status

Architecture

The system is designed around WebRTC-based communication with Neko containers, allowing for:

Real-time video frame analysis and action execution
Automated training data collection in MosaicML format
Voice feedback through WebRTC audio streams
Scalable deployment patterns for production use

Security & Attestation

All container images are built with reproducible timestamps and include attestation metadata for TEE verification. The deployment system supports:

Digest pinning for supply chain security
Reproducible builds with Nix
TEE attestation via TDX quotes
Multi-registry support for availability

Contributing

Fork the repository
Create a feature branch
Use the appropriate development shell
Run tests and ensure builds pass
Submit a pull request

License

See LICENSE file for details.

Metrics

Prometheus metrics are exposed on NEKO_METRICS_PORT (or $PORT when running on hosting platforms):

neko_frames_received_total
neko_actions_executed_total{action_type="..."}
neko_parse_errors_total
neko_navigation_steps_total
neko_inference_latency_seconds
neko_resize_duration_seconds
neko_reconnects_total

Scrape them with:

curl http://localhost:${NEKO_METRICS_PORT:-9000}/metrics

Support

Documentation: nix run .#docs-serve
Issues: Open an issue on the repository
Development: nix develop .#gpu for the full CUDA-enabled environment

Name		Name	Last commit message	Last commit date
Latest commit History 62 Commits
.github/workflows		.github/workflows
docker-compose		docker-compose
docs		docs
extern		extern
nix		nix
overlays		overlays
src		src
.env.example		.env.example
.gitignore		.gitignore
.gitmodules		.gitmodules
CLAUDE.md		CLAUDE.md
README.md		README.md
flake.lock		flake.lock
flake.nix		flake.nix
justfile		justfile
noted.md		noted.md
pyproject.toml		pyproject.toml
uv.lock		uv.lock

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Neko Agent

What is Neko Agent?

Quick Start

Prerequisites

Prepare your environment

Start a Neko server (optional)

Run the agent

Helpful `just` targets

Core Components

Development Environments

Container Images

TEE Deployment

Registry Options

Local Registry

Training Data Collection

Voice Output

Configuration

Documentation

GPU Management

Verification

Architecture

Security & Attestation

Contributing

License

Metrics

Support

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

G-structure/neko_agent

Folders and files

Latest commit

History

Repository files navigation

Neko Agent

What is Neko Agent?

Quick Start

Prerequisites

Prepare your environment

Start a Neko server (optional)

Run the agent

Helpful just targets

Core Components

Development Environments

Container Images

TEE Deployment

Registry Options

Local Registry

Training Data Collection

Voice Output

Configuration

Documentation

GPU Management

Verification

Architecture

Security & Attestation

Contributing

License

Metrics

Support

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Helpful `just` targets

Packages