isair/jarvis

Jarvis

A 100% private AI voice assistant that lives on your computer (works offline). Talk naturally as if Jarvis is a third person in the room — say its name anywhere in your sentence and get conversational, context-aware responses. It remembers everything, always knows the current location and time, can search the web, read your screen, control Chrome, track nutrition, and much more with support for unlimited MCPs and tools without context rot. Sensitive info is automatically redacted before anything is saved to disk.

🔒 100% local processing. No subscriptions. No data harvesting. Automatic redaction of sensitive info. Free offline dictation included.


Why Jarvis?

🔒 Your data stays yours - 100% local AI processing. No cloud, no subscriptions, no data harvesting. Automatic redaction of sensitive info. This is non-negotiable.

🗣️ A third person in the room - Unlike voice assistants that only respond to rigid commands, Jarvis understands conversations. It maintains a short temporary rolling context of what's being discussed, so when you ask "Jarvis, what do you think?" it knows exactly what you're talking about. Have it chime into discussions with friends, help debug code while you talk through problems, or weigh in on decisions.

🧠 Never forgets - Unlimited memory across conversations. Adapts tone naturally to the topic. Learns your preferences over time.

🎙️ Free dictation - Hold a hotkey, speak, release — your words appear in any app as text. Like WisprFlow, but free, offline, and private. No subscription, no cloud transcription.

🔌 Extensible - MCP integration connects Jarvis to thousands of tools: smart home, GitHub, Slack, databases, and more. Smart tool selection means adding more tools won't slow things down.

📊 Transparent progress - We track what works (and what doesn't) with automated evals. See current accuracy →

🚧 Known limitations: Jarvis is under active development. Primary development happens on macOS, so Windows and Linux support may lag behind. We're building in the open; issues and contributions are welcome!

  • Voice-only for now—no text chat interface yet (#35)
  • No mobile apps (#17)
  • "Stop" commands during speech sometimes get filtered as echo (#24)
  • Dictation is not available on macOS 26+ (Tahoe) due to a pynput incompatibility (#172)

See it in action (example conversations)

Chiming into conversations (the magic moment):

👤 Alice: I wonder what the weather will be like tomorrow
👤 Bob: Yeah, we should check before planning the picnic
👤 Alice: Jarvis, what do you think?
  📝 Heard: "Jarvis, what do you think?"
  🧠 Intent (wake word): directed → "what do you think about the weather for the picnic"

✨ Working on it: what do you think about the weather for the picnic
  🧰 Tool: getWeather…
  💬 Generating response...

🤖 Jarvis
Tomorrow looks great for a picnic! Sunny with highs around 22°C...

Jarvis understood the entire conversation and gave a contextual answer — no need to repeat the question.
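The short rolling context described above can be pictured as a time-bounded buffer of recent utterances. The following is a minimal sketch under that assumption; the `RollingContext` class and the 120-second window are hypothetical and not taken from the Jarvis codebase.

```python
import time
from collections import deque


class RollingContext:
    """Keeps only utterances heard within the last `window_s` seconds."""

    def __init__(self, window_s=120.0):  # window length is an assumed value
        self.window_s = window_s
        self._items = deque()  # (timestamp, text) pairs, oldest first

    def add(self, text, now=None):
        now = time.monotonic() if now is None else now
        self._items.append((now, text))
        self._evict(now)

    def _evict(self, now):
        # Drop entries that have aged out of the window.
        while self._items and now - self._items[0][0] > self.window_s:
            self._items.popleft()

    def snapshot(self, now=None):
        """Return the utterances still inside the window, oldest first."""
        now = time.monotonic() if now is None else now
        self._evict(now)
        return [text for _, text in self._items]
```

When the wake word is heard, a snapshot like this could be passed to the model together with the extracted query, which is how "what do you think?" would pick up the picnic discussion.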

Browser automation via MCP:

  📝 Heard: "Open YouTube Jarvis."
  🧠 Intent (wake word): directed → "open YouTube"

✨ Working on it: open YouTube
  💬 Generating response...

🤖 Jarvis
I have opened YouTube for you.

Health-aware advice:

  📝 Heard: "Should I order pizza tonight considering my health goals Jarvis?"
  🧠 Intent (wake word): directed → "should I order pizza tonight considering my health goals"

✨ Working on it: should I order pizza tonight considering my health goals
  🧰 Tool: fetchMeals…
  💬 Generating response...

🤖 Jarvis
Looking at your week — you mentioned wanting to lose 5kg by June...
Today you've had about 860 calories so far. Given your 1,800 calorie target,
you've got room for pizza! But maybe consider thin crust to save calories.

Personalised news search:

  📝 Heard: "What are some news from today that might interest me Jarvis?"
  🧠 Intent (wake word): directed → "what are some news from today that might interest me"

✨ Working on it: what are some news from today that might interest me
  🧰 Tool: searchMemory…
  🧰 Tool: webSearch…
  💬 Generating response...

🤖 Jarvis
Here's a quick snapshot of today's headlines...

Natural wake word placement (say "Jarvis" anywhere):

  📝 Heard: "Give me a random topic to discuss Jarvis."
  🧠 Intent (wake word): directed → "give me a random topic to discuss"

✨ Working on it: give me a random topic to discuss
  💬 Generating response...

🤖 Jarvis
How about the Fermi Paradox? Given the vast number of stars...
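For illustration, here is a rough sketch of how a query could be pulled out of a transcript when the wake word appears anywhere in the sentence. Jarvis itself uses an LLM intent judge for this (see Voice Interface below); the `extract_query` helper is hypothetical and only approximates a simple text-matching approach.

```python
import re

WAKE_WORD = "jarvis"


def extract_query(transcript, wake_word=WAKE_WORD):
    """Return the transcript with the wake word removed, or None if it is absent."""
    pattern = re.compile(rf"\b{re.escape(wake_word)}\b", re.IGNORECASE)
    if not pattern.search(transcript):
        return None
    without = pattern.sub(" ", transcript)
    # Collapse whitespace and trim punctuation left behind around the wake word.
    return re.sub(r"\s+", " ", without).strip(" ,.!?")


print(extract_query("Give me a random topic to discuss Jarvis."))
```

Unlike the intent judge, this sketch does not rewrite the query using conversation context; it only strips the wake word.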

Echo detection (Jarvis ignores its own speech):

🤖 Jarvis
I have opened YouTube for you.

👂 Listening for follow-up (3s)...
  📝 Heard: "I have opened YouTube for you."
  🔇 Heard (echo): "i have opened youtube for you."
💤 Returning to wake word mode
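The echo case in the log above can be approximated by comparing a normalised transcript against what the assistant just said. This is a sketch of plain text matching, not the Jarvis implementation; the `is_echo` helper and the 0.85 threshold are assumptions.

```python
import re
from difflib import SequenceMatcher


def normalise(text):
    """Lowercase and strip punctuation so transcripts compare on words only."""
    return re.sub(r"[^a-z0-9 ]+", "", text.lower()).strip()


def is_echo(heard, recently_spoken, threshold=0.85):
    """True if `heard` closely matches something the assistant just said."""
    h = normalise(heard)
    if not h:
        return False
    return any(
        SequenceMatcher(None, h, normalise(spoken)).ratio() >= threshold
        for spoken in recently_spoken
    )
```

A fuzzy ratio rather than exact equality leaves room for small transcription differences between the spoken text and what the microphone picks up.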

Quick Install

1. Install Prerequisites

| Platform | Requirement |
| --- | --- |
| All | Ollama |
| Windows | Visual C++ Redistributable (most PCs already have this) |

2. Download Jarvis

Get the latest from GitHub Releases:

| Platform | Download | Run |
| --- | --- | --- |
| Windows | Jarvis-Windows-x64.zip | Extract → Run Jarvis.exe |
| macOS | Jarvis-macOS-arm64.zip | Extract → Move to Applications → Right-click → Open |
| Linux | Jarvis-Linux-x64.tar.gz | tar -xzf → Run ./Jarvis/Jarvis |

Jarvis starts listening automatically — just say "Jarvis" and talk!

Features

  • Conversational Awareness - Understands ongoing discussions. Ask "Jarvis, what do you think?" and it knows what you're talking about. Works naturally in multi-person conversations.
  • Unlimited Memory - Never forgets. Searches across all your conversation history. Memory Viewer GUI included.
  • Adaptive Tone - Adjusts automatically: surgical for code, pragmatic for business, encouraging for wellbeing. No manual mode switching.
  • Smart Tool Selection - Embedding-based relevance filtering picks only the tools needed per query — add unlimited MCP tools without performance degradation
  • Built-in Tools - Screenshot OCR, web search (with auto-fetch), weather, file access, nutrition tracking, location awareness
  • Natural Voice - Say "Jarvis" anywhere in your sentence, interrupt with "stop", follow up without repeating the wake word
  • Dictation Mode - Free, offline alternative to WisprFlow — hold a hotkey, speak, release to paste text into any app
  • MCP Integration - Connect to thousands of external tools (Home Assistant, GitHub, Slack, etc.)
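To make the "Smart Tool Selection" idea concrete, the sketch below ranks tools by cosine similarity between a query embedding and per-tool embeddings, keeping only the closest few. The vectors are toy 3-dimensional stand-ins, and `select_tools`, `top_k` and `min_score` are hypothetical names; the embedding model and thresholds Jarvis uses are not specified here.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def select_tools(query_vec, tool_vecs, top_k=2, min_score=0.3):
    """Return up to top_k tool names whose embedding is close to the query."""
    scored = [(cosine(query_vec, vec), name) for name, vec in tool_vecs.items()]
    scored = [(score, name) for score, name in scored if score >= min_score]
    scored.sort(reverse=True)
    return [name for _, name in scored[:top_k]]


# Toy vectors standing in for real embeddings of each tool's description.
TOOLS = {
    "getWeather": [0.9, 0.1, 0.0],
    "fetchMeals": [0.0, 0.9, 0.2],
    "webSearch": [0.3, 0.2, 0.9],
}

print(select_tools([1.0, 0.0, 0.1], TOOLS))
```

Because only the selected tools are handed to the model, the prompt stays small no matter how many tools are registered.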

System Requirements

| Hardware | VRAM | Model |
| --- | --- | --- |
| Most users | 8GB+ | gemma4:e2b (default) |
| Better quality | 16GB+ | gemma4:e4b |
| High-end | 24GB+ | gpt-oss:20b |

Note: VRAM requirements include the intent judge model (gemma4:e2b), which is always loaded alongside the chat model for voice intent classification. With the default setup the chat model and the intent judge are the same gemma4:e2b model, so the judge needs no extra VRAM.

The setup wizard will guide you through model selection and installation on first launch.

Configuration

Most users won't need to change anything. Open ⚙️ Settings from the tray menu to configure Jarvis through a graphical interface — no JSON editing required. Settings are saved to ~/.config/jarvis/config.json.
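If you want to inspect the saved settings from a script, something like the following sketch could work. It assumes the file is plain JSON at the path given above; `load_config` and its `defaults` argument are hypothetical helpers, not part of Jarvis.

```python
import json
from pathlib import Path

CONFIG_PATH = Path.home() / ".config" / "jarvis" / "config.json"


def load_config(path=CONFIG_PATH, defaults=None):
    """Read the JSON config, layering it over optional defaults."""
    merged = dict(defaults or {})
    path = Path(path)
    if path.exists():
        merged.update(json.loads(path.read_text(encoding="utf-8")))
    return merged


config = load_config(defaults={"whisper_model": "medium"})
print(config.get("whisper_model"))
```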

Speech Recognition (Whisper)

Language Modes

  • Multilingual (default, 99 languages): "whisper_model": "medium"
  • English Only (slightly better English accuracy): "whisper_model": "medium.en"

Model Sizes

| Model | English | Multilingual | Download | VRAM | Speed |
| --- | --- | --- | --- | --- | --- |
| Tiny | tiny.en | tiny | ~75 MB | ~1 GB | ~10x |
| Base | base.en | base | ~140 MB | ~1 GB | ~7x |
| Small | small.en | small | ~465 MB | ~2 GB | ~4x |
| Medium | medium.en | medium | ~1.5 GB | ~5 GB | ~2x |
| Large V3 Turbo | - | large-v3-turbo | ~1.5 GB | ~6 GB | ~8x |

Speed is relative to the original large model.
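As a rough guide to reading the table, the sketch below picks the largest listed model that fits a given VRAM budget. The figures are copied from the table above; `pick_whisper_model` is a hypothetical helper, and the budget should be whatever VRAM remains after the chat model is loaded.

```python
# (multilingual name, English-only name, approx. VRAM in GB), smallest first.
WHISPER_MODELS = [
    ("tiny", "tiny.en", 1),
    ("base", "base.en", 1),
    ("small", "small.en", 2),
    ("medium", "medium.en", 5),
    ("large-v3-turbo", None, 6),  # no English-only variant is listed
]


def pick_whisper_model(free_vram_gb, english_only=False):
    """Return the largest listed model that fits, or None if nothing fits."""
    best = None
    for name, en_name, vram_gb in WHISPER_MODELS:
        if english_only and en_name is None:
            continue
        if vram_gb <= free_vram_gb:
            best = en_name if english_only else name
    return best


print(pick_whisper_model(5))
```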

GPU Acceleration (Windows)

If you have an NVIDIA GPU, Jarvis can use CUDA for much faster speech recognition. The Windows installer offers an optional CUDA download during setup. When running from source, install the CUDA libraries yourself:

pip install nvidia-cublas-cu12 nvidia-cudnn-cu12

CUDA is detected automatically — no configuration needed.

Voice Interface (Advanced)

LLM Intent Judge - Jarvis uses gemma4:e2b for intelligent voice intent classification (echo detection, query extraction, stop commands). This model is automatically installed alongside your chosen chat model during setup. The intent judge cannot be disabled but gracefully falls back to simpler text matching if Ollama is unavailable.

Dictation Mode — Free WisprFlow Alternative

Hold a hotkey to record speech, release to paste the transcription into any app. Works everywhere — your editor, browser, chat, terminal. Completely local, completely free.

| Platform | Default hotkey |
| --- | --- |
| Windows | Ctrl + Win |
| macOS | Ctrl + Option |
| Linux | Ctrl + Alt |

  • 🔒 100% offline — your speech never leaves your machine (unlike cloud dictation services)
  • 🧠 Shared Whisper model — uses the same speech recognition as voice input, no extra memory
  • Zero latency startup — no server round-trip, transcription starts the moment you release
  • 📋 Universal paste — works in any app that accepts Ctrl+V / Cmd+V
  • 🔇 Non-intrusive — main voice listener pauses automatically during dictation
  • Hands-free mode — double-tap the hotkey to keep recording without holding; press again or hit Escape to stop
  • 🧹 Filler word removal — optional LLM-powered cleanup removes "um", "uh", "like", "you know" while preserving meaning
  • 📖 Custom dictionary — define "wrong -> right" replacements for jargon, names, and technical terms
  • 📜 History window — browse, copy, or delete past dictations from the system tray
  • 🎛️ Easy setup — configure dictation during the setup wizard or anytime in Settings (hotkey dropdown, filler removal toggle, custom dictionary editor)

Customise the hotkey in Settings or config.json:

{
  "dictation_hotkey": "ctrl+alt",
  "dictation_filler_removal": true,
  "dictation_custom_dictionary": [
    "jarvis -> Jarvis",
    "pytorch -> PyTorch"
  ]
}
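The "wrong -> right" entries in dictation_custom_dictionary can be read as simple replacement rules. Below is a minimal sketch of applying them to a transcript, assuming whole-word, case-insensitive matching; the exact matching rules Jarvis uses are not documented here, and both helper functions are hypothetical.

```python
import re


def parse_dictionary(entries):
    """Turn ["wrong -> right", ...] into a list of (wrong, right) pairs."""
    pairs = []
    for entry in entries:
        wrong, sep, right = entry.partition("->")
        if sep and wrong.strip():
            pairs.append((wrong.strip(), right.strip()))
    return pairs


def apply_dictionary(text, entries):
    """Replace whole-word, case-insensitive matches of each `wrong` term."""
    for wrong, right in parse_dictionary(entries):
        pattern = re.compile(rf"\b{re.escape(wrong)}\b", re.IGNORECASE)
        # A function replacement avoids treating backslashes in `right` as escapes.
        text = pattern.sub(lambda _match, r=right: r, text)
    return text


print(apply_dictionary("ask jarvis about pytorch", ["jarvis -> Jarvis", "pytorch -> PyTorch"]))
```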

Note: macOS requires Accessibility permissions for the global hotkey. Linux requires X11 (limited Wayland support).

Text-to-Speech

Piper TTS (default) - Neural TTS that auto-downloads on first use (~60MB):

  • Works out of the box - no setup required
  • High-quality British English male voice (en_GB-alan-medium)
  • Fast local synthesis with exact duration tracking

To use a different Piper voice, download its model from Hugging Face and set:

{
  "tts_piper_model_path": "~/.local/share/jarvis/models/piper/en_GB-alan-medium.onnx"
}

Chatterbox - AI voice with emotion control (requires running from source):

{ "tts_engine": "chatterbox" }

Voice cloning with Chatterbox - add a 3-10 second .wav sample:

{
  "tts_engine": "chatterbox",
  "tts_chatterbox_audio_prompt": "/path/to/voice.wav"
}

Location Detection

Jarvis can provide location-aware responses (weather, local time, etc.) using a local GeoLite2 database — no cloud geolocation services are used.

IP detection chain (in order of preference):

  1. Manual IP — configure location_ip_address in settings
  2. UPnP — queries your local router (no traffic leaves LAN)
  3. Socket heuristic — determines which interface routes externally (no data sent)
  4. OpenDNS DNS query — single myip.opendns.com lookup to 208.67.222.222 (only external query)
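The chain above is a "first source that answers wins" pattern, and the socket heuristic in step 3 is a well-known trick: connecting a UDP socket selects a route without sending any packet. The sketch below illustrates both under stated assumptions; the function names and the probe address are hypothetical and not taken from the Jarvis source.

```python
import socket


def socket_heuristic(probe_addr=("192.0.2.1", 9)):
    """Return the local IP of the interface that routes externally, or None.

    connect() on a UDP socket only selects a route; no packet is sent.
    192.0.2.1 is a reserved documentation address used as a placeholder.
    """
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            sock.connect(probe_addr)
            return sock.getsockname()[0]
    except OSError:
        return None  # for example, no default route is available


def first_resolved(resolvers):
    """Try each (name, callable) in order; return the first non-empty result."""
    for name, resolve in resolvers:
        ip = resolve()
        if ip:
            return name, ip
    return None, None


chain = [
    ("manual", lambda: None),  # stands in for an unset location_ip_address
    ("socket", socket_heuristic),
]
print(first_resolved(chain))
```

Ordering the resolvers from most local to least local means the external DNS lookup is only reached when nothing on the machine or LAN can answer.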

If your ISP uses carrier-grade NAT (CGNAT), Jarvis automatically resolves your true public IP via the same OpenDNS DNS query. This can be disabled:

{
  "location_cgnat_resolve_public_ip": false
}
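For context, carrier-grade NAT addresses come from the shared address space 100.64.0.0/10 (RFC 6598), so one way to tell whether an address is behind CGNAT is a range check. This is a sketch of that check only; how Jarvis detects CGNAT internally is an assumption here.

```python
import ipaddress

# Shared address space reserved for carrier-grade NAT (RFC 6598).
CGNAT_RANGE = ipaddress.ip_network("100.64.0.0/10")


def is_cgnat(ip):
    """True if `ip` falls in the carrier-grade NAT shared address range."""
    return ipaddress.ip_address(ip) in CGNAT_RANGE


print(is_cgnat("100.64.0.1"))
```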

Setup: Register for a free MaxMind GeoLite2 account, download the City database (MMDB format), and save it to ~/.local/share/jarvis/geoip/GeoLite2-City.mmdb. The setup wizard will guide you through this.

MCP Tool Integration

Connect Jarvis to external tools via MCP servers:

{
  "mcps": {
    "github": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-github"],
      "env": { "GITHUB_TOKEN": "your-token" }
    }
  }
}
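Since every server lives under the single "mcps" key, a quick sanity check before restarting Jarvis can catch typos. The sketch below validates the shape seen in the examples in this README (a "command" string, an "args" list, an "env" map); that shape is inferred from those examples rather than from a published schema, and `validate_mcps` is a hypothetical helper.

```python
def validate_mcps(config):
    """Return a list of problems found in config["mcps"]; empty means it looks fine."""
    problems = []
    for name, server in config.get("mcps", {}).items():
        command = server.get("command")
        if not isinstance(command, str) or not command:
            problems.append(f"{name}: 'command' must be a non-empty string")
        args = server.get("args", [])
        if not isinstance(args, list) or not all(isinstance(a, str) for a in args):
            problems.append(f"{name}: 'args' must be a list of strings")
        env = server.get("env", {})
        if not isinstance(env, dict) or not all(isinstance(v, str) for v in env.values()):
            problems.append(f"{name}: 'env' values must be strings")
    return problems


example = {
    "mcps": {
        "github": {
            "command": "npx",
            "args": ["-y", "@modelcontextprotocol/server-github"],
            "env": {"GITHUB_TOKEN": "your-token"},
        }
    }
}
print(validate_mcps(example))
```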

Popular integrations:

  • Home Assistant - Voice control for smart home
  • Google Workspace - Gmail, Calendar, Drive, Docs
  • GitHub - Issues, PRs, workflows
  • Notion - Knowledge management
  • Slack/Discord - Team communication
  • Databases - MySQL, PostgreSQL, MongoDB
  • Composio - 500+ apps in one integration

See the full MCP setup guide below.

MCP Integrations

Home Assistant - Smart home voice control
  1. Add MCP Server integration in Home Assistant (Settings → Devices & services)
  2. Expose entities you want to control (Settings → Voice assistants → Exposed entities)
  3. Create Long-lived Access Token (Profile → Security → Create token)
  4. Install proxy: uv tool install git+https://github.com/sparfenyuk/mcp-proxy
  5. Add to config:
{
  "mcps": {
    "home_assistant": {
      "command": "mcp-proxy",
      "args": ["http://localhost:8123/mcp_server/sse"],
      "env": { "API_ACCESS_TOKEN": "YOUR_TOKEN" }
    }
  }
}

"Jarvis, turn on the living room lights" / "set bedroom to 72°" / "run good night scene"

Google Workspace - Gmail, Calendar, Drive, Docs, Sheets
{
  "mcps": {
    "google_workspace": {
      "command": "npx",
      "args": ["-y", "google-workspace-mcp"],
      "env": {
        "GOOGLE_CLIENT_ID": "your-client-id",
        "GOOGLE_CLIENT_SECRET": "your-client-secret"
      }
    }
  }
}

Setup: taylorwilsdon/google_workspace_mcp

GitHub - Repos, issues, PRs, workflows
{
  "mcps": {
    "github": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-github"],
      "env": { "GITHUB_TOKEN": "your-token" }
    }
  }
}

Notion, Slack, Discord, Databases

Notion:

{
  "mcps": {
    "notion": {
      "command": "npx",
      "args": ["-y", "@makenotion/mcp-server-notion"],
      "env": { "NOTION_API_KEY": "your-token" }
    }
  }
}

Slack:

{
  "mcps": {
    "slack": {
      "command": "npx",
      "args": ["-y", "slack-mcp-server"],
      "env": {
        "SLACK_BOT_TOKEN": "xoxb-...",
        "SLACK_USER_TOKEN": "xoxp-..."
      }
    }
  }
}

Discord:

{
  "mcps": {
    "discord": {
      "command": "npx",
      "args": ["-y", "discord-mcp-server"],
      "env": { "DISCORD_BOT_TOKEN": "your-token" }
    }
  }
}

Databases: bytebase/dbhub (SQL), mongodb-mcp-server (MongoDB)

Composio - 500+ apps in one integration
{
  "mcps": {
    "composio": {
      "command": "npx",
      "args": ["-y", "@composiohq/rube"],
      "env": { "COMPOSIO_API_KEY": "your-key" }
    }
  }
}

Get API key at composio.dev

Troubleshooting

Common issues

Jarvis doesn't hear me - Check microphone permissions and speak clearly; "Jarvis" can appear anywhere in the sentence

Responses are slow - Ensure you have enough VRAM (8GB+ for default model; see System Requirements for other models)

Windows: App won't start - Extract full zip first, check Windows Defender

macOS: "App can't be opened" - Right-click → Open, or System Settings → Privacy & Security → Allow

Linux: No tray icon - sudo apt install libayatana-appindicator3-1

For Developers

Running from source

git clone https://github.com/isair/jarvis.git
cd jarvis

# macOS
bash scripts/run_macos.sh

# Windows (with Micromamba)
pwsh -ExecutionPolicy Bypass -File scripts\run_windows.ps1

# Linux
bash scripts/run_linux.sh

Running from source enables Chatterbox TTS (AI voice with emotion/cloning). Piper TTS works in both bundled and source modes.

Privacy hardening (stay 100% offline)

{
  "web_search_enabled": false,
  "mcps": {},
  "location_auto_detect": false,
  "location_cgnat_resolve_public_ip": false,
  "location_enabled": false
}

Verify: sudo lsof -i -n -P | grep jarvis (should only show 127.0.0.1 to Ollama)
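Alongside the lsof check, a config file can be compared against the hardened settings listed above. The sketch below does that for a config dict; `offline_violations` is a hypothetical helper, and because Jarvis's defaults are not listed here it treats a missing key as a violation.

```python
# The hardened values from the privacy hardening block above.
OFFLINE_SETTINGS = {
    "web_search_enabled": False,
    "mcps": {},
    "location_auto_detect": False,
    "location_cgnat_resolve_public_ip": False,
    "location_enabled": False,
}


def offline_violations(config):
    """List the keys whose value differs from the hardened settings."""
    return [key for key, want in OFFLINE_SETTINGS.items() if config.get(key) != want]


print(offline_violations({"web_search_enabled": True, "mcps": {}}))
```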

Privacy & Storage

  • 100% offline - No cloud services required
  • Auto-redaction - Emails, tokens, passwords automatically removed
  • Local storage - Everything in ~/.local/share/jarvis
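To give a feel for what auto-redaction involves, here is a small regex-based sketch that masks emails, token-like strings and passwords before text is stored. The patterns and placeholder labels are illustrative assumptions only; the rules Jarvis actually applies are not documented in this README.

```python
import re

# Illustrative patterns only; these are assumptions, not the patterns Jarvis uses.
PATTERNS = [
    (re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"), "[EMAIL]"),
    (re.compile(r"\b(?:ghp|xoxb|xoxp)[-_][A-Za-z0-9-]{10,}"), "[TOKEN]"),
    (re.compile(r"\b(password|passwd)\s*[:=]\s*\S+", re.IGNORECASE), r"\1: [REDACTED]"),
]


def redact(text):
    """Apply each pattern in turn and return the masked text."""
    for pattern, replacement in PATTERNS:
        text = pattern.sub(replacement, text)
    return text


print(redact("mail me at alice@example.com, password=hunter2"))
```

Redacting before the write, rather than on read, means the sensitive value never reaches disk in the first place.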

License

  • Personal use: Free forever
  • Commercial use: Contact us

Support

Report issues · Discussions · Sponsor
