An offline, real-time desktop application designed for advanced speech analysis. It provides live transcription, translation, simplification, and deep sentiment/emotion analysis, all powered by local AI models.
The application features a modern, responsive user interface built with ttkbootstrap, a splash screen for a professional loading experience, and a sophisticated streaming pipeline using Voice Activity Detection (VAD) for true real-time performance.
- 🎙️ Real-Time Transcription: Live speech-to-text using a high-accuracy Whisper model.
- 🌍 Multilingual Translation: Translates English speech into 22 official Indian languages using Meta AI's NLLB model.
- ✏️ Text Simplification: Simplifies complex English sentences into easy-to-understand text using a specialized Pegasus model.
- 😊 Text-based Sentiment Analysis: Automatically analyzes the transcribed text to determine if the sentiment is POSITIVE or NEGATIVE.
- 😠 Audio-based Emotion Recognition: Analyzes the tone of the speaker's voice to detect emotions like anger, sadness, happiness, or neutrality.
- 🖥️ Modern & Responsive UI: A beautiful, themeable interface that scales gracefully with window size, including dynamic font adjustments.
- ✈️ Fully Offline: After an initial setup, the entire application runs without an internet connection, ensuring privacy and accessibility.
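To make the "true real-time" claim concrete, here is a minimal, pure-Python sketch of how a VAD-gated pipeline can group audio frames into utterances. The `is_speech` callback stands in for `webrtcvad`'s per-frame decision, and the function name and silence threshold are illustrative, not taken from the app's source:

```python
from typing import Callable, Iterable, List

def segment_utterances(
    frames: Iterable[bytes],
    is_speech: Callable[[bytes], bool],
    silence_frames_to_end: int = 10,
) -> List[bytes]:
    """Group consecutive speech frames into utterances.

    A run of `silence_frames_to_end` non-speech frames (roughly 300 ms
    at 30 ms per frame) closes the current utterance, which would then
    be handed to the transcription model.
    """
    utterances, current, silence = [], [], 0
    for frame in frames:
        if is_speech(frame):
            current.append(frame)
            silence = 0
        elif current:
            silence += 1
            if silence >= silence_frames_to_end:
                utterances.append(b"".join(current))
                current, silence = [], 0
    if current:  # flush a trailing utterance at end of stream
        utterances.append(b"".join(current))
    return utterances

# Demo with a stub VAD: b"s" frames are "speech", b"_" frames are silence.
frames = [b"s"] * 5 + [b"_"] * 12 + [b"s"] * 3
print(segment_utterances(frames, lambda f: f == b"s"))
# → [b'sssss', b'sss']
```

In the real app, each returned utterance would be what gets transcribed, analyzed, and translated as a unit when you pause speaking.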
This project is built entirely in Python and leverages a suite of powerful open-source libraries.
Core Application:
- Python 3.9+
- Tkinter & ttkbootstrap: For the modern, themeable graphical user interface.
- Threading: For concurrent processing to keep the UI responsive while AI models are running.
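The worker-thread pattern behind that responsiveness can be sketched as follows. This is an illustrative stand-in, not the app's actual code: the fake `transcribe_worker` replaces real Whisper inference, and in the app a Tkinter callback (e.g. scheduled with `root.after`) would drain the queue instead of the loop at the bottom:

```python
import queue
import threading

results: "queue.Queue[str]" = queue.Queue()

def transcribe_worker(audio_chunks):
    # Heavy model inference runs here, off the UI thread,
    # so the window never freezes while a model is busy.
    for chunk in audio_chunks:
        text = f"transcribed:{chunk}"  # stand-in for a real Whisper call
        results.put(text)

worker = threading.Thread(
    target=transcribe_worker, args=(["a", "b"],), daemon=True
)
worker.start()
worker.join()

# In the real app, a periodic UI callback would drain this queue
# and write the results into the text widgets.
drained = []
while not results.empty():
    drained.append(results.get_nowait())
print(drained)  # → ['transcribed:a', 'transcribed:b']
```

Only thread-safe hand-off objects like `queue.Queue` should cross the worker/UI boundary; Tkinter widgets themselves must only be touched from the main thread.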
AI & Machine Learning:
- PyTorch: The core machine learning framework.
- Transformers (by Hugging Face): Runs the translation, simplification, sentiment, and emotion models.
- Faster-Whisper: A high-performance implementation of OpenAI's Whisper for transcription.
Audio Processing:
- SoundDevice: Captures live audio from the microphone.
- webrtcvad-wheels: A high-performance Voice Activity Detection (VAD) library.
- pydub: For robust audio format conversion.
- FFmpeg: An essential system dependency for advanced audio processing.
- librosa: For audio feature extraction required by the emotion model.
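One small detail these audio libraries share: the microphone delivers 16-bit integer PCM, while speech models generally expect floating-point samples in [-1.0, 1.0]. A stdlib-only sketch of that conversion (the app itself would likely rely on NumPy or librosa for this):

```python
import array

def pcm16_to_float(pcm_bytes: bytes) -> list:
    """Convert machine-native 16-bit PCM bytes to floats in [-1.0, 1.0]."""
    samples = array.array("h")  # signed 16-bit integers
    samples.frombytes(pcm_bytes)
    return [s / 32768.0 for s in samples]

raw = array.array("h", [0, 16384, -32768]).tobytes()
print(pcm16_to_float(raw))  # → [0.0, 0.5, -1.0]
```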
Follow these four steps to get the application running on your local machine.
Install FFmpeg first: this is a crucial, one-time setup step for an audio processing tool.
Windows (Recommended):
- Open PowerShell as an Administrator.
- Install the Chocolatey package manager from its official website.
- Run:

```shell
choco install ffmpeg
```
macOS (using Homebrew):

```shell
brew install ffmpeg
```
Linux (Debian/Ubuntu):

```shell
sudo apt update && sudo apt install ffmpeg
```
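Before moving on, you can sanity-check the install. A small helper like this (not part of the project's files) confirms the `ffmpeg` executable is on your PATH, which is all pydub needs:

```python
import shutil

def ffmpeg_available() -> bool:
    """Return True if the ffmpeg executable can be found on PATH."""
    return shutil.which("ffmpeg") is not None

if ffmpeg_available():
    print("FFmpeg found; pydub will be able to convert audio.")
else:
    print("FFmpeg not found; revisit the install steps above.")
```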
- Download Project Files: Place `app.py`, `cache_models.py`, and `requirements.txt` into a new folder.
- Create and Activate a Virtual Environment: Open a terminal in your project folder and run:

```shell
# Windows
python -m venv venv
.\venv\Scripts\activate

# macOS/Linux
python3 -m venv venv
source venv/bin/activate
```

Your terminal prompt should now start with `(venv)`.
- Install All Python Libraries: Run this single command:

```shell
pip install -r requirements.txt
```
This step downloads all the necessary AI models (several gigabytes) to your computer. It will take a while, but it ensures the app starts quickly afterwards instead of downloading models on first launch.
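The project's actual `cache_models.py` is not reproduced here, but a caching script generally looks something like the sketch below: calling Hugging Face's `from_pretrained` once per checkpoint populates the local cache. The model names listed are illustrative guesses, not confirmed to be the ones the app uses:

```python
# Hypothetical sketch of a model-caching script; the real cache_models.py
# and the exact checkpoints the app uses may differ.
MODELS = [
    "facebook/nllb-200-distilled-600M",                  # translation (NLLB)
    "distilbert-base-uncased-finetuned-sst-2-english",   # sentiment
]

def cache_all(model_names=MODELS):
    # Imported lazily so this sketch can be read without transformers installed.
    from transformers import AutoModel, AutoTokenizer
    for name in model_names:
        print(f"Caching {name} ...")
        AutoTokenizer.from_pretrained(name)  # downloads into the local HF cache
        AutoModel.from_pretrained(name)

# Calling cache_all() performs the one-time (multi-gigabyte) download.
```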
In your terminal (with the virtual environment active), run the caching script:
```shell
python cache_models.py
```

Once the models are cached, you are ready to launch the app.
- Run the main application script from your terminal:
python app.py
- A "Loading..." splash screen will appear instantly.
- After a short wait, the main application window will open, fully functional.
- Click the "🎤 Start" button. The application will begin listening. The button will pulsate with a red glow to indicate it's active.
- Speak clearly into your microphone. The app uses VAD to detect when you start and stop speaking.
- When you pause, the transcribed text, sentiment, and detected emotion will appear automatically. The translation will follow shortly after.
- To simplify the last sentence you spoke, click the "✨ Simplify" button.
- Click the "🛑 Stop" button to end the session.