An offline, real-time desktop application designed for advanced speech analysis. It provides live transcription, translation, simplification, and deep sentiment/emotion analysis, all powered by local AI models.
The application features a modern, responsive user interface built with ttkbootstrap, a splash screen for a professional loading experience, and a sophisticated streaming pipeline using Voice Activity Detection (VAD) for true real-time performance.
- 🎙️ Real-Time Transcription: Live speech-to-text using a high-accuracy Whisper model.
- 🌍 Multilingual Translation: Translates English speech into 22 official Indian languages using Meta AI's NLLB model.
- ✏️ Text Simplification: Simplifies complex English sentences into easy-to-understand text using a specialized Pegasus model.
- 😊 Text-based Sentiment Analysis: Automatically analyzes the transcribed text to determine if the sentiment is POSITIVE or NEGATIVE.
- 😠 Audio-based Emotion Recognition: Analyzes the tone of the speaker's voice to detect emotions like anger, sadness, happiness, or neutrality.
- 🖥️ Modern & Responsive UI: A beautiful, themeable interface that scales gracefully with window size, including dynamic font adjustments.
- ✈️ Fully Offline: After an initial setup, the entire application runs without an internet connection, ensuring privacy and accessibility.
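To make the "true real-time" claim concrete, here is a minimal, pure-Python sketch of how a VAD-gated pipeline can group audio frames into utterances. The `is_speech` callback stands in for `webrtcvad`'s per-frame decision, and the function name and silence threshold are illustrative, not taken from the app's source:

```python
from typing import Callable, Iterable, List

def segment_utterances(
    frames: Iterable[bytes],
    is_speech: Callable[[bytes], bool],
    silence_frames_to_end: int = 10,
) -> List[bytes]:
    """Group consecutive speech frames into utterances.

    A run of `silence_frames_to_end` non-speech frames (roughly 300 ms
    at 30 ms per frame) closes the current utterance, which would then
    be handed to the transcription model.
    """
    utterances, current, silence = [], [], 0
    for frame in frames:
        if is_speech(frame):
            current.append(frame)
            silence = 0
        elif current:
            silence += 1
            if silence >= silence_frames_to_end:
                utterances.append(b"".join(current))
                current, silence = [], 0
    if current:  # flush a trailing utterance at end of stream
        utterances.append(b"".join(current))
    return utterances

# Demo with a stub VAD: b"s" frames are "speech", b"_" frames are silence.
frames = [b"s"] * 5 + [b"_"] * 12 + [b"s"] * 3
print(segment_utterances(frames, lambda f: f == b"s"))
# → [b'sssss', b'sss']
```

In the real app, each returned utterance would be what gets transcribed, analyzed, and translated as a unit when you pause speaking.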
This project is built entirely in Python and leverages a suite of powerful open-source libraries.
Core Application:
- Python 3.9+
- Tkinter & ttkbootstrap: For the modern, themeable graphical user interface.
- Threading: For concurrent processing to keep the UI responsive while AI models are running.
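The worker-thread pattern behind that responsiveness can be sketched as follows. This is an illustrative stand-in, not the app's actual code: the fake `transcribe_worker` replaces real Whisper inference, and in the app a Tkinter callback (e.g. scheduled with `root.after`) would drain the queue instead of the loop at the bottom:

```python
import queue
import threading

results: "queue.Queue[str]" = queue.Queue()

def transcribe_worker(audio_chunks):
    # Heavy model inference runs here, off the UI thread,
    # so the window never freezes while a model is busy.
    for chunk in audio_chunks:
        text = f"transcribed:{chunk}"  # stand-in for a real Whisper call
        results.put(text)

worker = threading.Thread(
    target=transcribe_worker, args=(["a", "b"],), daemon=True
)
worker.start()
worker.join()

# In the real app, a periodic UI callback would drain this queue
# and write the results into the text widgets.
drained = []
while not results.empty():
    drained.append(results.get_nowait())
print(drained)  # → ['transcribed:a', 'transcribed:b']
```

Only thread-safe hand-off objects like `queue.Queue` should cross the worker/UI boundary; Tkinter widgets themselves must only be touched from the main thread.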
AI & Machine Learning:
- PyTorch: The core machine learning framework.
- Transformers (by Hugging Face): Runs the translation, simplification, sentiment, and emotion models.
- Faster-Whisper: A high-performance implementation of OpenAI's Whisper for transcription.
Audio Processing:
- SoundDevice: Captures live audio from the microphone.
- webrtcvad-wheels: A high-performance Voice Activity Detection (VAD) library.
- pydub: For robust audio format conversion.
- FFmpeg: An essential system dependency for advanced audio processing.
- librosa: For audio feature extraction required by the emotion model.
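One small detail these audio libraries share: the microphone delivers 16-bit integer PCM, while speech models generally expect floating-point samples in [-1.0, 1.0]. A stdlib-only sketch of that conversion (the app itself would likely rely on NumPy or librosa for this):

```python
import array

def pcm16_to_float(pcm_bytes: bytes) -> list:
    """Convert machine-native 16-bit PCM bytes to floats in [-1.0, 1.0]."""
    samples = array.array("h")  # signed 16-bit integers
    samples.frombytes(pcm_bytes)
    return [s / 32768.0 for s in samples]

raw = array.array("h", [0, 16384, -32768]).tobytes()
print(pcm16_to_float(raw))  # → [0.0, 0.5, -1.0]
```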
Follow these four steps to get the application running on your local machine.
Install FFmpeg first: this is a crucial, one-time setup step for an audio processing tool.
Windows (Recommended):
- Open PowerShell as an Administrator.
- Install the Chocolatey package manager from its official website.
- Run:

```shell
choco install ffmpeg
```
macOS (using Homebrew):

```shell
brew install ffmpeg
```
Linux (Debian/Ubuntu):

```shell
sudo apt update && sudo apt install ffmpeg
```
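Before moving on, you can sanity-check the install. A small helper like this (not part of the project's files) confirms the `ffmpeg` executable is on your PATH, which is all pydub needs:

```python
import shutil

def ffmpeg_available() -> bool:
    """Return True if the ffmpeg executable can be found on PATH."""
    return shutil.which("ffmpeg") is not None

if ffmpeg_available():
    print("FFmpeg found; pydub will be able to convert audio.")
else:
    print("FFmpeg not found; revisit the install steps above.")
```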
- Download Project Files: Place `app.py`, `cache_models.py`, and `requirements.txt` into a new folder.
- Create and Activate a Virtual Environment: Open a terminal in your project folder and run:

```shell
# Windows
python -m venv venv
.\venv\Scripts\activate

# macOS/Linux
python3 -m venv venv
source venv/bin/activate
```

Your terminal prompt should now start with `(venv)`.
- Install All Python Libraries: Run this single command:

```shell
pip install -r requirements.txt
```
This step downloads all the necessary AI models (several gigabytes) to your computer. It will take a while, but it ensures the app starts quickly afterwards instead of downloading models on first launch.
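The project's actual `cache_models.py` is not reproduced here, but a caching script generally looks something like the sketch below: calling Hugging Face's `from_pretrained` once per checkpoint populates the local cache. The model names listed are illustrative guesses, not confirmed to be the ones the app uses:

```python
# Hypothetical sketch of a model-caching script; the real cache_models.py
# and the exact checkpoints the app uses may differ.
MODELS = [
    "facebook/nllb-200-distilled-600M",                  # translation (NLLB)
    "distilbert-base-uncased-finetuned-sst-2-english",   # sentiment
]

def cache_all(model_names=MODELS):
    # Imported lazily so this sketch can be read without transformers installed.
    from transformers import AutoModel, AutoTokenizer
    for name in model_names:
        print(f"Caching {name} ...")
        AutoTokenizer.from_pretrained(name)  # downloads into the local HF cache
        AutoModel.from_pretrained(name)

# Calling cache_all() performs the one-time (multi-gigabyte) download.
```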
In your terminal (with the virtual environment active), run the caching script:
```shell
python cache_models.py
```

Once the models are cached, you are ready to launch the app.
- Run the main application script from your terminal:
python app.py
- A "Loading..." splash screen will appear instantly.
- After a short wait, the main application window will open, fully functional.
- Click the "🎤 Start" button. The application will begin listening. The button will pulsate with a red glow to indicate it's active.
- Speak clearly into your microphone. The app uses VAD to detect when you start and stop speaking.
- When you pause, the transcribed text, sentiment, and detected emotion will appear automatically. The translation will follow shortly after.
- To simplify the last sentence you spoke, click the "✨ Simplify" button.
- Click the "🛑 Stop" button to end the session.