```
██╗      ██████╗  ██████╗ █████╗ ██╗      ██████╗ ███████╗███╗   ██╗
██║     ██╔═══██╗██╔════╝██╔══██╗██║     ██╔════╝ ██╔════╝████╗  ██║
██║     ██║   ██║██║     ███████║██║     ██║  ███╗█████╗  ██╔██╗ ██║
██║     ██║   ██║██║     ██╔══██║██║     ██║   ██║██╔══╝  ██║╚██╗██║
███████╗╚██████╔╝╚██████╗██║  ██║███████╗╚██████╔╝███████╗██║ ╚████║
╚══════╝ ╚═════╝  ╚═════╝╚═╝  ╚═╝╚══════╝ ╚═════╝ ╚══════╝╚═╝  ╚═══╝
```
This project provides a full-featured, locally running terminal interface for interacting with the NGen3 Assistant, a flagship model from the NGen3 Series by TNSA AI. It is designed for users who want a powerful, customizable, offline chat experience with advanced debugging, context control, and VRAM-aware model loading.
- Local Model Runtime: Load merged NGen3 models directly from disk.
- Dynamic Context Management: Automatically trims conversation history to prevent context overflows.
- Custom Stopping Criteria: Supports multi-token stop sequences for clean, controlled outputs (see the first sketch after this list).
- HTML-Safe Output: Responses are filtered with bleach to strip unsafe HTML (see the second sketch below).
- VRAM Awareness: Displays GPU memory usage before and after model loading.
- Configurable Generation Settings: Temperature, top-p, repetition penalty, and more.
- Debug Mode: Shows full prompts sent to the model for transparent debugging.
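Multi-token stop sequences of this kind are typically implemented as a `StoppingCriteria` subclass. Below is a minimal sketch of that technique; the class name, the stop strings, and the `tokenizer`/`model` objects are illustrative assumptions, not code taken from the script:

```python
import torch
from transformers import StoppingCriteria, StoppingCriteriaList

class StopOnSequences(StoppingCriteria):
    """Stop generation as soon as any multi-token stop sequence ends the output."""

    def __init__(self, stop_token_ids):
        self.stop_token_ids = [torch.tensor(ids) for ids in stop_token_ids]

    def __call__(self, input_ids, scores, **kwargs):
        for stop_ids in self.stop_token_ids:
            if input_ids.shape[1] >= len(stop_ids) and torch.equal(
                input_ids[0, -len(stop_ids):].cpu(), stop_ids
            ):
                return True
        return False

# Illustrative usage, assuming a loaded tokenizer and model:
# stop_ids = [tokenizer.encode(s, add_special_tokens=False) for s in ("\nYou:",)]
# model.generate(..., stopping_criteria=StoppingCriteriaList([StopOnSequences(stop_ids)]))
```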
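The HTML filtering can be equally compact. A sketch with bleach, assuming all tags are stripped (the empty whitelist is our assumption, not the script's actual settings):

```python
import bleach

def sanitize(text: str) -> str:
    # Allow no tags at all; strip=True removes disallowed tags instead of escaping them.
    return bleach.clean(text, tags=set(), strip=True)

print(sanitize("<b>hello</b> world"))  # prints: hello world
```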
Ensure the following Python packages are installed:
```
pip install torch transformers bleach
```

Place your merged NGen3 model in a folder and set the path in the script:

```python
MERGED_MODEL_PATH = r"C:/NGen3-7B/0625"
```
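For orientation, loading a merged checkpoint from such a path with transformers typically looks like the following. This is a sketch of the standard pattern, not the script's actual loading code; `device_map="auto"` additionally requires the accelerate package:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

MERGED_MODEL_PATH = r"C:/NGen3-7B/0625"

tokenizer = AutoTokenizer.from_pretrained(MERGED_MODEL_PATH)
model = AutoModelForCausalLM.from_pretrained(
    MERGED_MODEL_PATH,
    torch_dtype="auto",   # keep the checkpoint's native precision where possible
    device_map="auto",    # place weights on the GPU when one is available
)
model.eval()
```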
Run the script with:

```
python terminal_chat.py
```

If the model loads successfully, you will see:
- Model path
- Device information
- Effective max context length
- VRAM usage
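Those VRAM figures can be read straight from torch. A minimal sketch of the kind of helper involved (the function name is ours, not the script's):

```python
import torch

def report_vram(label: str) -> None:
    # Print allocated/reserved CUDA memory in GiB; degrade gracefully on CPU-only machines.
    if not torch.cuda.is_available():
        print(f"{label}: CUDA not available")
        return
    allocated = torch.cuda.memory_allocated() / 1024**3
    reserved = torch.cuda.memory_reserved() / 1024**3
    total = torch.cuda.get_device_properties(0).total_memory / 1024**3
    print(f"{label}: {allocated:.2f} / {reserved:.2f} / {total:.2f} GiB (allocated/reserved/total)")

report_vram("Before model load")
# ... load the model ...
report_vram("After model load")
```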
Then the chat interface starts:

```
You: <your message>
NGen3: <model response>
```

Two commands are available during chat (a stripped-down loop sketch follows this list):

- exit, quit, bye → quit the program
- clear, /clear → reset the full conversation history
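A minimal version of such a loop, with both commands handled, might look like this; `generate_reply` is a hypothetical stand-in for the script's actual generation logic:

```python
history = []  # list of {"role": ..., "content": ...} turns

while True:
    user_input = input("You: ").strip()
    if user_input.lower() in {"exit", "quit", "bye"}:
        break
    if user_input.lower() in {"clear", "/clear"}:
        history.clear()
        print("Conversation history cleared.")
        continue
    history.append({"role": "user", "content": user_input})
    reply = generate_reply(history)  # hypothetical: wraps tokenization + model.generate
    history.append({"role": "assistant", "content": reply})
    print(f"NGen3: {reply}")
```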
The script:
- Uses a system prompt defining NGen3 Assistant
- Manages the token budget dynamically (sketched after this list)
- Supports streaming responses (also sketched below)
- Cleans unwanted stop sequences and special tokens
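Dynamic token-budget management usually amounts to dropping the oldest turns until the encoded prompt fits. A sketch of the idea; the function, the prompt format, and the reserve value are assumptions, not the script's own code:

```python
def trim_history(history, tokenizer, max_context: int, reserve_for_reply: int = 512):
    """Drop the oldest non-system turns until the prompt fits the token budget."""
    budget = max_context - reserve_for_reply

    def prompt_tokens(turns):
        text = "\n".join(f"{t['role']}: {t['content']}" for t in turns)
        return len(tokenizer.encode(text))

    trimmed = list(history)
    # Index 0 is assumed to hold the system prompt; trim from index 1 (the oldest turn).
    while len(trimmed) > 1 and prompt_tokens(trimmed) > budget:
        trimmed.pop(1)
    return trimmed
```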
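Streaming responses are commonly built on transformers' `TextIteratorStreamer`, with `generate` running on a background thread. The sketch below assumes `model` and `tokenizer` are already loaded; the sampling values (temperature, top-p, repetition penalty) are illustrative defaults, not the script's settings:

```python
from threading import Thread
from transformers import TextIteratorStreamer

def stream_reply(model, tokenizer, prompt: str) -> str:
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    streamer = TextIteratorStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
    kwargs = dict(
        **inputs,
        streamer=streamer,
        max_new_tokens=512,
        do_sample=True,
        temperature=0.7,          # illustrative sampling settings
        top_p=0.9,
        repetition_penalty=1.1,
    )
    thread = Thread(target=model.generate, kwargs=kwargs)
    thread.start()
    reply = ""
    for chunk in streamer:  # text arrives incrementally as tokens are generated
        print(chunk, end="", flush=True)
        reply += chunk
    thread.join()
    print()
    return reply
```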
- Ensure the GPU has enough VRAM for the merged model
- Prefer BF16 when supported; fall back to FP16 or FP32 based on the device (see the sketch after this list)
- On CPU, inference will be slow
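That precision choice is only a few lines with torch; a sketch:

```python
import torch

def pick_dtype() -> torch.dtype:
    # BF16 on GPUs that support it, FP16 on older CUDA GPUs, FP32 on CPU.
    if torch.cuda.is_available():
        return torch.bfloat16 if torch.cuda.is_bf16_supported() else torch.float16
    return torch.float32

# e.g. AutoModelForCausalLM.from_pretrained(MERGED_MODEL_PATH, torch_dtype=pick_dtype())
```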
When exiting, the script:
- Clears CUDA cache
- Frees memory
- Resets model and tokenizer
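In practice that teardown is a handful of lines; a sketch, assuming `model` and `tokenizer` are the objects loaded earlier:

```python
import gc
import torch

model = None       # drop the last references so the weights become collectible
tokenizer = None
gc.collect()
if torch.cuda.is_available():
    torch.cuda.empty_cache()  # return cached CUDA blocks to the driver
```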
This script is ideal for:
- Offline inference
- Debugging model behavior
- Building local AI tools on top of NGen3
- Testing merged or fine-tuned models
It gives you powerful control over your model’s inputs, outputs, and system behavior — all inside a simple terminal UI.