Deploy your own LLM in minutes.
Hands-on workshop by MLSA KIIT — deploy and chat with an LLM using a stunning web UI.
A fully working LLM chat application with a polished dark-glass UI, markdown rendering, and dynamic model detection — pick your deployment path:
| Option | Platform | GPU | Cost | One-Click |
|---|---|---|---|---|
| ☁️ Modal + vLLM | Cloud | A10G | Pay-per-use | `setup_modal.bat` |
| 🏠 Ollama | Your machine | CPU / local GPU | Free | `setup_ollama.bat` |
💡 Both options share the same chat interface — the UI auto-detects which backend it's connected to.
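That auto-detection can be replicated in a script of your own with a few lines of standard-library Python. This is a minimal sketch, not the repo's code (the detection logic lives in `frontend/index.html`), and the candidate URL list is an assumption — after deploying to Modal you would add your own `*.modal.run` address:

```python
import json
import urllib.request

# Base URLs to probe; append your Modal URL after setup_modal.bat prints it.
CANDIDATES = [
    "http://localhost:8000",  # local Ollama proxy (ollama_model.py)
]

def detect_backend(candidates=CANDIDATES, timeout=2.0):
    """Return (base_url, info_dict) for the first backend whose /info answers."""
    for base in candidates:
        try:
            with urllib.request.urlopen(base + "/info", timeout=timeout) as resp:
                return base, json.load(resp)
        except OSError:
            continue  # backend not reachable; try the next candidate
    return None, None
```

If no backend is reachable, `detect_backend()` returns `(None, None)`, which a client would surface as a connection error.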
```bat
:: Cloud deployment (Modal + vLLM)
setup_modal.bat

:: Local deployment (Ollama)
setup_ollama.bat
```

That's it. The script handles everything: installing tools and dependencies, authentication, deployment, and opening the chat UI.
📋 For step-by-step manual setup, see SETUP.md.
Both backends expose the same endpoints:
| Method | Endpoint | Description |
|---|---|---|
| GET | `/ui` | Chat web interface |
| GET | `/info` | Model name & backend info |
| POST | `/generate` | Run inference |
```bat
curl -X POST http://localhost:8000/generate ^
  -H "Content-Type: application/json" ^
  -d "{\"prompt\":\"What is gravity?\",\"temperature\":0.7,\"max_tokens\":128}"
```

Response:

```json
{ "response": "Gravity is a fundamental force..." }
```

| Parameter | Type | Default | Description |
|---|---|---|---|
| `prompt` | string | (required) | The input prompt |
| `temperature` | float | 0.7 | Creativity (0 = deterministic, 1.5 = creative) |
| `max_tokens` | int | 256 | Max response length |
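For programmatic access (e.g. from a notebook or test script), the same call can be made with Python's standard library. This is a sketch, not part of the repo — `build_payload` and `generate` are hypothetical helper names, and the defaults mirror the parameter table above:

```python
import json
import urllib.request

API_URL = "http://localhost:8000/generate"  # same route on either backend

def build_payload(prompt, temperature=0.7, max_tokens=256):
    """Assemble the request body using the defaults from the parameter table."""
    return {"prompt": prompt, "temperature": temperature, "max_tokens": max_tokens}

def generate(prompt, **options):
    """POST a prompt to /generate and return the model's text."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(build_payload(prompt, **options)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["response"]

# Example (requires a running backend):
# generate("What is gravity?", max_tokens=128)
```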
| Model | ID | Size |
|---|---|---|
| SmolLM2 (default) | `HuggingFaceTB/SmolLM2-1.7B-Instruct` | 1.7B |
| Phi-3.5 Mini | `microsoft/Phi-3.5-mini-instruct` | 3.8B |
| TinyLlama | `TinyLlama/TinyLlama-1.1B-Chat-v1.0` | 1.1B |
| StableLM 2 | `stabilityai/stablelm-2-zephyr-1_6b` | 1.6B |
| Model | ID | Size |
|---|---|---|
| Llama 3.2 | `meta-llama/Llama-3.2-3B-Instruct` | 3B |
| Gemma 2 | `google/gemma-2-2b-it` | 2B |
| Mistral | `mistralai/Mistral-7B-Instruct-v0.3` | 7B |
🔒 See TROUBLESHOOTING.md for how to set up HuggingFace authentication.
```
llm_deployment/
│
├── 🔧 setup_modal.bat        # One-click → cloud deploy
├── 🔧 setup_ollama.bat       # One-click → local deploy
│
├── ☁️ modal_model.py         # Modal app (vLLM + A10G GPU)
├── 🏠 ollama_model.py        # FastAPI server → Ollama
│
├── 🎨 frontend/
│   └── index.html            # Chat interface (markdown, dark UI)
│
├── 📋 SETUP.md               # Detailed setup instructions
├── 🛠️ TROUBLESHOOTING.md     # Common issues & fixes
└── 📖 README.md              # This file
```
```
┌─────────────────────────────────────────────────┐
│        Chat UI (frontend/index.html)            │
│   auto-detects backend · markdown rendering     │
│   fetches model name from /info endpoint        │
└──────────────┬──────────────┬───────────────────┘
               │              │
         *.modal.run    localhost:8000
               │              │
     ┌─────────▼──────┐  ┌───▼───────────┐
     │ modal_model.py │  │ ollama_model  │
     │  vLLM · A10G   │  │ FastAPI proxy │
     │ system prompt  │  └───────┬───────┘
     └────────────────┘          │
                         ┌───────▼───────┐
                         │    Ollama     │
                         │  local model  │
                         └───────────────┘
```
Having issues? See TROUBLESHOOTING.md for solutions to common problems including:
- Gated HuggingFace models
- Modal API changes & deprecations
- Unicode encoding errors on Windows
- Model giving nonsensical responses
- Ollama service not running