🚀 InferenceX

LLM Deployment Workshop

Deploy your own LLM in minutes.

Hands-on workshop by MLSA KIIT: deploy an LLM and chat with it through a stunning web UI.

🎯 What You'll Build

A fully working LLM chat application with a polished dark-glass UI, markdown rendering, and dynamic model detection. Pick your deployment path:

Option	Platform	GPU	Cost	One-Click
☁️ Modal + vLLM	Cloud	A10G	Pay-per-use	`setup_modal.bat`
🏠 Ollama	Your machine	CPU / local GPU	Free	`setup_ollama.bat`

💡 Both options share the same chat interface; the UI auto-detects which backend it's connected to.

⚡ Quick Start

☁️ Cloud (Modal)

setup_modal.bat

🏠 Local (Ollama)

setup_ollama.bat

That's it. Each script does the rest: it installs tools and dependencies, handles authentication, deploys the model, and opens the chat UI.

📋 For step-by-step manual setup, see SETUP.md.
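The setup scripts' internals are not reproduced in this README. As a rough illustration of their final "open the chat UI" step, the Python sketch below polls the `/info` endpoint until the backend answers and then opens `/ui` in the default browser. This is an assumption about how such a step could work, not the `.bat` files' actual logic; `wait_for_server` and `open_chat_ui` are hypothetical helper names.

```python
import http.client
import time
import urllib.request
import webbrowser


def wait_for_server(base_url: str, timeout: float = 60.0, interval: float = 2.0) -> bool:
    """Poll GET <base_url>/info until it returns HTTP 200 or `timeout` seconds elapse."""
    info_url = base_url.rstrip("/") + "/info"
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(info_url, timeout=interval) as resp:
                if resp.status == 200:
                    return True
        except (OSError, http.client.HTTPException):
            # Not up yet: connection refused, timeout, HTTP error, or garbled reply.
            pass
        time.sleep(interval)
    return False


def open_chat_ui(base_url: str, timeout: float = 60.0) -> bool:
    """Open <base_url>/ui in the default browser once the backend is reachable."""
    if not wait_for_server(base_url, timeout=timeout):
        return False
    webbrowser.open(base_url.rstrip("/") + "/ui")
    return True
```

For the local backend this would be called as `open_chat_ui("http://localhost:8000")`. Polling `/info` rather than `/ui` is a deliberate choice in this sketch: it only succeeds once the model server itself is answering.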

📡 API

Both backends expose the same endpoints:

Method	Endpoint	Description
`GET`	`/ui`	Chat web interface
`GET`	`/info`	Model name & backend info
`POST`	`/generate`	Run inference
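A minimal client-side sketch of the endpoint table, using only the Python standard library. `endpoint_url` and `fetch_info` are hypothetical helper names. Because the README does not document the exact keys returned by `/info`, `fetch_info` simply returns the decoded JSON object rather than assuming a field name.

```python
import json
import urllib.request

# Paths from the endpoint table; the same paths work on both backends.
ENDPOINTS = {"ui": "/ui", "info": "/info", "generate": "/generate"}


def endpoint_url(base_url: str, name: str) -> str:
    """Join a backend base URL (local or *.modal.run) with one endpoint path."""
    return base_url.rstrip("/") + ENDPOINTS[name]


def fetch_info(base_url: str, timeout: float = 10.0) -> dict:
    """GET /info and return the decoded JSON (model name and backend details)."""
    with urllib.request.urlopen(endpoint_url(base_url, "info"), timeout=timeout) as resp:
        return json.load(resp)
```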

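POST /generate

Example request (Windows `cmd` syntax, where `^` continues a line; for the Modal backend, replace `http://localhost:8000` with your `*.modal.run` URL):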

curl -X POST http://localhost:8000/generate ^
  -H "Content-Type: application/json" ^
  -d "{\"prompt\":\"What is gravity?\",\"temperature\":0.7,\"max_tokens\":128}"

Response:

{ "response": "Gravity is a fundamental force..." }

Parameter	Type	Default	Description
`prompt`	`string`	(required)	The input prompt
`temperature`	`float`	`0.7`	Sampling randomness (0 = deterministic; higher values such as 1.5 = more varied)
`max_tokens`	`int`	`256`	Maximum number of tokens to generate
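The same request from Python, as a standard-library sketch. `build_generate_request` and `generate` are hypothetical helpers; the defaults mirror the parameter table, and the reply is assumed to have the `{"response": ...}` shape shown in the response example.

```python
import json
import urllib.request


def build_generate_request(base_url: str, prompt: str,
                           temperature: float = 0.7,
                           max_tokens: int = 256) -> urllib.request.Request:
    """Build a POST /generate request with a JSON body."""
    if not prompt:
        raise ValueError("prompt is required")
    body = {"prompt": prompt, "temperature": temperature, "max_tokens": max_tokens}
    return urllib.request.Request(
        base_url.rstrip("/") + "/generate",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(base_url: str, prompt: str, **params) -> str:
    """Send the request and return the `response` field of the JSON reply."""
    req = build_generate_request(base_url, prompt, **params)
    with urllib.request.urlopen(req, timeout=120) as resp:
        return json.load(resp)["response"]
```

Usage against the local backend: `print(generate("http://localhost:8000", "What is gravity?", max_tokens=128))`. Building the request separately from sending it keeps the payload easy to inspect and test without a running server.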

🧠 Supported Models

Ungated (no authentication needed)

Model	ID	Parameters
SmolLM2 (default)	`HuggingFaceTB/SmolLM2-1.7B-Instruct`	1.7B
Phi-3.5 Mini	`microsoft/Phi-3.5-mini-instruct`	3.8B
TinyLlama	`TinyLlama/TinyLlama-1.1B-Chat-v1.0`	1.1B
StableLM 2	`stabilityai/stablelm-2-zephyr-1_6b`	1.6B

Gated (requires HuggingFace token)

Model	ID	Parameters
Llama 3.2	`meta-llama/Llama-3.2-3B-Instruct`	3B
Gemma 2	`google/gemma-2-2b-it`	2B
Mistral	`mistralai/Mistral-7B-Instruct-v0.3`	7B

🔒 See TROUBLESHOOTING.md for how to set up HuggingFace authentication.
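A small Python sketch of what the two tables mean in practice: given a model ID, decide whether a Hugging Face token is needed before deploying. It assumes the token is supplied through the `HF_TOKEN` environment variable (the variable the `huggingface_hub` library reads); the workshop's own mechanism is described in TROUBLESHOOTING.md. `MODELS` and `check_model` are hypothetical names.

```python
import os
from typing import Mapping, Optional

# Model IDs from the tables above; True means the repo is gated on Hugging Face.
MODELS = {
    "HuggingFaceTB/SmolLM2-1.7B-Instruct": False,  # default
    "microsoft/Phi-3.5-mini-instruct": False,
    "TinyLlama/TinyLlama-1.1B-Chat-v1.0": False,
    "stabilityai/stablelm-2-zephyr-1_6b": False,
    "meta-llama/Llama-3.2-3B-Instruct": True,
    "google/gemma-2-2b-it": True,
    "mistralai/Mistral-7B-Instruct-v0.3": True,
}
DEFAULT_MODEL = "HuggingFaceTB/SmolLM2-1.7B-Instruct"


def check_model(model_id: str = DEFAULT_MODEL,
                env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the token to use (None for ungated models); fail early otherwise."""
    env = os.environ if env is None else env
    if model_id not in MODELS:
        raise KeyError(f"{model_id!r} is not in the supported-model list")
    if not MODELS[model_id]:
        return None  # ungated: no authentication needed
    token = env.get("HF_TOKEN")  # assumption: token provided via HF_TOKEN
    if not token:
        raise RuntimeError(f"{model_id} is gated: set HF_TOKEN (see TROUBLESHOOTING.md)")
    return token
```

Failing before deployment is the point of this design: a missing token otherwise tends to surface only after a slow model download attempt.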

📂 Project Structure

llm_deployment/
│
├── 🔧 setup_modal.bat       # One-click → cloud deploy
├── 🔧 setup_ollama.bat      # One-click → local deploy
│
├── ☁️  modal_model.py        # Modal app (vLLM + A10G GPU)
├── 🏠 ollama_model.py       # FastAPI server → Ollama
│
├── 🎨 frontend/
│   └── index.html            # Chat interface (markdown, dark UI)
│
├── 📋 SETUP.md               # Detailed setup instructions
├── 🛠️ TROUBLESHOOTING.md     # Common issues & fixes
└── 📖 README.md              # This file
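The tree describes `ollama_model.py` as a FastAPI server that forwards to Ollama. Its source is not reproduced here; the sketch below shows one plausible request translation, assuming the proxy calls Ollama's native `POST /api/generate` on the default port 11434 with streaming disabled. In that API the token limit is called `num_predict` and lives inside `options`. The FastAPI route itself is omitted, and `to_ollama_payload` / `proxy_generate` are hypothetical names.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local address


def to_ollama_payload(body: dict, model: str) -> dict:
    """Map a workshop /generate body onto Ollama's /api/generate request format."""
    return {
        "model": model,
        "prompt": body["prompt"],
        "stream": False,  # ask for a single JSON object instead of a stream
        "options": {
            "temperature": body.get("temperature", 0.7),
            "num_predict": body.get("max_tokens", 256),  # Ollama's max-token option
        },
    }


def proxy_generate(body: dict, model: str) -> dict:
    """Forward to Ollama and reshape its reply into {"response": ...}."""
    data = json.dumps(to_ollama_payload(body, model)).encode("utf-8")
    req = urllib.request.Request(OLLAMA_URL, data=data,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req, timeout=300) as resp:
        return {"response": json.load(resp)["response"]}
```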

🛠️ How It Works

┌─────────────────────────────────────────────────┐
│           Chat UI (frontend/index.html)         │
│   auto-detects backend · markdown rendering     │
│   fetches model name from /info endpoint        │
└──────────────┬──────────────┬───────────────────┘
               │              │
       *.modal.run        localhost:8000
               │              │
     ┌─────────▼───────┐  ┌───▼───────────┐
     │  modal_model.py │  │ ollama_model  │
     │  vLLM · A10G    │  │ FastAPI proxy │
     │  system prompt  │  └───────┬───────┘
     └─────────────────┘          │
                          ┌───────▼───────┐
                          │    Ollama     │
                          │  local model  │
                          └───────────────┘
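The diagram says the UI auto-detects its backend. The real logic lives in JavaScript inside `frontend/index.html`; the Python sketch below only illustrates one way such detection could work, assuming it keys off the page's hostname: a page served from `*.modal.run` talks to its own origin, and anything else (for example a locally opened `index.html`) falls back to `http://localhost:8000`. `detect_backend` is a hypothetical name.

```python
from typing import Tuple
from urllib.parse import urlparse

LOCAL_BACKEND = "http://localhost:8000"  # FastAPI proxy address from the diagram


def detect_backend(page_url: str) -> Tuple[str, str]:
    """Return (backend_name, api_base_url) for the URL the chat UI was loaded from."""
    parsed = urlparse(page_url)
    host = parsed.hostname or ""  # file:// URLs have no hostname
    if host.endswith(".modal.run"):
        # Served by Modal: the API shares the UI's origin.
        return "modal", f"{parsed.scheme}://{parsed.netloc}"
    # Anything else falls back to the local Ollama proxy.
    return "ollama", LOCAL_BACKEND
```

Once the base URL is known, the UI can call `/info` on it to display the model name, which is why the same frontend works unchanged with either backend.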

❓ Troubleshooting

Having issues? See TROUBLESHOOTING.md for solutions to common problems including:

Gated HuggingFace models
Modal API changes & deprecations
Unicode encoding errors on Windows
Model giving nonsensical responses
Ollama service not running

InferenceX · LLM Deployment Workshop

Organized by MLSA KIIT

Built with Modal · Ollama · vLLM
