A Retrieval-Augmented Generation (RAG) API that allows you to upload PDF documents and ask questions about their content. It uses FastAPI for the backend, Pinecone for vector storage, Sentence Transformers for embeddings, and Groq for fast LLM inference.
```mermaid
graph TD
    subgraph Client
        User[User]
    end

    subgraph API["FastAPI Server"]
        UploadEndpoint["/upload_pdf"]
        AskEndpoint["/ask"]
        PDFProcessor["PDF Processor"]
        RAGEngine["RAG Engine"]
    end

    subgraph Services
        Embedder["Embedding Model<br/>(all-MiniLM-L6-v2)"]
        Pinecone["Pinecone Vector DB"]
        LLM["LLM<br/>(Groq)"]
    end

    %% Upload Flow
    User -- Upload PDF --> UploadEndpoint
    UploadEndpoint -- Extract & Chunk --> PDFProcessor
    PDFProcessor -- Text Chunks --> UploadEndpoint
    UploadEndpoint -- Generate Embeddings --> Embedder
    Embedder -- Vectors --> UploadEndpoint
    UploadEndpoint -- Upsert Vectors --> Pinecone

    %% Ask Flow
    User -- Ask Question --> AskEndpoint
    AskEndpoint -- Process Query --> RAGEngine
    RAGEngine -- Embed Query --> Embedder
    Embedder -- Query Vector --> RAGEngine
    RAGEngine -- Retrieve Context --> Pinecone
    Pinecone -- Relevant Chunks --> RAGEngine
    RAGEngine -- Prompt + Context --> LLM
    LLM -- Answer --> RAGEngine
    RAGEngine -- Response --> AskEndpoint
    AskEndpoint -- JSON Response --> User
```
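The ask flow in the diagram can be sketched end to end. This is an illustrative outline only: `embed`, `retrieve`, and `generate` are hypothetical stand-ins for the embedding model, the Pinecone query, and the Groq call, not the actual functions in `rag_engine.py`.

```python
# Illustrative sketch of the /ask flow. The three helpers below are dummies
# standing in for the real services wired up in the diagram above.

def embed(text: str) -> list[float]:
    # Stand-in for the all-MiniLM-L6-v2 encoder (real vectors are 384-dim).
    return [float(len(text))]

def retrieve(query_vector: list[float], top_k: int = 3) -> list[str]:
    # Stand-in for a Pinecone similarity query returning chunk texts.
    return ["chunk about topic A", "chunk about topic B"][:top_k]

def generate(prompt: str) -> str:
    # Stand-in for a Groq chat-completion call.
    return f"Answer based on {prompt.count('chunk')} retrieved chunks."

def ask(query: str) -> str:
    vector = embed(query)                                    # 1. embed the question
    context = retrieve(vector)                               # 2. fetch similar chunks
    prompt = "\n".join(context) + "\n\nQuestion: " + query   # 3. build grounded prompt
    return generate(prompt)                                  # 4. generate the answer

print(ask("What is this document about?"))
# → Answer based on 2 retrieved chunks.
```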
- PDF Ingestion: Upload PDF files to extract text and chunk it for processing.
- Vector Search: Uses Pinecone to store and retrieve relevant text chunks based on semantic similarity.
- Question Answering: Generates answers using Groq's cloud-hosted LLMs (default: Llama 3.3 70B), grounded in the retrieved context.
- Modular Design: Clean separation of concerns (API, Config, PDF Processing, RAG Engine).
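"Semantic similarity" in the vector-search step boils down to comparing the query vector against stored chunk vectors, typically by cosine similarity. Pinecone computes this server-side; the toy sketch below (with made-up 3-dimensional vectors, not real embeddings) just shows the underlying idea:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 = same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 3-dim "embeddings"; real all-MiniLM-L6-v2 vectors have 384 dimensions.
query = [1.0, 0.0, 1.0]
chunks = {"intro": [1.0, 0.1, 0.9], "appendix": [0.0, 1.0, 0.0]}

best = max(chunks, key=lambda name: cosine_similarity(query, chunks[name]))
print(best)  # → intro (its vector points in nearly the same direction)
```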
- Framework: FastAPI
- Vector Database: Pinecone
- Embeddings: sentence-transformers/all-MiniLM-L6-v2
- LLM: Groq API (supports Llama 3.3, Mixtral, and other models)
- PDF Processing: pypdf
- Clone the repository:

  ```bash
  git clone <repository-url>
  cd <repository-directory>
  ```

- Install dependencies:

  ```bash
  pip install -r requirements.txt
  pip install python-dotenv
  ```

- Set up environment variables: Copy the `.env.example` file to `.env`:

  ```bash
  cp .env.example .env
  ```

  Then edit `.env` and add your API keys:

  ```
  PINECONE_API_KEY=your_pinecone_api_key_here
  GROQ_API_KEY=your_groq_api_key_here
  ```

  Get your Groq API key from: https://console.groq.com/keys

- Start the server:

  ```bash
  uvicorn main:app --reload
  ```
- API Endpoints:
  - `GET /`: Health check. Returns a welcome message.
  - `POST /upload_pdf`: Upload a PDF file to be indexed.
    - Body: `form-data` with key `pdf` and the file as its value.
  - `POST /ask`: Ask a question about the uploaded PDF.
    - Body: JSON object `{"query": "Your question here"}`
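With the server running locally, the endpoints can be exercised from the command line. The file name `document.pdf` is just a placeholder for your own PDF:

```shell
# Upload a PDF for indexing (the multipart form field must be named "pdf")
curl -X POST http://localhost:8000/upload_pdf \
  -F "pdf=@document.pdf"

# Ask a question about the indexed document
curl -X POST http://localhost:8000/ask \
  -H "Content-Type: application/json" \
  -d '{"query": "What is the main topic of the document?"}'
```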
You can also run this application using Docker.
- Build the Docker image:

  ```bash
  docker build -t pdf-rag-app .
  ```

- Run the Docker container: Make sure you have your `.env` file set up as described in the Installation section.

  ```bash
  docker run -p 8000:8000 --env-file .env pdf-rag-app
  ```

  The API will be available at `http://localhost:8000`.
- `main.py`: FastAPI application and route definitions.
- `config.py`: Configuration and initialization of services (Pinecone, models).
- `pdf_processor.py`: Logic for extracting and chunking text from PDFs.
- `rag_engine.py`: Core RAG logic (retrieval + generation).
- `requirements.txt`: Python dependencies.
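The chunking step in `pdf_processor.py` can be illustrated with a simple fixed-size splitter with overlap; the specific sizes here (500 characters, 50 overlap) are illustrative assumptions, not values taken from the code:

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into fixed-size chunks that overlap, so a sentence
    straddling a boundary still appears intact in at least one chunk."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap  # step back by `overlap` each time
    return chunks

parts = chunk_text("a" * 1200, chunk_size=500, overlap=50)
print(len(parts))  # → 3 chunks: [0:500], [450:950], [900:1200]
```

Overlapping chunks trade a little extra storage in Pinecone for better retrieval: context that would otherwise be cut at a boundary remains searchable as a whole.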