GitSpeak is an AI-powered Code Quality Intelligence Agent that analyzes code repositories and generates actionable, developer-friendly reports. It goes beyond simple linting to understand code structure, detect real issues, and provide practical insights to help developers understand their codebases.
Live Demo: https://gitspeak-langgraph-mcp1.onrender.com
Video:
- Languages Supported: Python, JavaScript, TypeScript, Java, C/C++, Go, Ruby, PHP, Swift, Shell scripts, R, HTML/CSS, SQL, YAML/JSON, Markdown
- Enhanced Python Support: Advanced AST parsing for precise function and class extraction
- Jupyter Notebook Support: Cell-by-cell analysis of .ipynb files
- Configuration Files: Analysis of Dockerfiles, Makefiles, requirements.txt, package.json
- Repository URLs: Accepts GitHub and GitLab repository URLs
- Private Repository Support: GitHub token integration for private repo access and pull request analysis
- Public Repository Fallback: Git clone for public repos when no token available
- Caching System: Persistent storage of analyzed repositories, backed by SqliteSaver
- Project Structure Analysis: Automatic detection of project type, frameworks, and languages
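The token-or-clone decision described above can be sketched roughly as follows. This is a minimal illustration, not GitSpeak's actual code: `parse_repo_url` and `choose_clone_strategy` are hypothetical names, and the sketch assumes plain `owner/repo` paths (GitLab subgroups are not handled).

```python
import os
import re
from urllib.parse import urlparse

SUPPORTED_HOSTS = {"github.com", "gitlab.com"}

def parse_repo_url(url):
    """Return (host, owner, repo) for a GitHub/GitLab HTTPS URL, or None if invalid."""
    parsed = urlparse(url.strip())
    host = parsed.netloc.lower()
    if parsed.scheme != "https" or host not in SUPPORTED_HOSTS:
        return None
    parts = [p for p in parsed.path.split("/") if p]
    if len(parts) < 2:
        return None
    # Drop a trailing ".git" so both URL spellings map to the same repo name.
    return host, parts[0], re.sub(r"\.git$", "", parts[1])

def choose_clone_strategy(url, token=None):
    """Pick API download when a GitHub token is present, plain git clone otherwise."""
    parsed = parse_repo_url(url)
    if parsed is None:
        raise ValueError(f"Not a supported GitHub/GitLab URL: {url!r}")
    host, owner, repo = parsed
    if token and host == "github.com":
        # GitHub REST endpoint that returns the repository as a zip archive.
        return {"method": "api_zip",
                "url": f"https://api.github.com/repos/{owner}/{repo}/zipball"}
    return {"method": "git_clone", "url": f"https://{host}/{owner}/{repo}.git"}

if __name__ == "__main__":
    token = os.environ.get("GITHUB_TOKEN")  # optional, as in the setup section
    print(choose_clone_strategy("https://github.com/vinu0404/mini_mini", token))
```

Passing the token as an argument rather than reading it inside the function keeps the decision easy to test without touching the environment.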
Code files Analysis
- Ask about any code files
- Analyzes files and their relationships with each other
Security Vulnerabilities
- Authentication and authorization flaws
- Input validation issues
- Cryptographic weaknesses
- Dependency vulnerabilities
- Secret exposure detection
Performance Bottlenecks
- CPU and memory inefficiencies
- Database query optimization
- I/O blocking operations
- Algorithmic complexity issues
- Caching opportunities
- Report, chart, and Mermaid diagram generation
- Summary generation for the whole project
- Natural Language Processing: Conversational interface for code queries
- Context-Aware Responses: Maintains conversation history and context
- Follow-up Support: Handles clarifications and detailed explanations
- Multi-Domain Queries: Can answer questions spanning security, performance, and quality
- Live Web Application: Deployed on Render with Chainlit UI
- Interactive Interface: Modern web UI with real-time streaming responses
- Image Support: Inline display of charts and diagrams
- Session Management: Persistent conversations and repository caching
- Chart Generation: QuickChart.io integration for metrics visualization
- Bar charts for comparisons
- Line charts for trends
- Pie/Doughnut charts for distributions
- Radar charts for multi-dimensional scores
- Mermaid Diagrams: Advanced diagram generation
- Flowcharts for process visualization
- Class diagrams for OOP relationships
- Sequence diagrams for interaction flows
- Mindmaps for concept hierarchies
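Both services render an image from data encoded in the URL, so the two visualization paths can be sketched with the standard library alone. The helper names below are hypothetical, and the chart config is a bare-bones Chart.js object rather than the styled dashboards GitSpeak produces.

```python
import base64
import json
from urllib.parse import quote

def quickchart_url(chart_type, labels, values, title):
    """Build a QuickChart image URL from a minimal Chart.js configuration."""
    config = {
        "type": chart_type,  # e.g. "bar", "line", "pie", "doughnut", "radar"
        "data": {"labels": labels,
                 "datasets": [{"label": title, "data": values}]},
    }
    # The config travels URL-encoded in the "c" query parameter.
    return "https://quickchart.io/chart?c=" + quote(json.dumps(config, separators=(",", ":")))

def mermaid_ink_url(diagram):
    """Build a mermaid.ink image URL from Mermaid source text."""
    encoded = base64.urlsafe_b64encode(diagram.encode("utf-8")).decode("ascii")
    return "https://mermaid.ink/img/" + encoded

if __name__ == "__main__":
    print(quickchart_url("radar",
                         ["Security", "Performance", "Maintainability", "Complexity"],
                         [80, 60, 70, 50], "Quality scores"))
    print(mermaid_ink_url("flowchart TD\n    A[Query] --> B[Agents]"))
```

Fetching either URL returns image bytes that a UI can display inline.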
- Pull Request Analysis: Comprehensive PR review and impact assessment
- CI/CD Integration: Analysis of workflow runs and check results
- Commit History: Examination of code changes and their implications
- API Access: Full GitHub API integration with rate limiting and error handling
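As a rough sketch of the rate-limit handling mentioned above, the helper below reads the `X-RateLimit-Remaining` and `X-RateLimit-Reset` headers that the GitHub REST API returns and works out how long to wait. The function names are hypothetical, and a plain `dict` stands in for the response headers (a real HTTP client usually exposes a case-insensitive mapping).

```python
import time

def github_headers(token=None):
    """Request headers for the GitHub REST API; the token is optional."""
    headers = {"Accept": "application/vnd.github+json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    return headers

def seconds_until_reset(response_headers, now=None):
    """Return seconds to wait before retrying, or 0.0 if requests remain."""
    now = time.time() if now is None else now
    remaining = int(response_headers.get("X-RateLimit-Remaining", "1"))
    if remaining > 0:
        return 0.0
    # X-RateLimit-Reset is a UTC epoch timestamp in seconds.
    reset_at = float(response_headers.get("X-RateLimit-Reset", now))
    return max(0.0, reset_at - now)

if __name__ == "__main__":
    exhausted = {"X-RateLimit-Remaining": "0", "X-RateLimit-Reset": "1060"}
    print(seconds_until_reset(exhausted, now=1000))
```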
- Vector Store: FAISS-based similarity search for large codebases
- Embeddings: OpenAI text-embedding-3-large for semantic code understanding
- Chunk Management: Relevance scoring of retrieved code units, with a retry mechanism that uses the refine_question node to fetch better code chunks
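To illustrate what similarity search over code units does, here is a pure-Python stand-in that ranks units by cosine similarity. It is not FAISS and does not call the OpenAI embeddings API: the three-dimensional vectors and unit names are made up for the example.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query_vec, indexed, k=2):
    """Return the names of the k code units most similar to the query vector."""
    scored = [(cosine(query_vec, vec), name) for name, vec in indexed.items()]
    return [name for _, name in sorted(scored, reverse=True)[:k]]

if __name__ == "__main__":
    # Hypothetical embeddings; real ones have thousands of dimensions.
    index = {
        "auth.login": [1.0, 0.0, 0.0],
        "db.query": [0.0, 1.0, 0.0],
        "auth.hash_password": [0.9, 0.1, 0.0],
    }
    print(top_k([1.0, 0.0, 0.0], index))
```

A vector index such as FAISS performs the same ranking, but with data structures that stay fast on large codebases.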
- Python AST Analysis: Deep parsing of Python code structures
- Function/Class Extraction: Precise identification of code units with metadata
- Structural Insights: Understanding of code relationships and dependencies
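A minimal sketch of AST-based extraction using Python's built-in `ast` module is shown below. The function reuses the `extract_code_units` name from the workflow diagram for readability, but the body and the metadata fields are assumptions, not the project's implementation.

```python
import ast

def extract_code_units(source, filename="<unknown>"):
    """Return metadata for every function and class found in Python source."""
    try:
        tree = ast.parse(source, filename=filename)
    except SyntaxError as exc:
        # Report the failure instead of crashing on unparsable files.
        return [{"type": "error", "name": filename,
                 "line": exc.lineno, "message": exc.msg}]
    units = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            units.append({
                "type": "class" if isinstance(node, ast.ClassDef) else "function",
                "name": node.name,
                "start_line": node.lineno,
                "end_line": node.end_lineno,  # available on Python 3.8+
                "docstring": ast.get_docstring(node),
            })
    return sorted(units, key=lambda unit: unit["start_line"])

SAMPLE = '''class Greeter:
    """Says hello."""
    def greet(self, name):
        return f"Hello, {name}"

async def main():
    pass
'''

if __name__ == "__main__":
    for unit in extract_code_units(SAMPLE, "sample.py"):
        print(unit)
```

Each unit carries its line range, so it can be embedded and retrieved as an independent chunk.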
- Multi-Agent Architecture: Four specialized agents working in coordination
- Code Analyst Agent: Architecture and design pattern analysis
- Security Agent: Vulnerability detection and threat modeling
- Performance Agent: Bottleneck identification and optimization
- Quality Agent: Maintainability assessment and visualization
- LangGraph Workflow: State-based agent orchestration
- Dynamic Routing: Intelligent agent selection based on query intent
- Structured Outputs: Pydantic models for consistent data handling
- Multi-Dimensional Scoring: Security, Performance, Maintainability, Complexity scores (0-100)
- Issue Categorization: Critical, High, Medium, Low priority classification
- Overall Quality Score: Composite metric for codebase health
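One way to picture the scoring model is shown below. This is a sketch under stated assumptions: the weights are invented for illustration, all four dimensions are treated as "higher is better", and a standard-library dataclass stands in for the project's Pydantic models.

```python
from dataclasses import dataclass

# Assumed weights; the README does not state how the composite is computed.
WEIGHTS = {"security": 0.35, "performance": 0.25,
           "maintainability": 0.25, "complexity": 0.15}
PRIORITY_ORDER = ["Critical", "High", "Medium", "Low"]

@dataclass
class QualityScores:
    security: int
    performance: int
    maintainability: int
    complexity: int

    def __post_init__(self):
        # Enforce the documented 0-100 range for every dimension.
        for name in WEIGHTS:
            value = getattr(self, name)
            if not 0 <= value <= 100:
                raise ValueError(f"{name} must be between 0 and 100, got {value}")

    def overall(self):
        """Weighted average of the four dimension scores."""
        return round(sum(getattr(self, n) * w for n, w in WEIGHTS.items()), 1)

def sort_by_priority(issues):
    """Order issues from Critical to Low, keeping input order within a level."""
    return sorted(issues, key=lambda issue: PRIORITY_ORDER.index(issue["priority"]))

if __name__ == "__main__":
    print(QualityScores(80, 60, 70, 50).overall())
    print(sort_by_priority([{"title": "Unused import", "priority": "Low"},
                            {"title": "Hardcoded secret", "priority": "Critical"}]))
```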
- Executive Dashboards: High-level metrics with professional styling
- Interactive Charts: Responsive visualizations with tooltips and legends in QuickChart
- Architecture Diagrams: Visual representation of code structure using Mermaid diagrams
flowchart TD
START([User Starts Session]) --> SETUP_CHECK{Existing Repos?}
SETUP_CHECK -->|Yes| FIND_REPOS[find_existing_repositories]
SETUP_CHECK -->|No| ASK_URL[Ask for GitHub URL]
FIND_REPOS --> LOAD_CACHED[load_retriever_data and load_session_data]
LOAD_CACHED --> SETUP_SESSION[setup_repository_session]
ASK_URL --> VALIDATE_URL{Valid GitHub/GitLab URL?}
VALIDATE_URL -->|No| ASK_URL
VALIDATE_URL -->|Yes| BUILD_NEW[setup_new_repository]
BUILD_NEW --> REPO_HASH[get_repo_hash]
REPO_HASH --> CHECK_CACHE{Cache Exists?}
CHECK_CACHE -->|Yes| LOAD_CACHED
CHECK_CACHE -->|No| TOKEN_DECISION{GitHub Token Available?}
TOKEN_DECISION -->|GITHUB_TOKEN exists| API_CLONE[clone_repo_via_api with aiohttp + zipfile]
TOKEN_DECISION -->|No token| GIT_CLONE[git.Repo.clone_from for public repos]
API_CLONE --> EXTRACT_ZIP[zipfile.ZipFile.extractall]
GIT_CLONE --> LOCAL_PATH[Create local_path]
EXTRACT_ZIP --> LOCAL_PATH
LOCAL_PATH --> ANALYZE_STRUCTURE[generate_folder_structure and analyze_project_structure]
ANALYZE_STRUCTURE --> EXTRACT_CODE[extract_code_units with AST parsing]
EXTRACT_CODE --> BUILD_EMBEDDINGS[OpenAI Embeddings text-embedding-3-large]
BUILD_EMBEDDINGS --> CREATE_FAISS[FAISS.from_texts Vector store creation]
CREATE_FAISS --> SAVE_DATA[save_retriever_data and save_session_data]
SAVE_DATA --> SETUP_SESSION
SETUP_SESSION --> READY[Repository Ready]
READY --> USER_QUERY([User Sends Message])
USER_QUERY --> WORKFLOW_START[create_workflow StateGraph AgentState]
WORKFLOW_START --> NODE_REWRITER[question_rewriter with SystemMessage + HumanMessage]
NODE_REWRITER --> NODE_CLASSIFIER[question_classifier OffTopic model]
NODE_CLASSIFIER --> TOPIC_ROUTER{on_topic_router}
TOPIC_ROUTER -->|yes| NODE_RETRIEVER[retriever_node GLOBAL_RETRIEVERS]
TOPIC_ROUTER -->|no| NODE_OFFTOPIC[off_topic cl.Message.send]
NODE_RETRIEVER --> NODE_BATCH[batch_relevant_code_chunk BatchCodeGrading model]
NODE_BATCH --> BATCH_ROUTER{router}
BATCH_ROUTER -->|proceed_to_generate yes| NODE_SELECTION[agent_selection AgentSelection model]
BATCH_ROUTER -->|proceed_to_generate no| NODE_REFINE[refine_question Rephrase query]
BATCH_ROUTER -->|rephrase_count >= 2| NODE_CANNOT[cannot_answer cl.Message.send]
NODE_REFINE --> NODE_RETRIEVER
NODE_SELECTION --> NODE_PRESENT_CODE[present_teller_code_analyst PresentTellerOutput model]
NODE_PRESENT_CODE --> NODE_CODE_ANALYST[code_analyst_agent PR decision logic]
NODE_CODE_ANALYST --> PR_CHECK{PR Analysis Needed?}
PR_CHECK -->|Yes + Token| PR_FETCH[get_pr_tool aiohttp GitHub API]
PR_CHECK -->|No or No Token| PR_SKIP[Skip PR Analysis]
PR_FETCH --> CODE_STREAM[code_analyst_llm.astream CODE_ANALYST_PROMPT]
PR_SKIP --> CODE_STREAM
CODE_STREAM --> CODE_ROUTER{code_analyst_router}
CODE_ROUTER -->|goto_security_agent| NODE_PRESENT_SEC[present_teller_security cl.Message.send]
CODE_ROUTER -->|goto_performance_agent| NODE_PRESENT_PERF[present_teller_performance cl.Message.send]
CODE_ROUTER -->|goto_quality_agent| NODE_PRESENT_QUAL[present_teller_quality cl.Message.send]
CODE_ROUTER -->|No more agents| WORKFLOW_END
NODE_PRESENT_SEC --> NODE_SECURITY[security_agent security_llm.astream SECURITY_AGENT_PROMPT]
NODE_SECURITY --> SEC_ROUTER{security_router}
SEC_ROUTER -->|goto_performance_agent| NODE_PRESENT_PERF
SEC_ROUTER -->|goto_quality_agent| NODE_PRESENT_QUAL
SEC_ROUTER -->|No more agents| WORKFLOW_END
NODE_PRESENT_PERF --> NODE_PERFORMANCE[performance_agent performance_llm.astream PERFORMANCE_AGENT_PROMPT]
NODE_PERFORMANCE --> PERF_ROUTER{performance_router}
PERF_ROUTER -->|goto_quality_agent| NODE_PRESENT_QUAL
PERF_ROUTER -->|No more agents| WORKFLOW_END
NODE_PRESENT_QUAL --> NODE_QUALITY[quality_agent quality_llm structured output QualityAnalysisOutput model]
NODE_QUALITY --> VIZ_DECISION{Visualization Needed?}
VIZ_DECISION -->|should_create_chart| CHART_TOOL[QuickChartTool.create_chart requests.get quickchart.io]
VIZ_DECISION -->|should_create_diagram| MERMAID_TOOL[MermaidTool.create_diagram generate_mermaid_syntax mermaid.ink]
VIZ_DECISION -->|Both needed| BOTH_TOOLS[Create Chart + Diagram]
VIZ_DECISION -->|Neither needed| TEXT_RESPONSE[Text Response Only]
CHART_TOOL --> DISPLAY_CHART[cl.Image Base64 decode Display inline]
MERMAID_TOOL --> DISPLAY_DIAGRAM[cl.Image Base64 decode Display inline]
BOTH_TOOLS --> DISPLAY_BOTH[Display Chart + Diagram]
TEXT_RESPONSE --> DISPLAY_TEXT[cl.Message.send Formatted text]
DISPLAY_CHART --> SAVE_SESSION[save_session_data Update conversation_history Increment analysis_count]
DISPLAY_DIAGRAM --> SAVE_SESSION
DISPLAY_BOTH --> SAVE_SESSION
DISPLAY_TEXT --> SAVE_SESSION
SAVE_SESSION --> WORKFLOW_END[Workflow Complete]
NODE_OFFTOPIC --> WORKFLOW_END
NODE_CANNOT --> WORKFLOW_END
WORKFLOW_END --> USER_QUERY
%% Styling — explicit color ensures visibility in both light and dark mode
classDef setup fill:#2e7d32,stroke:#a5d6a7,stroke-width:2px,color:#ffffff
classDef clone fill:#e65100,stroke:#ffcc80,stroke-width:2px,color:#ffffff
classDef langgraph fill:#1565c0,stroke:#90caf9,stroke-width:3px,color:#ffffff
classDef agents fill:#6a1b9a,stroke:#ce93d8,stroke-width:2px,color:#ffffff
classDef tools fill:#f57f17,stroke:#ffe082,stroke-width:2px,color:#000000
classDef decision fill:#33691e,stroke:#aed581,stroke-width:2px,color:#ffffff
classDef storage fill:#ad1457,stroke:#f48fb1,stroke-width:2px,color:#ffffff
classDef terminal fill:#4e342e,stroke:#bcaaa4,stroke-width:2px,color:#ffffff
class START,READY,USER_QUERY,WORKFLOW_END terminal
class FIND_REPOS,LOAD_CACHED,BUILD_NEW,REPO_HASH,SAVE_SESSION,SAVE_DATA setup
class API_CLONE,GIT_CLONE,EXTRACT_ZIP,LOCAL_PATH,ANALYZE_STRUCTURE,EXTRACT_CODE,BUILD_EMBEDDINGS,CREATE_FAISS clone
class WORKFLOW_START,NODE_REWRITER,NODE_CLASSIFIER,NODE_RETRIEVER,NODE_BATCH,NODE_SELECTION langgraph
class NODE_CODE_ANALYST,NODE_SECURITY,NODE_PERFORMANCE,NODE_QUALITY,NODE_PRESENT_CODE,NODE_PRESENT_SEC,NODE_PRESENT_PERF,NODE_PRESENT_QUAL agents
class CHART_TOOL,MERMAID_TOOL,BOTH_TOOLS,PR_FETCH tools
class SETUP_CHECK,VALIDATE_URL,CHECK_CACHE,TOKEN_DECISION,TOPIC_ROUTER,BATCH_ROUTER,CODE_ROUTER,SEC_ROUTER,PERF_ROUTER,VIZ_DECISION,PR_CHECK decision
class LOAD_CACHED storage
- Framework: LangGraph for agentic workflows
- LLMs: OpenAI GPT-4o models with specialized configurations
- Vector Database: FAISS with OpenAI embeddings (text-embedding-3-large)
- Web Framework: Chainlit for interactive UI
- Visualization: QuickChart.io for charts, Mermaid.ink for diagrams
- Deployment: Render cloud platform
- Monitoring: LangSmith for observability and tracing
- Storage: SQLite for checkpoints and session persistence
- Python 3.10
- OpenAI API key
- GitHub token (optional, for private repos)
OPENAI_API_KEY=openai_api_key
GITHUB_TOKEN=your_github_token # Optional
LANGCHAIN_TRACING_V2=true
LANGCHAIN_API_KEY=
LANGCHAIN_PROJECT=gitspeak # any name
LANGCHAIN_ENDPOINT=https://api.smith.langchain.com
git clone https://github.com/vinu0404/mini_mini.git
cd mini_mini
python -m venv venv
.\venv\Scripts\activate
pip install -r requirements.txt
chainlit run main.py
- langchain-openai: LLM integration
- langchain-community: Vector stores and tools
- langgraph: Agentic workflow orchestration
- chainlit: Web UI framework
- faiss-cpu: Vector similarity search
- pydantic: Data validation and structured outputs
- gitpython: Git repository handling
- aiohttp: Async HTTP client
- nbformat: Jupyter notebook parsing
- langsmith: Observability and monitoring
- Start the application:
chainlit run main.py
- Navigate to http://localhost:8000
- Paste a GitHub/GitLab repository URL
- Ask questions about the codebase
- "Show me a security analysis dashboard for this repository"
- "What are the performance bottlenecks in the authentication module?"
- "Create a flowchart showing the data flow in this application"
- "Generate a quality report with metrics visualization"
- "Analyze the latest pull request for security vulnerabilities"
GitSpeak generates rich visualizations (charts and diagrams) that cannot be displayed in terminal environments. The Chainlit web interface provides:
- Inline image display for charts and diagrams
- Real-time streaming responses
- Better user experience for complex interactions
- Session management and conversation history
The system employs four specialized agents for comprehensive analysis:
- Code Analyst: Always executed first for structural understanding
- Security Agent: Invoked for security-related queries
- Performance Agent: Activated for performance concerns
- Quality Agent: Generates final reports with visualizations
- Repository Level: Entire repositories cached after initial analysis
- Session Level: Conversation history and context maintained
- Vector Store: Persistent FAISS indices for fast retrieval
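Repository-level caching needs a stable key per repository. The sketch below derives one by hashing a normalized URL. `repo_cache_key` and `cache_dir_for` are hypothetical helpers, and the normalization rules (lowercasing, dropping `.git` and trailing slashes) are assumptions rather than the behavior of the project's `get_repo_hash`.

```python
import hashlib
from pathlib import Path

def repo_cache_key(repo_url):
    """Hash a normalized repository URL into a short, filesystem-safe key."""
    normalized = repo_url.strip().lower().rstrip("/")
    if normalized.endswith(".git"):
        normalized = normalized[:-4]
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()[:16]

def cache_dir_for(repo_url, root="cache"):
    """Directory where the FAISS index and session data for a repo could live."""
    return Path(root) / repo_cache_key(repo_url)

if __name__ == "__main__":
    url = "https://github.com/vinu0404/mini_mini.git"
    print(repo_cache_key(url))
    print(cache_dir_for(url))
```

Because the key depends only on the URL, different spellings of the same repository resolve to the same cache directory and skip re-analysis.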
- Graceful fallback for private repos without tokens
- Syntax error handling in AST parsing
- API rate limiting and timeout management
- Structured error reporting with actionable suggestions
GitSpeak integrates LangSmith for comprehensive monitoring:
- Trace Analysis: End-to-end request tracking
- Performance Metrics: Response times and token usage
- Error Monitoring: Exception tracking and debugging
- Usage Analytics: Query patterns and user behavior
- Context-Aware Chart Selection: Automatically chooses appropriate chart types based on query intent
- Professional Styling: Executive-level dashboards with proper color schemes and typography
- Mermaid Integration: Complex diagram generation with syntax validation and auto-correction
- Dynamic Selection: Routes to relevant agents based on query analysis
- Priority Ordering: Executes agents in logical sequence
- Context Sharing: Passes memory between agents for comprehensive analysis
- Pull Request Intelligence: Analyzes PR changes, CI/CD status, and review comments
- Diff Analysis: Understands code changes and their implications
- Token Management: Graceful degradation for public-only access
Live Application: https://gitspeak-langgraph-mcp1.onrender.com
Solution: Implemented an agent_selection node that sets boolean state variables (goto_security_agent, goto_performance_agent, goto_quality_agent). The node analyzes query intent and sets the appropriate routing flags, allowing the LangGraph workflow to route dynamically to the agents relevant to each user query.
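The flag-based routing can be approximated in plain Python as below. The flag names come from the text above; `next_agent` and `plan_route` are hypothetical helpers, and a plain `dict` stands in for the LangGraph state object.

```python
AGENT_ORDER = ["security_agent", "performance_agent", "quality_agent"]

def next_agent(state, current=None):
    """Return the next flagged agent after `current`, or "END" if none remain."""
    start = AGENT_ORDER.index(current) + 1 if current in AGENT_ORDER else 0
    for agent in AGENT_ORDER[start:]:
        if state.get(f"goto_{agent}"):
            return agent
    return "END"

def plan_route(state):
    """List the agents a query would visit; the code analyst always runs first."""
    route = ["code_analyst_agent"]
    current = next_agent(state)
    while current != "END":
        route.append(current)
        current = next_agent(state, current)
    return route

if __name__ == "__main__":
    flags = {"goto_security_agent": True,
             "goto_performance_agent": False,
             "goto_quality_agent": True}
    print(plan_route(flags))
```

In a graph framework each router would be a conditional edge. Here the same decision is a function call, which makes the ordering easy to unit test.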
Solution: Enforced strict Mermaid syntax compliance by constraining the LLM output through Pydantic models with detailed field validation. The MermaidDiagramData model ensures proper syntax structure before generating URLs for mermaid.ink API calls.
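The idea of validating diagram structure before rendering can be sketched as follows. `DiagramSpec` is a hypothetical stand-in built on a standard-library dataclass. The real MermaidDiagramData model is a Pydantic model whose fields are not shown in this document.

```python
from dataclasses import dataclass

# Diagram kinds listed in this README, mapped to their Mermaid header line.
HEADERS = {
    "flowchart": "flowchart TD",
    "classDiagram": "classDiagram",
    "sequenceDiagram": "sequenceDiagram",
    "mindmap": "mindmap",
}

@dataclass
class DiagramSpec:
    diagram_type: str
    body_lines: list

    def __post_init__(self):
        if self.diagram_type not in HEADERS:
            raise ValueError(f"Unsupported diagram type: {self.diagram_type!r}")
        # Drop blank lines but keep leading indentation (mindmaps rely on it).
        self.body_lines = [line.rstrip() for line in self.body_lines if line.strip()]
        if not self.body_lines:
            raise ValueError("A diagram needs at least one body line")

    def to_mermaid(self):
        """Assemble the header and indented body into Mermaid source text."""
        return "\n".join([HEADERS[self.diagram_type]]
                         + ["    " + line for line in self.body_lines])

if __name__ == "__main__":
    print(DiagramSpec("flowchart", ["A[Query] --> B[Agents]", ""]).to_mermaid())
```

Rejecting malformed input at construction time means only structurally valid text is ever sent to the rendering service.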
Solution: Created a dedicated batch_relevant_code_chunk node that employs a structured LLM evaluation to classify each retrieved code chunk as "relevant" or "not relevant" based on query intent. Only relevant chunks are passed to subsequent agents, significantly improving analysis accuracy and reducing noise in the multi-agent workflow.
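A simplified view of the grading and retry loop is sketched below. The grades would come from a structured LLM call in practice. Here they are passed in as a list, and the order of checks in `router` (proceed first, then the retry limit of 2 from the workflow diagram) is an assumption.

```python
def filter_relevant(chunks, grades):
    """Keep only the chunks whose grade is "relevant"."""
    return [chunk for chunk, grade in zip(chunks, grades) if grade == "relevant"]

def router(state, max_rephrases=2):
    """Decide the next node after grading the retrieved chunks."""
    if state["proceed_to_generate"]:
        return "agent_selection"
    if state["rephrase_count"] >= max_rephrases:
        return "cannot_answer"
    return "refine_question"

def grade_step(state, chunks, grades):
    """Update the state with the relevant chunks and the proceed flag."""
    relevant = filter_relevant(chunks, grades)
    return {**state, "relevant_chunks": relevant,
            "proceed_to_generate": bool(relevant)}

if __name__ == "__main__":
    state = {"rephrase_count": 0, "proceed_to_generate": False}
    state = grade_step(state, ["def login(): ...", "README badge"],
                       ["relevant", "not relevant"])
    print(state["relevant_chunks"], router(state))
```

Capping the number of rephrases keeps the retrieve-grade-refine loop from cycling indefinitely when nothing relevant exists in the repository.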
- Multi-language support (6+ languages)
- Quality issue detection (Security, Performance, Code Quality)
- Interactive Q&A system
- Comprehensive reporting
- Web deployment with modern UI
- Rich visualizations (charts + diagrams)
- GitHub/GitLab integration
- RAG implementation with FAISS
- AST parsing for structural analysis
- Advanced agentic patterns with LangGraph
- Automated severity scoring
- Developer-friendly visualizations
- Dual visualization system (charts + diagrams)
- Intelligent agent routing
- Pull request analysis capabilities
- Advanced error handling
- Comprehensive monitoring with LangSmith