GraphRank is a high-performance distributed systems project designed to simulate the "social brain" of professional networking platforms. It processes millions of data points to identify influencers, detect communities, and rank content feeds in real-time.
In modern social networks, "who you know" and "what you see" are determined by massive graph computations. GraphRank tackles this challenge by combining big data engineering with graph theory.
The system processes:
- 100,000+ Synthetic Users
- 1,000,000+ Social Edges (Connections)
- 500,000+ Interaction Logs (Likes, Shares, Comments)
| Category | Technology |
|---|---|
| Data Processing | PySpark (Distributed Batch Processing) |
| Backend API | FastAPI (Asynchronous Python) |
| Graph Logic | NetworkX & Custom Adjacency Optimizations |
| Databases | PostgreSQL (Structured), Redis (Caching Layer) |
| DevOps | Docker, Docker Compose |
We calculate user "importance" with a multi-factor weighted formula that blends graph centrality (PageRank) with engagement signals.
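The formula itself is not reproduced here, so the sketch below is an assumption: hypothetical weights combining a PageRank score with log-damped engagement counts, just to show the shape of a multi-factor score.

```python
import math

# Hypothetical per-interaction weights; the real formula and its
# coefficients are project-specific and not stated in this README.
def influence_score(pagerank, likes, shares, comments):
    engagement = 1.0 * likes + 3.0 * shares + 2.0 * comments
    # log1p damps raw counts so whales don't dominate purely on volume
    return 0.7 * pagerank + 0.3 * math.log1p(engagement)

print(influence_score(pagerank=0.8, likes=120, shares=40, comments=25))
```

The log damping is one common choice; a linear or normalized blend works the same way structurally.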
Posts are ranked dynamically with a time-decay model, so recent, high-engagement content surfaces first.
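The decay constant is not specified in this README; a minimal sketch of exponential time decay, assuming a 24-hour half-life, looks like:

```python
def feed_score(engagement, age_hours, half_life_hours=24.0):
    """Exponential time decay: the score halves every `half_life_hours`.
    The actual half-life GraphRank uses is an assumption here."""
    return engagement * 0.5 ** (age_hours / half_life_hours)

# A day-old post keeps half of its engagement-based score.
print(feed_score(100, 24))  # 50.0
```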
"People You May Know" is driven by:
- Jaccard Similarity: To find overlap in mutual connections.
- Community Detection: Using the Louvain Algorithm to identify industry clusters.
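The Jaccard step above reduces to simple set arithmetic over each user's connection set; a sketch:

```python
def jaccard(friends_a: set, friends_b: set) -> float:
    """Overlap in mutual connections: |A ∩ B| / |A ∪ B|."""
    if not friends_a and not friends_b:
        return 0.0
    return len(friends_a & friends_b) / len(friends_a | friends_b)

alice = {"bob", "carol", "dave"}
eve = {"carol", "dave", "frank"}
print(jaccard(alice, eve))  # 2 shared / 4 total = 0.5
```

For the community side, recent versions of NetworkX ship a `louvain_communities` function that can drive the clustering step directly.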
- Synthetic Layer: Generates pseudo-realistic social data.
- Spark Layer: Aggregates raw logs and builds the weighted graph.
- Graph Engine: Calculates PageRank and clustering coefficients.
- API Layer: Serves ranked feeds and recommendations via REST endpoints.
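The Graph Engine's PageRank step can be illustrated with a plain power-iteration sketch; this is a simplified stand-in, not the project's NetworkX/custom-adjacency implementation:

```python
def pagerank(adj, damping=0.85, iters=50):
    """Power iteration over an adjacency dict {node: [out-neighbors]}."""
    nodes = list(adj)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        new = {v: (1 - damping) / n for v in nodes}
        for v, outs in adj.items():
            if not outs:  # dangling node: spread its rank uniformly
                for u in nodes:
                    new[u] += damping * rank[v] / n
            else:
                for u in outs:
                    new[u] += damping * rank[v] / len(outs)
        rank = new
    return rank

graph = {"a": ["b"], "b": ["c"], "c": ["a", "b"]}
scores = pagerank(graph)
print(max(scores, key=scores.get))  # "b" collects the most link mass here
```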
- Algorithm Design: Lead on PageRank implementation and similarity metrics.
- Feature Engineering: Defining interaction weights and engagement scoring.
- Validation: Evaluating model performance using Precision@K and Recall.
- System Infrastructure: Dockerization, Redis integration, and API architecture.
- Data Pipelines: Optimizing PySpark jobs for large-scale joins and broadcast variables.
- Latency Optimization: Ensuring sub-150ms response times for feed generation.
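The interaction-weighting step under Feature Engineering can be sketched as a simple aggregation from raw logs to weighted edges; the weights below are illustrative assumptions, not the tuned values:

```python
from collections import defaultdict

# Hypothetical weights per interaction type (tuned values not given here).
WEIGHTS = {"like": 1.0, "comment": 2.0, "share": 3.0}

def edge_weights(interactions):
    """Aggregate raw interaction logs into weighted graph edges.
    `interactions` is an iterable of (src_user, dst_user, kind) tuples."""
    weights = defaultdict(float)
    for src, dst, kind in interactions:
        weights[(src, dst)] += WEIGHTS.get(kind, 0.0)
    return dict(weights)

logs = [("u1", "u2", "like"), ("u1", "u2", "share"), ("u2", "u3", "comment")]
print(edge_weights(logs))  # {("u1", "u2"): 4.0, ("u2", "u3"): 2.0}
```

In the actual pipeline this aggregation runs as a PySpark job; the logic per edge is the same.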
1. **Clone the repo:**

```bash
git clone <repo-url>   # repository URL omitted in this README
cd graphrank
```

2. **Spin up the environment:**

```bash
docker-compose up --build
```

3. **Run the Spark pipeline:**

```bash
docker exec -it graphrank_spark spark-submit /jobs/process_graph.py
```
- Scale: Support up to 1.5M interaction records.
- Latency: < 150ms for recommendation API calls.
- Efficiency: 40% reduction in processing time through Spark parallelization.
This project is licensed under the MIT License - see the LICENSE file for details.
Base URL (Local Sandbox): http://localhost:8000
- `GET /api/health`: returns pipeline and API status.
- `GET /api/top-influencers?limit=10`: queries PostgreSQL for users sorted by `pagerank_score`.
- `GET /api/recommendations/{user_id}`: runs the "People You May Know" logic, with the Redis cache layer serving predictions in under 10 ms.
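The sub-10 ms recommendation path is a cache-aside pattern. A minimal sketch, using an in-memory stand-in with a Redis-like `get`/`setex` shape (the real service calls an actual Redis client):

```python
import json

class FakeRedis:
    """Dict-backed stand-in with a Redis-like get/setex interface."""
    def __init__(self):
        self.store = {}
    def get(self, key):
        return self.store.get(key)
    def setex(self, key, ttl, value):
        self.store[key] = value  # TTL ignored in this sketch

cache = FakeRedis()

def recommendations(user_id, compute):
    """Cache-aside: serve from cache if present, else compute and store."""
    key = f"rec:{user_id}"
    hit = cache.get(key)
    if hit is not None:
        return json.loads(hit)
    result = compute(user_id)
    cache.setex(key, 3600, json.dumps(result))  # 1-hour TTL, illustrative
    return result

print(recommendations("42", lambda uid: ["u7", "u13"]))
```

The first call pays the compute cost; repeats for the same user are a single cache lookup, which is what keeps the endpoint fast.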
Create a `.env` file in the project root with the following values, matching your local Docker mapping:

```
POSTGRES_USER=admin
POSTGRES_PASSWORD=admin
POSTGRES_DB=graphrank
DB_HOST=postgres
REDIS_HOST=redis
REDIS_PORT=6379
```

To verify that the Redis layer holds up under high concurrency, run the Locust load test:

```bash
locust -f locustfile.py --host=http://localhost:8000
```

Then open http://localhost:8089 to launch the tests.