kritibehl/README.md
╔══════════════════════════════════════════════════════════════════╗
║  kriti@github:~$ whoami                                          ║
║                                                                  ║
║  Kriti Behl — reliability / infrastructure engineer              ║
║  builds systems that reason about when other systems are wrong   ║
╚══════════════════════════════════════════════════════════════════╝

Temporal SDK · Azure SDK · UF · Open to work


$ ls -la projects/

| project | what it catches | stack | proof |
| --- | --- | --- | --- |
| Faultline | stale workers corrupting distributed job state | Go · PostgreSQL · Docker | 0.0% duplicate commits vs 1.0–2.5% naive · 1,500+ failure scenarios · 0 invariant violations |
| KubePulse | deployment regressions that health probes miss | Go · Terraform · K8s · Prometheus | p95 drifted 333% with probes still green · blocked AMD MI300X rollout at +608% latency regression |
| DetTrace++ | the exact event where a system became incorrect | Python · Go · C++17 · Swift | UART IRQ · timer missed-tick · GPIO race · retry storms · cascading failures |
| FairEval-Suite | AI models that regress under real serving load | Python · Gemini API | blocked candidate with 47.1% p95 regression despite score parity · live on HuggingFace |
| AutoOps-Insight | recurring CI failures before they hit production | Python · PostgreSQL · React | pattern-grouped failures · confidence-scored release decisions |
| AccelSim-Lite | throughput and latency bottleneck transitions | C++ · Python | 87% throughput gain · 39% latency reduction · bottleneck shift proven via what-if |
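
The KubePulse entry above describes p95 latency drifting while health probes stay green. A minimal sketch of that kind of gate follows. It is not KubePulse's actual code: `P95`, `RegressionPct`, `ShouldBlock`, the nearest-rank percentile method, and the 20% threshold are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"sort"
)

// P95 returns the 95th percentile of samples using the nearest-rank method.
// It returns 0 for an empty slice. (Hypothetical helper.)
func P95(samples []float64) float64 {
	n := len(samples)
	if n == 0 {
		return 0
	}
	sorted := append([]float64(nil), samples...) // copy so the caller's slice is untouched
	sort.Float64s(sorted)
	rank := (95*n + 99) / 100 // integer ceiling of 0.95*n
	return sorted[rank-1]
}

// RegressionPct returns the percentage change of the candidate p95
// relative to the baseline p95. (Hypothetical helper.)
func RegressionPct(baseline, candidate []float64) float64 {
	b := P95(baseline)
	if b == 0 {
		return 0
	}
	return (P95(candidate) - b) / b * 100
}

// ShouldBlock blocks a rollout when probes fail OR when latency regressed
// past maxPct, so a green probe alone is not enough to pass.
func ShouldBlock(probesHealthy bool, regressionPct, maxPct float64) bool {
	return !probesHealthy || regressionPct > maxPct
}

func main() {
	baseline := []float64{100, 100, 100, 100, 100, 100, 100, 100, 100, 100}
	candidate := []float64{433, 433, 433, 433, 433, 433, 433, 433, 433, 433}

	pct := RegressionPct(baseline, candidate)
	// Probes report healthy, yet the assumed 20% latency budget is exceeded.
	fmt.Printf("p95 regression: %.0f%%, block rollout: %v\n", pct, ShouldBlock(true, pct, 20))
}
```

The design point is that probe health and latency are checked independently, so a deployment that still answers its probes can be rejected on latency alone.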

$ cat oss.log

[2026-03-02]  PR #2200  MERGED  Temporal Go SDK  fix: goroutine leak in test runtime shutdown
[2026-03-12]  PR #2212  MERGED  Temporal Go SDK  fix: propagation headers missing in mock matcher
[2026-04-20]  PR #2298  MERGED  Temporal Go SDK  fix: async future reports ready while callers blocked
[in review ]  PR #26051 REVIEW  Azure Go SDK     fix: azcore retry policy, errors.Join
[in review ]  PR #26106 REVIEW  Azure Go SDK     feat: W3C Trace Context (traceparent/tracestate)

$ ./identity.sh

> I build systems that reason about when other systems are wrong.

  Faultline   ── proves correctness when distributed workers fail
  KubePulse   ── catches regressions that health probes never see
  DetTrace++  ── isolates the exact event where behavior became wrong
  FairEval    ── blocks AI releases that regress under real serving load
  AutoOps     ── detects CI failures before they become production outages

  M.S. Computer & Information Science · University of Florida · GPA 3.8 ▌

$ cat skills.conf

[languages]
primary  = Go
other    = Python, Java, C++, SQL, Bash, JavaScript

[backend]
apis     = REST, Node.js, Express, PostgreSQL
patterns = fault injection, fencing tokens, distributed job execution

[infrastructure]
cloud    = AWS (EKS, ECS, VPC)
tools    = Kubernetes, Terraform, Docker, Git, GitHub Actions

[observability]
stack    = Prometheus, Grafana, Datadog
practice = SLO enforcement, incident response, replay-based debugging

[focus]
core     = correctness under failure, resilience validation,
           deterministic replay, firmware-style trace analysis

$ ping kriti

email      kriti0608@gmail.com
linkedin   linkedin.com/in/kriti-behl
portfolio  kriti-portfolio-six.vercel.app
location   Gainesville FL → open to full relocation
status     actively searching · SRE / Backend / Infra · OPT/STEM OPT

Seeking SRE · Backend · Infrastructure · Platform roles · M.S. completed Dec 2025

Pinned

  1. faultline (Public)

    Crash-safe distributed job execution with fencing tokens, lease recovery and deterministic failure validation.

    Python 5

  2. AutoOps-Insight (Public)

    Reliability analytics for CI failures: recurring signature detection, release-risk reporting, Prometheus metrics, API/CLI, and dashboard.

    Python 3

  3. dettrace (Public)

    Deterministic replay and distributed incident forensics for first-failure and blast-radius analysis.

    C++ 3

  4. KubePulse (Public)

    Kubernetes resilience validation for real recovery behavior, probe integrity and rollout scorecards.

    Python 2
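
The faultline description above mentions fencing tokens. As a rough illustration of the idea, and not Faultline's actual implementation, the sketch below shows a store that rejects a commit carrying an older token than one it has already accepted for the same job. `Store`, `Commit`, `Get`, and `ErrStaleToken` are hypothetical names, and the in-memory map stands in for whatever durable storage a real system would use.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// ErrStaleToken is returned when a write carries a token older than
// one already accepted for the same job. (Hypothetical name.)
var ErrStaleToken = errors.New("stale fencing token")

// Store is a hypothetical in-memory job-state store that enforces fencing.
type Store struct {
	mu     sync.Mutex
	tokens map[string]uint64 // job ID -> highest token accepted so far
	state  map[string]string // job ID -> committed result
}

// NewStore returns an empty Store.
func NewStore() *Store {
	return &Store{
		tokens: make(map[string]uint64),
		state:  make(map[string]string),
	}
}

// Commit applies the write only if token is not older than the highest
// token already accepted for jobID; otherwise it returns ErrStaleToken.
func (s *Store) Commit(token uint64, jobID, result string) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	if token < s.tokens[jobID] {
		return ErrStaleToken
	}
	s.tokens[jobID] = token
	s.state[jobID] = result
	return nil
}

// Get returns the committed result for jobID, if any.
func (s *Store) Get(jobID string) (string, bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	v, ok := s.state[jobID]
	return v, ok
}

func main() {
	s := NewStore()

	// Worker A took the lease with token 1 and then stalled. The lease
	// expired and worker B took over with token 2.
	fmt.Println("worker B commit:", s.Commit(2, "job-42", "done-by-B"))

	// Worker A wakes up and tries to commit with its old token.
	fmt.Println("worker A commit:", s.Commit(1, "job-42", "done-by-A"))

	v, _ := s.Get("job-42")
	fmt.Println("final state:", v)
}
```

The check lives in the store and not in the worker because a stalled worker cannot reliably know that its lease has expired.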