
Swapnil Surdi

I build production AI systems — RAG pipelines, agentic fleets, and the backend infrastructure that keeps them fast, cheap, and reliable.

email · github · linkedin

100K+

req/day @ 99.9%

30s → <1s

RAG retrieval

−30–50%

LLM cost (mcp-cache)

8min → 30s

MRI loads

02 — activity

Activity

github · swapnilsurdi

16 repos · 9 stars · 698 contributions/yr

[Daily contribution heatmap, May and June 2026. Busiest day: 19 May, with 40 contributions.]
claude code · this machine · peak 467m

5.3b tokens total · ~146m/day (30d avg)

[Daily token-usage heatmap, May and June 2026. Peak day: 28 May, with 466,697,725 tokens.]

as of 10 jun 2026

03 — selected work

Selected work

All projects →

live · 3 nodes · 22 containers

LaunchLab Fleet

Three recycled laptops, each operated by its own headless Claude Code agent: a private 22-container homelab that monitors, heals, and reports on itself.

  • 288 watchdog runs/day, zero tokens
  • 4.18s → 18ms status query
  • 22 containers

npm · @hapus/mcp-cache · ★9

MCP-Cache: a transparent cache for any MCP server

A transparent proxy that caches oversized MCP tool responses and gives the model tools to query the cached data, so any MCP server works past the 25K-token wall.

  • 25K → unlimited token wall
  • −30–50% LLM API cost
  • <200ms cached query
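
The cache-and-query idea can be sketched in a few lines. This is a minimal illustration, not the actual @hapus/mcp-cache implementation: the `ResponseCache` class, its `wrap` and `query` methods, and the 4-characters-per-token estimate are all assumptions made for the example.

```python
import json
import uuid

TOKEN_LIMIT = 25_000  # the 25K-token wall mentioned above


def estimate_tokens(text: str) -> int:
    """Rough heuristic (assumption): about 4 characters per token."""
    return len(text) // 4


class ResponseCache:
    """Hypothetical in-memory cache for oversized tool responses."""

    def __init__(self, limit: int = TOKEN_LIMIT) -> None:
        self.limit = limit
        self._store: dict[str, list[dict]] = {}

    def wrap(self, records: list[dict]) -> dict:
        """Pass small responses through; cache large ones and return a handle."""
        payload = json.dumps(records)
        if estimate_tokens(payload) <= self.limit:
            return {"cached": False, "data": records}
        cache_id = uuid.uuid4().hex
        self._store[cache_id] = records
        return {"cached": True, "cache_id": cache_id, "total_records": len(records)}

    def query(self, cache_id: str, field: str, value: object,
              max_results: int = 10) -> list[dict]:
        """Query tool: up to max_results cached records where field == value."""
        matches = [r for r in self._store[cache_id] if r.get(field) == value]
        return matches[:max_results]


# Usage: a tiny limit forces the caching path for this small demo payload.
cache = ResponseCache(limit=50)
big = [{"id": i, "status": "open" if i % 2 else "closed"} for i in range(100)]
handle = cache.wrap(big)
open_items = cache.query(handle["cache_id"], "status", "open", max_results=3)
```

The model only ever sees the small handle and the filtered slices it asks for, which is where a saving in tokens would come from under these assumptions.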

production · HIPAA · 4 yrs

Agentic RAG in regulated healthcare

Production agentic RAG over docs, code, Confluence, and Jira for a HIPAA/ISO 13485 platform — compliance retrieval 30s → sub-second, verification 60% faster.

  • 30s → <1s compliance retrieval
  • 60% faster verification

04 — writing

A FastAPI service on a fixed 1 vCPU went from 1.68 to 69.6 RPS by adding async, before any hardware, worker, or DB tuning. A staged k6 study of throughput.

  • #python
  • #fastapi
  • #async
  • #performance
  • #benchmarking
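
A standard-library sketch of why async can raise throughput on a single core, assuming the workload is I/O-bound. It does not reproduce the k6 study or its numbers; `IO_DELAY` and the handler names are hypothetical, and a sleep stands in for a database or network call.

```python
import asyncio
import time

IO_DELAY = 0.05  # hypothetical per-request I/O wait, in seconds


def handle_blocking() -> None:
    """Blocking handler: the thread is stuck for the whole I/O wait."""
    time.sleep(IO_DELAY)


async def handle_async() -> None:
    """Async handler: yields to the event loop while waiting on I/O."""
    await asyncio.sleep(IO_DELAY)


def run_blocking(n: int) -> float:
    """Serve n requests one after another; return elapsed seconds."""
    start = time.perf_counter()
    for _ in range(n):
        handle_blocking()
    return time.perf_counter() - start


async def run_async(n: int) -> float:
    """Serve n requests concurrently on one event loop; return elapsed seconds."""
    start = time.perf_counter()
    await asyncio.gather(*(handle_async() for _ in range(n)))
    return time.perf_counter() - start


blocking_elapsed = run_blocking(10)
async_elapsed = asyncio.run(run_async(10))
print(f"blocking: {blocking_elapsed:.2f}s, async: {async_elapsed:.2f}s")
```

With blocking handlers the waits add up; with async handlers they overlap, so the same single thread finishes the batch in roughly one wait.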

Choosing an LLM by feel ships regressions you can't see. Notes from production on picking models with an eval framework instead: latency, cost, accuracy, fit.

  • #llm-evals
  • #rag
  • #llmops
  • #production-ai

A fleet pattern for 24/7 AI agents: one agent per machine as gatekeeper, a star topology, a chat room as the bus, and a subscription instead of metered keys.

  • #ai-agents
  • #claude-code
  • #fleet
  • #self-hosted
  • #architecture

Looking for the full picture — roles, stack, and the numbers behind the work?

View resume →