Swapnil Surdi
I build production AI systems — RAG pipelines, agentic fleets, and the backend
infrastructure that keeps them fast, cheap, and reliable.
[email protected] · github · linkedin
[architecture diagram]
ingestion: sources (docs · wiki · code) · chunking · embedding model · vector db · sync (cron)
serving · rag: client · gateway (authn · rate limit) · api · semantic cache (hit · miss) · query embedding · app services · vector db + bm25 keyword · fusion · rerank (cross-encoder) · context assembly (prompt · citations) · llm (claude, gpt, gemini) · guardrails (pii · safety)
also shown: evals (ndcg · mrr) · watchdog alerts · traces · metrics · logs
−30–50% LLM cost (mcp-cache)
github · swapnilsurdi
16 repos · 9 stars · 698 contributions/yr
[contribution calendar · may–jun 2026 · scale: less to more]
claude code · this machine · peak 467m tokens/day
5.3b tokens total · ~146m/day (30d avg)
[token usage calendar · may–jun 2026 · scale: less to more]
as of 10 jun 2026
▣ live · 3 nodes · 22 containers
Three recycled laptops, each operated by its own headless Claude Code agent: a private 22-container homelab that monitors, heals, and reports on itself.
288 watchdog runs/day, zero tokens · 4.18s → 18ms status query · 22 containers
▣ npm · @hapus/mcp-cache · ★9
A transparent proxy that caches oversized MCP tool responses and hands the model query tools — so any MCP server works past the 25K-token wall.
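The idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the @hapus/mcp-cache implementation: the class name `ResponseCache`, the `wrap` and `query` methods, the 4-characters-per-token estimate, and the substring filter are all hypothetical. It shows the general shape: small responses pass through untouched, oversized ones are stored and replaced by a short handle the model can query.

```python
import hashlib
import json

TOKEN_LIMIT = 25_000     # the 25K-token wall named in the text
CHARS_PER_TOKEN = 4      # rough estimate; an assumption, not a real tokenizer


class ResponseCache:
    """Hypothetical sketch: cache oversized tool responses behind a handle."""

    def __init__(self, token_limit=TOKEN_LIMIT):
        self.token_limit = token_limit
        self._store = {}

    def estimate_tokens(self, text):
        # Cheap size estimate based on character count.
        return len(text) // CHARS_PER_TOKEN

    def wrap(self, payload):
        """Pass small payloads through; cache large ones and return a handle."""
        text = json.dumps(payload)
        tokens = self.estimate_tokens(text)
        if tokens <= self.token_limit:
            return {"cached": False, "result": payload}
        cache_id = hashlib.sha256(text.encode("utf-8")).hexdigest()[:12]
        self._store[cache_id] = payload
        return {"cached": True, "cache_id": cache_id, "estimated_tokens": tokens}

    def query(self, cache_id, contains, limit=5):
        """Hypothetical query tool: substring filter over a cached list payload."""
        items = self._store[cache_id]
        hits = [item for item in items if contains in json.dumps(item)]
        return hits[:limit]


if __name__ == "__main__":
    cache = ResponseCache()
    big = [{"id": i, "msg": f"row {i}"} for i in range(20_000)]
    handle = cache.wrap(big)
    print(handle["cached"], handle["estimated_tokens"] > TOKEN_LIMIT)
    print(cache.query(handle["cache_id"], "row 1999", limit=3))
```

The model only ever sees the handle plus the slices it asks for, which is why the upstream response size stops mattering in this sketch.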
25K → unlimited token wall · −30–50% LLM API cost · <200ms cached query
▣ production · HIPAA · 4 yrs
Production agentic RAG over docs, code, Confluence, and Jira for a HIPAA/ISO 13485 platform — compliance retrieval 30s → sub-second, verification 60% faster.
30s → <1s compliance retrieval · 60% faster verification
A FastAPI service on a fixed 1 vCPU went from 1.68 to 69.6 RPS by adding async, before any hardware, workers, or DB tuning. A staged k6 study of throughput.
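Why async can lift throughput on a single core is easy to show with the standard library alone. The sketch below is not the benchmarked service and does not reproduce its numbers: it assumes the work is I/O-bound, simulates the downstream wait with a sleep, and uses hypothetical names (`blocking_handler`, `async_handler`, `IO_DELAY`). A blocking handler holds the only worker for the full wait, while an async handler yields to the event loop so other requests can proceed.

```python
import asyncio
import time

IO_DELAY = 0.05   # simulated downstream wait per request, in seconds (assumption)


def blocking_handler():
    # Holds the single worker for the whole wait; nothing else can run.
    time.sleep(IO_DELAY)
    return "ok"


async def async_handler():
    # Yields to the event loop while waiting, so other requests can run.
    await asyncio.sleep(IO_DELAY)
    return "ok"


def run_blocking(n):
    """Serve n requests one after another on a single worker."""
    start = time.perf_counter()
    results = [blocking_handler() for _ in range(n)]
    return results, time.perf_counter() - start


async def _serve_concurrently(n):
    return await asyncio.gather(*(async_handler() for _ in range(n)))


def run_async(n):
    """Serve n requests concurrently on a single event loop."""
    start = time.perf_counter()
    results = asyncio.run(_serve_concurrently(n))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    n = 20
    _, blocking_time = run_blocking(n)
    _, async_time = run_async(n)
    print(f"blocking: {n / blocking_time:.1f} req/s")
    print(f"async:    {n / async_time:.1f} req/s")
```

The gain only appears when handlers spend most of their time waiting; CPU-bound work would see no such benefit from this change.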
#python #fastapi #async #performance #benchmarking
Choosing an LLM by feel ships regressions you can't see. How to pick models with an eval framework instead, scoring latency, cost, accuracy, and fit, drawn from production use.
#llm-evals #rag #llmops #production-ai
A fleet pattern for 24/7 AI agents: one agent per machine as gatekeeper, a star topology, a chat room as the bus, and a subscription instead of metered keys.
#ai-agents #claude-code #fleet #self-hosted #architecture
Looking for the full picture — roles, stack, and the numbers behind the work?
View resume →