Swapnil Surdi

I build production AI systems — RAG pipelines, agentic fleets, and the backend infrastructure that keeps them fast, cheap, and reliable.

[email protected] github linkedin

100K+

req/day @ 99.9%

30s → <1s

RAG retrieval

−30–50%

LLM cost (mcp-cache)

8min → 30s

MRI loads

02 — activity

Activity

github · swapnilsurdi

16 repos · 9 stars · 698 contributions/yr


claude code · this machine · peak 467m

5.3b tokens total · ~146m/day (30d avg)


as of 10 jun 2026

03 — selected work

Selected work

All projects →

▣ live · 3 nodes · 22 containers

LaunchLab Fleet

Three recycled laptops, each operated by its own headless Claude Code agent: a private 22-container homelab that monitors, heals, and reports on itself.

288 watchdog runs/day, zero tokens
4.18s → 18ms status query
22 containers

▣ npm · @hapus/mcp-cache · ★9

MCP-Cache: a transparent cache for any MCP server

A transparent proxy that caches oversized MCP tool responses and gives the model tools to query them, so any MCP server works past the 25K-token wall.

25K → unlimited token wall
−30–50% LLM API cost
<200ms cached query
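
A minimal sketch of the caching idea described above, not the @hapus/mcp-cache implementation. The `ResponseCache` class, its method names, and the 4-characters-per-token estimate are all assumptions made for illustration. Responses under the limit pass through untouched; oversized ones are stored, and the caller gets back a short handle it can query.

```python
import hashlib
import json

TOKEN_LIMIT = 25_000   # the token wall mentioned above
CHARS_PER_TOKEN = 4    # assumption: crude size estimate, not a real tokenizer


class ResponseCache:
    """Hypothetical in-memory cache for oversized tool responses."""

    def __init__(self, token_limit=TOKEN_LIMIT):
        self.token_limit = token_limit
        self._store = {}

    def wrap(self, tool_name, response):
        """Pass small responses through; replace large ones with a handle."""
        text = json.dumps(response)
        est_tokens = len(text) // CHARS_PER_TOKEN
        if est_tokens <= self.token_limit:
            return {"cached": False, "data": response}
        key = hashlib.sha256((tool_name + text).encode()).hexdigest()[:12]
        self._store[key] = response
        return {"cached": True, "key": key, "estimated_tokens": est_tokens}

    def query(self, key, field, value):
        """Return only the cached records whose `field` equals `value`."""
        return [r for r in self._store[key] if r.get(field) == value]


# Tiny limit so the demo triggers caching without a huge payload.
cache = ResponseCache(token_limit=50)
rows = [{"id": i, "status": "open" if i % 2 else "closed"} for i in range(100)]
handle = cache.wrap("list_issues", rows)
open_rows = cache.query(handle["key"], "status", "open")
small = cache.wrap("ping", {"ok": True})
```

Here the 100-row response is swapped for a handle, and a follow-up query returns just the matching rows instead of the full payload.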

▣ production · HIPAA · 4 yrs

Agentic RAG in regulated healthcare

Production agentic RAG over docs, code, Confluence, and Jira for a HIPAA/ISO 13485 platform — compliance retrieval 30s → sub-second, verification 60% faster.

30s → <1s compliance retrieval
60% faster verification

04 — writing

Writing

All posts →

41× from one keyword

Jun 10, 2026

A FastAPI service on a fixed 1 vCPU went from 1.68 to 69.6 RPS by adding async, before any hardware, worker, or DB tuning. A staged k6 study of throughput.

#python
#fastapi
#async
#performance
#benchmarking
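
A rough illustration of why async can raise throughput on a single core. This is not the benchmark from the post: it uses only `asyncio` from the standard library, and the two handler functions and the 50 ms wait are stand-ins chosen for the example. A blocking wait holds the event loop, so requests are served one at a time; an awaited wait lets the waits overlap.

```python
import asyncio
import time

WAIT_S = 0.05  # assumption: stand-in for an I/O wait such as a DB call


async def blocking_handler():
    # Blocks the event loop, so concurrent requests are served one by one.
    time.sleep(WAIT_S)


async def async_handler():
    # Yields to the event loop, so the waits of concurrent requests overlap.
    await asyncio.sleep(WAIT_S)


async def run(handler, n=10):
    """Fire n concurrent 'requests' and return the elapsed wall time."""
    start = time.perf_counter()
    await asyncio.gather(*(handler() for _ in range(n)))
    return time.perf_counter() - start


blocking_s = asyncio.run(run(blocking_handler))
async_s = asyncio.run(run(async_handler))
print(f"blocking: {blocking_s:.2f}s  async: {async_s:.2f}s")
```

With ten requests, the blocking version takes roughly ten times the single wait, while the async version takes about one wait in total.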

Evals before vibes

Jun 10, 2026

Choosing an LLM by feel ships regressions you can't see. How to pick models with an eval framework instead (latency, cost, accuracy, fit), based on production experience.

#llm-evals
#rag
#llmops
#production-ai

Three laptops, one subscription

Jun 10, 2026

A fleet pattern for 24/7 AI agents: one agent per machine as gatekeeper, a star topology, a chat room as the bus, and a subscription instead of metered keys.

#ai-agents
#claude-code
#fleet
#self-hosted
#architecture

Looking for the full picture — roles, stack, and the numbers behind the work?

View resume →