Payment-handle validation at 200 TPS

The problem

In 2026 I worked with a payments platform serving India (anonymized here). The hot path is real-time payment-handle validation: checking a VPA (Virtual Payment Address) before money moves. Normal load sits around 50 TPS, but the platform had to be ready for peaks of 200 TPS. The multiplier that makes that hard: each transaction fans out into 3–4 downstream API calls, so 200 TPS at the front door means 600 to 800 requests per second of downstream traffic, and the planning figure is the upper bound, roughly 800.

The stack: FastAPI services on AWS Fargate (2 vCPU tasks) with Aurora Serverless v2 in ap-south-1. At 50 TPS, per-request database reads for reference data and auth are invisible. At four times that, with the fan-out on top, they become the bottleneck — and every wasted connection and duplicate write gets multiplied too.

The architecture

The cheapest request is the one you never make: two in-process TTL caches (one for reference data, one for auth) absorb read-mostly lookups so a 200 TPS peak (≈800 req/s downstream) never stampedes Aurora.

Three workstreams, all on the hot path.

In-memory TTL caching for reference data and auth. The lookups that ran on every request — reference data that changes rarely, auth material consulted constantly — moved into per-task in-memory caches with TTLs. No Redis tier: this data is read-mostly, kilobytes in scale, and tolerates short staleness, so in-process is both the fastest and the operationally cheapest option.
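A minimal sketch of a per-task TTL cache, to make the idea concrete. This is not the platform's code: the `TTLCache` class, the `load_reference_data` loader, the 60-second TTL, and the sample payload are all hypothetical. The clock is injectable only so that expiry can be demonstrated without sleeping.

```python
import threading
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class TTLCache:
    """Tiny in-process cache: an entry expires ttl_seconds after it was loaded."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._lock = threading.Lock()
        self._entries: Dict[Hashable, Tuple[float, Any]] = {}

    def get_or_load(self, key: Hashable, loader: Callable[[], Any]) -> Any:
        now = self._clock()
        with self._lock:
            hit = self._entries.get(key)
            if hit is not None and hit[0] > now:
                return hit[1]  # fresh entry: no database read
        # Miss or expired: load outside the lock so a slow read does not block hits.
        # Two concurrent misses may both load, which is tolerable for read-mostly data.
        value = loader()
        with self._lock:
            self._entries[key] = (self._clock() + self._ttl, value)
        return value


# Demonstration with a fake clock and a counter standing in for database reads.
fake_now = [0.0]
db_reads = []


def load_reference_data():
    db_reads.append(1)  # pretend this is a query against Aurora
    return {"examplebank": "Example Bank"}  # hypothetical payload


cache = TTLCache(ttl_seconds=60, clock=lambda: fake_now[0])
cache.get_or_load("reference", load_reference_data)  # miss: reads the database
cache.get_or_load("reference", load_reference_data)  # hit: served from memory
fake_now[0] = 61.0                                   # move past the TTL
cache.get_or_load("reference", load_reference_data)  # expired: reads again
print(len(db_reads))  # 2
```

The trade-off is the one described above: each task holds its own copy, so staleness is bounded by the TTL rather than by explicit invalidation.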

This is where the ORM bit me. SQLAlchemy entities must be detached cleanly before they’re cached. Cache a live entity and it stays tied to a request-scoped session that’s about to close; once that session is gone, a later request that touches an unloaded or expired attribute on the cached object raises DetachedInstanceError. It happens intermittently, and only on cache hits, which makes it look haunted. The fix is to load every attribute the hot path needs and then expunge the object from the session (or map it to a plain DTO) before it enters the cache, so nothing cached depends on a dead session.
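A sketch of the DTO option, kept to the standard library so it runs on its own. Everything here is illustrative: `ReferenceRecordDTO`, its fields, and `to_dto` are hypothetical names, and `FakeEntity` is a stand-in that only mimics the failure mode (attribute reads fail once its "session" is closed). With real SQLAlchemy the conversion would run while the session is still open, and the alternative is `session.expunge(entity)` after the needed attributes are loaded.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReferenceRecordDTO:
    """Plain immutable value: safe to keep after the session has closed."""
    code: str
    display_name: str
    is_active: bool


def to_dto(entity) -> ReferenceRecordDTO:
    # Read every needed attribute now, while the session is open, so any
    # lazy load happens here and not on a later cache hit.
    return ReferenceRecordDTO(
        code=str(entity.code),
        display_name=str(entity.display_name),
        is_active=bool(entity.is_active),
    )


class FakeEntity:
    """Stand-in for an ORM row: column reads fail once session_open is False."""

    def __init__(self, code, display_name, is_active):
        self._data = {"code": code, "display_name": display_name, "is_active": is_active}
        self.session_open = True

    def __getattr__(self, name):
        # Only reached for names not found normally, i.e. the mapped columns.
        data = self.__dict__.get("_data", {})
        if name not in data:
            raise AttributeError(name)
        if not self.__dict__.get("session_open"):
            raise RuntimeError("detached: session is closed")
        return data[name]


entity = FakeEntity("REF-001", "Example record", True)
dto = to_dto(entity)          # convert inside the request, session still open
entity.session_open = False   # request ends, session closes
print(dto.display_name)       # the cached value no longer depends on the session
```

Caching `dto` instead of `entity` means a cache hit can never reach back into request-scoped machinery.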

Connection-pool tuning. Aurora has a finite connection budget, and every Fargate task brings its own pool, so the worst-case connection count is tasks × (pool size + overflow). I sized pools against the 2 vCPU task profile and the scale-out ceiling, so a traffic spike adds tasks without stampeding the database, then validated the sizing under load instead of trusting the spreadsheet.
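The sizing arithmetic can be sketched as below. The numbers are made up for illustration (a 500-connection budget, 50 connections reserved for admin and migrations, a ceiling of 20 tasks, overflow of 5); they are not the platform's real figures. The two helper functions are hypothetical, and the `pool_size` and `max_overflow` names are chosen to mirror the SQLAlchemy `create_engine` parameters of the same names, on the assumption that the pools are SQLAlchemy's.

```python
def worst_case_connections(tasks: int, pool_size: int, max_overflow: int) -> int:
    """Connections the database sees if every task fills its pool and its overflow."""
    return tasks * (pool_size + max_overflow)


def pool_size_per_task(db_connection_budget: int, max_tasks: int,
                       max_overflow: int, reserved: int = 0) -> int:
    """Largest pool_size such that max_tasks * (pool_size + max_overflow) fits the budget."""
    usable = db_connection_budget - reserved
    per_task = usable // max_tasks          # total connections each task may hold
    pool_size = per_task - max_overflow     # steady-state share, overflow on top
    if pool_size < 1:
        raise ValueError("connection budget too small for this task ceiling")
    return pool_size


# Illustrative numbers only.
size = pool_size_per_task(db_connection_budget=500, max_tasks=20,
                          max_overflow=5, reserved=50)
print(size)                                 # 17
print(worst_case_connections(20, size, 5))  # 440, within the 450 usable connections
```

Sizing against the scale-out ceiling rather than the current task count is the point: the budget has to hold when autoscaling has already added every task it is allowed to add.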

Idempotency on writes. Payments retry — client timeouts, gateway retries, users double-tapping. Every write path got idempotency keys, so a retried request returns the original result instead of creating a second one. At peak, retries aren’t an edge case; they’re a standing fraction of traffic, and “mostly doesn’t double-write” is not a property a payments system gets to have.
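A minimal sketch of the idempotency-key behaviour, under a loud assumption: this version keeps results in process memory, which only illustrates the contract. With several Fargate tasks the key would have to live in shared storage, for example a table with a unique constraint on the key written in the same transaction as the payment. The `IdempotentWrites` class, the `create_payment` function, and the key format are hypothetical.

```python
import threading
from typing import Any, Callable, Dict


class IdempotentWrites:
    """Run a write at most once per key; retries get the original result back."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._results: Dict[str, Any] = {}

    def execute(self, key: str, write: Callable[[], Any]) -> Any:
        # Holding the lock across the write serializes all writes. That is
        # simple and correct for a sketch, but too coarse for a real hot path.
        with self._lock:
            if key in self._results:
                return self._results[key]  # retry: replay the stored result
            result = write()
            self._results[key] = result
            return result


ledger = []  # stands in for the payments table


def create_payment():
    ledger.append("payment")
    return {"payment_id": len(ledger), "status": "created"}


writes = IdempotentWrites()
first = writes.execute("client-key-123", create_payment)
retry = writes.execute("client-key-123", create_payment)  # timeout retry or double-tap
print(first == retry, len(ledger))  # True 1
```

A different key is treated as a different payment, so the client has to reuse the same key for every retry of the same intent.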

Decisions that mattered

In-process cache over a cache tier. Redis would have added a network hop and an operational dependency to serve data measured in kilobytes. TTL staleness is acceptable for reference data; anything correctness-critical stays on the database. Pick the boring tier that matches the data.

Detach before you cache. The DetachedInstanceError lesson generalizes: an ORM entity is not a value, it’s a live view over a session. A cache boundary must hold values. Expunge or convert to DTOs at that boundary, every time.

Size for the fan-out, not the front door. The headline number is 200 TPS, but the system that has to survive is the ~800 req/s behind it. Capacity planning is multiplication, and the multiplier hides downstream.
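The multiplication, written out with the figures from this write-up (50 and 200 TPS, 3–4 downstream calls per transaction). The function name is hypothetical.

```python
def downstream_rps(front_door_tps: int, min_calls: int, max_calls: int) -> tuple:
    """Range of downstream requests per second for a given front-door rate."""
    return front_door_tps * min_calls, front_door_tps * max_calls


peak_low, peak_high = downstream_rps(200, 3, 4)
normal_low, normal_high = downstream_rps(50, 3, 4)
print(peak_low, peak_high)      # 600 800
print(normal_low, normal_high)  # 150 200
```

Planning against the upper bound (800 req/s) is the conservative choice, since the mix of 3-call and 4-call transactions at peak is not something to bet capacity on.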

Validate the target before the peak. The load tests ran at the 200 TPS profile with the full 3–4× downstream fan-out, ahead of peak events — finding the pool exhaustion and cache misses in a test harness instead of during a payment spike.

Numbers

200 TPS peak target — designed and load-validated
~800 req/s of downstream calls at peak (3–4 API calls per transaction)
~50 TPS normal load, i.e. 4× headroom engineered into the hot path
AWS Fargate 2 vCPU tasks + Aurora Serverless v2, ap-south-1

I’m deliberately not quoting measured production-peak latencies here. The honest claim — and the deliverable — is that the hot path was designed for and load-validated at the 200 TPS target before it was needed.

Lessons

The cheapest request is the one you never make. The biggest capacity win wasn’t exotic — it was refusing to re-read unchanging data hundreds of times a second.

ORM session lifecycles and cache lifetimes run on different clocks. Any object that outlives a request must be detached from request-scoped machinery, or it will fail later, intermittently, in the way that’s hardest to debug.

And honest framing is an engineering deliverable. “Load-validated for 200 TPS” is a claim I can defend in a postmortem; an extrapolated production number is not. Payments work, more than most domains, rewards saying exactly what you know — in your systems and in your write-ups.