Skip to content

▣ production · gcp · agentic

AI platform for a consumer health startup

Backend, infra, and AI for a consumer health startup: four FastAPI services on Cloud Run, a streaming Claude agent, and a nightly autonomous bug-fix agent.

Period
2026
Role
Consulting engineer — backend, infra, AI
Status
case study
  • 4 on Cloud Run services
  • ≤3/night, autonomous bug-fix PRs
  • #claude
  • #gcp
  • #fastapi
  • #agents
  • #observability
  • #ios

The problem

In 2026 I took a consulting engagement with a consumer health startup (anonymized here). They had a native Swift iOS app and ambitions for an AI layer over personal health data — and no backend, no infrastructure, and no platform team. I was the platform team.

The brief: production-grade auth, HealthKit ingestion, an AI agent that reasons over a user’s own health data, ML analysis, and observability — architected, built, and operated by one engineer, on health data, where sloppiness isn’t an option.

The architecture

Four FastAPI microservices on GCP Cloud Run:

  • Auth — Apple Sign-In with JWT issuance and refresh.
  • AI health agent — Claude behind SSE streaming, running a tool-use agentic loop over the user’s health data. The model decides which queries to run against the user’s data mid-conversation, instead of us pre-loading everything into context.
  • Data sync — HealthKit ingestion from the iOS app.
  • Analysis — an async ML pipeline dispatched through Cloud Tasks, so heavy work never blocks a request path.

Cloud SQL Postgres on private IP, accessed through asyncpg. All infrastructure in Terraform. Credentials in Secret Manager. Sentry instrumented across both the FastAPI backend and the native Swift iOS app, with alert rules and GitHub-integrated code mappings, so every error resolves to a file and line in the repo.

Then the part that makes this a 2026 story: a nightly autonomous bug-fix agent. A Claude agentic loop wakes up, reads the day’s Sentry errors, pulls the stack traces, reads the implicated source, writes targeted fixes, and opens up to three pull requests a night. Every PR waits for human review before merge.

Decisions that mattered

Cloud Run over Kubernetes. One person can’t babysit a cluster. Scale-to-zero, revision-based rollouts, and per-service deploys gave me production behavior with a fraction of the operational surface.

An agentic loop, not a context dump. Health data is large, longitudinal, and personal. Streaming Claude over SSE with tool use means the model fetches only the slices a conversation needs — better answers, lower token spend, and far less sensitive data moving per request.

Observability before automation. The bug-fix agent came last, deliberately. Its output quality is bounded by its input quality: Sentry alert rules and GitHub code mappings had to exist first, because an agent working from vague traces just guesses confidently. With file-and-line-accurate errors tied to source, it patches the actual fault.

Hard caps and a human merge gate. At most three PRs a night, every one human-reviewed. The cap keeps the morning review to a coffee’s worth of reading; the gate keeps accountability with a person. The agent is a force multiplier, not an autonomous committer — that line is the design.

Boring controls from day one. Private-IP database, Secret Manager, everything in Terraform. On health data, retrofitting hygiene is expensive; starting with it is nearly free.

Numbers

  • 4 FastAPI microservices in production on Cloud Run
  • ≤3 autonomous bug-fix PRs per night, 100% human-reviewed before merge
  • 2 platforms instrumented end-to-end in Sentry: Python backend and Swift iOS
  • 1 engineer covering backend, infrastructure, and the AI layer

Lessons

A one-person platform team is now a real configuration — if you spend agents where the toil lives. Overnight triage-and-patch was the highest-leverage slot: the errors are concrete, the context is local, and the human gate makes mistakes cheap. I wake up to proposed fixes instead of a triage queue.

The merge gate is the architecture. “Agent opens PRs, human merges” sounds like a limitation; it’s actually what makes the system deployable on a health product. Autonomy you can audit beats autonomy you have to trust.

And the dependency order matters: infrastructure as code, then observability, then agents. Each layer is what makes the next one trustworthy — Terraform makes the environment reproducible, Sentry makes failures legible, and only then is an agent acting on those failures an asset rather than a liability.