Until this month, if you wanted production agents that ran for hours, kept state across runs, and called tools without your team rebuilding the same scaffolding every project, someone had to write that scaffolding. Sandbox isolation. A durable event log. Retry policies. Secret management. Authentication for the tools the agent calls. None of that is hard in the way a research paper is hard. It is the kind of work that takes a senior engineer six weeks to get right, and another six months to keep working as the platforms underneath it shift.
On April 8, Anthropic launched Claude Managed Agents in public beta. The whole runtime layer is now sold as two metered line items: $0.08 per session-hour for the runtime itself, billed to the millisecond, plus standard token rates for whatever the agent thinks. Idle time is free. The sandbox, the persistent file system, the append-only event log, the secret store, and the auth layer are all hosted by Anthropic.
That changes the answer to a question I've been asked twice a week for the last year. "Should we build our own agent platform, or pay a vendor?" The answer used to be "it depends." After April 8, for almost everyone in the small and mid-market band, the answer is "buy."
What you're actually paying for
Most of the "build a custom agent stack" quotes I've seen are pricing runtime and orchestration, not the model. The model bill is identical either way. What moves with the build-vs-buy decision is the work around the model.
Anthropic's managed runtime gives you sessions that survive across calls (a durable, append-only event log that lives outside the model's context window), a sandboxed environment where the agent can read files, run bash, browse the web, and execute code, and an auth layer that handles the tool credentials so you do not have to. Rate limits are generous for the typical SMB workload (60 requests per minute on create endpoints, 600 per minute on reads). Multi-agent and outcome-tracking are gated behind a research-preview signup, which is fine for now.
That set of features is what a homegrown stack actually has to ship. Not the LLM call. Not the prompt template. The plumbing.
The build math, with real numbers
Take a realistic SMB scenario. Four agents in production: one for inbound sales qualification, one for support triage, one for invoice chasing, one for vendor onboarding. Each agent runs hundreds of sessions a month, mostly idle, with bursty active windows when a ticket comes in or a new lead lands.
Building this in-house with a senior engineer who has done it before takes four to six weeks of focused work. Sandbox isolation, state persistence, retry semantics, secret rotation, basic observability, alerting on stuck sessions. At a fully loaded rate of $150 per hour, that is $24,000 to $36,000 of one-time engineering cost. If the engineer has not done it before, double it and add a quarter for the first production incident.
Then ongoing infrastructure. A small VM cluster, a Postgres instance for state, a Redis instance for queues, log shipping to whatever your team uses. Realistic floor on a major cloud is $300 to $500 a month. Add backups and a staging environment, if you have any sense, and you are at $500 to $800 a month before anyone has touched the keyboard.
Maintenance is the part most build-quotes leave out. Patching dependencies, handling the next breaking change in the agent framework you picked, chasing down a session that hung because Postgres had a connection-pool problem. Ten to fifteen hours a month of engineer time is realistic. At the same $150 per hour, call it $1,500 to $2,250 a month, every month, forever.
First year, all in: roughly $48,000 to $72,000 for the runtime layer alone (the upfront build plus twelve months of infrastructure and maintenance). Token costs sit on top of that and would also sit on top of the managed option, so they wash.
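The build-path arithmetic can be reproduced in a few lines. This is a minimal sketch, not a quoting tool: the function name is hypothetical, the 40-hour week is an assumption, and the infrastructure inputs use the $500 to $800 range that includes backups and staging.

```python
HOURS_PER_WEEK = 40  # assumption: a standard 40-hour engineering week


def first_year_build_cost(weeks, hourly_rate, infra_per_month, maint_hours_per_month):
    """One-time engineering plus twelve months of infrastructure and maintenance."""
    upfront = weeks * HOURS_PER_WEEK * hourly_rate
    monthly = infra_per_month + maint_hours_per_month * hourly_rate
    return upfront + 12 * monthly


# Low and high ends of the scenario described in the text.
low = first_year_build_cost(weeks=4, hourly_rate=150,
                            infra_per_month=500, maint_hours_per_month=10)
high = first_year_build_cost(weeks=6, hourly_rate=150,
                             infra_per_month=800, maint_hours_per_month=15)

print(f"First-year build cost: ${low:,} to ${high:,}")
```

Swap in your own rate and hour estimates. The monthly term is multiplied by twelve, so small changes in maintenance hours move the total more than small changes in the upfront build.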
The buy math, same scenario
On Managed Agents, the runtime cost is $0.08 per session-hour of actual running time. Idle does not count. Four agents averaging 80 to 120 active hours a month per agent (that is a lot, given that idle is free) is 320 to 480 session-hours, which is $25 to $40 a month. Round up to $50 to leave headroom.
Engineer time to wire it up, end to end, for the same four agents: one to two weeks for a team that has done it before. At $150 per hour, that is $6,000 to $12,000 of one-time work.
Ongoing maintenance is the part that flips. There is no runtime to patch. Auth, sandbox, and state are someone else's problem. What is left is updating prompts, tweaking tool definitions, and watching dashboards. Two to four hours of engineer work a month is normal. $300 to $600 a month.
First year, all in: roughly $10,000 to $20,000. Comparing like ends of the two ranges, that is roughly a quarter of the build path's cost, with a faster path to first production.
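The buy-path numbers follow the same pattern. The sketch below uses the $0.08 session-hour rate and the scenario figures from the text. The function names are hypothetical, and the $50 monthly runtime figure is the rounded-up headroom number rather than the computed one.

```python
RUNTIME_RATE = 0.08  # USD per active session-hour, as quoted in the text


def monthly_runtime_cost(agents, active_hours_per_agent):
    """Runtime charge for one month; idle time is not billed."""
    return agents * active_hours_per_agent * RUNTIME_RATE


def first_year_buy_cost(upfront, runtime_per_month, maint_per_month):
    """One-time wiring work plus twelve months of runtime and maintenance."""
    return upfront + 12 * (runtime_per_month + maint_per_month)


runtime_low = monthly_runtime_cost(agents=4, active_hours_per_agent=80)
runtime_high = monthly_runtime_cost(agents=4, active_hours_per_agent=120)

# Use the rounded-up $50/month runtime figure for headroom.
buy_low = first_year_buy_cost(upfront=6_000, runtime_per_month=50, maint_per_month=300)
buy_high = first_year_buy_cost(upfront=12_000, runtime_per_month=50, maint_per_month=600)

print(f"Runtime: ${runtime_low:.2f} to ${runtime_high:.2f} per month")
print(f"First-year buy cost: ${buy_low:,} to ${buy_high:,}")
```

The runtime line is small enough that doubling the active hours barely moves the first-year total. Engineer time dominates on both paths.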
The failure mode no one is pricing in yet
Session-hour pricing is the line item people stare at. The line item that will surprise you is tokens. Standard rates apply ($3 per million input tokens and $15 per million output tokens for Claude Sonnet 4.6, more for Opus). Anthropic's own worked example is a one-hour Opus session with 50,000 input and 15,000 output tokens, which costs $0.705 total. The session-hour was $0.08 of that. The rest was tokens.
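To see how the split works, here is a minimal sketch of the per-session arithmetic. The Sonnet rates are the ones quoted above. The Opus rates of $5 and $25 per million tokens are an assumption, chosen because they reproduce the $0.705 figure; check the current price list before relying on them. The function name is hypothetical.

```python
def session_cost(hours, input_tokens, output_tokens,
                 input_rate, output_rate, runtime_rate=0.08):
    """Return (runtime, tokens, total) in USD. Token rates are per million tokens."""
    tokens = (input_tokens / 1_000_000 * input_rate
              + output_tokens / 1_000_000 * output_rate)
    runtime = hours * runtime_rate
    return runtime, tokens, runtime + tokens


# Worked example from the text: one hour, 50,000 input and 15,000 output tokens.
# Opus rates below are assumed, not taken from the article.
opus_runtime, opus_tokens, opus_total = session_cost(
    1, 50_000, 15_000, input_rate=5.0, output_rate=25.0)

# Same session at the Sonnet rates quoted in the text.
_, sonnet_tokens, sonnet_total = session_cost(
    1, 50_000, 15_000, input_rate=3.0, output_rate=15.0)

print(f"Opus:   runtime ${opus_runtime:.3f}, tokens ${opus_tokens:.3f}, total ${opus_total:.3f}")
print(f"Sonnet: tokens ${sonnet_tokens:.3f}, total ${sonnet_total:.3f}")
```

Even in this modest example the runtime charge is a small slice of the total, which is why token volume is the number to watch.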
Now imagine an agent that's set to think harder when it is uncertain, runs a web-search tool ($10 per 1,000 searches) on every ambiguous case, and has a long system prompt with a fat context. Per session, you can be at five to ten times the simple example. That is fine when volume is small. It is not fine when a misbehaving agent gets stuck in a tool-call loop at 2 a.m. and you have no spend cap configured.
Practical guard rails to set on day one. A daily token budget per agent, enforced at the Anthropic key level. Alerting when an agent crosses 1.5x its 30-day rolling token average in a single day. A hard maximum on session length, with the session terminated when it hits the ceiling. None of that is exotic. All of it gets skipped on the first deploy and learned the hard way on the second invoice.
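The three checks are simple enough to sketch as plain functions. Everything here is hypothetical: the names are made up for illustration, and how you fetch usage numbers or terminate a session depends on your own tooling and is left to the caller.

```python
from statistics import mean


def over_daily_budget(tokens_used_today, daily_budget):
    """True once an agent has used up its daily token budget."""
    return tokens_used_today >= daily_budget


def is_token_spike(daily_totals, today, multiplier=1.5, window=30):
    """True when today's tokens exceed multiplier x the mean of the last `window` days."""
    recent = daily_totals[-window:]
    if not recent:
        return False  # no history yet, so there is nothing to compare against
    return today > multiplier * mean(recent)


def over_session_ceiling(started_at, now, max_seconds):
    """True when a session has run for at least the hard maximum (times in seconds)."""
    return (now - started_at) >= max_seconds


# Example: an agent that normally uses about 100,000 tokens a day.
history = [100_000] * 30
print(is_token_spike(history, today=160_000))   # above 1.5x the average
print(is_token_spike(history, today=140_000))   # below 1.5x the average
print(over_session_ceiling(started_at=0, now=7_200, max_seconds=3_600))
```

Run the checks on a schedule outside the agent itself, so that a stuck agent cannot also take down the thing that is supposed to stop it.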
Two cases where building still wins
First, regulated workloads where the data cannot leave your environment. Healthcare with PHI, finance with non-public material information, defense or government with classification rules. Anthropic's sandbox is hosted infrastructure. If your compliance posture says agent state has to live inside your VPC, build-your-own is not a preference, it is a requirement. The math does not matter.
Second, teams that already operate a mature agent runtime and are running real volume on it. If you have an internal platform team running ten or more agents, with custom observability tooling and on-call rotation, the marginal agent on your existing platform is close to free. Switching to managed for the eleventh agent does not pay back in the same way it does for a team starting from scratch. That is a small population of companies. Most teams I talk to are not it.
Everyone else is in the buy band. Including most of the teams that got quoted a build last quarter.
A 30-minute exercise to figure out which side you sit on
Open a spreadsheet. Four columns: agent name, expected active hours per month, expected sessions per month, expected tokens per session. Fill it in for every agent you want in production this year. If you have not built one yet, estimate from a typical workflow for each agent.
Multiply active hours by $0.08 to get session-hour cost. Multiply tokens per session by sessions per month, then by current Anthropic rates, to get token cost. Add the two. That is your monthly managed-agents floor.
Now ask your engineering lead what it would cost to ship the same set of agents on custom infrastructure, with state, sandboxing, and observability included. Get a number for upfront work and a number for monthly maintenance. Multiply the maintenance by twelve and add the upfront cost. That is your annual build cost.
If the build path is more than three times the buy path, and you are not in one of the two cases above, buy. The savings are not the only reason. The bigger reason is that the engineer who would have built the runtime is now free to work on the parts of your product that do not have a vendor.
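If you would rather script the exercise than use a spreadsheet, a rough sketch follows. All names are hypothetical. Sessions per month is included because per-session tokens need a session count to become a monthly figure. Treating the annual buy cost as twelve times the monthly floor plus any one-time integration work is an assumption, and the sample row is illustrative only.

```python
from dataclasses import dataclass

RUNTIME_RATE = 0.08  # USD per active session-hour, as quoted in the text


@dataclass
class AgentRow:
    name: str
    active_hours_per_month: float
    sessions_per_month: int
    input_tokens_per_session: int
    output_tokens_per_session: int


def monthly_managed_floor(rows, input_rate, output_rate):
    """Session-hour cost plus token cost per month. Token rates are per million."""
    total = 0.0
    for r in rows:
        runtime = r.active_hours_per_month * RUNTIME_RATE
        tokens = r.sessions_per_month * (
            r.input_tokens_per_session / 1_000_000 * input_rate
            + r.output_tokens_per_session / 1_000_000 * output_rate
        )
        total += runtime + tokens
    return total


def annual_build_cost(upfront, monthly_maintenance):
    return upfront + 12 * monthly_maintenance


def should_buy(annual_build, annual_buy, threshold=3.0, must_build=False):
    """Apply the three-times rule unless a compliance or platform case forces a build."""
    if must_build:
        return False
    return annual_build > threshold * annual_buy


# Illustrative inputs only; replace with your own estimates.
rows = [AgentRow("support-triage", 100, 400, 50_000, 15_000)]
floor = monthly_managed_floor(rows, input_rate=3.0, output_rate=15.0)
annual_buy = 12 * floor + 6_000  # assumption: one-time integration work added once
annual_build = annual_build_cost(upfront=24_000, monthly_maintenance=2_000)

print(f"Monthly managed floor: ${floor:,.2f}")
print(f"Buy: ${annual_buy:,.2f}  Build: ${annual_build:,.2f}")
print("Buy" if should_buy(annual_build, annual_buy) else "Build or look closer")
```

The `must_build` flag stands in for the two cases where the math does not decide the question: data that cannot leave your environment, and an existing mature platform.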
If you're weighing a managed-agents rollout against a build path that's already underway, or trying to figure out where the spend caps and guard rails should live, reach out. We run these calls for clients most weeks.