2026-06-16

Your agents have PhDs, yet they're sharing a laptop.

The thinking scales with your API spend. The experiments still run on your laptop. That's the bottleneck nobody's talking about.

Everyone is managing a team now — a rather accomplished one, full of agents with PhDs. The thinking scales with your API spend. The experiments still run on your laptop.

You can get the best models from Anthropic, OpenAI, Google. You can give each agent the right context so it understands your work, your data, your goal. Those two pieces are nearly solved. What you get is an army that can reason, design experiments, and plan at speed — driven by tens of thousands of GPUs running the models you use.

And then every experiment has to run on your laptop.

API keys buy you compute for thinking, not for action. So your agents can design at blazing speed, and then wait. The bottleneck isn't the model. It's the execution layer underneath it, and for agents doing biology, that layer barely exists.

Andrej Karpathy described the same problem for software agents a while back: "The code was the easiest part. Most of the work was in the browser, clicking things." He was talking about deployment pipelines — browser dashboards that don't have APIs, so even capable agents end up in a cursor-and-click loop. Coding agents are ahead of biological agents largely because software has structured APIs, version control, and testable outputs. Biology has messy databases, browser-only interfaces, and tools that expect a human to sit at a terminal. The agent equivalent of "clicking things" in biology is: wait for the laptop.

Take a simple example. An agent proposes running three variant callers on a single sequencing sample: GATK, DeepVariant, Strelka2. Standard practice: no single caller dominates across all variant types, so you run the ensemble and compare. Each takes four to five hours. On a laptop they go one at a time: twelve to fifteen hours, machine fully occupied, agent waiting. On a cluster they run in parallel: about five hours, and the agent can start planning the annotation step while results are still coming in. This is not a stress test. This is Tuesday morning.
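The arithmetic behind that comparison fits in a few lines. This is a minimal sketch, assuming each caller takes five hours (the upper end of the range above) and that the cluster has a free node for each job; the function names are illustrative.

```python
# Assumed runtimes in hours: the upper end of the four-to-five-hour range.
runtimes_h = {"GATK": 5.0, "DeepVariant": 5.0, "Strelka2": 5.0}


def serial_wall_clock(runtimes):
    """One job at a time: wall-clock time is the sum of all runtimes."""
    return sum(runtimes.values())


def parallel_wall_clock(runtimes):
    """One node per job: wall-clock time is the slowest single job."""
    return max(runtimes.values())


serial_h = serial_wall_clock(runtimes_h)
parallel_h = parallel_wall_clock(runtimes_h)
print(f"laptop: {serial_h} h, cluster: {parallel_h} h")
```

The total compute is the same either way. What changes is how long the agent waits before it can act on the results.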

That's the manageable case. Some of what your agents will design can't run on a laptop at all: GPU molecular dynamics, free energy calculations that need 256 GB of memory. Not slowly, not eventually, just not at all.

This problem was solved once before — by the teams that built shared compute clusters for exactly this purpose. They called it HPC. The insight was simple: when many jobs need to run concurrently, you need a scheduler to allocate resources fairly, prevent collisions, and avoid memory crashes. Your laptop is just a fixed-capacity cluster with no scheduler. The difference is that today we don't have to accept fixed capacity. We can give our agents elastic cloud compute — scale up for the experiments, scale to zero when they're done.
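To make the scheduler idea concrete, here is a toy sketch of memory-aware admission: a job starts only when its memory request fits, and otherwise waits for a running job to finish. It is a first-in, first-out illustration, not how Clusterra or any production scheduler works, and the job sizes and capacities are assumed numbers.

```python
import heapq
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    mem_gb: int
    hours: float


def schedule(jobs, capacity_gb):
    """FIFO admission: start each job only when its memory request fits.

    Returns {job name: (start_hour, end_hour)}.
    """
    for job in jobs:
        if job.mem_gb > capacity_gb:
            raise ValueError(f"{job.name} needs {job.mem_gb} GB; capacity is {capacity_gb} GB")
    running = []  # min-heap of (end_hour, mem_gb) for jobs holding memory
    free_gb = capacity_gb
    now = 0.0
    plan = {}
    for job in jobs:
        # Wait for running jobs to finish until the request fits.
        while job.mem_gb > free_gb:
            end, mem = heapq.heappop(running)
            now = max(now, end)
            free_gb += mem
        plan[job.name] = (now, now + job.hours)
        heapq.heappush(running, (now + job.hours, job.mem_gb))
        free_gb -= job.mem_gb
    return plan


# Assumed sizes: three 24 GB jobs of five hours each.
jobs = [Job("gatk", 24, 5.0), Job("deepvariant", 24, 5.0), Job("strelka2", 24, 5.0)]
laptop_plan = schedule(jobs, capacity_gb=32)   # only one job fits at a time
cluster_plan = schedule(jobs, capacity_gb=96)  # all three fit at once
print(laptop_plan)
print(cluster_plan)
```

With 32 GB the jobs queue behind each other and the last one ends at hour fifteen; with 96 GB all three start at hour zero. A job that asks for more than the machine has is rejected up front instead of crashing mid-run, which is the case a fixed-capacity laptop cannot get around.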

Clusterra is that HPC layer, built for agents. It is a managed scheduler that fairly allocates elastic cloud compute and storage to each agent's experiments, so agents don't collide, each gets the right resources, and nothing crashes from memory limits. It also runs the scientific tools your agents reach for (molecular dynamics, free energy perturbation, cryo-EM reconstruction, variant calling, and more) from a catalog of open-licensed software with nothing to install or license. Independent experiments run in parallel, dependent ones chain, and the cluster scales to zero when no one is using it. All of it runs inside your own AWS account, under a hard spend cap, so there are no surprise bills. It is the cluster and scheduler an HPC team would have built for you, except elastic, in your cloud, and operated for you, so no one has to wear the cluster-admin hat. Clusterra doesn't remove the runtime of the science; the experiments still take the time they take. What it removes is the infra, the queue, the ops, the sharing fights, and the fixed-capacity ceiling.
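"Independent experiments run in parallel, dependent ones chain" is a dependency graph. The sketch below uses Python's standard-library `graphlib` to group a hypothetical campaign into waves, where every step in a wave can run at the same time. The step names are made up for illustration, and this is not Clusterra's interface.

```python
from graphlib import TopologicalSorter

# Hypothetical campaign: each step maps to the set of steps it depends on.
deps = {
    "gatk": set(),
    "deepvariant": set(),
    "strelka2": set(),
    "compare_calls": {"gatk", "deepvariant", "strelka2"},
    "annotate": {"compare_calls"},
}


def waves(graph):
    """Group steps into waves; all steps within a wave are independent."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if dependencies loop
    result = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # steps whose dependencies are done
        result.append(ready)
        sorter.done(*ready)
    return result


plan = waves(deps)
for number, steps in enumerate(plan, start=1):
    print(f"wave {number}: {steps}")
```

The three callers land in the first wave, the comparison in the second, and annotation in the third. A scheduler that knows this graph can launch each wave as soon as the previous one finishes, without a human watching.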

SaaS looks like the obvious answer: managed tool endpoints, the same model as getting thinking tokens from the AI labs. But biology doesn't work that way. The open-source field moves faster than any SaaS vendor's release cycle. Look at protein structure prediction: AlphaFold2 set the bar in 2021, AlphaFold3 raised it in 2024 but came access-restricted, and within roughly a year open-licensed models (Boltz-1, then Boltz-2) had largely closed the gap. A SaaS frozen on a 2023 model now looks dated, because the open tools have moved on. Locking into a managed endpoint means locking into someone else's update schedule. The right answer is a constantly refreshed catalog of the latest open-licensed tools you run yourself: nothing to license, nothing to install, nothing to wait for.

Because every run goes through Clusterra's scheduler, it carries something else with it: a full provenance and audit trail. When a single human runs a few experiments by hand, they more or less remember what they did. When agents run thousands of them, you lose the thread of what actually ran. And to trust a result — to re-run it, build on it, or defend it in a publication, an IND filing, or a go/no-go decision — you need to know exactly what produced it: which tool, which version, which inputs and parameters, on which data, yielding which output. Agents generating volume make this harder, not easier: you didn't run it, an agent did, possibly at 3am, while you were asleep.

This is also the leash. Letting an agent drive real compute — not just propose it, but execute it — is only safe if every action it takes is captured, repeatable, and reproducible. So Clusterra carries a reproducibility and repeatability track on every run: what was executed, with what, on what, producing what, recorded as you go, so the whole campaign stays auditable and re-runnable. To be honest about what this is: it's provenance and audit of execution. It tells you precisely what ran and lets you run it again. It does not validate the science for you — that judgment stays yours.
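As a rough picture of what such a record could hold, here is a minimal sketch: one entry per run naming the tool, version, parameters, and content hashes of the input and output. The field names, the example tool, and its parameter are hypothetical, not Clusterra's actual schema.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


def sha256_hex(data: bytes) -> str:
    """Content hash, so data is identified by what it is, not where it lives."""
    return hashlib.sha256(data).hexdigest()


@dataclass(frozen=True)
class RunRecord:
    tool: str
    version: str
    parameters: dict
    input_sha256: str
    output_sha256: str

    def record_id(self) -> str:
        """Stable ID: identical tool, version, parameters, and data give the same ID."""
        canonical = json.dumps(asdict(self), sort_keys=True).encode()
        return sha256_hex(canonical)


# Hypothetical tool name, version, and parameter, used only for illustration.
reads, calls = b"example input bytes", b"example output bytes"
run = RunRecord("example-caller", "1.0.0", {"min_quality": 20},
                sha256_hex(reads), sha256_hex(calls))
rerun = RunRecord("example-caller", "1.0.0", {"min_quality": 20},
                  sha256_hex(reads), sha256_hex(calls))
upgraded = RunRecord("example-caller", "1.1.0", {"min_quality": 20},
                     sha256_hex(reads), sha256_hex(calls))
print(run.record_id() == rerun.record_id())     # same run, same ID
print(run.record_id() == upgraded.record_id())  # version changed, new ID
```

Hashing the inputs and outputs means a re-run that produces different bytes shows up as a different record. That flags results that need a second look, but it does not say which one is scientifically right.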

Anthropic put it cleanly in their June 2026 research on biological agents: "We want models to be creative when they generate hypotheses, design experiments, or reason about mechanisms. But the layer underneath that creativity has to be boringly reliable." That's not a product pitch. It's a constraint. Creative reasoning on top of unreliable execution is just expensive confusion.

Clusterra is the boringly reliable execution layer. The elastic HPC cluster that lets the experiments actually run, and the audit trail that lets you trust, repeat, and build on what ran — all inside your own AWS account, under your own spend cap. Your PhD-agent team can already think, design, and plan. This is what lets them do the work.