2026-03-16

One Cluster for Drug Discovery: Why Computational Biology Teams Shouldn't Manage Three Systems

Drug discovery teams run Nextflow, GROMACS, AlphaFold, and ML training on two or three separate systems. Here's how a single managed Slurm cluster handles all of them, at roughly one-third the cost of AWS HealthOmics.

If you lead computational biology at a Series A–B drug discovery startup, your infrastructure probably looks something like this:

Nextflow pipelines (genomics, transcriptomics) running on AWS Batch or Seqera Platform
Molecular dynamics (GROMACS, AMBER, OpenMM) on AWS ParallelCluster or Rescale
ML workloads (AlphaFold, ESMFold, property prediction) on EC2 instances or SageMaker

Three systems. Three billing models. Three sets of credentials. Three failure modes. And one computational scientist — probably you — managing all of it.

This is the "fragmented HPC" problem, and it's the #1 operational pain point for 15–50 person drug discovery teams. Not because each system is bad — Batch is fine for Nextflow, ParallelCluster works for MPI — but because the combined operational overhead is crushing.

The Real Cost of Fragmentation

It's not just the AWS bill. It's the invisible tax:

Ops overhead: Each system has its own upgrade cycle, security patches, network configuration, and monitoring. ParallelCluster alone requires managing AMIs, VPC subnets, shared storage, and Slurm configuration. Multiply by three systems.

Context switching: Your GROMACS simulation fails at 2 AM. You debug it in ParallelCluster's Slurm logs. Then a Nextflow pipeline fails — different logs, different system, different mental model. Then a SageMaker training job runs out of GPU memory — yet another dashboard.

Cost opacity: You get three separate AWS bills. Attributing compute cost to "Project X — lead optimization" requires manually cross-referencing Batch task IDs, ParallelCluster job IDs, and SageMaker training job ARNs. Nobody does this accurately.

Team friction: The wet-lab biologist who needs to run a quick AlphaFold prediction has to learn SageMaker. The computational chemist who wants to add a Nextflow pre-processing step has to learn Batch. Knowledge silos form around infrastructure, not science.

What If It Was One Cluster?

Clusterra runs all of these workloads on a single managed Slurm cluster. Same scheduler. Same console. Same cost tracking. Same credentials.

Here's what that looks like:

Nextflow Pipelines

Same nf-core pipelines you already run — just point at Clusterra's Slurm executor:

profiles {
  clusterra {
    // executor and queue are process-scope settings in nextflow.config
    process.executor = 'slurm'
    process.queue = 'cpu-workers'
  }
}

nextflow run nf-core/sarek -profile clusterra --input samples.csv

Karpenter provisions the right instance types automatically. Scale-to-zero when done.

Molecular Dynamics (MPI)

GROMACS and AMBER are native Slurm workloads. Multi-node MPI works out of the box:

#!/bin/bash
#SBATCH --job-name=md-simulation
#SBATCH --nodes=4
#SBATCH --ntasks-per-node=8
#SBATCH --partition=cpu-workers
#SBATCH --time=48:00:00

srun gmx_mpi mdrun -deffnm production -nsteps 50000000

No separate ParallelCluster deployment. No separate AMI management. Same cluster, same Slurm scheduler.

GPU Workloads (AlphaFold, ML Training)

Submit GPU jobs via Slurm's native GRES scheduling:

#!/bin/bash
#SBATCH --job-name=alphafold-predict
#SBATCH --partition=gpu-workers
#SBATCH --gres=gpu:1
#SBATCH --cpus-per-task=8
#SBATCH --mem=32G

python run_alphafold.py --fasta_paths=target.fasta --output_dir=./results

Karpenter provisions GPU nodes (g5, p4d, etc.) on Spot when available. Scale to zero when no GPU jobs are queued.
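Because every job now goes through one scheduler, the per-project attribution problem described earlier can, in principle, be handled with standard Slurm accounting. The sketch below is illustrative and is not a description of Clusterra's built-in cost tracking: it assumes accounting (slurmdbd) is enabled and that jobs are submitted with --account=<project>. The function name cpu_hours_by_account is made up for this example; sacct and its Account and CPUTimeRAW fields are standard Slurm.

```shell
#!/bin/bash
# Sum CPU-hours per Slurm account from pipe-delimited lines on stdin.
# Expected input, one job per line: Account|CPUTimeRAW
# (CPUTimeRAW is reported by sacct in CPU-seconds.)
cpu_hours_by_account() {
  awk -F'|' '
    NF >= 2 && $1 != "" { secs[$1] += $2 }
    END { for (a in secs) printf "%s %.1f\n", a, secs[a] / 3600 }
  ' | sort
}

# On a cluster where sacct is available, report the current month so far.
# Assumption: each project submits with its own --account value.
if command -v sacct >/dev/null 2>&1; then
  sacct --allusers --allocations --noheader --parsable2 \
        --starttime "$(date +%Y-%m-01)" \
        --format=Account,CPUTimeRAW | cpu_hours_by_account
fi
```

Multiplying each account's CPU-hours by your blended per-core rate gives an approximate dollar figure per project. GPU-hours would need a separate field and are left out of this sketch.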

Cost Comparison: Unified Cluster vs Three Systems

For a typical drug discovery team spending roughly $10,000 to $10,500/month on compute across all workloads:

System	Monthly Cost	Notes
3-system setup
Seqera + Batch (Nextflow)	~$4,000	Seqera compute markup + Batch
ParallelCluster (GROMACS)	~$3,500	Controller + On-Demand nodes
SageMaker (AlphaFold/ML)	~$3,000	ML instance premium
Subtotal	~$10,500	Plus ops time across 3 systems

Clusterra (unified)
One managed cluster on Spot	~$3,500	All workloads on one cluster
Savings	~$7,000/month (67%)	Plus recovered engineering time

The savings come from three sources:

1. Spot pricing passed through (vs Seqera/SageMaker markups)
2. No idle infrastructure (scale-to-zero vs ParallelCluster controller always-on)
3. No platform fees for separate orchestration layers
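For readers who want to check the arithmetic, here is the comparison table worked through as a small script. The dollar figures are the approximate monthly numbers from the table; the variable names exist only for this illustration.

```shell
#!/bin/bash
# Approximate monthly figures copied from the comparison table (USD).
seqera_batch=4000
parallelcluster=3500
sagemaker=3000
unified=3500

# Total for the three separate systems.
three_system_total=$((seqera_batch + parallelcluster + sagemaker))

# Absolute monthly savings from consolidating onto one cluster.
savings=$((three_system_total - unified))

# Savings as a share of the three-system total, rounded to a whole percent.
savings_pct=$(awk -v s="$savings" -v t="$three_system_total" \
  'BEGIN { printf "%.0f", 100 * s / t }')

echo "three-system total: \$${three_system_total}/month"
echo "savings: \$${savings}/month (${savings_pct}%)"
```

Swap in your own monthly numbers to see how the ratio changes for your workload mix.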

vs AWS HealthOmics

AWS HealthOmics is trying to solve a similar problem — unified compute for multi-modal biotech. But:

	HealthOmics	Clusterra
Compute	Proprietary "omics instances" (~3x EC2)	AWS Spot pass-through
MPI	Experimental (GROMACS via Nextflow DSL)	Native Slurm MPI
Non-containerized	No	Yes
GPU	Limited instance selection	Any EC2 GPU type via GRES
Lock-in	Complete (proprietary instances)	K8s-portable
Pricing (16 vCPU, 32 GB)	~$1.20/hr	~$0.33/hr

HealthOmics costs roughly 3x as much for comparable workloads (about 3.6x at the 16 vCPU, 32 GB price point above), and it doesn't support native MPI, which is essential for molecular dynamics.

vs Rescale

Rescale solves the same unified-platform problem, but for enterprises:

	Rescale	Clusterra
Target	Large pharma, 100+ people	Series A–B, 15–50 people
Pricing	Custom quotes, $50k–$500k/year	BYOC or usage-based SaaS
Min deal	$50k+ ACV	Self-serve, no minimum
Setup	Weeks (enterprise onboarding)	Minutes (self-serve)

If you can afford Rescale, it's a great platform. If you're a 20-person startup, it's not an option.

The Slurm Advantage

Why Slurm as the unified scheduler? Because computational biology teams already know it.

Most comp bio PhDs learned Slurm in grad school. The commands are muscle memory: sbatch, squeue, scancel, sinfo. Moving from an academic cluster to Clusterra is frictionless — same workflow, better infrastructure.

And Slurm natively handles the diversity of workloads in drug discovery:

- Batch jobs: Nextflow tasks, data processing scripts
- MPI jobs: Multi-node molecular dynamics, quantum chemistry
- GPU jobs: Deep learning training, structure prediction
- Array jobs: Docking screens, parameter sweeps
- Interactive: Jupyter notebooks via srun --pty

No other single scheduler handles all of these as well.
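As one concrete case from the list above, a docking screen maps naturally onto a Slurm array job. This is a minimal sketch, not a tested production script: the ligands.txt file, the pick_ligand helper, and the array size are hypothetical, and the echo stands in for whatever docking tool you use. It assumes the cpu-workers partition from the earlier examples.

```shell
#!/bin/bash
#SBATCH --job-name=docking-screen
#SBATCH --partition=cpu-workers
#SBATCH --array=1-100%20
#SBATCH --cpus-per-task=4
#SBATCH --time=02:00:00

# Print the Nth line (1-based) of a ligand list file.
pick_ligand() {
  local list_file="$1" index="$2"
  sed -n "${index}p" "$list_file"
}

# Slurm sets SLURM_ARRAY_TASK_ID per array task; default to 1 for a local dry run.
task_id="${SLURM_ARRAY_TASK_ID:-1}"
# Hypothetical input file: one ligand path per line.
ligand_list="${LIGAND_LIST:-ligands.txt}"

if [ -f "$ligand_list" ]; then
  ligand="$(pick_ligand "$ligand_list" "$task_id")"
  # Placeholder: replace this echo with your docking command.
  echo "task ${task_id}: would dock ${ligand}"
else
  echo "task ${task_id}: ${ligand_list} not found, nothing to do"
fi
```

Submit it with sbatch. Slurm runs up to 20 tasks at a time (the %20 suffix) and sets SLURM_ARRAY_TASK_ID for each one, so every task picks a different line of the list.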

What You Give Up

Transparency matters. Here's what a unified Clusterra cluster doesn't give you (yet):

Data Studios (Seqera): Interactive Jupyter/RStudio environments connected to pipeline outputs. We're working on this.
Enterprise compliance: SOC 2, HIPAA, and GxP certifications are not yet available. If you're in a regulated environment, this may be a blocker.
AWS support SLA: Clusterra is a startup. AWS HealthOmics and AWS Parallel Computing Service (PCS) come with AWS support tiers.
Pre-built pipeline gallery: Seqera's nf-core integration is more polished. You bring your own pipelines to Clusterra.

Who This Is For

If you're a computational biology team that:

- Runs 2+ of: Nextflow, GROMACS/AMBER, AlphaFold/ML
- Spends $3k–$30k/month on compute
- Has 1–3 people managing infrastructure
- Wants to spend time on science, not ops

Then consolidating to a single managed Slurm cluster saves money and engineering time.

Try it: Sign up at clusterra.cloud. Run a Nextflow pipeline, then submit a GROMACS job — same cluster, same console, same bill.

Built by the former Product Manager for AWS Batch and AWS Parallel Computing Service.

Managing multiple compute systems for drug discovery? We'd love to hear about your setup — hello@clusterra.cloud.