Getting started
From zero to a running job in about fifteen minutes. You will connect your AWS account via a cross-account IAM role, let Clusterra provision the compute-side cluster, then submit your first Slurm job from the console.
1. Create your workspace
Visit console.clusterra.cloud and sign in with Google. Your first sign-in creates a workspace using your email domain as the tenant slug. If your domain already has a workspace, you will be invited to join it.
2. Deploy the cross-account IAM role
Clusterra runs your slurmd workers, the edge-agent, and Karpenter inside your AWS account. That requires a cross-account IAM role the central control plane can assume via STS with an external ID.
From the console, choose Connect cluster → Deploy role. The console will render a CloudFormation quick-create link pre-filled with:
- The Clusterra central account ID (the trusted principal).
- An external ID scoped to your cluster (clusterra-clusXXXX).
- The permissions needed to provision a K3s cluster, an EFS filesystem, an S3 bucket for user files, and a VPC peering connection back to the central account.
Prefer to manage the role as code? In Terraform it is just aws_iam_role + aws_iam_policy_attachment; the console shows the full policy JSON so you can vendor it into your own IaC.
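Whichever route you take, the trust policy on the role follows the standard cross-account pattern. The CLI sketch below is illustrative only: the account ID, role name, and external ID are placeholders, and the permissions policy shown in the console still has to be attached.

aws iam create-role \
  --role-name ClusterraCrossAccount \
  --assume-role-policy-document '{
    "Version": "2012-10-17",
    "Statement": [{
      "Effect": "Allow",
      "Principal": { "AWS": "arn:aws:iam::111111111111:root" },
      "Action": "sts:AssumeRole",
      "Condition": { "StringEquals": { "sts:ExternalId": "clusterra-clusXXXX" } }
    }]
  }'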
3. Connect the cluster
Once the role exists, paste its ARN into the console. Clusterra makes a pre-flight sts:AssumeRole call to verify the trust policy and external ID are correct; this catches typos before the twelve-minute provision begins.
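That pre-flight check is an ordinary STS call issued from the Clusterra account, roughly equivalent to the command below. The role ARN and external ID are placeholders, and you cannot run it yourself unless your own principal is also in the trust policy.

aws sts assume-role \
  --role-arn arn:aws:iam::222222222222:role/ClusterraCrossAccount \
  --role-session-name clusterra-preflight \
  --external-id clusterra-clusXXXX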
Behind the scenes, the central API launches a Kubernetes Job that runs customer-provision.sh against your account. It:
- Creates a VPC, subnets, and a K3s control-plane EC2 instance.
- Mounts an EFS filesystem that all slurmd pods will share.
- Publishes the K3s join token to SSM Parameter Store (see the read-back example after this list).
- Installs Karpenter, Cilium, the edge-agent, and an initial set of slurmd Deployments (one per instance-shape/size combo).
- Brings up VPC peering back to the central account.
- Registers the cluster with the central ArgoCD instance so fleet-level changes can be rolled out per tenant.
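As an example of where that state lands, the K3s join token can be read back from your own account once provisioning finishes. The parameter path below is an assumption, not a documented name:

aws ssm get-parameter \
  --name /clusterra/clusXXXX/k3s-join-token \
  --with-decryption \
  --query Parameter.Value \
  --output text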
When the cluster status flips to running in the console, the central slurmctld sees zero workers — that is normal. Workers scale up on demand when jobs land.
4. Submit your first job
From the Jobs tab, click New job. You have two options:
- From a template. Pick GROMACS, AMBER, or Nextflow from the template catalog and fill in a few parameters. Clusterra renders a Slurm submit request and ships it to slurmrestd.
- Raw. Paste a shell script and set CPUs, memory, partition, and walltime directly.
A minimal raw submission looks like this:
{
  "script": "#!/bin/bash\n#SBATCH --cpus-per-task=2\n#SBATCH --mem=4G\nsrun hostname",
  "job": {
    "name": "hello-world",
    "nodes": "1",
    "current_working_directory": "/mnt/efs",
    "environment": ["PATH=/usr/bin:/bin"]
  }
}
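The console ships this payload to slurmrestd for you. For orientation only, the equivalent direct call would be a POST to slurmrestd's job submit endpoint; the hostname is a placeholder, and the API version and JWT headers assume a standard slurmrestd setup that Clusterra does not necessarily expose.

curl -s -X POST https://slurmrestd.example.internal/slurm/v0.0.39/job/submit \
  -H "X-SLURM-USER-NAME: $USER" \
  -H "X-SLURM-USER-TOKEN: $SLURM_JWT" \
  -H "Content-Type: application/json" \
  -d @job.json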
The pending job triggers the scaling loop: the central API looks at the resource request, picks a slurmd shape (compute / general / memory family, xs through xlarge), and tells the edge-agent to scale that Deployment up. Karpenter notices the Pending pod, provisions an EC2 node, Cilium hooks it into the VPC, and slurmd registers with slurmctld. Jobs typically start within 60–120 seconds on a cold cluster.
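If you have kubectl access to the K3s cluster in your account, you can watch that loop play out. The namespace below is an assumption; NodeClaim is the resource newer Karpenter releases use for capacity that is still being launched.

# slurmd Deployments scaling out as jobs queue (namespace is an assumption)
kubectl -n clusterra-workers get deployments --watch
# Karpenter launching EC2 capacity for the Pending pods
kubectl get nodeclaims --watch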
5. Watch it run
Once the job is RUNNING, click into it for live stdout/stderr. Clusterra reads the log file directly from EFS and streams it over a Server-Sent Events connection, so what you see in the console is always current.
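The same stream can be consumed outside the console with any SSE-capable client; the URL below is a placeholder, so check the API reference for the real endpoint.

# -N turns off curl's output buffering so events appear as they arrive
curl -N -H "Accept: text/event-stream" \
  "https://console.clusterra.cloud/api/v1/jobs/<job-id>/logs/stream"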
When the job finishes, the console shows the exit code, walltime, and a cost estimate based on the EC2 rate for the nodes it ran on.
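For example, a job that held one node billed at $0.20 per hour for 30 minutes of walltime would show roughly $0.10; the rate here is illustrative, not a Clusterra price.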
What to read next
- User guide — the full job lifecycle, including log streaming, file uploads, and quotas.
- Architecture — how the two sides talk, where state lives, what happens during a repave.
- API reference — if you want to drive Clusterra from CI instead of the console.