Clusterra Architecture: What Runs in Your AWS Account vs Ours
When teams evaluate Clusterra, one of the first questions is understandably about boundaries: What runs in our AWS account, what runs in yours, and how do they connect?
Clusterra is built around a split architecture that keeps compute, scheduling, and data entirely in the customer’s AWS account, while providing a centralized control plane for identity, coordination, and visibility.
This post explains that split and how the two sides communicate.

What Runs in Your AWS Account
All infrastructure that actually runs jobs or touches data lives in the customer’s AWS account.
This includes:
- Slurm clusters provisioned using AWS ParallelCluster (head node, compute nodes, schedulers, autoscaling)
- Slurmrestd, running on the head node
- All job execution and scheduling
- All storage (FSx for Lustre, EBS, S3, etc.)
- Customer-owned IAM roles, VPCs, and networking
Clusterra is deployed into the customer account using infrastructure-as-code (OpenTofu / ParallelCluster). There is no requirement to expose head nodes publicly or allow inbound access from the internet.
Clusterra does not:
- execute jobs on your behalf
- access job payloads or file systems
- require SSH access to your cluster nodes
What Runs in Clusterra’s AWS Account
Clusterra operates a centralized control plane that provides coordination across users and clusters.
This includes:
- Web console
- Public APIs used by the console, CLI, and integrations
- User authentication and identity (OIDC with Okta / Entra ID)
- Customer configuration metadata
- Event processing and delivery
- Cost and usage aggregation
The control plane does not have direct network access to customer clusters or storage.
How Clusterra Connects to Your Cluster
Clusterra interacts with customer clusters through a modern, serverless networking layer built on AWS VPC Lattice.
At a high level:
- Requests from users hit the Clusterra control plane
- A Clusterra-managed Lambda (the bridge) is invoked
- The bridge connects to the customer cluster privately via AWS VPC Lattice
- All communication terminates at slurmrestd on the head node
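The request path above can be sketched as a small Lambda-style handler. This is an illustration under stated assumptions, not Clusterra's actual bridge: the base URL, the API version, and the event fields (path, user, token, body) are hypothetical, and any VPC Lattice auth policy (for example SigV4 signing) is left out. The two X-SLURM headers are the ones slurmrestd reads for JWT authentication.

```python
import json
import urllib.request

# Hypothetical values: the private Lattice service address and the slurmrestd
# API version depend on the deployment and are not taken from Clusterra.
LATTICE_BASE_URL = "https://slurmrestd.example.internal"
SLURM_API_VERSION = "v0.0.40"


def build_slurm_request(path, user, token, body=None):
    """Build (but do not send) an HTTP request addressed to slurmrestd."""
    url = f"{LATTICE_BASE_URL}/slurm/{SLURM_API_VERSION}/{path.lstrip('/')}"
    data = json.dumps(body).encode("utf-8") if body is not None else None
    request = urllib.request.Request(url, data=data, method="POST" if data else "GET")
    # slurmrestd reads the caller identity and the JWT from these two headers.
    request.add_header("X-SLURM-USER-NAME", user)
    request.add_header("X-SLURM-USER-TOKEN", token)
    if data:
        request.add_header("Content-Type", "application/json")
    return request


def handler(event, context):
    """Lambda-style entry point: forward one request and relay the response."""
    request = build_slurm_request(
        event["path"], event["user"], event["token"], event.get("body")
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return {"statusCode": response.status, "body": response.read().decode("utf-8")}
```

Keeping request construction separate from the network call makes the forwarding logic easy to test without a cluster.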
For reporting status back to Clusterra (Upstream):
The cluster uses a hybrid, agentless approach to push events:
1. Job state changes are sent directly to the Clusterra API via lightweight, non-blocking HTTP hooks (standard curl calls run in the background with &).
2. Infrastructure events (like node scaling) are captured by CloudWatch and routed securely via Amazon EventBridge.
This architecture removes the need for polling agents, long-running daemons, or complex message queues in your account.
There is no public endpoint on the cluster, no VPN management, and no inbound access from the internet.
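To make the non-blocking hook idea concrete, here is a minimal Python sketch of the same pattern as a backgrounded curl call. The endpoint URL and payload fields are hypothetical, and authentication of the hook is omitted because the post does not describe it.

```python
import json
import subprocess

# Hypothetical endpoint; the real Clusterra API URL is not given in this post.
EVENTS_URL = "https://api.example.com/v1/events"


def build_curl_command(job_id, state, url=EVENTS_URL):
    """Return the curl argument list for one job state change."""
    payload = json.dumps({"job_id": job_id, "state": state})
    return [
        "curl", "--silent", "--max-time", "5",
        "--request", "POST",
        "--header", "Content-Type: application/json",
        "--data", payload,
        url,
    ]


def fire_and_forget(job_id, state):
    """Start curl and return immediately without waiting for it to finish."""
    # Popen does not block on the child process, which is the Python
    # analogue of appending `&` to the command in a shell hook.
    return subprocess.Popen(
        build_curl_command(job_id, state),
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
```

The --max-time cap keeps a slow or unreachable API from leaving stray processes behind, so the scheduler is never held up by reporting.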
Authentication and Authorization
Clusterra does not manage SSH keys or static Linux users.
Instead:
- Users authenticate using customer-managed OIDC (Okta / Entra ID)
- Requests are translated into short-lived Slurm JWTs
- slurmrestd validates these tokens and forwards requests to slurmctld
This model avoids:
- distributing SSH keys
- maintaining long-lived service credentials
- manual user provisioning on cluster nodes
Access is enforced at the Slurm API layer, not via shell access to the head node.
Events and State Visibility
Clusterra treats cluster state as event-driven.
The platform emits structured events for:
- Job lifecycle (submitted, pending, running, completed, failed)
- Node lifecycle (provisioning, active, drained, terminated)
- User activity
These events enable integrations such as:
- Slack notifications
- CI/CD hooks
- Automated workflows around job completion or failure
- Cost and usage alerts
Events are metadata-only; job inputs and outputs remain in the customer account.
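A metadata-only job event might be modeled as below. This is a sketch, not Clusterra's event schema: the field names are hypothetical, and only the five job states are taken from the list above. The record deliberately has no field for job inputs or outputs.

```python
import json
from dataclasses import asdict, dataclass

# Job lifecycle states as listed in this post.
JOB_STATES = ("submitted", "pending", "running", "completed", "failed")


@dataclass(frozen=True)
class JobEvent:
    """Hypothetical job lifecycle event carrying metadata only."""
    cluster_id: str
    job_id: str
    state: str
    user: str
    timestamp: str  # ISO 8601, UTC

    def __post_init__(self):
        # Reject states outside the documented lifecycle.
        if self.state not in JOB_STATES:
            raise ValueError(f"unknown job state: {self.state}")


def should_notify(event):
    """Example integration filter: notify only on terminal states."""
    return event.state in ("completed", "failed")


def to_json(event):
    """Serialize the event for delivery to an integration."""
    return json.dumps(asdict(event), sort_keys=True)
```

A Slack or CI/CD integration could apply a filter like should_notify to react to completions and failures while ignoring intermediate transitions.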
Why This Architecture Matters
This separation is intentional and conservative.
It enables:
- Clear security boundaries: No SSH access, no root privileges, no broad cross-account trust.
- Data sovereignty: Compute and data never leave the customer’s AWS account.
- Operational safety: Centralized visibility without centralizing execution.
- Easier security reviews: Narrow interfaces are easier to reason about than full cluster access.
Summary
Clusterra centralizes control and visibility, not compute.
- Your AWS account runs Slurm, jobs, and storage
- Clusterra runs identity, APIs, events, and coordination
- The two connect privately through a scoped bridge to slurmrestd
This model allows teams to operate Slurm clusters safely at scale without changing how jobs are run or who owns the infrastructure.
Footnote: Clusterra is built by the former Product Manager for AWS Batch and AWS Parallel Computing Service, informed by operating large-scale HPC systems in production.