What Clusterra Deploys in Your AWS Account: Components, Costs, and Permissions
A detailed breakdown of every AWS resource Clusterra deploys in your account, what permissions they require, and how much they cost. For AWS account owners who want to know exactly what they're running.
When you connect a ParallelCluster to Clusterra, we deploy a small set of AWS resources into your account. This post is a complete inventory — every resource, every permission, every cost.
No hand-waving. If you're the AWS account owner reviewing this for security approval, this is your reference.
TL;DR: The Complete Inventory
| Resource | Purpose | Monthly Cost Estimate |
|---|---|---|
| VPC Lattice Service & Target Group | Exposes slurmrestd to Clusterra | ~$18 + $0.025/GB |
| SQS Queue | Receives job/node events | ~$0.40/million requests (first 1M/month free) |
| SQS Dead Letter Queue | Captures failed messages | ~$0.40/million requests |
| Lambda Function | Ships events to Clusterra API | ~$0.20/million invocations |
| CloudWatch Event Rules (3) | Captures EC2/ASG events | Free |
| IAM Roles/Policies (3) | Permissions for above resources | Free |
Total estimated cost: $18-22/month for a typical cluster with moderate job volume.
Architecture Overview
(Architecture diagram pending update to reflect VPC Lattice integration)
Resource: VPC Lattice Service & Target Group
What it does: Exposes slurmrestd on your head node to the Clusterra control plane securely.
Why it's needed: Replaces complex NLB + PrivateLink setups. VPC Lattice provides application-layer networking that connects Clusterra to your private cluster without exposing it to the internet.
```hcl
resource "aws_vpclattice_service" "slurm_api" {
  name      = "clusterra-svc-${var.cluster_id}"
  auth_type = "NONE" # Auth handled by Slurm JWTs
}

resource "aws_vpclattice_target_group" "slurm_api" {
  name = "clusterra-tg-${var.cluster_id}"
  type = "INSTANCE"

  config {
    port           = 6830   # slurmrestd
    protocol       = "HTTP" # assumption: slurmrestd without TLS; use "HTTPS" if TLS is enabled
    vpc_identifier = var.vpc_id
  }
}
```
Cost breakdown:
- Service hours: ~$0.025/hour × 730 hours ≈ $18.25/month (rates vary by region)
- Data processing: ~$0.025/GB
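To sanity-check the ~$18 baseline, here is the same arithmetic as a small Python function. It is a sketch that assumes one Lattice service, 730 billable hours per month, and the rates quoted above; any per-request charges and regional price differences are left out.

```python
# Rough VPC Lattice monthly cost using the rates quoted above.
# Assumptions: 730 billable hours/month; per-request charges and
# regional price differences are ignored.
HOURS_PER_MONTH = 730
SERVICE_HOURLY_USD = 0.025
DATA_PROCESSING_PER_GB_USD = 0.025


def lattice_monthly_cost(gb_processed: float, services: int = 1) -> float:
    """Return the estimated monthly VPC Lattice cost in USD."""
    fixed = services * SERVICE_HOURLY_USD * HOURS_PER_MONTH  # hourly service charge
    variable = gb_processed * DATA_PROCESSING_PER_GB_USD     # per-GB processing charge
    return round(fixed + variable, 2)


if __name__ == "__main__":
    # Print a few traffic scenarios.
    for gb in (0, 10, 50):
        print(f"{gb:>3} GB -> ${lattice_monthly_cost(gb):.2f}/month")
```

The fixed hourly charge dominates: slurmrestd traffic has to reach tens of GB per month before data processing adds more than a dollar.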
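Security notes:
- No public IP or internet gateway required
- Requests are authenticated with Slurm JWTs; the service is created with `auth_type = "NONE"`, so no service-level IAM auth policy is applied
- Traffic stays on the AWS backbone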
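For reviewers who want to see what a JWT-authenticated call to slurmrestd looks like, the sketch below only builds the URL and headers; it sends nothing. slurmrestd's JWT authentication expects the user name and token in the `X-SLURM-USER-NAME` and `X-SLURM-USER-TOKEN` headers. The base URL, Slurm user, API version (`v0.0.39`), and token are placeholders, not values Clusterra uses.

```python
def build_slurmrestd_request(base_url: str, api_version: str,
                             user: str, token: str) -> tuple:
    """Return (url, headers) for GET /jobs authenticated with a Slurm JWT."""
    url = f"{base_url.rstrip('/')}/slurm/{api_version}/jobs"
    headers = {
        # slurmrestd's JWT authentication reads the user name and token
        # from these two headers.
        "X-SLURM-USER-NAME": user,
        "X-SLURM-USER-TOKEN": token,
        "Accept": "application/json",
    }
    return url, headers


if __name__ == "__main__":
    url, headers = build_slurmrestd_request(
        "https://slurm-api.example.internal",  # placeholder for the Lattice service DNS name
        "v0.0.39",                              # API version depends on your Slurm release
        "clusterra",                            # hypothetical Slurm user name
        "<jwt>",                                # placeholder token
    )
    print(url)
```

Because the token travels in a header on every request, whoever can read the JWT secret can call slurmrestd as that user, which is why the cross-account role below is scoped to that one secret.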
Resource: SQS Queue
What it does: Receives events from Slurm hooks and CloudWatch.
Why it's needed: Decouples event generation from event shipping. Hooks write to SQS (fast, async), Lambda processes later.
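```hcl
resource "aws_sqs_queue" "events" {
  name                       = "clusterra-events-${var.cluster_name}"
  visibility_timeout_seconds = 60    # must be >= the Lambda timeout (30s); AWS recommends 6x
  message_retention_seconds  = 86400 # 1 day
  receive_wait_time_seconds  = 20    # Long polling

  # Messages received more than 3 times without being deleted move to the DLQ
  redrive_policy = jsonencode({
    deadLetterTargetArn = aws_sqs_queue.events_dlq.arn
    maxReceiveCount     = 3
  })
}
```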
Cost breakdown:
- First 1M requests/month: free (shared across all queues in the account)
- After that: $0.40/million requests
- Typical cluster with 10K jobs/month: < $1/month
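The request math behind the "< $1/month" figure can be sketched as follows. The per-job event count (2: start plus completion or failure) and the three requests per message (SendMessage, ReceiveMessage, DeleteMessage, with no batching) are assumptions chosen for illustration. The free tier is shared by every queue in the account, and the Lambda poller's empty receives also count as requests, so treat the result as a rough estimate rather than a bill.

```python
# Rough SQS request count and cost for the event queue.
# Assumptions (hypothetical; tune for your cluster): 2 events per job and
# 3 requests per message (send, receive, delete) with no batching.
FREE_REQUESTS_PER_MONTH = 1_000_000
USD_PER_MILLION_REQUESTS = 0.40


def sqs_monthly_requests(jobs: int, events_per_job: int = 2,
                         node_events: int = 0,
                         requests_per_message: int = 3) -> int:
    """Estimate billable-request volume generated by cluster events."""
    messages = jobs * events_per_job + node_events
    return messages * requests_per_message


def sqs_monthly_cost(requests: int) -> float:
    """Apply the free tier, then the per-million rate."""
    billable = max(0, requests - FREE_REQUESTS_PER_MONTH)
    return billable / 1_000_000 * USD_PER_MILLION_REQUESTS


if __name__ == "__main__":
    for jobs in (10_000, 1_000_000):
        reqs = sqs_monthly_requests(jobs)
        print(f"{jobs:>9} jobs -> {reqs:>9} requests -> ${sqs_monthly_cost(reqs):.2f}")
```

At 10K jobs the estimate stays well inside the free tier; even a million jobs a month lands around $2.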
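Security notes:
- No public access
- Only the Lambda execution role (receive/delete), the ParallelCluster instance role (send), and EventBridge (send, on behalf of the event rules) can access the queue
- Messages contain event metadata only, never job content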
Resource: SQS Dead Letter Queue
What it does: Captures messages that fail processing after 3 attempts.
Why it's needed: Prevents event loss. Failed messages go here for debugging instead of being discarded.
```hcl
resource "aws_sqs_queue" "events_dlq" {
  name                      = "clusterra-events-${var.cluster_name}-dlq"
  message_retention_seconds = 604800 # 7 days
}
```
Cost: Same as main queue — typically negligible unless you have chronic failures.
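The "after 3 attempts" behavior comes from SQS redrive: each receive increments a message's receive count, and once another receive would push that count past the source queue's `maxReceiveCount`, SQS moves the message to the dead letter queue instead of delivering it again. The toy model below mimics only that counting rule (no visibility timeouts, no concurrency); `SimulatedQueue` is a hypothetical helper, not part of any AWS SDK.

```python
from collections import deque
from typing import Callable, List


class SimulatedQueue:
    """Toy model of SQS redrive counting; not the real service."""

    def __init__(self, max_receive_count: int = 3):
        self.max_receive_count = max_receive_count
        self.messages = deque()      # items are [body, receive_count]
        self.dlq: List[str] = []

    def send(self, body: str) -> None:
        self.messages.append([body, 0])

    def poll_once(self, handler: Callable[[str], bool]) -> None:
        """Receive one message; a False result from handler means processing failed."""
        if not self.messages:
            return
        msg = self.messages.popleft()
        if msg[1] >= self.max_receive_count:
            # This receive would exceed maxReceiveCount: redrive to the DLQ.
            self.dlq.append(msg[0])
            return
        msg[1] += 1
        if not handler(msg[0]):
            # Not deleted: it becomes visible again after the visibility timeout.
            self.messages.append(msg)
        # On success the consumer deletes the message, so it is simply dropped.
```

With a handler that always fails, the message is processed three times and then ends up in `dlq`, which is the behavior the 7-day DLQ retention gives you time to debug.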
Resource: Lambda Function
What it does: Reads from SQS, batches events, POSTs to Clusterra API.
Why it's needed: Serverless event shipper. No agent to install or maintain.
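```hcl
resource "aws_lambda_function" "event_shipper" {
  function_name = "clusterra-event-shipper-${var.cluster_name}"
  runtime       = "python3.11"
  handler       = "handler.handler"
  timeout       = 30
  memory_size   = 128

  # An execution role and a code source are required; these names are
  # illustrative placeholders, not the values used in clusterra-connect.
  role     = aws_iam_role.event_shipper.arn # hypothetical role resource
  filename = "event_shipper.zip"            # hypothetical deployment package

  environment {
    variables = {
      CLUSTER_ID        = var.cluster_id
      TENANT_ID         = var.tenant_id
      CLUSTERRA_API_URL = "https://api.clusterra.cloud"
    }
  }
}
```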
Cost breakdown:
- Requests: $0.20/million invocations
- Duration: $0.0000166667/GB-second
- Typical: 128 MB × 0.5 s × 10K invocations = 625 GB-seconds, well under $0.10/month
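The "Typical" line works out as follows. This sketch uses the rates quoted above and ignores the Lambda free tier, so it overstates the bill for a cluster this size.

```python
# Lambda cost arithmetic for the event shipper (free tier ignored).
USD_PER_MILLION_REQUESTS = 0.20
USD_PER_GB_SECOND = 0.0000166667


def lambda_monthly_cost(invocations: int, memory_mb: int = 128,
                        avg_duration_s: float = 0.5) -> float:
    """Request charge plus duration charge, in USD."""
    gb_seconds = invocations * (memory_mb / 1024) * avg_duration_s
    request_cost = invocations / 1_000_000 * USD_PER_MILLION_REQUESTS
    duration_cost = gb_seconds * USD_PER_GB_SECOND
    return request_cost + duration_cost


if __name__ == "__main__":
    # 10K x 0.125 GB x 0.5 s = 625 GB-s, plus 10K requests.
    print(f"${lambda_monthly_cost(10_000):.4f}/month")
```

Duration (about $0.0104) and requests ($0.002) together come to roughly a cent per month.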
Security notes:
- Execution role has minimal permissions (receive/delete on the event queues, plus CloudWatch Logs writes)
- Only makes outbound HTTPS calls to api.clusterra.cloud
- No VPC attachment — uses public internet for API calls
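A minimal sketch of what an SQS-triggered shipper along these lines could look like in Python 3.11, matching the `handler.handler` entry point and the environment variables above. The `/v1/events` route, the JSON body shape, and the absence of API authentication are assumptions for illustration, not Clusterra's actual handler. The `batchItemFailures` return value is only honored if the event source mapping enables `ReportBatchItemFailures`; without it, the function would have to raise an exception to get a batch retried.

```python
import json
import os
import urllib.request
from typing import Callable, Optional

# Hypothetical route; the real Clusterra API path is not documented here.
EVENTS_PATH = "/v1/events"


def post_events(events: list) -> None:
    """POST one batch of events to the Clusterra API over HTTPS."""
    payload = json.dumps({
        "cluster_id": os.environ["CLUSTER_ID"],
        "tenant_id": os.environ["TENANT_ID"],
        "events": events,
    }).encode("utf-8")
    req = urllib.request.Request(
        os.environ["CLUSTERRA_API_URL"].rstrip("/") + EVENTS_PATH,
        data=payload,
        method="POST",
        headers={"Content-Type": "application/json"},
    )
    # urlopen raises HTTPError on error responses, which the handler treats as failure.
    with urllib.request.urlopen(req, timeout=10) as resp:
        resp.read()


def handler(event: dict, context,
            send: Optional[Callable[[list], None]] = None) -> dict:
    """SQS-triggered entry point; `send` is injectable for testing."""
    send = send or post_events
    batch, failed_ids = [], []
    for record in event.get("Records", []):
        try:
            batch.append((record["messageId"], json.loads(record["body"])))
        except json.JSONDecodeError:
            # Malformed messages are retried and eventually land in the DLQ.
            failed_ids.append(record["messageId"])
    if batch:
        try:
            send([body for _, body in batch])
        except Exception:
            # Any delivery error: report the whole batch so SQS redelivers it.
            failed_ids.extend(message_id for message_id, _ in batch)
    # Partial-batch response (requires ReportBatchItemFailures on the mapping).
    return {"batchItemFailures": [{"itemIdentifier": i} for i in failed_ids]}
```

Reporting failed message IDs instead of raising lets SQS redeliver only those messages, and the redrive policy eventually parks persistent failures in the dead letter queue.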
Resource: CloudWatch Event Rules (3)
What it does: Captures EC2 and ASG events and forwards them to SQS.
Why it's needed: Tracks head node state changes and compute node lifecycle without polling.
```hcl
# Rule 1: EC2 instance state changes
resource "aws_cloudwatch_event_rule" "ec2_state" {
  name          = "clusterra-ec2-state-${var.cluster_name}"
  event_pattern = jsonencode({
    source      = ["aws.ec2"]
    detail-type = ["EC2 Instance State-change Notification"]
  })
}

# Rule 2: ASG launch/terminate
resource "aws_cloudwatch_event_rule" "asg_events" {
  name          = "clusterra-asg-${var.cluster_name}"
  event_pattern = jsonencode({
    source      = ["aws.autoscaling"]
    detail-type = ["EC2 Instance Launch Successful", "EC2 Instance Terminate Successful"]
  })
}

# Rule 3: Spot interruptions
resource "aws_cloudwatch_event_rule" "spot_interruption" {
  name          = "clusterra-spot-${var.cluster_name}"
  event_pattern = jsonencode({
    source      = ["aws.ec2"]
    detail-type = ["EC2 Spot Instance Interruption Warning"]
  })
}
```
Cost: Free. EventBridge does not charge for rules, and state-change events published by AWS services to the default event bus are free.
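To see which rule a given event would hit, the Python sketch below implements only the exact-match subset of EventBridge pattern matching that these three rules use: each key lists allowed values, and every key must match. Real EventBridge patterns support much more (prefix, numeric, and nested `detail` matching, among others). Rule names are shown without the `-${cluster_name}` suffix.

```python
# Exact-match subset of EventBridge pattern matching, for the three rules above.
RULES = {
    "clusterra-ec2-state": {
        "source": ["aws.ec2"],
        "detail-type": ["EC2 Instance State-change Notification"],
    },
    "clusterra-asg": {
        "source": ["aws.autoscaling"],
        "detail-type": ["EC2 Instance Launch Successful",
                        "EC2 Instance Terminate Successful"],
    },
    "clusterra-spot": {
        "source": ["aws.ec2"],
        "detail-type": ["EC2 Spot Instance Interruption Warning"],
    },
}


def matches(pattern: dict, event: dict) -> bool:
    """True if every pattern key is present in the event with an allowed value."""
    return all(event.get(key) in allowed for key, allowed in pattern.items())


def matching_rules(event: dict) -> list:
    """Names of the rules whose pattern matches the event."""
    return [name for name, pattern in RULES.items() if matches(pattern, event)]


if __name__ == "__main__":
    spot_warning = {
        "source": "aws.ec2",
        "detail-type": "EC2 Spot Instance Interruption Warning",
        "detail": {"instance-id": "i-0123456789abcdef0", "instance-action": "terminate"},
    }
    print(matching_rules(spot_warning))
```

One consequence worth noting for a security review: because the patterns filter only on `source` and `detail-type`, they match events for every EC2 instance and Auto Scaling group in the account and Region, not just cluster nodes.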
IAM Permissions: What Clusterra Can Access
(Diagram pending update)
IAM Role #1: Cross-Account Role (for Clusterra)
This role lets Clusterra's API read the JWT secret it uses to authenticate with your slurmrestd:
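```json
{
  "Effect": "Allow",
  "Action": [
    "secretsmanager:GetSecretValue"
  ],
  "Resource": "arn:aws:secretsmanager:*:*:secret:${jwt_secret_name}-??????"
}
```
The `-??????` suffix matches the six random characters Secrets Manager appends to every secret ARN; an ARN that ends at the bare secret name does not match the real secret.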
What it CAN do:
- Read the JWT secret to authenticate with slurmrestd
- Nothing else

What it CANNOT do:
- Access EC2, S3, EFS, FSx, or any other resource
- SSH to any instance
- Read job scripts or outputs
- Access your VPC networking
IAM Role #2: Lambda Execution Role
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "sqs:ReceiveMessage",
        "sqs:DeleteMessage",
        "sqs:GetQueueAttributes"
      ],
      "Resource": "arn:aws:sqs:*:*:clusterra-events-*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "logs:CreateLogGroup",
        "logs:CreateLogStream",
        "logs:PutLogEvents"
      ],
      "Resource": "arn:aws:logs:*:*:*"
    }
  ]
}
```
IAM Role #3: ParallelCluster Instance Role Addition
We add this policy to your existing ParallelCluster instance role:
```json
{
  "Effect": "Allow",
  "Action": "sqs:SendMessage",
  "Resource": "arn:aws:sqs:*:*:clusterra-events-*"
}
```
This is the only change to your existing ParallelCluster setup.
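If you want to spot-check these three grants during review, the toy evaluator below asks whether a given action on a given ARN would be allowed. It is a simplification: it assumes Allow-only statements and handles just the `*` and `?` wildcards (via `fnmatch`), and it ignores Deny statements, conditions, principals, and trust policies, so it is no substitute for the IAM policy simulator. The secret name `clusterra-jwt` and the example account ID are hypothetical.

```python
from fnmatch import fnmatchcase

# The three grants above, with a hypothetical secret name filled in.
# Note: fnmatch treats "[...]" as a character class (IAM does not);
# the ARNs here contain no brackets.
POLICIES = {
    "cross_account": [
        {"Effect": "Allow", "Action": ["secretsmanager:GetSecretValue"],
         "Resource": "arn:aws:secretsmanager:*:*:secret:clusterra-jwt-??????"},
    ],
    "lambda_execution": [
        {"Effect": "Allow",
         "Action": ["sqs:ReceiveMessage", "sqs:DeleteMessage", "sqs:GetQueueAttributes"],
         "Resource": "arn:aws:sqs:*:*:clusterra-events-*"},
        {"Effect": "Allow",
         "Action": ["logs:CreateLogGroup", "logs:CreateLogStream", "logs:PutLogEvents"],
         "Resource": "arn:aws:logs:*:*:*"},
    ],
    "instance_role_addition": [
        {"Effect": "Allow", "Action": "sqs:SendMessage",
         "Resource": "arn:aws:sqs:*:*:clusterra-events-*"},
    ],
}


def _as_list(value) -> list:
    return value if isinstance(value, list) else [value]


def is_allowed(statements: list, action: str, resource: str) -> bool:
    """True if any Allow statement covers the action (case-insensitive) and resource."""
    for stmt in statements:
        if stmt.get("Effect") != "Allow":
            continue
        action_ok = any(fnmatchcase(action.lower(), a.lower()) for a in _as_list(stmt["Action"]))
        resource_ok = any(fnmatchcase(resource, r) for r in _as_list(stmt["Resource"]))
        if action_ok and resource_ok:
            return True
    return False


if __name__ == "__main__":
    queue = "arn:aws:sqs:us-east-1:111122223333:clusterra-events-demo"
    for role, action in [("instance_role_addition", "sqs:SendMessage"),
                         ("instance_role_addition", "sqs:DeleteMessage"),
                         ("cross_account", "ec2:TerminateInstances")]:
        print(role, action, is_allowed(POLICIES[role], action, queue))
```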
What Events We Collect
(Diagram pending update)
Events are metadata only — never job content, scripts, or outputs.
| Event Type | Data Collected |
|---|---|
| job.started | job_id, user, partition, node, timestamp |
| job.completed | job_id, exit_code, timestamp |
| job.failed | job_id, exit_code, state, timestamp |
| node.launched | instance_id, ASG name, timestamp |
| node.terminated | instance_id, timestamp |
| node.spot_interrupted | instance_id, action, timestamp |
| cluster.state.started | instance_id (head node), timestamp |
| cluster.state.stopped | instance_id, timestamp |
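Since the table is effectively a schema, a reviewer could use it to audit messages sampled from the queue or the dead letter queue, flagging anything beyond the listed metadata. This is a sketch that assumes each message body is a JSON object with a `type` key plus the fields from the table; snake_case spellings such as `asg_name` for "ASG name" are assumptions, not Clusterra's documented wire format.

```python
import json

# Fields per event type, transcribed from the table above.
# Snake_case spellings such as "asg_name" are assumptions.
EVENT_FIELDS = {
    "job.started": {"job_id", "user", "partition", "node", "timestamp"},
    "job.completed": {"job_id", "exit_code", "timestamp"},
    "job.failed": {"job_id", "exit_code", "state", "timestamp"},
    "node.launched": {"instance_id", "asg_name", "timestamp"},
    "node.terminated": {"instance_id", "timestamp"},
    "node.spot_interrupted": {"instance_id", "action", "timestamp"},
    "cluster.state.started": {"instance_id", "timestamp"},
    "cluster.state.stopped": {"instance_id", "timestamp"},
}


def audit_event(event: dict) -> tuple:
    """Return (missing_fields, unexpected_fields) for one decoded event."""
    fields = dict(event)  # copy so the caller's dict is untouched
    expected = EVENT_FIELDS[fields.pop("type")]
    present = set(fields)
    return expected - present, present - expected


def audit_body(body: str) -> tuple:
    """Convenience wrapper for a raw SQS message body."""
    return audit_event(json.loads(body))
```

Any non-empty "unexpected" set, such as a field carrying script text, would be a message that goes beyond the metadata listed above.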
What We Do NOT Have Access To
To be explicit:
| Resource | Clusterra Access |
|---|---|
| Head node SSH | ❌ None |
| Compute node SSH | ❌ None |
| EFS/FSx filesystems | ❌ None |
| S3 buckets | ❌ None |
| Job scripts | ❌ None |
| Job outputs | ❌ None |
| VPC networking | ❌ None |
| EC2 instance control | ❌ None |
The only things Clusterra can do:
1. Call the slurmrestd API (via VPC Lattice + JWT)
2. Receive events you explicitly send to SQS
Deployment: One Terraform Apply
All resources are deployed via OpenTofu/Terraform:
```bash
# Clone and configure
git clone https://github.com/clusterra/clusterra-connect
cd clusterra-connect
cp terraform.tfvars.example terraform.tfvars
# Edit terraform.tfvars with your values

# Deploy
tofu init
tofu apply
```
Takes ~5 minutes. Creates all resources in your account.
Summary
| Category | Details |
|---|---|
| Resources created | 8 (VPC Lattice service and target group, 2 SQS queues, Lambda, 3 CloudWatch Event Rules), plus IAM roles and policies |
| Monthly cost | ~$18-22 |
| Permissions granted | SQS send/receive, Secrets Manager read (JWT only), CloudWatch Logs write (Lambda only) |
| Data sent to Clusterra | Event metadata only |
| Data NOT accessible | SSH, filesystems, job content, EC2 control |
Every resource is tagged with ManagedBy: OpenTOFU for easy identification.
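To list what is actually deployed, you can query the Resource Groups Tagging API for that tag. This sketch assumes boto3 is installed, your credentials allow `tag:GetResources`, and the tag value is exactly `OpenTOFU` (tag values are case-sensitive). It queries one Region at a time, and global resources such as IAM roles may not appear, so check those separately.

```python
from typing import Iterable, List


def extract_arns(pages: Iterable[dict]) -> List[str]:
    """Collect resource ARNs from get_resources response pages."""
    return [
        mapping["ResourceARN"]
        for page in pages
        for mapping in page.get("ResourceTagMappingList", [])
    ]


def list_managed_resources(region: str, tag_value: str = "OpenTOFU") -> List[str]:
    """Return ARNs of resources tagged ManagedBy=<tag_value> in one Region."""
    import boto3  # imported lazily so extract_arns works without boto3 installed

    client = boto3.client("resourcegroupstaggingapi", region_name=region)
    paginator = client.get_paginator("get_resources")
    pages = paginator.paginate(
        TagFilters=[{"Key": "ManagedBy", "Values": [tag_value]}]
    )
    return extract_arns(pages)
```

For example, `list_managed_resources("us-east-1")` returns the tagged ARNs in that Region, which you can compare against the inventory table at the top of this post.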
Questions about specific permissions or resources? Email security@clusterra.cloud — we're happy to provide detailed scoping for your security review.