2026-01-23

SSM Terminal in the Browser: Why Clusterra Gives You a Real Head-Node Shell Without SSH Hell

Traditional HPC clusters force a choice nobody wants: either lock down the head node (no SSH = painful workflows) or open it up (SSH keys everywhere = security nightmare).

Most teams end up with the nightmare. Keys get copied to authorized_keys on the head node (and sometimes compute nodes). Onboarding means manual account creation and key distribution. Offboarding? Pray the keys get revoked before the contractor leaves. Debugging means hoping your VPN/SSH tunnel holds. And every new user adds another long-lived credential to rotate or lose.

Clusterra takes a different path: give users a real, interactive shell on the head node — but without exposing SSH, managing keys, or risking credential sprawl.

The mechanism: AWS Systems Manager (SSM) Session Manager, embedded directly in the Clusterra web console via xterm.js.

No bastion hosts. No public head-node ports. No ~/.ssh/authorized_keys files. Just click "Terminal" and get a native bash session as your mapped Slurm user.

How It Actually Works (Step by Step)

1. Login via your IdP

You authenticate through Okta, Entra ID, or Google (OIDC federation). Clusterra receives a Cognito JWT with your email/sub.
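
To make "email/sub" concrete, here is a minimal sketch of reading those two claims out of a JWT payload. The function names are hypothetical, and the sketch deliberately skips signature verification, which a real backend must perform against the issuer's published keys before trusting any claim.

```python
import base64
import json


def read_claims(jwt_token: str) -> dict:
    """Return the payload claims of a JWT WITHOUT verifying its signature.

    Illustration only: a real service verifies the signature, issuer,
    audience, and expiry before using any of these values.
    """
    _header_b64, payload_b64, _signature_b64 = jwt_token.split(".")
    # JWT segments are base64url without padding; restore the padding first.
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


def identity_from_claims(claims: dict) -> tuple[str, str]:
    """Pick out the two fields used for user mapping: (sub, email)."""
    return claims["sub"], claims["email"]
```

The `sub` claim is the stable identifier; the email is convenient for display and for deriving a username, but it can change.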

2. Scoped IAM credentials on the fly

Clusterra uses AssumeRoleWithWebIdentity to get temporary AWS credentials tied to your identity. The role is narrowly scoped: only SSM StartSession on your cluster's head node instance ID, plus TerminateSession for your own sessions.
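
A rough sketch of that exchange, assuming a boto3 STS client is passed in. The function name, the session-name format, and the 15-minute duration are illustrative choices, not Clusterra's actual settings.

```python
def scoped_credentials(sts_client, role_arn: str, id_token: str, user_sub: str) -> dict:
    """Trade an OIDC token for temporary AWS credentials.

    `sts_client` is expected to behave like boto3.client("sts").
    """
    response = sts_client.assume_role_with_web_identity(
        RoleArn=role_arn,
        # Session names are limited to 64 characters; embedding the user's
        # sub makes the caller identifiable in CloudTrail. (Assumed format.)
        RoleSessionName=f"clusterra-{user_sub}"[:64],
        WebIdentityToken=id_token,
        DurationSeconds=900,  # 15 minutes, the shortest STS allows
    )
    creds = response["Credentials"]
    # Keys are named so the dict can be splatted into boto3.client(...).
    return {
        "aws_access_key_id": creds["AccessKeyId"],
        "aws_secret_access_key": creds["SecretAccessKey"],
        "aws_session_token": creds["SessionToken"],
    }
```

Usage would look something like `ssm = boto3.client("ssm", **scoped_credentials(sts, role_arn, token, sub))`, so every later SSM call runs with only the permissions of the scoped role.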

Example IAM policy statement for StartSession (provisioned during one-click deploy):

{
  "Effect": "Allow",
  "Action": ["ssm:StartSession"],
  "Resource": [
    "arn:aws:ec2:*:*:instance/${head-node-instance-id}",
    "arn:aws:ssm:*:*:document/AWS-StartInteractiveCommand"
  ],
  "Condition": { "BoolIfExists": { "ssm:SessionDocumentAccessCheck": "true" } }
}
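
For illustration, the statement above could be rendered per cluster with a small helper like the one below. This is a sketch, not the deploy tooling itself: the function name is hypothetical, it pins the region and account rather than using wildcards, and it uses the BoolIfExists operator that AWS's Session Manager policy examples use for this condition key.

```python
import json


def start_session_statement(region: str, account_id: str, instance_id: str,
                            document: str = "AWS-StartInteractiveCommand") -> dict:
    """Build a StartSession statement scoped to one head-node instance."""
    return {
        "Effect": "Allow",
        "Action": ["ssm:StartSession"],
        "Resource": [
            f"arn:aws:ec2:{region}:{account_id}:instance/{instance_id}",
            # The account field stays a wildcard so the ARN matches the
            # AWS-owned document regardless of how its owner is encoded.
            f"arn:aws:ssm:{region}:*:document/{document}",
        ],
        "Condition": {
            "BoolIfExists": {"ssm:SessionDocumentAccessCheck": "true"}
        },
    }


def policy_document(statement: dict) -> str:
    """Wrap a single statement in a complete IAM policy document."""
    return json.dumps({"Version": "2012-10-17", "Statement": [statement]}, indent=2)
```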

3. SSM starts the session

Clusterra calls the SSM StartSession API → AWS returns a session ID, a WebSocket stream URL, and a token for that stream. The console connects to the stream and renders a full terminal in the browser with xterm.js.
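
A minimal sketch of that call, assuming a boto3 SSM client created from the scoped credentials. The function name and the `bash -l` command are assumptions; the WebSocket handshake and message framing that follow are not shown.

```python
def open_shell_session(ssm_client, instance_id: str) -> dict:
    """Start an interactive session and return what the browser needs.

    `ssm_client` is expected to behave like boto3.client("ssm").
    """
    response = ssm_client.start_session(
        Target=instance_id,
        DocumentName="AWS-StartInteractiveCommand",
        # This document takes the command to run; a login shell is assumed.
        Parameters={"command": ["bash -l"]},
    )
    return {
        "session_id": response["SessionId"],
        "stream_url": response["StreamUrl"],  # wss:// endpoint for the terminal
        "token": response["TokenValue"],      # authenticates the WebSocket
    }
```

The stream URL and token are short-lived and tied to one session, so handing them to the browser does not expose the underlying AWS credentials.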

4. You're in as your Slurm user

Clusterra maps your OIDC email/sub to the corresponding Slurm user (via short-lived JWT or dynamic association). You land in bash with your home dir (/fsx/home/your.email/), POSIX permissions enforced natively on FSx/EFS.
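
One way such a mapping could work is sketched below. It assumes the username is simply the lowercased local part of the email, which matches the /fsx/home/your.email/ example but may not be how Clusterra actually derives it; both function names are hypothetical.

```python
import re

HOME_ROOT = "/fsx/home"

# Conservative pattern: starts with a letter or digit, then letters,
# digits, dots, underscores, or hyphens.
_VALID_NAME = re.compile(r"[a-z0-9][a-z0-9._-]*")


def slurm_username(email: str) -> str:
    """Derive a username from the local part of an email (assumed rule)."""
    local_part = email.split("@", 1)[0].lower()
    # Reject rather than silently strip odd characters, so two different
    # emails can never collapse into the same account.
    if not _VALID_NAME.fullmatch(local_part):
        raise ValueError(f"cannot derive a safe username from {email!r}")
    return local_part


def home_directory(email: str) -> str:
    """Return the home directory path for the mapped user."""
    return f"{HOME_ROOT}/{slurm_username(email)}"
```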

From here you can:
- nano /fsx/scripts/train.sh or vim your job script
- sbatch /fsx/scripts/train.sh
- squeue -u $USER
- sinfo, scontrol show job 12345
- tail -f /fsx/jobs/12345/slurm-12345.out
- Even ssh to a compute node if needed (Slurm-internal)

5. Session ends cleanly

Idle timeout (default 20 min), explicit close, or browser tab close → session terminates. All activity is logged to CloudWatch Logs/S3 for audit.
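
Session Manager enforces its own idle timeout, but a console backend might also track activity so it can close sessions explicitly. The class below is a hypothetical sketch of that bookkeeping, assuming a boto3-style SSM client; the clock is injectable so the logic can be exercised without waiting.

```python
import time

IDLE_TIMEOUT_SECONDS = 20 * 60  # mirrors the 20-minute default


class SessionWatch:
    """Track activity on one session and terminate it when appropriate."""

    def __init__(self, ssm_client, session_id: str, clock=time.monotonic):
        self._ssm = ssm_client
        self._session_id = session_id
        self._clock = clock
        self._last_activity = clock()
        self.closed = False

    def touch(self) -> None:
        """Record activity, e.g. a keystroke arriving from the browser."""
        self._last_activity = self._clock()

    def close(self) -> None:
        """Terminate the session once; later calls are no-ops."""
        if not self.closed:
            self._ssm.terminate_session(SessionId=self._session_id)
            self.closed = True

    def close_if_idle(self) -> bool:
        """Terminate if the idle limit has passed; return the closed state."""
        if self._clock() - self._last_activity >= IDLE_TIMEOUT_SECONDS:
            self.close()
        return self.closed
```

An explicit close or a dropped browser connection would call `close()` directly, and a periodic task would call `close_if_idle()`.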

Why This Beats Traditional SSH (and Other Workarounds)

No credential sprawl — Zero SSH keys to generate, distribute, or rotate. Revoke access? Just remove from Okta group → instant.
Zero exposed ports — Head node has no public SSH (port 22 closed). SSM uses HTTPS outbound from the instance to AWS.
Scoped & auditable — Each user gets only their cluster. Sessions are logged (who, when, commands). Compliance teams love it.
No VPN hassle — Works from anywhere (office, home, coffee shop) without corporate VPN.
Interactive debugging without friction — Test srun --pty for single-node debug, check I/O with df -h, verify data paths — all before submitting.
Seamless with API/CLI — Edit script in terminal → submit via console form (just paste path) or CLI (clusterra job submit --script-path /fsx/scripts/train.sh).

Real Workflow Wins for AI/ML/Research Teams

Imagine a Bengaluru fine-tuning team:

Priya needs to tweak a Llama script for better mixed-precision training. She opens the terminal, edits /fsx/scripts/finetune.sh, runs torchrun --nproc_per_node=1 test_snippet.py interactively to validate, then submits the full run with sbatch.
Raj debugs an OOM: squeue shows job 56789 running, scontrol show job 56789 gives the node (gpu-dy-g5-1), then he runs ssh gpu-dy-g5-1 (internal) to watch nvidia-smi live.
New intern joins: Added to Okta group → terminal access in <5 min, no admin touching the cluster.

No more "can you add my key?" tickets. No more "I lost my key, can you regenerate?" No more shared accounts blurring who did what.

How It Ties into the API Simplification

This isn't a side feature—it's core to making the API feel native.

Simpler job submissions: POST /v1/clusters/{id}/jobs now prefers "script_path": "/fsx/jobs/train.sh" over inline "script" bodies. Less escaping, better version control (git clone in terminal).
Console as first-class citizen: The terminal is embedded in the same UI that shows partitions, job list, quotas. Click a job → "Open Terminal" → jump to its working dir.
No file browser needed (yet): Users create/edit files directly in vim/nano. V2 can add a proper file explorer if demand grows.
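
As a sketch of the script_path style of submission mentioned above, the request could be built like this. Only the path /v1/clusters/{id}/jobs and the "script_path" field come from this post; the base URL, the bearer-token header, and the function name are placeholders.

```python
import json
import urllib.request

# Hypothetical base URL; substitute the real API host.
API_BASE = "https://api.example.invalid"


def build_job_request(cluster_id: str, script_path: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a job submission that references a script on disk."""
    body = json.dumps({"script_path": script_path}).encode("utf-8")
    return urllib.request.Request(
        url=f"{API_BASE}/v1/clusters/{cluster_id}/jobs",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Bearer auth is an assumption about how the API authenticates.
            "Authorization": f"Bearer {token}",
        },
    )
```

Because the body carries only a path, there is no shell script to escape inside JSON. The script itself lives on the shared filesystem, where it can be edited in the terminal and tracked in git.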

Try It in Minutes

Head to https://console.clusterra.cloud, sign in to the free demo cluster (no credit card, no AWS account needed), and click "Terminal."

You'll land in a live bash session on a shared demo head node. Try:
- sinfo to see partitions
- nano hello.sh → write a quick sbatch script
- sbatch hello.sh
- squeue

See how fast it feels compared to SSH setup.

Built by the former Product Manager for AWS Batch and AWS Parallel Computing Service.

If you're tired of SSH key roulette or clunky HPC web UIs, this changes the game. Questions on setup, scoping, or how it fits your team? Drop a note—we're iterating based on early user feedback.