2026-01-19

Clusterra’s Security & Access Model: No SSH, No Static Users

The problem with SSH-centric clusters

Slurm clusters were built at a time when a small group of trusted users shared infrastructure. Access was granted by logging into the head node, identities were represented as Linux users, and most operational control flowed through the shell.

This approach works when clusters are small and teams are static. It breaks down as soon as clusters are shared across growing teams or driven by automation. SSH access spreads, user accounts accumulate, and permissions become something you infer rather than something you can reason about.

Clusterra starts from a different premise: users should not need access to machines in order to use a scheduler.

Moving the access boundary to the scheduler

Slurm already defines a natural security boundary. It understands users, jobs, partitions, priorities, and quotas. None of these concepts require shell access to a head node.

Clusterra makes this boundary explicit by treating Slurm’s REST interface, slurmrestd, as the primary interaction surface. Job submission, inspection, and cancellation happen through the scheduler API. The operating system becomes an implementation detail rather than the gateway through which every action must pass.
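
To make the idea of the scheduler API as the interaction surface concrete, here is a minimal sketch of a job submission expressed as an HTTP request to slurmrestd. The host name, API version string, and job fields are placeholders chosen for illustration, and the exact payload shape differs between slurmrestd versions, so treat this as an assumption to check against your own deployment rather than a description of Clusterra's client.

```python
import json
import urllib.request

# Assumptions: the base URL, API version, and job fields are placeholders.
# Check them against the slurmrestd version you actually run.
SLURMRESTD_URL = "http://slurmrestd.example.internal:6820"  # hypothetical host
API_VERSION = "v0.0.40"  # varies by Slurm release


def build_submit_request(user: str, token: str, script: str,
                         partition: str, name: str) -> urllib.request.Request:
    """Build (but do not send) a job submission request for slurmrestd."""
    payload = {
        "job": {
            "name": name,
            "partition": partition,
            "current_working_directory": "/tmp",  # placeholder working directory
            "environment": ["PATH=/usr/bin:/bin"],
            "script": script,
        }
    }
    return urllib.request.Request(
        url=f"{SLURMRESTD_URL}/slurm/{API_VERSION}/job/submit",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            # slurmrestd reads the caller's identity and token from these headers.
            "X-SLURM-USER-NAME": user,
            "X-SLURM-USER-TOKEN": token,
        },
    )
```

Passing the returned object to urllib.request.urlopen would send it. Nothing in the request involves a shell session on the head node: the user's identity travels as a token in a header.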

Identity without static Linux users

Traditional Slurm setups encode identity as Linux users. Over time, this leads to manual account creation, UID drift, shared logins, and cleanup procedures that lag reality.

Clusterra externalizes identity. Users authenticate through customer-managed identity providers such as Okta or Entra ID.

When a user interacts with the cluster, Clusterra's Unified Token Provisioning system maps their identity to a short-lived Slurm token. Any necessary OS-level user mappings are provisioned just-in-time through automation (AWS Systems Manager, SSM), so administrators no longer need to manage /etc/passwd by hand or sync LDAP directories. Offboarding is immediate: revoke the user's SSO access and they can no longer request tokens, and any token already issued expires shortly afterward.
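
The post does not describe how Unified Token Provisioning is implemented, so the following is only a sketch of what a short-lived Slurm token can look like. It assumes Slurm's JWT authentication with an HS256 shared key and the "sun" (Slurm user name) claim. The function names, the 300-second default lifetime, and the local verification helper are hypothetical.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url(data: bytes) -> str:
    """Base64url-encode without padding, as used in JWTs."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _sign(signing_input: str, key: bytes) -> str:
    digest = hmac.new(key, signing_input.encode("ascii"), hashlib.sha256).digest()
    return _b64url(digest)


def mint_slurm_token(slurm_user: str, key: bytes, lifetime_s: int = 300,
                     now: Optional[int] = None) -> str:
    """Mint an HS256 JWT that names a Slurm user and expires after lifetime_s."""
    issued = int(time.time()) if now is None else now
    header = {"alg": "HS256", "typ": "JWT"}
    claims = {"sun": slurm_user, "iat": issued, "exp": issued + lifetime_s}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    return f"{signing_input}.{_sign(signing_input, key)}"


def read_claims(token: str, key: bytes, now: Optional[int] = None) -> dict:
    """Check signature and expiry, then return the claims (illustrative only)."""
    signing_input, _, signature = token.rpartition(".")
    if not hmac.compare_digest(signature, _sign(signing_input, key)):
        raise ValueError("bad signature")
    payload = signing_input.split(".")[1]
    padded = payload + "=" * (-len(payload) % 4)  # restore stripped padding
    claims = json.loads(base64.urlsafe_b64decode(padded))
    current = int(time.time()) if now is None else now
    if current >= claims["exp"]:
        raise ValueError("token expired")
    return claims
```

Because each token carries its own expiry, revoking a user's SSO access stops new tokens from being issued, and anything already in circulation stops working once its lifetime runs out.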

What “no SSH” actually means

“No SSH” does not mean banning interactive access entirely. It means that SSH is no longer part of the normal user workflow.

Researchers and engineers do not need shell access to submit jobs, inspect queue state, or manage their work. Administrators may still retain SSH access for maintenance, upgrades, or break-glass debugging, but this access is no longer how day-to-day usage is mediated.

Removing SSH from the steady state sharply reduces credential sprawl and operational risk.

Authorization before jobs are queued

Once access is API-driven, authorization becomes explicit rather than implicit.

Clusterra evaluates whether an action is permitted before it reaches Slurm. If a user is not allowed to submit to a given partition or has exceeded their quota, the request fails immediately and the job is never queued. Permissions are enforced at the scheduler boundary, not inferred from shell access or shared accounts.
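
A pre-queue check of this kind can be sketched in a few lines. The Policy and Decision types, the GPU-hour quota unit, and the rule that a request is rejected when it would push usage past the quota are all assumptions made for illustration; the post does not specify Clusterra's actual policy model.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class Policy:
    """Hypothetical per-user policy: permitted partitions and a GPU-hour quota."""
    allowed_partitions: FrozenSet[str]
    gpu_hour_quota: float


@dataclass(frozen=True)
class Decision:
    allowed: bool
    reason: str


def authorize_submit(policy: Policy, partition: str,
                     requested_gpu_hours: float,
                     used_gpu_hours: float) -> Decision:
    """Decide whether a submission may be forwarded to Slurm at all."""
    if partition not in policy.allowed_partitions:
        return Decision(False, f"partition '{partition}' is not permitted")
    if used_gpu_hours + requested_gpu_hours > policy.gpu_hour_quota:
        return Decision(False, "GPU-hour quota would be exceeded")
    return Decision(True, "ok")
```

Only a request with allowed set to True would be passed on to the scheduler. A denied request returns its reason to the caller right away, and Slurm never sees it.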

Execution remains local and unchanged

A core constraint in Clusterra’s design is that execution does not move.

Jobs still run inside the customer’s AWS account. Slurm still schedules and launches workloads. File systems and data never leave the cluster. Clusterra decides who may ask Slurm to perform an action, but it does not participate in how that action is executed.

If Clusterra is unavailable, running jobs continue unaffected and queued jobs remain queued.

Why this model scales

As teams grow, access models based on machine logins become increasingly fragile. Each new user increases operational overhead and risk.

By enforcing access at the scheduler boundary, Clusterra keeps identity centralized, authorization explicit, and trust narrowly scoped. “No SSH” and “no static users” are not constraints; they are what make shared Slurm clusters manageable at scale.

Summary

Clusterra secures Slurm by removing humans from machines and moving access control to the scheduler. By combining API-level interaction, externalized identity, and explicit authorization, it enables teams to share Slurm clusters without relying on SSH or static user management, while leaving execution semantics unchanged.