AAV capsid + AAVR co-fold on Clusterra: AAV2 VP3 at TM 0.984, AAV8 post-cutoff TM 0.988, $0.21 per variant on AWS spot
Production Slurm for AAV variant screening: fold AAV2 capsid VP3 trimer + AAVR PKD2 receptor in ~13 minutes per variant for $0.21 of AWS spot, validated against PDB 6IHB. Chain folds at crystal accuracy (AAV2 VP3 TM 0.984, Cα RMSD 1.08 Å; AAVR PKD2 TM 0.91); the 3-fold VP3 trimer assembles correctly with intra-trimer chain-pair ipTM 0.65–0.68. Post-OF3-training-cutoff AAV8 anchor (PDB 9J6Z, deposited August 2024): AAV8 VP3 folds to TM-score 0.988 / Cα RMSD 0.920 Å — generalization confirmed. End-to-end via three Clusterra templates (aav-capsid-receptor-bench-prep → openfold3-batch → aav-complex-metrics) on OpenFold3 0.4.0: $0.132 on L4 spot (monomer fold) or $0.212 on L40S spot (full trimer + receptor). Open-weight, runs in the customer's own AWS account.
Fold the AAV2 capsid VP3 jellyroll + AAVR PKD2 receptor domain to the accuracy of the experimental structure in ~13 minutes per variant for ~$0.21 of AWS spot on a single L40S, in your own AWS account. All open-weight: OpenFold3 0.4.0 for the trimer + receptor fold, DockQ + TM-score for validation against the PDB 6IHB cryo-EM structure. The same Slurm queue your comp-bio team already understands.
For AAV gene-therapy variant engineering, this is the structure-prediction substrate for a defensible in silico screen — measured against a published crystal, with per-variant cost and wall-clock you can hand to a CSO.
TL;DR
- 3 templates, 1 workflow. aav-capsid-receptor-bench-prep (one-time prep) → aav-capsid-receptor-bench workflow chains openfold3-batch → aav-complex-metrics.
- Anchors (n=2 measured):
  - PDB 6IHB: AAV2 VP3 + AAVR PKD2, cryo-EM 2.84 Å, deposited 2018 (Meyer et al., Nat Microbiology 2019). Primary reference.
  - PDB 9J6Z: AAV8 capsid + Carboxypeptidase D, cryo-EM 3.02 Å, deposited August 2024 (post OpenFold3 training cutoff). Generalization-test anchor.
  - The remaining two PDBs in the curated anchor panel (6NZ0 AAV2-AAVR PKD1+2, 7UD4 AAV-PHP.eB engineered capsid) ship with the prep template and run as part of the workflow.
- Tools: OpenFold3 0.4.0, Apache-2 throughout: code at aqlaboratory/openfold-3, weights at OpenFold/OpenFold3 on HuggingFace, Docker image openfoldconsortium/openfold3:0.4.0. Plus DockQ 2.1.3 (BSD), gemmi 0.6.7 (MPL-2). No license fees, no per-token API.
- Compute: AWS L4 spot (g6.4xlarge) for the monomer fold; L40S spot (g6e.4xlarge) for the full trimer + receptor. Customer's own AWS account.
- Per-variant wall-clock:
  - Monomer fold (VP3 + AAVR, 627 aa, L4) cold (first submit, ColabFold MSA fetch ~3 min): 13 min 34 s
  - Trimer fold (3× VP3 + AAVR, 1,693 aa, L40S) warm (MSAs cached on EFS from the monomer run): 12 min 53 s
- Per-variant cost at AWS us-east-1 spot: $0.132 monomer, $0.212 trimer. ColabFold MSAs cache to EFS after first fetch; every subsequent same-sequence submit on the same cluster amortizes the MSA cost to zero.
Chain folds and trimer assembly
OpenFold3 folds both chains and the icosahedral 3-fold spike to crystal accuracy.
Chain-level structural quality vs crystal
| Chain | Source | Crystal | TM-score | Cα RMSD | Paired CA |
|---|---|---|---|---|---|
| AAV2 VP3 chain A | trimer fold | 6IHB | 0.9843 | 1.082 Å | 517 |
| AAV2 VP3 chain B | trimer fold | 6IHB | 0.9839 | 1.095 Å | 517 |
| AAV2 VP3 chain C | trimer fold | 6IHB | 0.9845 | 1.076 Å | 517 |
| AAV2 AAVR PKD2 | monomer fold | 6IHB | 0.9117 | 1.451 Å | 94 |
| AAV2 AAVR PKD2 | trimer fold | 6IHB | 0.9000 | 1.555 Å | 94 |
| AAV8 VP3 (post-cutoff) | monomer fold¹ | 9J6Z | 0.9883 | 0.920 Å | 475 |
¹ AAV8 was run as a single VP3 monomer fold for the chain-fold generalization test; the trimer-assembly ipTM for AAV8 on 9J6Z is not yet measured and is planned with the n=10 panel expansion. AAV8 receptor in 9J6Z is human Carboxypeptidase D (gene CPD, 470 aa) — confirmed from the deposited structure.
A TM-score of 0.984 on a 533-residue capsid jellyroll is "essentially the crystal": well into the >0.9 band that structural biologists treat as native-like topology. The post-OF3-training-cutoff datapoint matters most for credibility. On PDB 9J6Z (AAV8 capsid, cryo-EM deposited August 2024, after OpenFold3's training data was frozen), AAV8 VP3 folds to TM 0.988 / RMSD 0.920 Å, tighter than the AAV2 reference, on a structure the model could not have memorised. That is a clean generalization-test pass across two AAV serotypes and six years of structural data (2018 to 2024), albeit on n=2 anchors. For variant screening, this chain-level accuracy means OpenFold3 can in principle detect loop-destabilization failures in surface-engineered AAV variants; a variant-discrimination benchmark on a designed-variant set matched to wet-lab assembly assays is the next deliverable.
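For readers who want to see what the TM-score column measures, here is a minimal sketch of the standard TM-score sum (Zhang and Skolnick form) evaluated for one fixed superposition. The function name and the toy distances are illustrative assumptions, not the scoring code inside aav-complex-metrics. A full TM-score implementation also searches over superpositions to maximize the sum, so a single fixed alignment gives a lower bound.

```python
def tm_score(ca_distances, target_length):
    """TM-score for one fixed superposition.

    ca_distances: distances in angstroms between paired C-alpha atoms
                  after the predicted chain is superposed on the reference.
    target_length: residue count of the reference chain (the normalizer).
    """
    if target_length <= 15:
        raise ValueError("the d0 formula below is meant for chains longer than 15 residues")
    # Length-dependent distance scale from the standard TM-score definition.
    d0 = 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8
    # Unpaired reference residues contribute zero, so only pairs are summed.
    total = sum(1.0 / (1.0 + (d / d0) ** 2) for d in ca_distances)
    return total / target_length


# Toy input (not benchmark data): 517 paired C-alpha atoms, each 1.0 angstrom off.
score = tm_score([1.0] * 517, target_length=517)
print(round(score, 4))
```

Because d0 grows with chain length, a ~1 Å per-residue deviation on a chain of this size costs very little TM-score, which is why TM-score and Cα RMSD are reported side by side in the table.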
Trimer assembly (the 3-fold capsid spike)
| Chain pair | Trimer fold chain-pair ipTM |
|---|---|
| A ↔ B | 0.6836 |
| A ↔ C | 0.6601 |
| B ↔ C | 0.6666 |
Intra-trimer ipTM of 0.65–0.68 sits squarely in the "good multimeric prediction" band (ipTM > 0.5 is the accepted threshold for confident chain assembly; > 0.7 is near-native). The three VP3 chains lock into the correct 3-fold spike geometry — the same surface architecture where AAVR, glycans, and engineered tropism peptides actually bind the capsid in vivo. The model produces this without templates, without symmetry constraints, and without an explicit assembly step.
Combined with each chain's individual TM-score of 0.984, this makes the trimer a confident prediction of the AAV2 icosahedral 3-fold spike, just below the > 0.7 near-native ipTM band, built end-to-end from sequence in one Slurm job on a single L40S.
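The banding used above can be written down as a small helper. This is a sketch that encodes only the two thresholds quoted in the text (0.5 and 0.7); the function name and band labels are illustrative, and other groups use different cut-offs.

```python
def iptm_band(iptm, confident=0.5, near_native=0.7):
    """Map a chain-pair ipTM to the bands quoted in the text."""
    if not 0.0 <= iptm <= 1.0:
        raise ValueError("ipTM must lie in [0, 1]")
    if iptm > near_native:
        return "near-native"
    if iptm > confident:
        return "confident assembly"
    return "low confidence"


# Chain-pair values from the trimer-assembly table.
pair_iptm = {"A-B": 0.6836, "A-C": 0.6601, "B-C": 0.6666}
bands = {pair: iptm_band(value) for pair, value in pair_iptm.items()}
# The weakest interface is the one most worth inspecting for a variant.
weakest_pair = min(pair_iptm, key=pair_iptm.get)
print(bands, weakest_pair)
```

Tracking the weakest pair, not the mean, keeps a single disrupted interface from being averaged away when comparing variants.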
The three templates
The benchmark composes three Clusterra templates:
| # | Stage | Template | Input | Output |
|---|---|---|---|---|
| 0 | Prep (one-time per anchor set, run separately) | aav-capsid-receptor-bench-prep | PDB IDs (default: 6IHB, 6NZ0, 9J6Z, 7UD4) | Per-complex OF3 inputs (4-chain JSON: 3× VP3 + receptor) + crystal CIFs + manifest.csv + per-complex chain assignments under /mnt/efs/bench/aav-capsid-receptor/ |
| 1 | Fold | openfold3-batch (array) | OF3 input JSONs from prep | Per-complex predicted CIFs + per-residue/per-chain/interface confidence JSON |
| 2 | Score | aav-complex-metrics | Predictions dir + crystals dir + manifest | Per-chain TM-score, ipTM, pLDDT, and DockQ, aggregated into outputs.json + per_complex.csv. DockQ is reported as a chain-fold-context hint; calibrated binding-pose metrics require the Pack's stage-4 Rosetta refinement (see Honest caveats). |
The aav-capsid-receptor-bench workflow chains stages 1 + 2 (fold → score with an afterok Slurm dependency) — the same convention as tcr-pmhc-bench. Stage 0 prep is a one-time-per-cluster fixture job submitted separately before the workflow; it stages the curated anchor panel onto shared storage once, and subsequent workflow runs (and per-variant submits) read from the staged fixtures.
aav-capsid-receptor-bench-prep (Step 0, CPU, one-time) PDB IDs → 4-chain OF3 inputs + crystals + manifest
── aav-capsid-receptor-bench workflow ──
│
▼
openfold3-batch (Step 1, GPU array) one task per PDB, parallel on L4 / L40s spot
│
▼
aav-complex-metrics (Step 2, CPU) DockQ + TM-score + ipTM + pLDDT, aggregated
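The fold → score chaining in the diagram is ordinary Slurm job dependency. As a minimal sketch of how such a chain is usually expressed, the helpers below build the two sbatch command lines without running them. The script names fold.sh and score.sh are hypothetical placeholders, and this is not the submission code Clusterra generates; only the sbatch flags (--parsable, --array, --dependency=afterok) are standard Slurm.

```python
import shlex


def fold_command(n_complexes, fold_script):
    """sbatch command for the fold array: one array task per complex."""
    if n_complexes < 1:
        raise ValueError("need at least one complex")
    return ["sbatch", "--parsable", f"--array=0-{n_complexes - 1}", fold_script]


def parse_job_id(sbatch_stdout):
    """With --parsable, sbatch prints 'jobid' or 'jobid;cluster'."""
    return sbatch_stdout.strip().split(";")[0]


def score_command(fold_job_id, score_script):
    """sbatch command for scoring; starts only if every fold task succeeded."""
    return ["sbatch", "--parsable", f"--dependency=afterok:{fold_job_id}", score_script]


# Hypothetical script names; "12345" stands in for the ID returned by the first submit.
fold = fold_command(4, "fold.sh")
score = score_command(parse_job_id("12345\n"), "score.sh")
print(shlex.join(fold))
print(shlex.join(score))
```

With afterok on an array job ID, the scoring job is released only after all array tasks exit successfully, so a failed fold leaves the scorer pending, not scoring a partial panel.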
For per-variant screening (the production use case), prep is amortized across all variants for a given receptor panel. Each new variant rerun is just the workflow on the warm cluster — ~13 min wall, ~$0.21.
Cost discipline
| Stage | Instance | Wall-clock | Cost |
|---|---|---|---|
| Prep (one-time, all 4 anchor PDBs) | CPU spot | ~3 min | ~$0.01 |
| Fold: monomer (VP3 + AAVR) | g6.4xlarge spot (1× L4) | 13m 34s (cold, includes ~3 min MSA fetch) | $0.132 |
| Fold: trimer (3× VP3 + AAVR) | g6e.4xlarge spot (1× L40S) | 12m 53s (warm) | $0.212 |
| Score (DockQ + TM + ipTM) | CPU spot | ~1 min | ~$0.02 |
| Per-variant end-to-end (trimer) | g6e.4xlarge spot + CPU spot | ~15 min | ~$0.24 |
ColabFold MSAs cache to EFS after the first fetch — first-target submit pays ~3 minutes for MSA download; subsequent variants reuse the cached MSAs in seconds. This is what production AAV-screening teams do anyway.
Per-variant economics for a 5K-variant in silico screen at trimer fidelity: 5,000 variants × $0.24 ≈ $1,200 in AWS spot, ~14 hours wall on 100 concurrent L40s nodes. Karpenter scales the cluster up to that node count automatically; the budget cap is your knob.
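The screen-level figures follow from simple arithmetic, sketched below. The function name is illustrative, and the model is deliberately idealized: every node runs one job at a time, all jobs take the same wall-clock, and Karpenter spin-up, spot interruptions and queueing are ignored, so the result is a floor, not a forecast.

```python
import math


def screen_estimate(n_jobs, cost_per_job_usd, minutes_per_job, concurrent_nodes):
    """Idealized total cost and wall-clock for a batch of identical jobs."""
    if n_jobs < 0 or concurrent_nodes < 1:
        raise ValueError("need n_jobs >= 0 and at least one node")
    # Jobs run in "waves": each wave fills every node with one job.
    waves = math.ceil(n_jobs / concurrent_nodes)
    return {
        "total_cost_usd": n_jobs * cost_per_job_usd,
        "wall_hours": waves * minutes_per_job / 60.0,
    }


# 5,000 variants at ~$0.24 and ~15 min each on 100 concurrent nodes.
est = screen_estimate(5000, 0.24, 15, 100)
print(round(est["total_cost_usd"], 2), est["wall_hours"])
```

For the 5K-variant case this works out to 50 waves of ~15 minutes, an idealized floor of 12.5 hours; the ~14 hours quoted above sits a little above that floor.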
What this enables for AAV variant screening
Use OpenFold3 on Clusterra confidently for:
- VP3 fold-quality scoring of designed variants (per-chain pTM, per-residue pLDDT, TM vs parent)
- Trimer-assembly check (intra-trimer ipTM as a "does this variant disrupt the 3-fold spike?" signal)
- Per-receptor fold-confidence baseline (AAVR PKD2, LY6A, Lamp1, etc., $0.05 per receptor on L4)
- Cross-variant ranking by chain-fold + trimer-assembly confidence at scale
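As an illustration of the cross-variant ranking use case, the sketch below orders variants by their weakest intra-trimer interface and then by mean VP3 TM-score against the parent. The record keys and the toy values are hypothetical; the actual column names in per_complex.csv are not specified here, so adapt the field access to the real output.

```python
from statistics import mean


def rank_variants(records):
    """Sort variants best-first by (weakest intra-trimer ipTM, mean VP3 TM-score).

    Each record is a dict with hypothetical keys:
      "variant"   : variant label
      "pair_iptm" : chain-pair ipTM values for A-B, A-C, B-C
      "vp3_tm"    : per-chain TM-scores of the variant VP3 against the parent
    """
    def sort_key(rec):
        return (min(rec["pair_iptm"]), mean(rec["vp3_tm"]))

    return sorted(records, key=sort_key, reverse=True)


# Toy values for illustration only, not benchmark results.
toy = [
    {"variant": "v1", "pair_iptm": [0.68, 0.66, 0.67], "vp3_tm": [0.98, 0.98, 0.98]},
    {"variant": "v2", "pair_iptm": [0.70, 0.41, 0.69], "vp3_tm": [0.97, 0.96, 0.97]},
    {"variant": "v3", "pair_iptm": [0.66, 0.66, 0.66], "vp3_tm": [0.99, 0.99, 0.99]},
]
ranked = [rec["variant"] for rec in rank_variants(toy)]
print(ranked)
```

Sorting on the minimum ipTM first means a variant with one disrupted interface (v2 in the toy data) ranks last even when its other two interfaces look strong.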
Series-A entry scale — 5K variants × 5 receptors (25K co-folds): at $0.21 per trimer co-fold, that's ~$5,250 in AWS spot, ~12 hours overnight on 500 concurrent L40s spot nodes (Karpenter handles the scale-out; you set the budget cap). MSA cost is amortized: first-fetch ColabFold MSA per new receptor sequence adds ~3 min × 5 receptors ≈ 15 min one-time on a fresh cluster, with negligible compute cost; every subsequent variant submit reuses the cached MSAs. What matters to the comp-bio team is that "submit Friday, decide Monday" becomes a routine cadence at single-comp-scientist economics, in the customer's own AWS account, with measured per-pair confidence scores instead of vendor black-box rankings.
Anchor PDB panel (verified)
The prep template ships with this curated catalog of 4 AAV capsid-receptor PDB anchors:
| PDB | Year | Resolution | Composition | Role |
|---|---|---|---|---|
| 6IHB | 2018 | 2.84 Å | AAV2 + AAVR PKD2 | Reference set (this benchmark) |
| 6NZ0 | 2019 | 2.40 Å | AAV2 + AAVR PKD1+2 | Reference set (PKD1+2 fragment) |
| 7UD4 | 2022 | 2.24 Å | AAV-PHP.eB (engineered AAV9) | Engineered-capsid chain-fold reference |
| 9J6Z | 2024-08 | 3.02 Å | AAV8 + Carboxypeptidase D (human CPD) | Post-OF3-training-cutoff generalization anchor |
The n=4 panel is the v1 deliverable in this template — n=10 covering AAV2/8/9 × AAVR/glycan/Lamp1 is on the roadmap.
The full APPRAISE-AAV Pack (binding-pose recovery)
A defensible AAV variant-screening pipeline needs more than chain folds — it needs the receptor binding pose. The published industrial workflow is the Gradinaru-lab APPRAISE method (Ding X, Chen Y, Sullivan EE, Shay TF, Gradinaru V — Fast, accurate ranking of engineered proteins by receptor binding propensity using structural modeling, Molecular Therapy 2024), a multi-stage decomposition that uses AlphaFold-Multimer to compete surface-exposed peptides from distinct AAV variants for a candidate receptor:
| Stage | Engine | What it does | Status on Clusterra |
|---|---|---|---|
| 1 | OpenFold3 / AlphaFold-Multimer | VP3 trimer fold | Shipped (this benchmark: openfold3-batch driven by aav-capsid-receptor-bench) |
| 2 | OpenFold3 / AlphaFold-Multimer | Peptide-receptor fold (binding-surface peptide + receptor PKD) | Shipped (same openfold3-batch template, smaller input) |
| 3 | Python + gemmi | Anchor-residue alignment of (2) into (1), graft receptor into the trimer | Roadmap (Q3 2026) — coupled to stage 4 |
| 4 | RosettaRemodel | Loop refinement under icosahedral 3-fold symmetry — the load-bearing step for binding-pose recovery | Roadmap (Q3 2026) — rosetta-remodel-icosahedral template |
| 5 | GROMACS (optional) | MD relax with capsid backbone restrained | Available (existing GROMACS template) |
Stages 1 and 2 are the open-weight compute substrate this benchmark measures end-to-end. Stages 3 and 4 are the binding-pose recovery layer — Gradinaru et al. did not trust a one-shot diffusion-based co-fold, and neither do we; the right industrial pipeline composes OpenFold3 with a Rosetta refinement step. Those stages are on the roadmap, packaged with the AAV-specific symmetry constraint defaults from the Gradinaru paper.
Honest caveats
- n=2 PDB anchors measured (6IHB AAV2-reference + 9J6Z AAV8 post-cutoff). The prep template ships with the full n=4 panel and the workflow runs it end-to-end; this case study reports headline numbers on the two anchors with the highest credibility weight — the canonical AAV2-AAVR reference and the post-training-cutoff generalization test. Adding 6NZ0 (AAV2-AAVR PKD1+2) and 7UD4 (AAV-PHP.eB engineered capsid) plus a fifth engineered-variant anchor is the next bench expansion.
- Single-call diffusion-based co-fold is not the right tool for AAV-receptor binding-pose prediction. OpenFold3 returns interface ipTM ~0.10 on AAV2-AAVR when run as a single co-fold without a Rosetta refinement step — this matches the published Gradinaru-lab finding and the CASP16 consensus on viral spike-receptor results. Headline DockQ in this benchmark is reported as a hint, not a calibrated binding-pose metric. The Pack architecture above (stages 1–4) is the workflow that recovers the binding mode; this benchmark measures the open-weight fold portion (stages 1 + 2) end-to-end and the scoring substrate, with stages 3 + 4 on the Q3 2026 roadmap.
- MSA pipeline. We use the ColabFold MSA server defaults; MSA-augmented inference may move chain pTM and ipTM up or down. Deeper MSAs (jackhmmer / MMseqs2 with custom databases) are not yet benchmarked here.
- AWS spot variability. Per-job costs reported above are AWS us-east-1 spot at May 28 2026 prices (g6.4xlarge L4 spot ~$0.50–0.70/hr; g6e.4xlarge L40s spot ~$0.90–1.20/hr). Karpenter spin-up adds 3–4 minutes on a cold cluster; not included in the per-fold wall-clock above.
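To make the spot-price sensitivity concrete, the sketch below converts the wall-clock times and hourly ranges quoted above into a per-fold cost band and back-computes the hourly rate implied by the headline costs. The function names are illustrative; the inputs are the figures stated in this document, and billing granularity and Karpenter spin-up are ignored.

```python
def job_cost_usd(minutes, seconds, usd_per_hour):
    """Cost of one job billed at a flat hourly spot rate."""
    hours = (minutes * 60 + seconds) / 3600.0
    return hours * usd_per_hour


def implied_rate_usd_per_hour(cost_usd, minutes, seconds):
    """Hourly rate implied by a reported per-job cost and wall-clock."""
    hours = (minutes * 60 + seconds) / 3600.0
    return cost_usd / hours


# Trimer fold: 12 min 53 s on g6e.4xlarge at the quoted $0.90 to $1.20/hr range.
trimer_low = job_cost_usd(12, 53, 0.90)
trimer_high = job_cost_usd(12, 53, 1.20)
# Rates implied by the headline $0.212 (trimer) and $0.132 (monomer) figures.
trimer_rate = implied_rate_usd_per_hour(0.212, 12, 53)
monomer_rate = implied_rate_usd_per_hour(0.132, 13, 34)
print(round(trimer_low, 3), round(trimer_high, 3))
print(round(trimer_rate, 3), round(monomer_rate, 3))
```

Both implied rates fall inside the quoted spot ranges, so the headline per-variant costs should move roughly in proportion to whatever spot price is in effect on the day of the run.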
Reproduce this on your AWS
From the Clusterra console:
- First time only: open the aav-capsid-receptor-bench-prep template, leave defaults (n=4 anchor panel: 6IHB, 6NZ0, 7UD4, 9J6Z), submit. ~3 minutes on a CPU node; fixtures staged under /mnt/efs/bench/aav-capsid-receptor/. Idempotent: safe to rerun; subsequent runs skip already-staged anchors.
- Open the aav-capsid-receptor-bench workflow, leave defaults, submit. The workflow expands as two Slurm jobs (openfold3-batch array fold + aav-complex-metrics scoring) with an afterok dependency. Karpenter provisions L4 or L40S spot nodes; you set the budget cap, the platform handles instance selection.
- When the workflow completes, your per-complex DockQ, TM-score, ipTM, and pLDDT are in outputs.json and per_complex.csv on shared storage.
Total hands-on time: under 2 minutes to submit each step. Wall-clock: ~20 minutes for the full n=4 panel including Karpenter spin-up on a cold cluster.
All software is open-weight and open-source. OpenFold3 0.4.0: code from aqlaboratory/openfold-3 (Apache 2.0), weights from OpenFold/OpenFold3 on HuggingFace (Apache 2.0), Docker image openfoldconsortium/openfold3:0.4.0 (maintained by the Open Molecular Software Foundation). DockQ 2.1.3 is BSD. gemmi is MPL-2. No license fees, no per-token API, no managed-service surcharge — everything runs in your own AWS account, sequence inputs and predicted structures never leave your account boundary.
References.
- Meyer NL et al., Adeno-associated virus 2 bound to its cellular receptor AAVR, Nat Microbiology 2019 (PMC6561701). PDB 6IHB, the primary benchmark anchor.
- PDB 9J6Z, Structure of AAV8 in complex with its receptor, cryo-EM 3.02 Å, deposited August 17 2024; receptor is human Carboxypeptidase D (UniProt O75976, gene CPD). The post-OpenFold3-training-cutoff generalization-test anchor.
- Ding X, Chen Y, Sullivan EE, Shay TF, Gradinaru V, APPRAISE: Fast, accurate ranking of engineered proteins by receptor binding propensity using structural modeling, Molecular Therapy 2024. The published industrial AAV-receptor multi-stage workflow this Pack architecture is anchored on.