Boltz Predict
Runs a Boltz structure prediction job from a prepared Boltz YAML configuration and its auxiliary files. It exposes diffusion and sampling parameters and MSA handling controls, and computes ligand-binding affinity when the YAML's properties request it.

Usage
Use after assembling your system with Boltz Sequence/Constraint/Template/Property nodes and combining them via Boltz List Combiner and Boltz YAML Combiner. Connect the resulting boltz_yaml and boltz_files here, set your seed and optional parameters, then run to obtain predicted structures (as PDB or mmCIF), model confidence metrics, and affinity results if requested.
Inputs
| Field | Required | Type | Description | Example |
|---|---|---|---|---|
| boltz_yaml | True | BOLTZ_YAML | Complete Boltz YAML configuration describing sequences, optional constraints, templates, and properties. Must include at least one sequence. | { "version": 1, "sequences": [ { "protein": { "id": "A", "sequence": "..." } }, { "ligand": { "id": "L", "smiles": "..." } } ], "properties": [ { "affinity": { "binder": "L" } } ] } |
| boltz_files | True | BOLTZ_FILES | Auxiliary file map referenced by the YAML (e.g., MSA .a3m and template .pdb files). Keys are filenames referenced in YAML; values are file contents. | { "msa_1.a3m": "...", "template_1.pdb": "...PDB text..." } |
| seed | True | INT | Base random seed for reproducibility. | 42 |
| recycling_steps | False | INT | Number of recycling iterations during structure refinement. | 3 |
| sampling_steps | False | INT | Number of sampling steps for the diffusion process. | 200 |
| diffusion_samples | False | INT | How many independent diffusion samples to generate. | 1 |
| max_parallel_samples | False | INT | Maximum number of samples processed in parallel (resource-dependent). | 5 |
| step_scale | False | FLOAT | Diffusion step scale (temperature-like factor). | 1.638 |
| output_format | False | STRING | Structure output format: pdb or mmcif. | pdb |
| num_workers | False | INT | Number of worker processes (0 disables multiprocessing). | 0 |
| max_msa_seqs | False | INT | Maximum number of MSA sequences to use. | 8192 |
| subsample_msa | False | BOOLEAN | Whether to subsample MSA sequences. | false |
| num_subsampled_msa | False | INT | Number of MSA sequences after subsampling (if enabled). | 1024 |
| use_potentials | False | BOOLEAN | Enable inference-time potentials for improved quality (may be slower). | false |
| write_full_pae | False | BOOLEAN | Write the full Predicted Aligned Error (PAE) matrix to outputs. | false |
| write_full_pde | False | BOOLEAN | Write the full Predicted Distance Error (PDE) matrix to outputs. | false |
| affinity_mw_correction | False | BOOLEAN | Apply molecular weight correction to affinity estimates (used only if affinity is requested in properties). | false |
| sampling_steps_affinity | False | INT | Sampling steps for affinity estimation (used only if affinity is requested). | 200 |
| diffusion_samples_affinity | False | INT | Number of diffusion samples for affinity estimation (used only if affinity is requested). | 5 |
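
As a rough illustration of how the optional parameters fit together, the sketch below collects the example values from the table into a plain Python dictionary and runs a few sanity checks before a job is queued. The `check_predict_params` helper and the rules it enforces (non-negative integers, an allowed `output_format` set) are assumptions made for illustration and are not part of the node; the values are the table's examples, not confirmed defaults.

```python
# Hypothetical pre-flight check; the node itself may validate differently.
ALLOWED_FORMATS = {"pdb", "mmcif"}  # the two formats named in the Inputs table

INT_FIELDS = (
    "recycling_steps", "sampling_steps", "diffusion_samples",
    "max_parallel_samples", "num_workers", "max_msa_seqs",
)

def check_predict_params(params: dict) -> list:
    """Return a list of human-readable problems (empty if none were found)."""
    problems = []
    fmt = params.get("output_format", "pdb")
    if fmt not in ALLOWED_FORMATS:
        problems.append(f"output_format must be pdb or mmcif, got {fmt!r}")
    for name in INT_FIELDS:
        value = params.get(name)
        if value is None:
            continue  # optional field left unset
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int) or value < 0:
            problems.append(f"{name} should be a non-negative integer, got {value!r}")
    return problems

# Example values copied from the Inputs table above.
example_params = {
    "seed": 42,
    "recycling_steps": 3,
    "sampling_steps": 200,
    "diffusion_samples": 1,
    "max_parallel_samples": 5,
    "step_scale": 1.638,
    "output_format": "pdb",
    "num_workers": 0,
    "max_msa_seqs": 8192,
    "subsample_msa": False,
}

problems = check_predict_params(example_params)
```

Returning a list of problems instead of raising on the first one lets a caller report every issue in a single pass.
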
Outputs
| Field | Type | Description | Example |
|---|---|---|---|
| structures.pdb | PDB | Dictionary mapping structure filenames to PDB (or mmCIF) content for ranked predictions. | { "rank1.pdb": "...PDB text...", "rank2.pdb": "..." } |
| confidence.json | JSON | Dictionary of confidence-related outputs per sample (e.g., pLDDT/PAE data or filenames to JSON). | { "rank1_confidence.json": "{...}", "rank2_confidence.json": "{...}" } |
| affinity.json | JSON | Dictionary of affinity results when affinity is requested in properties; empty if not applicable. | { "rank1_affinity.json": "{...}" } |
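
Because the structures dictionary can carry either PDB or mmCIF text, a downstream consumer may want to check what it actually received. The sketch below is one possible heuristic: it assumes mmCIF content begins with a `data_` block header and that PDB content begins with a standard record name. The `guess_structure_format` helper and the sample strings are hypothetical and only illustrate the idea.

```python
# Common PDB record names that can appear on the first meaningful line.
PDB_RECORDS = {"HEADER", "TITLE", "COMPND", "REMARK", "CRYST1",
               "MODEL", "ATOM", "HETATM"}

def guess_structure_format(text: str) -> str:
    """Guess 'mmcif', 'pdb', or 'unknown' from the first meaningful line."""
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and mmCIF comment lines
        if stripped.startswith("data_"):
            return "mmcif"
        # PDB record names occupy the first six columns of a line.
        if stripped[:6].strip() in PDB_RECORDS:
            return "pdb"
        return "unknown"
    return "unknown"

# Placeholder contents standing in for real node output.
structures = {
    "rank1.pdb": "data_model\n_entry.id model\n",
    "rank2.pdb": "ATOM      1  N   MET A   1\n",
}
formats = {name: guess_structure_format(body) for name, body in structures.items()}
```

A heuristic like this is no substitute for a real parser; it only helps route content to the right one.
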
Important Notes
- Affinity requires a ligand: If properties include affinity, at least one ligand sequence must be present; otherwise the node raises an error.
- YAML and files must align: Any filenames referenced in the YAML (e.g., MSA or template entries) must exist in boltz_files with matching keys.
- Output format: Structures are emitted in the selected format (pdb or mmcif), but the output field is named structures.pdb either way; downstream consumers should parse the contents according to output_format, not the field name.
- Parallelism and resources: Increasing max_parallel_samples and num_workers increases resource usage and may exhaust memory/compute on limited systems.
- MSA controls: You can cap or subsample MSAs via max_msa_seqs, subsample_msa, and num_subsampled_msa to manage speed and memory.
- Reproducibility: The seed is applied, but multiple diffusion samples and parallelism can still introduce variability across runs.
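
The first two notes above can be checked before a run. The sketch below treats the YAML as an already-parsed Python dictionary shaped like the boltz_yaml example in the Inputs table. The `preflight` helper is hypothetical, and the key names it inspects for file references (`msa` inside a sequence entry, `cif` or `pdb` inside a template entry) are assumptions about the YAML layout, not something this page confirms.

```python
def preflight(boltz_yaml: dict, boltz_files: dict) -> list:
    """Return a list of problems found in the YAML/file pairing (empty if none)."""
    problems = []
    sequences = boltz_yaml.get("sequences") or []
    if not sequences:
        problems.append("Boltz YAML must contain at least one sequence")

    # Affinity needs at least one ligand entry among the sequences.
    properties = boltz_yaml.get("properties") or []
    wants_affinity = any("affinity" in prop for prop in properties)
    has_ligand = any("ligand" in entry for entry in sequences)
    if wants_affinity and not has_ligand:
        problems.append("Affinity prediction requires at least one ligand sequence")

    # Collect filenames referenced by the YAML (key names are assumed).
    referenced = []
    for entry in sequences:
        for body in entry.values():
            if isinstance(body, dict) and isinstance(body.get("msa"), str):
                referenced.append(body["msa"])
    for template in boltz_yaml.get("templates") or []:
        for key in ("cif", "pdb"):
            if isinstance(template.get(key), str):
                referenced.append(template[key])

    # Every referenced filename must be a key of boltz_files.
    for name in referenced:
        if name not in boltz_files:
            problems.append(f"Referenced file {name!r} is missing from boltz_files")
    return problems

config = {
    "version": 1,
    "sequences": [{"protein": {"id": "A", "sequence": "MKT", "msa": "msa_1.a3m"}}],
    "properties": [{"affinity": {"binder": "L"}}],
}
issues = preflight(config, {})  # reports the missing ligand and the missing MSA file
```

Running a check like this before queuing the job surfaces configuration mistakes without spending compute on a doomed prediction.
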
Troubleshooting
- Error: 'Boltz YAML must contain at least one sequence': Ensure you connected a valid boltz_yaml from Boltz YAML Combiner that includes the sequences list.
- Error: 'Affinity prediction requires at least one ligand sequence': Either remove the affinity entry from properties, or add a ligand sequence and set the affinity binder to that ligand's chain ID.
- Empty or missing outputs: Verify sampling_steps and diffusion_samples are reasonable; check that boltz_files include required MSA/template files referenced by YAML.
- High memory or long runtimes: Reduce max_parallel_samples, num_workers, sampling_steps, or MSA sizes; disable use_potentials if not needed.
- Invalid file references: If you see key errors for MSA/template files, confirm the YAML references filenames that exist as keys in boltz_files.
- Unexpected structure format: If you chose mmcif output but expected PDB, set output_format to "pdb" or adapt downstream consumers to handle mmCIF.