Boltz Partial Diffusion
Runs partial-diffusion structure refinement or generation from a Boltz YAML specification, anchored on one or more reference structures. Supports single-chain and multimer systems, can keep selected chains fixed, and can optionally save diffusion trajectories. Binding affinity is also predicted when an affinity entry is requested in the YAML properties.

Usage
Use this node after assembling a valid Boltz YAML and its auxiliary files (MSAs, templates) with the Boltz YAML Combiner. Provide one or more reference structures in PDB format to anchor partial diffusion. Optionally choose which chains stay fixed, the fraction of the diffusion process to run, the sampling settings, and whether to save trajectories. If the YAML properties request affinity, set the affinity options (affinity_mw_correction, sampling_steps_affinity, diffusion_samples_affinity) as needed.
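As a concrete starting point, the sketch below assembles the three required inputs plus a few optional settings as plain Python dicts, following the shapes given in the Inputs table. It assumes the node accepts dicts for BOLTZ_YAML, BOLTZ_FILES and PDB; the sequence, SMILES, and file contents are placeholders, and the Combiner-specific _sequence_name key is omitted. How the payload is submitted to the node depends on your workflow environment and is not shown.

```python
# Hypothetical input payload for the Boltz Partial Diffusion node.
# Sequences, SMILES and file contents are placeholders, not real data.

boltz_yaml = {
    "version": 1,
    "sequences": [
        # Protein chain A whose MSA file is supplied via boltz_files.
        {"protein": {"id": "A", "sequence": "MSEQNKLLG", "msa": "msa_1.a3m"}},
        # Ligand chain L; affinity prediction needs a ligand binder.
        {"ligand": {"id": "L", "smiles": "CCO"}},
    ],
    "properties": [{"affinity": {"binder": "L"}}],
}

# Every filename referenced in boltz_yaml must appear as a key here.
boltz_files = {"msa_1.a3m": ">query\nMSEQNKLLG\n"}

# At least one reference structure: name -> PDB text (placeholder here).
reference_structures = {"ref_model.pdb": "<PDB text of the reference model>"}

node_inputs = {
    "boltz_yaml": boltz_yaml,
    "boltz_files": boltz_files,
    "reference_structures": reference_structures,
    "seed": 42,
    "partial_diffusion_fraction": 0.25,  # stay fairly close to the reference
    "fixed_chains": "A",                  # keep the protein chain fixed
    "save_trajectory": True,
    "use_potentials": True,
}
```

Keeping only chain A fixed leaves the ligand free to move during partial diffusion; add more chain IDs to fixed_chains to constrain more of the system.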
Inputs
| Field | Required | Type | Description | Example |
|---|---|---|---|---|
| boltz_yaml | True | BOLTZ_YAML | Boltz YAML configuration describing sequences (and optionally constraints/templates/properties). | {'version': 1, 'sequences': [{'protein': {'id': 'A', 'sequence': 'MSEQN...', '_sequence_name': 'protA', 'msa': 'msa_1.a3m'}}], 'properties': [{'affinity': {'binder': 'L'}}]} |
| boltz_files | True | BOLTZ_FILES | Auxiliary files referenced by the YAML (e.g., MSA A3M files, template PDB/CIF files) as a mapping from filename to file content. | {'msa_1.a3m': '<A3M text>'} |
| reference_structures | True | PDB | Reference structure(s) in PDB format used to anchor the partial diffusion, as a mapping from name to PDB text. | {'ref_model.pdb': '<PDB text>'} |
| seed | True | INT | Base random seed. When multiple sequence entries are present, the seed is incremented per entry. | 42 |
| partial_diffusion_fraction | False | FLOAT | Fraction of the diffusion process to apply (0.0 = no diffusion, 1.0 = full diffusion). | 0.25 |
| save_trajectory | False | BOOLEAN | If true, saves the diffusion trajectory (intermediate structures) for analysis. | True |
| fixed_chains | False | STRING | Comma-separated chain IDs to keep fixed during partial diffusion. Each ID must match a chain ID defined in the YAML. | A,B |
| recycling_steps | False | INT | Number of recycling iterations during prediction/refinement. | 3 |
| sampling_steps | False | INT | Number of diffusion sampling steps. | 200 |
| diffusion_samples | False | INT | Number of independent diffusion samples to generate per input. | 1 |
| max_parallel_samples | False | INT | Maximum number of samples to run in parallel. | 5 |
| step_scale | False | FLOAT | Diffusion step scale (temperature-like parameter) controlling exploration; lower values increase diversity among samples. | 1.638 |
| output_format | False | ['pdb', 'mmcif'] | Output structure format. | pdb |
| num_workers | False | INT | Number of parallel workers (0 disables multiprocessing). | 0 |
| use_potentials | False | BOOLEAN | Use inference-time potentials to guide sampling (recommended for partial diffusion). | True |
| write_full_pae | False | BOOLEAN | If true, writes the full Predicted Aligned Error matrix to confidence output. | False |
| write_full_pde | False | BOOLEAN | If true, writes the full Predicted Distance Error matrix to confidence output. | False |
| max_msa_seqs | False | INT | Maximum number of MSA sequences to use. | 8192 |
| subsample_msa | False | BOOLEAN | Whether to subsample MSA sequences. | False |
| num_subsampled_msa | False | INT | Number of MSA sequences to keep when subsampling is enabled. | 1024 |
| affinity_mw_correction | False | BOOLEAN | Apply molecular weight correction to affinity predictions (effective when affinity property is requested). | False |
| sampling_steps_affinity | False | INT | Sampling steps dedicated to affinity prediction (when requested). | 200 |
| diffusion_samples_affinity | False | INT | Number of diffusion samples for affinity prediction (when requested). | 5 |
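The table does not specify how partial_diffusion_fraction interacts with sampling_steps. A common convention for partial diffusion (for example, diffuser.partial_T in RFdiffusion) is to noise the reference only part of the way along the schedule and then run just the remaining denoising steps. The hypothetical helper below assumes a simple linear mapping under that convention; the node's actual schedule may differ.

```python
def partial_diffusion_steps(sampling_steps: int, fraction: float) -> int:
    """Hypothetical mapping from partial_diffusion_fraction to denoising steps run.

    Assumes a linear schedule:
      fraction = 0.0 -> 0 steps (reference kept as-is)
      fraction = 1.0 -> all sampling_steps (full diffusion from noise)
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("partial_diffusion_fraction must be in [0.0, 1.0]")
    return round(sampling_steps * fraction)


# Example values from the Inputs table: 200 sampling steps, fraction 0.25.
print(partial_diffusion_steps(200, 0.25))  # 50
```

Under this assumption, the table's example values would run about 50 denoising steps starting from a lightly noised copy of the reference, which is why low fractions stay close to the input structure.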
Outputs
| Field | Type | Description | Example |
|---|---|---|---|
| structures.pdb | PDB | Predicted/refined structure files as a mapping from name to PDB text. May include multiple ranked structures per input. | {'sequence_0_ranked_0.pdb': '<PDB text>'} |
| confidence.json | JSON | Confidence metrics per output (e.g., pLDDT/PAE/PDE), keyed by name. | {'sequence_0_confidence.json': '{"plddt": [...], "pae": [...]}'} |
| trajectory.pdb | PDB | Saved diffusion trajectory PDB(s) when trajectory saving is enabled; otherwise may be empty. | {'sequence_0_trajectory.pdb': '<PDB text>'} |
| affinity.json | JSON | Affinity prediction outputs when affinity is requested in properties; empty otherwise. | {'sequence_0_affinity.json': '{"KD": 1.2, "units": "uM"}'} |
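Outputs are flat mappings from file name to content. The sketch below pairs each rank-0 structure with its confidence file and reports mean pLDDT. It assumes the key naming shown in the examples (sequence_<i>_ranked_<r>.pdb, sequence_<i>_confidence.json) and a top-level plddt list in the confidence JSON, as in the example row; check both against real node output before relying on them.

```python
import json
import re
from statistics import mean

# Assumed naming, taken from the Outputs examples.
RANKED_PDB = re.compile(r"^sequence_(\d+)_ranked_(\d+)\.pdb$")


def top_ranked_with_plddt(structures: dict, confidence: dict) -> dict:
    """Map sequence index -> (rank-0 PDB name, mean pLDDT or None)."""
    summary = {}
    for name in sorted(structures):
        match = RANKED_PDB.match(name)
        if not match or int(match.group(2)) != 0:
            continue  # skip lower-ranked models and unexpected names
        seq_idx = int(match.group(1))
        conf_text = confidence.get(f"sequence_{seq_idx}_confidence.json")
        plddt = json.loads(conf_text).get("plddt") if conf_text else None
        summary[seq_idx] = (name, mean(plddt) if plddt else None)
    return summary


structures = {
    "sequence_0_ranked_0.pdb": "<PDB text>",
    "sequence_0_ranked_1.pdb": "<PDB text>",
}
confidence = {"sequence_0_confidence.json": json.dumps({"plddt": [90.0, 70.0, 80.0]})}
print(top_ranked_with_plddt(structures, confidence))
# {0: ('sequence_0_ranked_0.pdb', 80.0)}
```

Missing confidence files yield None rather than an error, so partially populated outputs can still be summarized.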
Important Notes
- Reference structures required: You must provide at least one PDB reference structure; this anchors partial diffusion.
- Chain IDs must match: Any fixed_chains must correspond to chain IDs defined in the YAML sequences.
- Trajectory output: trajectory.pdb is only populated if save_trajectory is true.
- Affinity outputs: affinity.json is produced only when the YAML includes an affinity property; the affinity binder should be a ligand chain defined in the YAML sequences.
- Multimer handling: The node automatically detects multimer vs. single-chain mode based on the number of unique chain IDs in the YAML.
- Per-sequence batching: If multiple sequence entries exist in the YAML, the node processes them sequentially and aggregates outputs, incrementing the seed for each.
- MSA/templates linkage: Ensure boltz_files contains every filename referenced in boltz_yaml (e.g., `msa_*.a3m`, `template_*.pdb`).
- Partial diffusion fraction: Lower values keep the structure closer to the reference; higher values allow more deviation.
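Several of the notes above (chain-ID matching, multimer detection, and the fixed_chains check) can be reproduced client-side before submitting a job. The sketch below assumes the Boltz YAML layout in which each sequences entry maps an entity type (protein, dna, rna, ligand) to a spec whose id is either one chain ID or a list of IDs for identical copies. The helper names are hypothetical, and the error text only mirrors the message quoted under Troubleshooting.

```python
import warnings


def yaml_chain_ids(boltz_yaml: dict) -> set:
    """Collect every chain ID declared under boltz_yaml["sequences"]."""
    ids = set()
    for entry in boltz_yaml.get("sequences", []):
        # Each entry looks like {"protein": {"id": "A", ...}}.
        for spec in entry.values():
            chain = spec.get("id", [])
            # "id" may be a single ID or a list of IDs for identical copies.
            ids.update([chain] if isinstance(chain, str) else chain)
    return ids


def is_multimer(boltz_yaml: dict) -> bool:
    """Mirror the stated rule: more than one unique chain ID means multimer."""
    return len(yaml_chain_ids(boltz_yaml)) > 1


def check_fixed_chains(boltz_yaml: dict, fixed_chains: str) -> set:
    """Parse a comma-separated fixed_chains string and validate it."""
    ids = yaml_chain_ids(boltz_yaml)
    fixed = {c.strip() for c in fixed_chains.split(",") if c.strip()}
    missing = fixed - ids
    if missing:
        raise ValueError(f"Fixed chains {sorted(missing)} not found in YAML sequences")
    if fixed and fixed == ids:
        warnings.warn("fixed_chains covers every chain in the YAML")
    return fixed


yaml_example = {
    "version": 1,
    "sequences": [
        {"protein": {"id": ["A", "B"], "sequence": "MSEQN"}},
        {"ligand": {"id": "L", "smiles": "CCO"}},
    ],
}
print(sorted(yaml_chain_ids(yaml_example)))              # ['A', 'B', 'L']
print(is_multimer(yaml_example))                         # True
print(sorted(check_fixed_chains(yaml_example, "A, B")))  # ['A', 'B']
```

Passing "A,Z" instead raises ValueError because chain Z is not declared, and passing "A,B,L" emits a warning since nothing would be left free to move.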
Troubleshooting
- Error: 'Fixed chains {...} not found in YAML sequences': Check fixed_chains values and ensure all chain IDs are defined in the YAML sequences.
- Empty trajectory output: Set save_trajectory to true to record intermediate diffusion states.
- Missing affinity output: Verify that boltz_yaml includes an affinity property and that the system includes a ligand sequence; enable affinity-related options if needed.
- Invalid YAML or files: Ensure boltz_yaml is a dictionary with at least one sequence and boltz_files is a dictionary mapping filenames to contents; all referenced files must exist in boltz_files.
- PDB parsing issues: Confirm reference_structures is a dict with at least one valid PDB text value.
- Fewer outputs than expected for multimers: Verify that all chains and MSAs are correctly specified, and check that fixed_chains does not unintentionally constrain every chain.
- Performance or timeout concerns: Reduce diffusion_samples, sampling_steps, or max_parallel_samples; disable save_trajectory if you do not need intermediate states; or set num_workers above 0 to enable multiprocessing.
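Most of the input problems listed above can be caught with a pre-flight check before a run. The hypothetical preflight function below returns a list of problems instead of raising, so all issues are reported at once. It assumes that file references appear as msa values on chain entries (with the Boltz keyword empty meaning no MSA file) and as cif or pdb values in a top-level templates list, and that the affinity binder must be a ligand chain; adjust these assumptions to the YAML your Combiner produces.

```python
def referenced_files(boltz_yaml: dict) -> set:
    """Filenames the YAML points at (assumed: msa on chains, cif/pdb in templates)."""
    files = set()
    for entry in boltz_yaml.get("sequences", []):
        for spec in entry.values():
            msa = spec.get("msa")
            if msa and msa != "empty":  # "empty" means single-sequence mode in Boltz
                files.add(msa)
    for template in boltz_yaml.get("templates", []):
        for key in ("cif", "pdb"):
            if key in template:
                files.add(template[key])
    return files


def preflight(boltz_yaml, boltz_files, reference_structures) -> list:
    """Return human-readable problems; an empty list means no issues found."""
    if not isinstance(boltz_yaml, dict) or not boltz_yaml.get("sequences"):
        return ["boltz_yaml must be a dict with at least one sequence"]
    problems = []
    if not isinstance(boltz_files, dict):
        problems.append("boltz_files must be a dict mapping filename -> content")
        boltz_files = {}
    missing = referenced_files(boltz_yaml) - set(boltz_files)
    if missing:
        problems.append(f"files referenced in boltz_yaml but missing: {sorted(missing)}")
    # A usable reference needs at least one coordinate record.
    has_atoms = isinstance(reference_structures, dict) and any(
        line.startswith(("ATOM", "HETATM"))
        for text in reference_structures.values()
        for line in text.splitlines()
    )
    if not has_atoms:
        problems.append("reference_structures needs at least one PDB with ATOM/HETATM records")
    # Assumed rule: the affinity binder must be a ligand chain.
    ligand_ids = set()
    for entry in boltz_yaml["sequences"]:
        ids = entry.get("ligand", {}).get("id", [])
        ligand_ids.update([ids] if isinstance(ids, str) else ids)
    for prop in boltz_yaml.get("properties", []):
        binder = prop.get("affinity", {}).get("binder")
        if binder is not None and binder not in ligand_ids:
            problems.append(f"affinity binder {binder!r} is not a ligand chain")
    return problems


# Placeholder single-atom reference, for illustration only.
ref_pdb = (
    "ATOM      1  N   MET A   1      11.104  13.207   2.100  1.00  0.00           N\n"
    "END\n"
)
example_yaml = {
    "version": 1,
    "sequences": [{"protein": {"id": "A", "sequence": "MSEQN", "msa": "msa_1.a3m"}}],
    "properties": [{"affinity": {"binder": "L"}}],
}
for problem in preflight(example_yaml, {}, {"ref_model.pdb": ref_pdb}):
    print("-", problem)
```

For a YAML that requests affinity for chain L without declaring a ligand, and whose MSA file is absent from boltz_files, the check reports both problems in one pass.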