Boltz Predict

Runs Boltz structure prediction from a prepared Boltz YAML configuration and its auxiliary files. Produces ranked structure files and confidence metrics, plus ligand-binding affinity predictions when the YAML's properties request them. When one YAML contains multiple sequences, each run uses the base seed plus the sequence index, so samples differ across sequences.

Usage

Use this node after building sequences, constraints, templates, and properties and combining them into a valid Boltz YAML with its auxiliary files. Connect the YAML and files outputs from the Boltz YAML Combiner, then set the prediction parameters (sampling, recycling, step scale). If the YAML requests affinity, the affinity-specific parameters also apply. The node outputs ranked structure files, confidence JSON, and, when affinity is requested, affinity JSON.

Inputs

Field	Required	Type	Description	Example
boltz_yaml	True	BOLTZ_YAML	The Boltz YAML configuration generated by the Boltz YAML Combiner. Must include at least one sequence and adhere to the Boltz schema.	{'version': 1, 'sequences': [{'protein': {'id': 'A', 'sequence': 'AAAA', 'msa': 'empty'}}]}
boltz_files	True	BOLTZ_FILES	Auxiliary files referenced by the YAML (e.g., A3M MSAs, template PDBs). Keys are filenames and values are file contents.	{'msa_1.a3m': '>A\nAAAA', 'template_1.pdb': 'ATOM ...'}
seed	True	INT	Base RNG seed for reproducibility. Each sequence in the YAML runs with this seed plus its index (seed + index) to diversify samples.	42
recycling_steps	False	INT	Number of recycling iterations during prediction.	3
sampling_steps	False	INT	Number of diffusion sampling steps for structure generation.	200
diffusion_samples	False	INT	How many independent diffusion samples to generate per run.	1
max_parallel_samples	False	INT	Degree of parallelism for sampling (hardware dependent).	5
step_scale	False	FLOAT	Diffusion step scale (temperature-like control). Lower values generally increase diversity among samples; the Boltz documentation recommends values between 1 and 2.	1.638
output_format	False	STRING	Desired structure file format for outputs. Options: pdb or mmcif.	pdb
num_workers	False	INT	Number of worker processes. Set to 0 to disable multiprocessing.	0
max_msa_seqs	False	INT	Maximum number of MSA sequences to use.	8192
subsample_msa	False	BOOLEAN	Whether to subsample MSA sequences.	False
num_subsampled_msa	False	INT	Number of MSA sequences to use if subsampling is enabled.	1024
use_potentials	False	BOOLEAN	Enable inference-time potentials to improve quality at the cost of speed.	False
write_full_pae	False	BOOLEAN	If true, outputs include full Predicted Aligned Error (PAE) matrices.	False
write_full_pde	False	BOOLEAN	If true, outputs include full Predicted Distance Error (PDE) matrices.	False
affinity_mw_correction	False	BOOLEAN	Applies molecular weight correction when computing affinity (only used if affinity property is requested).	False
sampling_steps_affinity	False	INT	Sampling steps specifically for affinity prediction (if requested).	200
diffusion_samples_affinity	False	INT	Number of diffusion samples specifically for affinity prediction (if requested).	5
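
The interaction between max_msa_seqs, subsample_msa, and num_subsampled_msa can be illustrated with a small sketch. This is not the node's or Boltz's implementation: the helper names parse_a3m and limit_msa are hypothetical, and the order of operations (cap first, then subsample while keeping the first record as the query) is an assumption made for illustration.

```python
import random


def parse_a3m(text):
    """Split A3M text into (header, sequence) records."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line, []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def limit_msa(records, max_msa_seqs=8192, subsample_msa=False,
              num_subsampled_msa=1024, seed=42):
    """Cap an MSA and optionally subsample it (assumed behavior, see lead-in)."""
    # Step 1: hard cap on the number of sequences.
    capped = records[:max_msa_seqs]
    if not subsample_msa or len(capped) <= num_subsampled_msa:
        return capped
    # Step 2: keep the query (first record) and draw the rest at random.
    query, rest = capped[0], capped[1:]
    rng = random.Random(seed)
    count = max(num_subsampled_msa - 1, 0)
    picked = sorted(rng.sample(range(len(rest)), count))
    return [query] + [rest[i] for i in picked]
```

With the defaults from the table, an MSA of fewer than 8192 sequences passes through unchanged unless subsample_msa is enabled.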

Outputs

Field	Type	Description	Example
structures.pdb	PDB	Dictionary mapping ranked structure filenames to content. Filenames are prefixed by sequence names for multi-sequence YAMLs.	{'sequence_0_rank_001.pdb': 'ATOM ...', 'sequence_0_rank_002.pdb': 'ATOM ...'}
confidence.json	JSON	Dictionary of confidence-related outputs (e.g., pLDDT/PAE metrics) keyed by filenames.	{'sequence_0_confidence.json': '{"pLDDT": [...], "PAE": [...]}'}
affinity.json	JSON	Dictionary of affinity prediction outputs keyed by filenames. Only populated if an affinity property is present and at least one ligand exists in sequences.	{'sequence_0_affinity.json': '{"Kd_pred": 1.2e-8, "details": {...}}'}
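
A downstream step often needs only the top-ranked structure per sequence and the decoded confidence values. The sketch below works on dictionaries shaped like the examples in the table. The helper names are hypothetical, and it assumes that filenames follow the `<prefix>_rank_<NNN>.<ext>` pattern shown above, that rank 001 is the best-ranked model, and that JSON outputs arrive as strings.

```python
import json
import re

# Matches names such as "sequence_0_rank_001.pdb" (pattern assumed from the examples).
RANKED = re.compile(r"^(?P<prefix>.+)_rank_(?P<rank>\d+)\.(?P<ext>\w+)$")


def best_structures(structures):
    """Return {prefix: (filename, content)} for the lowest rank number per prefix."""
    best = {}
    for name, content in structures.items():
        match = RANKED.match(name)
        if match is None:
            continue  # skip files that do not follow the ranked naming pattern
        prefix, rank = match["prefix"], int(match["rank"])
        if prefix not in best or rank < best[prefix][0]:
            best[prefix] = (rank, name, content)
    return {p: (name, content) for p, (_, name, content) in best.items()}


def load_json_outputs(outputs):
    """Decode values that are JSON strings; pass through already-parsed values."""
    return {k: json.loads(v) if isinstance(v, str) else v
            for k, v in outputs.items()}
```

The same load_json_outputs helper applies to the affinity dictionary when it is populated.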

Important Notes

YAML validity: The YAML must contain at least one sequence and follow the Boltz schema used by the Boltz YAML Combiner.
Affinity requirements: Affinity prediction runs only if the YAML includes an affinity property. If that property is present but the YAML has no ligand sequence, the node raises an error; without the property, the affinity output stays empty.
Multiple sequences: If the YAML contains multiple sequences, the node runs them with seed offsets (seed + index) and prefixes output filenames with the detected sequence names.
Output format: The output structure format is controlled by output_format (pdb or mmcif); filenames reflect the chosen format.
Performance: max_parallel_samples, num_workers, and use_potentials significantly affect runtime and resource usage.
MSA handling: Ensure any MSA or template filenames referenced by the YAML exist as keys in boltz_files. Large MSAs are truncated to max_msa_seqs sequences, and subsampled to num_subsampled_msa sequences if subsample_msa is enabled.
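
The multiple-sequences note above can be sketched as follows. The rule seed + index comes from this page; the helper names plan_runs and prefix_outputs, and the exact way a prefix is joined to a filename, are assumptions for illustration only.

```python
def plan_runs(sequence_names, seed):
    """Pair each sequence name with its offset seed (seed + index)."""
    return [(name, seed + index) for index, name in enumerate(sequence_names)]


def prefix_outputs(sequence_name, files):
    """Prefix every output filename with the sequence name (assumed join style)."""
    return {f"{sequence_name}_{filename}": content
            for filename, content in files.items()}


# Example: three sequences with a base seed of 42 run with seeds 42, 43, 44.
runs = plan_runs(["sequence_0", "sequence_1", "sequence_2"], seed=42)
```

Because the offset depends only on the base seed and the position of the sequence, rerunning the same YAML with the same seed reuses the same per-sequence seeds.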

Troubleshooting

Error: Boltz YAML must contain at least one sequence: Check that boltz_yaml.sequences is present and non-empty from the YAML Combiner.
Error: Boltz files must be a dictionary: Provide the auxiliary files mapping as a dict of filename -> content.
Error: Affinity prediction requires at least one ligand sequence: Add a ligand to sequences, or remove the affinity property from the YAML if affinity is not needed.
Empty or missing outputs: Verify that sampling_steps and diffusion_samples are positive, and confirm that the template and MSA filenames in the YAML match keys in boltz_files exactly.
Unexpected runtime or slow performance: Reduce diffusion_samples or sampling_steps, lower max_parallel_samples, or disable use_potentials to speed up.
Invalid parameter ranges: Ensure integers and floats are within the documented bounds (e.g., recycling_steps >= 1, step_scale within allowed range).
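
Several of the errors above can be caught before a run starts. The following is a minimal pre-flight sketch, not the node's own validation: the function name preflight is hypothetical, the first three messages mirror the errors listed above, and the last one (missing MSA file) is invented for illustration. It assumes the Boltz YAML layout shown in the Inputs example, with affinity requested as an entry under a top-level properties list, and it does not check template files.

```python
def preflight(boltz_yaml, boltz_files):
    """Return a list of problems found before running prediction."""
    problems = []

    sequences = boltz_yaml.get("sequences") or []
    if not sequences:
        problems.append("Boltz YAML must contain at least one sequence")

    if not isinstance(boltz_files, dict):
        problems.append("Boltz files must be a dictionary")
        boltz_files = {}  # continue checking with an empty mapping

    # Affinity needs at least one ligand entry among the sequences.
    properties = boltz_yaml.get("properties") or []
    wants_affinity = any("affinity" in prop for prop in properties)
    has_ligand = any("ligand" in entry for entry in sequences)
    if wants_affinity and not has_ligand:
        problems.append("Affinity prediction requires at least one ligand sequence")

    # Every referenced MSA file must be a key in boltz_files ('empty' means no MSA).
    for entry in sequences:
        for body in entry.values():
            msa = body.get("msa") if isinstance(body, dict) else None
            if msa and msa != "empty" and msa not in boltz_files:
                problems.append(f"MSA file not found in boltz_files: {msa}")

    return problems
```

An empty list means none of these checks failed; it does not guarantee that the YAML satisfies the full Boltz schema.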