
Boltz Predict

Runs Boltz structure prediction from a prepared Boltz YAML configuration and its auxiliary files. Produces ranked structure files and confidence metrics, and optionally ligand-binding affinity predictions when requested in properties. Seeds are offset per sequence to ensure varied sampling across multiple sequences in one YAML.

Usage

Use this node after building sequences, constraints, templates, and properties and combining them into a valid Boltz YAML with its auxiliary files. Connect the YAML and files outputs from the Boltz YAML Combiner, set prediction parameters (sampling, recycling, step scale), and optionally enable affinity-related options if affinity is requested. The outputs include ranked structure files, confidence JSON, and optional affinity JSON.

Inputs

| Field | Required | Type | Description | Example |
| --- | --- | --- | --- | --- |
| boltz_yaml | True | BOLTZ_YAML | The Boltz YAML configuration generated by the Boltz YAML Combiner. Must include at least one sequence and adhere to the Boltz schema. | {'version': 1, 'sequences': [{'protein': {'id': 'A', 'sequence': 'AAAA', 'msa': 'empty'}}]} |
| boltz_files | True | BOLTZ_FILES | Auxiliary files referenced by the YAML (e.g., A3M MSAs, template PDBs). Keys are filenames and values are file contents. | {'msa_1.a3m': '>A\nAAAA', 'template_1.pdb': 'ATOM ...'} |
| seed | True | INT | Base RNG seed for reproducibility. Each sequence in the YAML uses an offset of this seed to diversify samples. | 42 |
| recycling_steps | False | INT | Number of recycling iterations during prediction. | 3 |
| sampling_steps | False | INT | Number of diffusion sampling steps for structure generation. | 200 |
| diffusion_samples | False | INT | Number of independent diffusion samples to generate per run. | 1 |
| max_parallel_samples | False | INT | Degree of parallelism for sampling (hardware dependent). | 5 |
| step_scale | False | FLOAT | Diffusion step scale (temperature-like control). Higher values generally increase exploration. | 1.638 |
| output_format | False | STRING | Desired structure file format for outputs. Options: pdb or mmcif. | pdb |
| num_workers | False | INT | Number of worker processes. Set to 0 to disable multiprocessing. | 0 |
| max_msa_seqs | False | INT | Maximum number of MSA sequences to use. | 8192 |
| subsample_msa | False | BOOLEAN | Whether to subsample MSA sequences. | False |
| num_subsampled_msa | False | INT | Number of MSA sequences to use if subsampling is enabled. | 1024 |
| use_potentials | False | BOOLEAN | Enable inference-time potentials to improve quality at the cost of speed. | False |
| write_full_pae | False | BOOLEAN | If true, outputs include full Predicted Aligned Error (PAE) matrices. | False |
| write_full_pde | False | BOOLEAN | If true, outputs include full Predicted Distance Error (PDE) matrices. | False |
| affinity_mw_correction | False | BOOLEAN | Applies molecular weight correction when computing affinity (only used if an affinity property is requested). | False |
| sampling_steps_affinity | False | INT | Sampling steps specifically for affinity prediction (if requested). | 200 |
| diffusion_samples_affinity | False | INT | Number of diffusion samples specifically for affinity prediction (if requested). | 5 |
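
For orientation, the inputs can be pictured as a single mapping. The sketch below is only an illustration built from the example values listed for each field. The names predict_inputs and REQUIRED_FIELDS are hypothetical and are not part of the node's interface; how the node is actually invoked depends on the host workflow tool.

```python
# Illustrative only: a plain dict mirroring the documented input fields.
# `predict_inputs` and `REQUIRED_FIELDS` are hypothetical names, not node API.
boltz_yaml = {
    "version": 1,
    "sequences": [
        {"protein": {"id": "A", "sequence": "AAAA", "msa": "empty"}},
    ],
}

# Keys are filenames, values are file contents.
boltz_files = {"msa_1.a3m": ">A\nAAAA"}

predict_inputs = {
    # Required fields
    "boltz_yaml": boltz_yaml,
    "boltz_files": boltz_files,
    "seed": 42,
    # Optional fields, set to the example values from the documentation
    "recycling_steps": 3,
    "sampling_steps": 200,
    "diffusion_samples": 1,
    "max_parallel_samples": 5,
    "step_scale": 1.638,
    "output_format": "pdb",
    "num_workers": 0,
    "max_msa_seqs": 8192,
    "subsample_msa": False,
    "num_subsampled_msa": 1024,
    "use_potentials": False,
    "write_full_pae": False,
    "write_full_pde": False,
}

# The three fields marked Required=True must always be supplied.
REQUIRED_FIELDS = ("boltz_yaml", "boltz_files", "seed")
missing = [name for name in REQUIRED_FIELDS if name not in predict_inputs]
```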

Outputs

| Field | Type | Description | Example |
| --- | --- | --- | --- |
| structures.pdb | PDB | Dictionary mapping ranked structure filenames to content. Filenames are prefixed by sequence names for multi-sequence YAMLs. | {'sequence_0_rank_001.pdb': 'ATOM ...', 'sequence_0_rank_002.pdb': 'ATOM ...'} |
| confidence.json | JSON | Dictionary of confidence-related outputs (e.g., pLDDT/PAE metrics) keyed by filenames. | {'sequence_0_confidence.json': '{"pLDDT": [...], "PAE": [...]}'} |
| affinity.json | JSON | Dictionary of affinity prediction outputs keyed by filenames. Only populated if an affinity property is present and at least one ligand exists in sequences. | {'sequence_0_affinity.json': '{"Kd_pred": 1.2e-8, "details": {...}}'} |
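
Because the structures output is a dictionary keyed by ranked filenames, a downstream step usually needs to pick one file per sequence. The following is a minimal sketch, assuming filenames follow the pattern shown in the example (prefix, then _rank_NNN, then the extension) and assuming that the lowest rank number marks the top-ranked model. The function best_ranked is a hypothetical helper, not something the node provides.

```python
import re
from typing import Dict, Tuple

# Assumed filename pattern, taken from the example: sequence_0_rank_001.pdb
RANK_PATTERN = re.compile(r"^(?P<prefix>.+)_rank_(?P<rank>\d+)\.(?P<ext>pdb|cif|mmcif)$")


def best_ranked(structures: Dict[str, str]) -> Dict[str, str]:
    """Return {prefix: filename} with the lowest rank number per prefix.

    Assumption: a lower rank number means a higher-ranked structure.
    Filenames that do not match the assumed pattern are ignored.
    """
    best: Dict[str, Tuple[int, str]] = {}
    for name in structures:
        match = RANK_PATTERN.match(name)
        if match is None:
            continue
        prefix = match.group("prefix")
        rank = int(match.group("rank"))
        if prefix not in best or rank < best[prefix][0]:
            best[prefix] = (rank, name)
    return {prefix: name for prefix, (_, name) in best.items()}


structures = {
    "sequence_0_rank_002.pdb": "ATOM ...",
    "sequence_0_rank_001.pdb": "ATOM ...",
    "sequence_1_rank_001.pdb": "ATOM ...",
}
top = best_ranked(structures)
```

Here top maps each sequence prefix to its rank 001 file, so the corresponding content can be looked up with structures[top[prefix]].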

Important Notes

  • YAML validity: The YAML must contain at least one sequence and follow the Boltz schema used by the Boltz YAML Combiner.
  • Affinity requirements: Affinity prediction is only allowed if the YAML includes an affinity property and at least one ligand sequence; otherwise the node will raise an error.
  • Multiple sequences: If the YAML contains multiple sequences, the node runs each one with its own seed offset (seed + index) and prefixes output filenames with the detected sequence names.
  • Output format: The output structure format is controlled by output_format (pdb or mmcif); filenames reflect the chosen format.
  • Performance: max_parallel_samples, num_workers, and use_potentials significantly affect runtime and resource usage.
  • MSA handling: Ensure any MSA or template filenames referenced by the YAML exist in boltz_files; large MSAs may be truncated via max_msa_seqs or sampled if subsample_msa is enabled.
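
The seed offset rule for multi-sequence YAMLs (seed + index) can be sketched as below. This only illustrates the arithmetic; per_sequence_seeds is a hypothetical helper, and the zero-based index is an assumption suggested by example filenames such as sequence_0_rank_001.pdb.

```python
from typing import List


def per_sequence_seeds(base_seed: int, num_sequences: int) -> List[int]:
    """Derive one seed per sequence as base_seed + index.

    Assumption: the index starts at 0, so the first sequence uses the base seed.
    """
    return [base_seed + index for index in range(num_sequences)]


seeds = per_sequence_seeds(42, 3)
```

With a fixed base seed the derived seeds are deterministic, so rerunning the same YAML with the same seed should reuse the same per-sequence seeds.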

Troubleshooting

  • Error: Boltz YAML must contain at least one sequence: Check that boltz_yaml.sequences is present and non-empty from the YAML Combiner.
  • Error: Boltz files must be a dictionary: Provide the auxiliary files mapping as a dict of filename -> content.
  • Error: Affinity prediction requires at least one ligand sequence: Add a ligand to sequences and ensure an affinity property is included, or disable affinity-related options.
  • Empty or missing outputs: Verify that sampling_steps and diffusion_samples are set to reasonable values, and confirm that the template/MSA filenames in the YAML match the keys in boltz_files.
  • Unexpected runtime or slow performance: Reduce diffusion_samples or sampling_steps, lower max_parallel_samples, or disable use_potentials to speed up.
  • Invalid parameter ranges: Ensure integer and float parameters are within their allowed bounds (e.g., recycling_steps >= 1, step_scale within its allowed range).
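
Several of the errors above can be caught before the node runs. The sketch below is a hypothetical pre-flight check, not the node's own validation code. It reuses the error messages listed above and makes three labeled assumptions: affinity requests live under a top-level properties list with an affinity key, ligands appear as entries with a ligand key in sequences, and a protein's msa value is either 'empty' or a filename that must be a key in boltz_files.

```python
from typing import Any, Dict, List


def preflight_check(boltz_yaml: Dict[str, Any], boltz_files: Any) -> List[str]:
    """Raise on the documented error conditions; return missing MSA filenames.

    Hypothetical helper. The YAML layout for `properties`, `ligand`, and `msa`
    is assumed, not confirmed by this documentation.
    """
    if not isinstance(boltz_files, dict):
        raise ValueError("Boltz files must be a dictionary")

    sequences = boltz_yaml.get("sequences") or []
    if not sequences:
        raise ValueError("Boltz YAML must contain at least one sequence")

    # Assumed layout: properties = [{"affinity": {...}}, ...]
    properties = boltz_yaml.get("properties") or []
    wants_affinity = any("affinity" in prop for prop in properties)
    has_ligand = any("ligand" in entry for entry in sequences)
    if wants_affinity and not has_ligand:
        raise ValueError("Affinity prediction requires at least one ligand sequence")

    # Collect MSA filenames referenced by the YAML but absent from boltz_files.
    missing: List[str] = []
    for entry in sequences:
        msa = entry.get("protein", {}).get("msa")
        if msa and msa != "empty" and msa not in boltz_files:
            missing.append(msa)
    return missing


example_yaml = {
    "version": 1,
    "sequences": [{"protein": {"id": "A", "sequence": "AAAA", "msa": "msa_1.a3m"}}],
}
missing_files = preflight_check(example_yaml, {})
```

Returning the missing filenames instead of raising lets a caller report all of them at once, which tends to be more useful when a YAML references many MSAs.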